    watcher: try to restart the master if down · 7dfb83c2
    Author: Iustin Pop

    Bugs in either our code or in associated libraries can bring the master daemon
    down, and this (due to the 2.0 architecture) stops all work on the cluster.
    
    Since the watcher already does periodic checks on the cluster, we modify
    it to try to start the master automatically in case of failures to
    connect. This will be tried only once per cycle.
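    The "restart once per cycle" logic described above can be sketched as follows. This is a minimal illustration, not the watcher's actual code: `connect`, `start_master` and `ConnectError` are hypothetical placeholders for however the watcher reaches and starts the master daemon.

```python
from typing import Callable, Optional


class ConnectError(Exception):
    """Hypothetical error raised when the master daemon cannot be reached."""


def connect_with_restart(connect: Callable[[], object],
                         start_master: Callable[[], None]) -> Optional[object]:
    """Connect to the master, starting it at most once per watcher cycle.

    Returns the client on success, or None if the master is still
    unreachable after a single restart attempt.
    """
    try:
        return connect()
    except ConnectError:
        pass
    # First connection failed: try to (re)start the master daemon, once.
    start_master()
    try:
        return connect()
    except ConnectError:
        # Still down; give up until the next watcher run.
        return None
```

    Limiting the restart to a single attempt keeps one watcher run from looping on a master that crashes immediately; the next periodic run simply tries again.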
    
    Also, in this case, we modify the code so that the watcher status file
    is not updated; its timestamp will thus reflect the time of the last
    successful connection to the master.
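    The status-file behaviour can be sketched like this, assuming a hypothetical `connect` callable that returns None when the master is unreachable (for example, a wrapper around a retry such as the one described above). The atomic write via a temporary file and `os.replace` is an illustrative choice, not necessarily how the watcher writes its file.

```python
import os
import tempfile
from typing import Callable, Optional


def save_state(path: str, data: str) -> None:
    """Write the state file atomically so readers never see a partial file."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
        os.replace(tmp, path)  # atomic rename over the old file
    except BaseException:
        os.unlink(tmp)
        raise


def watcher_cycle(connect: Callable[[], Optional[object]],
                  state_path: str, state_data: str) -> bool:
    """Run one watcher pass; return True if the master was reached."""
    client = connect()
    if client is None:
        # Master unreachable: leave the state file untouched so its
        # mtime keeps recording the last successful connection.
        return False
    # ... the real watcher would run its cluster checks via `client` here ...
    save_state(state_path, state_data)
    return True
```

    Because a failed pass never rewrites the file, an administrator can read its modification time as "last time the watcher talked to the master".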
    
    Side note: the except errors.ConfigurationError part could be cleaned
    up, since in 2.0 we don't usually get that exception directly, and if
    we do, it's an error and we shouldn't touch the file anyway; but that
    is not an rc5 change.
    
    Signed-off-by: Iustin Pop <iustin@google.com>