Commit c4f0219c authored by Iustin Pop

watcher: automatically restart noded/rapi



This patch makes the watcher automatically restart the node and rapi
daemons, if they are not running (as per the PID file).
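The PID-file check described above can be illustrated with a minimal, self-contained sketch. It is not the Ganeti implementation: the function names `read_pid_file`, `is_process_alive` and `needs_restart` are hypothetical, the code assumes a POSIX system and Python 3, and it only mirrors the convention visible in the patch that a PID of 0 means "no usable PID file".

```python
import errno
import os


def read_pid_file(path):
    """Return the PID stored in path, or 0 if the file is missing or invalid."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return 0


def is_process_alive(pid):
    """Return True if a process with this PID currently exists (POSIX only)."""
    if pid <= 0:
        return False
    try:
        # Signal 0 delivers nothing; the kernel only performs its checks.
        os.kill(pid, 0)
    except OSError as err:
        # EPERM: the process exists but is owned by another user.
        return err.errno == errno.EPERM
    return True


def needs_restart(pidfile):
    """Mirror the watcher's condition: no PID file, or a dead PID."""
    pid = read_pid_file(pidfile)
    return pid == 0 or not is_process_alive(pid)
```

A check like this only shows that some process with the recorded PID exists. It cannot tell whether that process is actually the daemon or whether it is still serving requests.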

This is not an exhaustive test; a better one would be a TCP connect to the
port, and an even better one a simple protocol ping (e.g. GET / for rapi
and an rpc_call_alive for noded), but since we don't know how the daemons
were started we can't implement it today. rapi would need to write its
SSL settings and port to a file, and noded something similar, so that we
know how to connect.
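For illustration, the "TCP connect to the port" test mentioned above might look like the sketch below. It is not part of this patch. The function name `port_accepts_connections` is hypothetical, and the host and port are exactly the values the message says the watcher does not yet know, so the caller must supply them.

```python
import socket


def port_accepts_connections(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        # create_connection resolves the host and connects; the socket is
        # closed again as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and resolution failures.
        return False
```

A successful connect is stronger evidence than a live PID, but it still does not prove that the daemon answers requests correctly, which is why the message ranks a protocol-level ping higher.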

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
parent 24edc6d4
@@ -80,6 +80,20 @@ def StartMaster():
  return not result.failed


def EnsureDaemon(daemon):
  """Check for and start daemon if not alive.

  """
  pidfile = utils.DaemonPidFileName(daemon)
  pid = utils.ReadPidFile(pidfile)
  if pid == 0 or not utils.IsProcessAlive(pid): # no file or dead pid
    logging.debug("Daemon '%s' not alive, trying to restart", daemon)
    result = utils.RunCmd([daemon])
    if not result:
      logging.error("Can't start daemon '%s', failure %s, output: %s",
                    daemon, result.fail_reason, result.output)


class WatcherState(object):
  """Interface to a state file recording restart attempts.
@@ -464,6 +478,10 @@ def main():
  update_file = False
  try:
    # on master or not, try to start the node daemon (use _PID but is
    # the same as daemon name)
    EnsureDaemon(constants.NODED_PID)

    notepad = WatcherState()
    try:
      try:
@@ -482,6 +500,9 @@ def main():
        # else retry the connection
        client = cli.GetClient()

      # we are on master now (use _PID but is the same as daemon name)
      EnsureDaemon(constants.RAPI_PID)

      try:
        watcher = Watcher(options, notepad)
      except errors.ConfigurationError: