Commit 340f4757 authored by Iustin Pop

masterd: move the IP activation from Exec to Check



Currently, the master IP activation is done in the Exec function. Since
the original masterd process returns after forking, and Exec is run in
the (grand)child process, this means that after 'ganeti-masterd' has
returned there are still initialization tasks running.

Normally this is not a problem, but in cases where one does quick master
failovers, this creates a race condition which hits the QA scripts
especially hard.

To solve this, and make the startup process cleaner (the system is in
steady state after the command has returned, even though masterd startup
could still fail), we move the IP activation to Check(). This also
allows error messages about the IP activation to be seen on the console.
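
The ordering problem described above can be illustrated with a small, self-contained sketch. It is not Ganeti code: `start_daemon` and `demo` are hypothetical names, and a single `os.fork()` stands in for masterd's real daemonization (which, per the message above, involves a grandchild process). The point is only that work done in the check step has finished by the time the launcher returns, while work done in the exec step may still be pending.

```python
import os
import tempfile


def start_daemon(check_fn, exec_fn):
    """Run check_fn in the caller, then fork and run exec_fn in the child.

    Hypothetical helper: returns the child's PID right after the fork,
    without waiting for exec_fn to finish.
    """
    check_fn()               # synchronous: complete before we return
    pid = os.fork()
    if pid == 0:             # child: the long-running daemon body
        try:
            exec_fn()
        finally:
            os._exit(0)      # never fall back into the caller's code
    return pid


def demo():
    """Show which step is finished at the moment start_daemon returns."""
    release_r, release_w = os.pipe()
    with tempfile.TemporaryDirectory() as tmp:
        check_marker = os.path.join(tmp, "check-done")
        exec_marker = os.path.join(tmp, "exec-done")

        def check():
            open(check_marker, "w").close()

        def run():
            # Block until the parent releases us, modelling initialization
            # that is still in progress after the launcher has returned.
            os.read(release_r, 1)
            open(exec_marker, "w").close()

        pid = start_daemon(check, run)
        at_return = (os.path.exists(check_marker),
                     os.path.exists(exec_marker))
        os.write(release_w, b"x")   # let the child finish its work
        os.waitpid(pid, 0)
        at_exit = os.path.exists(exec_marker)
    os.close(release_r)
    os.close(release_w)
    return at_return, at_exit
```

On a POSIX system, `demo()` returns `((True, False), True)`: the check-step marker exists as soon as the launcher returns, while the exec-step marker appears only once the child has been allowed to run to completion. A caller that acts immediately after the launcher returns (for example a quick failover) can therefore rely only on what the check step did.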

With this patch enabled, I can no longer reproduce the double-failover
errors, which were occurring before in 4/5 cases.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
parent e0e916fe
@@ -463,6 +463,16 @@ def CheckAgreement():
   return result
+@rpc.RunWithRPC
+def ActivateMasterIP():
+  # activate ip
+  master_node = ssconf.SimpleStore().GetMasterNode()
+  result = rpc.RpcRunner.call_node_start_master(master_node, False, False)
+  msg = result.fail_msg
+  if msg:
+    logging.error("Can't activate master IP address: %s", msg)
 def CheckMasterd(options, args):
   """Initial checks whether to run or exit with a failure.
@@ -505,6 +515,12 @@ def CheckMasterd(options, args):
   if not utils.RunInSeparateProcess(CheckAgreement):
     sys.exit(constants.EXIT_FAILURE)
+  # ActivateMasterIP also uses RPC/threads, so we run it again via a
+  # separate process.
+  # TODO: decide whether failure to activate the master IP is a fatal error
+  utils.RunInSeparateProcess(ActivateMasterIP)
 def ExecMasterd(options, args): # pylint: disable-msg=W0613
   """Main master daemon function, executed with the PID file held.
@@ -520,13 +536,6 @@ def ExecMasterd(options, args): # pylint: disable-msg=W0613
   try:
     rpc.Init()
     try:
-      # activate ip
-      master_node = ssconf.SimpleStore().GetMasterNode()
-      result = rpc.RpcRunner.call_node_start_master(master_node, False, False)
-      msg = result.fail_msg
-      if msg:
-        logging.error("Can't activate master IP address: %s", msg)
       master.setup_queue()
       try:
         mainloop.Run()
......