Commit db8667b7 authored by Iustin Pop

Work around fake failures in drbd+live migration



This patch is an attempt to fix the ugly issue during migration:
  Cannot resync disks on node …: [True, 100]

If my understanding is correct, we sometimes poll the /proc/drbd file at
an inopportune moment, while it is being updated or while the DRBD device
is changing state, and we see an unexpected state.

Based on the assumption that this is just a transient state, rather than
aborting directly, we change the backend.DrbdWaitSync() function to
retry the operation a few times, giving DRBD a chance to settle down at
the end of the resync.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
parent 099c52ad
@@ -2547,14 +2547,25 @@ def DrbdWaitSync(nodes_ip, disks):
   """Wait until DRBDs have synchronized.
   """
+  def _helper(rd):
+    stats = rd.GetProcStatus()
+    if not (stats.is_connected or stats.is_in_resync):
+      raise utils.RetryAgain()
+    return stats
   bdevs = _FindDisks(nodes_ip, disks)
   min_resync = 100
   alldone = True
   for rd in bdevs:
-    stats = rd.GetProcStatus()
-    if not (stats.is_connected or stats.is_in_resync):
-      _Fail("DRBD device %s is not in sync: stats=%s", rd, stats)
+    try:
+      # poll each second for 15 seconds
+      stats = utils.Retry(_helper, 1, 15, args=[rd])
+    except utils.RetryTimeout:
+      stats = rd.GetProcStatus()
+      # last check
+      if not (stats.is_connected or stats.is_in_resync):
+        _Fail("DRBD device %s is not in sync: stats=%s", rd, stats)
     alldone = alldone and (not stats.is_in_resync)
     if stats.sync_percent is not None:
       min_resync = min(min_resync, stats.sync_percent)
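
For readers without the Ganeti tree at hand, the retry pattern used in the patch can be sketched roughly as follows. This is a minimal standalone illustration, not the actual utils.Retry implementation: the names retry, RetryAgain, RetryTimeout, FakeStats, FlakyDevice and poll_device are hypothetical stand-ins, and the sleep and clock functions are injectable only so that the sketch can be exercised without real waiting.

```python
import time


class RetryAgain(Exception):
    """Raised by the polled function to request another attempt."""


class RetryTimeout(Exception):
    """Raised by retry() when the deadline passes without success."""


def retry(fn, delay, timeout, args=(), sleep_fn=time.sleep,
          time_fn=time.monotonic):
    """Call fn(*args) until it returns without raising RetryAgain.

    Waits `delay` seconds between attempts and raises RetryTimeout once
    `timeout` seconds have elapsed since the first attempt.
    """
    deadline = time_fn() + timeout
    while True:
        try:
            return fn(*args)
        except RetryAgain:
            pass
        if time_fn() >= deadline:
            raise RetryTimeout()
        sleep_fn(delay)


class FakeStats:
    """Hypothetical stand-in for the status object returned by a device."""

    def __init__(self, is_connected, is_in_resync):
        self.is_connected = is_connected
        self.is_in_resync = is_in_resync


class FlakyDevice:
    """Hypothetical device that reports a bad state for the first polls."""

    def __init__(self, bad_polls):
        self.bad_polls = bad_polls
        self.polls = 0

    def get_proc_status(self):
        self.polls += 1
        settled = self.polls > self.bad_polls
        return FakeStats(is_connected=settled, is_in_resync=False)


def poll_device(dev):
    """Same shape as _helper in the patch: retry on an unexpected state."""
    stats = dev.get_proc_status()
    if not (stats.is_connected or stats.is_in_resync):
        raise RetryAgain()
    return stats


if __name__ == "__main__":
    # A device that looks broken on its first two polls, then settles.
    dev = FlakyDevice(bad_polls=2)
    stats = retry(poll_device, 0, 5, args=(dev,))
    print("connected:", stats.is_connected, "polls:", dev.polls)
```

As in the patch, a transient bad reading only costs extra polls, while a device that never settles still ends in a timeout that the caller can turn into a hard failure after one last check.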