Commit fdbd668d authored by Iustin Pop

Reduce the chance of DRBD errors with stale primaries

This patch is a first step towards reducing the chance of DRBD
activation failures when the primary node does not have up-to-date data.

This issue is seen more often with DRBD 8, which has an 'outdated'
state that devices can enter more easily. But it can (and before this
patch, usually would) happen with both DRBD 7 and 8 whenever the
primary has data to sync.

The error comes from the fact that, before this patch, we activated the
primary DRBD device and immediately (i.e. as soon as we could run
another shell command) tried to make it primary. This could fail, since
the primary knows it has data to catch up on, but we ignored the error
condition. The failure only became visible later, either as md failing
to activate on top of read-only storage or as the instance failing to
start.
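
Schematically, the pre-patch flow amounted to the following per node
(an illustrative sketch built around the patch's own RPC call, not a
verbatim excerpt):

  # old behaviour: assemble and promote in one step, for each node
  is_primary = (node == instance.primary_node)
  result = rpc.call_blockdev_assemble(node, node_disk, instance.name,
                                      is_primary)
  # on the primary this immediately tried to switch the fresh DRBD
  # device to primary; a stale primary could refuse, and the failure
  # was only logged, never reported back to the master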

The patch has two parts. The first affects bdev.py and changes failures
in BlockDev.Open() from returning False to raising
errors.BlockDeviceError. No one (except a generic method inside
bdev.py) checked this return value: we logged it, but the master never
learned about it. Now all classes raise an error from Open() on
failure.
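
The generic assembly code now has to catch the exception and clean up,
in the pattern of (simplified from the first bdev.py hunk below):

  try:
    child.Open()
  except errors.BlockDeviceError:
    for child in self._children:
      child.Shutdown()   # undo the partial activation
    raise                # propagate, so the master finally sees it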

The other part, affecting cmdlib.py, changes the activation sequence
from:
  - activate on the primary node as primary and on the secondary node
    as secondary, in whatever order the node list happens to be returned
to the following:
  - activate all drives as secondaries, on both the primary and the
    secondary nodes of the instance
  - after that, on the primary node, re-activate the device stack as
    primary
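
In simplified form (the real change is in the cmdlib.py hunk below;
error handling omitted):

  # 1st pass: assemble everything as secondary, on all nodes
  for inst_disk in instance.disks:
    for node, node_disk in inst_disk.ComputeNodeTree(instance.primary_node):
      rpc.call_blockdev_assemble(node, node_disk, iname, False)
  # 2nd pass: re-assemble as primary, on the primary node only
  for inst_disk in instance.disks:
    for node, node_disk in inst_disk.ComputeNodeTree(instance.primary_node):
      if node == instance.primary_node:
        rpc.call_blockdev_assemble(node, node_disk, iname, True)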

This is done to give DRBD the chance to connect and complete its
handshake. As noted in the comments, it only increases the chances of a
successful handshake/connect; it does not fix the problem entirely.
However, it is a good first step: it passes all tests of starting with
stale (either fully or partially stale) primaries, with both DRBD 7 and
8, and it also passes a burnin.
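
The proper fix, as the in-code comments note, would be to wait (with
some limit) until the device leaves WFConnection before promoting it.
A rough sketch of such a wait, assuming the /proc/drbd status format
and a hypothetical standalone helper (not part of this patch):

  import time

  def _WaitForDrbdConnect(minor, timeout=15):
    """Poll /proc/drbd until device 'minor' reaches a network-connected
    state; returns False if it is still unconnected at the deadline."""
    deadline = time.time() + timeout
    while time.time() < deadline:
      for line in open("/proc/drbd"):
        line = line.strip()
        if not line.startswith("%d:" % minor):
          continue
        if ("cs:Connected" in line or "cs:SyncTarget" in line or
            "cs:SyncSource" in line):
          return True
      time.sleep(1)
    return False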

Note that the patch might make device activation a little slower, but
this is a reasonable trade-off.

Reviewed-by: imsnah
parent b08d5a87
@@ -122,7 +122,13 @@ class BlockDev(object):
       status = status and child.Assemble()
       if not status:
         break
-      status = status and child.Open()
+      try:
+        child.Open()
+      except errors.BlockDeviceError:
+        for child in self._children:
+          child.Shutdown()
+        raise
+
     if not status:
       for child in self._children:
         child.Shutdown()
@@ -502,7 +508,7 @@ class LogicalVolume(BlockDev):
     This is a no-op for the LV device type.

     """
-    return True
+    pass

   def Close(self):
     """Notifies that the device will no longer be used for I/O.
@@ -510,7 +516,7 @@ class LogicalVolume(BlockDev):
     This is a no-op for the LV device type.

     """
-    return True
+    pass

   def Snapshot(self, size):
     """Create a snapshot copy of an lvm block device.
@@ -954,7 +960,7 @@ class MDRaid1(BlockDev):
     the 2.6.18's new array_state thing.

     """
-    return True
+    pass

   def Close(self):
     """Notifies that the device will no longer be used for I/O.
@@ -963,7 +969,7 @@ class MDRaid1(BlockDev):
     `Open()`.

     """
-    return True
+    pass


 class BaseDRBD(BlockDev):
@@ -1456,9 +1462,9 @@ class DRBDev(BaseDRBD):
       cmd.append("--do-what-I-say")
     result = utils.RunCmd(cmd)
     if result.failed:
-      logger.Error("Can't make drbd device primary: %s" % result.output)
-      return False
-    return True
+      msg = ("Can't make drbd device primary: %s" % result.output)
+      logger.Error(msg)
+      raise errors.BlockDeviceError(msg)

   def Close(self):
     """Make the local state secondary.
@@ -1471,8 +1477,10 @@ class DRBDev(BaseDRBD):
       raise errors.BlockDeviceError("Can't find device")
     result = utils.RunCmd(["drbdsetup", self.dev_path, "secondary"])
     if result.failed:
-      logger.Error("Can't switch drbd device to secondary: %s" % result.output)
-      raise errors.BlockDeviceError("Can't switch drbd device to secondary")
+      msg = ("Can't switch drbd device to"
+             " secondary: %s" % result.output)
+      logger.Error(msg)
+      raise errors.BlockDeviceError(msg)

   def SetSyncSpeed(self, kbytes):
     """Set the speed of the DRBD syncer.
@@ -2068,9 +2076,9 @@ class DRBD8(BaseDRBD):
       cmd.append("-o")
     result = utils.RunCmd(cmd)
     if result.failed:
-      logger.Error("Can't make drbd device primary: %s" % result.output)
-      return False
-    return True
+      msg = ("Can't make drbd device primary: %s" % result.output)
+      logger.Error(msg)
+      raise errors.BlockDeviceError(msg)

   def Close(self):
     """Make the local state secondary.
@@ -2083,8 +2091,10 @@ class DRBD8(BaseDRBD):
       raise errors.BlockDeviceError("Can't find device")
     result = utils.RunCmd(["drbdsetup", self.dev_path, "secondary"])
     if result.failed:
-      logger.Error("Can't switch drbd device to secondary: %s" % result.output)
-      raise errors.BlockDeviceError("Can't switch drbd device to secondary")
+      msg = ("Can't switch drbd device to"
+             " secondary: %s" % result.output)
+      logger.Error(msg)
+      raise errors.BlockDeviceError(msg)

   def Attach(self):
     """Find a DRBD device which matches our config and attach to it.
@@ -1860,23 +1860,41 @@ def _AssembleInstanceDisks(instance, cfg, ignore_secondaries=False):
   """
   device_info = []
   disks_ok = True
+  iname = instance.name
+  # With the two passes mechanism we try to reduce the window of
+  # opportunity for the race condition of switching DRBD to primary
+  # before handshaking occured, but we do not eliminate it
+
+  # The proper fix would be to wait (with some limits) until the
+  # connection has been made and drbd transitions from WFConnection
+  # into any other network-connected state (Connected, SyncTarget,
+  # SyncSource, etc.)
+
+  # 1st pass, assemble on all nodes in secondary mode
   for inst_disk in instance.disks:
-    master_result = None
     for node, node_disk in inst_disk.ComputeNodeTree(instance.primary_node):
       cfg.SetDiskID(node_disk, node)
-      is_primary = node == instance.primary_node
-      result = rpc.call_blockdev_assemble(node, node_disk,
-                                          instance.name, is_primary)
+      result = rpc.call_blockdev_assemble(node, node_disk, iname, False)
       if not result:
         logger.Error("could not prepare block device %s on node %s"
-                     " (is_primary=%s)" %
-                     (inst_disk.iv_name, node, is_primary))
-        if is_primary or not ignore_secondaries:
+                     " (is_primary=False, pass=1)" % (inst_disk.iv_name, node))
+        if not ignore_secondaries:
           disks_ok = False
-      if is_primary:
-        master_result = result
-    device_info.append((instance.primary_node, inst_disk.iv_name,
-                        master_result))
+
+  # FIXME: race condition on drbd migration to primary
+
+  # 2nd pass, do only the primary node
+  for inst_disk in instance.disks:
+    for node, node_disk in inst_disk.ComputeNodeTree(instance.primary_node):
+      if node != instance.primary_node:
+        continue
+      cfg.SetDiskID(node_disk, node)
+      result = rpc.call_blockdev_assemble(node, node_disk, iname, True)
+      if not result:
+        logger.Error("could not prepare block device %s on node %s"
+                     " (is_primary=True, pass=2)" % (inst_disk.iv_name, node))
+        disks_ok = False
+    device_info.append((instance.primary_node, inst_disk.iv_name, result))

   # leave the disks configured for the primary node
   # this is a workaround that would be fixed better by