Reduce the chance of DRBD errors with stale primaries
This patch is a first step in reducing the chance of DRBD activation failures when the primary node has imperfect data.

The issue is seen more often with DRBD 8, which has an 'outdated' state that a device can fall into more easily. But it can happen (and before this patch, usually did) with both DRBD 7 and 8 whenever the primary has data to sync. The error comes from the fact that, before this patch, we activated the DRBD device on the primary node and immediately (i.e. as soon as we could run another shell command) tried to make it primary. This can fail, since the primary knows it has data to catch up on, but we ignored this error condition. The failure only became visible later, either as md failing to activate on top of read-only storage or as the instance failing to start.

The patch has two parts.

The first part, in bdev.py, changes failures in BlockDev.Open() from returning False to raising errors.BlockDeviceError. No one (except a generic method inside bdev.py) checked this return value; the failure was logged, but the master never learned about it. Now all classes raise an error from Open() when it fails.

The second part, in cmdlib.py, changes the activation sequence from:
- activate on the primary node as primary and on the secondary node as secondary, in whatever order a function returns the nodes

to the following:
- activate all drives as secondaries, on both the primary and the secondary nodes of the instance
- after that, on the primary node, re-activate the device stack as primary

This gives DRBD a chance to connect and complete the handshake before it is asked to become primary. As noted in the comments, this only increases the chances of a successful connect/handshake; it does not fix the problem entirely. However, it is a good first step: it passes all tests of starting with stale (either fully or partially) primaries, with both DRBD 7 and 8, and it also passes a burnin.

Note that the patch may make device activation slightly slower, but that is a reasonable trade-off.

Reviewed-by: imsnah
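A toy model may help make the ordering change concrete. This is a hypothetical sketch, not Ganeti's actual bdev.py or cmdlib.py code: BlockDeviceError is a local stand-in for errors.BlockDeviceError, and FakeDrbdDisk, assemble_disks and assemble_disks_old are invented names. To keep it deterministic, it assumes a device can become primary only once its peer has already been activated; real DRBD connects asynchronously, which is why the new order only improves the odds rather than guaranteeing success.

```python
class BlockDeviceError(Exception):
    """Hypothetical stand-in for Ganeti's errors.BlockDeviceError."""


class FakeDrbdDisk:
    """Toy DRBD device (hypothetical): it may only become primary once
    every peer has been activated."""

    def __init__(self, node, activated):
        self.node = node
        self.activated = activated  # shared dict: node name -> bool
        self.role = None

    def assemble_secondary(self):
        # Bring the device up in the secondary role and mark this node as up.
        self.role = "secondary"
        self.activated[self.node] = True

    def open(self):
        # Mirrors the new behaviour: a failure raises instead of returning False.
        peers = [n for n in self.activated if n != self.node]
        if not peers or not all(self.activated[n] for n in peers):
            raise BlockDeviceError(
                "%s: cannot become primary, peer not connected" % self.node)
        self.role = "primary"


def assemble_disks_old(primary, nodes):
    """Old order: each node in turn, the primary promoted immediately."""
    activated = {n: False for n in nodes}
    disks = {n: FakeDrbdDisk(n, activated) for n in nodes}
    for node in nodes:  # whatever order the node list happens to have
        disks[node].assemble_secondary()
        if node == primary:
            disks[node].open()
    return disks


def assemble_disks(primary, nodes):
    """New order: all nodes as secondary first, then promote the primary."""
    activated = {n: False for n in nodes}
    disks = {n: FakeDrbdDisk(n, activated) for n in nodes}
    # Phase 1: activate on every node, the primary included, as secondary.
    for node in nodes:
        disks[node].assemble_secondary()
    # Phase 2: only now re-activate the primary node's device as primary.
    disks[primary].open()
    return disks


if __name__ == "__main__":
    disks = assemble_disks("node1", ["node1", "node2"])
    print(disks["node1"].role, disks["node2"].role)  # primary secondary
    try:
        assemble_disks_old("node1", ["node1", "node2"])
    except BlockDeviceError as err:
        print("old order failed:", err)
```

Because open() raises, the caller (the master, in the real system) sees the failure at activation time instead of later, when md or the instance fails to start.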