Commit 0cf5e7f5 authored by Iustin Pop

Improve cluster verify with hypervisor errors



If the hypervisor has problems on a node, backend.VerifyNode currently
exits via an exception. Two exit paths are possible: HypervisorError
from hypervisor.Verify(), and RPCFail from GetInstanceList. This is bad
because it invalidates all other checks for that node.

This patch catches these two errors and allows the rest of the
VerifyNode function to run. This leads to a more complete cluster verify
run; for example, only LVs that are really missing are now reported,
rather than all of them.
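
The catch-and-record approach described above can be sketched in isolation. This is a minimal illustration in modern Python, not the Ganeti code: `HypervisorError`, `verify_node`, and the two check functions are hypothetical stand-ins, and the sketch assumes each check either returns its data or raises.

```python
class HypervisorError(Exception):
    """Hypothetical stand-in for errors.HypervisorError."""


def verify_node(checks):
    """Run every check; record an error string instead of aborting.

    `checks` maps a check name to a zero-argument callable (an assumption
    made for this sketch, not the real VerifyNode signature).
    """
    result = {}
    for name, check in checks.items():
        try:
            val = check()
        except HypervisorError as err:
            # The failure becomes the value for this check only, so the
            # remaining checks still run and report their own data.
            val = "Error while checking hypervisor: %s" % err
        result[name] = val
    return result


def broken_hypervisor():
    raise HypervisorError("cannot connect")


def list_volumes():
    return ["xenvg/disk0", "xenvg/disk1"]


report = verify_node({"hypervisor": broken_hypervisor, "lvlist": list_volumes})
```

Here `report["lvlist"]` still holds the volume list even though the hypervisor check failed, which is the behaviour the patch is after.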

Cluster verify is still not perfect, as it skips some tests even when it
has the data, but fixing that requires a more complete rewrite (see
issue 90).

Also, the patch fixes and improves some error messages in cmdlib.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
parent c63355f2
@@ -480,7 +480,11 @@ def VerifyNode(what, cluster_name):
   if constants.NV_HYPERVISOR in what:
     result[constants.NV_HYPERVISOR] = tmp = {}
     for hv_name in what[constants.NV_HYPERVISOR]:
-      tmp[hv_name] = hypervisor.GetHypervisor(hv_name).Verify()
+      try:
+        val = hypervisor.GetHypervisor(hv_name).Verify()
+      except errors.HypervisorError, err:
+        val = "Error while checking hypervisor: %s" % str(err)
+      tmp[hv_name] = val
   if constants.NV_FILELIST in what:
     result[constants.NV_FILELIST] = utils.FingerprintFiles(
@@ -523,8 +527,12 @@ def VerifyNode(what, cluster_name):
     result[constants.NV_LVLIST] = GetVolumeList(what[constants.NV_LVLIST])
   if constants.NV_INSTANCELIST in what:
-    result[constants.NV_INSTANCELIST] = GetInstanceList(
-      what[constants.NV_INSTANCELIST])
+    # GetInstanceList can fail
+    try:
+      val = GetInstanceList(what[constants.NV_INSTANCELIST])
+    except RPCFail, err:
+      val = str(err)
+    result[constants.NV_INSTANCELIST] = val
   if constants.NV_VGLIST in what:
     result[constants.NV_VGLIST] = utils.ListVolumeGroups()
@@ -1439,7 +1439,8 @@ class LUVerifyCluster(LogicalUnit):
       idata = nresult.get(constants.NV_INSTANCELIST, None)
       test = not isinstance(idata, list)
       _ErrorIf(test, self.ENODEHV, node,
-               "rpc call to node failed (instancelist)")
+               "rpc call to node failed (instancelist): %s",
+               utils.SafeEncode(str(idata)))
       if test:
         continue
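
The check in this hunk works because a successful call yields a list, while a failed one now yields the error text as a string. A minimal sketch of that convention follows, in modern Python with hypothetical names: `check_instance_list` and its returned message list are illustrative only and do not reproduce Ganeti's `_ErrorIf` helper.

```python
def check_instance_list(node, idata):
    """Return a list of error messages for one node's instance-list result."""
    problems = []
    if not isinstance(idata, list):
        # Anything other than a list (None, or an error string sent back
        # by the node) counts as a failed call; the detail is reported
        # verbatim so the operator can see why it failed.
        problems.append(
            "%s: rpc call to node failed (instancelist): %s" % (node, idata)
        )
    return problems


ok = check_instance_list("node1", ["inst1", "inst2"])
bad = check_instance_list("node2", "hypervisor command failed")
```

With the old message, the `bad` case would only have said that the call failed; carrying the string through makes the cause visible in the verify output.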
@@ -1544,7 +1545,7 @@ class LUVerifyCluster(LogicalUnit):
         _ErrorIf(snode not in node_info and snode not in n_offline,
                  self.ENODERPC, snode,
                  "instance %s, connection to secondary node"
-                 "failed", instance)
+                 " failed", instance)
         if snode in node_info:
           node_info[snode]['sinst'].append(instance)
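
The one-character fix in this last hunk is easy to miss. Python joins adjacent string literals with nothing in between, so the old message ran two words together. A small illustration in modern Python, using a made-up instance name:

```python
instance = "inst1"  # hypothetical instance name, for illustration only

# Adjacent string literals are concatenated at compile time, with no
# separator inserted between them.
before = ("instance %s, connection to secondary node"
          "failed") % instance
after = ("instance %s, connection to secondary node"
         " failed") % instance
```

`before` ends in "nodefailed", while `after` reads "node failed" as intended.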