Commit 7c23902f authored by Hrvoje Ribicic

Merge branch 'stable-2.14' into master

* stable-2.14
  Update the bounds of monad-control in the cabal template
  Fix RenewCrypto unittest
  Move common code out of DrbdAttachNet
  _EnsureSecondary is no longer needed
  Do not _GatherAndLinkBlockDevs in AcceptInstance
  Use _CloseInstanceDisks() helper during failover
  Add helpers for Open/Close during migration
  Introduce blockdev_open RPC
  Export the 'exclusive' parameter to ExtStorage
  Add the 'exclusive' parameter in bdev.Open()
  Extend ExtStorage with open/close scripts
  Refactor optional mechanism for ExtStorage scripts
  Use CPP macros to choose the appropriate NFData instance

* stable-2.13
  Break lines longer than 80 chars
  Be gentler to failures while removing SSH keys
  Change wording in documentation wrt configure-time paths
  Do not distribute files with configure-specific information
  LXC: Add udevadm settle invocation to prevent errors
  Fix haddock examples for extractJSONPath

* stable-2.12
  Upgrade codebase to support monad-control >=0.3.1.3 && <1.1
  Add macros for the version of monad-control
  Rename hs-lens-versions Makefile target to hs-pkg-versions
  Verify master status before retrying a socket
  Make LUClusterDestroy tell WConfD
  Add an RPC to prepare cluster destruction
  Support no-master state in ssconf
  WConfD: do not clean up own livelock
  Make WConfD have a livelock file as well
  Add a prefix for a WConfD livelock
  Detect if the own job file disappears
  Keep track of the number of LUs executing
  Make job processes keep track of their job id
  Make LuxiD clean up its lock file
  QA: Fix CheckFileUnmodified to work with vcluster
  QA: Fix white-spaces in CheckFileUnmodified
  QA: Check that the cluster verify doesn't change the config
  QA: Allow to check that an operation doesn't change a file
  Use only shared configuration lock for ComputeDRBDMap
  Only assert properties of non-None objects
  If any IO error happens during job forking, retry
  Add a function for retrying `MonadError` computations
  Annotate every send/receive operation in Exec.hs
  Refactor `rethrowAnnotateIOError` and simplify its usage
  Query.Exec: Describe error if talking to job process fails
  Query.Exec: Log error when talking to job process fails
  Fix the generation of Makefile.ghc rules for *_hi
  Fix error handling for failed fork jobs
  If a forked job process malfunctions, kill it thoroughly
  Add function to run checked computations in `MonadError`
  Add job ID and process ID to log statements in Exec.hs
  Fix issues when generating 'lens' version definitions
  Signal to the job queue when the cluster is gone
  Only read config if necessary
  Always OutDate() the lu's config
  Outdate the config when waiting for locks
  Support outdating a config
  Allow unlocked reading of the config
  Also in tests, open fake config before editing
  Fix a few haddock comments

* stable-2.11
  Improve error handling when looking up instances
  Capture last exception
  Improve speed of Xen hypervisor unit tests
  Improve Xen instance state handling
  Renew crypto retries for non-master nodes
  Retries for the master's SSL cert renewal
  Unit tests for offline nodes
  De-duplicate testing code regarding pathutils
  Make LURenewCrypto handle unreachable nodes properly
  Error handling on failed SSL cert renewal for master
  Unit test for LURenewCrypto's valid case
  Mock support for pathutils
  Increase timeout of crypto token RPC
  Skip offline nodes in RENEW_CRYPTO jobs

* stable-2.10
  Make QA fail if KVM hotplugging fails
  Always preserve QA command output
  Don't lose stdout/stderr in AssertCommand
  qa_utils: Allow passing fail=None to AssertCommand
  qa_utils: Make AssertCommand return stdout/stderr as well
  Allow plain/DRBD conversions regardless of lack of disks
  Add support for ipolicy modifications to mock config
  Remove unused import
  Use an old way to instance NFData CollectorData
  MonD: force computation of state in stateful collectors
  Instance NFData CollectorData

Conflicts:
	src/Ganeti/Logging/WriterLog.hs
	src/Ganeti/THH/HsRPC.hs
	src/Ganeti/WConfd/Monad.hs
Resolution:
	All pragmas are the same; choose formatting.
Signed-off-by: Hrvoje Ribicic <riba@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>
parents 18bd06cd 35f04560
@@ -412,7 +412,7 @@ BUILT_EXAMPLES = \
doc/examples/systemd/ganeti-rapi.service \
doc/examples/systemd/ganeti-wconfd.service
dist_ifup_SCRIPTS = \
nodist_ifup_SCRIPTS = \
tools/kvm-ifup-os \
tools/xen-ifup-os
@@ -1335,7 +1335,7 @@ Makefile.ghc: $(HS_MAKEFILE_GHC_SRCS) Makefile $(HASKELL_PACKAGE_VERSIONS_FILE)
# object listed in Makefile.ghc.
# e.g. src/hluxid.o : src/Ganeti/Daemon.hi
# => src/hluxid.o : src/Ganeti/Daemon.hi src/Ganeti/Daemon.o
sed -i -re 's/([^ ]+)\.hi$$/\1.hi \1.o/' $@
sed -i -r -e 's/([^ ]+)\.hi$$/\1.hi \1.o/' -e 's/([^ ]+)_hi$$/\1_hi \1_o/' $@
@include_makefile_ghc@
@@ -1956,6 +1956,7 @@ python_test_support = \
test/py/cmdlib/testsupport/iallocator_mock.py \
test/py/cmdlib/testsupport/livelock_mock.py \
test/py/cmdlib/testsupport/netutils_mock.py \
test/py/cmdlib/testsupport/pathutils_mock.py \
test/py/cmdlib/testsupport/processor_mock.py \
test/py/cmdlib/testsupport/rpc_runner_mock.py \
test/py/cmdlib/testsupport/ssh_mock.py \
@@ -164,7 +164,9 @@ There are several disk templates you can choose from:
.. note::
Disk templates marked with an asterisk require Ganeti to access the
file system. Ganeti will refuse to do so unless you whitelist the
relevant paths in :pyeval:`pathutils.FILE_STORAGE_PATHS_FILE`.
relevant paths in the file storage paths configuration, which,
with the default configure-time paths, is located
in :pyeval:`pathutils.FILE_STORAGE_PATHS_FILE`.
The default paths used by Ganeti are:
@@ -185,7 +185,9 @@ An “ExtStorage provider” will have to provide the following methods:
- Detach a disk from a given node
- SetInfo to a disk (add metadata)
- Verify its supported parameters
- Snapshot a disk (currently used during gnt-backup export)
- Snapshot a disk (optional)
- Open a disk (optional)
- Close a disk (optional)
The proposed ExtStorage interface borrows heavily from the OS
interface and follows a one-script-per-function approach. An ExtStorage
@@ -199,16 +201,19 @@ provider is expected to provide the following scripts:
- ``setinfo``
- ``verify``
- ``snapshot`` (optional)
- ``open`` (optional)
- ``close`` (optional)
All scripts will be called with no arguments and get their input via
environment variables. A common set of variables will be exported for
all commands, and some of them might have extra ones.
all commands, and some commands might have extra variables.
``VOL_NAME``
The name of the volume. This is unique for Ganeti and it
uses it to refer to a specific volume inside the external storage.
``VOL_SIZE``
The volume's size in mebibytes.
Available only to the `create` and `grow` scripts.
``VOL_NEW_SIZE``
Available only to the `grow` script. It declares the
new size of the volume after grow (in mebibytes).
@@ -221,11 +226,14 @@ all commands, and some of them might have extra ones.
``VOL_CNAME``
The human readable name of the disk (if any).
``VOL_SNAPSHOT_NAME``
The name of the volume's snapshot to be taken.
The name of the volume's snapshot.
Available only to the `snapshot` script.
``VOL_SNAPSHOT_SIZE``
The size of the volume's snapshot to be taken.
The size of the volume's snapshot.
Available only to the `snapshot` script.
``VOL_OPEN_EXCLUSIVE``
Whether the volume will be accessed exclusively or not.
Available only to the `open` script.
All scripts except `attach` should return 0 on success and non-zero on
error, accompanied by an appropriate error message on stderr. The
@@ -233,9 +241,14 @@ error, accompanied by an appropriate error message on stderr. The
the block device's full path, after it has been successfully attached to
the host node. On error it should return non-zero.
To keep backwards compatibility we let the ``snapshot`` script be
optional. If present then the provider will support instance backup
export as well.
The ``snapshot``, ``open`` and ``close`` scripts are introduced after
the first implementation of the ExtStorage Interface. To keep backwards
compatibility with the first implementation, we make these scripts
optional.
The ``snapshot`` script, if present, will be used for instance backup
export. The ``open`` script makes the device ready for I/O. The ``close``
script disables the I/O on the device.
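For illustration, a minimal `open` script for a hypothetical provider could look like the sketch below. Only the ``VOL_NAME``/``VOL_OPEN_EXCLUSIVE`` variables and the exit-code convention come from the interface described above; the "True"/"False" encoding of the flag and everything provider-specific are assumptions.

#!/usr/bin/env python
# Illustrative 'open' script for a hypothetical ExtStorage provider.
import os
import sys

def main():
  vol_name = os.environ.get("VOL_NAME")
  if not vol_name:
    sys.stderr.write("open: VOL_NAME is not set\n")
    return 1
  # The exact encoding of VOL_OPEN_EXCLUSIVE is an assumption here.
  exclusive = os.environ.get("VOL_OPEN_EXCLUSIVE", "True") == "True"
  # A real provider would switch its backing volume to the requested access
  # mode at this point (e.g. via its management CLI); the sketch only logs it.
  mode = "exclusive" if exclusive else "shared"
  sys.stderr.write("opening volume %s in %s mode\n" % (vol_name, mode))
  return 0

if __name__ == "__main__":
  sys.exit(main())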
Implementation
--------------
@@ -243,7 +256,8 @@ Implementation
To support the ExtStorage interface, we will introduce a new disk
template called `ext`. This template will implement the existing Ganeti
disk interface in `lib/bdev.py` (create, remove, attach, assemble,
shutdown, grow, setinfo), and will simultaneously pass control to the
shutdown, grow, setinfo, open, close),
and will simultaneously pass control to the
external scripts to actually handle the above actions. The `ext` disk
template will act as a translation layer between the current Ganeti disk
interface and the ExtStorage providers.
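As a rough sketch of that translation layer (the helper names and the provider directory layout are assumptions, not the actual `lib/bdev.py` code), the `ext` template's open path could dispatch to the provider's optional script along these lines:

# Sketch only: dispatch bdev.Open() to a provider's optional 'open' script.
import os
import subprocess

def _RunExtStorageScript(provider_dir, action, env_extra):
  """Run an optional ExtStorage script with the common environment."""
  script = os.path.join(provider_dir, action)
  if not os.path.exists(script):
    return  # 'open'/'close'/'snapshot' are optional and may be absent
  env = dict(os.environ)
  env.update(env_extra)
  proc = subprocess.Popen([script], env=env, stderr=subprocess.PIPE)
  _, err = proc.communicate()
  if proc.returncode != 0:
    raise RuntimeError("ExtStorage '%s' script failed: %s" % (action, err))

def OpenExtVolume(provider_dir, vol_name, exclusive):
  """Ask the provider to make the volume ready for I/O."""
  _RunExtStorageScript(provider_dir, "open",
                       {"VOL_NAME": vol_name,
                        "VOL_OPEN_EXCLUSIVE": str(exclusive)})

Keeping the dispatch generic over the script name is what allows `open` and `close` to remain optional without special-casing them in the disk interface.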
@@ -1608,18 +1608,20 @@ def RemoveNodeSshKey(node_uuid, node_name,
should be cleared on the node whose keys are removed
@type clear_public_keys: boolean
@param clear_public_keys: whether to clear the node's C{ganeti_pub_key} file
@rtype: list of string
@returns: list of feedback messages
"""
result_msgs = []
# Make sure at least one of these flags is true.
assert (from_authorized_keys or from_public_keys or clear_authorized_keys
or clear_public_keys)
if not (from_authorized_keys or from_public_keys or clear_authorized_keys
or clear_public_keys):
result_msgs.append("No removal from any key file was requested.")
if not ssconf_store:
ssconf_store = ssconf.SimpleStore()
if not (from_authorized_keys or from_public_keys or clear_authorized_keys):
raise errors.SshUpdateError("No removal from any key file was requested.")
master_node = ssconf_store.GetMasterNode()
if from_authorized_keys or from_public_keys:
@@ -1675,16 +1677,24 @@ def RemoveNodeSshKey(node_uuid, node_name,
" node '%s', map: %s." %
(node, ssh_port_map))
if node in potential_master_candidates:
run_cmd_fn(cluster_name, node, pathutils.SSH_UPDATE,
ssh_port, pot_mc_data,
debug=False, verbose=False, use_cluster_key=False,
ask_key=False, strict_host_check=False)
else:
if from_authorized_keys:
try:
run_cmd_fn(cluster_name, node, pathutils.SSH_UPDATE,
ssh_port, base_data,
ssh_port, pot_mc_data,
debug=False, verbose=False, use_cluster_key=False,
ask_key=False, strict_host_check=False)
except errors.OpExecError as e:
result_msgs.append("Warning: the SSH setup of node '%s' could not"
" be adjusted." % node)
else:
if from_authorized_keys:
try:
run_cmd_fn(cluster_name, node, pathutils.SSH_UPDATE,
ssh_port, base_data,
debug=False, verbose=False, use_cluster_key=False,
ask_key=False, strict_host_check=False)
except errors.OpExecError as e:
result_msgs.append("Warning: the SSH setup of node '%s' could"
" not be adjusted." % node)
if clear_authorized_keys or from_public_keys or clear_public_keys:
data = {}
@@ -1727,10 +1737,12 @@ def RemoveNodeSshKey(node_uuid, node_name,
ssh_port, data,
debug=False, verbose=False, use_cluster_key=False,
ask_key=False, strict_host_check=False)
except errors.OpExecError, e:
logging.info("Removing SSH keys from node '%s' failed. This can happen"
" when the node is already unreachable. Error: %s",
node_name, e)
except errors.OpExecError as e:
result_msgs.append("Removing SSH keys from node '%s' failed. This can"
" happen when the node is already unreachable."
" Error: %s" % (node_name, e))
return result_msgs
def _GenerateNodeSshKey(node_uuid, node_name, ssh_port_map,
@@ -2763,21 +2775,10 @@ def AcceptInstance(instance, info, target):
@param target: target host (usually ip), on this node
"""
# TODO: why is this required only for DTS_EXT_MIRROR?
if utils.AnyDiskOfType(instance.disks_info, constants.DTS_EXT_MIRROR):
# Create the symlinks, as the disks are not active
# in any way
try:
_GatherAndLinkBlockDevs(instance)
except errors.BlockDeviceError, err:
_Fail("Block device error: %s", err, exc=True)
hyper = hypervisor.GetHypervisor(instance.hypervisor)
try:
hyper.AcceptInstance(instance, info, target)
except errors.HypervisorError, err:
if utils.AnyDiskOfType(instance.disks_info, constants.DTS_EXT_MIRROR):
_RemoveBlockDevLinks(instance.name, instance.disks_info)
_Fail("Failed to accept instance: %s", err, exc=True)
@@ -4524,12 +4525,35 @@ def BlockdevClose(instance_name, disks):
except errors.BlockDeviceError, err:
msg.append(str(err))
if msg:
_Fail("Can't make devices secondary: %s", ",".join(msg))
_Fail("Can't close devices: %s", ",".join(msg))
else:
if instance_name:
_RemoveBlockDevLinks(instance_name, disks)
def BlockdevOpen(instance_name, disks, exclusive):
"""Opens the given block devices.
"""
bdevs = []
for cf in disks:
rd = _RecursiveFindBD(cf)
if rd is None:
_Fail("Can't find device %s", cf)
bdevs.append(rd)
msg = []
for idx, rd in enumerate(bdevs):
try:
rd.Open(exclusive=exclusive)
_SymlinkBlockDev(instance_name, rd.dev_path, idx)
except errors.BlockDeviceError, err:
msg.append(str(err))
if msg:
_Fail("Can't open devices: %s", ",".join(msg))
def ValidateHVParams(hvname, hvparams):
"""Validates the given hypervisor parameters.
@@ -5122,18 +5146,12 @@ def DrbdDisconnectNet(disks):
err, exc=True)
def DrbdAttachNet(disks, instance_name, multimaster):
def DrbdAttachNet(disks, multimaster):
"""Attaches the network on a list of drbd devices.
"""
bdevs = _FindDisks(disks)
if multimaster:
for idx, rd in enumerate(bdevs):
try:
_SymlinkBlockDev(instance_name, rd.dev_path, idx)
except EnvironmentError, err:
_Fail("Can't create symlink: %s", err)
# reconnect disks, switch to new master configuration and if
# needed primary mode
for rd in bdevs:
@@ -5187,14 +5205,6 @@ def DrbdAttachNet(disks, instance_name, multimaster):
except utils.RetryTimeout:
_Fail("Timeout in disk reconnecting")
if multimaster:
# change to primary mode
for rd in bdevs:
try:
rd.Open()
except errors.BlockDeviceError, err:
_Fail("Can't change to primary mode: %s", err)
def DrbdWaitSync(disks):
"""Wait until DRBDs have synchronized.
@@ -108,6 +108,7 @@ class LUClusterRenewCrypto(NoHooksLU):
"""
_MAX_NUM_RETRIES = 3
REQ_BGL = False
def ExpandNames(self):
@@ -128,7 +129,7 @@ class LUClusterRenewCrypto(NoHooksLU):
self._ssh_renewal_suppressed = \
not self.cfg.GetClusterInfo().modify_ssh_setup and self.op.ssh_keys
def _RenewNodeSslCertificates(self):
def _RenewNodeSslCertificates(self, feedback_fn):
"""Renews the nodes' SSL certificates.
Note that most of this operation is done in gnt_cluster.py, this LU only
@@ -149,15 +150,61 @@ class LUClusterRenewCrypto(NoHooksLU):
except IOError:
logging.info("No old certificate available.")
new_master_digest = _UpdateMasterClientCert(self, self.cfg, master_uuid)
last_exception = None
for _ in range(self._MAX_NUM_RETRIES):
try:
# Technically it should not be necessary to set the cert
# paths. However, due to a bug in the mock library, we
# have to do this to be able to test the function properly.
_UpdateMasterClientCert(
self, self.cfg, master_uuid,
client_cert=pathutils.NODED_CLIENT_CERT_FILE,
client_cert_tmp=pathutils.NODED_CLIENT_CERT_FILE_TMP)
break
except errors.OpExecError as e:
last_exception = e
else:
if last_exception:
feedback_fn("Could not renew the master's client SSL certificate."
" Cleaning up. Error: %s." % last_exception)
# Cleaning up temporary certificates
self.cfg.RemoveNodeFromCandidateCerts("%s-SERVER" % master_uuid)
self.cfg.RemoveNodeFromCandidateCerts("%s-OLDMASTER" % master_uuid)
try:
utils.RemoveFile(pathutils.NODED_CLIENT_CERT_FILE_TMP)
except IOError:
pass
return
self.cfg.AddNodeToCandidateCerts(master_uuid, new_master_digest)
node_errors = {}
nodes = self.cfg.GetAllNodesInfo()
for (node_uuid, node_info) in nodes.items():
if node_info.offline:
logging.info("* Skipping offline node %s", node_info.name)
continue
if node_uuid != master_uuid:
new_digest = CreateNewClientCert(self, node_uuid)
if node_info.master_candidate:
self.cfg.AddNodeToCandidateCerts(node_uuid, new_digest)
last_exception = None
for _ in range(self._MAX_NUM_RETRIES):
try:
new_digest = CreateNewClientCert(self, node_uuid)
if node_info.master_candidate:
self.cfg.AddNodeToCandidateCerts(node_uuid,
new_digest)
break
except errors.OpExecError as e:
last_exception = e
else:
if last_exception:
node_errors[node_uuid] = last_exception
if node_errors:
msg = ("Some nodes' SSL client certificates could not be renewed."
" Please make sure those nodes are reachable and rerun"
" the operation. The affected nodes and their errors are:\n")
for uuid, e in node_errors.items():
msg += "Node %s: %s\n" % (uuid, e)
feedback_fn(msg)
self.cfg.RemoveNodeFromCandidateCerts("%s-SERVER" % master_uuid)
self.cfg.RemoveNodeFromCandidateCerts("%s-OLDMASTER" % master_uuid)
@@ -184,8 +231,10 @@ class LUClusterRenewCrypto(NoHooksLU):
def Exec(self, feedback_fn):
if self.op.node_certificates:
self._RenewNodeSslCertificates()
feedback_fn("Renewing Node SSL certificates")
self._RenewNodeSslCertificates(feedback_fn)
if self.op.ssh_keys and not self._ssh_renewal_suppressed:
feedback_fn("Renewing SSH keys")
self._RenewSshKeys()
elif self._ssh_renewal_suppressed:
feedback_fn("Cannot renew SSH keys if the cluster is configured to not"
@@ -252,6 +301,11 @@ class LUClusterDestroy(LogicalUnit):
HPATH = "cluster-destroy"
HTYPE = constants.HTYPE_CLUSTER
# Read by the job queue to detect when the cluster is gone and job files will
# never be available.
# FIXME: This variable should be removed together with the Python job queue.
clusterHasBeenDestroyed = False
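A hedged sketch of how the Python job queue might consume this flag; the import path and helper name are assumptions for illustration only:

# Hypothetical check on the job-queue side: if the cluster was destroyed in
# this process, a missing job file will never reappear, so the queue can stop
# waiting for it.
from ganeti.cmdlib.cluster import LUClusterDestroy

def _ClusterIsGone():
  return LUClusterDestroy.clusterHasBeenDestroyed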
def BuildHooksEnv(self):
"""Build hooks env.
@@ -300,6 +354,12 @@ class LUClusterDestroy(LogicalUnit):
result = self.rpc.call_node_deactivate_master_ip(master_params.uuid,
master_params, ems)
result.Warn("Error disabling the master IP address", self.LogWarning)
self.wconfd.Client().PrepareClusterDestruction(self.wconfdcontext)
# signal to the job queue that the cluster is gone
LUClusterDestroy.clusterHasBeenDestroyed = True
return master_params.uuid
@@ -1513,7 +1513,7 @@ class LUClusterVerifyGroup(LogicalUnit, _VerifyErrors):
if test:
nimg.hyp_fail = True
else:
nimg.instances = [inst.uuid for (_, inst) in
nimg.instances = [uuid for (uuid, _) in
self.cfg.GetMultiInstanceInfoByName(idata)]
def _UpdateNodeInfo(self, ninfo, nresult, nimg, vg_name):
@@ -530,17 +530,35 @@ class TLMigrateInstance(Tasklet):
self.feedback_fn(" - progress: %.1f%%" % min_percent)
time.sleep(2)
def _EnsureSecondary(self, node_uuid):
"""Demote a node to secondary.
def _OpenInstanceDisks(self, node_uuid, exclusive):
"""Open instance disks.
"""
self.feedback_fn("* switching node %s to secondary mode" %
if exclusive:
mode = "in exclusive mode"
else:
mode = "in shared mode"
self.feedback_fn("* opening instance disks on node %s %s" %
(self.cfg.GetNodeName(node_uuid), mode))
disks = self.cfg.GetInstanceDisks(self.instance.uuid)
result = self.rpc.call_blockdev_open(node_uuid, self.instance.name,
(disks, self.instance), exclusive)
result.Raise("Cannot open disks on node %s" %
self.cfg.GetNodeName(node_uuid))
def _CloseInstanceDisks(self, node_uuid):
"""Close instance disks.
"""
self.feedback_fn("* closing instance disks on node %s" %
self.cfg.GetNodeName(node_uuid))
disks = self.cfg.GetInstanceDisks(self.instance.uuid)
result = self.rpc.call_blockdev_close(node_uuid, self.instance.name,
(disks, self.instance))
result.Raise("Cannot change disk to secondary on node %s" %
result.Raise("Cannot close instance disks on node %s" %
self.cfg.GetNodeName(node_uuid))
def _GoStandalone(self):
@@ -567,7 +585,7 @@ class TLMigrateInstance(Tasklet):
disks = self.cfg.GetInstanceDisks(self.instance.uuid)
result = self.rpc.call_drbd_attach_net(self.all_node_uuids,
(disks, self.instance),
self.instance.name, multimaster)
multimaster)
for node_uuid, nres in result.items():
nres.Raise("Cannot change disks config on node %s" %
self.cfg.GetNodeName(node_uuid))
@@ -628,8 +646,10 @@ class TLMigrateInstance(Tasklet):
demoted_node_uuid = self.target_node_uuid
disks = self.cfg.GetInstanceDisks(self.instance.uuid)
self._CloseInstanceDisks(demoted_node_uuid)
if utils.AnyDiskOfType(disks, constants.DTS_INT_MIRROR):
self._EnsureSecondary(demoted_node_uuid)
try:
self._WaitUntilSync()
except errors.OpExecError:
@@ -639,6 +659,8 @@ class TLMigrateInstance(Tasklet):
self._GoStandalone()
self._GoReconnect(False)
self._WaitUntilSync()
elif utils.AnyDiskOfType(disks, constants.DTS_EXT_MIRROR):
self._OpenInstanceDisks(self.instance.primary_node, True)
self.feedback_fn("* done")
@@ -649,11 +671,13 @@ class TLMigrateInstance(Tasklet):
disks = self.cfg.GetInstanceDisks(self.instance.uuid)
self._CloseInstanceDisks(self.target_node_uuid)
if utils.AllDiskOfType(disks, constants.DTS_EXT_MIRROR):
self._OpenInstanceDisks(self.source_node_uuid, True)
return
try:
self._EnsureSecondary(self.target_node_uuid)
self._GoStandalone()
self._GoReconnect(False)
self._WaitUntilSync()
@@ -765,13 +789,17 @@ class TLMigrateInstance(Tasklet):
disks = self.cfg.GetInstanceDisks(self.instance.uuid)
if not utils.AnyDiskOfType(disks, constants.DTS_EXT_MIRROR):
self._CloseInstanceDisks(self.target_node_uuid)
if utils.AnyDiskOfType(disks, constants.DTS_INT_MIRROR):
# Then switch the disks to master/master mode
self._EnsureSecondary(self.target_node_uuid)
self._GoStandalone()
self._GoReconnect(True)
self._WaitUntilSync()
self._OpenInstanceDisks(self.source_node_uuid, False)
self._OpenInstanceDisks(self.target_node_uuid, False)
self.feedback_fn("* preparing %s to accept the instance" %
self.cfg.GetNodeName(self.target_node_uuid))
result = self.rpc.call_accept_instance(self.target_node_uuid,
@@ -858,12 +886,15 @@ class TLMigrateInstance(Tasklet):
raise errors.OpExecError("Could not finalize instance migration: %s" %
msg)
self._CloseInstanceDisks(self.source_node_uuid)
if utils.AnyDiskOfType(disks, constants.DTS_INT_MIRROR):
self._EnsureSecondary(self.source_node_uuid)
self._WaitUntilSync()
self._GoStandalone()
self._GoReconnect(False)
self._WaitUntilSync()
elif utils.AnyDiskOfType(disks, constants.DTS_EXT_MIRROR):
self._OpenInstanceDisks(self.target_node_uuid, True)
# If the instance's disk template is `rbd' or `ext' and there was a
# successful migration, unmap the device from the source node.
@@ -949,6 +980,9 @@ class TLMigrateInstance(Tasklet):
(self.instance.name,
self.cfg.GetNodeName(source_node_uuid), msg))
if self.instance.disk_template in constants.DTS_EXT_MIRROR:
self._CloseInstanceDisks(source_node_uuid)
self.feedback_fn("* deactivating the instance's disks on source node")
if not ShutdownInstanceDisks(self.lu, self.instance, ignore_primary=True):
raise errors.OpExecError("Can't shut down the instance's disks")
@@ -1503,7 +1503,6 @@ class LUInstanceSetParams(LogicalUnit):
assert len(secondary_nodes) == 1
assert utils.AnyDiskOfType(disks, [constants.DT_DRBD8])
snode_uuid = secondary_nodes[0]
feedback_fn("Converting disk template from 'drbd' to 'plain'")
old_disks = AnnotateDiskParams(self.instance, disks, self.cfg)
@@ -1537,7 +1536,7 @@
feedback_fn("Removing volumes on the secondary node...")
RemoveDisks(self, self.instance, disks=old_disks,
target_node_uuid=snode_uuid)
target_node_uuid=secondary_nodes[0])
feedback_fn("Removing unneeded volumes on the primary node...")
meta_disks = []
@@ -2961,7 +2961,6 @@ class TLReplaceDisks(Tasklet):
result = self.rpc.call_drbd_attach_net([self.instance.primary_node,
self.new_node_uuid],
(inst_disks, self.instance),
self.instance.name,
False)
for to_node, to_result in result.items():
msg = to_result.fail_msg
@@ -875,9 +875,13 @@ class LUNodeSetParams(LogicalUnit):
False, # currently, all nodes are potential master candidates
False, # do not clear node's 'authorized_keys'
False) # do not clear node's 'ganeti_pub_keys'
if not ssh_result[master_node].fail_msg:
for message in ssh_result[master_node].payload:
feedback_fn(message)
ssh_result[master_node].Raise(
"Could not adjust the SSH setup after demoting node '%s'"
" (UUID: %s)." % (node.name, node.uuid))
if self.new_role == self._ROLE_CANDIDATE:
ssh_result = self.rpc.call_node_ssh_key_add(
[master_node], node.uuid, node.name,
@@ -1241,7 +1241,7 @@ class ConfigWriter(object):
self._ConfigData().cluster.highest_used_port = port
return port
@ConfigSync()
@ConfigSync(shared=1)
def ComputeDRBDMap(self):
"""Compute the used DRBD minor/nodes.
@@ -2113,7 +2113,11 @@ class ConfigWriter(object):
result = []
for name in inst_names:
instance = self._UnlockedGetInstanceInfoByName(name)
result.append((instance.uuid, instance))
if instance:
result.append((instance.uuid, instance))
else:
raise errors.ConfigurationError("Instance data of instance '%s'"
" not found." % name)
return result
@ConfigSync(shared=1)
@@ -154,7 +154,7 @@ def _ParseInstanceList(lines, include_node):
return result
def _GetAllInstanceList(fn, include_node, _timeout=5):
def _GetAllInstanceList(fn, include_node, delays, timeout):
"""Return the list of instances including running and shutdown.
See L{_RunInstanceList} and L{_ParseInstanceList} for parameter details.
@@ -162,7 +162,7 @@ def _GetAllInstanceList(fn, include_node, _timeout=5):
"""
instance_list_errors = []
try:
lines = utils.Retry(_RunInstanceList, (0.3, 1.5, 1.0), _timeout,
lines = utils.Retry(_RunInstanceList, delays, timeout,
args=(fn, instance_list_errors))
except utils.RetryTimeout:
if instance_list_errors:
@@ -182,7 +182,7 @@ def _IsInstanceRunning(instance_info):
"""Determine whether an instance is running.
An instance is running if it is in the following Xen states:
running, blocked, or paused.
running, blocked, paused, or dying (about to be destroyed / shutdown).
For some strange reason, Xen once printed 'rb----' which does not make any
sense because an instance cannot be both running and blocked. Fortunately,
@@ -193,6 +193,9 @@
to be scheduled to run.
http://old-list-archives.xenproject.org/xen-users/2007-06/msg00849.html
A dying instance is about to be removed, but it is still consuming resources,
and counts as running.
@type instance_info: string
@param instance_info: Information about instance, as supplied by Xen.
@rtype: bool
@@ -202,15 +205,51 @@
return instance_info == "r-----" \
or instance_info == "rb----" \
or instance_info == "-b----" \
or instance_info == "--p---" \
or instance_info == "-----d" \
or instance_info == "------"
def _IsInstanceShutdown(instance_info):
return instance_info == "---s--"
"""Determine whether the instance is shutdown.
An instance is shutdown when a user shuts it down from within, and we do not
remove domains to be able to detect that.
The dying state has been added as a precaution, as Xen's status reporting is
weird.
"""
return instance_info == "---s--" \
or instance_info == "---s-d"
def _IgnorePaused(instance_info):
"""Removes information about whether a Xen state is paused from the state.
As it turns out, an instance can be reported as paused in almost any
condition. Paused instances can be paused, running instances can be paused for
scheduling, and any other condition can appear to be paused as a result of
races or improbable conditions in Xen's status reporting.
As we do not use Xen's pause commands in any way at the time, we can simply
ignore the paused field and save ourselves a lot of trouble.
Should we ever use the pause commands, several samples would be needed before
we could confirm the domain as paused.
"""
return instance_info.replace('p', '-')
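For intuition, a few illustrative checks (not part of the change) showing how the helpers above classify sample state strings:

# Illustration only: paused variants collapse onto the canonical states above.
assert _IgnorePaused("r-p---") == "r-----"   # running + paused -> running
assert _IgnorePaused("--p---") == "------"   # paused only -> no flags set
assert _IsInstanceRunning("-----d")          # dying still counts as running
assert _IsInstanceShutdown("---s-d")         # shutdown + dying -> shutdown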
def _XenToHypervisorInstanceState(instance_info):
"""Maps Xen states to hypervisor states.
@type instance_info: string
@param instance_info: Information about instance, as supplied by Xen.
@rtype: L{hv_base.HvInstanceState}
"""
instance_info = _IgnorePaused(instance_info)
if _IsInstanceRunning(instance_info):
return hv_base.HvInstanceState.RUNNING
elif _IsInstanceShutdown(instance_info):
@@ -221,23 +260,23 @@
instance_info)
def _GetRunningInstanceList(fn, include_node, _timeout=5):
def _GetRunningInstanceList(fn, include_node, delays, timeout):
"""Return the list of running instances.
See L{_GetAllInstanceList} for parameter details.
"""
instances = _GetAllInstanceList(fn, include_node, _timeout)
instances = _GetAllInstanceList(fn, include_node, delays, timeout)
return [i for i in instances if hv_base.HvInstanceState.IsRunning(i[4])]
def _GetShutdownInstanceList(fn, include_node, _timeout=5):
def _GetShutdownInstanceList(fn, include_node, delays, timeout):
"""Return the list of shutdown instances.
See L{_GetAllInstanceList} for parameter details.
"""
instances = _GetAllInstanceList(fn, include_node, _timeout)
instances = _GetAllInstanceList(fn, include_node, delays, timeout)