Commit cbfa4f0f authored by Michael Hanselmann

Merge branch 'devel-2.4'

* devel-2.4: (60 commits)
  Update news and bump version for 2.4.0 rc2
  Fix pylint warnings
  TestRapiInstanceRename use instance name
  Change the list formatting to a 'special' chars
  Add support for merging node groups
  Add option to rename groups on conflict
  Fix minor docstring typo
  Add QA rapi test for instance reinstall
  RAPI: remove required parameters for reinstall
  Fix HV/OS parameter validation on non-vm nodes
  NodeQuery: mark live fields as UNAVAIL for non-vm_capable nodes
  NodeQuery: don't query non-vm_capable nodes
  Fix LUClusterRepairDiskSizes and rpc result usage
  Fix RPC mismatch in blockdev_getsize[s]
  Remove superfluous redundant requirement
  Don't remove master_candidate flag from merged nodes
  Use a consistent ECID base
  listrunner: convert from getopt to optparse
  listrunner: fix agent usage
  Revert "Disable the cluster-merge tool for the moment"

	NEWS: Trivial
	lib/ Dropped _RSTATUS_TO_TEXT from master and used
    RSS_DESCRIPTION from devel-2.4 instead, adjusted users accordingly
Signed-off-by: Michael Hanselmann <>
Reviewed-by: Iustin Pop <>
parents a1e43376 e41a1c0c
......@@ -512,6 +512,7 @@ python_tests = \
test/ \
test/ \
test/ \
test/ \
test/ \
test/ \
test/ \
......@@ -13,6 +13,122 @@ Version 2.5.0 beta1
parameter no longer defaults to ``loop`` and must be specified
Version 2.4.0 rc2
*(Released Mon, 21 Feb 2011)*
A number of bug fixes plus just a couple of functionality changes.
On the user-visible side, the ``gnt-* list`` command output has changed
with respect to “special” field states. The current rc1 style of display
can be re-enabled by passing a new ``-v, --verbose`` flag, but in the
default output mode special fields are displayed as follows:
- offline field: ``*``
- unavailable/not applicable: ``-``
- data missing (RPC failure): ``?``
- unknown field: ``??``
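
As a rough illustration of the display rules listed above, the following minimal sketch maps each special field state to a pair of (verbose text, normal text) and picks one depending on a verbose flag. The status names and the `format_special_field` helper are hypothetical; the verbose strings are assumed to match the rc1-style texts (``(unknown)``, ``(nodata)``, ``(unavail)``, ``(offline)``) that this commit removes from the CLI code.

```python
# Hypothetical status names, used only for this illustration.
RS_UNKNOWN = "unknown"
RS_NODATA = "nodata"
RS_UNAVAIL = "unavail"
RS_OFFLINE = "offline"

# status -> (verbose text, normal text)
# Assumption: verbose output corresponds to the rc1-style descriptions.
STATUS_DESCRIPTION = {
    RS_UNKNOWN: ("(unknown)", "??"),
    RS_NODATA: ("(nodata)", "?"),
    RS_UNAVAIL: ("(unavail)", "-"),
    RS_OFFLINE: ("(offline)", "*"),
}


def format_special_field(status, verbose=False):
    """Return the display text for a field in a non-normal state."""
    try:
        verbose_text, normal_text = STATUS_DESCRIPTION[status]
    except KeyError:
        raise NotImplementedError("Unknown status %s" % status)
    return verbose_text if verbose else normal_text
```

Keeping both texts in one table means a new status only needs to be described in a single place.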
Another user-visible change is the addition of ``--force-join`` to
``gnt-node add``.
As for bug fixes:
- ``tools/cluster-merge`` has seen many fixes and is now enabled again
- fixed regression in RAPI/instance reinstall where all parameters were
required (instead of optional)
- fixed ``gnt-cluster repair-disk-sizes``, was broken since Ganeti 2.2
- fixed iallocator usage (offline nodes were not considered offline)
- fixed ``gnt-node list`` with respect to non-vm_capable nodes
- fixed hypervisor and OS parameter validation with respect to
non-vm_capable nodes
- fixed ``gnt-cluster verify`` with respect to offline nodes (mostly
- fixed ``tools/listrunner`` with respect to agent-based usage
Version 2.4.0 rc1
*(Released Fri, 4 Feb 2011)*
Many changes and fixes since the beta1 release. While there were some
internal changes, the code has been mostly stabilised for the RC
Note: the dumb allocator was removed in this release, as it was not kept
up-to-date with the IAllocator protocol changes. It is recommended to
use the ``hail`` command from the ganeti-htools package.
Note: the 2.4 and up versions of Ganeti are not compatible with the
0.2.x branch of ganeti-htools. You need to upgrade to
ganeti-htools-0.3.0 (or later).
Regressions fixed from 2.3
- Fixed the ``gnt-cluster verify-disks`` command
- Made ``gnt-cluster verify-disks`` work in parallel (as opposed to
serially on nodes)
- Fixed disk adoption breakage
- Fixed wrong headers in instance listing for field aliases
Other bugs fixed
- Fixed corner case in KVM handling of NICs
- Fixed many cases of wrong handling of non-vm_capable nodes
- Fixed a bug where a missing instance symlink was not possible to
recreate with any ``gnt-*`` command (now ``gnt-instance
activate-disks`` does it)
- Fixed the volume group name as reported by ``gnt-cluster
- Increased timeouts for the import-export code, hopefully leading to
fewer aborts due to network or instance timeouts
- Fixed bug in ``gnt-node list-storage``
- Fixed bug where not all daemons were started on cluster
initialisation, but only at the first watcher run
- Fixed many bugs in the OOB implementation
- Fixed watcher behaviour in presence of instances with offline
- Fixed instance list output for instances running on the wrong node
- a few fixes to the cluster-merge tool, but it still cannot merge
multi-node groups (currently it is not recommended to use this tool)
- Improved network configuration for the KVM hypervisor
- Added e1000 as a supported NIC for Xen-HVM
- Improved the lvmstrap tool to also be able to use partitions, as
opposed to full disks
- Improved speed of disk wiping (the cluster parameter
``prealloc_wipe_disks``), so that it has a low impact on the total time
of instance creations
- Added documentation for the OS parameters
- Changed ``gnt-instance deactivate-disks`` so that it can work if the
hypervisor is not responding
- Added display of blacklisted and hidden OS information in
``gnt-cluster info``
- Extended ``gnt-cluster verify`` to also validate hypervisor, backend,
NIC and node parameters, which might create problems with currently
invalid (but undetected) configuration files, but prevents validation
failures when unrelated parameters are modified
- Changed cluster initialisation to wait for the master daemon to become
- Expanded the RAPI interface:
- Added config redistribution resource
- Added activation/deactivation of instance disks
- Added export of console information
- Implemented log file reopening on SIGHUP, which allows using
logrotate(8) for the Ganeti log files
- Added a basic OOB helper script as an example
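
The log file reopening on SIGHUP mentioned in the list above can be sketched roughly as follows. This is not Ganeti's implementation: `ReopenableFileHandler` and `install_sighup_reopen` are hypothetical names, and the sketch assumes a POSIX system where `signal.SIGHUP` is available. After logrotate(8) renames the log file, sending SIGHUP makes the handler close the old stream and open a fresh file under the original name.

```python
import logging
import os
import signal


class ReopenableFileHandler(logging.FileHandler):
    """File handler whose underlying stream can be closed and reopened."""

    def reopen(self):
        # Take the handler lock so a concurrent emit() in another thread
        # does not write to a half-closed stream.
        self.acquire()
        try:
            if self.stream is not None:
                self.stream.close()
            # _open() opens self.baseFilename again with the original mode.
            self.stream = self._open()
        finally:
            self.release()


def install_sighup_reopen(handler):
    """Reopen the handler's log file whenever SIGHUP is received."""

    def _on_sighup(signum, frame):
        handler.reopen()

    signal.signal(signal.SIGHUP, _on_sighup)
```

A long-running daemon would more likely defer the reopen to its main loop rather than do the work inside the signal handler. The standard library's `logging.handlers.WatchedFileHandler` is an alternative that detects rotation without any signal.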
Version 2.4.0 beta1
......@@ -2,7 +2,7 @@
m4_define([gnt_version_major], [2])
m4_define([gnt_version_minor], [4])
m4_define([gnt_version_revision], [0])
m4_define([gnt_version_suffix], [~beta1])
m4_define([gnt_version_suffix], [~rc2])
gnt_version_major, gnt_version_minor,
......@@ -23,8 +23,8 @@ clusters into.
The usage of ``cluster-merge`` is as follows::
cluster-merge [--debug|--verbose] [--watcher-pause-period SECONDS] <cluster> \
cluster-merge [--debug|--verbose] [--watcher-pause-period SECONDS] \
[--groups [merge|rename]] <cluster> [<cluster...>]
You can provide multiple clusters. The tool will then go over every
cluster in serial and perform the steps to merge it into the invoking
......@@ -39,6 +39,15 @@ These options can be used to control the behaviour of the tool:
Define the period of time in seconds the watcher shall be disabled,
default is 1800 seconds (30 minutes).
This option controls how ``cluster-merge`` handles duplicate node
group names on the merging clusters. If ``merge`` is specified then
all node groups with the same name will be merged into one. If
``rename`` is specified, then conflicting node groups on the remote
clusters will have their cluster name appended to the group name. If
this option is not specified, then ``cluster-merge`` will refuse to
continue if it finds conflicting group names, otherwise it will
proceed as normal.
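
The three behaviours described above (merge, rename, or refuse on conflict) can be sketched as a small name-resolution step. This is only an illustration under stated assumptions: `resolve_group_names` is a hypothetical helper, not part of the tool, and the ``-`` separator used when appending the cluster name is a guess, since the text does not specify how the name is appended.

```python
def resolve_group_names(local_groups, remote_groups, remote_cluster, mode=None):
    """Map each remote group name to the name it should have after merging.

    mode is None (option not given), "merge" or "rename".
    """
    if mode not in (None, "merge", "rename"):
        raise ValueError("Unknown mode %r" % (mode,))

    conflicts = set(local_groups) & set(remote_groups)
    if conflicts and mode is None:
        # Without the option, conflicting names abort the merge.
        raise RuntimeError("Conflicting node group names: %s"
                           % ", ".join(sorted(conflicts)))

    result = {}
    for name in remote_groups:
        if name in conflicts and mode == "rename":
            # Assumption: the separator is not given by the documentation.
            result[name] = "%s-%s" % (name, remote_cluster)
        else:
            # Either no conflict, or "merge": keep the name so that groups
            # with the same name end up as one group.
            result[name] = name
    return result
```

For example, with local groups ``default`` and ``rack1`` and a remote cluster that also has a ``default`` group, ``rename`` mode yields a distinct name for the remote ``default`` group while ``merge`` mode leaves it unchanged.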
......@@ -296,9 +296,6 @@ failover:
determined by the serial number on the configuration and
highest job ID on the job queue)
- there is not even a single node having a newer
configuration file
- if we are not failing over (but just starting), the
quorum agrees that we are the designated master
......@@ -852,6 +852,37 @@ Body parameters:
:exclude: instance_name
Request information for connecting to instance's console.
Supports the following commands: ``GET``.
Returns a dictionary containing information about the instance's
console. Contained keys:
Instance name.
Console type, one of ``ssh``, ``vnc`` or ``msg``.
Message to display (``msg`` type only).
Host to connect to (``ssh`` and ``vnc`` only).
TCP port to connect to (``vnc`` only).
Username to use (``ssh`` only).
Command to execute on machine (``ssh`` only)
VNC display number (``vnc`` only).
......@@ -1492,7 +1492,7 @@ def _RecursiveAssembleBD(disk, owner, as_primary):
return result
def BlockdevAssemble(disk, owner, as_primary):
def BlockdevAssemble(disk, owner, as_primary, idx):
"""Activate a block device for an instance.
This is a wrapper over _RecursiveAssembleBD.
......@@ -1507,8 +1507,12 @@ def BlockdevAssemble(disk, owner, as_primary):
if isinstance(result, bdev.BlockDev):
# pylint: disable-msg=E1103
result = result.dev_path
if as_primary:
_SymlinkBlockDev(owner, result, idx)
except errors.BlockDeviceError, err:
_Fail("Error while assembling disk: %s", err, exc=True)
except OSError, err:
_Fail("Error while symlinking disk: %s", err, exc=True)
return result
......@@ -2250,7 +2254,7 @@ def FinalizeExport(instance, snap_disks):
config.set(constants.INISECT_EXP, 'timestamp', '%d' % int(time.time()))
config.set(constants.INISECT_EXP, 'source', instance.primary_node)
config.set(constants.INISECT_EXP, 'os', instance.os)
config.set(constants.INISECT_EXP, 'compression', 'gzip')
config.set(constants.INISECT_EXP, "compression", "none")
config.set(constants.INISECT_INS, 'name',
......@@ -104,6 +104,7 @@ __all__ = [
......@@ -886,6 +887,11 @@ NOSSH_KEYCHECK_OPT = cli_option("--no-ssh-key-check", dest="ssh_key_check",
default=True, action="store_false",
help="Disable SSH key fingerprint checking")
NODE_FORCE_JOIN_OPT = cli_option("--force-join", dest="force_join",
default=False, action="store_true",
help="Force the joining of a node,"
" needed when merging clusters")
MC_OPT = cli_option("-C", "--master-candidate", dest="master_candidate",
type="bool", default=None, metavar=_YORNO,
help="Set the master_candidate flag on the node")
......@@ -1171,14 +1177,6 @@ COMMON_CREATE_OPTS = [
constants.RS_UNKNOWN: "(unknown)",
constants.RS_NODATA: "(nodata)",
constants.RS_UNAVAIL: "(unavail)",
constants.RS_OFFLINE: "(offline)",
def _ParseArgs(argv, commands, aliases):
"""Parser for the command line arguments.
......@@ -1922,8 +1920,8 @@ def GenericMain(commands, override=None, aliases=None):
for key, val in override.iteritems():
setattr(options, key, val)
utils.SetupLogging(constants.LOG_COMMANDS, debug=options.debug,
stderr_logging=True, program=binary)
utils.SetupLogging(constants.LOG_COMMANDS, binary, debug=options.debug,
if old_cmdline:
logging.info("run with arguments '%s'", old_cmdline)
......@@ -1937,6 +1935,11 @@ def GenericMain(commands, override=None, aliases=None):
result, err_msg = FormatError(err)
logging.exception("Error during command processing")
except KeyboardInterrupt:
result = constants.EXIT_FAILURE
ToStderr("Aborted. Note that if the operation created any jobs, they"
" might have been submitted and"
" will continue to run in the background.")
return result
......@@ -2380,17 +2383,20 @@ class _QueryColumnFormatter:
"""Callable class for formatting fields of a query.
def __init__(self, fn, status_fn):
def __init__(self, fn, status_fn, verbose):
"""Initializes this class.
@type fn: callable
@param fn: Formatting function
@type status_fn: callable
@param status_fn: Function to report fields' status
@type verbose: boolean
@param verbose: whether to use verbose field descriptions or not
self._fn = fn
self._status_fn = status_fn
self._verbose = verbose
def __call__(self, data):
"""Returns a field's string representation.
......@@ -2407,10 +2413,10 @@ class _QueryColumnFormatter:
assert value is None, \
"Found value %r for abnormal status %s" % (value, status)
return FormatResultError(status)
return FormatResultError(status, verbose=self._verbose)
def FormatResultError(status):
def FormatResultError(status, verbose=True):
"""Formats result status other than L{constants.RS_NORMAL}.
@param status: The result status
......@@ -2418,15 +2424,19 @@ def FormatResultError(status):
assert status != constants.RS_NORMAL, \
"FormatResultError called with status equals to constants.RS_NORMAL"
"FormatResultError called with status equal to constants.RS_NORMAL"
return _RSTATUS_TO_TEXT[status]
(verbose_text, normal_text) = constants.RSS_DESCRIPTION[status]
except KeyError:
raise NotImplementedError("Unknown status %s" % status)
if verbose:
return verbose_text
return normal_text
def FormatQueryResult(result, unit=None, format_override=None, separator=None,
header=False, verbose=False):
"""Formats data in L{objects.QueryResponse}.
@type result: L{objects.QueryResponse}
......@@ -2441,6 +2451,8 @@ def FormatQueryResult(result, unit=None, format_override=None, separator=None,
@param separator: String used to separate fields
@type header: bool
@param header: Whether to output header row
@type verbose: boolean
@param verbose: whether to use verbose field descriptions or not
if unit is None:
......@@ -2463,7 +2475,8 @@ def FormatQueryResult(result, unit=None, format_override=None, separator=None,
assert fdef.title and
(fn, align_right) = _GetColumnFormatter(fdef, format_override, unit)
_QueryColumnFormatter(fn, _RecordStatus),
_QueryColumnFormatter(fn, _RecordStatus,
table = FormatTable(, columns, header, separator)
......@@ -2512,7 +2525,7 @@ def _WarnUnknownFields(fdefs):
def GenericList(resource, fields, names, unit, separator, header, cl=None,
format_override=None, verbose=False):
"""Generic implementation for listing all items of a resource.
@param resource: One of L{constants.QR_OP_LUXI}
......@@ -2531,6 +2544,8 @@ def GenericList(resource, fields, names, unit, separator, header, cl=None,
@type format_override: dict
@param format_override: Dictionary for overriding field formatting functions,
indexed by field name, contents like L{_DEFAULT_FORMAT_QUERY}
@type verbose: boolean
@param verbose: whether to use verbose field descriptions or not
if cl is None:
......@@ -2545,7 +2560,8 @@ def GenericList(resource, fields, names, unit, separator, header, cl=None,
(status, data) = FormatQueryResult(response, unit=unit, separator=separator,
for line in data:
......@@ -503,7 +503,7 @@ def ListLocks(opts, args): # pylint: disable-msg=W0613
while True:
ret = GenericList(constants.QR_LOCK, selected_fields, None, None,
opts.separator, not opts.no_headers,
format_override=fmtoverride, verbose=opts.verbose)
if ret != constants.EXIT_SUCCESS:
return ret
......@@ -575,7 +575,8 @@ commands = {
"", "Test a few aspects of the job queue"),
"locks": (
ListLocks, ARGS_NONE,
"[--interval N]", "Show a list of locks in the master daemon"),
# Copyright (C) 2010 Google Inc.
# Copyright (C) 2010, 2011 Google Inc.
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
......@@ -101,7 +101,7 @@ def ListGroups(opts, args):
return GenericList(constants.QR_GROUP, desired_fields, args, None,
opts.separator, not opts.no_headers,
format_override=fmtoverride, verbose=opts.verbose)
def ListGroupFields(opts, args):
......@@ -121,7 +121,7 @@ def ListGroupFields(opts, args):
def SetGroupParams(opts, args):
"""Modifies a node group's parameters.
@param opts: the command line options seletect by the user
@param opts: the command line options selected by the user
@type args: list
@param args: should contain only one element, the node group name
......@@ -189,7 +189,7 @@ commands = {
"<group_name> <node>...", "Assign nodes to a group"),
"list": (
"Lists the node groups in the cluster. The available fields can be shown"
" using the \"list-fields\" command (see the man page for details)."
......@@ -179,6 +179,9 @@ def GenericManyOps(operation, fn):
cl = GetClient()
inames = _ExpandMultiNames(opts.multi_mode, args, client=cl)
if not inames:
if opts.multi_mode == _SHUTDOWN_CLUSTER:
ToStdout("Cluster is empty, no instances to shutdown")
return 0
raise errors.OpPrereqError("Selection filter does not match"
" any instances", errors.ECODE_INVAL)
multi_on = opts.multi_mode != _SHUTDOWN_INSTANCES or len(inames) > 1
......@@ -216,7 +219,7 @@ def ListInstances(opts, args):
return GenericList(constants.QR_INSTANCE, selected_fields, args, opts.units,
opts.separator, not opts.no_headers,
format_override=fmtoverride, verbose=opts.verbose)
def ListInstanceFields(opts, args):
......@@ -1363,7 +1366,7 @@ commands = {
"Show information on the specified instance(s)"),
'list': (
"Lists the instances and their status. The available fields can be shown"
" using the \"list-fields\" command (see the man page for details)."
......@@ -146,6 +146,8 @@ def _RunSetupSSH(options, nodes):
if not options.ssh_key_check:
if options.force_join:
......@@ -233,7 +235,7 @@ def ListNodes(opts, args):
return GenericList(constants.QR_NODE, selected_fields, args, opts.units,
opts.separator, not opts.no_headers,
format_override=fmtoverride, verbose=opts.verbose)
def ListNodeFields(opts, args):
......@@ -801,10 +803,11 @@ def SetNodeParams(opts, args):
commands = {
'add': (
AddNode, [ArgHost(min=1, max=1)],
"[-s ip] [--readd] [--no-ssh-key-check] [--no-node-setup] [--verbose] "
"[-s ip] [--readd] [--no-ssh-key-check] [--force-join]"
" [--no-node-setup] [--verbose]"
" <node_name>",
"Add a node to the cluster"),
'evacuate': (
......@@ -830,7 +833,7 @@ commands = {
"[<node_name>...]", "Show information about the node(s)"),
'list': (
"Lists the nodes in the cluster. The available fields can be shown using"
" the \"list-fields\" command (see the man page for details)."
......@@ -1554,7 +1554,7 @@ class LUClusterVerify(LogicalUnit):
for node, n_img in node_image.items():
if (not node == node_current):
if node != node_current:
test = instance in n_img.instances
_ErrorIf(test, self.EINSTANCEWRONGNODE, instance,
"instance should not run on node %s", node)
......@@ -1564,7 +1564,11 @@ class LUClusterVerify(LogicalUnit):
for idx, (success, status) in enumerate(disks)]
for nname, success, bdev_status, idx in diskdata:
_ErrorIf(instanceconfig.admin_up and not success,
# the 'ghost node' construction in Exec() ensures that we have a
# node here
snode = node_image[nname]
bad_snode = snode.ghost or snode.offline
_ErrorIf(instanceconfig.admin_up and not success and not bad_snode,
"couldn't retrieve status for disk/%s on %s: %s",
idx, nname, bdev_status)
......@@ -1623,6 +1627,12 @@ class LUClusterVerify(LogicalUnit):
# WARNING: we currently take into account down instances as well
# as up ones, considering that even if they're down someone
# might want to start them even in the event of a node failure.
if n_img.offline:
# we're skipping offline nodes from the N+1 warning, since
# most likely we don't have good memory information from them;
# we already list instances living on such nodes, and that's
# enough warning
for prinode, instances in n_img.sbp.items():
needed_mem = 0
for instance in instances:
......@@ -2291,8 +2301,8 @@ class LUClusterVerify(LogicalUnit):
self.ENODERPC, pnode, "instance %s, connection to"
" primary node failed", instance)
if pnode_img.offline:
_ErrorIf(pnode_img.offline, self.EINSTANCEBADNODE, instance,
"instance lives on offline node %s", inst_config.primary_node)
# If the instance is non-redundant we cannot survive losing its primary
# node, so we are not N+1 compliant. On the other hand we have no disk
......@@ -2341,7 +2351,7 @@ class LUClusterVerify(LogicalUnit):
# warn that the instance lives on offline nodes
_ErrorIf(inst_nodes_offline, self.EINSTANCEBADNODE, instance,
"instance lives on offline node(s) %s",
"instance has offline secondary node(s) %s",
# ... or ghost/non-vm_capable nodes
for node in inst_config.all_nodes:
......@@ -2575,16 +2585,18 @@ class LUClusterRepairDiskSizes(NoHooksLU):
newl = [v[2].Copy() for v in dskl]
for dsk in newl:
self.cfg.SetDiskID(dsk, node)
result = self.rpc.call_blockdev_getsizes(node, newl)
result = self.rpc.call_blockdev_getsize(node, newl)
if result.fail_msg:
self.LogWarning("Failure in blockdev_getsizes call to node"
self.LogWarning("Failure in blockdev_getsize call to node"
" %s, ignoring", node)
if len( != len(dskl):
if len(result.payload) != len(dskl):
logging.warning("Invalid result from node %s: len(dksl)=%d,"
" result.payload=%s", node, len(dskl), result.payload)
self.LogWarning("Invalid result from node %s, ignoring node results",
for ((instance, idx, disk), size) in zip(dskl,
for ((instance, idx, disk), size) in zip(dskl, result.payload):
if size is None:
self.LogWarning("Disk %d of instance %s did not return size"
" information, ignoring", idx,
......@@ -3652,7 +3664,10 @@ class _NodeQuery(_QueryBase):
# Gather data as requested
if query.NQ_LIVE in self.requested_data:
node_data = lu.rpc.call_node_info(nodenames, lu.cfg.GetVGName(),
# filter out non-vm_capable nodes
toquery_nodes = [name for name in nodenames if all_info[name].vm_capable]
node_data = lu.rpc.call_node_info(toquery_nodes, lu.cfg.GetVGName(),
live_data = dict((name, nresult.payload)
for (name, nresult) in node_data.items()
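
The idea behind this hunk, together with the commit titles "NodeQuery: don't query non-vm_capable nodes" and "NodeQuery: mark live fields as UNAVAIL for non-vm_capable nodes", can be sketched in isolation as below. The `Node` tuple, the `UNAVAIL` marker and `gather_live_data` are hypothetical stand-ins for illustration, not Ganeti's objects or API.

```python
from collections import namedtuple

# Hypothetical, simplified node record.
Node = namedtuple("Node", ["name", "vm_capable"])

# Hypothetical marker for "live data not applicable to this node".
UNAVAIL = object()


def gather_live_data(nodes, query_fn):
    """Query only vm_capable nodes; mark the others as unavailable.

    query_fn takes a list of node names and returns a dict mapping
    node name to its live data.
    """
    # Filter out non-vm_capable nodes before making the (expensive) call.
    to_query = [node.name for node in nodes if node.vm_capable]
    results = query_fn(to_query)

    live = {}
    for node in nodes:
        if node.vm_capable:
            # A missing entry (e.g. a failed call) is reported as None.
            live[node.name] = results.get(node.name)
        else:
            live[node.name] = UNAVAIL
    return live
```

Distinguishing "not applicable" from "data missing" lets the list output show ``-`` for the former and ``?`` for the latter.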
......@@ -3900,18 +3915,21 @@ class _InstanceQuery(_QueryBase):
"""Computes the list of instances and their attributes.
cluster = lu.cfg.GetClusterInfo()
all_info = lu.cfg.GetAllInstancesInfo()
instance_names = self._GetNames(lu, all_info.keys(), locking.LEVEL_INSTANCE)
instance_list = [all_info[name] for name in instance_names]
nodes = frozenset([inst.primary_node for inst in instance_list])
nodes = frozenset(itertools.chain(*(inst.all_nodes