Commit 3330e4d2 authored by Klaus Aehlig

Merge branch 'stable-2.13' into stable-2.14

* stable-2.13
  Add a test verifying --ignore-soft-errors in hail
  Pass on options in nodeEvacInstance for ChangePrimary
  HTools: Pass --ignore-soft-errors all the way through
  Make tags a soft error
  Mention tags in --restricted-migration man page entry
  Ignore spurious pylint warnings
  Fix output in case of ExtStorage attach error
  Fix gnt-storage diagnose in case of missing files
  Revision bump to 2.13.0~beta1
  Prepare the NEWS file for 2.13.0~beta1
  Update the rst files for 2.13
  Add a note about downgrades wrt to SSH key handling
  Add check for >1 disk used with the LXC hypervisor
  Update years in copyright disclaimer for LXC files
  Use DSA SSH keys only

* stable-2.12
  Revision bump to 2.12.1
  Prepare the NEWS file for the new 2.12.1 release

Conflicts:
	NEWS
	src/Ganeti/HTools/Cluster.hs
	src/Ganeti/HTools/Node.hs
	src/Ganeti/HTools/Program/Hspace.hs
	test/hs/Test/Ganeti/HTools/Backend/Text.hs

Resolution:
	NEWS: take all additions; Rest: give
	preference to stable-2.14 version

Semantic Conflict:
	Ignore suffix bump in configure.ac
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Aaron Karper <akarper@google.com>
parents 4379403e fbe43b73
@@ -641,6 +641,7 @@ docinput = \
	doc/design-2.10.rst \
	doc/design-2.11.rst \
	doc/design-2.12.rst \
	doc/design-2.13.rst \
	doc/design-autorepair.rst \
	doc/design-bulk-create.rst \
	doc/design-ceph-ganeti-support.rst \
@@ -1648,6 +1649,7 @@ TEST_FILES = \
	test/data/htools/hail-alloc-invalid-network.json \
	test/data/htools/hail-alloc-invalid-twodisks.json \
	test/data/htools/hail-alloc-restricted-network.json \
	test/data/htools/hail-alloc-plain-tags.json \
	test/data/htools/hail-alloc-spindles.json \
	test/data/htools/hail-alloc-twodisks.json \
	test/data/htools/hail-change-group.json \
......
@@ -26,10 +26,10 @@ New dependencies
- Building the Haskell part of Ganeti now requires Cabal and cabal-install.
Version 2.13.0
--------------
Version 2.13.0 beta1
--------------------
*(unreleased)*
*(Released Wed, 14 Jan 2015)*
Incompatible/important changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -50,6 +50,10 @@ Incompatible/important changes
  --ignore-hvversions to restore the old behavior of only warning.
- Node tags starting with htools:migration: or htools:allowmigration: now have
  a special meaning to htools(1). See hbal(1) for details.
- The LXC hypervisor code has been repaired and improved. Instances cannot be
  migrated and cannot have more than one disk, but should otherwise work as
  with other hypervisors. OS script changes should not be necessary. LXC
  version 1.0.0 or higher is required.
New features
~~~~~~~~~~~~
@@ -77,6 +81,41 @@ New dependencies
- The formerly optional regex-pcre is now an unconditional dependency because
  the new job filter rules have regular expressions as a core feature.
Known issues
~~~~~~~~~~~~
The following issues are known to be present in the beta and will be fixed
before rc1:
- Issue 1018: Cluster init (and possibly other jobs) occasionally fail to start.
Version 2.12.1
--------------
*(Released Wed, 14 Jan 2015)*
- Fix users under which the wconfd and metad daemons run (issue #976)
- Clean up stale livelock files (issue #865)
- Fix setting up the metadata daemon's network interface for Xen
- Make watcher identify itself on disk activation
- Add "ignore-ipolicy" option to gnt-instance grow-disk
- Check disk size ipolicy during "gnt-instance grow-disk" (issue #995)
Inherited from the 2.11 branch:
- Fix counting votes when doing master failover (issue #962)
- Fix broken haskell dependencies (issues #758 and #912)
- Check if IPv6 is used directly when running SSH (issue #892)
Inherited from the 2.10 branch:
- Fix typo in gnt_cluster output (issue #1015)
- Use the Python path detected at configure time in the top-level Python
  scripts.
- Fix check for sphinx-build from python2-sphinx
- Properly check if an instance exists in 'gnt-instance console'
Version 2.12.0
--------------
......
@@ -59,6 +59,13 @@ handling of SSH keys will not affect your cluster.
If you want to be prompted for each newly created SSH key, leave out
the ``--no-ssh-key-check`` option in the command listed above.
Note that after a downgrade from 2.13 to 2.12, the individual SSH keys
will not get removed automatically. This can lead to reachability
errors under very specific circumstances (Issue 1008). If you plan to
stay on 2.12 for a while rather than upgrade to 2.13 again soon, we
recommend replacing the SSH key pairs of all non-master nodes with the
master node's SSH key pair.
2.11
----
......
==================
Ganeti 2.13 design
==================
The following design documents have been implemented in Ganeti 2.13.
- :doc:`design-disk-conversion`
- :doc:`design-optables`
The following designs have been partially implemented in Ganeti 2.13.
- :doc:`design-location`
- :doc:`design-node-security`
- :doc:`design-os`
@@ -2,7 +2,7 @@
Design document drafts
======================
.. Last updated for Ganeti 2.12
.. Last updated for Ganeti 2.13
.. toctree::
   :maxdepth: 2
@@ -14,13 +14,11 @@ Design document drafts
   design-storagetypes.rst
   design-glusterfs-ganeti-support.rst
   design-hugepages-support.rst
   design-optables.rst
   design-ceph-ganeti-support.rst
   design-hsqueeze.rst
   design-os.rst
   design-move-instance-improvements.rst
   design-node-security.rst
   design-disk-conversion.rst
   design-ifdown.rst
   design-location.rst
   design-reservations.rst
......
@@ -78,6 +78,7 @@ and draft versions (which are either incomplete or not implemented).
   design-2.10.rst
   design-2.11.rst
   design-2.12.rst
   design-2.13.rst
Draft designs
-------------
@@ -100,6 +101,7 @@ Draft designs
   design-cpu-pinning.rst
   design-device-uuid-name.rst
   design-daemons.rst
   design-disk-conversion.rst
   design-disks.rst
   design-file-based-storage.rst
   design-hroller.rst
@@ -118,6 +120,7 @@ Draft designs
   design-oob.rst
   design-openvswitch.rst
   design-opportunistic-locking.rst
   design-optables.rst
   design-os.rst
   design-ovf-support.rst
   design-partitioned
......
@@ -1013,22 +1013,17 @@ def _VerifySshSetup(node_status_list, my_name,
    (_, key_files) = \
      ssh.GetAllUserFiles(constants.SSH_LOGIN_USER, mkdir=False, dircheck=False)
    (_, dsa_pub_key_filename) = key_files[constants.SSHK_DSA]
    my_keys = pub_keys[my_uuid]
    num_keys = 0
    for (key_type, (_, pub_key_file)) in key_files.items():
      try:
        pub_key = utils.ReadFile(pub_key_file)
        if pub_key.strip() not in my_keys:
          result.append("The %s key of node %s does not match this node's keys"
                        " in the pub key file." % (key_type, my_name))
        num_keys += 1
      except IOError:
        # There might not be keys of every type.
        pass
    if num_keys != len(my_keys):
      result.append("The number of keys for node %s in the public key file"
                    " (%s) does not match the number of keys on the node"
                    " (%s)." % (my_name, len(my_keys), len(key_files)))
    dsa_pub_key = utils.ReadFile(dsa_pub_key_filename)
    if dsa_pub_key.strip() not in my_keys:
      result.append("The dsa key of node %s does not match this node's key"
                    " in the pub key file." % (my_name))
    if len(my_keys) != 1:
      result.append("There is more than one key for node %s in the public key"
                    " file." % my_name)
  else:
    if len(pub_keys.keys()) > 0:
      result.append("The public key file of node '%s' is not empty, although"
@@ -1918,19 +1913,20 @@ def RenewSshKeys(node_uuids, node_names, ssh_port_map,
                         noded_cert_file=noded_cert_file,
                         run_cmd_fn=run_cmd_fn)
    fetched_keys = ssh.ReadRemoteSshPubKeys(root_keyfiles, node_name,
                                            cluster_name,
                                            ssh_port_map[node_name],
                                            False, # ask_key
                                            False) # key_check
    if not fetched_keys:
    try:
      (_, dsa_pub_keyfile) = root_keyfiles[constants.SSHK_DSA]
      pub_key = ssh.ReadRemoteSshPubKeys(dsa_pub_keyfile,
                                         node_name, cluster_name,
                                         ssh_port_map[node_name],
                                         False, # ask_key
                                         False) # key_check
    except:
      raise errors.SshUpdateError("Could not fetch key of node %s"
                                  " (UUID %s)" % (node_name, node_uuid))
    if potential_master_candidate:
      ssh.RemovePublicKey(node_uuid, key_file=pub_key_file)
      for pub_key in fetched_keys.values():
        ssh.AddPublicKey(node_uuid, pub_key, key_file=pub_key_file)
      ssh.AddPublicKey(node_uuid, pub_key, key_file=pub_key_file)
    AddNodeSshKey(node_uuid, node_name,
                  potential_master_candidates,
......
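
The _VerifySshSetup change above replaces the per-key-type loop with a single
DSA check: the public key file must now hold exactly one key per node, and it
must match the node's own DSA key on disk. A minimal standalone sketch of that
invariant (hypothetical helper, not Ganeti code):

def verify_single_dsa_key(my_keys, dsa_pub_key, my_name):
  """Sketch of the new invariant: one recorded key, equal to the DSA key."""
  problems = []
  if dsa_pub_key.strip() not in my_keys:
    problems.append("The dsa key of node %s does not match the pub key file"
                    % my_name)
  if len(my_keys) != 1:
    problems.append("More than one key recorded for node %s" % my_name)
  return problems
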
@@ -1124,28 +1124,22 @@ def _BuildGanetiPubKeys(options, pub_key_file=pathutils.SSH_PUB_KEYS, cl=None,
  nonmaster_nodes = [name for name in online_nodes
                     if name != master_node]
  (_, root_keyfiles) = \
    ssh.GetAllUserFiles(constants.SSH_LOGIN_USER, mkdir=False, dircheck=False,
                        _homedir_fn=homedir_fn)
  _, pub_key_filename, _ = \
    ssh.GetUserFiles(constants.SSH_LOGIN_USER, mkdir=False, dircheck=False,
                     kind=constants.SSHK_DSA, _homedir_fn=homedir_fn)
  # get the key file of the master node
  for (_, (_, public_key_file)) in root_keyfiles.items():
    try:
      pub_key = utils.ReadFile(public_key_file)
      ssh.AddPublicKey(node_uuid_map[master_node], pub_key,
                       key_file=pub_key_file)
    except IOError:
      # Not all types of keys might be existing
      pass
  pub_key = utils.ReadFile(pub_key_filename)
  ssh.AddPublicKey(node_uuid_map[master_node], pub_key,
                   key_file=pub_key_file)
  # get the key files of all non-master nodes
  for node in nonmaster_nodes:
    fetched_keys = ssh.ReadRemoteSshPubKeys(root_keyfiles, node, cluster_name,
                                            ssh_port_map[node],
                                            options.ssh_key_check,
                                            options.ssh_key_check)
    for pub_key in fetched_keys.values():
      ssh.AddPublicKey(node_uuid_map[node], pub_key, key_file=pub_key_file)
    pub_key = ssh.ReadRemoteSshPubKeys(pub_key_filename, node, cluster_name,
                                       ssh_port_map[node],
                                       options.ssh_key_check,
                                       options.ssh_key_check)
    ssh.AddPublicKey(node_uuid_map[node], pub_key, key_file=pub_key_file)
def RenewCrypto(opts, args):
......
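
With this change _BuildGanetiPubKeys collects exactly one DSA key per node:
the master's key is read from the local file system and every other node is
queried over SSH. A rough sketch of the flow, with hypothetical callables
standing in for utils.ReadFile, ssh.ReadRemoteSshPubKeys and ssh.AddPublicKey:

def build_pub_keys(master, nonmasters, read_local_key, fetch_remote_key,
                   add_public_key):
  # One key per node: the master's key locally, the rest over SSH.
  add_public_key(master, read_local_key())
  for node in nonmasters:
    add_public_key(node, fetch_remote_key(node))
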
@@ -167,7 +167,7 @@ def _TryReadFile(path):
def _ReadSshKeys(keyfiles, _tostderr_fn=ToStderr):
  """Reads SSH keys according to C{keyfiles}.
  """Reads the DSA SSH keys according to C{keyfiles}.
  @type keyfiles: dict
  @param keyfiles: Dictionary with keys of L{constants.SSHK_ALL} and two-values
@@ -186,8 +186,8 @@ def _ReadSshKeys(keyfiles, _tostderr_fn=ToStderr):
    if public_key and private_key:
      result.append((kind, private_key, public_key))
    elif public_key or private_key:
      _tostderr_fn("Couldn't find a complete set of keys for kind '%s'; files"
                   " '%s' and '%s'", kind, private_file, public_file)
      _tostderr_fn("Couldn't find a complete set of keys for kind '%s';"
                   " files '%s' and '%s'", kind, private_file, public_file)
  return result
@@ -222,7 +222,10 @@ def _SetupSSH(options, cluster_name, node, ssh_port, cl):
  (_, root_keyfiles) = \
    ssh.GetAllUserFiles(constants.SSH_LOGIN_USER, mkdir=False, dircheck=False)
  root_keys = _ReadSshKeys(root_keyfiles)
  dsa_root_keyfiles = dict((kind, value) for (kind, value)
                           in root_keyfiles.items()
                           if kind == constants.SSHK_DSA)
  root_keys = _ReadSshKeys(dsa_root_keyfiles)
  (_, cert_pem) = \
    utils.ExtractX509Certificate(utils.ReadFile(pathutils.NODED_CERT_FILE))
@@ -241,14 +244,14 @@ def _SetupSSH(options, cluster_name, node, ssh_port, cl):
                  use_cluster_key=False, ask_key=options.ssh_key_check,
                  strict_host_check=options.ssh_key_check)
  fetched_keys = ssh.ReadRemoteSshPubKeys(root_keyfiles, node, cluster_name,
                                          ssh_port, options.ssh_key_check,
                                          options.ssh_key_check)
  for pub_key in fetched_keys.values():
    # Unfortunately, we have to add the key with the node name rather than
    # the node's UUID here, because at this point, we do not have a UUID yet.
    # The entry will be corrected in noded later.
    ssh.AddPublicKey(node, pub_key)
  (_, dsa_pub_keyfile) = root_keyfiles[constants.SSHK_DSA]
  pub_key = ssh.ReadRemoteSshPubKeys(dsa_pub_keyfile, node, cluster_name,
                                     ssh_port, options.ssh_key_check,
                                     options.ssh_key_check)
  # Unfortunately, we have to add the key with the node name rather than
  # the node's UUID here, because at this point, we do not have a UUID yet.
  # The entry will be corrected in noded later.
  ssh.AddPublicKey(node, pub_key)
@UsesRPC
......
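
The dsa_root_keyfiles filter above narrows the key dict to a single key type
while keeping _ReadSshKeys' dict-based interface intact. The same idiom in
isolation, with made-up file names:

root_keyfiles = {
  "dsa": ("/root/.ssh/id_dsa", "/root/.ssh/id_dsa.pub"),
  "rsa": ("/root/.ssh/id_rsa", "/root/.ssh/id_rsa.pub"),
}
dsa_only = dict((kind, value) for (kind, value)
                in root_keyfiles.items() if kind == "dsa")
assert list(dsa_only.keys()) == ["dsa"]
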
#
#
# Copyright (C) 2010, 2013 Google Inc.
# Copyright (C) 2010, 2013, 2014, 2015 Google Inc.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
@@ -741,6 +741,19 @@ class LXCHypervisor(hv_base.BaseHypervisor):
                            " lxc process exited after being daemonized" %
                            instance.name)
  @classmethod
  def _VerifyDiskRequirements(cls, block_devices):
    """Ensures that the disks provided work with the current implementation.

    """
    if len(block_devices) == 0:
      raise HypervisorError("LXC cannot have diskless instances.")
    if len(block_devices) > 1:
      raise HypervisorError("At the moment, LXC cannot support more than one"
                            " disk attached to it. Please create this"
                            " instance anew with fewer disks.")

  def StartInstance(self, instance, block_devices, startup_paused):
    """Start an instance.
@@ -748,6 +761,8 @@ class LXCHypervisor(hv_base.BaseHypervisor):
    We use volatile containers.

    """
    LXCHypervisor._VerifyDiskRequirements(block_devices)
    stash = {}
    # Since LXC version >= 1.0.0, the LXC strictly requires all cgroup
@@ -766,9 +781,6 @@ class LXCHypervisor(hv_base.BaseHypervisor):
      _CreateBlankFile(log_file, constants.SECURE_FILE_MODE)
    try:
      if not block_devices:
        raise HypervisorError("LXC needs at least one disk")
      sda_dev_path = block_devices[0][1]
      # LXC needs to use partition mapping devices to access each partition
      # of the storage
......
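
The new _VerifyDiskRequirements guard enforces exactly one disk per LXC
instance before any container setup happens, which makes the old in-line
diskless check in StartInstance redundant. A self-contained sketch of the
same guard (HypervisorError here stands in for ganeti.errors.HypervisorError):

class HypervisorError(Exception):
  pass

def verify_disk_requirements(block_devices):
  # LXC currently supports exactly one disk per instance.
  if len(block_devices) == 0:
    raise HypervisorError("LXC cannot have diskless instances.")
  if len(block_devices) > 1:
    raise HypervisorError("LXC cannot support more than one disk yet.")
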
@@ -684,7 +684,7 @@ def InitSSHSetup(error_fn=errors.OpPrereqError, _homedir_fn=None,
  """
  priv_key, _, auth_keys = GetUserFiles(constants.SSH_LOGIN_USER,
                                        _homedir_fn=_homedir_fn)
                                        _homedir_fn=_homedir_fn)
  new_priv_key_name = priv_key + _suffix
  new_pub_key_name = priv_key + _suffix + ".pub"
@@ -1087,40 +1087,27 @@ def GetSshPortMap(nodes, cfg):
  return node_port_map
def ReadRemoteSshPubKeys(keyfiles, node, cluster_name, port, ask_key,
def ReadRemoteSshPubKeys(pub_key_file, node, cluster_name, port, ask_key,
                         strict_host_check):
  """Fetches the public SSH keys from a node via SSH.
  """Fetches the public DSA SSH key from a node via SSH.
  @type keyfiles: dict from string to (string, string) tuples
  @param keyfiles: a dictionary mapping the type of key (e.g. rsa, dsa) to a
    tuple consisting of the file name of the private and public key
  @type pub_key_file: string
  @param pub_key_file: the file name of the public DSA key
  """
  ssh_runner = SshRunner(cluster_name)
  failed_results = {}
  fetched_keys = {}
  for (kind, (_, public_key_file)) in keyfiles.items():
    cmd = ["cat", public_key_file]
    ssh_cmd = ssh_runner.BuildCmd(node, constants.SSH_LOGIN_USER,
                                  utils.ShellQuoteArgs(cmd),
                                  batch=False, ask_key=ask_key, quiet=False,
                                  strict_host_check=strict_host_check,
                                  use_cluster_key=False,
                                  port=port)
    result = utils.RunCmd(ssh_cmd)
    if result.failed:
      failed_results[kind] = (result.cmd, result.fail_reason)
    else:
      fetched_keys[kind] = result.stdout
  cmd = ["cat", pub_key_file]
  ssh_cmd = ssh_runner.BuildCmd(node, constants.SSH_LOGIN_USER,
                                utils.ShellQuoteArgs(cmd),
                                batch=False, ask_key=ask_key, quiet=False,
                                strict_host_check=strict_host_check,
                                use_cluster_key=False,
                                port=port)
  if len(fetched_keys.keys()) < 1:
    error_msg = "Could not fetch any public SSH key."
    for (kind, (cmd, fail_reason)) in failed_results.items():
      error_msg += "Could not fetch the public '%s' SSH key from node '%s':" \
                   " ran command '%s', failure reason: '%s'. " % \
                   (kind, node, cmd, fail_reason)
    raise errors.OpPrereqError(error_msg)
  return fetched_keys
  result = utils.RunCmd(ssh_cmd)
  if result.failed:
    raise errors.OpPrereqError("Could not fetch a public DSA SSH key from node"
                               " '%s': ran command '%s', failure reason: '%s'."
                               % (node, cmd, result.fail_reason))
  return result.stdout
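
ReadRemoteSshPubKeys thus shrinks from a fetch-all-key-types loop to a single
remote cat of the DSA public key file. A simplified stand-in in modern Python
(the real code builds the command via SshRunner.BuildCmd and runs it with
utils.RunCmd):

import subprocess

def read_remote_pub_key(node, pub_key_file, port=22):
  # Run `cat` on the remote public key file and return its contents.
  cmd = ["ssh", "-p", str(port), "root@%s" % node, "cat", pub_key_file]
  proc = subprocess.run(cmd, capture_output=True, text=True)
  if proc.returncode != 0:
    raise RuntimeError("Could not fetch key from node '%s': %s"
                       % (node, proc.stderr.strip()))
  return proc.stdout
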
@@ -333,7 +333,7 @@ def _ExtStorageAction(action, unique_id, ext_params,
  # Explicitly check if the script is valid
  try:
    _CheckExtStorageFile(inst_es.path, action)
    _CheckExtStorageFile(inst_es.path, action) # pylint: disable=E1103
  except errors.BlockDeviceError:
    base.ThrowError("Action '%s' is not supported by provider '%s'" %
                    (action, driver))
@@ -343,6 +343,7 @@ def _ExtStorageAction(action, unique_id, ext_params,
  script = getattr(inst_es, script_name)
  # Run the external script
  # pylint: disable=E1103
  result = utils.RunCmd([script], env=create_env,
                        cwd=inst_es.path, output=logfile,)
  if result.failed:
@@ -357,7 +358,7 @@ def _ExtStorageAction(action, unique_id, ext_params,
      lines = [utils.SafeEncode(val)
               for val in utils.TailFile(logfile, lines=20)]
    else:
      lines = result.output[-20:]
      lines = result.output.splitlines()[-20:]
    base.ThrowError("External storage's %s script failed (%s), last"
                    " lines of output:\n%s",
@@ -440,7 +441,7 @@ def ExtStorageFromDisk(name, base_dir=None):
      _CheckExtStorageFile(es_dir, filename)
    except errors.BlockDeviceError, err:
      if required:
        return False, err
        return False, str(err)
  parameters = []
  if constants.ES_PARAMETERS_FILE in es_files:
......
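
The splitlines() fix above matters because slicing a string takes characters,
not lines, so result.output[-20:] reported only the last 20 bytes of script
output rather than the intended last 20 lines. A quick illustration:

output = "\n".join("line %d" % i for i in range(30))
assert len(output[-20:]) == 20                 # last 20 characters
assert len(output.splitlines()[-20:]) == 20    # last 20 lines
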
@@ -359,7 +359,8 @@ The options that can be passed to the program are as follows:
  option, the only migrations that hbal will do are migrations of
  instances off a drained node. This can be useful if during a reinstall
  of the base operating system migration is only possible from the old
  OS to the new OS. Note, however, that migration tags are usually the
  better choice.

\--select-instances=*instances*
  This parameter marks the given instances (as a comma-separated list)
......
{
  "cluster_tags": [
    "htools:iextags:service"
  ],
  "instances": {
    "instance-1": {
      "admin_state": "up",
      "admin_state_source": "admin",
      "disk_space_total": 256,
      "disk_template": "drbd",
      "disks": [
        {
          "mode": "rw",
          "size": 128,
          "spindles": 1
        }
      ],
      "hypervisor": "xen-pvm",
      "memory": 128,
      "nics": [
        {
          "bridge": "xen-br0",
          "ip": null,
          "link": "xen-br0",
          "mac": "aa:00:00:15:92:6f",
          "mode": "bridged"
        }
      ],
      "nodes": [
        "node1",
        "node2"
      ],
      "os": "debian-image",
      "spindle_use": 1,
      "tags": [
        "service:foo"
      ],
      "vcpus": 1
    },
    "instance-2": {
      "admin_state": "up",
      "admin_state_source": "admin",
      "disk_space_total": 256,
      "disk_template": "drbd",
      "disks": [
        {
          "mode": "rw",
          "size": 128,
          "spindles": 1
        }
      ],
      "hypervisor": "xen-pvm",
      "memory": 128,
      "nics": [
        {
          "bridge": "xen-br0",
          "ip": null,
          "link": "xen-br0",
          "mac": "aa:00:00:15:92:6f",
          "mode": "bridged"
        }
      ],
      "nodes": [
        "node2",
        "node3"
      ],
      "os": "debian-image",
      "spindle_use": 1,
      "tags": [
        "service:foo"
      ],
      "vcpus": 1
    },
    "instance-3": {
      "admin_state": "up",
      "admin_state_source": "admin",
      "disk_space_total": 256,
      "disk_template": "drbd",
      "disks": [
        {
          "mode": "rw",
          "size": 128,
          "spindles": 1
        }
      ],
      "hypervisor": "xen-pvm",
      "memory": 128,
      "nics": [
        {
          "bridge": "xen-br0",
          "ip": null,
          "link": "xen-br0",
          "mac": "aa:00:00:15:92:6f",
          "mode": "bridged"
        }
      ],
      "nodes": [
        "node3",
        "node1"
      ],
      "os": "debian-image",
      "spindle_use": 1,
      "tags": [
        "service:foo"
      ],
      "vcpus": 1
    }
  },
  "nodegroups": {
    "uuid-group-1": {
      "alloc_policy": "preferred",
      "ipolicy": {
        "disk-templates": [
          "sharedfile",
          "diskless",
          "plain",
          "blockdev",
          "drbd",
          "file",
          "rbd"
        ],
        "minmax": [
          {
            "max": {
              "cpu-count": 8,
              "disk-count": 16,
              "disk-size": 1048576,
              "memory-size": 32768,
              "nic-count": 8,
              "spindle-use": 8
            },
            "min": {
              "cpu-count": 1,
              "disk-count": 1,
              "disk-size": 128,
              "memory-size": 128,
              "nic-count": 1,
              "spindle-use": 1
            }
          }
        ],
        "spindle-ratio": 32.0,
        "std": {
          "cpu-count": 1,
          "disk-count": 1,
          "disk-size": 1024,
          "memory-size": 128,
          "nic-count": 1,
          "spindle-use": 1
        },
        "vcpu-ratio": 4.0
      },
      "name": "default",
      "networks": [],
      "tags": []
    }
  },
  "nodes": {
    "node1": {
      "drained": false,
      "free_disk": 1377024,
      "free_memory": 32635,
      "free_spindles": 12,
      "group": "uuid-group-1",
      "i_pri_memory": 0,
      "i_pri_up_memory": 0,
      "master_candidate": true,
      "master_capable": true,
      "ndparams": {
        "exclusive_storage": false,
        "oob_program": null,
        "spindle_count": 1
      },