Commit 01144827 authored by Guido Trotter

Merge branch 'devel-2.1'



* devel-2.1: (22 commits)
  NEWS: fix empty lines
  Fix a unittest name and docstring
  Force ssh to allocate a tty
  Fix a unittest docstring
  IsProcessAlive: retry stat() a few times
  Retry{Again,Timeout}: explain reraising
  utils.Retry: pass up timeout arguments
  Add a few Retry unittests
  Bump version for 2.1.2.1 release
  Update NEWS for Ganeti 2.1.2.1
  KVM: only export instance tags if present
  ssh.GetUserFiles: move to EnsureDirs
  Hypervisors: use utils.EnsureDirs
  backend: remove a couple of useless mkdir calls
  daemon.GenericMain: fix docstring
  jstore: use EnsureDirs, and add more constants
  Bump version for 2.1.2 release
  Update NEWS file for 2.1.2
  Add dates to the NEWS file
  RAPI QA: Test instance creation/removal via RAPI
  ...

Conflicts:
	NEWS
	  - trivial (merge NEWS entries)
	lib/backend.py
	  - trivial (master doesn't have the changed code)
	lib/constants.py
	  - trivial (merge constants, plus one s/0700/SECURE_DIR_MODE/)
	test/ganeti.utils_unittest.py
	  - trivial (name+docstring changes)
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
parents 63bcea2a 507fd05a
......@@ -9,9 +9,116 @@ Version 2.2.0
RFC2616 (HTTP/1.1), section 7.2.1)
Version 2.1.2.1
---------------
*(Released Fri, 7 May 2010)*
Fix a bug which prevented untagged KVM instances from starting.
Version 2.1.2
-------------
*(Released Fri, 7 May 2010)*
Another release with a long development cycle, during which many
different features were added.
Significant features
~~~~~~~~~~~~~~~~~~~~
The KVM hypervisor can now run the individual instances as non-root, to
reduce the impact of a VM being hijacked due to bugs in the
hypervisor. It is possible to run all instances as a single (non-root)
user, to manually specify a user for each instance, or to dynamically
allocate a user out of a cluster-wide pool to each instance, with the
guarantee that no two instances will run under the same user ID on any
given node.
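One way to picture the per-node guarantee of the dynamic pool is a first-free allocator over the cluster-wide UID range. This is a hypothetical sketch, not Ganeti's code: the function name and the in-memory `used_on_node` set are illustrative assumptions, and a real implementation would also have to make the check-and-reserve step atomic across concurrent instance starts.

```python
def allocate_uid(pool, used_on_node):
    """Return the first UID in `pool` not already used on this node.

    Hypothetical helper: `pool` is an iterable of candidate UIDs and
    `used_on_node` is the set of UIDs held by instances already running
    on the node.
    """
    for uid in pool:
        if uid not in used_on_node:
            return uid
    raise LookupError("no free UID left in the pool")


used = {1000, 1001}
uid = allocate_uid(range(1000, 1010), used)
used.add(uid)  # reserve it so the next instance on this node gets another UID
print(uid)  # 1002
```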
An experimental RAPI client library, which can be used standalone
(without the other Ganeti libraries), is provided in the source tree as
``lib/rapi/client.py``. Note that this client's interface might change in
the future, as we iterate on its capabilities.
A new command, ``gnt-cluster renew-crypto``, has been added to easily
replace the cluster's certificates and crypto keys. This might help in
case they have been compromised, or have simply expired.
A new disk option for instance creation has been added that allows one
to "adopt" currently existing logical volumes, with data
preservation. This should allow easier migration to Ganeti from
unmanaged (or managed via other software) instances.
Another disk improvement is the possibility to convert between redundant
(DRBD) and plain (LVM) disk configuration for an instance. This should
allow better scalability (starting with one node and growing the
cluster, or shrinking a two-node cluster to one node).
A new feature that could help with automated node failovers has been
implemented: if a node sees itself as offline (by querying the master
candidates), it will try to shut down (hard) all instances and any active
DRBD devices. This reduces the risk of duplicate instances if an
external script automatically fails over the instances on such nodes. To
enable this, the cluster parameter ``maintain_node_health`` should be
enabled; in the future this option (per the name) will enable other
automatic maintenance features.
Instance export/import now reuses the original instance
specifications for all parameters; that means exporting an instance,
deleting it and then importing it back should give an almost identical
instance. Note that the default import behaviour has changed: previously
it created only one NIC, whereas now it recreates the original number of
NICs.
Cluster verify has added a few new checks: SSL certificates validity,
/etc/hosts consistency across the cluster, etc.
Other changes
~~~~~~~~~~~~~
As usual, many internal changes and documentation fixes were made.
Among others:
- Fixed cluster initialization with disabled cluster storage (regression
introduced in 2.1.1)
- File-based storage supports growing the disks
- Fixed behaviour of node role changes
- Fixed cluster verify for some corner cases, plus a general rewrite of
cluster verify to allow future extension with more checks
- Fixed log spamming by watcher and node daemon (regression introduced
in 2.1.1)
- Fixed possible validation issues when changing the list of enabled
hypervisors
- Fixed cleanup of /etc/hosts during node removal
- Fixed RAPI response for invalid methods
- Fixed bug with hashed passwords in ``ganeti-rapi`` daemon
- Multiple small improvements to the KVM hypervisor (VNC usage, booting
from IDE disks, etc.)
- Allow OS changes without re-installation (to record a changed OS
outside of Ganeti, or to allow OS renames)
- Allow instance creation without OS installation (useful for example if
the OS will be installed manually, or restored from a backup not in
Ganeti format)
- Implemented option to make cluster ``copyfile`` use the replication
network
- Added list of enabled hypervisors to ssconf (possibly useful for
external scripts)
- Added a new tool (``tools/cfgupgrade12``) that allows upgrading from
1.2 clusters
- A partial form of node re-IP is possible via node readd, which now
allows changed node primary IP
- Command line utilities now show an informational message if the job is
waiting for a lock
- The logs of the master daemon now show the PID/UID/GID of the
connected client
Version 2.1.1
-------------
*(Released Fri, 12 Mar 2010)*
During the long 2.1.0 release candidate cycle, a lot of improvements and
changes accumulated, which were released later as 2.1.1.
......@@ -126,6 +233,8 @@ New features
Version 2.1.0
-------------
*(Released Tue, 2 Mar 2010)*
Ganeti 2.1 brings many improvements with it. Major changes:
- Added infrastructure to ease automated disk repairs
......@@ -225,6 +334,8 @@ Details
Version 2.0.6
-------------
*(Released Thu, 4 Feb 2010)*
- Fix cleaner behaviour on nodes not in a cluster (Debian bug 568105)
- Fix a string formatting bug
- Improve safety of the code in some error paths
......@@ -234,6 +345,8 @@ Version 2.0.6
Version 2.0.5
-------------
*(Released Thu, 17 Dec 2009)*
- Fix security issue due to missing validation of iallocator names; this
allows local and remote execution of arbitrary executables
- Fix failure of gnt-node list during instance removal
......@@ -243,6 +356,8 @@ Version 2.0.5
Version 2.0.4
-------------
*(Released Wed, 30 Sep 2009)*
- Fixed many wrong messages
- Fixed a few bugs related to the locking library
- Fixed MAC checking at instance creation time
......@@ -286,6 +401,8 @@ Version 2.0.4
Version 2.0.3
-------------
*(Released Fri, 7 Aug 2009)*
- Added ``--ignore-size`` to the ``gnt-instance activate-disks`` command
to allow using the pre-2.0.2 behaviour in activation, if any existing
instances have mismatched disk sizes in the configuration
......@@ -302,6 +419,8 @@ Version 2.0.3
Version 2.0.2
-------------
*(Released Fri, 17 Jul 2009)*
- Added experimental support for striped logical volumes; this should
enhance performance but comes with a higher complexity in the block
device handling; striping is only enabled when passing
......@@ -334,6 +453,8 @@ Version 2.0.2
Version 2.0.1
-------------
*(Released Tue, 16 Jun 2009)*
- added ``-H``/``-B`` startup parameters to ``gnt-instance``, which will
allow re-adding the start in single-user option (regression from 1.2)
- the watcher writes the instance status to a file, to allow monitoring
......@@ -364,12 +485,16 @@ Version 2.0.1
Version 2.0.0 final
-------------------
*(Released Wed, 27 May 2009)*
- no changes from rc5
Version 2.0 release candidate 5
-------------------------------
*(Released Wed, 20 May 2009)*
- fix a couple of bugs (validation, argument checks)
- fix ``gnt-cluster getmaster`` on non-master nodes (regression)
- some small improvements to RAPI and IAllocator
......@@ -379,6 +504,8 @@ Version 2.0 release candidate 5
Version 2.0 release candidate 4
-------------------------------
*(Released Mon, 27 Apr 2009)*
- change the OS list to not require locks; this helps with big clusters
- fix ``gnt-cluster verify`` and ``gnt-cluster verify-disks`` when the
volume group is broken
......@@ -392,6 +519,8 @@ Version 2.0 release candidate 4
Version 2.0 release candidate 3
-------------------------------
*(Released Wed, 8 Apr 2009)*
- Change the internal locking model of some ``gnt-node`` commands, in
order to reduce contention (and blocking of master daemon) when
batching many creation/reinstall jobs
......@@ -404,6 +533,8 @@ Version 2.0 release candidate 3
Version 2.0 release candidate 2
-------------------------------
*(Released Fri, 27 Mar 2009)*
- The cfgupgrade script now works and can upgrade 1.2.7 clusters to 2.0
- Fix watcher startup sequence, which improves the behaviour of busy clusters
- Some other fixes in ``gnt-cluster verify``, ``gnt-instance
......@@ -415,6 +546,8 @@ Version 2.0 release candidate 2
Version 2.0 release candidate 1
-------------------------------
*(Released Mon, 2 Mar 2009)*
- More documentation updates, now all docs should be more-or-less
up-to-date
- A couple of small fixes (mixed hypervisor clusters, offline nodes,
......@@ -427,6 +560,8 @@ Version 2.0 release candidate 1
Version 2.0 beta 2
------------------
*(Released Thu, 19 Feb 2009)*
- Xen PVM and KVM have switched the default value for the instance root
disk to the first partition on the first drive, instead of the whole
drive; this means that the OS installation scripts must be changed
......@@ -443,6 +578,8 @@ Version 2.0 beta 2
Version 2.0 beta 1
------------------
*(Released Mon, 26 Jan 2009)*
- Version 2 is a general rewrite of the code and therefore the
differences are too many to list, see the design document for 2.0 in
the ``doc/`` subdirectory for more details
......@@ -473,6 +610,8 @@ Version 2.0 beta 1
Version 1.2.7
-------------
*(Released Tue, 13 Jan 2009)*
- Change the default reboot type in ``gnt-instance reboot`` to "hard"
- Reuse the old instance MAC address by default on instance import, if
the instance name is the same.
......@@ -498,6 +637,8 @@ Version 1.2.7
Version 1.2.6
-------------
*(Released Wed, 24 Sep 2008)*
- new ``--hvm-nic-type`` and ``--hvm-disk-type`` flags to control the
type of NIC and disk exported to fully virtualized instances.
- provide access to the serial console of HVM instances
......@@ -521,6 +662,8 @@ Version 1.2.6
Version 1.2.5
-------------
*(Released Tue, 22 Jul 2008)*
- note: the allowed size and number of tags per object were reduced
- fix a bug in ``gnt-cluster verify`` with inconsistent volume groups
- fixed twisted 8.x compatibility
......@@ -538,6 +681,8 @@ Version 1.2.5
Version 1.2.4
-------------
*(Released Fri, 13 Jun 2008)*
- Experimental readonly, REST-based remote API implementation;
automatically started on master node, TCP port 5080, if enabled by
``--enable-rapi`` parameter to configure script.
......@@ -581,6 +726,8 @@ Version 1.2.4
Version 1.2.3
-------------
*(Released Mon, 18 Feb 2008)*
- more tweaks to the disk activation code (especially helpful for DRBD)
- change the default ``gnt-instance list`` output format, now there is
one combined status field (see the manpage for the exact values this
......@@ -595,6 +742,8 @@ Version 1.2.3
Version 1.2.2
-------------
*(Released Wed, 30 Jan 2008)*
- fix ``gnt-instance modify`` breakage introduced in 1.2.1 with the HVM
support (issue 23)
- add command aliases infrastructure and a few aliases
......@@ -612,6 +761,8 @@ Version 1.2.2
Version 1.2.1
-------------
*(Released Wed, 16 Jan 2008)*
- experimental HVM support, read the install document, section
"Initializing the cluster"
- allow for the PVM hypervisor per-instance kernel and initrd paths
......@@ -630,6 +781,8 @@ Version 1.2.1
Version 1.2.0
-------------
*(Released Tue, 4 Dec 2007)*
- Log the ``xm create`` output to the node daemon log on failure (to
help diagnose the error)
- In debug mode, log all external commands output if failed to the logs
......@@ -639,6 +792,8 @@ Version 1.2.0
Version 1.2b3
-------------
*(Released Wed, 28 Nov 2007)*
- Another round of updates to the DRBD 8 code to deal with more failures
in the replace secondary node operation
- Some more logging of failures in disk operations (lvm, drbd)
......@@ -649,6 +804,8 @@ Version 1.2b3
Version 1.2b2
-------------
*(Released Tue, 13 Nov 2007)*
- Change configuration file format from Python's Pickle to JSON.
Upgrading is possible using the cfgupgrade utility.
- Add support for DRBD 8.0 (new disk template ``drbd``) which allows for
......
# Configure script for Ganeti
m4_define([gnt_version_major], [2])
m4_define([gnt_version_minor], [1])
m4_define([gnt_version_revision], [1])
m4_define([gnt_version_suffix], [])
m4_define([gnt_version_revision], [2])
m4_define([gnt_version_suffix], [.1])
m4_define([gnt_version_full],
m4_format([%d.%d.%d%s],
gnt_version_major, gnt_version_minor,
......
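The configure hunk builds the full version string from the m4 components with a printf-style format. Assuming `m4_format` follows printf semantics (it is documented as printf-like), the Python equivalent below shows that the old values (revision 1, empty suffix) give `2.1.1` and the new ones (revision 2, suffix `.1`) give `2.1.2.1`, the release named in the NEWS hunk. The helper name is hypothetical.

```python
def version_full(major, minor, revision, suffix=""):
    # Same format string as m4_format([%d.%d.%d%s], ...) in the configure script
    return "%d.%d.%d%s" % (major, minor, revision, suffix)


print(version_full(2, 1, 1))        # before this merge: 2.1.1
print(version_full(2, 1, 2, ".1"))  # after this merge:  2.1.2.1
```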
......@@ -90,9 +90,10 @@ BDEV_CACHE_DIR = RUN_GANETI_DIR + "/bdev-cache"
DISK_LINKS_DIR = RUN_GANETI_DIR + "/instance-disks"
RUN_DIRS_MODE = 0755
SOCKET_DIR = RUN_GANETI_DIR + "/socket"
SOCKET_DIR_MODE = 0700
SECURE_DIR_MODE = 0700
SOCKET_DIR_MODE = SECURE_DIR_MODE
CRYPTO_KEYS_DIR = RUN_GANETI_DIR + "/crypto"
CRYPTO_KEYS_DIR_MODE = 0700
CRYPTO_KEYS_DIR_MODE = SECURE_DIR_MODE
IMPORT_EXPORT_DIR = RUN_GANETI_DIR + "/import-export"
IMPORT_EXPORT_DIR_MODE = 0755
# keep RUN_GANETI_DIR first here, to make sure all get created when the node
......@@ -643,6 +644,8 @@ JOB_QUEUE_ARCHIVE_DIR = QUEUE_DIR + "/archive"
JOB_QUEUE_DRAIN_FILE = QUEUE_DIR + "/drain"
JOB_QUEUE_SIZE_HARD_LIMIT = 5000
JOB_QUEUE_SIZE_SOFT_LIMIT = JOB_QUEUE_SIZE_HARD_LIMIT * 0.8
JOB_QUEUE_DIRS = [QUEUE_DIR, JOB_QUEUE_ARCHIVE_DIR]
JOB_QUEUE_DIRS_MODE = SECURE_DIR_MODE
JOB_ID_TEMPLATE = r"\d+"
......
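The constants hunk replaces repeated `0700` literals with a single `SECURE_DIR_MODE` that the socket, crypto-key and job-queue directory modes alias. The sketch below restates those values in Python 3 octal syntax (`0o700` is the same number as Python 2's `0700`) and renders them the way `ls -ld` would show a directory, to make the permission bits concrete. Only the constant names come from the diff; the printing loop is illustrative.

```python
import stat

# Python 3 spelling of the octal literals in constants.py (0700, 0755)
SECURE_DIR_MODE = 0o700
RUN_DIRS_MODE = 0o755

# These now follow the shared constant instead of repeating the literal
SOCKET_DIR_MODE = SECURE_DIR_MODE
CRYPTO_KEYS_DIR_MODE = SECURE_DIR_MODE
JOB_QUEUE_DIRS_MODE = SECURE_DIR_MODE

for name, mode in [("SECURE_DIR_MODE", SECURE_DIR_MODE),
                   ("RUN_DIRS_MODE", RUN_DIRS_MODE)]:
    # stat.filemode() formats the type bit plus permission bits, as in `ls -ld`
    print(name, oct(mode), stat.filemode(stat.S_IFDIR | mode))
# SECURE_DIR_MODE 0o700 drwx------
# RUN_DIRS_MODE 0o755 drwxr-xr-x
```

Aliasing the per-directory constants to one value means tightening or loosening the policy later is a one-line change.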
......@@ -251,8 +251,9 @@ def GenericMain(daemon_name, optionparser, dirs, check_fn, exec_fn,
@type optionparser: optparse.OptionParser
@param optionparser: initialized optionparser with daemon-specific options
(common -f -d options will be handled by this module)
@type dirs: list of strings
@param dirs: list of directories that must exist for this daemon to work
@type dirs: list of (string, integer)
@param dirs: list of directories that must be created if they don't exist,
and the permissions to be used to create them
@type check_fn: function which accepts (options, args)
@param check_fn: function that checks start conditions and exits if they're
not met
......
......@@ -67,11 +67,7 @@ class ChrootManager(hv_base.BaseHypervisor):
def __init__(self):
hv_base.BaseHypervisor.__init__(self)
if not os.path.exists(self._ROOT_DIR):
os.mkdir(self._ROOT_DIR)
if not os.path.isdir(self._ROOT_DIR):
raise HypervisorError("Needed path %s is not a directory" %
self._ROOT_DIR)
utils.EnsureDirs([(self._ROOT_DIR, constants.RUN_DIRS_MODE)])
@staticmethod
def _IsDirLive(path):
......
......@@ -46,8 +46,7 @@ class FakeHypervisor(hv_base.BaseHypervisor):
def __init__(self):
hv_base.BaseHypervisor.__init__(self)
if not os.path.exists(self._ROOT_DIR):
os.mkdir(self._ROOT_DIR)
utils.EnsureDirs([(self._ROOT_DIR, constants.RUN_DIRS_MODE)])
def ListInstances(self):
"""Get the list of running instances.
......
......@@ -287,7 +287,8 @@ class KVMHypervisor(hv_base.BaseHypervisor):
if nic.nicparams[constants.NIC_MODE] == constants.NIC_MODE_BRIDGED:
script.write("export BRIDGE=%s\n" % nic.nicparams[constants.NIC_LINK])
script.write("export INTERFACE=$1\n")
script.write("export TAGS=\"%s\"\n" % " ".join(instance.tags))
if instance.tags:
script.write("export TAGS=\"%s\"\n" % " ".join(instance.tags))
# TODO: make this configurable at ./configure time
script.write("if [ -x '%s' ]; then\n" % self._KVM_NETWORK_SCRIPT)
script.write(" # Execute the user-specific vif file\n")
......
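The KVM hunk makes the `export TAGS=...` line in the generated network script conditional on the instance having tags; per the 2.1.2.1 NEWS entry, this change accompanies the fix for untagged KVM instances failing to start. A minimal sketch of the same guard, using a hypothetical helper name and `io.StringIO` in place of the script file; note that the truthiness test also skips a `None` tag list.

```python
import io


def write_tags_export(script, tags):
    """Write the TAGS export line only when the instance has tags.

    Hypothetical helper mirroring the new KVM hunk; `tags` may be any
    iterable of strings, or None.
    """
    if tags:
        script.write("export TAGS=\"%s\"\n" % " ".join(tags))


buf = io.StringIO()
write_tags_export(buf, [])    # untagged: nothing is written
write_tags_export(buf, None)  # missing tag list: nothing is written
untagged_output = buf.getvalue()

buf = io.StringIO()
write_tags_export(buf, ["web", "prod"])
print(repr(buf.getvalue()))  # 'export TAGS="web prod"\n'
```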
......@@ -21,7 +21,6 @@
"""Module implementing the job queue handling."""
import os
import errno
from ganeti import constants
......@@ -79,13 +78,8 @@ def InitAndVerifyQueue(must_lock):
locking mode.
"""
# Make sure our directories exist
for path in (constants.QUEUE_DIR, constants.JOB_QUEUE_ARCHIVE_DIR):
try:
os.mkdir(path, 0700)
except OSError, err:
if err.errno not in (errno.EEXIST, ):
raise
dirs = [(d, constants.JOB_QUEUE_DIRS_MODE) for d in constants.JOB_QUEUE_DIRS]
utils.EnsureDirs(dirs)
# Lock queue
queue_lock = utils.FileLock.Open(constants.JOB_QUEUE_LOCK_FILE)
......
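Several hunks (hypervisors, `ssh.GetUserFiles`, and `InitAndVerifyQueue` here) replace hand-written `os.mkdir` calls with `utils.EnsureDirs`, which takes a list of `(path, mode)` pairs, as the `GenericMain` docstring now documents. The function below is a hypothetical stand-in written in Python 3, not Ganeti's implementation: it tolerates an already existing directory (without changing its mode) and rejects a non-directory at the path, much as the removed chroot code did. The list order matters because `os.mkdir` does not create parents, which is why `JOB_QUEUE_DIRS` lists `QUEUE_DIR` before its archive subdirectory.

```python
import errno
import os
import tempfile


def ensure_dirs(dirs):
    """Create every missing (path, mode) directory, in list order.

    Hypothetical stand-in for utils.EnsureDirs: an existing directory is
    accepted as-is, an existing non-directory is an error.
    """
    for path, mode in dirs:
        try:
            os.mkdir(path, mode)  # mode is still filtered by the process umask
        except OSError as err:
            if err.errno != errno.EEXIST:
                raise
        if not os.path.isdir(path):
            raise NotADirectoryError(errno.ENOTDIR,
                                     "Needed path is not a directory", path)


base = tempfile.mkdtemp()
queue_dir = os.path.join(base, "queue")
archive_dir = os.path.join(queue_dir, "archive")  # parent listed first
job_queue_dirs = [(queue_dir, 0o700), (archive_dir, 0o700)]
ensure_dirs(job_queue_dirs)
ensure_dirs(job_queue_dirs)  # calling it again is harmless
```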
......@@ -98,7 +98,7 @@ class GanetiRapiClient(object):
USER_AGENT = "Ganeti RAPI Client"
def __init__(self, master_hostname, port=5080, username=None, password=None,
ssl_cert=None):
ssl_cert_file=None):
"""Constructor.
@type master_hostname: str
......@@ -109,23 +109,32 @@ class GanetiRapiClient(object):
@param username: the username to connect with
@type password: str
@param password: the password to connect with
@type ssl_cert: str or None
@param ssl_cert: the expected SSL certificate. if None, SSL certificate
will not be verified
@type ssl_cert_file: str or None
@param ssl_cert_file: path to the expected SSL certificate. If None, SSL
certificate will not be verified
"""
self._master_hostname = master_hostname
self._port = port
if ssl_cert:
_VerifyCertificate(self._master_hostname, self._port, ssl_cert)
self._version = None
self._http = httplib2.Http()
# Older versions of httplib2 don't support the connection_type argument
# to request(), so we have to manually specify the connection object in the
# internal dict.
base_url = self._MakeUrl("/", prepend_version=False)
scheme, authority, _, _, _ = httplib2.parse_uri(base_url)
conn_key = "%s:%s" % (scheme, authority)
self._http.connections[conn_key] = \
HTTPSConnectionOpenSSL(master_hostname, port, cert_file=ssl_cert_file)
self._headers = {
"Accept": "text/plain",
"Content-type": "application/x-www-form-urlencoded",
"User-Agent": self.USER_AGENT}
self._version = None
if username and password:
if username is not None and password is not None:
self._http.add_credentials(username, password)
def _MakeUrl(self, path, query=None, prepend_version=True):
......@@ -144,9 +153,7 @@ class GanetiRapiClient(object):
"""
if prepend_version:
if not self._version:
self._GetVersionInternal()
path = "/%d%s" % (self._version, path)
path = "/%d%s" % (self.GetVersion(), path)
return "https://%(host)s:%(port)d%(path)s?%(query)s" % {
"host": self._master_hostname,
......@@ -176,15 +183,19 @@ class GanetiRapiClient(object):
@rtype: str
@return: JSON-Decoded response
@raises CertificateError: If an invalid SSL certificate is found
@raises GanetiApiError: If an invalid response is returned
"""
if content:
simplejson.JSONEncoder(sort_keys=True).encode(content)
content = simplejson.JSONEncoder(sort_keys=True).encode(content)
url = self._MakeUrl(path, query, prepend_version)
resp_headers, resp_content = self._http.request(
url, method, body=content, headers=self._headers)
try:
resp_headers, resp_content = self._http.request(url, method,
body=content, headers=self._headers)
except (crypto.Error, SSL.Error):
raise CertificateError("Invalid SSL certificate.")
if resp_content:
resp_content = simplejson.loads(resp_content)
......@@ -201,26 +212,16 @@ class GanetiRapiClient(object):
return resp_content
def _GetVersionInternal(self):
"""Gets the Remote API version running on the cluster.
@rtype: int
@return: Ganeti version
"""
self._version = self._SendRequest(HTTP_GET, "/version",
prepend_version=False)
return self._version
def GetVersion(self):
"""Gets the Remote API version running on the cluster.
@rtype: int
@return: Ganeti version
@return: Ganeti Remote API version
"""
if not self._version:
self._GetVersionInternal()
if self._version is None:
self._version = self._SendRequest(HTTP_GET, "/version",
prepend_version=False)
return self._version
def GetOperatingSystems(self):
......@@ -841,26 +842,3 @@ class HTTPSConnectionOpenSSL(httplib.HTTPSConnection):
ssl = SSL.Connection(ctx, sock)
ssl.connect((self.host, self.port))
self.sock = httplib.FakeSocket(sock, ssl)
def _VerifyCertificate(hostname, port, cert_file):
"""Verifies the SSL certificate for the given host/port.
@type hostname: str
@param hostname: the ganeti cluster master whose certificate to verify
@type port: int
@param port: the port on which the RAPI is running
@type cert_file: str
@param cert_file: filename of the expected SSL certificate
@raises CertificateError: If an invalid SSL certificate is found
"""
https = HTTPSConnectionOpenSSL(hostname, port, cert_file=cert_file)
try:
try:
https.request(HTTP_GET, "/version")
except (crypto.Error, SSL.Error):
raise CertificateError("Invalid SSL certificate.")
finally:
https.close()
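Two behavioural fixes in the RAPI client hunks are easy to miss in the diff: `GetVersion` now caches with `is None` (a cached falsy value is no longer refetched) and `_MakeUrl` goes through it, and `_SendRequest` now assigns the JSON-encoded body back to `content` instead of discarding it. The network-free sketch below isolates both patterns. The class, its injected `fetch` callable and `encode_body` are hypothetical, and the standard `json` module stands in for `simplejson`.

```python
import json


class VersionCachingClient:
    """Hypothetical sketch of the version caching and URL building patterns."""

    def __init__(self, host, port, fetch):
        self._host = host
        self._port = port
        self._fetch = fetch  # callable(path) -> decoded JSON response
        self._version = None

    def get_version(self):
        # `is None` instead of `not`: a cached falsy version is not refetched
        if self._version is None:
            self._version = self._fetch("/version")
        return self._version

    def make_url(self, path, prepend_version=True):
        if prepend_version:
            path = "/%d%s" % (self.get_version(), path)
        return "https://%s:%d%s" % (self._host, self._port, path)


def encode_body(content):
    # The fixed code keeps the encoded result; the old code threw it away
    if content:
        content = json.dumps(content, sort_keys=True)
    return content


client = VersionCachingClient("master.example.com", 5080, lambda path: 2)
print(client.make_url("/instances"))  # https://master.example.com:5080/2/instances
print(encode_body({"b": 1, "a": 2}))  # {"a": 2, "b": 1}
```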
......@@ -53,13 +53,8 @@ def GetUserFiles(user, mkdir=False):
raise errors.OpExecError("Cannot resolve home of user %s" % user)
ssh_dir = utils.PathJoin(user_dir, ".ssh")
if not os.path.lexists(ssh_dir):
if mkdir:
try:
os.mkdir(ssh_dir, 0700)
except EnvironmentError, err:
raise errors.OpExecError("Can't create .ssh dir for user %s: %s" %
(user, str(err)))
if mkdir:
utils.EnsureDirs([(ssh_dir, constants.SECURE_DIR_MODE)])
elif not os.path.isdir(ssh_dir):
raise errors.OpExecError("path ~%s/.ssh is not a directory" % user)
......@@ -161,7 +156,7 @@ class SshRunner:
strict_host_check, private_key,
quiet=quiet))
if tty:
argv.append("-t")
argv.extend(["-t", "-t"])
argv.extend(["%s@%s" % (user, hostname), command])
return argv
......
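The "Force ssh to allocate a tty" change passes `-t` twice. Per the OpenSSH `ssh(1)` manual, a single `-t` requests a pseudo-terminal but is not honoured when ssh itself has no local tty (for example when run from a daemon), while multiple `-t` options force allocation anyway. A minimal sketch of the argument assembly; the helper name and its `options` parameter are hypothetical, not `SshRunner`'s API.

```python
def build_ssh_argv(user, hostname, command, tty=False, options=()):
    """Assemble an ssh command line (hypothetical helper)."""
    argv = ["ssh"] + list(options)
    if tty:
        # One -t is not enough without a local tty; two force allocation
        argv.extend(["-t", "-t"])
    argv.extend(["%s@%s" % (user, hostname), command])
    return argv


print(build_ssh_argv("root", "node1.example.com", "uptime", tty=True))
# ['ssh', '-t', '-t', 'root@node1.example.com', 'uptime']
```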
......@@ -833,16 +833,28 @@ def IsProcessAlive(pid):
@return: True if the process exists
"""
def _TryStat(name):
try:
os.stat(name)
return True
except EnvironmentError, err:
if err.errno in (errno.ENOENT, errno.ENOTDIR):
return False
elif err.errno == errno.EINVAL:
raise RetryAgain(err)
raise
assert isinstance(pid, int), "pid must be an integer"
if pid <= 0:
return False
proc_entry = "/proc/%d/status" % pid
# /proc in a multiprocessor environment can have strange behaviors.
# Retry the os.stat a few times until we get a good result.
try:
os.stat("/proc/%d/status" % pid)
return True
except EnvironmentError, err:
if err.errno in (errno.ENOENT, errno.ENOTDIR):
return False
raise
return Retry(_TryStat, (0.01, 1.5, 0.1), 0.5, args=[proc_entry])
except RetryTimeout, err:
err.RaiseInner()
def ReadPidFile(pidfile):
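The new `IsProcessAlive` wraps `os.stat` in `utils.Retry` with delay `(0.01, 1.5, 0.1)` and timeout `0.5`: `EINVAL` triggers `RetryAgain`, and on timeout `RaiseInner` re-raises the wrapped error. The self-contained Python 3 sketch below reproduces that control flow. The reading of the delay tuple as `(start, factor, limit)` is an assumption; the diff does not spell it out. The lowercase names are hypothetical stand-ins for the Ganeti ones.

```python
import errno
import os
import time


class RetryAgain(Exception):
    """Raised by the retried function to ask for another attempt."""


class RetryTimeout(Exception):
    """Raised when the retry loop runs out of time."""

    def raise_inner(self):
        # Re-raise the exception handed to RetryAgain, if there was one
        if self.args and isinstance(self.args[0], Exception):
            raise self.args[0]
        raise RetryTimeout(*self.args)


def retry(fn, delay, timeout, args=()):
    """Call fn(*args) until it stops raising RetryAgain or time runs out.

    Assumed semantics: delay is (start, factor, limit); the sleep starts
    at `start` seconds and is multiplied by `factor` after each attempt,
    capped at `limit`.
    """
    start, factor, limit = delay
    deadline = time.monotonic() + timeout
    wait = start
    while True:
        try:
            return fn(*args)
        except RetryAgain as err:
            last_args = err.args
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise RetryTimeout(*last_args)
        time.sleep(min(wait, remaining))
        wait = min(wait * factor, limit)


def _try_stat(name):
    try:
        os.stat(name)
        return True
    except OSError as err:
        if err.errno in (errno.ENOENT, errno.ENOTDIR):
            return False
        if err.errno == errno.EINVAL:
            raise RetryAgain(err)  # transient /proc oddity: try again
        raise


def is_process_alive(pid):
    if pid <= 0:
        return False
    try:
        return retry(_try_stat, (0.01, 1.5, 0.1), 0.5,
                     args=["/proc/%d/status" % pid])
    except RetryTimeout as err:
        err.raise_inner()
```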
......@@ -2975,12 +2987,25 @@ def ReadWatcherPauseFile(filename, now=None, remove_after=3600):
class RetryTimeout(Exception):
"""Retry loop timed out.
Any arguments which were passed by the retried function to RetryAgain will be
preserved in RetryTimeout, if it is raised. If such an argument was an
exception, the RaiseInner helper method will reraise it.
"""
def RaiseInner(self):
if self.args and isinstance(self.args[0], Exception):
raise self.args[0]
else:
raise RetryTimeout(*self.args)
class RetryAgain(Exception):
"""Retry again.
Any arguments passed to RetryAgain will be preserved, if a timeout occurs, as