diff --git a/NEWS b/NEWS index eed6c0ba4af828333ef3e4144bebd8ad4c24f17a..e530cf5b1f4a205feb85b172af76c52946e44fe1 100644 --- a/NEWS +++ b/NEWS @@ -1,6 +1,14 @@ News ==== +Version 2.2.0 +------------- + +*(Released Mon, 4 Oct 2010)* + +- Fixed regression in ``gnt-instance rename`` + + Version 2.2.0 rc2 ----------------- diff --git a/configure.ac b/configure.ac index eeece85488dbd5634a1af4428065c2c3761a03bd..5e7a99a1f60eabd327ab202f1c1ffc24abb13213 100644 --- a/configure.ac +++ b/configure.ac @@ -2,7 +2,7 @@ m4_define([gnt_version_major], [2]) m4_define([gnt_version_minor], [2]) m4_define([gnt_version_revision], [0]) -m4_define([gnt_version_suffix], [~rc2]) +m4_define([gnt_version_suffix], []) m4_define([gnt_version_full], m4_format([%d.%d.%d%s], gnt_version_major, gnt_version_minor, diff --git a/doc/design-2.2.rst b/doc/design-2.2.rst index 57a67803bda6e22435a332a1d14471d0400a376e..162848ab084c516deda1e5667e2f8736054ceb88 100644 --- a/doc/design-2.2.rst +++ b/doc/design-2.2.rst @@ -11,24 +11,22 @@ adding new features and improvements over 2.1, in a timely fashion. .. contents:: :depth: 4 -Detailed design -=============== - As for 2.1 we divide the 2.2 design into three areas: - core changes, which affect the master daemon/job queue/locking or all/most logical units - logical unit/feature changes -- external interface changes (eg. command line, os api, hooks, ...) +- external interface changes (e.g. command line, OS API, hooks, ...) 
+ Core changes ------------- +============ Master Daemon Scaling improvements -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +---------------------------------- Current state and shortcomings -++++++++++++++++++++++++++++++ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Currently the Ganeti master daemon is based on four sets of threads: @@ -50,7 +48,7 @@ Also, with the current architecture, masterd suffers from quite a few scalability issues: Core daemon connection handling -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ++++++++++++++++++++++++++++++++ Since the 16 client worker threads handle one connection each, it's very easy to exhaust them, by just connecting to masterd 16 times and not @@ -60,7 +58,7 @@ with better handling long running operations making sure the client is informed that everything is proceeding, and doesn't need to time out. Wait for job change -^^^^^^^^^^^^^^^^^^^ ++++++++++++++++++++ The REQ_WAIT_FOR_JOB_CHANGE luxi operation makes the relevant client thread block on its job for a relative long time. This is another easy @@ -69,7 +67,7 @@ time out, moreover this operation is negative for the job queue lock contention (see below). Job Queue lock -^^^^^^^^^^^^^^ +++++++++++++++ The job queue lock is quite heavily contended, and certain easily reproducible workloads show that's it's very easy to put masterd in @@ -120,7 +118,7 @@ To increase the pain: remote rpcs to complete (starting, finishing, and submitting jobs) Proposed changes -++++++++++++++++ +~~~~~~~~~~~~~~~~ In order to be able to interact with the master daemon even when it's under heavy load, and to make it simpler to add core functionality @@ -135,7 +133,7 @@ smaller in number of threads, and memory size, and thus also easier to understand, debug, and scale. Connection handling -^^^^^^^^^^^^^^^^^^^ ++++++++++++++++++++ We'll move the main thread of ganeti-masterd to asyncore, so that it can share the mainloop code with all other Ganeti daemons. 
Then all luxi @@ -148,7 +146,7 @@ serializing the reply, which can then be sent asynchronously by the main thread on the socket. Wait for job change -^^^^^^^^^^^^^^^^^^^ ++++++++++++++++++++ The REQ_WAIT_FOR_JOB_CHANGE luxi request is changed to be subscription-based, so that the executing thread doesn't have to be @@ -173,7 +171,7 @@ Other features to look at, when implementing this code are: them at a maximum rate (lower priority). Job Queue lock -^^^^^^^^^^^^^^ +++++++++++++++ In order to decrease the job queue lock contention, we will change the code paths in the following ways, initially: @@ -201,154 +199,11 @@ again after we used the more granular job queue in production and tested its benefits. -Remote procedure call timeouts -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Current state and shortcomings -++++++++++++++++++++++++++++++ - -The current RPC protocol used by Ganeti is based on HTTP. Every request -consists of an HTTP PUT request (e.g. ``PUT /hooks_runner HTTP/1.0``) -and doesn't return until the function called has returned. Parameters -and return values are encoded using JSON. - -On the server side, ``ganeti-noded`` handles every incoming connection -in a separate process by forking just after accepting the connection. -This process exits after sending the response. - -There is one major problem with this design: Timeouts can not be used on -a per-request basis. Neither client or server know how long it will -take. Even if we might be able to group requests into different -categories (e.g. fast and slow), this is not reliable. - -If a node has an issue or the network connection fails while a request -is being handled, the master daemon can wait for a long time for the -connection to time out (e.g. due to the operating system's underlying -TCP keep-alive packets or timeouts). 
While the settings for keep-alive -packets can be changed using Linux-specific socket options, we prefer to -use application-level timeouts because these cover both machine down and -unresponsive node daemon cases. - -Proposed changes -++++++++++++++++ - -RPC glossary -^^^^^^^^^^^^ - -Function call ID - Unique identifier returned by ``ganeti-noded`` after invoking a - function. -Function process - Process started by ``ganeti-noded`` to call actual (backend) function. - -Protocol -^^^^^^^^ - -Initially we chose HTTP as our RPC protocol because there were existing -libraries, which, unfortunately, turned out to miss important features -(such as SSL certificate authentication) and we had to write our own. - -This proposal can easily be implemented using HTTP, though it would -likely be more efficient and less complicated to use the LUXI protocol -already used to communicate between client tools and the Ganeti master -daemon. Switching to another protocol can occur at a later point. This -proposal should be implemented using HTTP as its underlying protocol. - -The LUXI protocol currently contains two functions, ``WaitForJobChange`` -and ``AutoArchiveJobs``, which can take a longer time. They both support -a parameter to specify the timeout. This timeout is usually chosen as -roughly half of the socket timeout, guaranteeing a response before the -socket times out. After the specified amount of time, -``AutoArchiveJobs`` returns and reports the number of archived jobs. -``WaitForJobChange`` returns and reports a timeout. In both cases, the -functions can be called again. - -A similar model can be used for the inter-node RPC protocol. In some -sense, the node daemon will implement a light variant of *"node daemon -jobs"*. When the function call is sent, it specifies an initial timeout. -If the function didn't finish within this timeout, a response is sent -with a unique identifier, the function call ID. 
The client can then -choose to wait for the function to finish again with a timeout. -Inter-node RPC calls would no longer be blocking indefinitely and there -would be an implicit ping-mechanism. - -Request handling -^^^^^^^^^^^^^^^^ - -To support the protocol changes described above, the way the node daemon -handles request will have to change. Instead of forking and handling -every connection in a separate process, there should be one child -process per function call and the master process will handle the -communication with clients and the function processes using asynchronous -I/O. - -Function processes communicate with the parent process via stdio and -possibly their exit status. Every function process has a unique -identifier, though it shouldn't be the process ID only (PIDs can be -recycled and are prone to race conditions for this use case). The -proposed format is ``${ppid}:${cpid}:${time}:${random}``, where ``ppid`` -is the ``ganeti-noded`` PID, ``cpid`` the child's PID, ``time`` the -current Unix timestamp with decimal places and ``random`` at least 16 -random bits. - -The following operations will be supported: - -``StartFunction(fn_name, fn_args, timeout)`` - Starts a function specified by ``fn_name`` with arguments in - ``fn_args`` and waits up to ``timeout`` seconds for the function - to finish. Fire-and-forget calls can be made by specifying a timeout - of 0 seconds (e.g. for powercycling the node). Returns three values: - function call ID (if not finished), whether function finished (or - timeout) and the function's return value. -``WaitForFunction(fnc_id, timeout)`` - Waits up to ``timeout`` seconds for function call to finish. Return - value same as ``StartFunction``. - -In the future, ``StartFunction`` could support an additional parameter -to specify after how long the function process should be aborted. 
- -Simplified timing diagram:: - - Master daemon Node daemon Function process - | - Call function - (timeout 10s) -----> Parse request and fork for ----> Start function - calling actual function, then | - wait up to 10s for function to | - finish | - | | - ... ... - | | - Examine return <---- | | - value and wait | - again -------------> Wait another 10s for function | - | | - ... ... - | | - Examine return <---- | | - value and wait | - again -------------> Wait another 10s for function | - | | - ... ... - | | - | Function ends, - Get return value and forward <-- process exits - Process return <---- it to caller - value and continue - | - -.. TODO: Convert diagram above to graphviz/dot graphic - -On process termination (e.g. after having been sent a ``SIGTERM`` or -``SIGINT`` signal), ``ganeti-noded`` should send ``SIGTERM`` to all -function processes and wait for all of them to terminate. - - Inter-cluster instance moves -~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +---------------------------- Current state and shortcomings -++++++++++++++++++++++++++++++ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ With the current design of Ganeti, moving whole instances between different clusters involves a lot of manual work. There are several ways @@ -359,10 +214,10 @@ necessary in the new environment. The goal is to improve and automate this process in Ganeti 2.2. Proposed changes -++++++++++++++++ +~~~~~~~~~~~~~~~~ Authorization, Authentication and Security -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +++++++++++++++++++++++++++++++++++++++++++ Until now, each Ganeti cluster was a self-contained entity and wouldn't talk to other Ganeti clusters. Nodes within clusters only had to trust @@ -424,7 +279,7 @@ equivalent to the source cluster and must verify the server's certificate while providing a client certificate to the server. 
Copying data -^^^^^^^^^^^^ +++++++++++++ To simplify the implementation, we decided to operate at a block-device level only, allowing us to easily support non-DRBD instance moves. @@ -442,7 +297,7 @@ consumption, everything is read from the disk and sent over the network directly, where it'll be written to the new block device directly again. Workflow -^^^^^^^^ +++++++++ #. Third party tells source cluster to shut down instance, asks for the instance specification and for the public part of an encryption key @@ -510,7 +365,7 @@ Workflow #. Source cluster removes the instance if requested Instance move in pseudo code -^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +++++++++++++++++++++++++++++ .. highlight:: python @@ -651,7 +506,7 @@ clusters and what happens on both clusters. .. highlight:: text Miscellaneous notes -^^^^^^^^^^^^^^^^^^^ ++++++++++++++++++++ - A very similar system could also be used for instance exports within the same cluster. Currently OpenSSH is being used, but could be @@ -679,10 +534,10 @@ Miscellaneous notes Privilege separation -~~~~~~~~~~~~~~~~~~~~ +-------------------- Current state and shortcomings -++++++++++++++++++++++++++++++ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ All Ganeti daemons are run under the user root. This is not ideal from a security perspective as for possible exploitation of any daemon the user @@ -694,7 +549,7 @@ side effects, like letting the user run some ``gnt-*`` commands if one is in the same group. Implementation -++++++++++++++ +~~~~~~~~~~~~~~ For Ganeti 2.2 the implementation will be focused on a the RAPI daemon only. This involves changes to ``daemons.py`` so it's possible to drop @@ -710,13 +565,13 @@ and then drop privileges before contacting the master daemon. Feature changes ---------------- +=============== KVM Security -~~~~~~~~~~~~ +------------ Current state and shortcomings -++++++++++++++++++++++++++++++ +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Currently all kvm processes run as root. 
Taking ownership of the hypervisor process, from inside a virtual machine, would mean a full @@ -725,7 +580,7 @@ authentication secrets, full access to all running instances, and the option of subverting other basic services on the cluster (eg: ssh). Proposed changes -++++++++++++++++ +~~~~~~~~~~~~~~~~ We would like to decrease the surface of attack available if an hypervisor is compromised. We can do so adding different features to @@ -734,7 +589,7 @@ possibilities, in the absence of a local privilege escalation attack, to subvert the node. Dropping privileges in kvm to a single user (easy) -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +++++++++++++++++++++++++++++++++++++++++++++++++++ By passing the ``-runas`` option to kvm, we can make it drop privileges. The user can be chosen by an hypervisor parameter, so that each instance @@ -761,7 +616,7 @@ But the following would remain an option: - read unprotected data on the node filesystem Running kvm in a chroot (slightly harder) -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ++++++++++++++++++++++++++++++++++++++++++ By passing the ``-chroot`` option to kvm, we can restrict the kvm process in its own (possibly empty) root directory. We need to set this @@ -784,7 +639,7 @@ It would still be possible though to: Running kvm with a pool of users (slightly harder) -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +++++++++++++++++++++++++++++++++++++++++++++++++++ If rather than passing a single user as an hypervisor parameter, we have a pool of useable ones, we can dynamically choose a free one to use and @@ -795,7 +650,7 @@ This would mean interfering between machines would be impossible, and can still be combined with the chroot benefits. Running iptables rules to limit network interaction (easy) -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ These don't need to be handled by Ganeti, but we can ship examples. 
If the users used to run VMs would be blocked from sending some or all @@ -808,7 +663,7 @@ we can properly apply, without limiting the instance legitimate traffic. Running kvm inside a container (even harder) -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +++++++++++++++++++++++++++++++++++++++++++++ Recent linux kernels support different process namespaces through control groups. PIDs, users, filesystems and even network interfaces can @@ -820,7 +675,7 @@ interface, thus reducing performance, so we may want to avoid that, and just rely on iptables. Implementation plan -+++++++++++++++++++ +~~~~~~~~~~~~~~~~~~~ We will first implement dropping privileges for kvm processes as a single user, and most probably backport it to 2.1. Then we'll ship @@ -831,13 +686,58 @@ kvm processes, and extend the user limitation to use a user pool. Finally we'll look into namespaces and containers, although that might slip after the 2.2 release. +New OS states +------------- + +Separate from the OS external changes, described below, we'll add some +internal changes to the OS. + +Current state and shortcomings +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +There are two issues related to the handling of the OSes. + +First, it's impossible to disable an OS for new instances, since that +will also break reinstallations and renames of existing instances. To +phase out an OS definition, without actually having to modify the OS +scripts, it would be ideal to be able to restrict new installations but +keep the rest of the functionality available. + +Second, ``gnt-instance reinstall --select-os`` shows all the OSes +available on the clusters. Some OSes might exist only for debugging and +diagnosis, and not for end-user availability. For this, it would be +useful to "hide" a set of OSes, but keep them otherwise functional.
+ +Proposed changes +~~~~~~~~~~~~~~~~ + +Two new cluster-level attributes will be added, holding the list of OSes +hidden from the user and respectively the list of OSes which are +blacklisted from new installations. + +These lists will be modifiable via ``gnt-os modify`` (implemented via +``OpSetClusterParams``), such that even not-yet-existing OSes can be +preseeded into a given state. + +For the hidden OSes, they are fully functional except that they are not +returned in the default OS list (as computed via ``OpDiagnoseOS``), +unless the hidden state is requested. + +For the blacklisted OSes, they are also not shown (unless the +blacklisted state is requested), and they are also prevented from +installation via ``OpCreateInstance`` (in create mode). + +Both these attributes are per-OS, not per-variant. Thus they apply to +all of an OS' variants, and it's impossible to blacklist or hide just +one variant. Further improvements might allow a given OS variant to be +blacklisted, as opposed to whole OSes. External interface changes --------------------------- +========================== OS API -~~~~~~ +------ The OS variants implementation in Ganeti 2.1 didn't prove to be useful enough to alleviate the need to hack around the Ganeti API in order to @@ -856,7 +756,7 @@ These changes to the OS API will bump the API version to 20. OS version -++++++++++ +~~~~~~~~~~ A new ``os_version`` file will be supported by Ganeti. This file is not required, but if existing, its contents will be checked for consistency @@ -870,14 +770,14 @@ import/export scripts must increase the version, since they break intra-cluster migration. Parameters -++++++++++ +~~~~~~~~~~ The interface between Ganeti and the OS scripts will be based on environment variables, and as such the parameters and their values will need to be valid in this context. 
Names -^^^^^ ++++++ The parameter names will be declared in a new file, ``parameters.list``, together with a one-line documentation (whitespace-separated). Example:: @@ -896,7 +796,7 @@ line interface in lowercased form; as such, there shouldn't be any two parameters which differ in case only. Values -^^^^^^ +++++++ The values of the parameters are, from Ganeti's point of view, completely freeform. If a given parameter has, from the OS' point of @@ -917,7 +817,7 @@ the value space). Environment variables -+++++++++++++++++++++ +^^^^^^^^^^^^^^^^^^^^^ The parameters will be exposed in the environment upper-case and prefixed with the string ``OSP_``. For example, a parameter declared in diff --git a/doc/rapi.rst b/doc/rapi.rst index a8732748035e586aae7150861a9e4aa749f3c66c..d3c0dfa9ff1d22eee1374104691bf73b0788bd7e 100644 --- a/doc/rapi.rst +++ b/doc/rapi.rst @@ -1006,10 +1006,15 @@ It supports the following commands: ``POST``. ``POST`` ~~~~~~~~ -No parameters are required, but the bool parameter ``live`` can be set -to use live migration (if available). +If no mode is explicitly specified, each instance's hypervisor default +migration mode will be used. Query parameters: + +``live`` (bool) + If set, use live migration if available. +``mode`` (string) + Sets the migration mode: ``live`` for live migration and ``non-live`` for + non-live migration. Supported by Ganeti 2.2 and above.
- migrate?live=[0|1] ``/2/nodes/[node_name]/role`` +++++++++++++++++++++++++++++ diff --git a/lib/backend.py b/lib/backend.py index ac5f93b4a0699af111843ac826f34daea0f50538..4ef08ead73209e081d90688d696270a887c09ce2 100644 --- a/lib/backend.py +++ b/lib/backend.py @@ -1902,7 +1902,7 @@ def OSFromDisk(name, base_dir=None): @raise RPCFail: if we don't find a valid OS """ - name_only = name.split("+", 1)[0] + name_only = objects.OS.GetName(name) status, payload = _TryOSFromDisk(name_only, base_dir) if not status: @@ -1937,9 +1937,8 @@ def OSCoreEnv(os_name, inst_os, os_params, debug=0): # OS variants if api_version >= constants.OS_API_V15: - try: - variant = os_name.split('+', 1)[1] - except IndexError: + variant = objects.OS.GetVariant(os_name) + if not variant: variant = inst_os.supported_variants[0] result['OS_VARIANT'] = variant @@ -2489,7 +2488,7 @@ def ValidateOS(required, osname, checks, osparams): _Fail("Unknown checks required for OS %s: %s", osname, set(checks).difference(constants.OS_VALIDATE_CALLS)) - name_only = osname.split("+", 1)[0] + name_only = objects.OS.GetName(osname) status, tbv = _TryOSFromDisk(name_only, None) if not status: diff --git a/lib/cli.py b/lib/cli.py index e339c4f98af0ac0109e37b81cd7099cf422461e1..5f8ad06c7272ed0aa9d5619fc55d6bbbb38f143d 100644 --- a/lib/cli.py +++ b/lib/cli.py @@ -52,6 +52,7 @@ __all__ = [ "AUTO_PROMOTE_OPT", "AUTO_REPLACE_OPT", "BACKEND_OPT", + "BLK_OS_OPT", "CLEANUP_OPT", "CLUSTER_DOMAIN_SECRET_OPT", "CONFIRM_OPT", @@ -73,6 +74,7 @@ __all__ = [ "FORCE_OPT", "FORCE_VARIANT_OPT", "GLOBAL_FILEDIR_OPT", + "HID_OS_OPT", "HVLIST_OPT", "HVOPTS_OPT", "HYPERVISOR_OPT", @@ -1066,6 +1068,15 @@ PRIORITY_OPT = cli_option("--priority", default=None, dest="priority", choices=_PRIONAME_TO_VALUE.keys(), help="Priority for opcode processing") +HID_OS_OPT = cli_option("--hidden", dest="hidden", + type="bool", default=None, metavar=_YORNO, + help="Sets the hidden flag on the OS") + +BLK_OS_OPT = cli_option("--blacklisted", 
dest="blacklisted", + type="bool", default=None, metavar=_YORNO, + help="Sets the blacklisted flag on the OS") + + #: Options provided by all commands COMMON_OPTS = [DEBUG_OPT] diff --git a/lib/cmdlib.py b/lib/cmdlib.py index 1ed93f205c188d5d253052e191858ff22e8fa92f..399aca4bb3b831fd4665776d7db3aadc5d4b47ee 100644 --- a/lib/cmdlib.py +++ b/lib/cmdlib.py @@ -154,6 +154,13 @@ def _TDict(val): return isinstance(val, dict) +def _TIsLength(size): + """Check if the given container is of the given size. + + """ + return lambda container: len(container) == size + + # Combinator types def _TAnd(*args): """Combine multiple functions using an AND operation. @@ -173,6 +180,13 @@ def _TOr(*args): return fn +def _TMap(fn, test): + """Checks that a modified version of the argument passes the given test. + + """ + return lambda val: test(fn(val)) + + # Type aliases #: a non-empty string @@ -1088,9 +1102,8 @@ def _CheckOSVariant(os_obj, name): """ if not os_obj.supported_variants: return - try: - variant = name.split("+", 1)[1] - except IndexError: + variant = objects.OS.GetVariant(name) + if not variant: raise errors.OpPrereqError("OS name must include a variant", errors.ECODE_INVAL) @@ -2607,6 +2620,16 @@ class LUSetClusterParams(LogicalUnit): ("drbd_helper", None, _TOr(_TString, _TNone)), ("default_iallocator", None, _TMaybeString), ("reserved_lvs", None, _TOr(_TListOf(_TNonEmptyString), _TNone)), + ("hidden_oss", None, _TOr(_TListOf(\ + _TAnd(_TList, + _TIsLength(2), + _TMap(lambda v: v[0], _TElemOf(constants.DDMS_VALUES)))), + _TNone)), + ("blacklisted_oss", None, _TOr(_TListOf(\ + _TAnd(_TList, + _TIsLength(2), + _TMap(lambda v: v[0], _TElemOf(constants.DDMS_VALUES)))), + _TNone)), ] REQ_BGL = False @@ -2880,6 +2903,30 @@ class LUSetClusterParams(LogicalUnit): if self.op.reserved_lvs is not None: self.cluster.reserved_lvs = self.op.reserved_lvs + def helper_oss(aname, mods, desc): + lst = getattr(self.cluster, aname) + for key, val in mods: + if key == constants.DDM_ADD: + if
val in lst: + feedback_fn("OS %s already in %s, ignoring" % (val, desc)) + else: + lst.append(val) + elif key == constants.DDM_REMOVE: + if val in lst: + lst.remove(val) + else: + feedback_fn("OS %s not found in %s, ignoring" % (val, desc)) + else: + raise errors.ProgrammerError("Invalid modification '%s'" % key) + + if self.op.hidden_oss: + helper_oss("hidden_oss", self.op.hidden_oss, + "hidden OS list") + + if self.op.blacklisted_oss: + helper_oss("blacklisted_oss", self.op.blacklisted_oss, + "blacklisted OS list") + self.cfg.Update(self.cluster, feedback_fn) @@ -3068,9 +3115,12 @@ class LUDiagnoseOS(NoHooksLU): ("names", _EmptyList, _TListOf(_TNonEmptyString)), ] REQ_BGL = False + _HID = "hidden" + _BLK = "blacklisted" + _VLD = "valid" _FIELDS_STATIC = utils.FieldSet() - _FIELDS_DYNAMIC = utils.FieldSet("name", "valid", "node_status", "variants", - "parameters", "api_versions") + _FIELDS_DYNAMIC = utils.FieldSet("name", _VLD, "node_status", "variants", + "parameters", "api_versions", _HID, _BLK) def CheckArguments(self): if self.op.names: @@ -3137,8 +3187,10 @@ class LUDiagnoseOS(NoHooksLU): node_data = self.rpc.call_os_diagnose(valid_nodes) pol = self._DiagnoseByOS(node_data) output = [] + cluster = self.cfg.GetClusterInfo() - for os_name, os_data in pol.items(): + for os_name in utils.NiceSort(pol.keys()): + os_data = pol[os_name] row = [] valid = True (variants, params, api_versions) = null_state = (set(), set(), set()) @@ -3157,10 +3209,17 @@ class LUDiagnoseOS(NoHooksLU): params.intersection_update(node_params) api_versions.intersection_update(node_api) + is_hid = os_name in cluster.hidden_oss + is_blk = os_name in cluster.blacklisted_oss + if ((self._HID not in self.op.output_fields and is_hid) or + (self._BLK not in self.op.output_fields and is_blk) or + (self._VLD not in self.op.output_fields and not valid)): + continue + for field in self.op.output_fields: if field == "name": val = os_name - elif field == "valid": + elif field == self._VLD: val = valid elif
field == "node_status": # this is just a copy of the dict @@ -3168,11 +3227,15 @@ class LUDiagnoseOS(NoHooksLU): for node_name, nos_list in os_data.items(): val[node_name] = nos_list elif field == "variants": - val = list(variants) + val = utils.NiceSort(list(variants)) elif field == "parameters": val = list(params) elif field == "api_versions": val = list(api_versions) + elif field == self._HID: + val = is_hid + elif field == self._BLK: + val = is_blk else: raise errors.ParameterError(field) row.append(val) @@ -4918,7 +4981,7 @@ class LURenameInstance(LogicalUnit): new_name = self.op.new_name if self.op.name_check: hostname = netutils.GetHostname(name=new_name) - new_name = hostname.name + new_name = self.op.new_name = hostname.name if (self.op.ip_check and netutils.TcpPing(hostname.ip, constants.DEFAULT_NODED_PORT)): raise errors.OpPrereqError("IP %s of instance %s already in use" % @@ -6639,6 +6702,10 @@ class LUCreateInstance(LogicalUnit): if self.op.os_type is None: raise errors.OpPrereqError("No guest OS specified", errors.ECODE_INVAL) + if self.op.os_type in self.cfg.GetClusterInfo().blacklisted_oss: + raise errors.OpPrereqError("Guest OS '%s' is not allowed for" + " installation" % self.op.os_type, + errors.ECODE_STATE) if self.op.disk_template is None: raise errors.OpPrereqError("No disk template specified", errors.ECODE_INVAL) diff --git a/lib/constants.py b/lib/constants.py index 114cf91ba45fb86e10b3a258f9ab3a71e28e6a58..1782809ba8c49e03bb77acedd254fe4b0e9253ef 100644 --- a/lib/constants.py +++ b/lib/constants.py @@ -427,8 +427,9 @@ INISECT_BEP = "backend" INISECT_OSP = "os" # dynamic device modification -DDM_ADD = 'add' -DDM_REMOVE = 'remove' +DDM_ADD = "add" +DDM_REMOVE = "remove" +DDMS_VALUES = frozenset([DDM_ADD, DDM_REMOVE]) # common exit codes EXIT_SUCCESS = 0 diff --git a/lib/hypervisor/hv_kvm.py b/lib/hypervisor/hv_kvm.py index d2ac24c5ecf8830ad088984c7347ed14ff258e18..ec0cd455caf96e6e1ba6674a963ce21c5dd41ccc 100644 --- a/lib/hypervisor/hv_kvm.py 
+++ b/lib/hypervisor/hv_kvm.py @@ -213,6 +213,8 @@ class KVMHypervisor(hv_base.BaseHypervisor): _MIGRATION_INFO_MAX_BAD_ANSWERS = 5 _MIGRATION_INFO_RETRY_DELAY = 2 + _VERSION_RE = re.compile(r"\b(\d+)\.(\d+)\.(\d+)\b") + ANCILLARY_FILES = [ _KVM_NETWORK_SCRIPT, ] @@ -816,6 +818,21 @@ class KVMHypervisor(hv_base.BaseHypervisor): return result + @classmethod + def _GetKVMVersion(cls): + """Return the installed KVM version + + @return: (version, v_maj, v_min, v_rev), or None + + """ + result = utils.RunCmd([constants.KVM_PATH, "--help"]) + if result.failed: + return None + match = cls._VERSION_RE.search(result.output.splitlines()[0]) + if not match: + return None + return (match.group(0), match.group(1), match.group(2), match.group(3)) + def StopInstance(self, instance, force=False, retry=False, name=None): """Stop an instance. diff --git a/lib/objects.py b/lib/objects.py index 0be44fc9c29d5b71c6db283fb05c2c7fdb58a72c..827f680dec9bbafaf5592ea9e40f90bb5d21d52b 100644 --- a/lib/objects.py +++ b/lib/objects.py @@ -1,7 +1,7 @@ # # -# Copyright (C) 2006, 2007, 2010 Google Inc. +# Copyright (C) 2006, 2007, 2008, 2009, 2010 Google Inc. # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by @@ -876,6 +876,9 @@ class OS(ConfigObject): @ivar supported_parameters: a list of tuples, name and description, containing the supported parameters by this OS + @type VARIANT_DELIM: string + @cvar VARIANT_DELIM: the variant delimiter + """ __slots__ = [ "name", @@ -890,6 +893,41 @@ class OS(ConfigObject): "supported_parameters", ] + VARIANT_DELIM = "+" + + @classmethod + def SplitNameVariant(cls, name): + """Splits the name into the proper name and variant. 
+ + @param name: the OS (unprocessed) name + @rtype: list + @return: a list of two elements; if the original name didn't + contain a variant, the variant is returned as an empty string + + """ + nv = name.split(cls.VARIANT_DELIM, 1) + if len(nv) == 1: + nv.append("") + return nv + + @classmethod + def GetName(cls, name): + """Returns the proper name of the OS (without the variant). + + @param name: the OS (unprocessed) name + + """ + return cls.SplitNameVariant(name)[0] + + @classmethod + def GetVariant(cls, name): + """Returns the variant of the OS (without the base name). + + @param name: the OS (unprocessed) name + + """ + return cls.SplitNameVariant(name)[1] + class Node(TaggableObject): """Config object representing a node.""" @@ -966,6 +1004,8 @@ class Cluster(TaggableObject): "uid_pool", "default_iallocator", "primary_ip_family", + "hidden_oss", + "blacklisted_oss", ] + _TIMESTAMPS + _UUID def UpgradeConfig(self): @@ -1031,6 +1071,13 @@ class Cluster(TaggableObject): if self.reserved_lvs is None: self.reserved_lvs = [] + # hidden and blacklisted operating systems added before 2.2.1 + if self.hidden_oss is None: + self.hidden_oss = [] + + if self.blacklisted_oss is None: + self.blacklisted_oss = [] + # primary_ip_family added before 2.3 if self.primary_ip_family is None: self.primary_ip_family = AF_INET diff --git a/lib/opcodes.py b/lib/opcodes.py index 71c01c3e11e5ac6c9378ce8117ce4e8ddc46b66e..d314fc8a915f9b552c67582b9aa69a6476423475 100644 --- a/lib/opcodes.py +++ b/lib/opcodes.py @@ -316,6 +316,8 @@ class OpSetClusterParams(OpCode): "remove_uids", "default_iallocator", "reserved_lvs", + "hidden_oss", + "blacklisted_oss", ] diff --git a/lib/rapi/rlib2.py b/lib/rapi/rlib2.py index eed96400e1795af56774b913048a6b5ca355214a..0d8367ae45a8765fc7f4723c8bac8156da89ae51 100644 --- a/lib/rapi/rlib2.py +++ b/lib/rapi/rlib2.py @@ -148,8 +148,7 @@ class R_2_os(baserlib.R_Generic): """ cl = baserlib.GetClient() - op = opcodes.OpDiagnoseOS(output_fields=["name", "valid", "variants"],
- names=[]) + op = opcodes.OpDiagnoseOS(output_fields=["name", "variants"], names=[]) job_id = baserlib.SubmitJob([op], cl) # we use custom feedback function, instead of print we log the status result = cli.PollJob(job_id, cl, feedback_fn=baserlib.FeedbackFn) @@ -159,9 +158,8 @@ class R_2_os(baserlib.R_Generic): raise http.HttpBadGateway(message="Can't get OS list") os_names = [] - for (name, valid, variants) in diagnose_data: - if valid: - os_names.extend(cli.CalculateOSNames(name, variants)) + for (name, variants) in diagnose_data: + os_names.extend(cli.CalculateOSNames(name, variants)) return os_names diff --git a/man/gnt-os.sgml b/man/gnt-os.sgml index 7cf9d4d84d7dd8daa2b8b07cbeb2b2f366f7abd6..4e7d5bc3e03f5237361a0a7863e6bd77278c91b3 100644 --- a/man/gnt-os.sgml +++ b/man/gnt-os.sgml @@ -2,7 +2,7 @@ <!-- Fill in your name for FIRSTNAME and SURNAME. --> <!-- Please adjust the date whenever revising the manpage. --> - <!ENTITY dhdate "<date>June 08, 2010</date>"> + <!ENTITY dhdate "<date>September 20, 2010</date>"> <!-- SECTION should be 1-8, maybe w/ subsection other parameters are allowed: see man(7), man(1). --> <!ENTITY dhsection "<manvolnum>8</manvolnum>"> @@ -69,6 +69,11 @@ as an option. </para> + <para> + Note that hidden or blacklisted OSes are not displayed by this + command; use <command>diagnose</command> to show those. + </para> + <cmdsynopsis> <command>diagnose</command> </cmdsynopsis> @@ -100,14 +105,30 @@ </cmdsynopsis> <para> - This command will allow you to modify OS parameters. At the moment - we just support per-os-hypervisor settings. You can run modify + This command will allow you to modify OS parameters. + + </para> + + <para> + To modify the per-OS hypervisor parameters (which override the + global hypervisor parameters), you can run modify <option>-H</option> with the same syntax as in <command>gnt-cluster init</command> to override default hypervisor parameters of the cluster for specified <replaceable>OS</replaceable> argument.
</para> + <para> + To modify the hidden and blacklisted states of an OS, pass the + options <option>--hidden <replaceable>yes|no</replaceable></option>, + or respectively <option>--blacklisted ...</option>. The 'hidden' + state means that an OS won't be listed by default in the OS + list, but is available for installation. The 'blacklisted' state + means that the OS is not listed and is also not allowed for new + instance creations (but can be used for reinstalling old + instances). + </para> + <para> Note: The <replaceable>OS</replaceable> doesn't have to exists. This allows preseeding the settings for diff --git a/qa/ganeti-qa.py b/qa/ganeti-qa.py index 4b784901eeb791f38eb1c1a9c82a72ea14b01b6c..070a1b34a6c26f9c912782703d4749c7a0ccfbb5 100755 --- a/qa/ganeti-qa.py +++ b/qa/ganeti-qa.py @@ -1,7 +1,7 @@ #!/usr/bin/python # -# Copyright (C) 2007 Google Inc. +# Copyright (C) 2007, 2008, 2009, 2010 Google Inc. # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by @@ -145,6 +145,7 @@ def RunOsTests(): RunTest(qa_os.TestOsPartiallyValid) RunTest(qa_os.TestOsModifyValid) RunTest(qa_os.TestOsModifyInvalid) + RunTest(qa_os.TestOsStates) def RunCommonInstanceTests(instance): diff --git a/qa/qa_os.py b/qa/qa_os.py index 28c2c73600ea09002fb0919d764035fb57c40fe6..143fedc53df3eb575659e9804daeabcd13767202 100644 --- a/qa/qa_os.py +++ b/qa/qa_os.py @@ -1,7 +1,7 @@ # # -# Copyright (C) 2007 Google Inc. +# Copyright (C) 2007, 2008, 2009, 2010 Google Inc. 
# # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by @@ -75,6 +75,19 @@ def _TestOsModify(hvp_dict, expected_result=0): utils.ShellQuoteArgs(cmd)).wait(), expected_result) +def _TestOsStates(): + """gnt-os modify: set and clear the hidden and blacklisted states""" + master = qa_config.GetMasterNode() + + cmd = ["gnt-os", "modify"] + + for param in ["hidden", "blacklisted"]: + for val in ["yes", "no"]: + new_cmd = cmd + ["--%s" % param, val, _TEMP_OS_NAME] + AssertEqual(StartSSH(master["primary"], + utils.ShellQuoteArgs(new_cmd)).wait(), 0) + + def _SetupTempOs(node, dir, valid): """Creates a temporary OS definition on the given node. @@ -181,3 +194,9 @@ def TestOsModifyInvalid(): } return _TestOsModify(hv_dict, 1) + + +def TestOsStates(): + """Testing OS states""" + + return _TestOsStates() diff --git a/scripts/gnt-instance b/scripts/gnt-instance index bbe570ce2842451779f6ed8a47b03481c8311e6b..1c62463df4e72005aafbb8a65357bfa4ad79d630 100755 --- a/scripts/gnt-instance +++ b/scripts/gnt-instance @@ -524,8 +524,7 @@ def ReinstallInstance(opts, args): # second, if requested, ask for an OS if opts.select_os is True: - op = opcodes.OpDiagnoseOS(output_fields=["name", "valid", "variants"], - names=[]) + op = opcodes.OpDiagnoseOS(output_fields=["name", "variants"], names=[]) result = SubmitOpCode(op, opts=opts) if not result: @@ -535,12 +534,11 @@ def ReinstallInstance(opts, args): ToStdout("Available OS templates:") number = 0 choices = [] - for (name, valid, variants) in result: - if valid: - for entry in CalculateOSNames(name, variants): - ToStdout("%3s: %s", number, entry) - choices.append(("%s" % number, entry, entry)) - number += 1 + for (name, variants) in result: + for entry in CalculateOSNames(name, variants): + ToStdout("%3s: %s", number, entry) + choices.append(("%s" % number, entry, entry)) + number += 1 choices.append(('x', 'exit', 'Exit gnt-instance reinstall')) selected = AskUser("Enter OS template number 
(or x to abort):", diff --git a/scripts/gnt-job b/scripts/gnt-job index df2ced97dbe554698a616e77b1ab723238a36a5d..80dbfd8b8bb530d6592029c1f0199a2689d6709c 100755 --- a/scripts/gnt-job +++ b/scripts/gnt-job @@ -290,9 +290,10 @@ def ShowJobs(opts, args): else: format_msg(3, "No processing end time") format_msg(3, "Input fields:") - for key, val in opcode.iteritems(): + for key in utils.NiceSort(opcode.keys()): if key == "OP_ID": continue + val = opcode[key] if isinstance(val, (tuple, list)): val = ",".join([str(item) for item in val]) format_msg(4, "%s: %s" % (key, val)) diff --git a/scripts/gnt-os b/scripts/gnt-os index 9b30f2d4566313efd62d3bfa1e4954b6003c4e22..48d6e3003b3182da1a0b7590d2ff2852f5c1edb7 100755 --- a/scripts/gnt-os +++ b/scripts/gnt-os @@ -44,8 +44,7 @@ def ListOS(opts, args): @return: the desired exit code """ - op = opcodes.OpDiagnoseOS(output_fields=["name", "valid", "variants"], - names=[]) + op = opcodes.OpDiagnoseOS(output_fields=["name", "variants"], names=[]) result = SubmitOpCode(op, opts=opts) if not result: @@ -58,9 +57,8 @@ def ListOS(opts, args): headers = None os_names = [] - for (name, valid, variants) in result: - if valid: - os_names.extend([[n] for n in CalculateOSNames(name, variants)]) + for (name, variants) in result: + os_names.extend([[n] for n in CalculateOSNames(name, variants)]) data = GenerateTable(separator=None, headers=headers, fields=["name"], data=os_names, units=None) @@ -82,7 +80,8 @@ def ShowOSInfo(opts, args): """ op = opcodes.OpDiagnoseOS(output_fields=["name", "valid", "variants", - "parameters", "api_versions"], + "parameters", "api_versions", + "blacklisted", "hidden"], names=[]) result = SubmitOpCode(op, opts=opts) @@ -92,7 +91,7 @@ def ShowOSInfo(opts, args): do_filter = bool(args) - for (name, valid, variants, parameters, api_versions) in result: + for (name, valid, variants, parameters, api_versions, blk, hid) in result: if do_filter: if name not in args: continue @@ -100,6 +99,8 @@ def ShowOSInfo(opts, 
args): args.remove(name) ToStdout("%s:", name) ToStdout(" - valid: %s", valid) + ToStdout(" - hidden: %s", hid) + ToStdout(" - blacklisted: %s", blk) if valid: ToStdout(" - API versions:") for version in sorted(api_versions): @@ -148,7 +149,8 @@ def DiagnoseOS(opts, args): """ op = opcodes.OpDiagnoseOS(output_fields=["name", "valid", "variants", - "node_status"], names=[]) + "node_status", "hidden", + "blacklisted"], names=[]) result = SubmitOpCode(op, opts=opts) if not result: @@ -157,7 +159,7 @@ def DiagnoseOS(opts, args): has_bad = False - for os_name, _, os_variants, node_data in result: + for os_name, _, os_variants, node_data, hid, blk in result: nodes_valid = {} nodes_bad = {} nodes_hidden = {} @@ -173,6 +175,7 @@ def DiagnoseOS(opts, args): else: max_os_api = 0 fo_msg += " [no API versions declared]" + if max_os_api >= constants.OS_API_V15: if fo_variants: fo_msg += " [variants: %s]" % utils.CommaJoin(fo_variants) @@ -210,7 +213,12 @@ def DiagnoseOS(opts, args): for msg in nodes_hidden[node_name]: ToStdout(msg) - ToStdout("OS: %s [global status: %s]", os_name, status) + st_msg = "OS: %s [global status: %s]" % (os_name, status) + if hid: + st_msg += " [hidden]" + if blk: + st_msg += " [blacklisted]" + ToStdout(st_msg) if os_variants: ToStdout(" Variants: [%s]" % utils.CommaJoin(os_variants)) _OutputPerNodeOSStatus(nodes_valid) @@ -242,19 +250,31 @@ def ModifyOS(opts, args): else: osp = None - if not (os_hvp or osp): + if opts.hidden is not None: + if opts.hidden: + ohid = [(constants.DDM_ADD, os)] + else: + ohid = [(constants.DDM_REMOVE, os)] + else: + ohid = None + + if opts.blacklisted is not None: + if opts.blacklisted: + oblk = [(constants.DDM_ADD, os)] + else: + oblk = [(constants.DDM_REMOVE, os)] + else: + oblk = None + + if not (os_hvp or osp or ohid or oblk): ToStderr("At least one of OS parameters or hypervisor parameters" " must be passed") return 1 - op = opcodes.OpSetClusterParams(vg_name=None, - enabled_hypervisors=None, - hvparams=None, - 
beparams=None, - nicparams=None, - candidate_pool_size=None, - os_hvp=os_hvp, - osparams=osp) + op = opcodes.OpSetClusterParams(os_hvp=os_hvp, + osparams=osp, + hidden_oss=ohid, + blacklisted_oss=oblk) SubmitOpCode(op, opts=opts) return 0 @@ -273,7 +293,8 @@ commands = { "operating systems"), 'modify': ( ModifyOS, ARGS_ONE_OS, - [HVLIST_OPT, OSPARAMS_OPT, DRY_RUN_OPT, PRIORITY_OPT], + [HVLIST_OPT, OSPARAMS_OPT, DRY_RUN_OPT, PRIORITY_OPT, + HID_OS_OPT, BLK_OS_OPT], "", "Modify the OS parameters"), } diff --git a/test/ganeti.objects_unittest.py b/test/ganeti.objects_unittest.py index 6b5cfe716a3988c000b9d1dd1768626f768b226c..be898f582db29c6c8b1a8d8b05e4a5b5d204efc9 100755 --- a/test/ganeti.objects_unittest.py +++ b/test/ganeti.objects_unittest.py @@ -1,7 +1,7 @@ #!/usr/bin/python # -# Copyright (C) 2006, 2007, 2008 Google Inc. +# Copyright (C) 2006, 2007, 2008, 2010 Google Inc. # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by @@ -142,5 +142,21 @@ class TestClusterObject(unittest.TestCase): self.fake_cl.FillHV(fake_inst)) +class TestOS(unittest.TestCase): + ALL_DATA = [ + "debootstrap", + "debootstrap+default", + "debootstrap++default", + ] + + def testSplitNameVariant(self): + for name in self.ALL_DATA: + self.assertEqual(len(objects.OS.SplitNameVariant(name)), 2) + + def testVariant(self): + self.assertEqual(objects.OS.GetVariant("debootstrap"), "") + self.assertEqual(objects.OS.GetVariant("debootstrap+default"), "default") + + if __name__ == '__main__': testutils.GanetiTestProgram() diff --git a/tools/burnin b/tools/burnin index e5cce5089af08aeb619432c4ee84b8f8e210c456..3ce415241008609b16ccf9ee3b56cd03d94cbd48 100755 --- a/tools/burnin +++ b/tools/burnin @@ -510,16 +510,16 @@ class Burner(object): Err(msg, exit_code=err_code) self.nodes = [data[0] for data in result if not (data[1] or data[2])] - op_diagnose = opcodes.OpDiagnoseOS(output_fields=["name", "valid", - 
"variants"], names=[]) + op_diagnose = opcodes.OpDiagnoseOS(output_fields=["name", "variants"], + names=[]) result = self.ExecOp(True, op_diagnose) if not result: Err("Can't get the OS list") found = False - for (name, valid, variants) in result: - if valid and self.opts.os in cli.CalculateOSNames(name, variants): + for (name, variants) in result: + if self.opts.os in cli.CalculateOSNames(name, variants): found = True break