- 12 Apr, 2010 2 commits
-
-
Iustin Pop authored
ExpandNames holds too much non-locking code (this was the first LU to be converted to ExpandNames, and we didn't have CheckArguments at that point), so this patch moves the checks that are lock-independent to CheckArguments. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This fixes an old 'FIXME' entry. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- 08 Apr, 2010 1 commit
-
-
Iustin Pop authored
This will be used to conditionally enable the watcher node maintenance feature. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- 23 Mar, 2010 4 commits
-
-
Guido Trotter authored
Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Guido Trotter authored
Abstract the growable disk types into a Ganeti constant, and only run disk grow from burnin on them. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
Per issue 90, current cluster verify is very brittle. It's one of the oldest pieces of code, and over the last years it has only received additions, without cleanups. Among its problems:
- data initialization interspersed with verification of RPC results, leading to non-initialized data for some branches
- due to the above, we strictly order some checks, and we have the case where a bad node time result will skip checking of node volumes
- many local variables, with each new check adding a new dict, leading to a spaghetti of dicts in the main Exec function
- monolithic code: both Exec() and _NodeVerify() do a lot of independent checks

This patch does an imperfect rewrite, but at least we gain:
- a clear infrastructure for adding more checks (the new NodeImage class, with its clear and documented fields), and removal of most per-node dicts from the Exec() function
- the new NodeImage object should allow better type safety, e.g. by allowing pylint to check the actual object attributes rather than strings as dict keys
- a-priori initialization of data fields, eliminating the need to introduce dependencies between checks
- per-result-key status field, allowing elimination of duplicate error messages (where we want)
- split of most independent checks into separate functions, for greater clarity

The new code, being new, will probably introduce more bugs in the short term than it removes. However, it should offer a much better way of extending cluster verify in the future.
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
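The structure described in this commit message can be illustrated with a minimal sketch. This is not the actual Ganeti code: the field names, the helper functions and the `max_skew` default are assumptions chosen for illustration. It only shows the idea of pre-initialized per-node data with per-result status flags, so that independent checks do not depend on each other.

```python
class NodeImage:
    """Hypothetical per-node container; all fields are illustrative."""

    def __init__(self, name, offline=False):
        self.name = name
        self.offline = offline
        # Data fields get defaults up front, so no check relies on
        # another check having populated them first.
        self.volumes = {}
        self.instances = []
        # Per-result-key status flags: a failure is recorded once and
        # later checks can skip silently instead of repeating the error.
        self.rpc_fail = False
        self.lvm_fail = False
        self.hyp_fail = False


def verify_node_time(nimg, time_diff, max_skew=150.0):
    """Independent check: report excessive clock skew (max_skew is assumed)."""
    if nimg.rpc_fail or time_diff is None:
        return []
    if abs(time_diff) > max_skew:
        return ["node %s: time diverges by %.01fs" % (nimg.name, abs(time_diff))]
    return []


def verify_node_volumes(nimg, expected_lvs):
    """Independent check: runs even if the time check above failed."""
    if nimg.lvm_fail:
        return []  # the LVM failure itself was already reported
    missing = sorted(set(expected_lvs) - set(nimg.volumes))
    return ["node %s: volume %s missing" % (nimg.name, lv) for lv in missing]
```

Because each check reads only the pre-initialized image, a bad node time result no longer prevents the volume check from running.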
-
- 17 Mar, 2010 3 commits
-
-
Iustin Pop authored
This is a simple patch that adds the no-install mode for instance creation, allowing import from foreign source of the actual OS (instead of requiring the preparation of data in a form expected by the import scripts). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This patch modifies LUSetInstanceParms to allow OS name changes, without reinstallation, in case an OS gets renamed on-disk. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This patch moves the node-has-os checks to a separate function. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- 16 Mar, 2010 1 commit
-
-
Iustin Pop authored
The current check on whether we require auto_promote or not is wrong, as we check whether we will have exactly the correct number of master candidates left. But it is fine if we have more than the required number (e.g. when CPS=10 and mc_remaining=19), and in that case we shouldn't require auto promotion. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
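A minimal sketch of the corrected comparison, under the assumption that the old code compared for inequality; the function names are hypothetical and `mc_should` stands for the candidate pool size (CPS):

```python
def needs_auto_promote_old(mc_remaining, mc_should):
    # One reading of the old behaviour: anything other than exactly
    # the desired count was treated as requiring auto promotion.
    return mc_remaining != mc_should


def needs_auto_promote(mc_remaining, mc_should):
    # Corrected behaviour: only a shortfall requires auto promotion;
    # having more candidates than needed is fine.
    return mc_remaining < mc_should
```

With CPS=10 and 19 candidates remaining, the corrected check does not ask for auto promotion, while the strict comparison would.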
-
- 15 Mar, 2010 8 commits
-
-
Michael Hanselmann authored
Currently, the ganeti-confd's HMAC key is called “cluster HMAC key” or simply “HMAC key” everywhere. With the implementation of inter-cluster instance moves, another HMAC key will be introduced for signing critical data. The two keys cannot be the same, so this patch clarifies the purpose of the “cluster HMAC key” by renaming it. The actual file name is not changed. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Iustin Pop authored
This is much simpler than the opposite, with fewer possibilities of failures. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This patch adds a new mode to instance modify, the changing of the disk template. For now only plain to drbd conversion is supported, and the new secondary node must be specified manually (no iallocator support).

The procedure for conversion works as follows:
- a completely new disk template is created, matching the count, size and mode of the instance's current disks
- we create manually (not via _CreateDisks) all the missing volumes
- we rename on the primary the LVs to the new name
- we create manually the DRBD devices

Failures during the creation of volumes will leave orphan volumes. Failure during the rename might leave some disks renamed and some not, leading to an inconsistent instance. Once the disks are renamed, we update the instance information and wait for resync. Any failures of the DRBD sync must be manually handled (like a normal failure, e.g. by running replace-disks, etc.).
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Multiple LUs require that an instance is not running while they operate on the instance (reinstall, rename, modify, recreate disks, deactivate disks). The code to do this check is duplicated many times, and not very consistently (some use call_instance_list, some call_instance_info). The patch moves this check into a separate function that is then reused. The only drawback is that _SafeShutdownInstanceDisks now raises an OpPrereqError (even though it is run during Exec()), but this use case is fine (there are no other modifications in that Exec). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Both create instance and grow disk check the free disk space on nodes using the same, duplicate code. Since we'll need this in other places in the future, we abstract the check into a new function. The patch adjusts the error message to be more in-line with the one for memory checking, and fixes the exception raised for RPC errors. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This is a simple check, but we'll need it in multiple places. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This new mode, valid only for the plain disk template, allows creation of an instance based on existing logical volumes (preserving data), rather than creation of new volumes and OS creation.

The new mode works as follows:
- instead of size, all disks passed in must have an 'adopt' key, which signifies the LV name to be used
- all disks must have this key, or none should
- we check the volume existence, and from the result we fill in the actual size
- online (in-use) volumes are not allowed
- 'stealing' of another instance's volumes is prevented via reservation of the LV names
- during creation, we rename the logical volumes to the standard Ganeti format (based on UUID)

Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
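The "all disks or none" rule for the 'adopt' key can be sketched as follows. The function name, the error type and the duplicate-name check are assumptions for illustration; the real code reserves LV names through the cluster configuration rather than a local set.

```python
def check_adopt_disks(disks):
    """Return True when adopting existing LVs, False for normal creation.

    `disks` is a list of dicts, each with either a 'size' or an 'adopt'
    key (hypothetical representation of the disk parameters).
    """
    has_adopt = ["adopt" in disk for disk in disks]
    if any(has_adopt) and not all(has_adopt):
        raise ValueError("Either all disks are adopted or none is")
    if not disks or not all(has_adopt):
        return False
    # Stand-in for the LV name reservation: the same volume must not
    # be adopted twice within one request.
    names = [disk["adopt"] for disk in disks]
    if len(set(names)) != len(names):
        raise ValueError("Duplicate volume names given for adoption")
    return True
```

For example, `check_adopt_disks([{"adopt": "lv1"}, {"size": 1024}])` is rejected because the request mixes the two modes.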
-
Iustin Pop authored
This way, the parameters are available in CheckArguments too. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- 12 Mar, 2010 3 commits
-
-
Michael Hanselmann authored
When using pyOpenSSL 0.7 or above, LUVerifyCluster will start to show a warning 30 days before a certificate expires. 7 days before the certificate expires, the warning becomes an error. Once the certificate has expired, LUVerifyCluster will always report an error. The latter is also supported with pyOpenSSL 0.6. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
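The thresholds above amount to a small classification, sketched here with plain `datetime` values instead of pyOpenSSL objects. The function and constant names are hypothetical, and the inclusive boundary handling is an assumption.

```python
from datetime import datetime, timedelta

WARN_DAYS = 30   # warning threshold from the commit message
ERROR_DAYS = 7   # error threshold from the commit message


def cert_expiry_status(not_after, now):
    """Classify a certificate as 'ok', 'warning' or 'error'."""
    remaining = not_after - now
    if remaining <= timedelta(0):
        return "error"  # already expired
    if remaining <= timedelta(days=ERROR_DAYS):
        return "error"
    if remaining <= timedelta(days=WARN_DAYS):
        return "warning"
    return "ok"
```

A certificate expiring in 20 days would be reported as a warning, and one expiring in 3 days as an error.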
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Iustin Pop authored
In case the hypervisor has issues on one node, currently backend.VerifyNode will exit via an exception (two exit paths are possible, one via HypervisorError from hypervisor.Verify(), and one via RPCFail from GetInstanceList). This is bad as it invalidates all other checks of that node. This patch catches these two errors and allows the rest of the VerifyNode function to run. This leads to a more complete verify cluster run; for example, now only LVs that are really missing are reported, not all of them. The cluster verify is not perfect as it will skip some tests even if it has data, but fixing this will require a more complete rewrite (see issue 90). Also, the patch fixes and improves some error messages in cmdlib. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- 11 Mar, 2010 2 commits
-
-
Iustin Pop authored
In simulate errors mode, the test "ntime_diff is not None" will be ignored, and thus the code will try to format a None value with %.01f. We work around this by formatting the value beforehand, and then only using %s, which can format a 'None' value. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
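A short sketch of this workaround; the function name and message wording are illustrative, not the actual cmdlib text. Formatting `None` with `%.01f` raises a `TypeError`, while `%s` accepts any value.

```python
def describe_time_skew(node, ntime_diff):
    # Pre-format the float only when we actually have one...
    if ntime_diff is None:
        ntime_diff_str = None
    else:
        ntime_diff_str = "%.01fs" % ntime_diff
    # ...and use %s in the message, which also accepts None.
    return ("Node %s: time diverges by at least %s from master node time"
            % (node, ntime_diff_str))
```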
-
Iustin Pop authored
This adds a validation similar to the one for cluster-wide hypervisor parameters. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
- 09 Mar, 2010 4 commits
-
-
Iustin Pop authored
The current code in LUSetNodeParms regarding the demotion from master candidate role is complicated and duplicates the code in ConfigWriter, where such decisions should be made. Furthermore, we still cannot demote nodes (not even with force), if other regular nodes exist.

This patch adds a new opcode attribute ‘auto_promote’, and changes the decision tree as follows:
- if the node will be set to offline or drained or explicitly demoted from master candidate, and this parameter is set, then we lock all nodes in ExpandNames()
- later, in CheckPrereq(), if the node is indeed a master candidate, and the future state (as computed via GetMasterCandidateStats with the current node in the exception list) has fewer nodes than it should, and we didn't lock all nodes, we exit with an exception
- in Exec, if we locked all nodes, we do an AdjustCandidatePool() run, to ensure nodes are locked as needed (we do it before updating the node to remove a warning, and to prevent the situation where a failure of the LU between these two steps leaves us with an inconsistent state)

Note that in Exec we run the AdjustCP irrespective of any node state change (just based on lock status), so we might simplify the CheckPrereq even more by not checking the future state, basically requiring auto_promote/lock_all for master candidates, since the case where we have more master candidates than needed is rarer; OTOH, this would prevent manual promotion ahead of time of another node, which is why I didn't choose this way.
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
The return from LUVerifyCluster should be True (or equivalent) for pass, and False (or equivalent) for fail. The HooksCallBack function uses '1' (= True) when a hook fails, which is exactly the opposite of what we want: it will make failed hooks reset the result to success, overriding actual failures in cluster verify. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Iustin Pop authored
We need to manually filter out offline nodes before using rpc.call_upload_file and rpc.call_write_ssconf_files, since these methods are static (they work without a ConfigWriter instance) and thus do not know which nodes are offline and which are not. Note that we add a new ConfigWriter._UnlockedGetOnlineNodeList() method rather than hardcoding the filtering of online nodes in _WriteConfig. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
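The filtering step can be sketched as below. The `Node` class and the helper name are hypothetical stand-ins for the configuration objects; the point is only that the caller computes the online list before handing it to a static RPC call.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for a node configuration object."""
    name: str
    offline: bool = False


def get_online_node_list(nodes):
    """Return the names of all nodes not marked offline."""
    return [node.name for node in nodes if not node.offline]
```

The resulting list would then be passed as the target node list to the static RPC helpers, instead of the full node list.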
-
René Nussbaumer authored
This patch implements all modifications to support per-os-hypervisor parameters in the framework. Signed-off-by:
René Nussbaumer <rn@google.com> Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- 08 Mar, 2010 3 commits
-
-
Iustin Pop authored
This patch adds validation of new names used, i.e. at cluster init time, node add time, and instance creation. For instances, especially when using «--no-name-check» (which skips DNS checks), we should validate the given name, and also normalize it (otherwise, we could have two instances named inst1 and Inst1). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
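A minimal sketch of validate-then-normalize, assuming lowercase normalization and a conservative hostname character set; the regular expression and the function name are illustrative assumptions, not taken from the patch.

```python
import re

# Assumed character set: lowercase letters, digits, dot, underscore, dash.
_VALID_NAME_RE = re.compile(r"^[a-z0-9._-]{1,255}$")


def normalize_name(name):
    """Lowercase the name and reject anything outside the allowed set."""
    normalized = name.lower()
    if not _VALID_NAME_RE.match(normalized):
        raise ValueError("Invalid name '%s'" % name)
    return normalized
```

With this, `inst1` and `Inst1` map to the same normalized name, so the second creation can be detected as a duplicate.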
-
Iustin Pop authored
Rationale: the file-based storage backend can add/remove files under a certain directory. However, the master node is also controlling the setting of the file-based root directory, so basically it means we can't prevent arbitrary modifications by the master of the node's filesystem. In order to mitigate this for setups where the file-based storage is not used, we introduce a new setting at ./configure time, that controls the enable/disable of file-based storage. Since this is not modifiable by the master (over RPC), it is now possible in this case to prevent unintended modifications of the node's filesystem from the master. The new setting is used in bdev.py to not expose the file-based storage at all, and in cmdlib.py to prevent attempts at creation of such instances. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This passes a full burnin with lots of instances, and should be safe as we mostly join a known root (various constants) to a run-time variable. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- 26 Feb, 2010 1 commit
-
-
Michael Hanselmann authored
LUQueryConfigValues supports multiple output fields. If the client asked for the watcher pause status, it would not get a list, but simply the value. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- 25 Feb, 2010 1 commit
-
-
Michael Hanselmann authored
The first argument to _ErrorIf should always be True in this case. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- 22 Feb, 2010 6 commits
-
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This is a new mode that requests a solution for the evacuation of multiple nodes. The external script will be fed a list of names, and is expected to return a list of [instance, new_node(s)] lists, detailing the evacuation path of each instance. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This patch switches the default result key from 'nodes' to 'result'. The old name is still accepted for backwards compatibility, and should be removed in later versions. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
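The backwards-compatible lookup could look roughly like this; the function name and the error raised for a missing key are assumptions for illustration.

```python
def get_iallocator_result(reply):
    """Extract the result from a parsed iallocator reply (a dict).

    Prefers the new 'result' key and falls back to the legacy 'nodes'.
    """
    if "result" in reply:
        return reply["result"]
    if "nodes" in reply:
        # Legacy key, accepted for backwards compatibility only.
        return reply["nodes"]
    raise KeyError("iallocator reply has neither 'result' nor 'nodes'")
```

Checking the new key first means a script that emits both keys is interpreted according to the new convention.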
-
Iustin Pop authored
Currently the 'name' parameter in the constructor is required (as a non-keyword argument). Since the (to follow) node evac IAllocator mode doesn't have 'name' as a valid argument, we're moving this one into the per-request key, leaving the constructor required arguments more abstract. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This moves the setting of the request member on the in_data, the setting of the request type, and the branching based on request type out of the individual functions and directly into the constructor. Since the values we're using externally are identical to the constants.py values, we're also using those directly. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- 17 Feb, 2010 1 commit
-
-
Iustin Pop authored
This should have been done in the _ExpandNodeName patch. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-