- 19 Mar, 2012 1 commit
Iustin Pop authored
If a specific list of groups has been requested, the code used it directly, without transforming it to a (frozen)set first, which results in:

  unsupported operand type(s) for &: 'list' and 'frozenset'

The trivial fix is to do the conversion in the 'then' branch.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
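For illustration, a minimal Python sketch of the failure mode and the fix (the variable names are invented for the example):

  existing_groups = frozenset(["default", "rack1"])
  requested = ["default"]  # a plain list, as passed in by the caller

  # Buggy: intersecting a list with a frozenset raises
  # TypeError: unsupported operand type(s) for &: 'list' and 'frozenset'
  # wanted = requested & existing_groups

  # Fix: normalize the requested list to a (frozen)set first.
  wanted = frozenset(requested) & existing_groups
  print(wanted)  # frozenset({'default'})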
-
- 25 Jan, 2012 1 commit
Michael Hanselmann authored
This patch attempts to fix a number of issues with “gnt-cluster verify” in the presence of multiple node groups and DRBD8 instances split over nodes in more than one group:
- Look up instances in a group only by their primary node (otherwise split instances would be considered when verifying any of their nodes' groups)
- When gathering additional nodes for LV checks, compare each instance node's group with the currently verified group instead of comparing against the primary node's group
- Exclude nodes in other groups when calculating N+1 errors and checking logical volumes
Not directly related, but a small error message is also clarified.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
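As a rough illustration of the first point (the helper and its arguments are hypothetical, not Ganeti's actual code), selecting instances for a group via the primary node only looks like:

  def _InstancesInGroup(instances, node_to_group, group_uuid):
      # An instance belongs to the verified group only via its primary
      # node, so a DRBD8 instance split over two groups isn't considered
      # when verifying both of its nodes' groups.
      return [inst for inst in instances
              if node_to_group[inst.primary_node] == group_uuid]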
-
- 20 Jan, 2012 1 commit
Guido Trotter authored
Cleanup only updates the config with the correct location of the instance, or reports that it is down, but never starts it. As such, there is no point in checking for enough free memory. In fact, this check could prevent a perfectly safe cleanup operation if a node is busy.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
- 24 Nov, 2011 3 commits
Michael Hanselmann authored
Note: This bug only manifests itself in Ganeti 2.5, but since the problematic code also exists in 2.4, I decided to fix it there.

If a node was assigned to a new group using “gnt-group assign-nodes”, the node object's group would be changed, but not the duplicate member list in the group object. The latter is an optimization to require fewer locks for other operations. The per-group member list is only kept in memory and not written to disk.

Ganeti 2.5 starts to make use of the data kept in the per-group member list and consequently fails when it is out of date. The following commands can be used to reproduce the issue in 2.5 (in 2.4 the issue was confirmed using additional logging):

  $ gnt-group add foo
  $ gnt-group assign-nodes foo $(gnt-node list --no-header -o name)
  $ gnt-cluster verify  # Fails with KeyError

This patch moves the code modifying node and group objects into “config.ConfigWriter” to do the complete operation under the config lock, and also to avoid making use of side-effects of modifying objects without calling “ConfigWriter.Update”. A unittest is included.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
(cherry picked from commit 218f4c3d)
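A highly simplified sketch (hypothetical, not the actual ConfigWriter code) of why doing the whole operation under one lock helps: the node's group pointer and the duplicated per-group member list always change together, so they cannot go out of sync:

  import threading

  class ConfigWriter(object):
      def __init__(self):
          self._lock = threading.Lock()
          self._node_group = {}     # node name -> group name
          self._group_members = {}  # group name -> set of node names

      def AssignGroupNodes(self, node_names, target_group):
          with self._lock:
              for name in node_names:
                  old_group = self._node_group[name]
                  # Update both sides of the duplicated data atomically.
                  self._group_members.setdefault(old_group, set()).discard(name)
                  self._group_members.setdefault(target_group, set()).add(name)
                  self._node_group[name] = target_group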
-
Michael Hanselmann authored
Note: This bug only manifests itself in Ganeti 2.5, but since the problematic code also exists in 2.4, I decided to fix it there.

If a node was assigned to a new group using “gnt-group assign-nodes”, the node object's group would be changed, but not the duplicate member list in the group object. The latter is an optimization to require fewer locks for other operations. The per-group member list is only kept in memory and not written to disk.

Ganeti 2.5 starts to make use of the data kept in the per-group member list and consequently fails when it is out of date. The following commands can be used to reproduce the issue in 2.5 (in 2.4 the issue was confirmed using additional logging):

  $ gnt-group add foo
  $ gnt-group assign-nodes foo $(gnt-node list --no-header -o name)
  $ gnt-cluster verify  # Fails with KeyError

This patch moves the code modifying node and group objects into “config.ConfigWriter” to do the complete operation under the config lock, and also to avoid making use of side-effects of modifying objects without calling “ConfigWriter.Update”. A unittest is included.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Commit c50452c3 added an exception when all instances should be evacuated off a node, but did so in a way which made pylint complain about unreachable code.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- 23 Nov, 2011 3 commits
Michael Hanselmann authored
There is a design issue in the iallocator interface which prevents us from doing this.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
-
Michael Hanselmann authored
When evacuating a node, only an assertion without informative text was used to check whether the necessary node locks had been acquired. On top of that, the list of nodes was evaluated without holding a node group lock, so this was changed as well. Some exception messages were also updated to include “retry the operation”.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
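The kind of change described, an assertion with informative text, looks roughly like this (a sketch with example data, not the actual code):

  owned = frozenset(["node1", "node2"])   # node locks actually held
  wanted = frozenset(["node1", "node3"])  # node locks the operation needs

  # Raises AssertionError naming the missing locks instead of a bare,
  # uninformative assertion failure (fails here, since node3 is missing).
  assert owned.issuperset(wanted), \
      ("Node locks for %s not acquired; the node list may have changed,"
       " retry the operation" % ", ".join(sorted(wanted - owned)))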
-
Michael Hanselmann authored
ConfigWriter.GetAllInstancesInfo returns a dictionary, not a list. Removing a node would fail with “too many values to unpack”.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
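The bug class is easy to reproduce in plain Python (illustrative data):

  instances = {
      "inst1.example.com": object(),
      "inst2.example.com": object(),
  }

  # Buggy: iterating a dict yields only the keys, so unpacking each key
  # (a string) into two names fails with "too many values to unpack".
  # for name, inst in instances:
  #     ...

  # Fixed: iterate over the (name, object) pairs explicitly.
  for name, inst in instances.items():
      print(name)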
-
- 14 Nov, 2011 1 commit
Vangelis Koukis authored
Ensure ports previously allocated by calling ConfigWriter's AllocatePort() are returned to the pool of free ports when no longer needed:
* Return the network_port of an instance when it is removed
* Return the port used by a DRBD-based disk when it is removed
Signed-off-by: Vangelis Koukis <vkoukis@grnet.gr>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
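Conceptually (a toy sketch, not Ganeti's implementation), the pool must get its ports back on removal or it eventually runs dry:

  class PortPool(object):
      def __init__(self, first, last):
          self._free = set(range(first, last + 1))

      def AllocatePort(self):
          if not self._free:
              raise RuntimeError("no free ports left")
          return self._free.pop()

      def ReleasePort(self, port):
          # Called when the instance or DRBD disk using the port is removed.
          self._free.add(port)

  pool = PortPool(11000, 11999)
  port = pool.AllocatePort()
  pool.ReleasePort(port)  # without this, the port would leak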
-
- 08 Nov, 2011 1 commit
Michael Hanselmann authored
If an instance couldn't be evacuated, previously only a message was printed. With this change the operation always aborts. Newly added unittests check for this behaviour.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- 04 Nov, 2011 2 commits
Michael Hanselmann authored
… instead of object with name.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
-
Michael Hanselmann authored
Instances are modified if their disk size doesn't match.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- 19 Oct, 2011 1 commit
Michael Hanselmann authored
If an instance actually had a missing disk, the type check would fail.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- 04 Oct, 2011 1 commit
Michael Hanselmann authored
If a cluster has any non-master-candidate nodes, those don't contain all files (e.g. config.data). With commit aef59ae7 (March 31st, 2011) the logic was changed, and subsequently verifying a cluster with non-mc nodes would complain. This patch fixes the issue by changing the algorithm. It also adds a check for files which shouldn't exist on a machine. A unittest is included.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
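The algorithm change can be pictured as computing, per node, which files must exist and which must not (the file paths and helper names here are illustrative only):

  FILES_ALL = frozenset(["/var/lib/ganeti/ssconf_cluster_name"])
  FILES_MC_ONLY = frozenset(["/var/lib/ganeti/config.data"])

  def _VerifyNodeFiles(reported, is_master_candidate):
      expected = FILES_ALL | (FILES_MC_ONLY if is_master_candidate
                              else frozenset())
      missing = expected - reported
      # The additional check: files which shouldn't exist on this machine.
      forbidden = (frozenset() if is_master_candidate
                   else reported & FILES_MC_ONLY)
      return missing, forbidden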
-
- 03 Oct, 2011 1 commit
Michael Hanselmann authored
Commit 64c7b383 changed the RPC call for verifying SSH connections. Unfortunately, the call site used when adding nodes was missed.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- 30 Sep, 2011 1 commit
Michael Hanselmann authored
When verifying a group the code would always check SSH to all nodes in the same group, as well as the first node for every other group. On big clusters this can cause issues, since many nodes will try to connect to the first node of another group at the same time. This patch changes the algorithm to choose a different node every time. A unittest for the selection algorithm is included.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
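One way to get a deterministic yet spread-out choice (purely illustrative; the actual selection code may differ) is to derive the index from the verifying group's name:

  import zlib

  def _PickRemoteCheckNode(own_group_name, other_group_nodes):
      # Different verifying groups hash to different indices, so they no
      # longer all contact the first node of the remote group at once.
      nodes = sorted(other_group_nodes)
      idx = zlib.crc32(own_group_name.encode("utf-8")) % len(nodes)
      return nodes[idx]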
-
- 28 Sep, 2011 2 commits
Iustin Pop authored
The change to enforce boolean results for the cluster verify group opcode missed HooksCallBack, which uses a very ugly 1/0 logic. Furthermore, that logic is wrong, since it unconditionally resets the verify result to true. The patch changes it to simply treat hook failures as failures, and to do nothing for offline nodes.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This reverts to the old behaviour in Ganeti 2.4 and before.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
- 30 Aug, 2011 1 commit
Andrea Spadaccini authored
In version 0.21, pylint unified all the disable-* (and enable-*) directives to disable (resp. enable). This leads to a lot of DeprecationWarning messages being emitted even if one uses the recommended version of pylint (0.21.1, as stated in devnotes.rst). This commit changes all the disable-msg directives to disable.
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
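In practice the change is mechanical; for example (assuming a message ID such as W0613, unused-argument):

  # Before (deprecated since pylint 0.21):
  def handler(request, context):  # pylint: disable-msg=W0613
      return request

  # After:
  def handler2(request, context):  # pylint: disable=W0613
      return request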
-
- 26 Aug, 2011 1 commit
Michael Hanselmann authored
cmdlib: Avoid wrapping using backslash
gnt_group: Avoid ** magic using keyword arguments (the “pep8” tool doesn't like the inline comment in this case and will complain about spaces around the “**” operator)
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- 25 Aug, 2011 1 commit
Michael Hanselmann authored
Identified using the “pep8” utility.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- 23 Aug, 2011 1 commit
René Nussbaumer authored
Signed-off-by: Agata Murawska <agatamurawska@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
(cherry picked from commit b7d7876b)

Conflicts:
  lib/cmdlib.py (easily fixed)
-
- 19 Aug, 2011 1 commit
Agata Murawska authored
Signed-off-by: Agata Murawska <agatamurawska@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
- 12 Aug, 2011 4 commits
Michael Hanselmann authored
- Check if BGL is actually owned
- Show group name as feedback
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
The original design for query2 specifically excluded locking, but now it has turned out that it would be a good thing to have in the watcher. This patch adds a new parameter to OpQuery and enables its use in LUQuery. A missing function is added to LUGroupQuery, a comment is clarified in _NodeQuery, and all lock acquisitions in the same LU are declared as shared.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
-
Michael Hanselmann authored
This patch removes the list of node groups (not used anymore since commit fcad7225) from OpClusterVerifyConfig's result and adds result verification to all OpClusterVerify* opcodes.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
-
Michael Hanselmann authored
This patch moves the logic for verifying the various node groups in a cluster into the master daemon. Job dependencies are used to ensure the configuration, which requires the BGL, is verified first. With this change it will be possible to expose whole-cluster verification through the remote API without requiring additional client logic on top of standard features like LU-generated jobs and job dependencies.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
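The shape of the idea, with a toy job queue (not Ganeti's real job-queue API): submit the configuration check first, then make every per-group job depend on it:

  class JobQueue(object):
      def __init__(self):
          self._jobs = []  # list of (job_id, description, depends_on)

      def SubmitJob(self, description, depends_on=None):
          job_id = len(self._jobs) + 1
          self._jobs.append((job_id, description, depends_on or []))
          return job_id

  queue = JobQueue()
  # The configuration check requires the BGL, so it is submitted first...
  config_job = queue.SubmitJob("verify cluster config")
  # ...and each per-group verification declares a dependency on it.
  group_jobs = [queue.SubmitJob("verify group %s" % name,
                                depends_on=[config_job])
                for name in ("default", "rack1")]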
-
- 08 Aug, 2011 3 commits
Iustin Pop authored
Currently, the IAllocator code requests strictly that the (set of) groups of the nodes we're relocating from is equal to the set of groups we're relocating to. This, however, makes it impossible to fix split instances, since (by definition) the secondary of a split instance is not in the same group as the primary node, whereas after the fix it is. The patch changes the test from group equality to checking that the final group set (across both primary and secondary nodes) is a subset of the initial group set (again across both nodes). This means we can't “extend” the set of groups, but keeping it the same or shrinking it is allowed. After this patch, one can finally fix (automatically) split instances via gnt-instance replace-disks.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
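In set terms (illustrative values):

  # Groups spanned by the instance's nodes before and after the relocation.
  init_groups = frozenset(["default", "rack1"])  # a split instance
  final_groups = frozenset(["default"])          # healed after replace-disks

  # Old, too strict: demanding equality forbids healing a split instance.
  # assert final_groups == init_groups

  # New: the final set may stay the same or shrink, but never grow.
  if not final_groups.issubset(init_groups):
      raise ValueError("Cannot extend the instance's set of node groups")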
-
Iustin Pop authored
Commit f0edfcf6 removed the parsing of the multi-evacuate result, but the code went from:

  if mode in (multi-evac, relocate):
    …
    if mode == relocate:
      …

to:

  if mode == relocate:
    …
    if mode == relocate:
      …

This patch simply removes the nested if.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
Commit 342f9172 added stricter checks for the iallocator result in evacuate mode, but it does this irrespective of the result status. When the result has failed and (according to the design) the list of nodes is empty, this code triggers the following:

  node1# gnt-instance replace-disks -I hail instance14
  Failure: command execution error: Groups of nodes returned by iallocator () differ from original groups (default)

After the patch, the result is:

  node1# gnt-instance replace-disks -I hail instance14
  Failure: prerequisites not met for this operation: error type: insufficient_resources, error details: Can't compute nodes using iallocator 'hail': Request failed: …

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
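The essence of the fix, as a sketch with invented names: check the result's success flag before validating its payload:

  class IAllocatorResult(object):
      def __init__(self, success, info, nodes):
          self.success = success  # did the iallocator succeed?
          self.info = info        # error text on failure
          self.nodes = nodes      # empty list on failure, per the design

  def _CheckEvacuateResult(result):
      if not result.success:
          # Report the iallocator's own error instead of a misleading
          # complaint about differing node groups.
          raise RuntimeError("Can't compute nodes using iallocator: %s"
                             % result.info)
      # Only now is it meaningful to validate the returned node groups.
      return result.nodes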
-
- 05 Aug, 2011 4 commits
Michael Hanselmann authored
It is no longer used and has been deprecated in 2.5.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
… instead of getting the list of instances once again from the configuration.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- 04 Aug, 2011 5 commits
Apollon Oikonomopoulos authored
Remove the 15-second sleep when wait_for_sync is not set. LUInstanceCreate already calls _WaitForSync with oneshot=True, which performs an internal wait loop for disks to start syncing.
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Iustin Pop authored
lu.glm.list_owned becomes lu.owned_locks, which is clearer for the reader. Three variables (previously named owned_locks) are also renamed to make it clearer what they track.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
-
Michael Hanselmann authored
This is quite similar to evacuating a group, but the locking is different.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
All potential target nodes should be locked while calculating a group evacuation.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-