- Jan 19, 2012
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Iustin Pop authored
This would have caught the bug in the first place. Argh, hand-generated test cases! Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Nikos Skalkotos authored
Fix bug affecting command line options of "keyval" type. Although escaping commands with \ is supported, it is is not applied to the input recursively. Signed-off-by:
Nikos Skalkotos <skalkoto@grnet.gr> Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Jan 18, 2012
-
-
Michael Hanselmann authored
Python's “optparse” module does option name prefix matching by default. Since this can lead to confusing behaviour, e.g. by specifying “--force” for a command which only has a “--force-multi” option, this patch disables the prefix matching. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- Jan 06, 2012
-
-
Guido Trotter authored
* stable-2.5: KVM: support version reported by 1.0 Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Andrea Spadaccini <spadaccio@google.com>
-
Guido Trotter authored
This of course was working for all the rcs, but broke with 1.0 itself. In addition: - split between running kvm --version and parsing its output - unittest parsing for various known --help outputs - updated NEWS file - happy 2012 wishes - the hope to finish this patch before it's time to say happy easter :) Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Dec 22, 2011
-
-
Michael Hanselmann authored
Also mention that archived jobs can be viewed using “gnt-job info”. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
* stable-2.5: jqueue: Fix epylint errors introduced in 37d76f1e jqueue: Fix deadlock between job queue and dependency manager Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
This allows for more unittesting. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Also add a parameter for priority, to be used in an upcoming patch. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Dec 21, 2011
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
When an opcode is about to be processed its dependencies are evaluated using “_JobDependencyManager.CheckAndRegister”. Due to its nature that function requires a lock on the manager's internal structures. All of this happens while the job queue lock is held in shared mode (required for the job processor). When a job has been processed any pending dependencies are re-added to the job workerpool. Before this patch that would require the manager's lock and then, for adding the jobs, the job queue lock. Since this is in reverse order it will lead to deadlocks. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Dec 19, 2011
-
-
Michael Hanselmann authored
These help when debugging. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Andrea Spadaccini <spadaccio@google.com>
-
- Dec 08, 2011
-
-
Michael Hanselmann authored
Instead, print a nicer error message. This should fix issue 200. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
If the PID file is already locked by another process, try to read the content and report it as part of the error message. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
If a “lost+found” directory is found at a filesystem's root path it is ignored. In all other cases directory entries named “lost+found” are treated normally. Unittests are included. Fixes issue 153. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Nov 30, 2011
-
-
Michael Hanselmann authored
In this test the “file ID” of a temporary file is compared against the file ID gathered via an open file descriptor to the same file. For reasons unknown to me utime(2) is called in-between to update the inode's a- and mtime. Depending on the file system's timestamp resolution this can lead to a different file ID. Found by chance during QA and reproduced by adding a delay before the call to utime(2). Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Nov 24, 2011
-
-
Michael Hanselmann authored
* devel-2.4: ConfigWriter: Fix epydoc error Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Michael Hanselmann authored
* stable-2.5: ConfigWriter: Fix epydoc error Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Andrea Spadaccini <spadaccio@google.com>
-
Michael Hanselmann authored
The parameter is called “mods”, not “modes”. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Andrea Spadaccini <spadaccio@google.com> (cherry picked from commit 1730d4a1)
-
Michael Hanselmann authored
The parameter is called “mods”, not “modes”. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Andrea Spadaccini <spadaccio@google.com>
-
Michael Hanselmann authored
* devel-2.4: LUGroupAssignNodes: Fix node membership corruption Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Michael Hanselmann authored
* stable-2.5: LUGroupAssignNodes: Fix node membership corruption Fix pylint warning on unreachable code LUNodeEvacuate: Disallow migrating all instances at once LUNodeEvacuate: Locking fixes Fix error when removing node Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Michael Hanselmann authored
Note: This bug only manifests itself in Ganeti 2.5, but since the problematic code also exists in 2.4, I decided to fix it there. If a node was assigned to a new group using “gnt-group assign-nodes” the node object's group would be changed, but not the duplicate member list in the group object. The latter is an optimization to require fewer locks for other operations. The per-group member list is only kept in memory and not written to disk. Ganeti 2.5 starts to make use of the data kept in the per-group member list and consequently fails when it is out of date. The following commands can be used to reproduce the issue in 2.5 (in 2.4 the issue was confirmed using additional logging): $ gnt-group add foo $ gnt-group assign-nodes foo $(gnt-node list --no-header -o name) $ gnt-cluster verify # Fails with KeyError This patch moves the code modifying node and group objects into “config.ConfigWriter” to do the complete operation under the config lock, and also to avoid making use of side-effects of modifying objects without calling “ConfigWriter.Update”. A unittest is included. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com> (cherry picked from commit 218f4c3d)
-
Michael Hanselmann authored
Note: This bug only manifests itself in Ganeti 2.5, but since the problematic code also exists in 2.4, I decided to fix it there. If a node was assigned to a new group using “gnt-group assign-nodes” the node object's group would be changed, but not the duplicate member list in the group object. The latter is an optimization to require fewer locks for other operations. The per-group member list is only kept in memory and not written to disk. Ganeti 2.5 starts to make use of the data kept in the per-group member list and consequently fails when it is out of date. The following commands can be used to reproduce the issue in 2.5 (in 2.4 the issue was confirmed using additional logging): $ gnt-group add foo $ gnt-group assign-nodes foo $(gnt-node list --no-header -o name) $ gnt-cluster verify # Fails with KeyError This patch moves the code modifying node and group objects into “config.ConfigWriter” to do the complete operation under the config lock, and also to avoid making use of side-effects of modifying objects without calling “ConfigWriter.Update”. A unittest is included. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Commit c50452c3 added an exception when all instances should be evacuated off a node, but did so in a way which made pylint complain about unreachable code. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Nov 23, 2011
-
-
Michael Hanselmann authored
There is a design issue in the iallocator interface which prevents us from doing this. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Michael Hanselmann authored
Until now the iallocator constants for node evacuation (IALLOCATOR_NEVAC_*) were also used for the opcode. However, it turned out this was due to a misunderstanding and is incorrect. This patch adds new constants (with the same values) and changes the affected places. Fortunately the RAPI client already used good names, so no changes are necessary. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com> Signed-off-by:
Michael Hanselmann <hansmi@google.com>
-
Michael Hanselmann authored
When evacuating a node, only an assertion without informative text was used to check if the necessary node locks had been acquired. This was on top of evaluating the list of nodes without having a node group lock, so this was changed as well. Also update some exception messages to include “retry the operation”. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
ConfigWriter.GetAllInstancesInfo returns a dictionary, not a list. Removing a node would fail with “too many values to unpack”. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- Nov 18, 2011
-
-
Michael Hanselmann authored
* stable-2.5: htools: rework message display construction hbal: handle empty node groups Document OpNodeMigrate's result for RAPI Fail if node/group evacuation can't evacuate instances LUInstanceRename: Compare name with name LUClusterRepairDiskSizes: Acquire instance locks in exclusive mode Update NEWS for 2.5.0~rc4 Bump version to 2.5.0~rc4 jqueue: Allow zero jobs to be submitted at once hail: don't select the primary as new secondary hail: add an extra safety check in relocate Bump version to 2.5.0~rc3 Conflicts: configure.ac: Trivial Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Michael Hanselmann authored
* devel-2.4: Ensure unused ports return to the free port pool Re-wrap a paragraph to eliminate a sphinx warning Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- Nov 17, 2011
-
-
Michael Hanselmann authored
After iallocator ran we can release any unused node locks. Since they must be in exclusive mode this should improve parallelization during instance creation. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Nov 16, 2011
-
-
Iustin Pop authored
While diagnosing some (unrelated) memory usage in htools, I've stumbled upon some very bad behaviour in checkData: mapAccum is non-strict, and the tuple we use also, so that results in the list of list of messages being very bad space-wise (hundreds of MB of memory for a simulated cluster with thousands of nodes, all with errors). The new, explicit reuse of the old message list has a linear memory behaviour. The only downside is that messages are listed in the reverse order (which I'll fix on master). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This patch changes an internal assert (which can only be triggered when a node group is empty) into properly handling this case (and returning empty node/instance lists). While we could handle this in the backend (Cluster.splitNodeGroup) this would actually mean than we change the behaviour for a cluster with just two node groups, once of which is empty (where today we don't require a node group argument). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Nov 15, 2011
-
-
Michael Hanselmann authored
- Commit b7a1c816 changed the LU to generate jobs - Mention documented results in NEWS Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Nov 14, 2011
-
-
Vangelis Koukis authored
Ensure ports previously allocated by calling ConfigWriter's AllocatePort() are returned to the pool of free ports when no longer needed: * Return the network_port of an instance when it is removed * Return the port used by a DRBD-based disk when it is removed Signed-off-by:
Vangelis Koukis <vkoukis@grnet.gr> Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Iustin Pop authored
This just makes sure that the paragraph doesn't contains lines that start with :, which make Sphinx (1.0.7) complain. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- Nov 10, 2011
-
-
Guido Trotter authored
These are triggered by our "stay-compatible" approach. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-