- May 11, 2012
-
-
Iustin Pop authored
The vgreduce command has changed behaviour from when we initially wrote the code (2.02.02 versus 2.02.66, 4 years delta):

  - if there are LVs which will be impacted, it requires --force
  - otherwise refuses to proceed, but it still returns exit code 0

We handle this by looking to see if it returns "Wrote out consistent volume group" (behaviour unchanged), or if it complains about "--force"; in the case it didn't complete, we retry the operation.

We also improve the checking of "vgs" a bit, as it used to fail silently and we didn't detect it.

New tests for this function should test, I believe, all the expected variations; at the least we now have data files with the expected output.

Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com> (cherry picked from commit 048eeb2b) Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
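The output check described above could look roughly like the following sketch. It is not the Ganeti code: the function names, the `run_fn` command runner and the choice to retry by appending `--force` are assumptions made for illustration; only the two marker strings come from the commit message.

```python
def classify_vgreduce_output(stdout, stderr=""):
    """Classify vgreduce output by its text, since exit code 0 is not reliable."""
    text = stdout + "\n" + stderr
    if "Wrote out consistent volume group" in text:
        return "done"
    if "--force" in text:
        return "needs_force"
    return "unknown"


def reduce_with_retry(run_fn, base_args):
    """Run the command once and retry if it did not complete.

    run_fn is a hypothetical callable taking an argument list and returning
    (stdout, stderr). Retrying with "--force" appended is an assumption.
    """
    outcome = classify_vgreduce_output(*run_fn(base_args))
    if outcome == "needs_force":
        outcome = classify_vgreduce_output(*run_fn(base_args + ["--force"]))
    return outcome
```

Injecting the runner keeps the parsing testable against stored output files, which is the kind of data-file test the message mentions.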
-
- May 09, 2012
-
-
Iustin Pop authored
Commit e687ec01 (present in 2.5 since the 2.5 beta 3) did consistency fixes across the code-base. Unfortunately this was done without enough checks on the actual meaning of one of the fixes, which means error re-raising in lib/errors.py is broken.

The problem is that:

  raise cls, args

is different than:

  raise cls(args)

And our unit-tests didn't catch this (this patch updates the tests).

This breakage is usually trivial, like wrong error messages:

  $ gnt-instance remove no-such-instance
  Failure: prerequisites not met for this operation: ("Instance 'no-such-instance' not known", 'unknown_entity')

versus:

  $ gnt-instance remove no-such-instance
  Failure: prerequisites not met for this operation: error type: unknown_entity, error details: Instance 'no-such-instance' not known

or:

  $ gnt-instance add … no-such-instance
  Failure: prerequisites not met for this operation: ('The given name (no-such-instance) does not resolve: Name or service not known', 'resolver_error')

versus:

  $ gnt-instance add … no-such-instance
  Failure: prerequisites not met for this operation: error type: resolver_error, error details: The given name (no-such-instance) does not resolve: Name or service not known

But in some cases where we rely on a certain data representation (e.g. HooksAbort), this actually breaks because we try to iterate over the wrong type:

  File "/usr/lib/python2.6/dist-packages/ganeti/cli.py", line 1907, in FormatError
    for node, script, out in err.args[0]:
  ValueError: need more than 1 value to unpack

Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
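The difference between the two forms can be reproduced in a few lines. This sketch uses Python 3, where the statement `raise cls, args` no longer parses, so the Python 2 behaviour is spelled as `cls(*args)`. The class below is a local stand-in, not the one from lib/errors.py, and the helper names are hypothetical.

```python
class OpPrereqError(Exception):
    """Local stand-in for an error class; not imported from Ganeti."""


def rebuild_wrapped(cls, args):
    # Equivalent of "raise cls(args)": the whole tuple becomes ONE argument.
    return cls(args)


def rebuild_unpacked(cls, args):
    # Equivalent of the Python 2 statement "raise cls, args" when args is a
    # tuple: each element becomes a separate constructor argument.
    return cls(*args)


args = ("Instance 'no-such-instance' not known", "unknown_entity")
wrapped = rebuild_wrapped(OpPrereqError, args)
unpacked = rebuild_unpacked(OpPrereqError, args)

print(len(wrapped.args), len(unpacked.args))
```

With the wrapped form, code that expects `err.args[0]` to be the first original argument instead receives the whole tuple, which is the kind of mismatch behind the unpacking error quoted above.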
-
- Jan 06, 2012
-
-
Guido Trotter authored
This of course was working for all the rcs, but broke with 1.0 itself. In addition: - split between running kvm --version and parsing its output - unittest parsing for various known --help outputs - updated NEWS file - happy 2012 wishes - the hope to finish this patch before it's time to say happy easter :) Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Dec 21, 2011
-
-
Michael Hanselmann authored
When an opcode is about to be processed its dependencies are evaluated using “_JobDependencyManager.CheckAndRegister”. Due to its nature that function requires a lock on the manager's internal structures. All of this happens while the job queue lock is held in shared mode (required for the job processor). When a job has been processed any pending dependencies are re-added to the job workerpool. Before this patch that would require the manager's lock and then, for adding the jobs, the job queue lock. Since this is in reverse order it will lead to deadlocks. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
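One common way to avoid this kind of lock-order inversion is to collect the waiting jobs while holding the manager's lock, release it, and only then call into the queue. The sketch below illustrates that idea under assumptions: the class, its methods and the `enqueue_fn` callback are hypothetical, and the commit message does not say that the actual patch works this way.

```python
import threading


class DependencyManager:
    """Illustrative only; not the _JobDependencyManager from Ganeti."""

    def __init__(self, enqueue_fn):
        self._lock = threading.Lock()
        self._waiting = {}  # job ID -> set of job IDs waiting for it
        self._enqueue_fn = enqueue_fn  # may acquire the job queue lock

    def register(self, dep_job_id, waiter_job_id):
        with self._lock:
            self._waiting.setdefault(dep_job_id, set()).add(waiter_job_id)

    def notify_waiters(self, job_id):
        # Step 1: take what we need under the manager's lock.
        with self._lock:
            jobs = self._waiting.pop(job_id, set())
        # Step 2: the manager's lock is released here, so the callback can
        # take the queue lock without the two locks ever being nested in
        # reverse order.
        if jobs:
            self._enqueue_fn(sorted(jobs))
```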
-
- Nov 30, 2011
-
-
Iustin Pop authored
This would have caught the bug in the first place. Argh, hand-generated test cases! Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- Nov 24, 2011
-
-
Michael Hanselmann authored
Note: This bug only manifests itself in Ganeti 2.5, but since the problematic code also exists in 2.4, I decided to fix it there.

If a node was assigned to a new group using “gnt-group assign-nodes” the node object's group would be changed, but not the duplicate member list in the group object. The latter is an optimization to require fewer locks for other operations. The per-group member list is only kept in memory and not written to disk.

Ganeti 2.5 starts to make use of the data kept in the per-group member list and consequently fails when it is out of date. The following commands can be used to reproduce the issue in 2.5 (in 2.4 the issue was confirmed using additional logging):

  $ gnt-group add foo
  $ gnt-group assign-nodes foo $(gnt-node list --no-header -o name)
  $ gnt-cluster verify  # Fails with KeyError

This patch moves the code modifying node and group objects into “config.ConfigWriter” to do the complete operation under the config lock, and also to avoid making use of side-effects of modifying objects without calling “ConfigWriter.Update”. A unittest is included.

Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com> (cherry picked from commit 218f4c3d)
-
- Nov 08, 2011
-
-
Michael Hanselmann authored
If an instance can't be evacuated, only a message would be printed. With this change the operation always aborts. Newly added unittests check for this behaviour. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Oct 18, 2011
-
-
Michael Hanselmann authored
Commit e1f23243 changed the LU and opcode for node evacuation to receive a “mode” parameter (among other things). Commit de40437a changed the RAPI code accordingly, but did so for an earlier version of the first patch. Obviously this couldn't work, so here's the fix. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- Oct 17, 2011
-
-
Michael Hanselmann authored
Commit d1c172de inadvertently changed the “/2/instances/[instance_name]/replace-disks” resource to use body parameters. There were no QA tests and the issue wasn't noticed. This patch re-introduces support for query parameters and adds a QA test. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Andrea Spadaccini <spadaccio@google.com>
-
- Oct 04, 2011
-
-
Michael Hanselmann authored
If a cluster has any non-master-candidate nodes, those don't contain all files (e.g. config.data). With commit aef59ae7 (March 31st, 2011) the logic was changed and subsequently verifying a cluster with non-mc nodes would complain. This patch fixes this issue by changing the algorithm. It also adds an additional check for files which shouldn't exist on a machine. A newly added unittest is included. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Sep 30, 2011
-
-
Michael Hanselmann authored
When verifying a group the code would always check SSH to all nodes in the same group, as well as the first node for every other group. On big clusters this can cause issues since many nodes will try to connect to the first node of another group at the same time. This patch changes the algorithm to choose a different node every time. A unittest for the selection algorithm is included. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
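A selection scheme of this kind can be sketched with one round-robin iterator per foreign group, so that consecutive nodes of the verified group are pointed at different remote nodes. The function name and data layout below are assumptions for illustration and may differ from the algorithm the patch actually uses.

```python
import itertools


def select_ssh_check_nodes(own_group_nodes, other_groups):
    """Pick, for each node of the verified group, one node per other group.

    own_group_nodes: list of node names in the group being verified.
    other_groups: dict mapping group name -> list of node names.
    Returns a dict mapping each own node to the remote nodes it should check.
    """
    # One endless round-robin iterator per non-empty foreign group.
    cyclers = [itertools.cycle(sorted(nodes))
               for _, nodes in sorted(other_groups.items()) if nodes]
    return {name: [next(c) for c in cyclers]
            for name in sorted(own_group_nodes)}
```

Spreading the targets this way means no single node of another group receives connections from every node at the same time.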
-
- Aug 26, 2011
-
-
Michael Hanselmann authored
If a value passed to UnescapeAndSplit ended with a backslash an exception would be raised:

  $ gnt-instance modify -H mem=x\\ inst1.example.com
  […]
  e2 = slist.pop(0)
  IndexError: pop from empty list

Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
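To show the edge case, here is a small character-by-character splitter that tolerates a trailing backslash. It is a from-scratch sketch, not Ganeti's UnescapeAndSplit: the function name is hypothetical, and keeping a lone trailing backslash as a literal character is an assumption about the desired behaviour.

```python
def unescape_and_split(text, sep=","):
    """Split text on sep, honouring backslash-escaped characters."""
    parts = []
    current = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            # Escaped character: keep it literally and skip the backslash.
            current.append(text[i + 1])
            i += 2
        elif ch == sep:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            # Ordinary character, or a backslash with nothing after it.
            current.append(ch)
            i += 1
    parts.append("".join(current))
    return parts
```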
-
- Aug 23, 2011
-
-
René Nussbaumer authored
Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
René Nussbaumer authored
Linux kernel commit 4b0715f096 introduced a display bug into /proc/drbd which broke our regex. The bug was first introduced in Linux 2.6.39-rc1 and is still unfixed as of today. This patch adapts the regular expression to work around this bug for the time being. Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Aug 19, 2011
-
-
Michael Hanselmann authored
This avoids many calls to chmod(2) and chown(2), and thereby ctime updates. Since I had to update the unittests anyway I untangled the code a bit, split it into more separate functions and added some more tests. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
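The idea of skipping redundant calls can be sketched as "stat first, change only on mismatch". The helper below is hypothetical and covers only the mode; the injectable `_lstat_fn` and `_chmod_fn` parameters are assumptions that make the behaviour easy to unit-test.

```python
import os
import stat


def ensure_mode(path, mode, _lstat_fn=os.lstat, _chmod_fn=os.chmod):
    """Set the permission bits of path only if they differ from mode.

    Returns True if chmod was called, False if the call was skipped.
    """
    current = stat.S_IMODE(_lstat_fn(path).st_mode)
    if current == mode:
        # Nothing to do; skipping chmod(2) also avoids a ctime update.
        return False
    _chmod_fn(path, mode)
    return True
```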
-
Michael Hanselmann authored
The “_stat_fn” function is renamed to “_lstat_fn” to reflect its function. The try/except block just wraps calling lstat(2) and nothing else. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
- Aug 12, 2011
-
-
Michael Hanselmann authored
Instead of a rather complicated expression only “JobId” is output. Job ID lists (such as those generated by “SubmitManyJobs”) are limited to two-item lists. Unittests are added. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
- Aug 11, 2011
-
-
Andrea Spadaccini authored
Added the following functions to netutils:

  - IsValidInterface
  - GetInterfaceIpAddresses
  - _GetIpAddressesFromIpOutput

Added the following static methods to netutils.IPAddress:

  - GetAddressFamilyFromVersion
  - GetVersionFromAddressFamily

Added unit tests for the new methods in netutils.IPAddress, for the IP address search regex and for GetInterfaceIpAddresses.

Signed-off-by:
Andrea Spadaccini <spadaccio@google.com> Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Aug 09, 2011
-
-
Michael Hanselmann authored
Commit 1ffd2673 added support for tagging node groups. Also add a check for exposed fields. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
- Aug 08, 2011
-
-
Michael Hanselmann authored
Short: this patch enables the use of “gnt-instance list '*.site'”. Detailed description: This patch changes the command line interface code to try to deduce the kind of filter from the arguments to a “list” command. If it's a list of plain names an old-style name filter is used. If filtering is forced or the single argument is potentially a filter, it is parsed as a query filter string. Any name looking like a globbing pattern (e.g. “*.site” or “web?.example.com”) is treated as such. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
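The deduction described above can be approximated as follows. This is a sketch under assumptions: the regular expressions, the return values and the function name are invented for illustration, and the real code's rule for what counts as "potentially a filter" is not given in the message.

```python
import re

# Assumed character sets; the real rules may differ.
_PLAIN_NAME = re.compile(r"^[a-zA-Z0-9_.-]+$")
_GLOB_PATTERN = re.compile(r"^[a-zA-Z0-9_.*?-]+$")


def classify_list_args(args, force_filter=False):
    """Return (kind, value) describing how "list" arguments would be used."""
    if not args:
        return ("all", None)
    if len(args) == 1:
        (arg,) = args
        if not force_filter and _PLAIN_NAME.match(arg):
            return ("names", [arg])
        if not force_filter and _GLOB_PATTERN.match(arg):
            return ("glob", arg)
        # Forced, or contains characters a name or glob would not have.
        return ("filter", arg)
    if force_filter:
        raise ValueError("a filter must be given as a single argument")
    return ("names", list(args))
```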
-
- Aug 05, 2011
-
-
Michael Hanselmann authored
The operators “=*” and “!*” do globbing in filters, e.g.:

  $ gnt-instance list --no-headers -o name 'name =* "*.site"'
  inst1.site.example.com

Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
This will be used by the watcher to store the file's fstat(2). It must be done from the filehandle. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Aug 04, 2011
-
-
Iustin Pop authored
We just need an object that has a list_owned method. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Michael Hanselmann authored
Watcher state files can stay around if node groups are removed. With this patch they're removed after 21 days. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
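An age-based cleanup like this can be sketched as a pure selection step followed by removal. The helper name is hypothetical, and using the file's modification time as the age criterion is an assumption; the message only states the 21-day limit.

```python
import os
import time


def find_stale_files(paths, max_age_days=21, now=None, _stat_fn=os.stat):
    """Return the paths whose modification time is older than max_age_days."""
    if now is None:
        now = time.time()
    cutoff = now - max_age_days * 24 * 3600
    return [path for path in paths if _stat_fn(path).st_mtime < cutoff]
```

A caller would then pass each returned path to `os.remove`, ideally ignoring files that have disappeared in the meantime.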
-
- Aug 02, 2011
-
-
Michael Hanselmann authored
These will be very useful for ganeti-watcher as it needs to retrieve instances by group. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Jul 26, 2011
-
-
Michael Hanselmann authored
This adds the infrastructure necessary to check opcode results using ht-based functions. Checks are added for two opcodes. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Jul 25, 2011
-
-
Michael Hanselmann authored
Commit b6fa9a44 added a re-openable log handler. The log file is reopened when a daemon is sent a HUP signal. Due to a bug in the code, fixed by this patch, the log file would be reopened for every single log message thereafter. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
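The failure mode is easy to picture with a flag-based handler: if the "reopen requested" flag is never cleared, every later message triggers another reopen. The sketch below builds on the standard `logging.FileHandler`; the class and method names are hypothetical and this is not the handler from commit b6fa9a44.

```python
import logging


class ReopenableFileHandler(logging.FileHandler):
    """File handler that reopens its file on request (e.g. after SIGHUP)."""

    def __init__(self, filename, mode="a", encoding=None):
        super().__init__(filename, mode, encoding)
        self._reopen = False

    def request_reopen(self):
        # Typically called from a signal handler; the actual reopen is
        # deferred until the next message is emitted.
        self._reopen = True

    def emit(self, record):
        if self._reopen:
            if self.stream is not None:
                self.stream.close()
            self.stream = self._open()
            # Clearing the flag is the important part: without it the file
            # would be reopened for every single message from now on.
            self._reopen = False
        super().emit(record)
```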
-
- Jul 21, 2011
-
-
Michael Hanselmann authored
No idea why this was missed before. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
This makes them visible to the user. Example:

  $ gnt-debug locks -o name,pending
  Name    Pending
  job/890 job:891,892
  job/892 job:894

Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
With this change it'll be possible to register other lock information providers. One use case for this is job dependencies, which can be shown in the output of “gnt-debug locks”, too. The lock monitor is changed to accept more than one return value from the function providing the information. Unfortunately it's hard to keep weak references to bound methods, so I settled on keeping a weak reference on the object instead (see note in docstring). Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
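The remark about bound methods can be illustrated with the standard `weakref` module: a bound method object is created afresh on each attribute access, so a plain weak reference to it has nothing to keep it alive. The sketch below keeps weak references to provider objects instead. The class and the `get_lock_info` method name are hypothetical, not the names used in Ganeti's lock monitor.

```python
import weakref


class LockMonitor:
    """Collects lock information from registered provider objects."""

    def __init__(self):
        # Weak references to the provider OBJECTS, not to their bound
        # methods; a provider vanishes from the set once it is collected.
        self._providers = weakref.WeakSet()

    def register(self, provider):
        self._providers.add(provider)

    def query(self):
        result = []
        for provider in list(self._providers):
            # Each provider may contribute more than one entry.
            result.extend(provider.get_lock_info())
        return result
```

In CPython, `weakref.ref(obj.method)()` returns None straight away because the temporary bound method is freed immediately, which is why referencing the object is the simpler choice here.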
-
Michael Hanselmann authored
This patch renames the {JOB,OP}_STATUS_WAITLOCK constants to {JOB,OP}_STATUS_WAITING, as per design document for chained jobs. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
When jobs waiting for a dependency are notified, they're re-added to the queue. This would require owning the queue lock in exclusive mode, but since the function doing so is called from within the job/opcode processor, it only holds the lock in shared mode. This patch changes the result of the processor from a boolean to a status value (integer). This way the caller can be notified about actions to take, including notifying waiting jobs. The function adding jobs to the queue can now acquire the lock in exclusive mode. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
With this change users of the “SubmitManyJobs” interface can use relative job dependencies. Relative job IDs in dependencies are resolved before handing the job off to the workerpool. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
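Resolving relative dependencies amounts to translating positions within one submission into the job IDs that were actually assigned. The sketch below assumes that relative IDs are negative integers counting back from the current job (so -1 means the job submitted just before it); that convention, the function name and the data layout are assumptions for illustration.

```python
def resolve_relative_deps(job_ids, deps_per_job):
    """Replace relative job IDs in dependencies with absolute ones.

    job_ids: absolute IDs assigned to the submitted jobs, in order.
    deps_per_job: for each job, a list of (job_id, statuses) tuples, where
        a negative job_id is relative to the job's own position.
    """
    resolved = []
    for index, deps in enumerate(deps_per_job):
        job_deps = []
        for dep_id, statuses in deps:
            if dep_id < 0:
                target = index + dep_id
                if target < 0:
                    raise ValueError("relative job ID %d points before the "
                                     "first submitted job" % dep_id)
                dep_id = job_ids[target]
            job_deps.append((dep_id, statuses))
        resolved.append(job_deps)
    return resolved
```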
-
- Jul 20, 2011
-
-
Michael Hanselmann authored
Basically only one instance of the job, the one being processed, should be serialized to disk and replicated to other nodes. With this flag assertions can be added in various places. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
An overview is available in the design document for this change, doc/design-chained-jobs.rst. When a job enters the job processor, the current opcode's dependencies are evaluated. If a referenced job has not yet reached the desired status, the current job is registered as a dependant. The job processor will continue to work on other pending tasks. When a job finishes it notifies any pending dependants by re-adding them to the workerpool. A per-job processor lock is necessary for rare cases where the same job can be re-added twice. There is no way to view waiting jobs at the moment, but I plan to export this information to “gnt-debug locks”. A so-called dependency manager takes care of managing waiting jobs and keeping track of their status. Unittests are included. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
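The check-then-register step can be sketched as a small decision function. Everything here is an illustrative assumption rather than the code from the patch: the class name, the result strings, the set of final statuses and the `get_status_fn` callback.

```python
FINAL_STATUSES = frozenset(["success", "error", "canceled"])  # assumed names


class DependencyTracker:
    """Decides whether a job may continue or must wait for another job."""

    def __init__(self, get_status_fn):
        self._get_status_fn = get_status_fn  # job ID -> current status
        self._waiters = {}  # job ID -> set of job IDs waiting for it

    def check_and_register(self, job_id, dep_job_id, wanted_statuses):
        status = self._get_status_fn(dep_job_id)
        if status in wanted_statuses:
            return "continue"
        if status in FINAL_STATUSES:
            # The dependency finished, but not in an acceptable state.
            return "wrong_status"
        # Not finished yet: remember the job so it can be re-added later.
        self._waiters.setdefault(dep_job_id, set()).add(job_id)
        return "wait"

    def pop_waiters(self, dep_job_id):
        """Return and forget the jobs waiting for dep_job_id."""
        return sorted(self._waiters.pop(dep_job_id, set()))
```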
-
- Jul 15, 2011
-
-
Stephen Shirley authored
The wrapper will connect to the console, and check in the background if the instance is paused, unpausing it as necessary. Signed-off-by:
Stephen Shirley <diamond@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Jul 12, 2011
-
-
Michael Hanselmann authored
This patch changes cli.GetOnlineNodes to use query2, which does the filtering in the master daemon, and adds a new parameter to filter by node group. Unittests were added for the old implementation and then adapted to ensure no functionality was lost. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-