- Jul 25, 2011
-
-
Michael Hanselmann authored
Commit b6fa9a44 added a re-openable log handler. The log file is reopened when a daemon is sent a HUP signal. Due to a bug in the code, fixed by this patch, the log file would be reopened for every single log message thereafter. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
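The class of bug described above can be sketched as follows. This is a minimal illustration, not Ganeti's handler: the class name, the `request_reopen` method and the `reopen_count` attribute are hypothetical. The point is that the "reopen requested" flag has to be cleared once the file has been reopened; otherwise every later message triggers another reopen.

```python
import logging


class ReopenableFileHandler(logging.Handler):
    """Hypothetical handler that reopens its log file once per request."""

    def __init__(self, filename):
        super().__init__()
        self._filename = filename
        self._stream = open(filename, "a")
        self._reopen = False
        self.reopen_count = 0  # only here to make the behaviour observable

    def request_reopen(self):
        # Would typically be called from a SIGHUP handler.
        self._reopen = True

    def emit(self, record):
        if self._reopen:
            self._stream.close()
            self._stream = open(self._filename, "a")
            self.reopen_count += 1
            # Without this reset the file would be reopened for every
            # single message after the first request.
            self._reopen = False
        self._stream.write(self.format(record) + "\n")
        self._stream.flush()

    def close(self):
        self._stream.close()
        super().close()
```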
-
- Jul 21, 2011
-
-
Michael Hanselmann authored
No idea why this was missed before. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
This makes them visible to the user. Example:

  $ gnt-debug locks -o name,pending
  Name    Pending
  job/890 job:891,892
  job/892 job:894

Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
With this change it'll be possible to register other lock information providers. One use case for this is job dependencies, which can be shown in the output of “gnt-debug locks”, too. The lock monitor is changed to accept more than one return value from the function providing the information. Unfortunately it's hard to keep weak references to bound methods, so I settled on keeping a weak reference to the object instead (see note in docstring). Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
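The weak-reference problem mentioned above can be illustrated with a short sketch. The `Provider` class, its `get_lock_info` method and the `collect` helper are hypothetical names. A bound method object is created anew on each attribute access, so a weak reference to it usually dies immediately (in CPython), whereas a weak reference to the owning object lives as long as the object does.

```python
import weakref


class Provider:
    """Hypothetical lock information provider."""

    def get_lock_info(self):
        return [("job/890", "job:891,892")]


provider = Provider()

# The temporary bound method has no other referent, so in CPython this
# weak reference is already dead after the statement completes.
method_ref = weakref.ref(provider.get_lock_info)

# Referencing the object itself works as long as the object is alive.
object_ref = weakref.ref(provider)


def collect(ref):
    """Return lock info from the provider, or nothing if it is gone."""
    obj = ref()
    if obj is None:
        return []
    return obj.get_lock_info()
```

The standard library's `weakref.WeakMethod` is another way to handle this case in newer Python versions.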
-
Michael Hanselmann authored
This patch renames the {JOB,OP}_STATUS_WAITLOCK constants to {JOB,OP}_STATUS_WAITING, as per design document for chained jobs. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
When jobs waiting for a dependency are notified, they're re-added to the queue. This would require owning the queue lock in exclusive mode, but since the function doing so is called from within the job/opcode processor, it only holds the lock in shared mode. This patch changes the result of the processor from a boolean to a status value (integer). This way the caller can be notified about actions to take, including notifying waiting jobs. The function adding jobs to the queue can now acquire the lock in exclusive mode. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
With this change users of the “SubmitManyJobs” interface can use relative job dependencies. Relative job IDs in dependencies are resolved before handing the job off to the workerpool. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
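Resolving relative dependencies could look roughly like the sketch below. It assumes that a relative job ID is a negative integer counting backwards within the same submission (-1 being the job submitted just before the current one); the function name and the `(job_id, statuses)` tuple layout are hypothetical.

```python
def resolve_relative_deps(assigned_ids, index, deps):
    """Replace relative job IDs in deps with absolute ones.

    assigned_ids: absolute IDs given to the jobs of this submission, in order
    index: position of the current job within the submission
    deps: list of (job_id, statuses) tuples; job_id < 0 is taken as relative
    """
    resolved = []
    for dep_id, statuses in deps:
        if dep_id < 0:
            target = index + dep_id
            if target < 0:
                raise ValueError("relative job ID %d points before the "
                                 "first job of the submission" % dep_id)
            dep_id = assigned_ids[target]
        resolved.append((dep_id, statuses))
    return resolved
```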
-
- Jul 20, 2011
-
-
Michael Hanselmann authored
Basically only one instance of the job, the one being processed, should be serialized to disk and replicated to other nodes. With this flag assertions can be added in various places. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
An overview is available in the design document for this change, doc/design-chained-jobs.rst. When a job enters the job processor, the current opcode's dependencies are evaluated. If a referenced job has not yet reached the desired status, the current job is registered as a dependant. The job processor will continue to work on other pending tasks. When a job finishes it notifies any pending dependants by re-adding them to the workerpool. A per-job processor lock is necessary for rare cases where the same job can be re-added twice. There is no way to view waiting jobs at the moment, but I plan to export this information to “gnt-debug locks”. A so-called dependency manager takes care of managing waiting jobs and keeping track of their status. Unittests are included. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
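The register-and-notify flow described above can be sketched in a few lines. This is a simplified, single-threaded illustration with hypothetical names; it leaves out the per-job processor lock and any persistence.

```python
from collections import defaultdict


class DependencyManager:
    """Hypothetical, simplified tracker of jobs waiting on other jobs."""

    def __init__(self, get_status, enqueue):
        self._get_status = get_status  # callable: job ID -> status
        self._enqueue = enqueue        # callable: re-adds a job to the pool
        self._waiters = defaultdict(set)

    def check_and_register(self, job_id, dep_job_id, wanted_statuses):
        """Return True if the dependency is met, else register a dependant."""
        if self._get_status(dep_job_id) in wanted_statuses:
            return True
        self._waiters[dep_job_id].add(job_id)
        return False

    def notify_waiters(self, finished_job_id):
        """Re-add every job that was waiting for the finished job."""
        for job_id in sorted(self._waiters.pop(finished_job_id, set())):
            self._enqueue(job_id)
```

A re-added job is evaluated again when it re-enters the processor, so a dependant whose wanted status was not reached still gets a chance to react.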
-
- Jul 15, 2011
-
-
Stephen Shirley authored
The wrapper will connect to the console, and check in the background if the instance is paused, unpausing it as necessary. Signed-off-by:
Stephen Shirley <diamond@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Jul 12, 2011
-
-
Michael Hanselmann authored
This patch changes cli.GetOnlineNodes to use query2, which does the filtering in the master daemon, and adds a new parameter to filter by node group. Unittests were added for the old implementation and then adapted to ensure no functionality was lost. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Jul 11, 2011
-
-
Michael Hanselmann authored
Places which receive floats can usually also deal with integers, e.g. OpTestDelay. Tests are added and the new check function is used for the aforementioned opcode and verifying query results. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
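A check that accepts both floats and integers might be sketched as follows. The function names are hypothetical, and excluding booleans is a choice made for this sketch, not something stated in the commit.

```python
def is_float(val):
    """Strict check: only real floats pass."""
    return isinstance(val, float)


def is_number(val):
    """Relaxed check: integers and floats pass, booleans do not."""
    return isinstance(val, (int, float)) and not isinstance(val, bool)
```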
-
- Jul 05, 2011
-
-
Michael Hanselmann authored
The change is not backwards compatible, see the updated NEWS file. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
- Jun 10, 2011
-
-
René Nussbaumer authored
This includes its own simple cache implementation and an interface to a memcache instance. Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Jun 09, 2011
-
-
Guido Trotter authored
Keys generated under debian sid just read "BEGIN PRIVATE KEY" rather than "BEGIN RSA PRIVATE KEY". Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
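Accepting both header variants can be done with an optional group in the pattern. The following is only a sketch; the names are hypothetical and it is not the expression used in the patch.

```python
import re

# "RSA " is optional so that both header variants are recognized.
_PRIVATE_KEY_HEADER = re.compile(
    r"^-----BEGIN (?:RSA )?PRIVATE KEY-----$", re.MULTILINE)


def has_private_key(pem_text):
    """Return True if the text contains a private key header."""
    return _PRIVATE_KEY_HEADER.search(pem_text) is not None
```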
-
- Jun 03, 2011
-
-
Iustin Pop authored
Commit 66bd7445 changed the semantics of _JobProcessor on finished jobs, and updated the related unittests in the 2.4 branch. It was then merged to master; however, master had an additional test for this case, which was not updated. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- May 31, 2011
-
-
Michael Hanselmann authored
When a job was cancelled, its status would be changed and the file written again. Since this was a final status, the job file could be moved anytime for archival. If the job was still in the queue, however, it would be processed (not fully, just updating the “end_timestamp” attribute) and written again. This was bad as it could leave the same job in two different files. With this patch the processor is changed to return early for finished jobs. Cancelling a queued job will finalize it right away. Unittests are updated. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- May 30, 2011
-
-
Michael Hanselmann authored
Until now LUNodeMigrate used multiple tasklets to evacuate all primary instances on a node. In some cases it would acquire all node locks, which isn't good on big clusters. With upcoming improvements to the LUs for instance failover and migration, switching to separate jobs looks like a better option. This patch changes LUNodeMigrate to use LU-generated jobs. While working on this patch, I identified a race condition in LUNodeMigrate.ExpandNames. A node's instances were retrieved without a lock and no verification was done. For RAPI, a new feature string is added and can be used to detect clusters which support more parameters for node migration. The client is updated. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com> Signed-off-by:
Michael Hanselmann <hansmi@google.com>
-
- May 27, 2011
-
-
Michael Hanselmann authored
The check for container items is useful for tuples and/or lists with non-uniform values. The “anything” check can be used when any value should be accepted for an item. The job ID check, which uses the regexp check, will be used for expressing opcode dependencies on other jobs. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
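The three kinds of checks can be sketched as small composable functions. All names here are hypothetical, the job ID pattern is an assumption (a positive decimal number given as a string), and requiring the container length to match exactly is a simplification made for this sketch.

```python
import re


def t_any(_val):
    """Accept any value."""
    return True


def t_regex(pattern):
    """Build a check that accepts strings matching the pattern."""
    compiled = re.compile(pattern)
    return lambda val: isinstance(val, str) and compiled.match(val) is not None


def t_items(checks):
    """Build a check applying one function per container position."""
    def check(val):
        return (isinstance(val, (list, tuple)) and
                len(val) == len(checks) and
                all(fn(item) for fn, item in zip(checks, val)))
    return check


# Assumed format: a positive decimal number as a string.
t_job_id = t_regex(r"^[1-9][0-9]*$")
```

A dependency entry such as `("123", None)` could then be validated with `t_items([t_job_id, t_any])`.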
-
- May 24, 2011
-
-
René Nussbaumer authored
Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
René Nussbaumer authored
Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- May 20, 2011
-
-
Michael Hanselmann authored
Also check for the opcode ID. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Adeodato Simo authored
With this change, LUClusterVerifyConfig becomes a "light" LU that only verifies the global config and other, master-only settings, and the bulk of node/instance verification is done by LUClusterVerifyGroup, which only acts on nodes and instances of a given group. To ensure that `gnt-cluster verify` continues to operate on the whole cluster, the client creates an OpClusterVerifyGroup job per node group; for convenience, the list of node groups is returned by LUClusterVerifyConfig. Signed-off-by:
Adeodato Simo <dato@google.com> Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- May 17, 2011
-
-
Michael Hanselmann authored
This allows checking specific dictionary items, unlike TDict or TDictOf. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
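A per-key dictionary check might be sketched like this. The function name, the `required` and `exclusive` flags and the argument order are assumptions for illustration, not the signature added by the patch.

```python
def strict_dict(required, exclusive, items):
    """Build a check for dictionaries with per-key value checks.

    required: every key listed in items must be present
    exclusive: no keys other than those listed in items are allowed
    items: mapping of key -> check function for that key's value
    """
    def check(val):
        if not isinstance(val, dict):
            return False
        if required and any(key not in val for key in items):
            return False
        if exclusive and any(key not in items for key in val):
            return False
        return all(items[key](value)
                   for key, value in val.items() if key in items)
    return check
```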
-
- May 13, 2011
-
-
Michael Hanselmann authored
If a job needs to modify a resource and then wait for a result, it must acquire the resource lock in exclusive mode. In some cases it would be possible to only have a shared lock for waiting. Until now it was not possible to change a lock's mode once it'd been acquired. Releasing and re-acquiring might have been possible, but would require many more checks and can introduce new issues. With this patch a new method, named “downgrade”, is added to Ganeti's own SharedLock class. It can only be called when the lock is held in exclusive mode and changes it to shared. If there are any pending shared acquires on the same priority, they're moved to the front of the queue and notified (jumping ahead of exclusive acquires). In a lockset the internal lock will be downgraded if, and only if, all individual locks owned by the current thread are either released or acquired in shared mode. Unittests are provided. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
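The idea of downgrading can be shown with a much smaller lock than Ganeti's SharedLock. The sketch below is an assumption-laden simplification: it has no priorities, no queue ordering and no timeouts, and the class name is hypothetical. It only demonstrates that a holder can switch from exclusive to shared mode without ever releasing the lock, which lets waiting shared acquirers in.

```python
import threading


class DowngradableLock:
    """Hypothetical minimal shared/exclusive lock with downgrade."""

    def __init__(self):
        self._cond = threading.Condition()
        self._exclusive_owner = None
        self._shared_owners = set()

    def acquire(self, shared):
        me = threading.get_ident()
        with self._cond:
            if shared:
                while self._exclusive_owner is not None:
                    self._cond.wait()
                self._shared_owners.add(me)
            else:
                while self._exclusive_owner is not None or self._shared_owners:
                    self._cond.wait()
                self._exclusive_owner = me

    def downgrade(self):
        """Switch from exclusive to shared mode without releasing."""
        me = threading.get_ident()
        with self._cond:
            if self._exclusive_owner != me:
                raise RuntimeError("lock must be held in exclusive mode")
            self._exclusive_owner = None
            self._shared_owners.add(me)
            # Wake up waiters; pending shared acquires can now proceed.
            self._cond.notify_all()

    def release(self):
        me = threading.get_ident()
        with self._cond:
            if self._exclusive_owner == me:
                self._exclusive_owner = None
            else:
                self._shared_owners.remove(me)
            self._cond.notify_all()
```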
-
- May 10, 2011
-
-
Michael Hanselmann authored
No idea where those four spaces came from, but they must've been there for a while. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
The “acquired_locks” attribute in LUs is used to keep a list of acquired locks at each lock level. This information is already known in the lock manager, which also happens to be the authoritative source. Removing the attribute and directly talking to the lock manager saves us from having to maintain the duplicate information when releasing locks. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- May 09, 2011
-
-
Michael Hanselmann authored
Depending on the opcode and its parameters, the existing “Summary” function can give a rather long summary. For displaying the summary in logs and in the lock monitor, it should be shorter. Hence this new function is added to just use the opcode ID with common prefixes replaced (e.g. “INSTANCE_” becomes “I_”). Opcode values are not used. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
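The prefix replacement could be sketched as below. Only the “INSTANCE_” to “I_” mapping comes from the commit message; the other table entries, the function name and the exact form of the opcode ID passed in are assumptions.

```python
# Only the INSTANCE_ entry is taken from the commit message; the other
# entries are assumed for illustration.
_SUMMARY_PREFIXES = {
    "CLUSTER_": "C_",
    "GROUP_": "G_",
    "NODE_": "N_",
    "INSTANCE_": "I_",
}


def tiny_summary(op_id):
    """Return a shortened opcode ID; opcode parameters are not used."""
    for prefix, short in _SUMMARY_PREFIXES.items():
        if op_id.startswith(prefix):
            return short + op_id[len(prefix):]
    return op_id
```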
-
Michael Hanselmann authored
It is always used in the locking code. Unittests are updated. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- Apr 28, 2011
-
-
Iustin Pop authored
Currently RemoveEtcHostsEntry keeps the ordering, but SetEtcHostsEntry does not, as it always writes the new entry at the end of the file. I personally dislike this as it "uglifies" my custom host files, so this patch makes it update the record in place instead of moving it. The patch also simplifies the construction of the new line (we were doing duplicate work for no gain). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
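Updating a hosts entry in place can be sketched on a list of lines. The function name, the tab-separated output and the choice to match existing entries by IP address are assumptions for this illustration.

```python
def set_hosts_entry(lines, ip, hostname, aliases=()):
    """Return lines with the entry for ip updated in place.

    The new entry replaces the first existing line for the same IP,
    further lines for that IP are dropped, and the entry is appended
    only if no line for the IP existed.
    """
    new_line = "\t".join([ip, hostname, *aliases])
    result = []
    written = False
    for line in lines:
        # Ignore comments when looking at the fields of a line.
        fields = line.split("#", 1)[0].split()
        if fields and fields[0] == ip:
            if not written:
                result.append(new_line)
                written = True
            continue
        result.append(line)
    if not written:
        result.append(new_line)
    return result
```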
-
Iustin Pop authored
Unicode is fun, indeed:

  >>> len(buffer("abc"))
  3
  >>> len(buffer(u"abc"))
  12

So we can't pass unicode data to buffer(), as the result will be to write the in-memory (usually UTF-32) representation to disk. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
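buffer() only exists in Python 2, but the size difference shown above can be reproduced in Python 3 by encoding explicitly. This sketch is an analogy rather than the patched code: it shows why text should be encoded to a known encoding before being written as bytes.

```python
text = "abc"

# Explicit UTF-8 encoding: one byte per ASCII character.
utf8_bytes = text.encode("utf-8")

# UTF-32 without a byte order mark: four bytes per character, which is
# the size the buffer(u"abc") example above reported.
utf32_bytes = text.encode("utf-32-le")
```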
-
- Apr 21, 2011
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Apr 18, 2011
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
With this parser, command line utilities will be able to provide filters through query2 in a simplistic language. Example filters:

  name == "node3.example.com"
  master or (name == "node4.example.com")
  be/memory == 128 and name =~ /^web/i
  "inst1.example.com" in sinst_list
  status != "up"
  not master

Parts of the syntax came from Python, others from Perl. Documentation will be added in follow-up patches. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
These were not available as a query field before. Update unittests and description text for the other “..params” fields. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Apr 13, 2011
-
-
Michael Hanselmann authored
Issue 154 (http://code.google.com/p/ganeti/issues/detail?id=154) reported an “Operation not supported” error when writing instance exports to a mounted CIFS filesystem. Experimentation showed the error to only occur when using rename(2) on an opened file. Various references on the web confirmed this observation. Whether or not the problem occurs can also depend on the CIFS server implementation. In issue 154 it was Windows 2008 R2. While not solving all cases, closing the file before renaming helps alleviate the issue a bit. Unittests are updated. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
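The ordering that matters here (close first, rename afterwards) can be sketched as follows. The function name is hypothetical and this is not Ganeti's file-writing helper; it only shows a temporary file being fully closed before rename is called on it.

```python
import os
import tempfile


def write_file_atomically(path, data):
    """Write data to a temporary file, close it, then rename it to path."""
    dir_name = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # The file is closed at this point; renaming an open file is
        # what failed on some CIFS mounts.
        os.rename(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```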
-
- Apr 06, 2011
-
-
Iustin Pop authored
This has been observed to cause problems on real clusters via the following mechanism:

  - a long job (e.g. a replace-disks) is keeping an exclusive lock on an instance
  - the watcher starts and submits its query instances opcode which wants shared locks for all instances
  - after about an hour, the watcher job falls back to blocking acquire, after having acquired all other locks
  - any instance opcode that wants an exclusive lock for an instance cannot start until the watcher has finished, even though there's no actual operation on that instance

In order to alleviate this problem, we simply increase the max timeout until lock acquires are sent back to either blocking acquire or priority increase. The timeout is computed such that we wait ~10 hours (instead of one) for this to happen, which should be within the maximum lifetime of a reasonable opcode on a healthy cluster. The timeout also means that priority increases will happen every half hour. We also increase the max wait interval to 15 seconds, otherwise we'd have too many retries with the increased interval. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Michael Hanselmann authored
The intent of this function is to be able to provide a globbing operator for query filters. One should be able to say, for example, something to the effect of “gnt-instance shutdown '*.site'”. Also rename a variable in MatchNameComponent. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
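Turning a shell-style pattern into a regular expression can be sketched with the standard library. This uses `fnmatch.translate` as a stand-in and is not the function added by the patch; the helper name is hypothetical.

```python
import fnmatch
import re


def glob_to_regex(pattern):
    """Compile a shell-style glob into a regular expression object."""
    return re.compile(fnmatch.translate(pattern))


site_instances = glob_to_regex("*.site")
```

`site_instances.match(name)` then returns a match object for names such as `web1.site` and `None` for names outside that pattern.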
-
- Apr 05, 2011
-
-
Michael Hanselmann authored
So far this operator was not implemented. This patch adds an additional value preparation function to the function table for binary operators, used to compile the regular expression. Unittests are included. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
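The idea of a per-operator value preparation function can be sketched like this. The table layout, the operator spellings and the function names are assumptions; the point is that the regular expression is compiled once when the filter is built, not once per evaluated item.

```python
import operator
import re

# operator -> (value preparation function or None, comparison function)
_BINARY_OPS = {
    "==": (None, operator.eq),
    "=~": (re.compile, lambda value, regex: regex.search(value) is not None),
}


def compile_binary(op, field, raw_value):
    """Build a predicate over items for a single binary filter."""
    prepare, compare = _BINARY_OPS[op]
    # Prepare the right-hand value once, e.g. compile the regex.
    value = prepare(raw_value) if prepare else raw_value
    return lambda item: compare(item[field], value)
```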
-
- Mar 31, 2011
-
-
Iustin Pop authored
The new wrapper makes moving legacy code to utils.Retry or adding retries in existing code simpler. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
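A retry wrapper for legacy code that signals success through its return value might look like the sketch below. It does not use or reproduce utils.Retry; the function name, the argument order and the injectable `sleep` and `clock` parameters are assumptions.

```python
import time


def simple_retry(expected, fn, delay, timeout,
                 sleep=time.sleep, clock=time.monotonic):
    """Call fn until it returns expected or the timeout has passed.

    The last result is returned either way, so callers can keep their
    existing return value checks.
    """
    deadline = clock() + timeout
    while True:
        result = fn()
        if result == expected or clock() >= deadline:
            return result
        sleep(delay)
```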
-