- 23 Jun, 2010 3 commits
-
Guido Trotter authored
We move from querying the in-memory version to loading all jobs from disk. Since job files are written/deleted on disk atomically, we don't need to lock at all. Also, since we're just looking at the contents of a directory, we don't need to check that the job queue is "open". If some jobs are removed between the time we list them and the time we load them, we need to be able to cope: if we were asked to load those jobs specifically, we must report the failure, but if we were just asked to "load all" we simply don't consider them part of the "all" set, since they were deleted.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
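A minimal sketch of this "list, then load, tolerate deletions" behaviour, assuming a hypothetical job-file layout and helper names (not the actual Ganeti functions):

```python
import errno
import os


def _ListJobFiles(queue_dir):
    # Each job lives in its own "job-<id>" file; listing the directory
    # needs no locking because files are created/removed atomically.
    return [name for name in os.listdir(queue_dir) if name.startswith("job-")]


def LoadAllJobs(queue_dir):
    jobs = []
    for name in _ListJobFiles(queue_dir):
        try:
            with open(os.path.join(queue_dir, name)) as fd:
                jobs.append(fd.read())
        except EnvironmentError as err:
            if err.errno == errno.ENOENT:
                # The job was deleted/archived after we listed the directory;
                # for a "load all" request we simply skip it.
                continue
            raise
    return jobs
```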
-
Guido Trotter authored
This will be used to read a job file without having to deal with exceptions from _LoadJobFromDisk.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
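The wrapper amounts to something like this sketch (illustrative name, not the actual Ganeti helper):

```python
def SafeReadJobFile(path):
    """Read a job file, returning None instead of raising on failure.

    Callers that only want "give me the job if it is still there and valid"
    can use this instead of handling exceptions themselves.
    """
    try:
        with open(path) as fd:
            return fd.read()
    except (EnvironmentError, ValueError):
        return None
```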
-
Guido Trotter authored
Currently _LoadJobFromDisk archives job files it finds corrupted. Since we want to use it to load files without holding locks, this could cause a conflict: we just move the feature to _LoadJobUnlocked, which is always called with the lock held.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- 17 Jun, 2010 4 commits
-
Guido Trotter authored
Rather than adding the jobs to the worker pool one at a time, we add them all together, which is slightly faster, and ensures they don't get started while we loop.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
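A stub illustration of the difference; AddTask/AddManyTasks here are stand-ins and not necessarily the exact workerpool API:

```python
class _StubPool(object):
    """Minimal stand-in for a worker pool with batch submission."""

    def __init__(self):
        self.tasks = []

    def AddTask(self, *args):
        self.tasks.append(args)

    def AddManyTasks(self, tasks):
        # The whole batch is queued in one go, so no task can start
        # executing while the caller is still building the list.
        self.tasks.extend(tasks)


pool = _StubPool()
jobs = ["job-1", "job-2", "job-3"]

# Before: one call per job; a job may start while we are still looping.
# for job in jobs:
#     pool.AddTask(job)

# After: hand over the whole batch at once.
pool.AddManyTasks([(job,) for job in jobs])
```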
-
Guido Trotter authored
Sometimes it's useful to write to the local filesystem, but immediate replication to all master candidates is not needed. The _WriteAndReplicateFileUnlocked function gets renamed to _UpdateJobQueueFile, as calling "write and replicate, but don't replicate" seemed a bit strange.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
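A sketch of what the renamed helper's shape could look like; the replication helper below is a hypothetical stand-in for the real RPC, not Ganeti's actual code:

```python
def _ReplicateToMasterCandidates(path, data):
    # Hypothetical stand-in for the RPC that copies the file to all
    # master candidate nodes.
    pass


def _UpdateJobQueueFile(path, data, replicate=True):
    """Write a queue file locally and optionally replicate it."""
    # The local write always happens.
    with open(path, "w") as fd:
        fd.write(data)
    if replicate:
        _ReplicateToMasterCandidates(path, data)
```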
-
Guido Trotter authored
The job queue currently has a static _GetJobInfoUnlocked method. We change it to be a normal method of _QueuedJob, which makes more sense.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Guido Trotter authored
Move the work from _LoadJobUnlocked to _LoadJobFileFromDisk, which can then be used in other contexts as well. Also, if we fail to deserialize the job, archive it as well (before, we archived it only if we failed to create the related object, but kept it there if deserialization failed).
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
- 15 Jun, 2010 2 commits
-
Guido Trotter authored
Among all users, it turns out just one *may* need the output to be sorted. All the others can cope without it.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Guido Trotter authored
In some places we do try/del/except and in others just pop. Using pop everywhere saves lines of code.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
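The cleanup boils down to preferring dict.pop with a default over the three-line idiom, e.g.:

```python
memcache = {"job-1": object()}

# Before: explicit exception handling around del.
try:
    del memcache["job-2"]
except KeyError:
    pass

# After: pop with a default never raises and has the same effect.
memcache.pop("job-2", None)
```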
-
- 11 Jun, 2010 6 commits
-
Guido Trotter authored
Currently each time we submit a job we check the job queue size, and the drained file. With this change we keep these pieces of information in memory and don't read them from the filesystem each time. Significant changes include:
- The drained value can only be properly set by calling the appropriate cluster command "gnt-cluster queue drain/undrain" and not by removing/creating the file in the job queue directory. Not that anybody would have done it in this undocumented way before.
- We get rid of the soft limit for the job queue, which we haven't ever used anyway.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
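A rough sketch of the in-memory approach, with illustrative names, limits and error types (the real queue uses its own constants and error classes):

```python
class JobQueueState(object):
    """Caches the queue size and the drained flag in memory."""

    MAX_QUEUE_SIZE = 10000  # hard limit only; no soft limit any more

    def __init__(self, initial_size, drained):
        self._queue_size = initial_size
        self._drained = drained

    def SetDrainFlag(self, drain_flag):
        # Only the cluster-level "queue drain/undrain" command should end
        # up here; poking at the on-disk marker file no longer has effect.
        self._drained = drain_flag

    def CheckSubmit(self):
        """Called once per submitted job, without touching the filesystem."""
        if self._drained:
            raise RuntimeError("Job queue is drained, refusing job")
        if self._queue_size >= self.MAX_QUEUE_SIZE:
            raise RuntimeError("Job queue is full")
        self._queue_size += 1
```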
-
Guido Trotter authored
The name clarifies the difference between this and the internal lock. Also explain a bit better what it is.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Guido Trotter authored
Currently we sort the list of job queue files twice (once in utils.ListVisibleFiles with sort and then later with NiceSort). We also apply the _RE_JOB_FILE regular expression twice (once in _ListJobFiles and once in _ExtractJobID). This patch simplifies the code a little and collapses a couple of functions that perform basically the same job.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Guido Trotter authored
This also removes the relevant pylint disable. No point in keeping unused parameters around: if/when we need them, it's easy to add them back.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Guido Trotter authored
Rather than raising Exception, use GenericError and explain a bit better what happened.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Guido Trotter authored
This call was introduced but never used. In two years. Since it's just creating/removing a file, it can also be done in simpler ways, without a special rpc call, if/when we need it again. In the meantime, let's give it to history.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- 01 Jun, 2010 1 commit
-
Iustin Pop authored
Since the current start_timestamp opcode attribute refers to the initial start time, before locks are acquired, it's not useful for determining the actual execution order of two opcodes/jobs competing for the same lock. This patch adds a new field, exec_timestamp, that is updated when the opcode moves from OP_STATUS_WAITLOCK to OP_STATUS_RUNNING, thus allowing a clear view of the execution history. The new field is visible in the job output via the 'opexec' field.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
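A toy illustration of where such a timestamp would be recorded; the class and the constant values below are made up for the example, only the status names come from the message above:

```python
import time

OP_STATUS_WAITLOCK = "waiting"   # illustrative value
OP_STATUS_RUNNING = "running"    # illustrative value


class OpCodeRecord(object):
    def __init__(self):
        self.status = OP_STATUS_WAITLOCK
        self.start_timestamp = time.time()  # taken before lock acquisition
        self.exec_timestamp = None          # set once execution really starts

    def MarkRunning(self):
        # The WAITLOCK -> RUNNING transition is the point where the locks
        # have been acquired, so this timestamp reflects real execution order.
        self.status = OP_STATUS_RUNNING
        self.exec_timestamp = time.time()
```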
-
- 08 Mar, 2010 2 commits
-
Iustin Pop authored
This should remove most of the remaining constructs which can be replaced by PathJoin.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This passes a full burnin with lots of instances, and should be safe, as we mostly join a known root (various constants) to a run-time variable.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
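The idea behind the PathJoin conversions in the two commits above is to refuse components that would escape the known root. A simplified sketch, not the actual utils.PathJoin implementation, with an illustrative constant for the root:

```python
import os.path


def PathJoin(root, *args):
    """Join path components, rejecting ones that escape the root."""
    result = os.path.join(root, *args)
    norm = os.path.normpath(result)
    if not norm.startswith(os.path.normpath(root) + os.sep):
        raise ValueError("Path component escapes root %r: %r" % (root, result))
    return norm


# Typical call site: a constant root plus a run-time variable.
QUEUE_DIR = "/var/lib/ganeti/queue"  # illustrative constant
job_file = PathJoin(QUEUE_DIR, "job-123")
```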
-
- 13 Jan, 2010 4 commits
-
Michael Hanselmann authored
When the queue was empty, the calculation for unchecked jobs while archiving would return -1. ``last_touched`` is set to 0 and the job ID list (``all_job_ids``) is empty, so calculating ``len(all_job_ids) - last_touched - 1`` resulted in -1.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
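The failing arithmetic reduced to a few lines; the max() guard is shown only as one possible way to keep the count non-negative, not necessarily the actual fix:

```python
all_job_ids = []   # empty queue
last_touched = 0

# Old calculation: reports -1 unchecked jobs for an empty queue.
pending = len(all_job_ids) - last_touched - 1
assert pending == -1

# One possible fix: clamp the value at zero.
pending = max(0, len(all_job_ids) - last_touched - 1)
assert pending == 0
```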
-
Michael Hanselmann authored
Before it would log something like “starting task (<ganeti.http.client._HttpClientPendingRequest object at 0x2aaaad176790>,)”, which isn't really useful for debugging. Now it'll log “[…] <ganeti.http.client._HttpClientPendingRequest req=<ganeti.http.client.HttpClientRequest 172.24.x.y:1811 PUT /node_info at 0x2aaaaab7ed10> at 0x2aaaaab823d0>”.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Having a proper name instead of just a number makes debugging easier.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- 04 Jan, 2010 4 commits
-
Iustin Pop authored
Many of our functions have to follow a given API, and thus we have to keep a given signature, but pylint doesn't understand this. Therefore, we silence this warning. The patch does a few other cleanups.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>
-
Iustin Pop authored
Currently only the rpc call, but not its description (which also shows the argument), is logged. We change this to log failmsg too, and this also silences a warning.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>
-
Iustin Pop authored
Many methods are simple pure functions that do not depend on the object state. We convert these to staticmethods.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>
-
Iustin Pop authored
This patch should have only:
- pylint disables
- docstring changes
- whitespace changes
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>
-
- 28 Dec, 2009 1 commit
-
Iustin Pop authored
This cherry-picks the utils.FieldSet.Matches changes and the significant jqueue.py change. These are stable in the 2.1 branch and therefore make sense to backport to 2.0 (they are basically cleanups).
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>
-
- 25 Nov, 2009 1 commit
-
Iustin Pop authored
This patch removes the quotes from CommaJoin and converts most of the callers (that I could find) to it. Since CommaJoin does str(i) for i in param, we can remove these conversions, thus slightly simplifying a few calls.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
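A simplified stand-in showing what a quote-less, str()-applying CommaJoin boils down to (not the real utils.CommaJoin code):

```python
def CommaJoin(names):
    """Join items with ", ", stringifying each one and adding no quotes."""
    return ", ".join(str(val) for val in names)


# Callers no longer need to pre-convert or quote the items themselves.
print(CommaJoin(["node1", 2, None]))   # -> node1, 2, None
```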
-
- 06 Nov, 2009 2 commits
-
Guido Trotter authored
When the processor is executing a job, it can export the execution id to its callers. This is not supported for Queries, as they're not executed in a job.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Iustin Pop authored
This patch adds some silences and tweaks the code slightly so that “pylint --rcfile pylintrc -e ganeti” doesn't give any errors. The biggest change is in jqueue.py, the move of _RequireOpenQueue out of the JobQueue class. Since that is actually a function and not a method (never used as such), this makes sense, and it also silences two pylint errors. Another real code change is in utils.py, where FieldSet.Matches will return None instead of False for failure; this still works with the way this class/method is used, and makes more sense (it more closely resembles the re.match return values).
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
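A sketch of the "decorator as a module-level function" pattern mentioned above; the attribute name and the assertion are illustrative, not necessarily the real Ganeti checks:

```python
def _RequireOpenQueue(fn):
    """Decorator ensuring the queue is open before running a method.

    Defined at module level: it is a plain function, not a method, which
    matches how it is actually used and keeps pylint quiet.
    """
    def wrapper(self, *args, **kwargs):
        # Illustrative check: the real code asserts on its own lock attribute.
        assert self._queue_filelock is not None, "Queue should be open"
        return fn(self, *args, **kwargs)
    return wrapper


class JobQueue(object):
    def __init__(self):
        self._queue_filelock = object()  # stand-in for the real file lock

    @_RequireOpenQueue
    def SubmitJob(self, ops):
        return ops
```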
-
- 03 Nov, 2009 1 commit
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- 12 Oct, 2009 1 commit
-
Michael Hanselmann authored
Found using pylint and epydoc.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
-
- 25 Sep, 2009 1 commit
-
Iustin Pop authored
Currently, the actual exception raised during an LU execution (one of OpPrereqError, OpExecError, HooksError, etc.) is lost because the jqueue.py code simply sets that to str(err), and the code in cli.py simply passes that string to OpExecError. This patch moves to encoding the errors as per errors.EncodeError and changes the cli code to parse and raise that (if possible).
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
(cherry picked from commit bcb66fca)
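A rough, self-contained illustration of the encode-then-re-raise idea; these helpers are stand-ins and not the actual errors.EncodeError machinery:

```python
import json


class OpExecError(Exception):
    pass


def EncodeError(err):
    """Serialise an exception as (class name, args) instead of str(err)."""
    return json.dumps((err.__class__.__name__, err.args))


def ReRaise(encoded):
    """Client side: rebuild something meaningful from the encoded error."""
    try:
        cls_name, args = json.loads(encoded)
    except ValueError:
        # Old-style plain string: keep the previous behaviour.
        raise OpExecError(encoded)
    raise OpExecError("%s: %s" % (cls_name, ", ".join(map(str, args))))
```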
-
- 17 Sep, 2009 1 commit
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- 15 Sep, 2009 4 commits
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
This can be useful for debugging locking problems.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
There are two major arguments for this:
- There will be more callbacks (e.g. for lock debugging) and extending the parameter list is a lot of work.
- In the jqueue module this allows us to keep per-job or per-opcode variables in a separate class. Instead of having to clean up the worker class after processing one job, these references will automatically go out of scope.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
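A toy sketch of passing a per-job callback/context object instead of a growing list of parameters; the class and method names below are placeholders, not the real jqueue interfaces:

```python
class _JobExecContext(object):
    """Per-job callback object handed to the executor.

    Keeping per-job state here means it simply goes out of scope when the
    job finishes, instead of having to be reset on the shared worker.
    """

    def __init__(self, job):
        self.job = job
        self.current_op = None

    def NotifyStart(self, op):
        self.current_op = op

    def Feedback(self, message):
        print("%s: %s" % (self.job, message))


def ExecuteJob(job, cb):
    # New callbacks can be added to the context class without touching
    # this signature again.
    cb.NotifyStart("op0")
    cb.Feedback("running")


ExecuteJob("job-42", _JobExecContext("job-42"))
```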
-
- 07 Sep, 2009 1 commit
-
Iustin Pop authored
Currently, on multi-job submits we simply iterate over the single-job-submit function. This means we grab a new serial, write and replicate the serial file (waiting for the remote nodes to ack), and only then create the job file; this is repeated N times, once for each job. Since job identifiers are ‘cheap’, it's simpler to grab a block of new IDs at the start, write and replicate the serial count file a single time, and then proceed with the jobs as before. This is a cheap change that reduces I/O and slightly reduces the CPU consumption of the master daemon: submit time seems to be cut in half for big batches of jobs, and masterd CPU time drops by somewhere between 15% and 50% (I can't get consistent numbers). Note that this doesn't change anything for single-job submits, and most probably not for submits of fewer than 5 jobs either.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
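A sketch of the "allocate N serials at once" idea; the class below is illustrative, and the real code also replicates the serial file to the master candidates after each write:

```python
class SerialFile(object):
    """Keeps the last used job ID on disk."""

    def __init__(self, path):
        self._path = path

    def _Read(self):
        try:
            with open(self._path) as fd:
                return int(fd.read().strip())
        except IOError:
            return 0

    def AllocateIds(self, count):
        """Reserve `count` consecutive job IDs with a single write."""
        last = self._Read()
        new_last = last + count
        with open(self._path, "w") as fd:
            fd.write("%d\n" % new_last)
        # One write (and, in the real code, one replication round-trip)
        # instead of `count` of them.
        return list(range(last + 1, new_last + 1))
```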
-
- 03 Sep, 2009 1 commit
-
Michael Hanselmann authored
This survived QA, burnin and unittests.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Luca Bigliardi <shammash@google.com>
-