- Mar 30, 2012
-
Michael Hanselmann authored
This enables the use of filters through query2 when listing jobs.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
… instead of re-calculating it on every file change.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
-
Michael Hanselmann authored
This rather inefficient implementation (fields are evaluated on every call to GetInfo) is not good for WaitForJobChanges and doesn't support filters, but that will be rectified in later patches.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
There was a typo and it's not necessary to repeat the class name.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Dec 22, 2011
-
Michael Hanselmann authored
This allows for more unittesting.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Dec 21, 2011
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Serializing to JSON using “simplejson” is significantly slower when indentation and/or sorting of dictionary keys is used. In simplejson 1.x the difference isn't that big, but with simplejson 2.x the difference can be up to a factor of 7.5. The reason is that the latter no longer uses C functions when sorting or indentation is used. With this patch we revert everything to simplejson's defaults, which should provide us with the best performance available.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
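For illustration, here is a rough benchmark sketch of the effect described above; the data set, iteration count and the fallback import are invented and this is not Ganeti code:

```python
# Illustrative benchmark only; the sample data and iteration count are made up.
try:
    import simplejson as json   # prefer simplejson if installed, as Ganeti did
except ImportError:
    import json                 # fall back to the standard library
import timeit

data = {"jobs": [{"id": i, "status": "queued", "ops": ["OP_TEST_DELAY"] * 3}
                 for i in range(1000)]}

defaults = lambda: json.dumps(data)                          # library defaults (what the patch reverts to)
pretty = lambda: json.dumps(data, indent=2, sort_keys=True)  # indentation + sorting, much slower on simplejson 2.x

print("defaults:      %.3fs" % timeit.timeit(defaults, number=50))
print("indent+sorted: %.3fs" % timeit.timeit(pretty, number=50))
```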
-
Michael Hanselmann authored
When an opcode is about to be processed, its dependencies are evaluated using “_JobDependencyManager.CheckAndRegister”. Due to its nature that function requires a lock on the manager's internal structures. All of this happens while the job queue lock is held in shared mode (required for the job processor). When a job has been processed, any of its pending dependants are re-added to the job workerpool. Before this patch that would require the manager's lock and then, for adding the jobs, the job queue lock. Since this is the reverse order, it can lead to deadlocks.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
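A minimal sketch of the shape of such a fix, with invented names: waiters are collected while holding only the manager's lock and re-added to the queue after it has been released, so the two locks are never taken in opposite orders.

```python
# Invented names; this only illustrates the lock-ordering rule, not Ganeti's API.
import threading

queue_lock = threading.Lock()     # stands in for the job queue lock
manager_lock = threading.Lock()   # stands in for the dependency manager's lock

_waiters = {}                     # dependency job ID -> set of waiting jobs

def notify_waiters(finished_job_id, add_to_workerpool):
    # Collect the waiting jobs under the manager's lock only ...
    with manager_lock:
        pending = _waiters.pop(finished_job_id, set())
    # ... and take the queue lock only after the manager's lock was released,
    # so this path can never deadlock against CheckAndRegister (which runs
    # with the queue lock already held).
    with queue_lock:
        for job in pending:
            add_to_workerpool(job)
```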
-
- Nov 21, 2011
-
Michael Hanselmann authored
Doing so will prevent job submissions (similar to a drained queue), but won't affect currently running jobs. No further jobs will be executed.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Nov 17, 2011
-
Michael Hanselmann authored
This is in preparation for a clean(er) shutdown of masterd.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Oct 27, 2011
-
Michael Hanselmann authored
If cmdlib.LUNodeMigrate was called for a node without primary instances, it would try to submit an empty list of jobs. This was never visible via the CLI, as there we check the list of primary instances first.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
-
- Oct 26, 2011
-
Michael Hanselmann authored
With these changes job queue RPC will finally show up on the lock monitor. See below for an example. A job queue-specific class is used to restrict the use of a static list for name resolution to the job queue. Further improvements can be made to not re-create the whole RPC client for every call (e.g. by using a more dynamic resolver), but for now this works.
rpc/node8.example.com/jobqueue_update Jq8/Job9/TEST_DELAY
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Sep 06, 2011
-
Michael Hanselmann authored
Commit 66bd7445 added an assertion to ensure a finalized job has its “end_timestamp” attribute set. Unfortunately it didn't cover the case where the queue is recovering from an unclean master shutdown.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
(cherry picked from commit 45df0793)
-
- Aug 30, 2011
-
Andrea Spadaccini authored
Running pylint 0.24.0 revealed 2 errors and 1 warning. Here is how I fixed them:
* jqueue.py: silenced E1101
* netutils.py: rewrote the list comprehension using extend()
* watcher/__init__.py: fixed a missing format string parameter
These changes are backwards-compatible with pylint 0.21.1.
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Andrea Spadaccini authored
In version 0.21, pylint unified all the disable-* (and enable-*) directives into disable (resp. enable). This leads to a lot of DeprecationWarnings being emitted even if one uses the recommended version of pylint (0.21.1, as stated in devnotes.rst). This commit changes all the disable-msg directives to disable.
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
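For illustration, the change applied throughout the code has this shape; the class and attribute access below are made-up examples, not Ganeti code:

```python
class _Example(object):            # made-up class, only here to have an attribute to access
    some_dynamic_attr = 42

obj = _Example()

# Before (deprecated since pylint 0.21, emits a DeprecationWarning):
#   value = obj.some_dynamic_attr  # pylint: disable-msg=E1101
# After this commit:
value = obj.some_dynamic_attr  # pylint: disable=E1101
```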
-
- Aug 19, 2011
-
Michael Hanselmann authored
This was a regression from 2.4.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
-
- Aug 02, 2011
-
Michael Hanselmann authored
By sleeping for 100ms after receiving a notification for a changed job file the job is given some additional time to change again. This significantly reduces the number of LUXI calls for WaitForJobChanges (depending on the job, in my tests with “gnt-cluster verify --debug-simulate-errors” by about 80%), and improves performance (the same job went from around 7 seconds to around 3.5 seconds). This method is not perfect. The algorithm could be made more complex, e.g. by increasing the delay on each change, etc., but for now this simple change provides a good improvement.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
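Reduced to a sketch with invented helper names (wait_fn blocks until the job file changes or the timeout expires, load_fn re-reads the job), the idea is simply an extra sleep before re-reading:

```python
import time

CHANGE_COALESCE_DELAY = 0.1   # the 100ms delay described above

def wait_for_changes(wait_fn, load_fn, previous, timeout):
    """Illustrative helper; only the extra sleep is the point here."""
    deadline = time.time() + timeout
    while True:
        remaining = deadline - time.time()
        if remaining <= 0 or not wait_fn(remaining):
            return None                      # timed out, nothing changed
        # Give the job a moment to change again, so several quick updates
        # are reported to the LUXI client as a single change.
        time.sleep(CHANGE_COALESCE_DELAY)
        current = load_fn()
        if current != previous:
            return current
```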
-
- Jul 21, 2011
-
Michael Hanselmann authored
This makes them visible to the user. Example:
$ gnt-debug locks -o name,pending
Name    Pending
job/890 job:891,892
job/892 job:894
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
This patch renames the {JOB,OP}_STATUS_WAITLOCK constants to {JOB,OP}_STATUS_WAITING, as per the design document for chained jobs.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
When jobs waiting for a dependency are notified, they're re-added to the queue. This would require owning the queue lock in exclusive mode, but since the function doing so is called from within the job/opcode processor, it only holds the lock in shared mode. This patch changes the result of the processor from a boolean to a status value (integer). This way the caller can be notified about actions to take, including notifying waiting jobs. The function adding jobs to the queue can now acquire the lock in exclusive mode.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
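A toy sketch of that shape of change, with made-up constant and method names: the caller inspects a returned status instead of a bare True/False and can then acquire the queue lock exclusively where needed.

```python
# Made-up constants and methods, only to show the boolean-to-status change.
(PROC_FINISHED,   # job reached a final status
 PROC_DEFER,      # opcode could not get its locks, job goes back to the queue
 PROC_WAITDEP) = range(3)   # job is waiting for another job

def after_exec_opcode(status, job, queue, workerpool):
    if status == PROC_FINISHED:
        # Runs without the shared queue lock held, so waiting jobs can be
        # re-added while holding the queue lock exclusively.
        for waiter in queue.notify_dependants(job):
            workerpool.add_task(waiter)
    elif status == PROC_DEFER:
        workerpool.add_task(job)
```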
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
With this change users of the “SubmitManyJobs” interface can use relative job dependencies. Relative job IDs in dependencies are resolved before handing the job off to the workerpool.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
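A small sketch of how such resolution can work (illustrative names only): a negative dependency ID refers to a job submitted earlier in the same SubmitManyJobs request.

```python
def resolve_dependencies(deps, earlier_job_ids):
    """Replace relative (negative) job IDs with the real IDs of jobs that
    were submitted earlier in the same request. Names are illustrative."""
    resolved = []
    for dep_job_id, wanted_statuses in deps:
        if dep_job_id < 0:
            # e.g. -1 means "the job submitted just before this one"
            dep_job_id = earlier_job_ids[dep_job_id]
        resolved.append((dep_job_id, wanted_statuses))
    return resolved

# Example: a job depending on the one submitted right before it.
print(resolve_dependencies([(-1, ["success"])], earlier_job_ids=[17, 18]))
# -> [(18, ['success'])]
```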
-
- Jul 20, 2011
-
Michael Hanselmann authored
Basically only one instance of the job, the one being processed, should be serialized to disk and replicated to other nodes. With this flag assertions can be added in various places.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
An overview is available in the design document for this change, doc/design-chained-jobs.rst.

When a job enters the job processor, the current opcode's dependencies are evaluated. If a referenced job has not yet reached the desired status, the current job is registered as a dependant. The job processor will continue to work on other pending tasks. When a job finishes it notifies any pending dependants by re-adding them to the workerpool. A per-job processor lock is necessary for rare cases where the same job can be re-added twice.

There is no way to view waiting jobs at the moment, but I plan to export this information to “gnt-debug locks”. A so-called dependency manager takes care of managing waiting jobs and keeping track of their status. Unittests are included.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
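A heavily reduced sketch of the mechanism described above; the class and method names mirror the description, but all details are invented:

```python
class JobDependencyManager(object):
    """Keeps track of jobs waiting for other jobs (illustrative sketch)."""

    (CONTINUE, WAIT, ERROR) = range(3)

    def __init__(self):
        self._waiters = {}   # job ID -> set of jobs waiting for it

    def check_and_register(self, job, dep_job_id, wanted, get_status):
        status = get_status(dep_job_id)
        if status in wanted:
            return self.CONTINUE            # dependency already satisfied
        if status in ("canceled", "error"):
            return self.ERROR               # dependency finished the wrong way
        self._waiters.setdefault(dep_job_id, set()).add(job)
        return self.WAIT                    # processor moves on to other jobs

    def notify_finished(self, job_id, readd_to_workerpool):
        # Re-add any dependants of the finished job to the workerpool.
        for waiter in self._waiters.pop(job_id, set()):
            readd_to_workerpool(waiter)
```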
-
- Jul 15, 2011
-
Michael Hanselmann authored
Commit 66bd7445 added an assertion to ensure a finalized job has its “end_timestamp” attribute set. Unfortunately it didn't cover the case where the queue is recovering from an unclean master shutdown.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Jul 11, 2011
-
Michael Hanselmann authored
Commit 009e73d0 (September 2009) changed the job queue to generate multiple job serials at once. Ever since it would return one more than requested. The “serial” file in the job queue directory is defined to contain the “last job ID used” (design-2.0). With the change above, the serial file would always contain the next serial number. The first value returned by the generating function was the one contained in the file, so during the switch in 2009 one job may have been overwritten. This patch changes the code to always return the exact number of serials, to keep the last used serial on disk and adds an assertion.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
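For illustration, the corrected allocation logic looks roughly like this (helper names invented): exactly count serials are returned and the serial file keeps the last ID actually handed out.

```python
def allocate_serials(read_serial_file, write_serial_file, count):
    """Illustrative sketch, not the actual job queue code."""
    last_used = read_serial_file()                           # e.g. 41
    serials = [last_used + i for i in range(1, count + 1)]   # 42 .. 41+count
    write_serial_file(serials[-1])                           # store the last *used* ID
    assert len(serials) == count
    return serials

# Example with an in-memory "serial file":
state = {"serial": 41}
ids = allocate_serials(lambda: state["serial"],
                       lambda v: state.update(serial=v), 3)
print(ids, state)   # [42, 43, 44] {'serial': 44}
```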
-
- Jun 10, 2011
-
Michael Hanselmann authored
Chained jobs need to look at previous jobs, including archived ones. A nice side-effect of this change is the ability to look at archived jobs using “gnt-job info <id>” as long as the ID is known.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- May 31, 2011
-
Michael Hanselmann authored
When a job was cancelled, its status would be changed and the file written again. Since this was a final status, the job file could be moved anytime for archival. If the job was still in the queue, however, it would be processed (not fully, just updating the “end_timestamp” attribute) and written again. This was bad as it could leave the same job in two different files. With this patch the processor is changed to return early for finished jobs. Cancelling a queued job will finalize it right away. Unittests are updated.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
-
- May 10, 2011
-
Michael Hanselmann authored
With this patch, the worker thread name is updated to include a short summary of the opcode (basically its OP_ID). The base name of job queue threads is shortened from “JobQueue” to “Jq”. Logs and the lock monitor will show a job verifying the cluster as e.g. “Jq2/Job1742/C_VERIFY”.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
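The naming scheme reduces to a simple format string; the helper below is hypothetical and assumes the summary string (e.g. “C_VERIFY”) comes from the opcode itself.

```python
def worker_thread_name(worker_no, job_id, op_summary):
    # Hypothetical helper: e.g. (2, 1742, "C_VERIFY") -> "Jq2/Job1742/C_VERIFY"
    return "Jq%s/Job%s/%s" % (worker_no, job_id, op_summary)

print(worker_thread_name(2, 1742, "C_VERIFY"))
```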
-
- Mar 25, 2011
-
Michael Hanselmann authored
The design details can be seen in the design document (doc/design-lu-generated-jobs.rst).
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
-
- Mar 23, 2011
-
Michael Hanselmann authored
Requested-by: Iustin Pop <iustin@google.com>
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Feb 28, 2011
-
Michael Hanselmann authored
- Move functions for drain status (tracked via file) from jqueue to jstore
- Undrain queue on master failover if necessary
- Add QA test
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Dec 29, 2010
-
Michael Hanselmann authored
Since the recent change to leave jobs in the “waitlock” status (commit 5fd6b694), cancelling a job while it's back in the queue would break. This patch handles these cases and adds a unittest.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Dec 15, 2010
-
Michael Hanselmann authored
Iustin Pop reported that a job's file is updated many times while it waits for locks held by other thread(s). After an investigation it was concluded that the reason was a design decision made for job priorities: return jobs to the “queued” status if they couldn't acquire all locks. Changing a job's status or priority requires an update to permanent storage. In a high-level view this is what happens:

1. Mark as waitlock
2. Write to disk as permanent storage (jobs left in this state by a crashing master daemon are resumed on restart)
3. Wait for lock (assume lock is held by another thread)
4. Mark as queued
5. Write to disk again
6. Return to workerpool

Another option originally discussed was to leave the job in the “waitlock” status. Ignoring priority changes, this is what would happen:

1. If not in waitlock
1.1. Assert state == queued
1.2. Mark as waitlock
1.3. Set start_timestamp
1.4. Write to disk as permanent storage
3. Wait for locks (assume lock is held by another thread)
4. Leave in waitlock
5. Return to workerpool

Now let's assume the lock is released by the other thread:

[…]
3. Wait for locks and get them
4. Assert state == waitlock
5. Set state to running
6. Set exec_timestamp
7. Write to disk

This change reduces the number of writes from two per lock acquire attempt to two per opcode, plus one per priority increase (priorities are increased after 24 acquire attempts, see mcpu._CalculateLockAttemptTimeouts, until the highest priority is reached), so here's the patch to implement it. Unittests are updated.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
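Condensed into a sketch with invented helper names, the new flow writes the job once when it enters “waitlock” and once when the opcode actually starts running, instead of on every failed lock-acquire attempt:

```python
import time

def start_opcode(job, op, try_acquire_locks, write_to_disk):
    """Illustrative only; error handling and priority changes are omitted."""
    if job.status == "queued":
        job.status = "waitlock"
        if job.start_timestamp is None:
            job.start_timestamp = time.time()
        write_to_disk(job)              # write #1: entering "waitlock"
    if not try_acquire_locks(op):
        return False                    # back to the workerpool, no extra write
    assert job.status == "waitlock"
    job.status = "running"
    op.exec_timestamp = time.time()
    write_to_disk(job)                  # write #2: the opcode really starts
    return True
```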
-
- Oct 12, 2010
-
Michael Hanselmann authored
If a job was cancelled while it was waiting for locks, an assertion would've failed. This patch fixes the problem and provides a unit test to check for this situation.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Commit 5ef699a0 had to roll back an earlier attempt at implementing this. With the improved job queue processor, this is finally possible.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
These fields can help with debugging.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Oct 07, 2010
-
Michael Hanselmann authored
This simplifies the code a bit: the status is only checked once.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Sep 24, 2010
-
Michael Hanselmann authored
epydoc complained: “File …/ganeti/jqueue.py, line 886, in ganeti.jqueue._JobProcessor._MarkWaitlock Warning: Redefinition of type for job”
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
-