- Oct 25, 2012
-
-
Michael Hanselmann authored
Somehow this was missed in commit 0422250e. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Helga Velroyen <helgav@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Helga Velroyen <helgav@google.com>
-
- Oct 11, 2012
-
-
Michael Hanselmann authored
If requested via a filter or by including the “archived” output, archived jobs will be loaded and shown. This is significantly slower than listing only normal jobs, so by default they are not loaded at all. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
This attribute is set to True for jobs which were restored from an archived file. A new filter will act on this field. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
The description was not accurate. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Oct 05, 2012
-
-
Michael Hanselmann authored
First: This enables the use of “gnt-job watch $id” for archived jobs. Now, the reason for actually making this work is that during sufficiently large group or node evacuations jobs are archived before the client gets to poll for their output. This led to situations where the jobs would finish successfully, but the client reported an error because it couldn't see the job anymore. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Bernardo Dal Seno <bdalseno@google.com> (cherry picked from commit 04569469)
-
Michael Hanselmann authored
First: This enables the use of “gnt-job watch $id” for archived jobs. Now, the reason for actually making this work is that during sufficiently large group or node evacuations jobs are archived before the client gets to poll for their output. This led to situations where the jobs would finish successfully, but the client reported an error because it couldn't see the job anymore. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Bernardo Dal Seno <bdalseno@google.com>
-
- Sep 25, 2012
-
-
Michael Hanselmann authored
  - pathutils: Prepend node-specific prefix path
  - RPC: Use virtual paths (see vcluster.py)
  - SSH: Pass environment variables, use destination's node directory when copying files using scp, use GANETI_HOSTNAME to determine hostname
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
- Sep 18, 2012
-
-
Michael Hanselmann authored
File system paths moved from constants to pathutils. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Aug 07, 2012
-
-
Iustin Pop authored
This has been a long-standing cleanup item, which we've always refrained from doing due to the high estimated effort needed. In reality, it turned out that after some infrastructure improvements (the previous patches), the actual job queue-related changes are quite small. We will need to update the NEWS file later, but so far the RAPI documentation doesn't mention that the job ID is a string (it only says it is "a number"), so it doesn't look like it needs update. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
- Jun 15, 2012
-
-
Michael Hanselmann authored
These don't really need to be in jqueue, and a new function will be added to convert job IDs to an integer for queries. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
- Mar 30, 2012
-
-
Michael Hanselmann authored
This enables the use of filters through query2 when listing jobs. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
… instead of re-calculating it on every file change. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Michael Hanselmann authored
This rather inefficient implementation (fields are evaluated on every call to GetInfo) is not good for WaitForJobChanges and doesn't support filters, but that will be rectified in later patches. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
There was a typo and it's not necessary to repeat the class name. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Dec 22, 2011
-
-
Michael Hanselmann authored
This allows for more unittesting. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Dec 21, 2011
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Serializing to JSON using “simplejson” is significantly slower when indentation and/or sorting of dictionary keys is used. In simplejson 1.x the difference isn't that big, but with simplejson 2.x the difference can be up to a factor of 7.5. The reason is that the latter no longer uses C functions when sorting or indentation is used. With this patch we revert everything to simplejson's defaults, which should provide us with the best performance available. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
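The two serialization styles contrasted above can be sketched as follows. This is only an illustration: it uses the standard-library json module as a stand-in for simplejson (an assumption, so the snippet runs without extra packages), and the function names and sample job dictionary are hypothetical. It makes no timing claims; it only shows that both forms carry the same data.

```python
import json


def dump_compact(data):
    """Serialize with the library defaults (no indentation, no key sorting)."""
    return json.dumps(data)


def dump_pretty(data):
    """Serialize with indentation and sorted keys (the configuration the
    commit message describes as slower)."""
    return json.dumps(data, indent=2, sort_keys=True)


# Hypothetical job data, for illustration only.
job = {"id": 1742, "status": "success", "ops": [{"OP_ID": "OP_TEST_DELAY"}]}

# Both texts decode to the same object; only the formatting differs.
same = json.loads(dump_compact(job)) == json.loads(dump_pretty(job))
```

Since the decoded content is identical, dropping indentation and sorting changes only how the files look on disk, not what they contain.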
-
Michael Hanselmann authored
When an opcode is about to be processed its dependencies are evaluated using “_JobDependencyManager.CheckAndRegister”. Due to its nature that function requires a lock on the manager's internal structures. All of this happens while the job queue lock is held in shared mode (required for the job processor). When a job has been processed any pending dependencies are re-added to the job workerpool. Before this patch that would require the manager's lock and then, for adding the jobs, the job queue lock. Since this is in reverse order it will lead to deadlocks. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
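One common way to avoid a lock-order inversion like the one described above is to never hold the inner lock while acquiring the outer one. The sketch below illustrates that idea only; it is not necessarily how the patch solves it, and all names (WaiterTable, queue_lock, pool, notify_finished) are hypothetical.

```python
import threading


class WaiterTable:
    """Hypothetical stand-in for the dependency manager's internal state."""

    def __init__(self):
        self._lock = threading.Lock()
        self._waiters = {}  # finished job ID -> set of dependent job IDs

    def register(self, job_id, dependant):
        with self._lock:
            self._waiters.setdefault(job_id, set()).add(dependant)

    def pop_waiters(self, job_id):
        # The manager lock is held only while copying out the waiters.
        with self._lock:
            return sorted(self._waiters.pop(job_id, set()))


queue_lock = threading.Lock()  # stand-in for the job queue lock
pool = []                      # stand-in for the job workerpool


def notify_finished(table, job_id):
    # Step 1: take and release the manager lock.
    waiters = table.pop_waiters(job_id)
    # Step 2: only now take the queue lock, so the two locks are never
    # held in "manager, then queue" order.
    with queue_lock:
        pool.extend(waiters)
    return waiters
```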
-
- Nov 21, 2011
-
-
Michael Hanselmann authored
Doing so will prevent job submissions (similar to a drained queue), but won't affect currently running jobs. No further jobs will be executed. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Nov 17, 2011
-
-
Michael Hanselmann authored
This is in preparation for a clean(er) shutdown of masterd. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Oct 27, 2011
-
-
Michael Hanselmann authored
If cmdlib.LUNodeMigrate was called for a node without primary instances it would try to submit an empty list of jobs. This was never visible via CLI as there we check the list of primary instances first. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
- Oct 26, 2011
-
-
Michael Hanselmann authored
With these changes job queue RPC will finally show up on the lock monitor. See below for an example. A job queue-specific class is used to restrict the use of a static list for name resolution to the job queue. Further improvements can be made to not re-create the whole RPC client for every call (e.g. by using a more dynamic resolver), but for now this works.
  rpc/node8.example.com/jobqueue_update Jq8/Job9/TEST_DELAY
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Sep 06, 2011
-
-
Michael Hanselmann authored
Commit 66bd7445 added an assertion to ensure a finalized job has its “end_timestamp” attribute set. Unfortunately it didn't cover a case when the queue is recovering from an unclean master shutdown. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com> (cherry picked from commit 45df0793)
-
- Aug 30, 2011
-
-
Andrea Spadaccini authored
Running pylint 0.24.0 revealed 2 errors and 1 warning. Here is how I fixed them:
  * jqueue.py: silenced E1101
  * netutils.py: rewrote the list comprehension using extend()
  * watcher/__init__.py: fixed a missing format string parameter
These changes are backwards-compatible with pylint 0.21.1. Signed-off-by:
Andrea Spadaccini <spadaccio@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Andrea Spadaccini authored
In version 0.21, pylint unified all the disable-* (and enable-*) directives to disable (resp. enable). This leads to many DeprecationWarning messages being emitted even if one uses the recommended version of pylint (0.21.1, as stated in devnotes.rst). This commit changes all the disable-msg directives to disable. Signed-off-by:
Andrea Spadaccini <spadaccio@google.com> Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Aug 19, 2011
-
-
Michael Hanselmann authored
This was a regression from 2.4. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
- Aug 02, 2011
-
-
Michael Hanselmann authored
By sleeping for 100ms after receiving a notification for a changed job file the job is given some additional time to change again. This significantly reduces the number of LUXI calls for WaitForJobChanges (depending on the job, in my tests with “gnt-cluster verify --debug-simulate-errors” by about 80%), and improves performance (the same job went from around 7 seconds to around 3.5 seconds). This method is not perfect. The algorithm could be made more complex, e.g. by increasing the delay on each change, etc., but for now this simple change provides a good improvement. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
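The settle delay described above can be sketched roughly as follows. This is a simplified model, not the actual WaitForJobChanges code: get_serial is a hypothetical callable returning the job's current serial number, and the sleep function is injectable so the behaviour can be checked without real waiting.

```python
import time


def wait_for_change(get_serial, last_serial, delay=0.1, sleep=time.sleep):
    """Return the newest serial after giving the job time to change again.

    If nothing changed, return immediately. Otherwise sleep for `delay`
    seconds (100ms by default) and read the serial once more, so several
    quick updates are reported to the client as a single change.
    """
    if get_serial() == last_serial:
        return last_serial
    sleep(delay)
    return get_serial()
```

A fixed delay keeps the logic trivial; the commit message notes that a growing delay would be a possible refinement.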
-
- Jul 21, 2011
-
-
Michael Hanselmann authored
This makes them visible to the user. Example:
  $ gnt-debug locks -o name,pending
  Name    Pending
  job/890 job:891,892
  job/892 job:894
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
This patch renames the {JOB,OP}_STATUS_WAITLOCK constants to {JOB,OP}_STATUS_WAITING, as per design document for chained jobs. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
When jobs waiting for a dependency are notified, they're re-added to the queue. This would require owning the queue lock in exclusive mode, but since the function doing so is called from within the job/opcode processor, it only holds the lock in shared mode. This patch changes the result of the processor from a boolean to a status value (integer). This way the caller can be notified about actions to take, including notifying waiting jobs. The function adding jobs to the queue can now acquire the lock in exclusive mode. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
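Returning a status value instead of a boolean lets the caller decide what to do after the processor has released its shared lock. A minimal sketch of that pattern follows; the constant names, the job dictionary layout and the notify callback are assumptions made for illustration, not the actual jqueue code.

```python
import enum


class Result(enum.IntEnum):
    """Hypothetical processor results (integers, as in the description)."""
    DEFER = 1      # more opcodes left, run the job again later
    WAITDEP = 2    # job is waiting for a dependency
    FINISHED = 3   # job is done, dependants may be notified


def process(job):
    """Process one step of a (simplified) job and report what happened."""
    if job["waiting_on"]:
        return Result.WAITDEP
    if job["ops_left"] > 1:
        job["ops_left"] -= 1
        return Result.DEFER
    job["ops_left"] = 0
    return Result.FINISHED


def run(job, notify):
    """Caller side: act on the result once the processor has returned."""
    result = process(job)
    if result == Result.FINISHED:
        # Here the caller would be free to take the queue lock exclusively.
        notify(job["id"])
    return result
```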
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
With this change users of the “SubmitManyJobs” interface can use relative job dependencies. Relative job IDs in dependencies are resolved before handing the job off to the workerpool. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
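Resolving relative job IDs could look roughly like the sketch below. It assumes that a relative ID is a negative number counting back through the jobs already submitted in the same call (-1 being the most recent one) and that a dependency is a pair of job ID and accepted statuses; both are assumptions for illustration, and resolve_dependencies is a hypothetical helper.

```python
def resolve_dependencies(deps, previous_job_ids):
    """Replace relative (negative) job IDs with absolute ones.

    deps: list of (job_id, statuses) pairs
    previous_job_ids: absolute IDs of jobs submitted earlier in this call
    """
    resolved = []
    for job_id, statuses in deps:
        if job_id < 0:
            if -job_id > len(previous_job_ids):
                raise ValueError("Relative job ID %d out of range" % job_id)
            # Negative indexing: -1 is the most recently submitted job.
            job_id = previous_job_ids[job_id]
        resolved.append((job_id, statuses))
    return resolved
```

Absolute IDs pass through unchanged, so callers can mix both kinds in one dependency list.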
-
- Jul 20, 2011
-
-
Michael Hanselmann authored
Basically only one instance of the job, the one being processed, should be serialized to disk and replicated to other nodes. With this flag assertions can be added in various places. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
An overview is available in the design document for this change, doc/design-chained-jobs.rst.

When a job enters the job processor, the current opcode's dependencies are evaluated. If a referenced job has not yet reached the desired status, the current job is registered as a dependant. The job processor will continue to work on other pending tasks. When a job finishes it notifies any pending dependants by re-adding them to the workerpool. A per-job processor lock is necessary for rare cases where the same job can be re-added twice.

There is no way to view waiting jobs at the moment, but I plan to export this information to “gnt-debug locks”.

A so-called dependency manager takes care of managing waiting jobs and keeping track of their status. Unittests are included. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
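The check-and-register flow described above might be sketched like this. It is a deliberately simplified model: the class and method names are hypothetical, get_status is an assumed callback returning a job's current status, and cases such as a dependency finishing with an unwanted status are not handled.

```python
class DependencySketch:
    """Tracks jobs waiting for other jobs (simplified illustration)."""

    CONTINUE = "continue"  # dependency satisfied, keep processing
    WAIT = "wait"          # job was registered as a dependant

    def __init__(self, get_status):
        self._get_status = get_status
        self._waiters = {}  # dependency job ID -> set of waiting job IDs

    def check_and_register(self, job_id, dep_job_id, wanted_statuses):
        if self._get_status(dep_job_id) in wanted_statuses:
            return self.CONTINUE
        self._waiters.setdefault(dep_job_id, set()).add(job_id)
        return self.WAIT

    def notify_waiters(self, dep_job_id):
        """Return the jobs to re-add to the workerpool once a job finishes."""
        return sorted(self._waiters.pop(dep_job_id, set()))
```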
-
- Jul 15, 2011
-
-
Michael Hanselmann authored
Commit 66bd7445 added an assertion to ensure a finalized job has its “end_timestamp” attribute set. Unfortunately it didn't cover a case when the queue is recovering from an unclean master shutdown. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Jul 11, 2011
-
-
Michael Hanselmann authored
Commit 009e73d0 (September 2009) changed the job queue to generate multiple job serials at once. Ever since it would return one more than requested. The “serial” file in the job queue directory is defined to contain the “last job ID used” (design-2.0). With the change above, the serial file would always contain the next serial number. The first value returned by the generating function was the one contained in the file, so during the switch in 2009 one job may have been overwritten. This patch changes the code to always return the exact number of serials, to keep the last used serial on disk and adds an assertion. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
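The corrected behaviour (exactly the requested number of serials, with the last used one kept on disk) can be illustrated with the small sketch below. The function name and the tuple return value are assumptions; the real code reads and writes the “serial” file instead of passing the value around.

```python
def new_serials(last_used, count):
    """Return (serials, new_last_used) for `count` fresh job serials.

    `last_used` is the value stored in the serial file, i.e. the last job
    ID that was handed out, so the first new serial is last_used + 1.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    serials = list(range(last_used + 1, last_used + count + 1))
    # Mirrors the assertion mentioned in the commit message: never return
    # more (or fewer) serials than requested.
    assert len(serials) == count
    return serials, serials[-1]
```

With this layout the value on disk is always a serial that has already been used, matching the “last job ID used” definition quoted above.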
-
- Jun 10, 2011
-
-
Michael Hanselmann authored
Chained jobs need to look at previous jobs, including archived ones. A nice side-effect of this change is the ability to look at archived jobs using “gnt-job info <id>” as long as the ID is known. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- May 31, 2011
-
-
Michael Hanselmann authored
When a job was cancelled, its status would be changed and the file written again. Since this was a final status, the job file could be moved anytime for archival. If the job was still in the queue, however, it would be processed (not fully, just updating the “end_timestamp” attribute) and written again. This was bad as it could leave the same job in two different files. With this patch the processor is changed to return early for finished jobs. Cancelling a queued job will finalize it right away. Unittests are updated. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- May 10, 2011
-
-
Michael Hanselmann authored
With this patch, the worker thread name is updated to include a short summary of the opcode (basically its OP_ID). The base name of job queue threads is shortened from “JobQueue” to “Jq”. Logs and the lock monitor will show a job verifying the cluster as e.g. “Jq2/Job1742/C_VERIFY”. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
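The naming scheme from the example above can be reproduced with a short helper. This is only a sketch: worker_name is a hypothetical function, and the opcode summary is taken as a ready-made string because the rule that shortens an OP_ID is not spelled out here.

```python
def worker_name(worker_id, job_id=None, op_summary=None):
    """Build a thread name such as "Jq2/Job1742/C_VERIFY"."""
    parts = ["Jq%d" % worker_id]
    if job_id is not None:
        parts.append("Job%d" % job_id)
    if op_summary:
        parts.append(op_summary)
    return "/".join(parts)
```

A worker could assign the result to threading.current_thread().name before processing each opcode, so logs and monitors pick up the current job and opcode.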
-