Commits · bb92166816b51f69e6e22e32fddbff49d9a32729 · itminedu / snf-ganeti

Oct 25, 2012

jqueue: Add docstring for _DetermineJobDirectories · bb921668

Michael Hanselmann authored 12 years ago


Somehow this was missed in commit 0422250e.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>

bb921668

jqueue: Fix comments in _SubmitJobUnlocked · 42d49574

Michael Hanselmann authored 12 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>

42d49574

Oct 11, 2012

gnt-job: List archived jobs if requested · 0422250e

Michael Hanselmann authored 12 years ago


If requested via a filter or by including the “archived” output,
archived jobs will be loaded and shown. This is significantly slower
than just listing normal jobs, therefore by default they are not loaded
at all.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

0422250e

jqueue: Add new in-memory attribute for archived jobs · 8a3cd185

Michael Hanselmann authored 12 years ago


This attribute is set to True for jobs which were restored from an
archived file. A new filter will act on this field.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

8a3cd185

jqueue: Correct docstring · 4c27b231

Michael Hanselmann authored 12 years ago


The description was not accurate.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

4c27b231

Oct 05, 2012

jqueue: Look at archived jobs when watching · e4cf42d4

Michael Hanselmann authored 12 years ago


First: This enables the use of “gnt-job watch $id” for archived jobs.

Now, the reason for actually making this work is that during
sufficiently large group or node evacuations jobs are archived before
the client gets to poll for their output. This led to situations where
the jobs would finish successfully, but the client reported an error
because it couldn't see the job anymore.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>
(cherry picked from commit 04569469)

e4cf42d4

jqueue: Look at archived jobs when watching · 04569469

Michael Hanselmann authored 12 years ago


First: This enables the use of “gnt-job watch $id” for archived jobs.

Now, the reason for actually making this work is that during
sufficiently large group or node evacuations jobs are archived before
the client gets to poll for their output. This led to situations where
the jobs would finish successfully, but the client reported an error
because it couldn't see the job anymore.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>

04569469

Sep 25, 2012

Implement virtual cluster support in Python code · cffbbae7

Michael Hanselmann authored 12 years ago


- pathutils: Prepend node-specific prefix path
- RPC: Use virtual paths (see vcluster.py)
- SSH: Pass environment variables, use destination's node directory when
  copying files using scp, use GANETI_HOSTNAME to determine hostname

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

cffbbae7

Sep 18, 2012

Migrate lib/{jqueue,jstore}.py from constants to pathutils · e2b4a7ba

Michael Hanselmann authored 12 years ago


File system paths moved from constants to pathutils.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

e2b4a7ba

Aug 07, 2012

Switch job IDs to numeric · 76b62028

Iustin Pop authored 12 years ago


This has been a long-standing cleanup item, which we've always
refrained from doing due to the high estimated effort needed.

In reality, it turned out that after some infrastructure improvements
(the previous patches), the actual job queue-related changes are quite
small.

We will need to update the NEWS file later, but so far the RAPI
documentation doesn't mention that the job ID is a string (it only
says it is "a number"), so it doesn't look like it needs update.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

76b62028

Jun 15, 2012

jqueue: Move functions related to job ID to jstore · 1410a389

Michael Hanselmann authored 13 years ago


These don't really need to be in jqueue, and a new function will
be added to convert job IDs to an integer for queries.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

1410a389

Mar 30, 2012

Add job support to query2 via LUXI · e07f7f7a

Michael Hanselmann authored 13 years ago


This enables the use of filters through query2 when listing jobs.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

e07f7f7a

jqueue: Cache prepared field list in _JobChangesChecker · dc2879ea

Michael Hanselmann authored 13 years ago


… instead of re-calculating it on every file change.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

dc2879ea

jqueue: Convert GetInfo to query2 · a06c6ae8

Michael Hanselmann authored 13 years ago


This rather inefficient implementation (fields are evaluated on every
call to GetInfo) is not good for WaitForJobChanges and doesn't support
filters, but that will be rectified in later patches.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

a06c6ae8

jqueue._QueuedOpCode: Change a docstring · 66abb9ff

Michael Hanselmann authored 13 years ago


There was a typo and it's not necessary to repeat the class name.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

66abb9ff

Dec 22, 2011

jqueue: Factorize checking job processor's result · df5a5730

Michael Hanselmann authored 13 years ago


This allows for more unittesting.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

df5a5730

Dec 21, 2011

jqueue: Fix epylint errors introduced in 37d76f1e · 1316ebc2

Michael Hanselmann authored 13 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

1316ebc2

serializer: Remove JSON indentation and dict key sorting · a182a3ed

Michael Hanselmann authored 13 years ago

Serializing to JSON using “simplejson” is significantly slower when
indentation and/or sorting of dictionary keys is used. In simplejson 1.x
the difference isn't that big, but with simplejson 2.x the difference
can be up to a factor of 7.5. The reason is that the latter no longer
uses C functions when sorting or indentation is used.

With this patch we revert everything to simplejson's defaults, which
should provide us with the best performance available.
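
The effect can be checked locally with a rough timing sketch. It uses the standard library's json module as a stand-in for simplejson (an assumption; the factor quoted above applies to simplejson, not necessarily to this module), and the payload is made up for illustration.

```python
import json
import timeit

# Hypothetical payload standing in for a serialized job file.
payload = {
    "id": 1742,
    "ops": [{"OP_ID": "OP_TEST_DELAY", "duration": i} for i in range(200)],
}


def dump_pretty(data):
    # Indentation and key sorting, the style this patch removes.
    return json.dumps(data, indent=2, sort_keys=True)


def dump_default(data):
    # Library defaults, the style this patch switches to.
    return json.dumps(data)


if __name__ == "__main__":
    for func in (dump_pretty, dump_default):
        seconds = timeit.timeit(lambda: func(payload), number=200)
        print("%s: %.4fs" % (func.__name__, seconds))
```

Both variants decode to the same object; only the encoding cost and the size of the output differ.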

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

a182a3ed

jqueue: Fix deadlock between job queue and dependency manager · 37d76f1e

Michael Hanselmann authored 13 years ago


When an opcode is about to be processed its dependencies are
evaluated using “_JobDependencyManager.CheckAndRegister”. Due
to its nature that function requires a lock on the manager's
internal structures. All of this happens while the job queue
lock is held in shared mode (required for the job processor).

When a job has been processed any pending dependencies are re-added
to the job workerpool. Before this patch that would require
the manager's lock and then, for adding the jobs, the job queue
lock. Since this is in reverse order it will lead to deadlocks.
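
The message does not say how the patch resolves the ordering, so the following is only a minimal sketch of one common way to avoid such a lock-order inversion: collect the waiting jobs under the manager's lock, release it, and only then take the queue lock. Plain threading.Lock objects stand in for Ganeti's shared locks, and all names are hypothetical.

```python
import threading

queue_lock = threading.Lock()    # stands in for the job queue lock
manager_lock = threading.Lock()  # stands in for the dependency manager's lock

waiting = {}  # job ID -> list of job IDs waiting for it


def check_and_register(job_id, dep_id):
    """Opcode processing path: queue lock first, then manager lock."""
    with queue_lock:
        with manager_lock:
            waiting.setdefault(dep_id, []).append(job_id)


def notify_waiters(finished_id, readd):
    """Collect waiters under the manager lock, then re-add under the queue lock."""
    with manager_lock:
        jobs = waiting.pop(finished_id, [])
    # The manager lock has been released, so the two locks are never
    # acquired in the reverse order (manager first, then queue).
    with queue_lock:
        for job_id in jobs:
            readd(job_id)
    return jobs
```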

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

37d76f1e

Nov 21, 2011

jqueue: Add code to prepare for queue shutdown · 6d5ea385

Michael Hanselmann authored 13 years ago


Doing so will prevent job submissions (similar to a drained queue),
but won't affect currently running jobs. No further jobs will be
executed.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

6d5ea385

Nov 17, 2011

jqueue: Factorize code checking for drained queue · c8d0be94

Michael Hanselmann authored 13 years ago


This is in preparation for a clean(er) shutdown of masterd.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

c8d0be94

Oct 27, 2011

jqueue: Allow zero jobs to be submitted at once · 719f8fba

Michael Hanselmann authored 13 years ago


If cmdlib.LUNodeMigrate was called for a node without primary instances
it would try to submit an empty list of jobs. This was never visible via
CLI as there we check the list of primary instances first.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

719f8fba

Oct 26, 2011

Convert job queue's RPC to generated code · fb1ffbca

Michael Hanselmann authored 13 years ago


With these changes job queue RPC will finally show up on the lock
monitor. See below for an example. A job queue-specific class is used to
restrict the use of a static list for name resolution to the job queue.
Further improvements can be made to not re-create the whole RPC client
for every call (e.g. by using a more dynamic resolver), but for now this
works.

rpc/node8.example.com/jobqueue_update Jq8/Job9/TEST_DELAY

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

fb1ffbca

Sep 06, 2011

Fix assertion error on unclean master shutdown · fd121c8e

Michael Hanselmann authored 14 years ago


Commit 66bd7445 added an assertion to ensure a finalized job has its
“end_timestamp” attribute set. Unfortunately it didn't cover a case when
the queue is recovering from an unclean master shutdown.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
(cherry picked from commit 45df0793)

fd121c8e

Aug 30, 2011

Fixes to errors/warnings raised by pylint 0.24 · 17385bd2

Andrea Spadaccini authored 13 years ago


Running pylint 0.24.0 revealed 2 errors and 1 warning. Here is how I
fixed them:

* jqueue.py: silenced E1101
* netutils.py: rewrote the list comprehension using extend()
* watcher/__init__.py: fixed a missing format string parameter

These changes are backwards-compatible with pylint 0.21.1.

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

17385bd2

DeprecationWarning fixes for pylint · b459a848

Andrea Spadaccini authored 13 years ago


In version 0.21, pylint unified all the disable-* (and enable-*)
directives to disable (resp. enable). This leads to a lot of
DeprecationWarning being emitted even if one uses the recommended
version of pylint (0.21.1, as stated in devnotes.rst).

This commit changes all the disable-msg directives to disable.

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

b459a848

Aug 19, 2011

ensure-dirs: Set permissions on job files in queue · cb66225d

Michael Hanselmann authored 13 years ago


This was a regression from 2.4.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

cb66225d

Aug 02, 2011

jqueue: Add short delay before detecting job changes · dfc8824a

Michael Hanselmann authored 13 years ago


By sleeping for 100ms after receiving a notification for a changed job
file the job is given some additional time to change again. This
significantly reduces the number of LUXI calls for WaitForJobChanges
(depending on the job, in my tests with “gnt-cluster verify
--debug-simulate-errors” by about 80%), and improves performance (the
same job went from around 7 seconds to around 3.5 seconds).

This method is not perfect. The algorithm could be made more complex,
e.g. by increasing the delay on each change, etc., but for now this
simple change provides a good improvement.
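
A minimal sketch of the idea, assuming hypothetical callables for blocking on a file-change notification and for reading the job; it is not the code from this patch. Sleeping briefly after the notification lets several quick changes collapse into one answer.

```python
import time


def wait_for_settled_change(wait_for_notification, read_job,
                            delay=0.1, sleep=time.sleep):
    """Wait for a change, pause briefly, then report the job once."""
    wait_for_notification()  # blocks until the job file has changed
    sleep(delay)             # give the job time to change again
    return read_job()        # a single answer covers all changes so far
```

The `sleep` parameter exists only so the delay can be replaced in tests.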

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

dfc8824a

Jul 21, 2011

Export job dependencies through lock monitor · fcb21ad7

Michael Hanselmann authored 14 years ago


This makes them visible to the user. Example:

$ gnt-debug locks -o name,pending
Name    Pending
job/890 job:891,892
job/892 job:894

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

fcb21ad7

Rename *_STATUS_WAITLOCK to …_WAITING · 47099cd1

Michael Hanselmann authored 14 years ago


This patch renames the {JOB,OP}_STATUS_WAITLOCK constants to
{JOB,OP}_STATUS_WAITING, as per design document for chained jobs.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

47099cd1

Fix locking issue with job dependencies · 75d81fc8

Michael Hanselmann authored 14 years ago

When jobs waiting for a dependency are notified, they're re-added to the
queue. This would require owning the queue lock in exclusive mode, but
since the function doing so is called from within the job/opcode
processor, it only holds the lock in shared mode.

This patch changes the result of the processor from a boolean to a
status value (integer). This way the caller can be notified about
actions to take, including notifying waiting jobs. The function adding
jobs to the queue can now acquire the lock in exclusive mode.
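
As a rough illustration of the pattern, the caller can dispatch on an integer status after the processor has returned. The constant names and callables below are hypothetical and chosen for this sketch only.

```python
# Hypothetical integer status values; the names in the real code may differ.
FINISHED = 0   # job reached a final status
DEFER = 1      # job should be processed again later
WAITDEP = 2    # job is waiting for another job


def handle_processor_result(result, job_id, notify_waiting_jobs, requeue):
    """Act on the processor's status value outside of the processor."""
    if result == FINISHED:
        # The processor (shared mode) has already returned, so the
        # callee is free to acquire the queue lock in exclusive mode.
        notify_waiting_jobs(job_id)
    elif result == DEFER:
        requeue(job_id)
    elif result == WAITDEP:
        pass  # nothing to do until the dependency finishes
    else:
        raise ValueError("unknown processor result: %r" % (result,))
```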

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

75d81fc8

jqueue: Read-only jobs don't need processor lock · f8a4adfa

Michael Hanselmann authored 14 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

f8a4adfa

jqueue: Implement submitting multiple jobs with dependencies · b247c6fc

Michael Hanselmann authored 14 years ago


With this change users of the “SubmitManyJobs” interface can use
relative job dependencies. Relative job IDs in dependencies are resolved
before handing the job off to the workerpool.
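
A minimal sketch of such a resolution step. It assumes that a relative ID is a negative number counting back from the jobs already submitted in the same call (-1 being the most recent one) and that a dependency is a pair of job ID and wanted statuses; both are assumptions made for this example, as are the function and parameter names.

```python
def resolve_dependencies(previous_ids, depends):
    """Replace relative job IDs in `depends` with absolute ones.

    previous_ids: IDs of jobs already submitted in this call, in order.
    depends: list of (job_id, statuses) pairs; negative job_id is relative.
    """
    resolved = []
    for dep_id, statuses in depends:
        if dep_id < 0:
            if -dep_id > len(previous_ids):
                raise ValueError("relative job ID %d out of range" % dep_id)
            dep_id = previous_ids[dep_id]  # -1 is the most recent job
        resolved.append((dep_id, statuses))
    return resolved
```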

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

b247c6fc

Jul 20, 2011

jqueue: Add “writable” flag to memory objects · c0f6d0d8

Michael Hanselmann authored 14 years ago


Basically only one instance of the job, the one being processed,
should be serialized to disk and replicated to other nodes. With
this flag assertions can be added in various places.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

c0f6d0d8

Implement chained jobs · b95479a5

Michael Hanselmann authored 14 years ago


An overview is available in the design document for this change,
doc/design-chained-jobs.rst.

When a job enters the job processor, the current opcode's dependencies
are evaluated. If a referenced job has not yet reached the desired
status, the current job is registered as a dependant. The job processor
will continue to work on other pending tasks. When a job finishes it
notifies any pending dependants by re-adding them to the workerpool.

A per-job processor lock is necessary for rare cases where the same job
can be re-added twice.

There is no way to view waiting jobs at the moment, but I plan to
export this information to “gnt-debug locks”.

A so-called dependency manager takes care of managing waiting jobs and
keeping track of their status.

Unittests are included.
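
The registration and notification flow described above could be sketched as follows. This is a simplified illustration, not the code from this change: locking is omitted, a dependency that finishes with an unwanted status is not handled, and the class and method names are hypothetical.

```python
class DependencyManager:
    """Tracks which jobs are waiting for which other job."""

    def __init__(self, get_status):
        self._get_status = get_status  # callable: job ID -> current status
        self._waiters = {}             # job ID -> set of dependant job IDs

    def check_and_register(self, job_id, dep_id, wanted):
        """Return True if the dependency is met, else register as dependant."""
        if self._get_status(dep_id) in wanted:
            return True
        self._waiters.setdefault(dep_id, set()).add(job_id)
        return False

    def notify(self, finished_id, readd):
        """Re-add all dependants of a finished job to the worker pool."""
        for job_id in sorted(self._waiters.pop(finished_id, ())):
            readd(job_id)
```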

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

b95479a5

Jul 15, 2011

Fix assertion error on unclean master shutdown · 45df0793

Michael Hanselmann authored 14 years ago


Commit 66bd7445 added an assertion to ensure a finalized job has its
“end_timestamp” attribute set. Unfortunately it didn't cover a case when
the queue is recovering from an unclean master shutdown.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

45df0793

Jul 11, 2011

Fix off-by-one bug in job serial generation · 3c88bf36

Michael Hanselmann authored 14 years ago


Commit 009e73d0 (September 2009) changed the job queue to generate
multiple job serials at once. Ever since it would return one more than
requested.

The “serial” file in the job queue directory is defined to contain the
“last job ID used” (design-2.0). With the change above, the serial file
would always contain the next serial number. The first value returned by
the generating function was the one contained in the file, so during the
switch in 2009 one job may have been overwritten.

This patch changes the code to always return the exact number of
serials, to keep the last used serial on disk and adds an assertion.
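
A small sketch of the intended behaviour, with a hypothetical function name and no file handling: given the last used serial, return exactly the requested number of new serials together with the value that should be stored as the new "last job ID used".

```python
def new_serials(last_used, count):
    """Return (serials, new_last_used) for `count` new job serials."""
    assert count >= 0
    serials = list(range(last_used + 1, last_used + count + 1))
    # Exactly the requested number of serials, never one more.
    assert len(serials) == count
    new_last_used = serials[-1] if serials else last_used
    return serials, new_last_used
```

With a last used serial of 41 and three requested serials, this yields 42, 43 and 44, and 44 is what would be kept on disk.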

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

3c88bf36

Jun 10, 2011

jqueue: Allow loading of archived jobs · 194c8ca4

Michael Hanselmann authored 14 years ago


Chained jobs need to look at previous jobs, including archived ones. A
nice side-effect of this change is the ability to look at archived jobs
using “gnt-job info <id>” as long as the ID is known.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

194c8ca4

May 31, 2011

jqueue: Fix potential race condition when cancelling queued jobs · 66bd7445

Michael Hanselmann authored 14 years ago


When a job was cancelled, its status would be changed and the file
written again. Since this was a final status, the job file could be
moved anytime for archival. If the job was still in the queue, however,
it would be processed (not fully, just updating the “end_timestamp”
attribute) and written again. This was bad as it could leave the same
job in two different files.

With this patch the processor is changed to return early for finished
jobs. Cancelling a queued job will finalize it right away. Unittests are
updated.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

66bd7445

May 10, 2011

jqueue: Update worker thread name to include opcode summary · 0aeeb6e3

Michael Hanselmann authored 14 years ago

With this patch, the worker thread name is updated to include a short
summary of the opcode (basically its OP_ID). The base name of job queue
threads is shortened from “JobQueue” to “Jq”. Logs and the lock monitor
will show a job verifying the cluster as e.g. “Jq2/Job1742/C_VERIFY”.
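
The naming scheme from the example can be sketched with Python's threading module. How the opcode summary is derived from the OP_ID is not described here, so it is passed in as a ready-made string; the function name is hypothetical.

```python
import threading


def set_worker_name(base_name, job_id, op_summary):
    """Rename the current thread to "<base>/Job<id>/<summary>"."""
    name = "%s/Job%s/%s" % (base_name, job_id, op_summary)
    threading.current_thread().name = name
    return name
```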

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

0aeeb6e3