- Mar 30, 2012
-
Michael Hanselmann authored
This enables the use of filters through query2 when listing jobs.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
… instead of re-calculating it on every file change.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
-
Michael Hanselmann authored
This rather inefficient implementation (fields are evaluated on every call to GetInfo) is not good for WaitForJobChanges and doesn't support filters, but that will be rectified in later patches.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
There was a typo and it's not necessary to repeat the class name.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Dec 22, 2011
-
Michael Hanselmann authored
This allows for more unittesting.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Dec 21, 2011
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Serializing to JSON using “simplejson” is significantly slower when indentation and/or sorting of dictionary keys is used. In simplejson 1.x the difference isn't that big, but with simplejson 2.x the difference can be up to a factor of 7.5. The reason is that the latter no longer uses C functions when sorting or indentation is used. With this patch we revert everything to simplejson's defaults, which should provide us with the best performance available.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
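For illustration, here is a rough benchmark sketch of the effect described above; the data set, iteration count and the fallback import are invented and this is not Ganeti code:

```python
# Illustrative benchmark only; the sample data and iteration count are made up.
try:
    import simplejson as json   # prefer simplejson if installed, as Ganeti did
except ImportError:
    import json                 # fall back to the standard library
import timeit

data = {"jobs": [{"id": i, "status": "queued", "ops": ["OP_TEST_DELAY"] * 3}
                 for i in range(1000)]}

defaults = lambda: json.dumps(data)                          # library defaults (what the patch reverts to)
pretty = lambda: json.dumps(data, indent=2, sort_keys=True)  # indentation + sorting, much slower on simplejson 2.x

print("defaults:      %.3fs" % timeit.timeit(defaults, number=50))
print("indent+sorted: %.3fs" % timeit.timeit(pretty, number=50))
```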
-
Michael Hanselmann authored
When an opcode is about to be processed, its dependencies are evaluated using “_JobDependencyManager.CheckAndRegister”. Due to its nature that function requires a lock on the manager's internal structures. All of this happens while the job queue lock is held in shared mode (required for the job processor). When a job has been processed, any of its pending dependants are re-added to the job workerpool. Before this patch that would require the manager's lock and then, for adding the jobs, the job queue lock. Since this is the reverse order, it can lead to deadlocks.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
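A minimal sketch of the shape of such a fix, with invented names: waiters are collected while holding only the manager's lock and re-added to the queue after it has been released, so the two locks are never taken in opposite orders.

```python
# Invented names; this only illustrates the lock-ordering rule, not Ganeti's API.
import threading

queue_lock = threading.Lock()     # stands in for the job queue lock
manager_lock = threading.Lock()   # stands in for the dependency manager's lock

_waiters = {}                     # dependency job ID -> set of waiting jobs

def notify_waiters(finished_job_id, add_to_workerpool):
    # Collect the waiting jobs under the manager's lock only ...
    with manager_lock:
        pending = _waiters.pop(finished_job_id, set())
    # ... and take the queue lock only after the manager's lock was released,
    # so this path can never deadlock against CheckAndRegister (which runs
    # with the queue lock already held).
    with queue_lock:
        for job in pending:
            add_to_workerpool(job)
```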
-
- Nov 21, 2011
-
Michael Hanselmann authored
Doing so will prevent job submissions (similar to a drained queue), but won't affect currently running jobs. No further jobs will be executed.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Nov 17, 2011
-
Michael Hanselmann authored
This is in preparation for a clean(er) shutdown of masterd.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Oct 27, 2011
-
Michael Hanselmann authored
If cmdlib.LUNodeMigrate was called for a node without primary instances, it would try to submit an empty list of jobs. This was never visible via the CLI, as there we check the list of primary instances first.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
-
- Oct 26, 2011
-
Michael Hanselmann authored
With these changes job queue RPC will finally show up on the lock monitor. See below for an example. A job queue-specific class is used to restrict the use of a static list for name resolution to the job queue. Further improvements can be made to not re-create the whole RPC client for every call (e.g. by using a more dynamic resolver), but for now this works.
rpc/node8.example.com/jobqueue_update Jq8/Job9/TEST_DELAY
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Sep 06, 2011
-
Michael Hanselmann authored
Commit 66bd7445 added an assertion to ensure a finalized job has its “end_timestamp” attribute set. Unfortunately it didn't cover the case where the queue is recovering from an unclean master shutdown.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
(cherry picked from commit 45df0793)
-
- Aug 30, 2011
-
Andrea Spadaccini authored
Running pylint 0.24.0 revealed 2 errors and 1 warning. Here is how I fixed them:
* jqueue.py: silenced E1101
* netutils.py: rewrote the list comprehension using extend()
* watcher/__init__.py: fixed a missing format string parameter
These changes are backwards-compatible with pylint 0.21.1.
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Andrea Spadaccini authored
In version 0.21, pylint unified all the disable-* (and enable-*) directives into disable (resp. enable). This leads to a lot of DeprecationWarnings being emitted even if one uses the recommended version of pylint (0.21.1, as stated in devnotes.rst). This commit changes all the disable-msg directives to disable.
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
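For illustration, the change applied throughout the code has this shape; the class and attribute access below are made-up examples, not Ganeti code:

```python
class _Example(object):            # made-up class, only here to have an attribute to access
    some_dynamic_attr = 42

obj = _Example()

# Before (deprecated since pylint 0.21, emits a DeprecationWarning):
#   value = obj.some_dynamic_attr  # pylint: disable-msg=E1101
# After this commit:
value = obj.some_dynamic_attr  # pylint: disable=E1101
```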
-
- Aug 19, 2011
-
Michael Hanselmann authored
This was a regression from 2.4.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
-
- Aug 02, 2011
-
Michael Hanselmann authored
By sleeping for 100ms after receiving a notification for a changed job file the job is given some additional time to change again. This significantly reduces the number of LUXI calls for WaitForJobChanges (depending on the job, in my tests with “gnt-cluster verify --debug-simulate-errors” by about 80%), and improves performance (the same job went from around 7 seconds to around 3.5 seconds). This method is not perfect. The algorithm could be made more complex, e.g. by increasing the delay on each change, etc., but for now this simple change provides a good improvement.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
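Reduced to a sketch with invented helper names (wait_fn blocks until the job file changes or the timeout expires, load_fn re-reads the job), the idea is simply an extra sleep before re-reading:

```python
import time

CHANGE_COALESCE_DELAY = 0.1   # the 100ms delay described above

def wait_for_changes(wait_fn, load_fn, previous, timeout):
    """Illustrative helper; only the extra sleep is the point here."""
    deadline = time.time() + timeout
    while True:
        remaining = deadline - time.time()
        if remaining <= 0 or not wait_fn(remaining):
            return None                      # timed out, nothing changed
        # Give the job a moment to change again, so several quick updates
        # are reported to the LUXI client as a single change.
        time.sleep(CHANGE_COALESCE_DELAY)
        current = load_fn()
        if current != previous:
            return current
```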
-
- Jul 21, 2011
-
Michael Hanselmann authored
This makes them visible to the user. Example:
$ gnt-debug locks -o name,pending
Name    Pending
job/890 job:891,892
job/892 job:894
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
This patch renames the {JOB,OP}_STATUS_WAITLOCK constants to {JOB,OP}_STATUS_WAITING, as per the design document for chained jobs.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
When jobs waiting for a dependency are notified, they're re-added to the queue. This would require owning the queue lock in exclusive mode, but since the function doing so is called from within the job/opcode processor, it only holds the lock in shared mode. This patch changes the result of the processor from a boolean to a status value (integer). This way the caller can be notified about actions to take, including notifying waiting jobs. The function adding jobs to the queue can now acquire the lock in exclusive mode.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
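A toy sketch of that shape of change, with made-up constant and method names: the caller inspects a returned status instead of a bare True/False and can then acquire the queue lock exclusively where needed.

```python
# Made-up constants and methods, only to show the boolean-to-status change.
(PROC_FINISHED,   # job reached a final status
 PROC_DEFER,      # opcode could not get its locks, job goes back to the queue
 PROC_WAITDEP) = range(3)   # job is waiting for another job

def after_exec_opcode(status, job, queue, workerpool):
    if status == PROC_FINISHED:
        # Runs without the shared queue lock held, so waiting jobs can be
        # re-added while holding the queue lock exclusively.
        for waiter in queue.notify_dependants(job):
            workerpool.add_task(waiter)
    elif status == PROC_DEFER:
        workerpool.add_task(job)
```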
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
With this change users of the “SubmitManyJobs” interface can use relative job dependencies. Relative job IDs in dependencies are resolved before handing the job off to the workerpool.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
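A small sketch of how such resolution can work (illustrative names only): a negative dependency ID refers to a job submitted earlier in the same SubmitManyJobs request.

```python
def resolve_dependencies(deps, earlier_job_ids):
    """Replace relative (negative) job IDs with the real IDs of jobs that
    were submitted earlier in the same request. Names are illustrative."""
    resolved = []
    for dep_job_id, wanted_statuses in deps:
        if dep_job_id < 0:
            # e.g. -1 means "the job submitted just before this one"
            dep_job_id = earlier_job_ids[dep_job_id]
        resolved.append((dep_job_id, wanted_statuses))
    return resolved

# Example: a job depending on the one submitted right before it.
print(resolve_dependencies([(-1, ["success"])], earlier_job_ids=[17, 18]))
# -> [(18, ['success'])]
```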
-
- Jul 20, 2011
-
Michael Hanselmann authored
Basically only one instance of the job, the one being processed, should be serialized to disk and replicated to other nodes. With this flag assertions can be added in various places.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
An overview is available in the design document for this change, doc/design-chained-jobs.rst.

When a job enters the job processor, the current opcode's dependencies are evaluated. If a referenced job has not yet reached the desired status, the current job is registered as a dependant. The job processor will continue to work on other pending tasks. When a job finishes it notifies any pending dependants by re-adding them to the workerpool. A per-job processor lock is necessary for rare cases where the same job can be re-added twice.

There is no way to view waiting jobs at the moment, but I plan to export this information to “gnt-debug locks”. A so-called dependency manager takes care of managing waiting jobs and keeping track of their status. Unittests are included.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
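A heavily reduced sketch of the mechanism described above; the class and method names mirror the description, but all details are invented:

```python
class JobDependencyManager(object):
    """Keeps track of jobs waiting for other jobs (illustrative sketch)."""

    (CONTINUE, WAIT, ERROR) = range(3)

    def __init__(self):
        self._waiters = {}   # job ID -> set of jobs waiting for it

    def check_and_register(self, job, dep_job_id, wanted, get_status):
        status = get_status(dep_job_id)
        if status in wanted:
            return self.CONTINUE            # dependency already satisfied
        if status in ("canceled", "error"):
            return self.ERROR               # dependency finished the wrong way
        self._waiters.setdefault(dep_job_id, set()).add(job)
        return self.WAIT                    # processor moves on to other jobs

    def notify_finished(self, job_id, readd_to_workerpool):
        # Re-add any dependants of the finished job to the workerpool.
        for waiter in self._waiters.pop(job_id, set()):
            readd_to_workerpool(waiter)
```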
-
- Jul 15, 2011
-
Michael Hanselmann authored
Commit 66bd7445 added an assertion to ensure a finalized job has its “end_timestamp” attribute set. Unfortunately it didn't cover the case where the queue is recovering from an unclean master shutdown.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Jul 11, 2011
-
Michael Hanselmann authored
Commit 009e73d0 (September 2009) changed the job queue to generate multiple job serials at once. Ever since it would return one more than requested. The “serial” file in the job queue directory is defined to contain the “last job ID used” (design-2.0). With the change above, the serial file would always contain the next serial number. The first value returned by the generating function was the one contained in the file, so during the switch in 2009 one job may have been overwritten. This patch changes the code to always return the exact number of serials, to keep the last used serial on disk and adds an assertion.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
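For illustration, the corrected allocation logic looks roughly like this (helper names invented): exactly count serials are returned and the serial file keeps the last ID actually handed out.

```python
def allocate_serials(read_serial_file, write_serial_file, count):
    """Illustrative sketch, not the actual job queue code."""
    last_used = read_serial_file()                           # e.g. 41
    serials = [last_used + i for i in range(1, count + 1)]   # 42 .. 41+count
    write_serial_file(serials[-1])                           # store the last *used* ID
    assert len(serials) == count
    return serials

# Example with an in-memory "serial file":
state = {"serial": 41}
ids = allocate_serials(lambda: state["serial"],
                       lambda v: state.update(serial=v), 3)
print(ids, state)   # [42, 43, 44] {'serial': 44}
```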
-
- Jun 10, 2011
-
Michael Hanselmann authored
Chained jobs need to look at previous jobs, including archived ones. A nice side-effect of this change is the ability to look at archived jobs using “gnt-job info <id>” as long as the ID is known.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- May 31, 2011
-
Michael Hanselmann authored
When a job was cancelled, its status would be changed and the file written again. Since this was a final status, the job file could be moved anytime for archival. If the job was still in the queue, however, it would be processed (not fully, just updating the “end_timestamp” attribute) and written again. This was bad as it could leave the same job in two different files. With this patch the processor is changed to return early for finished jobs. Cancelling a queued job will finalize it right away. Unittests are updated.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
-
- May 10, 2011
-
Michael Hanselmann authored
With this patch, the worker thread name is updated to include a short summary of the opcode (basically its OP_ID). The base name of job queue threads is shortened from “JobQueue” to “Jq”. Logs and the lock monitor will show a job verifying the cluster as e.g. “Jq2/Job1742/C_VERIFY”.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
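The naming scheme reduces to a simple format string; the helper below is hypothetical and assumes the summary string (e.g. “C_VERIFY”) comes from the opcode itself.

```python
def worker_thread_name(worker_no, job_id, op_summary):
    # Hypothetical helper: e.g. (2, 1742, "C_VERIFY") -> "Jq2/Job1742/C_VERIFY"
    return "Jq%s/Job%s/%s" % (worker_no, job_id, op_summary)

print(worker_thread_name(2, 1742, "C_VERIFY"))
```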
-
- Mar 25, 2011
-
Michael Hanselmann authored
The design details can be seen in the design document (doc/design-lu-generated-jobs.rst).
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
-
- Mar 23, 2011
-
Michael Hanselmann authored
Requested-by: Iustin Pop <iustin@google.com>
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Feb 28, 2011
-
Michael Hanselmann authored
- Move functions for drain status (tracked via file) from jqueue to jstore
- Undrain queue on master failover if necessary
- Add QA test
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Dec 29, 2010
-
Michael Hanselmann authored
Since the recent change to leave jobs in the “waitlock” status (commit 5fd6b694), cancelling a job while it's back in the queue would break. This patch handles these cases and adds a unittest.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Dec 15, 2010
-
Michael Hanselmann authored
Iustin Pop reported that a job's file is updated many times while it waits for locks held by other thread(s). After an investigation it was concluded that the reason was a design decision made for job priorities: return jobs to the “queued” status if they couldn't acquire all locks. Changing a job's status or priority requires an update to permanent storage. In a high-level view this is what happens:

1. Mark as waitlock
2. Write to disk as permanent storage (jobs left in this state by a crashing master daemon are resumed on restart)
3. Wait for lock (assume lock is held by another thread)
4. Mark as queued
5. Write to disk again
6. Return to workerpool

Another option originally discussed was to leave the job in the “waitlock” status. Ignoring priority changes, this is what would happen:

1. If not in waitlock
1.1. Assert state == queued
1.2. Mark as waitlock
1.3. Set start_timestamp
1.4. Write to disk as permanent storage
3. Wait for locks (assume lock is held by another thread)
4. Leave in waitlock
5. Return to workerpool

Now let's assume the lock is released by the other thread:

[…]
3. Wait for locks and get them
4. Assert state == waitlock
5. Set state to running
6. Set exec_timestamp
7. Write to disk

This change reduces the number of writes from two per lock acquire attempt to two per opcode, plus one per priority increase (priorities are increased after 24 acquire attempts, see mcpu._CalculateLockAttemptTimeouts, until the highest priority is reached), so here's the patch to implement it. Unittests are updated.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
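Condensed into a sketch with invented helper names, the new flow writes the job once when it enters “waitlock” and once when the opcode actually starts running, instead of on every failed lock-acquire attempt:

```python
import time

def start_opcode(job, op, try_acquire_locks, write_to_disk):
    """Illustrative only; error handling and priority changes are omitted."""
    if job.status == "queued":
        job.status = "waitlock"
        if job.start_timestamp is None:
            job.start_timestamp = time.time()
        write_to_disk(job)              # write #1: entering "waitlock"
    if not try_acquire_locks(op):
        return False                    # back to the workerpool, no extra write
    assert job.status == "waitlock"
    job.status = "running"
    op.exec_timestamp = time.time()
    write_to_disk(job)                  # write #2: the opcode really starts
    return True
```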
-
- Oct 12, 2010
-
Michael Hanselmann authored
If a job was cancelled while it was waiting for locks, an assertion would've failed. This patch fixes the problem and provides a unit test to check for this situation.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Commit 5ef699a0 had to roll back an earlier attempt at implementing this. With the improved job queue processor, this is finally possible.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
These fields can help with debugging.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Oct 07, 2010
-
Michael Hanselmann authored
This simplifies the code a bit: the status is only checked once.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Sep 24, 2010
-
Michael Hanselmann authored
epydoc complained: “File …/ganeti/jqueue.py, line 886, in ganeti.jqueue._JobProcessor._MarkWaitlock Warning: Redefinition of type for job”
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
-