Commits · 4679547e493009eacdc987f36168c03f7e35cca1 · itminedu / snf-ganeti

Nov 13, 2012

jqueue: Allow changing of job priority · 4679547e

Michael Hanselmann authored 12 years ago


This is due to a feature request. Sometimes one wants to change the
priority of a job after it has been submitted, e.g. after submitting an
important job only to later notice many other pending jobs which will be
processed first. Priority changes only take effect at the next lock
acquisition or when the job is re-scheduled.

The design is very similar to how jobs are cancelled.

Unit tests for “_QueuedJob.ChangePriority” are included.

Also rename “TestQueuedJob.test” to “TestQueuedJob.testError”.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>

4679547e

jqueue: Set task ID for jobs added to workerpool · 99fb250b

Michael Hanselmann authored 12 years ago


The job ID is re-used as the task ID, as job IDs are unique.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>

99fb250b

workerpool: Preserve task number when deferring · bba69414

Michael Hanselmann authored 12 years ago


When a task is deferred it should receive the same task ID upon being
returned to the pool.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>

bba69414

workerpool: Add method to change task's priority · 9a2564e7

Michael Hanselmann authored 12 years ago


Using the task ID, a pending task's priority can be changed. This will be
used to change the priority of jobs in the workerpool.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>

9a2564e7

workerpool: Change data structure for priority change · 125b74b2

Michael Hanselmann authored 12 years ago

To prepare for the addition of a new function allowing changing a
pending task's priority, the internal data structure is slightly
changed. The (optional) task ID is stored as part of the task entry. A
new dictionary provides a mapping from the task ID to its task entry. If
the task ID is None, the entry is not added to the map.

Task entries used to be a tuple, but since modifying the priority
requires changing an entry, they are changed to lists in this patch.
Tuple items can not be modified.
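
A minimal sketch of the structure described above, not the actual
workerpool code. The class and method names are hypothetical, and the
sketch assumes that a lower number means a more urgent priority and that
task arguments are never None. Entries are lists so they can be marked
as abandoned in place, and a dictionary maps task IDs to their entries:

```python
import heapq
import itertools


class TaskQueue:
    """Hypothetical stand-in for the pool's pending-task bookkeeping."""

    def __init__(self):
        self._heap = []                   # entries: [priority, serial, task_id, args]
        self._by_id = {}                  # task_id -> entry, only for non-None IDs
        self._serial = itertools.count()  # unique tie-breaker, keeps FIFO order

    def add(self, priority, args, task_id=None):
        entry = [priority, next(self._serial), task_id, args]
        if task_id is not None:
            self._by_id[task_id] = entry
        heapq.heappush(self._heap, entry)

    def change_priority(self, task_id, priority):
        old = self._by_id[task_id]        # KeyError if unknown or already popped
        new = [priority, next(self._serial), task_id, old[-1]]
        old[-1] = None                    # mark the stale heap entry as abandoned
        self._by_id[task_id] = new
        heapq.heappush(self._heap, new)

    def pop(self):
        while self._heap:
            priority, _, task_id, args = heapq.heappop(self._heap)
            if args is not None:          # skip abandoned entries
                self._by_id.pop(task_id, None)
                return (priority, task_id, args)
        raise IndexError("no pending tasks")
```

Marking the old entry instead of deleting it avoids searching the heap;
the abandoned entry is simply discarded when it reaches the top.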

The underlying idea is from [1].

[1]:
http://docs.python.org/library/heapq.html#priority-queue-implementation-notes



Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>

125b74b2

Documentation for the NODE_RES level · e02ee261

Helga Velroyen authored 12 years ago


Signed-off-by: Helga Velroyen <helgav@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

e02ee261

Nov 12, 2012

RunCmd: Expose "postfork" callback · 09b72783

Michael Hanselmann authored 12 years ago


The “_postfork_fn” parameter was only used for tests until now. To
implement a good locking scheme, remote commands must also make use of
this callback to release a lock when the command was successfully
started (but did not yet finish).

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

09b72783

Improve error message when migration status fail · 4041a4e3

Iustin Pop authored 12 years ago


Commit 6a1434d7 (“Make migration RPC non-blocking”) changed the API
for reporting migration status, but has a small cosmetic bug: if the
migration status is a failure, but the RPC call to get the status
didn't itself fail, it shows the following error message:

  Could not migrate instance instance2: None

since it always uses result.fail_msg, irrespective of which part of
the if condition failed.

This patch simply updates the msg if not already set, leading to:

  Could not migrate instance instance2: hypervisor returned failure

Proper error display can be done once the migration status objects can
return failure information as well, besides the status.
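
The fallback described above can be sketched as follows. This is an
illustration only; the function name and its parameters are
hypothetical and do not mirror the actual LU code:

```python
def describe_migration_failure(instance, rpc_fail_msg, status_ok):
    """Return an error string, or None if nothing failed (hypothetical helper)."""
    if not rpc_fail_msg and status_ok:
        return None
    msg = rpc_fail_msg
    if not msg:
        # The RPC succeeded but the reported status is a failure, so there
        # is no RPC-level message to show; use a generic one instead of None.
        msg = "hypervisor returned failure"
    return "Could not migrate instance %s: %s" % (instance, msg)
```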

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Helga Velroyen <helgav@google.com>

4041a4e3

Fix type error in kvm/GetMigrationStatus · 62457f51

Iustin Pop authored 12 years ago


Commit 6a1434d7 (“Make migration RPC non-blocking”) changed from
raising HypervisorErrors to returning MigrationStatus
objects. However, these objects don't have an "info" attribute, so
they can't pass a reason back (which is in itself a bug); but the KVM
hypervisor code attempts to do so, and fails at runtime with:

  Failed to get migration status: 'MigrationStatus' object has no attribute 'info'

instead of the intended:

  Migration failed, aborting: too many broken 'info migrate' answers

For now (on stable-2.6), let's just remove the "info" reason, and
later we can add it back properly once we have a way to correctly
represent migration status failures in the LU.

This fixes issue 297.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

62457f51

Nov 09, 2012

sphinx_ext: Allow use of “rapi” module in pyeval · a12f0ef8

Michael Hanselmann authored 12 years ago


This way constants like “rapi.RAPI_ACCESS_WRITE” can be used in
documentation.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

a12f0ef8

Nov 08, 2012

rlib2: Document two previously undocumented functions · 2a38e913

Michael Hanselmann authored 12 years ago


Commit 208a6cff just included empty docstrings.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

2a38e913

jqueue/mcpu: Determine priority using callback · e4e59de8

Michael Hanselmann authored 12 years ago


Instead of being given the priority for acquiring locks by means of a
parameter, mcpu will now call back. This is in preparation for
implementing a command to change a job's priority on the fly and allows
changing it while locks are being acquired (taking effect on the next
lock acquisition).
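
A rough illustration of the difference between passing a value and
passing a callback. The names below (Job, acquire_locks, priority_fn)
are made up for this sketch and are not the mcpu interface:

```python
class Job:
    """Hypothetical job object whose priority may change at any time."""

    def __init__(self, priority):
        self.priority = priority


def acquire_locks(lock_names, priority_fn):
    """Yield (lock name, priority used) pairs, one per acquisition.

    The priority is looked up through the callback right before each
    acquisition, so a change made in between is picked up by the next one.
    """
    for name in lock_names:
        yield (name, priority_fn())
```

With a plain parameter the value would be fixed when the call starts;
with `lambda: job.priority` as the callback, every acquisition sees the
current value.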

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>

e4e59de8

http/__init__.py: Remove extraneous argument · e080072c

Michael Hanselmann authored 12 years ago


pylint complained, I fixed it, and unfortunately pushed too early.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

e080072c

rapi.testutils: Add utility to format HTTP headers · 1b8e72f3

Michael Hanselmann authored 12 years ago


Once again, this will be used by a forthcoming RAPI test.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

1b8e72f3

rapi.testutils: Return headers from mock utility · 0351944b

Michael Hanselmann authored 12 years ago


A newly added test for RAPI will also verify the returned headers. A
test in ganeti.rapi.client_unittest.py is split into smaller stand-alone
tests.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

0351944b

http: Add wrapper for mimetools.Message · 0e632cbd

Michael Hanselmann authored 12 years ago


A newly added piece of code will also have to parse headers, so having
this wrapper saves us from copying this part of code.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

0e632cbd

Nov 07, 2012

Add missing tests for commit · 7a70541e

Michael Hanselmann authored 12 years ago


Commit f0d22861 changed the logic of
gnt_instance._ConvertNicDiskModifications to also allow a parameter
named “modify”. Unfortunately the corresponding unittest was not
updated. An “if”/“else” condition is also merged.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

7a70541e

workerpool: Use itertools.count instead of manual counting · c258f110

Michael Hanselmann authored 12 years ago


Instead of having to explicitly increment the value (“… += 1”), a call
to next() is enough. These numbers should in no case be re-used (they
are used for ordering tasks). Using “itertools.count” is useful here as
it guarantees that a returned number won't be returned another time.
Manual code for this could, over the course of time, gain unintended
bugs.
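
For illustration, a small sketch of the pattern; the class name is
hypothetical and not taken from the workerpool module:

```python
import itertools


class SerialSource:
    """Hands out strictly increasing numbers, each exactly once."""

    def __init__(self):
        self._counter = itertools.count()

    def next_serial(self):
        # No manual "+= 1": the iterator advances itself on every call.
        return next(self._counter)
```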

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>

c258f110

Nov 06, 2012

Use SSH_LOGIN_USER rather than root for xl ssh · f215debf

Guido Trotter authored 12 years ago


Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

f215debf

Fix gnt-instance console with xl · 1f5557ca

Guido Trotter authored 12 years ago


- Rename xm-console-wrapper to xen-console-wrapper
- Pass the xen command to use as a parameter

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

1f5557ca

Add utility to check if file is executable · 10b86782

Michael Hanselmann authored 12 years ago


This replaces direct calls to “os.access” and
“os.path.exists”/“os.path.isfile”.
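
Such a helper might look roughly like this. It is a sketch built only
on the standard library calls named above; the function name is an
assumption:

```python
import os


def is_executable(filename):
    """Return True if filename is a regular file the caller may execute."""
    # isfile() rejects directories and missing paths; access() checks the
    # execute permission for the current (real) user.
    return os.path.isfile(filename) and os.access(filename, os.X_OK)
```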

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

10b86782

Fix NameError in constants.py introduced in merge 46c1f828 · 55d1ebfa

Michael Hanselmann authored 12 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

55d1ebfa

Disable E1101 on ganeti/http/server.py:424 · 57a6042e

Guido Trotter authored 12 years ago


Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

57a6042e

Fix live migration under xl · 053c356a

Guido Trotter authored 12 years ago


Until now the only way to make live migration work in conjunction with
"xl" was to add ssh known_hosts keys for every node's secondary ip on
every other node.

With this command we remove the target key verification: this is not
worse than what we were doing before with "xm", and allows the migration
to happen under either toolstack, without extra manual work. Of course
the full security of ssh is not used by live migration, then.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

053c356a

Don't check for xend port when using xl · 3135de69

Guido Trotter authored 12 years ago


If the toolstack is set to "xl" we shouldn't ping xend for liveness
before attempting a live migration.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

3135de69

utils.io: Improve handling of double and single slashes · 2826897c

Michael Hanselmann authored 12 years ago


Up until now “IsBelowDir("/", …)” would never return True. The reason
was that an additional slash was added to the root path resulting in
“//”, which is “implementation-defined” in POSIX and treated specially
by “os.path.normpath”.

This patch fixes the behaviour for this special case and adds tests
(also for IsNormAbsPath). A typo in the docstring is fixed. Calls to
“assert_” and “assertFalse” are changed to pass a message by keyword
argument.

It is a bit of a mess, but I hope the resulting behaviour is correct.
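
The special case can be reproduced with a short sketch. This is a
simplified, hypothetical re-implementation for illustration, not the
code in utils.io; it uses “posixpath” directly so the behaviour does
not depend on the platform:

```python
import posixpath


def is_below_dir(root, other_path):
    """Check whether other_path lies below root (both must be absolute)."""
    if not (posixpath.isabs(root) and posixpath.isabs(other_path)):
        raise ValueError("both paths must be absolute")
    norm_root = posixpath.normpath(root)
    if norm_root == "/":
        # Appending another slash would give "//", which normpath keeps
        # as-is, so nothing would ever match. Use the root itself.
        prefix = "/"
    else:
        prefix = norm_root + "/"
    return posixpath.normpath(other_path).startswith(prefix)
```

Comparing against the prefix with a trailing slash prevents “/var” from
matching “/variable”.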

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Dato Simó <dato@google.com>

2826897c

workerpool: Don't mask variable in AddManyTasks · f94779f5

Michael Hanselmann authored 12 years ago


The name “priority” is already used.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>

f94779f5

workerpool: Simplify _WaitForTaskUnlocked · c69c45a7

Michael Hanselmann authored 12 years ago


The function is simplified in its structure and duplicated checks
have been merged.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>

c69c45a7

Nov 05, 2012

cli.py: use None as name for tag operations on the cluster · bcd35e09

Dato Simó authored 12 years ago

This change is mostly cosmetic. Previously, the literal "cluster" was
used for the 'name' field of tag operations on the cluster (as opposed
to a node or an instance). Since this field has a type of TMaybeString
specifically for the case of the cluster, it seems more correct to use
None, rather than an arbitrary string (that is not used by the callee).

Additionally: note in opcodes.py that groups also expect a name; the
previous comment only referred to nodes and instances.

Signed-off-by: Dato Simó <dato@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

bcd35e09

Nov 02, 2012

Fix previous merge · c17770c7

Bernardo Dal Seno authored 12 years ago


A call to _CalculateGroupIPolicy wasn't refactored during the merge.

Signed-off-by: Bernardo Dal Seno <bdalseno@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

c17770c7

Nov 01, 2012

jqueue: Return jobs to queue when shutting down · 942e2262

Michael Hanselmann authored 12 years ago


When a job is still waiting for locks and the queue is shutting down,
it should be returned to the queue and not actually start processing.
Until now jobs which transitioned from “queued” to “waiting” were already
considered to be running as far as the shutdown code was concerned.

This fixes issue 296.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

942e2262

gnt-debug delay: Add "--submit" option · bb600388

Michael Hanselmann authored 12 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

bb600388

Make hostname checks uniform between instance rename and add · 233f4bc6

Iustin Pop authored 12 years ago


Currently, we have instance rename doing extra checks on the host
name, to prevent accidental wrong renames; however, instance create
doesn't do these checks (issue 291), which (if DNS is misconfigured)
can lead to hard to diagnose errors.

This patch abstracts the name checking from LUInstanceRename into a
separate function, which is then reused in both instance rename and
instance create.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>

233f4bc6

Improve logging of new job submissions · 4c91d2ad

Iustin Pop authored 12 years ago


This addresses issue 290: when receiving new jobs, logging is
incomplete, and we don't have the job ID(s) and/or summaries
logged. Only later, when the job is queried for or being processed, we
know more.

This is not good when troubleshooting, so let's improve the initial
logging.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>

4c91d2ad

Improve handling of lock exceptions · a9d40c65

Iustin Pop authored 12 years ago


There are two issues with lock exceptions right now:

- first, we don't log the original error; this is fine for now
  (locking.py always returns the same error here), but in general is
  brittle: if locking.py would start returning more information, we'd
  completely miss that

- second, an actual honest lock conflict is not an internal error;
  it's simply an optimistic lock failing, and as such we should not
  return internal error, but rather resource_not_unique

This addresses issue 287.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>

a9d40c65

Oct 30, 2012

Fix runtime memory increases · 0d324688

Iustin Pop authored 12 years ago


Commit 2c0af7da which added the runtime memory changes functionality
had a small typo (wrong name); I've rewritten this to only compute the
delta once, for simplicity.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

0d324688

Fix validation of vgname in OpClusterSetParams · 8e66b9bf

Iustin Pop authored 12 years ago


This variable can be empty, when we want to disable LVM, so we can't
use TMaybeString.

Fixes issue 285.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

8e66b9bf

Fix removal of storage directory on shared file storage · e5dfc531

Iustin Pop authored 12 years ago


This patch makes _RemoveDisks symmetric to _CreateDisks with respect
to file-based storage: _CreateDisks uses "in constants.DTS_FILEBASED",
whereas _RemoveDisks was not updated and only used "==
constants.DT_FILE". This resulted in stale directories left on the
filesystem.
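
The asymmetry can be illustrated with a small sketch. The constant
values and the helper names are assumptions made for this example, not
copied from constants.py or cmdlib.py:

```python
# Assumed values, for illustration only.
DT_FILE = "file"
DT_SHARED_FILE = "sharedfile"
DTS_FILEBASED = frozenset([DT_FILE, DT_SHARED_FILE])


def creates_storage_dir(disk_template):
    """Creation side: any file-based template gets a directory."""
    return disk_template in DTS_FILEBASED


def removes_storage_dir_old(disk_template):
    """Removal side before the fix: shared file storage is missed."""
    return disk_template == DT_FILE


def removes_storage_dir_fixed(disk_template):
    """Removal side after the fix: same membership test as creation."""
    return disk_template in DTS_FILEBASED
```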

Fixes issue 262.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

e5dfc531

Switch non-redundant check to disk template-based · 15361a18

Iustin Pop authored 12 years ago


Currently, the warning/notice about non-redundant instances in cluster
verify is based on a non-empty secondaries list (how old is this?); the
proper way to check this nowadays is via DTS_MIRRORED.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

15361a18

Oct 29, 2012

Fix permission for socket directory · efd38c3d

Bernardo Dal Seno authored 12 years ago


The directory must also be writable by the confd daemon user.

Signed-off-by: Bernardo Dal Seno <bdalseno@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

efd38c3d