Commits · b322253772201e824cd5d3c849bc724cb225a236 · itminedu / snf-ganeti

May 11, 2012

Workaround changed LVM behaviour · 4c5dd3ff

Iustin Pop authored 13 years ago


The vgreduce command has changed behaviour from when we initially
wrote the code (2.02.02 versus 2.02.66, 4 years delta):

- if there are LVs which will be impacted, it requires --force
- otherwise refuses to proceed, but it still returns exit code 0

We handle this by looking to see if it returns "Wrote out consistent
volume group" (behaviour unchanged), or if it complains about
"--force"; in the case it didn't complete, we retry the operation.

We improve a bit the checking of "vgs", as it uses to fail silently
and we didn't detect it.

New tests for this function should test, I believe, all the expected
variations; at the least we now have data files with the expected
output.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
(cherry picked from commit 048eeb2b)

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

4c5dd3ff

May 09, 2012

Fix exception re-raising in Python Luxi clients · 98dfcaff

Iustin Pop authored 13 years ago


Commit e687ec01 (present in 2.5 since the 2.5 beta 3) did consistency
fixes across the code-base. Unfortunately this was done without enough
checks on the actual meaning of one of the fixes, which means error
re-raising in lib/errors.py is broken.

The problem is that:

  raise cls, args

is different than:

  raise cls(args)

And our unit-tests didn't catch this (this patch updates the tests).

This breakage is usually trivial, like wrong error messages:

  $ gnt-instance remove no-such-instance
  Failure: prerequisites not met for this operation:
  ("Instance 'no-such-instance' not known", 'unknown_entity')

versus:

  $ gnt-instance remove no-such-instance
  Failure: prerequisites not met for this operation:
  error type: unknown_entity, error details:
  Instance 'no-such-instance' not known

or:

  $ gnt-instance add … no-such-instance
  Failure: prerequisites not met for this operation:
  ('The given name (no-such-instance) does not resolve: Name or service not known', 'resolver_error')

versus:

  $ gnt-instance add … no-such-instance
  Failure: prerequisites not met for this operation:
  error type: resolver_error, error details:
  The given name (no-such-instance) does not resolve: Name or service not known

But in some cases where we rely on a certain data representation
(e.g. HooksAbort), this actually breaks because we try to iterate over
the wrong type:

  File "/usr/lib/python2.6/dist-packages/ganeti/cli.py", line 1907, in FormatError
     for node, script, out in err.args[0]:
  ValueError: need more than 1 value to unpack

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

98dfcaff

Jan 06, 2012

KVM: support version reported by 1.0 · 585c8187

Guido Trotter authored 13 years ago


This of course was working for all the rcs, but broke with 1.0 itself.

In addition:
  - split between running kvm --version and parsing its output
  - unittest parsing for various known --help outputs
  - updated NEWS file
  - happy 2012 wishes
  - the hope to finish this patch before it's time to say happy easter
    :)

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

585c8187

Dec 21, 2011

jqueue: Fix deadlock between job queue and dependency manager · 37d76f1e

Michael Hanselmann authored 13 years ago


When an opcode is about to be processed its dependencies are
evaluated using “_JobDependencyManager.CheckAndRegister”. Due
to its nature that function requires a lock on the manager's
internal structures. All of this happens while the job queue
lock is held in shared mode (required for the job processor).

When a job has been processed any pending dependencies are re-added
to the job workerpool. Before this patch that would require
the manager's lock and then, for adding the jobs, the job queue
lock. Since this is in reverse order it will lead to deadlocks.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

37d76f1e

Nov 30, 2011

Add UnescapeAndSplit unittest for multi-escapes · b39d17b1

Iustin Pop authored 13 years ago


This would have caught the bug in the first place. Argh,
hand-generated test cases!

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

b39d17b1

Nov 24, 2011

LUGroupAssignNodes: Fix node membership corruption · 54c31fd3

Michael Hanselmann authored 13 years ago


Note: This bug only manifests itself in Ganeti 2.5, but since the
problematic code also exists in 2.4, I decided to fix it there.

If a node was assigned to a new group using “gnt-group assign-nodes” the
node object's group would be changed, but not the duplicate member list
in the group object. The latter is an optimization to require fewer
locks for other operations. The per-group member list is only kept in
memory and not written to disk.

Ganeti 2.5 starts to make use of the data kept in the per-group member
list and consequently fails when it is out of date. The following
commands can be used to reproduce the issue in 2.5 (in 2.4 the issue was
confirmed using additional logging):

  $ gnt-group add foo
  $ gnt-group assign-nodes foo $(gnt-node list --no-header -o name)
  $ gnt-cluster verify  # Fails with KeyError

This patch moves the code modifying node and group objects into
“config.ConfigWriter” to do the complete operation under the config
lock, and also to avoid making use of side-effects of modifying objects
without calling “ConfigWriter.Update”. A unittest is included.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
(cherry picked from commit 218f4c3d)

54c31fd3

LUGroupAssignNodes: Fix node membership corruption · 218f4c3d

Michael Hanselmann authored 13 years ago


Note: This bug only manifests itself in Ganeti 2.5, but since the
problematic code also exists in 2.4, I decided to fix it there.

If a node was assigned to a new group using “gnt-group assign-nodes” the
node object's group would be changed, but not the duplicate member list
in the group object. The latter is an optimization to require fewer
locks for other operations. The per-group member list is only kept in
memory and not written to disk.

Ganeti 2.5 starts to make use of the data kept in the per-group member
list and consequently fails when it is out of date. The following
commands can be used to reproduce the issue in 2.5 (in 2.4 the issue was
confirmed using additional logging):

  $ gnt-group add foo
  $ gnt-group assign-nodes foo $(gnt-node list --no-header -o name)
  $ gnt-cluster verify  # Fails with KeyError

This patch moves the code modifying node and group objects into
“config.ConfigWriter” to do the complete operation under the config
lock, and also to avoid making use of side-effects of modifying objects
without calling “ConfigWriter.Update”. A unittest is included.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

218f4c3d

Nov 08, 2011

Fail if node/group evacuation can't evacuate instances · d755483c

Michael Hanselmann authored 13 years ago


If an instance can't be evacuated, only a message would be printed. With
this change the operation always aborts. Newly added unittests check for
this behaviour.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

d755483c

Oct 18, 2011

RAPI: Make node evacuation actually work · 0b58db81

Michael Hanselmann authored 13 years ago

Commit e1f23243 changed te LU and opcode for node evacuation to receive
a “mode” parameter (among other things). Commit de40437a changed the
RAPI code accordingly, but did so for an earlier version of the first
patch. Obviously this couldn't work, so here's the fix.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

0b58db81

Oct 17, 2011

RAPI: Fix resource for replacing disks · 539d65ba

Michael Hanselmann authored 13 years ago


Commit d1c172de inadvertently changes the
“/2/instances/[instance_name]/replace-disks” resource to use body
parameters. There were no QA tests and the issue wasn't noticed.

This patch re-introduces support for query parameters and adds a QA
test.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Andrea Spadaccini <spadaccio@google.com>

539d65ba

Oct 04, 2011

Fix issue when verifying cluster files · 170b02b7

Michael Hanselmann authored 13 years ago


If a cluster has any non-master-candidate nodes, those don't contain all
files (e.g. config.data). With commit aef59ae7 (March 31st, 2011)
the logic was changed and subsequently verifying a cluster with non-mc
nodes would complain.

This patch fixes this issue by changing the algorithm. It also adds an
additional check for files which shouldn't exist on a machine. A newly
added unittest is included.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

170b02b7

Sep 30, 2011

LUClusterVerifyGroup: Spread SSH checks over more nodes · 64c7b383

Michael Hanselmann authored 13 years ago


When verifying a group the code would always check SSH to all nodes in
the same group, as well as the first node for every other group. On big
clusters this can cause issues since many nodes will try to connect to
the first node of another group at the same time. This patch changes the
algorithm to choose a different node every time.

A unittest for the selection algorithm is included.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

64c7b383

Aug 26, 2011

utils: Fix UnescapeAndSplit parsing bug · e4a48c7b

Michael Hanselmann authored 13 years ago


If a value passed to UnescapeAndSplit ended with a backslash an
exception would be raised:

$ gnt-instance modify -H mem=x\\ inst1.example.com
[…]
    e2 = slist.pop(0)
IndexError: pop from empty list

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

e4a48c7b

Aug 23, 2011

Adding missing test data for commit · 8b046709

René Nussbaumer authored 13 years ago


Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

8b046709

Fix a parsing issue with DRBD 8.3.11 in the Linux Kernel · 7a380ddf

René Nussbaumer authored 13 years ago


In the Linux kernel commit 4b0715f096 introduced a display bug into
/proc/drbd which broke our regex.

The bug was first introduced into Linux 2.6.39-rc1. This bug is still
unfixed as of today.

This patch adapt the regular expression to workaround this bug for the
time being.

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

7a380ddf

Aug 19, 2011

ensure-dirs: Check mode and owner before changing · c3f54085

Michael Hanselmann authored 13 years ago


This avoids many calls to chmod(2) and chown(2), and thereby ctime
updates.

Since I had to update the unittests anyway I untangled the code a bit,
split it into more separate functions and added some more tests.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

c3f54085

ensure-dirs: Refine error handling on stat(2) · d00a730d

Michael Hanselmann authored 13 years ago


The “_stat_fn” function is renamed to “_lstat_fn” to reflect its
function. The try/except block just wraps calling lstat(2) and nothing
else.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

d00a730d

Aug 12, 2011

Clarify job ID-related type checks, add unittests · bdfd7802

Michael Hanselmann authored 13 years ago


Instead of a rather complicated expression only “JobId” is output. Job
ID lists (like generated by “SubmitManyJobs”) are limited to two-item
lists. Unittests are added.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

bdfd7802

Aug 11, 2011

Added helper functions in netutils and related constants · 37531236

Andrea Spadaccini authored 13 years ago


Added the following functions to netutils:
- IsValidInterface
- GetInterfaceIpAddresses
- _GetIpAddressesFromIpOutput

Added the following static methods to netutils.IPAddress:
- GetAddressFamilyFromVersion
- GetVersionFromAddressFamily

Added unit tests for the new methods in netutils.IPAddress, for the IP
address search regex and for GetInterfaceIpAddresses

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

37531236

Aug 09, 2011

rlib: Expose node group tags · f4e86448

Michael Hanselmann authored 13 years ago


Commit 1ffd2673 added support for tagging node groups. Also add a
check for exposed fields.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

f4e86448

Aug 08, 2011

Detect globbing patterns as query arguments · f8638e28

Michael Hanselmann authored 13 years ago

Short: this patch enables the use of “gnt-instance list '*.site'”.

Detailed description: This patch changes the command line interface code
to try to deduce the kind of filter from the arguments to a “list”
command. If it's a list of plain names an old-style name filter is used.
If filtering is forced or the single argument is potentially a filter,
it is parsed as a query filter string. Any name looking like a globbing
pattern (e.g. “*.site” or “web?.example.com”) is treated as such.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

f8638e28

Aug 05, 2011

Implement globbing operator for filters · 16629d10

Michael Hanselmann authored 13 years ago


The operators “=*” and “!*” do globbing in filters, e.g.:

$ gnt-instance list --no-headers -o name 'name =* "*.site"'
inst1.site.example.com

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

16629d10

Bump version to 2.5.0~beta1 · 9f604ab8

Michael Hanselmann authored 13 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

9f604ab8

cleaner: Remove watcher's instance status file after 21 days · 6890cf2e
Michael Hanselmann authored 13 years ago
```
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
```
6890cf2e

utils.ReadFile: Add pre-read callback · 0e5084ee

Michael Hanselmann authored 13 years ago


This will be used by the watcher to store the file's fstat(2). It must
be done from the filehandle.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

0e5084ee

Aug 04, 2011

Fix unittest failure after list_owned changes · e8906f7d

Iustin Pop authored 13 years ago


We just need an object that has a list_owned method.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

e8906f7d

ganeti-cleaner: Remove old watcher state files · 7b642c49

Michael Hanselmann authored 13 years ago


Watcher state files can stay around if node groups are removed. With
this patch they're removed after 21 days.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

7b642c49

Aug 02, 2011

Add primary/second nodes' group as query fields · fab9573b

Michael Hanselmann authored 13 years ago


These will be very useful for ganeti-watcher as it needs to retrieve
instances by group.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

fab9573b

Jul 26, 2011

Add ht-based result checks to opcodes · 1ce03fb1

Michael Hanselmann authored 13 years ago


This adds the infrastructure necessary to check opcode results using
ht-based functions. Checks are added for two opcodes.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

1ce03fb1

Jul 25, 2011

Reopen log file only once after SIGHUP · ad88650c

Michael Hanselmann authored 13 years ago


Commit b6fa9a44 added a re-openable log handler. The log file is
reopened when a daemon is sent a HUP signal. Due to a bug in the code,
fixed by this patch, the log file would be reopened for every single log
message thereafter.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

ad88650c

Jul 21, 2011

Implement instance failover via RAPI · c0a146a1

Michael Hanselmann authored 13 years ago


No idea why this was missed before.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

c0a146a1

Export job dependencies through lock monitor · fcb21ad7

Michael Hanselmann authored 13 years ago


This makes them visible to the user. Example:

$ gnt-debug locks -o name,pending
Name    Pending
job/890 job:891,892
job/892 job:894

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

fcb21ad7

Make lock monitor more versatile · 44b4eddc

Michael Hanselmann authored 13 years ago


With this change it'll be possible to register other lock information
providers. One usecase for this are job dependencies, which can be shown
in the output of “gnt-debug locks”, too.

The lock monitor is changed to accept more than one return value from
the function providing the information. Unfortunately it's hard to keep
weak references to bound methods, so that I settled on keeping a weak
reference on the object instead (see note in docstring).

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

44b4eddc

Rename *_STATUS_WAITLOCK to …_WAITING · 47099cd1

Michael Hanselmann authored 13 years ago


This patch renames the {JOB,OP}_STATUS_WAITLOCK constants to
{JOB,OP}_STATUS_WAITING, as per design document for chained jobs.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

47099cd1

Fix locking issue with job dependencies · 75d81fc8

Michael Hanselmann authored 13 years ago

When jobs waiting for a dependency are notified, they're re-added to the
queue. This would require owning the queue lock in exclusive mode, but
since the function doing so is called from within the job/opcode
processor, it only holds the lock in shared mode.

This patch changes the result of the processor from a boolean to a
status value (integer). This way the caller can be notified about
actions to take, including notifying waiting jobs. The function adding
jobs to the queue can now acquire the lock in exclusive mode.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

75d81fc8

jqueue: Implement submitting multiple jobs with dependencies · b247c6fc

Michael Hanselmann authored 13 years ago


With this change users of the “SubmitManyJobs” interface can use
relative job dependencies. Relative job IDs in dependencies are resolved
before handing the job off to the workerpool.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

b247c6fc

Jul 20, 2011

jqueue: Add “writable” flag to memory objects · c0f6d0d8

Michael Hanselmann authored 13 years ago


Basically only one instance of the job, the one being processed,
should be serialized to disk and replicated to other nodes. With
this flag assertions can be added in various places.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

c0f6d0d8

Implement chained jobs · b95479a5

Michael Hanselmann authored 14 years ago


An overview is available in the design document for this change,
doc/design-chained-jobs.rst.

When a job enters the job processor, the current opcode's dependencies
are evaluated. If a referenced job has not yet reached the desired
status, the current job is registered as a dependant. The job processor
will continue to work on other pending tasks. When a job finishes it
notifies any pending dependants by re-adding them to the workerpool.

A per-job processor lock is necessary for rare cases where the same job
can be re-added twice.

There is no way to view waiting jobs at the moment, but I plan to
export this information to “gnt-debug locks”.

A so-called dependency manager takes care of managing waiting jobs and
keeping track of their status.

Unittests are included.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

b95479a5

Jul 15, 2011

Adding a wrapper around connecting to kvm console · 2f4c951e

Stephen Shirley authored 13 years ago


The wrapper will connect to the console, and check in the background if
the instance is paused, unpausing it as necessary.

Signed-off-by: Stephen Shirley <diamond@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

2f4c951e

Jul 12, 2011

cli.GetOnlineNodes: Support node group filter, use query2 · 05484a24

Michael Hanselmann authored 13 years ago


This patc changes cli.GetOnlineNodes to use query2, which does the
filtering in the master daemon, and adds a new parameter to filter by
node group.

Unittests were added for the old implementation and then adopted to
ensure no functionality was lost.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

05484a24