  1. Jul 20, 2011
    • jqueue: Add “writable” flag to memory objects · c0f6d0d8
      Michael Hanselmann authored
      
      Basically only one instance of the job, the one being processed,
      should be serialized to disk and replicated to other nodes. With
      this flag assertions can be added in various places.
      
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
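
      As an illustration, a minimal sketch (with made-up names, not Ganeti's
      actual jqueue code) of how such a flag can back assertions around
      serialization:

      class QueuedJob(object):
        """Much-reduced stand-in for an in-memory job object.

        Only the instance being processed is created with writable=True; any
        other copy, e.g. one loaded just to answer a query, stays read-only.
        """
        def __init__(self, job_id, writable=False):
          self.id = job_id
          self.ops = []
          self.writable = writable

        def Serialize(self):
          # The assertion catches code paths that try to write a read-only
          # copy back to disk or replicate it to other nodes
          assert self.writable, "Can't serialize read-only job instance"
          return {"id": self.id, "ops": self.ops}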
    • Implement chained jobs · b95479a5
      Michael Hanselmann authored
      
      An overview is available in the design document for this change,
      doc/design-chained-jobs.rst.
      
      When a job enters the job processor, the current opcode's dependencies
      are evaluated. If a referenced job has not yet reached the desired
      status, the current job is registered as a dependant. The job processor
      will continue to work on other pending tasks. When a job finishes it
      notifies any pending dependants by re-adding them to the workerpool.
      
      A per-job processor lock is necessary for rare cases where the same job
      can be re-added twice.
      
      There is no way to view waiting jobs at the moment, but I plan to
      export this information to “gnt-debug locks”.
      
      A so-called dependency manager takes care of managing waiting jobs and
      keeping track of their status.
      
      Unittests are included.
      
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
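
      A rough sketch of the mechanism described above, with hypothetical names
      and signatures (the real dependency manager lives in Ganeti's jqueue and
      is considerably more involved):

      class JobDependencyManager(object):
        """Tracks jobs waiting for other jobs and re-queues them on completion."""

        def __init__(self, workerpool):
          self._workerpool = workerpool
          self._waiters = {}   # dependency job id -> set of waiting jobs

        def CheckAndRegister(self, job, dep_job_id, wanted_status, dep_status):
          # Called by the job processor while evaluating an opcode's dependencies
          if dep_status in wanted_status:
            return "continue"                    # dependency already satisfied
          self._waiters.setdefault(dep_job_id, set()).add(job)
          return "wait"                          # processor picks up other work

        def NotifyWaiters(self, dep_job_id):
          # Called when a job finishes; pending dependants go back to the pool
          for job in self._waiters.pop(dep_job_id, set()):
            self._workerpool.AddTask(job)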
  2. May 31, 2011
    • jqueue: Fix potential race condition when cancelling queued jobs · 66bd7445
      Michael Hanselmann authored
      
      When a job was cancelled, its status would be changed and the file
      written again. Since this was a final status, the job file could be
      moved anytime for archival. If the job was still in the queue, however,
      it would be processed (not fully, just updating the “end_timestamp”
      attribute) and written again. This was bad as it could leave the same
      job in two different files.
      
      With this patch the processor is changed to return early for finished
      jobs. Cancelling a queued job will finalize it right away. Unittests are
      updated.
      
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
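
      The fix boils down to an early return for already-finalized jobs; a
      hedged sketch (status values and function names are illustrative):

      # Final job statuses; a job in one of these states must not be written
      # again, since its file may already have been moved to the archive
      JOB_STATUS_FINALIZED = frozenset(["success", "error", "canceled"])

      def ProcessJob(job):
        if job.CalcStatus() in JOB_STATUS_FINALIZED:
          # E.g. the job was cancelled while still queued and finalized right
          # away; touching it now could leave the job in two different files
          return
        # ... normal opcode processing and rewriting of the job file follows ...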
  3. May 30, 2011
    • gnt-node migrate: Use LU-generated jobs · b7a1c816
      Michael Hanselmann authored
      
      Until now LUNodeMigrate used multiple tasklets to evacuate all primary
      instances on a node. In some cases it would acquire all node locks,
      which isn't good on big clusters. With upcoming improvements to the LUs
      for instance failover and migration, switching to separate jobs looks
      like a better option. This patch changes LUNodeMigrate to use
      LU-generated jobs.
      
      While working on this patch, I identified a race condition in
      LUNodeMigrate.ExpandNames. A node's instances were retrieved without a
      lock and no verification was done.
      
      For RAPI, a new feature string is added and can be used to detect
      clusters which support more parameters for node migration. The client
      is updated.
      
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
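
      Schematically, the LU now emits one small job per primary instance
      instead of doing all the work itself; a sketch under assumed names (the
      opcode is shown as a plain dict rather than a real Ganeti opcode object):

      def MakeNodeMigrationJobs(get_primary_instances, node_name):
        """Build one single-opcode migration job per primary instance."""
        jobs = []
        for instance_name in get_primary_instances(node_name):
          # Each job is scheduled independently, so the lock footprint per job
          # is limited to the nodes involved in that one migration
          jobs.append([{"OP_ID": "OP_INSTANCE_MIGRATE",
                        "instance_name": instance_name}])
        return jobs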
  4. May 13, 2011
    • SharedLock: Implement downgrade from exclusive to shared mode · 3dbe3ddf
      Michael Hanselmann authored
      
      If a job needs to modify a resource and then wait for a result, it must
      acquire the resource lock in exclusive mode. In some cases it would be
      possible to only have a shared lock for waiting. Until now it was not
      possible to change a lock's mode once it'd been acquired. Releasing and
      re-acquiring might have been possible, but would require many more
      checks and can introduce new issues.
      
      With this patch a new method, named “downgrade”, is added to Ganeti's
      own SharedLock class. It can only be called when the lock is held in
      exclusive mode and changes it to shared. If there are any pending shared
      acquires on the same priority, they're moved to the front of the queue
      and notified (jumping ahead of exclusive acquires).
      
      In a lockset the internal lock will be downgraded if, and only if, all
      individual locks owned by the current thread are either released or
      acquired in shared mode.
      
      Unittests are provided.
      
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
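
      A very reduced sketch of the downgrade semantics (this is not Ganeti's
      locking.SharedLock, which additionally handles acquire priorities and
      moves pending shared acquires to the front of the queue):

      import threading

      class MiniSharedLock(object):
        def __init__(self):
          self._cond = threading.Condition()
          self._exclusive = False
          self._shared = 0

        def acquire(self, shared=False):
          with self._cond:
            if shared:
              while self._exclusive:
                self._cond.wait()
              self._shared += 1
            else:
              while self._exclusive or self._shared:
                self._cond.wait()
              self._exclusive = True

        def downgrade(self):
          # Only valid while holding the lock in exclusive mode; ownership is
          # kept, the mode becomes shared and waiting shared acquirers wake up
          with self._cond:
            assert self._exclusive, "downgrade() requires exclusive mode"
            self._exclusive = False
            self._shared = 1
            self._cond.notify_all()

        def release(self):
          with self._cond:
            if self._exclusive:
              self._exclusive = False
            else:
              self._shared -= 1
            self._cond.notify_all()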
  5. Apr 06, 2011
    • Increase the lock timeouts before we block-acquire · d385a174
      Iustin Pop authored
      
      This has been observed to cause problems on real clusters via the
      following mechanism:
      
      - a long job (e.g. a replace-disks) is keeping an exclusive lock on an
        instance
      - the watcher starts and submits its query instances opcode which
        wants shared locks for all instances
      - after about an hour, the watcher job falls back to blocking acquire,
        after having acquired all other locks
      - any instance opcode that wants an exclusive lock for an instance
        cannot start until the watcher has finished, even though there's no
        actual operation on that instance
      
      To alleviate this problem, we simply increase the maximum timeout after
      which lock acquires fall back to either a blocking acquire or a priority
      increase. The timeout is computed such that we wait roughly ten hours
      (instead of one) before this happens, which should be within the
      maximum lifetime of a reasonable opcode on a healthy cluster. The
      timeout also means that priority increases will happen every half hour.
      
      We also increase the maximum wait interval to 15 seconds; otherwise the
      longer overall timeout would result in too many retries.
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
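
      A back-of-the-envelope sketch of the arithmetic (the backoff parameters
      below are illustrative, not the exact constants used by Ganeti's
      lock-attempt timeout strategy):

      def attempts_until_blocking(total=10 * 3600.0, start=1.0, factor=1.5,
                                  cap=15.0):
        """Count capped-backoff acquire attempts needed to reach 'total' seconds."""
        waited, attempts, delay = 0.0, 0, start
        while waited < total:
          waited += delay
          attempts += 1
          delay = min(delay * factor, cap)   # exponential backoff, capped at 15 s
        return attempts

      # With a 15-second cap, reaching ~10 hours takes on the order of 2,400
      # opportunistic attempts, with priority increases roughly every half hour
      print(attempts_until_blocking())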
    • utils: Add function generating regex for DNS name globbing · bbfed756
      Michael Hanselmann authored
      
      The intent of this function is to provide a globbing operator for query
      filters. One should be able to say, for example, something to the
      effect of “gnt-instance shutdown '*.site'”.
      
      Also rename a variable in MatchNameComponent.
      
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
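
      A hedged sketch of the glob-to-regex translation (not Ganeti's actual
      helper; '*' is taken to match any run of characters and '?' exactly one):

      import re

      def GlobPatternToRegex(pattern):
        parts = []
        for ch in pattern:
          if ch == "*":
            parts.append(".*")
          elif ch == "?":
            parts.append(".")
          else:
            parts.append(re.escape(ch))
        return re.compile("^%s$" % "".join(parts), re.IGNORECASE)

      # Example: the pattern from “gnt-instance shutdown '*.site'”
      rx = GlobPatternToRegex("*.site")
      assert rx.match("web1.site") and not rx.match("web1.example")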