Commits · 40a09ee128082ff0453dec0f60497187c9aeb2a7 · itminedu / snf-ganeti

Feb 26, 2010

Fix two potentially endless loops in http library · 40a09ee1

Michael Hanselmann authored 15 years ago


The first can be problematic if poll(2) returns POLLHUP|POLLERR on a
socket. Previously these flags were only respected for SOCKOP_RECV, but since
they can also occur on other socket operations, especially in combination with
OpenSSL, letting the socket functions handle POLLHUP|POLLERR seems to be
the right thing.

The second is a typo leading to an endless loop if the first line of an
HTTP connection is empty (simply "\r\n"). Instead of removing the empty
line, it would remove anything after it.
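
A minimal sketch of the second bug, using a hypothetical `skip_empty_lines` helper rather than the library's actual code: dropping a leading CRLF means slicing from index 2 onward. Slicing up to index 2 keeps only the CRLF, so the loop never makes progress.

```python
CRLF = "\r\n"


def skip_empty_lines(buf):
    """Drop empty lines (bare CRLF) from the start of a receive buffer.

    Hypothetical helper for illustration only.
    """
    while buf.startswith(CRLF):
        # Correct: keep everything *after* the empty line.
        # The mistaken form, buf = buf[:2], would keep only "\r\n" and
        # discard the rest, so this loop would never terminate.
        buf = buf[len(CRLF):]
    return buf
```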

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Fix bug in LUQueryConfigValues · cac599f1

Michael Hanselmann authored 15 years ago


LUQueryConfigValues supports multiple output fields. If the client asked
for the watcher pause status, it would not get a list, but simply the
value.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Feb 25, 2010

Fix typo in LUVerifyCluster when checking node time · 30bb62ea

Michael Hanselmann authored 15 years ago


The first argument to _ErrorIf should always be True in this case.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Feb 18, 2010

Fix ssh host key checking with no-key-check · e66d9f1a

Iustin Pop authored 15 years ago

In case we add a node with “--no-ssh-key-check”, this should override
any default yes/ask values in the system-wide (or user) ssh key check.

Currently this only works in batch mode, whereas in non-batch we only
override a 'no'. The patch fixes SshRunner such that in non-batch mode
we enforce the value of StrictHostKeyChecking in all cases.

Bug found and initial investigation by Theo Van Dinter.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Feb 10, 2010

Fix bug introduced in commit · b44b0141

Michael Hanselmann authored 15 years ago


While commit 413b7472 fixed the issue of poll(2) returning too
soon, it didn't work when the poll(2) call should've been
blocking. This is now fixed and verified.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Fix locking bug causing high CPU usage · 413b7472

Michael Hanselmann authored 15 years ago


Iustin Pop noticed unusually high CPU usage with 2.1's master
daemon, even with very simple opcodes like OP_TEST_DELAY. As
it turns out, we inadvertently passed seconds as milliseconds
to a call to poll(2). Due to the way the loop around the call
works it didn't break completely, but caused higher CPU usage
because the poll(2) call returned too early.
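
As an illustration of the unit mismatch (a sketch with hypothetical helper names, not the locking code itself): Python's `select.poll().poll()` takes its timeout in milliseconds, so a value held in seconds has to be converted first.

```python
import select


def seconds_to_poll_timeout(seconds):
    """Convert a timeout in seconds to the milliseconds poll(2) expects."""
    return int(seconds * 1000)


def wait_readable(fd, timeout_seconds):
    """Block until fd is readable or the timeout expires (illustrative helper)."""
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    # Passing timeout_seconds directly would be read as milliseconds, so the
    # call would return far too early and the surrounding loop would spin.
    return poller.poll(seconds_to_poll_timeout(timeout_seconds))
```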

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Feb 08, 2010

TLReplaceDisks: Delay iallocator when evacuating node · 94a1b377

Michael Hanselmann authored 15 years ago


When evacuating nodes, the iallocator was run for all
instances without taking planned changes into consideration.
This patch delays part of CheckPrereq and running the
iallocator for node evacuation.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Feb 03, 2010

Implement debug level across OS-related RPC calls · 4a0e011f

Iustin Pop authored 15 years ago


This doesn't implement the full functionality (we still need to add the debug
level to the opcodes too), but at least it won't require changing the RPC
calls during the 2.1 series.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Second try to fix LUVerifyCluster · dd9e9f9c

Michael Hanselmann authored 15 years ago


My previous patch, commit 785d142e, fixed the case where a node is marked
offline. With this patch it'll also handle other failures correctly.

 * Hooks Results
   - ERROR: node node2.example.com: Communication failure in hooks
   execution: Connection failed (111: Connection refused)

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

LUVerifyCluster: Fix bug with offline nodes · 785d142e

Michael Hanselmann authored 15 years ago


[…]
 * Other Notes
   - NOTICE: 1 offline node(s) found.
 * Hooks Results
Failure: command execution error:
iteration over non-sequence

Commit a0c9776a introduced an error simulation mode to LUVerifyCluster.
Due to a small mistake, offline nodes weren't skipped when checking the
results of verification hooks and iterating over None raises an
“iteration over non-sequence” error.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

utils: Fix retry delay calculator · 26751075

Michael Hanselmann authored 15 years ago


Before this patch, it would always sleep for at least
the time specified as the upper limit. Now it actually
limits the sleep time.
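
The commit message does not show the code. One way such a bug can arise, sketched here with a hypothetical `next_delay` function, is using `max()` where `min()` is meant when applying the upper limit.

```python
def next_delay(current, factor, upper_limit):
    """Return the next retry delay, capped at upper_limit (illustrative)."""
    # min() caps the delay; using max() here instead would make every
    # delay at least as long as upper_limit.
    return min(current * factor, upper_limit)
```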

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Feb 01, 2010

Bump RPC protocol version to 30 · 72456eb2

Michael Hanselmann authored 15 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Jan 29, 2010

Fix missing bridge for xen instances · 0183a697

Alessandro Cincaglini authored 15 years ago


Xen instance NIC definitions are missing the target bridge.

This bug was introduced in commit 503b97a9.

Signed-off-by: Alessandro Cincaglini <alessandro.ciancaglini@gmail.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>

Jan 28, 2010

Fix flipping MC flag bug · cea0534a

Guido Trotter authored 15 years ago


Currently unofflining or undraining an already functional master
candidate node can cause it to demote itself. In order to avoid that we
only trigger the self-promotion check if the node is not currently a
candidate.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Jan 22, 2010

KVM: fix pylint warning · e4dd2299

Guido Trotter authored 15 years ago


Specify string format arguments as logging function parameters
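
For illustration (the logger name, function, and message are hypothetical): the pattern pylint warns about formats the string eagerly with `%`, while the preferred form passes the arguments to the logging call so formatting happens only if the message is emitted.

```python
import logging

logger = logging.getLogger("example")


def log_status(instance_name, status):
    # Flagged by pylint: the string is formatted even if the message is dropped.
    #   logger.debug("Instance %s status: %s" % (instance_name, status))
    # Preferred: let the logging module format the message lazily.
    logger.debug("Instance %s status: %s", instance_name, status)
```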

Signed-off-by: Guido Trotter <ultrotter@google.com>

KVM: be more resilient on broken migration answers · c4e388a5

Guido Trotter authored 15 years ago


Before, when doing KVM live migrations, we used to accept an "unknown
status" but reject anything that didn't match our regexp. Since we've
seen "info migrate" return a completely empty answer, we'll now be more
tolerant of completely unknown results (while still logging them) and at
the same time limit the number of them we're willing to accept in a row.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Jan 20, 2010

cli: Fix bug when not using headers · ec39d63c

Michael Hanselmann authored 15 years ago


Commit 9fe72672 added code to not write spaces at the end of each line.
Unfortunately it didn't work properly when not printing headers—there would
still be spaces.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Jan 14, 2010

confd client: copy the peers in UpdatePeerList · db169865

Guido Trotter authored 15 years ago


Since the peer list is shuffled by the client, we don't keep a reference
to the list which was passed in, but copy it internally.
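
A small sketch of the idea, using a hypothetical `PeerListHolder` class as a stand-in for the client: copying the list before shuffling leaves the caller's list in its original order.

```python
import random


class PeerListHolder(object):
    """Hypothetical stand-in for a client that keeps a peer list."""

    def __init__(self, peers):
        self.UpdatePeerList(peers)

    def UpdatePeerList(self, peers):
        # Copy first so shuffling does not reorder the caller's list.
        self._peers = list(peers)
        random.shuffle(self._peers)
```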

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Jan 13, 2010

Generate hmac file with a newline at the end · e2e92ea0

Guido Trotter authored 15 years ago


This makes it slightly easier to cut&paste its content.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

jqueue: Don't return negative number for unchecked jobs when archiving · d2c8afb1

Michael Hanselmann authored 15 years ago


When the queue was empty, the calculation for unchecked jobs while
archiving would return -1. ``last_touched`` is set to 0, the job ID list
(``all_job_ids``) is empty. Calculating ``len(all_job_ids) -
last_touched - 1`` resulted in -1.
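
The arithmetic can be reproduced in a few lines. Clamping at zero, shown here in a hypothetical `count_unchecked` function, is one possible way to avoid the negative result; the commit message does not say how the actual fix was written.

```python
def count_unchecked(all_job_ids, last_touched):
    """Number of jobs not yet examined (illustrative, not the jqueue code)."""
    # With an empty list and last_touched == 0 the raw value is -1,
    # so clamp the result at zero.
    return max(0, len(all_job_ids) - last_touched - 1)
```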

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

cli.GenerateTable: Don't write EOL spaces · 9fe72672

Michael Hanselmann authored 15 years ago


With this change, there won't be unnecessary space characters
at the end of lines.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Improve logging for workerpool tasks by providing __repr__ · 9fa2e150

Michael Hanselmann authored 15 years ago


Before it would log something like “starting task
(<ganeti.http.client._HttpClientPendingRequest object at 0x2aaaad176790>,)”,
which isn't really useful for debugging. Now it'll log “[…]
<ganeti.http.client._HttpClientPendingRequest
req=<ganeti.http.client.HttpClientRequest 172.24.x.y:1811 PUT /node_info at
0x2aaaaab7ed10> at 0x2aaaaab823d0>”.
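
A sketch of the general technique, with a hypothetical `PendingRequest` class: a `__repr__` that includes the wrapped request makes the task identifiable in log output.

```python
class PendingRequest(object):
    """Hypothetical task wrapper used only for this illustration."""

    def __init__(self, req):
        self.req = req

    def __repr__(self):
        # Include the wrapped request so log lines identify the task.
        return "<%s.%s req=%r at %#x>" % (
            self.__class__.__module__,
            self.__class__.__name__,
            self.req,
            id(self),
        )
```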

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

workerpool: Simplify log messages · 02fc74da

Michael Hanselmann authored 15 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

workerpool: Use worker name as thread name · d16e6fd9

Michael Hanselmann authored 15 years ago


This way it shows up in debug logs.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

workerpool: Make worker ID alphanumeric · 89e2b4d2

Michael Hanselmann authored 15 years ago


Having a proper name instead of just a number makes debugging
easier.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

locking: Fix race condition in LockSet · 71e1863e

Michael Hanselmann authored 15 years ago


This patch fixes a race condition when acquiring all locks in
a LockSet instance. The list of lock names needs to be sorted
to guarantee a consistent locking order, but the names were not
sorted when acquiring all locks in the set.
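
The ordering rule can be sketched as follows, using plain `threading.Lock` objects and hypothetical helper names rather than the LockSet implementation: every caller acquires the locks in sorted name order.

```python
import threading


def acquire_all(locks):
    """Acquire every lock in a name -> lock mapping, in sorted name order.

    Illustrative helper; returns the names in acquisition order.
    """
    acquired = []
    # Sorting gives every caller the same acquisition order.
    for name in sorted(locks):
        locks[name].acquire()
        acquired.append(name)
    return acquired


def release_all(locks, names):
    """Release the named locks in reverse acquisition order."""
    for name in reversed(names):
        locks[name].release()
```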

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

mcpu: Log lock status with sorted names · 4776e022

Michael Hanselmann authored 15 years ago


Reading and comparing sorted lists is easier when debugging locking problems.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

locking: Append to list outside error handling block · 9b154270

Michael Hanselmann authored 15 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

locking: Don't fail in error handling if lock isn't owned · 56452af7

Michael Hanselmann authored 15 years ago

If an exception was thrown while acquiring the lock, not all owned locks are
necessarily acquired. Before this change, an exception could be masked by
another exception thrown here. In either case there is no good clean-up
strategy when acquiring a lock fails with an exception.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Jan 11, 2010

Normalize MAC addresses to all lower. · 82187135

René Nussbaumer authored 15 years ago


This change normalizes the MAC address to lowercase after validation.
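
A minimal sketch of validate-then-normalize, assuming a hypothetical `normalize_mac` function and a validation pattern of six colon-separated hex octets (the actual validation rule is not given here):

```python
import re

# Assumed validation pattern: six colon-separated hex octets.
_MAC_RE = re.compile(r"^([0-9a-f]{2}:){5}[0-9a-f]{2}$", re.I)


def normalize_mac(mac):
    """Validate a MAC address, then return it in lowercase (illustrative)."""
    if not _MAC_RE.match(mac):
        raise ValueError("Invalid MAC address: %r" % mac)
    return mac.lower()
```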

Signed-off-by: René Nussbaumer <rn@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Jan 05, 2010

Introduce a Luxi call for GetTags · 7699c3af

Iustin Pop authored 15 years ago


This changes from submitting jobs to get the tags (in cli scripts) to
queries, which (since the tags query is a cheap one) should be much
faster.

The tags queries are already done without locks (in the generic query
paths for instances/nodes/cluster), so this shouldn't break tags query
via gnt-* list-tags.

On a small cluster, the runtime of gnt-cluster/gnt-instance list-tags
more than halves; on a big cluster (with many MCs) I expect it to be
more than 5 times faster. The speed of retrieving the tags is not the
main gain, though; the main gain is eliminating a job when a simple
query is enough.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

LURenameCluster: run post hook on all nodes · 47a72f18

Iustin Pop authored 15 years ago


Since the cluster name might be used for various purposes on nodes, we
should let all nodes "know" about a cluster rename by running the post
hook on all nodes. This will make cluster rename slightly
slower/costlier, but it is not/shouldn't be an operation that is run
very often.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>

Jan 04, 2010

Fix unused imports or add silences where needed · 30e4e741

Iustin Pop authored 15 years ago


In some cases pylint doesn't parse the import correctly, so we add
silences; but there are also many cases of unused imports, which we
simply remove.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>

bdev: Add a TODO and a pylint silence · 527a15ac

Iustin Pop authored 15 years ago


A piece of old code in bdev.py uses a for loop over a single variable
because we can 'break' out of the loop or exit on the 'else' path. This
is not a nice usage of the for loop; it should be converted to a
standard if...elif...else structure.
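
To illustrate the pattern with a made-up example (not the bdev.py code): the first function uses a one-element loop only for its break/else control flow, and the second is the equivalent if...elif...else form.

```python
def classify_for(value):
    # A one-element loop used only for break/else control flow.
    for item in [value]:
        if item < 0:
            result = "negative"
            break
        if item == 0:
            result = "zero"
            break
    else:
        # Reached only when the loop finished without a break.
        result = "positive"
    return result


def classify_if(value):
    # Equivalent, clearer form.
    if value < 0:
        result = "negative"
    elif value == 0:
        result = "zero"
    else:
        result = "positive"
    return result
```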

In the meantime we silence a warning from pylint (it is actually
invalid, IMHO) and add a TODO.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>

Further pylint disables, mostly for Unused args · 2d54e29c

Iustin Pop authored 15 years ago


Many of our functions have to follow a given API, and thus we have to
keep a given signature, but pylint doesn't understand this. Therefore,
we silence this warning.

The patch does a few other cleanups.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>

LUDiagnoseOS._DiagnoseByOS: remove unused arg · 857121ad

Iustin Pop authored 15 years ago


The node_list argument to _DiagnoseByOS is not used, and is obsoleted by
the fact that the rlist argument already has the valid nodes as keys
(assuming RPC behaviour didn't change). Thus, we remove it and silence
the warning.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>

hv_xen/_GetConfigFileDiskData: remove unused arg · 7ed85ffe

Iustin Pop authored 15 years ago


The disk template is not needed; all that's used is the disk data. As
such, remove this parameter from the function.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>

jqueue/_CheckRpcResult: log the whole operation · 45e0d704

Iustin Pop authored 15 years ago


Currently only the rpc call, but not its description (which also shows
the argument) is logged. We change this to log failmsg too, and this
also silences a warning.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>

Optparse extenders have to obey a given API · 8929d28c

Iustin Pop authored 15 years ago


So we just silence the warning.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>

backend._OSOndiskAPIVersion: remove obsolete arg · c19f9810

Iustin Pop authored 15 years ago


The 'name' argument is not used anymore, probably since before 2.0.
Since this is an internal function, we can just remove it (from its
caller too).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>
