- Feb 18, 2010
Michael Hanselmann authored
* stable-2.1: Fix ssh host key checking with no-key-check
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
This function could be useful in other places, and this way we can easily unit-test it.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
On fork, the tempfile module's pseudo-random generator is not reset. If several processes (e.g. two children, or a parent and its child) try to create a temporary file, they will generate the same names and conflict. This function can be used to reset the name generator, which contains the pseudo-random generator. A unit test is included. It is in a separate script because it changes a variable in the tempfile module to speed up the test.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
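A minimal sketch of such a reset function, assuming CPython's private tempfile attributes `_once_lock` and `_name_sequence` (implementation details, not public API); the name `reset_tempfile_module` is illustrative. Current Python 3 versions already re-seed the name generator when the PID changes, so this mainly matters for older interpreters.

```python
import logging
import tempfile


def reset_tempfile_module():
    """Reset tempfile's cached random name generator (e.g. in a forked child).

    Assumes the private attributes _once_lock and _name_sequence exist; they
    are CPython implementation details and may change between versions.
    """
    # pylint: disable=protected-access
    if hasattr(tempfile, "_once_lock") and hasattr(tempfile, "_name_sequence"):
        with tempfile._once_lock:
            # Dropping the cached generator makes tempfile create and seed a
            # fresh one the next time it needs a candidate file name.
            tempfile._name_sequence = None
    else:
        logging.critical("tempfile module lacks _once_lock or _name_sequence")
```

A child process would call `reset_tempfile_module()` right after `os.fork()` returns 0, before creating any temporary files.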
-
Iustin Pop authored
If we add a node with “--no-ssh-key-check”, this should override any default yes/ask value for host key checking in the system-wide (or user) ssh configuration. Currently this only works in batch mode, whereas in non-batch mode we only override a 'no'. The patch fixes SshRunner such that in non-batch mode we enforce the value of StrictHostKeyChecking in all cases. Bug found and initially investigated by Theo Van Dinter.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
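A sketch of the option-building logic described above, assuming a hypothetical helper `build_host_key_options(batch, ask_key, strict_host_check)`; it is not SshRunner's actual code. Options given on the ssh command line take precedence over ssh_config files, so always emitting StrictHostKeyChecking explicitly keeps a configured 'yes' or 'ask' from leaking through.

```python
def build_host_key_options(batch, ask_key, strict_host_check):
    """Return ssh -o options controlling host key checking (hypothetical).

    StrictHostKeyChecking is always set explicitly, in batch and non-batch
    mode alike, so defaults from ssh_config cannot override it.
    """
    options = []
    if batch:
        # Batch mode never prompts, so 'ask' is not meaningful here.
        options.append("-oBatchMode=yes")
        value = "yes" if strict_host_check else "no"
    elif ask_key:
        value = "ask"
    else:
        value = "yes" if strict_host_check else "no"
    options.append("-oStrictHostKeyChecking=%s" % value)
    return options
```

For a node added with “--no-ssh-key-check” in non-batch mode, `build_host_key_options(False, False, False)` yields `["-oStrictHostKeyChecking=no"]`.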
-
- Feb 17, 2010
Iustin Pop authored
This should have been done in the _ExpandNodeName patch.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
There's no such thing as OpProgrammerError (I found this when I used it in code elsewhere and pylint complained).
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
-
Michael Hanselmann authored
snap_disks can contain boolean values, which weren't handled correctly. The error message was “Error while executing backend function: Invalid object passed to FromDict: expected dict, got <type 'bool'>”.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
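A sketch of tolerating booleans when deserializing a snapshot list, assuming (not confirmed by the commit) that a boolean entry stands for a disk without a usable snapshot. `Disk`, `from_dict` and `load_snap_disks` are hypothetical stand-ins for the real objects and FromDict.

```python
class Disk:
    """Hypothetical stand-in for a config object rebuilt from a dict."""

    def __init__(self, logical_id):
        self.logical_id = logical_id

    @classmethod
    def from_dict(cls, data):
        # Mirrors the reported failure: anything but a dict is rejected.
        if not isinstance(data, dict):
            raise TypeError("expected dict, got %s" % type(data))
        return cls(**data)


def load_snap_disks(snap_disks):
    """Convert dict entries to Disk objects, passing booleans through."""
    return [disk if isinstance(disk, bool) else Disk.from_dict(disk)
            for disk in snap_disks]
```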
-
Iustin Pop authored
Currently we have lots of duplication of the error-checking (and proper exception raising) around node/instance name expansion. LUCreateInstance is the only place where we have abstracted this. This patch creates two functions (ExpandNodeName and ExpandInstanceName) that will either raise the proper exception or return the expanded name. This allows a lot of cleanup of duplicate code.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
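A sketch of the "raise or return the expanded name" pattern. The matching rule (exact name, or a unique name starting with the given short name plus a dot) is an assumption for illustration; `OpPrereqError`, `match_name_component` and `expand_node_name` are hypothetical stand-ins.

```python
class OpPrereqError(Exception):
    """Hypothetical stand-in for the prerequisite error shown to users."""


def match_name_component(key, names):
    """Return the unique name equal to key or starting with key + '.'."""
    if key in names:
        return key
    matches = [name for name in names if name.startswith(key + ".")]
    return matches[0] if len(matches) == 1 else None


def expand_node_name(known_nodes, name):
    """Return the full node name, or raise OpPrereqError if unknown/ambiguous."""
    full_name = match_name_component(name, known_nodes)
    if full_name is None:
        raise OpPrereqError("Node '%s' not known" % name)
    return full_name
```

Each LU can then replace its own lookup-and-raise block with a single call such as `expand_node_name(nodes, "node1")`.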
-
- Feb 15, 2010
Michael Hanselmann authored
* origin/stable-2.1:
  Fix bug introduced in commit 413b7472
  Fix locking bug causing high CPU usage
  Fix confd protocol design description
  Implement instance rename QA tests
  Fix "gnt-instance rename" functionality
Reviewed-by: Guido Trotter <ultrotter@google.com>
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Iustin Pop authored
This patch extends commit 7ea7bcf6 by releasing all node locks in disk replace for the early release mode. The rationale behind this is:
- LUCreateInstance already releases all node locks while waiting for disk synchronization, and starts up the instance later
- WaitForSync only runs 'lvs' and reads /proc/drbd (for disk template 'drbd') on the primary node, which should be safe to run in parallel (modulo bugs in LVM)
In any case, the worst I can foresee is a node that is primary for disk storage having N lvs commands running in parallel on it. Given that instance creation already does this safely, and that burnin with more than two instances per node is safe, I think this can be applied.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
These are both cleanups and, in the case of _MassageProcData, a switch from a weaker RE to a stronger one (we now require cs: in the line; previously any line starting with \d+: was accepted).
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
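A sketch of grouping /proc/drbd output per minor with the stricter pattern, where only lines like " 0: cs:Connected ..." start a new record and other lines are treated as continuations. The exact regex and the function name `massage_proc_data` are assumptions, not necessarily the code in the patch.

```python
import re

# Assumed pattern: a minor number followed by ": cs:" starts a new record.
VALID_LINE_RE = re.compile(r"^ *([0-9]+): cs:([^ ]+).*$")


def massage_proc_data(lines):
    """Map each DRBD minor to its /proc/drbd status lines, joined by spaces."""
    results = {}
    minor = current = None
    for line in lines:
        if not line:
            continue  # skip completely empty lines
        match = VALID_LINE_RE.match(line)
        if match is not None:
            if minor is not None:
                results[minor] = current  # finish the previous minor
            minor = int(match.group(1))
            current = line
        elif minor is not None:
            # Continuation line: append it to the current minor's record.
            current += " " + line.strip()
    if minor is not None:
        results[minor] = current
    return results
```

Header lines such as "version: ..." are ignored because they appear before the first record and do not match the pattern.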
-
Iustin Pop authored
If the old node is offline, we won't be able to talk to it to remove the storage, and in most cases the node is powered off or unreachable anyway. In this case it makes no sense to delay the storage release, so we automatically enable early_release mode, gaining parallelism during node evacuation.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
- Feb 11, 2010
Iustin Pop authored
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This reverts commit 83d9f436. man is still unable to wrap some long lines, so we simply revert this patch (and filter out the specific message in autotools/check-man).
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This should fix issue 68: some hooks should be run on more nodes than they currently are. GrowDisk now runs on both nodes, remove runs the post hook on the instance's nodes, and failover and migrate also run the post hook on the source node. Thanks to Maxence for the initial investigation and patch.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Per issue 71, the migrate and failover operations need special variables to keep the nodes consistent during instance migrations.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
A long PREFIX variable (passed to configure) will result in a very long LOCALSTATEDIR, which, when concatenated with lib/ganeti/ (and even more items under it), will exceed the 80-character line length we enforce in the man checker. To work around this, we change two things:
- use a specific REPLACE_VARS_MAN which adds breaking points after each slash in paths
- replace some <filename> entries with <literallayout> so that docbook generates a non-fill block around them (only a few cases need this after the breaking points are added)
Note that with normal prefixes (e.g. / or /usr/local) this won't happen. The patch also fixes some wording in the watcher man page.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
There are two entry points to job execution in burnin, ExecOp and ExecOrQueue; both are modified to call the new _SetDebug method on the opcodes.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This patch changes SubmitOpCode and SubmitOrSend such that a single function converts generic CLI options into opcode attributes. Once all scripts pass the opts argument to SubmitOpCode, this will allow passing the debug or dry-run parameters to the LUs.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
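A sketch of one function copying generic CLI options onto every opcode of a job. `OpCode` and `set_generic_opcode_opts` are hypothetical stand-ins; the attribute names `dry_run` and `debug_level` and the option names `dry_run` and `debug` are assumptions for illustration.

```python
import optparse


class OpCode:
    """Hypothetical minimal opcode carrying the generic attributes."""

    def __init__(self, op_id):
        self.op_id = op_id
        self.dry_run = False
        self.debug_level = 0


def set_generic_opcode_opts(opcodes, options):
    """Copy generic CLI options (dry-run, debug) onto all opcodes."""
    if options is None:
        return  # scripts that don't pass opts yet keep the defaults
    for op in opcodes:
        op.dry_run = getattr(options, "dry_run", False)
        op.debug_level = getattr(options, "debug", 0)


if __name__ == "__main__":
    opts = optparse.Values({"dry_run": True, "debug": 2})
    job = [OpCode("OP_TEST_DELAY"), OpCode("OP_TEST_DELAY")]
    set_generic_opcode_opts(job, opts)
    print([(op.dry_run, op.debug_level) for op in job])
```

Keeping the mapping in one place means a new generic option only has to be wired up once rather than in every submit path.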
-
Iustin Pop authored
This changes the option from boolean to integer/count (for a future differentiation based on the actual debug level). All uses in the code only test its boolean status, so it still works as an integer value.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
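With optparse, a counting debug option can be sketched as below; `build_parser` and `parse_debug_level` are illustrative names, not Ganeti's. Since 0 is falsy and any positive count is truthy, existing `if options.debug:` checks keep working unchanged.

```python
import optparse


def build_parser():
    """Build a parser with a repeatable -d/--debug counter option."""
    parser = optparse.OptionParser()
    # action="count" increments options.debug once per occurrence, starting
    # from the default of 0, so "-d -d" (or "-dd") yields 2.
    parser.add_option("-d", "--debug", dest="debug", default=0,
                      action="count", help="Increase debugging level")
    return parser


def parse_debug_level(argv):
    """Return the debug level for the given argument list."""
    options, _ = build_parser().parse_args(argv)
    return options.debug
```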
-
Iustin Pop authored
Also automatically fix, in the LU init routine, opcodes that are missing it.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
- Feb 10, 2010
Michael Hanselmann authored
While commit 413b7472 fixed the issue of poll(2) returning too soon, it didn't work when the poll(2) call should've been blocking. This is now fixed and verified.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Iustin Pop authored
Commit 154b9580 changed (correctly) the __slots__ usage, but this broke dumpers/loaders, since we relied directly on the class's own __slots__ field. To compensate, we introduce a simple function for computing the slots across all parent classes (if any), and use it instead of __slots__ directly. Note: the _all_slots() function is duplicated between objects.py and opcodes.py, but the only other option is to introduce a lang.py for such very basic language items.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Michael Hanselmann authored
Iustin Pop noticed unusually high CPU usage with 2.1's master daemon, even with very simple opcodes like OP_TEST_DELAY. As it turns out, we inadvertently passed seconds as milliseconds to a call to poll(2). Due to the way the loop around the call works, it didn't break completely, but it caused higher CPU usage because the poll(2) call returned too early.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
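Python's `select.poll().poll()` takes its timeout in milliseconds, with `None` meaning "block". A sketch of the conversion covering both this fix and the blocking case above; `wait_for_fd` is an illustrative name, not the daemon's actual function.

```python
import os
import select


def wait_for_fd(fd, events, timeout):
    """Poll fd for events; timeout is in seconds, None blocks indefinitely.

    Passing seconds unscaled would make poll() return about 1000 times
    too early, which is the bug described above.
    """
    poller = select.poll()
    poller.register(fd, events)
    if timeout is None:
        timeout_ms = None  # block until an event arrives
    else:
        timeout_ms = max(0, int(round(timeout * 1000)))
    return poller.poll(timeout_ms)


if __name__ == "__main__":
    read_fd, write_fd = os.pipe()
    print(wait_for_fd(read_fd, select.POLLIN, 0.1))  # times out: no data
    os.write(write_fd, b"x")
    print(wait_for_fd(read_fd, select.POLLIN, None))  # returns immediately
```

The `None` branch matters: a conversion that turns "no timeout" into 0 would make poll() return immediately instead of blocking.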
-
Michael Hanselmann authored
One fix is necessary in gnt-cluster.sgml. Also add a “DELETE_ON_ERROR” target to remove the output file (in this case the man page) if an error occurred while building it. This was reported by Iustin Pop in issue 87; the proposed check method was taken from Lintian. http://code.google.com/p/ganeti/issues/detail?id=87
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Guido Trotter authored
The protocol design for confd was missing a description of the fourcc code, which we use to distinguish between different message types in case we want to completely change the protocol. Add it so that someone implementing the protocol can find out about it.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
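A sketch of how a four-byte fourcc prefix lets a receiver recognize (and later version) message formats. The magic value `b"plj0"` and JSON encoding of the body are assumptions here, and message signing is omitted entirely; consult the confd design document for the real wire format.

```python
import json

# Assumed 4-byte magic identifying the current message format.
CONFD_MAGIC_FOURCC = b"plj0"


def pack_message(payload):
    """Prefix a JSON-encoded payload with the protocol's fourcc."""
    return CONFD_MAGIC_FOURCC + json.dumps(payload).encode("utf-8")


def unpack_message(data):
    """Check the fourcc and decode the payload; raise ValueError otherwise."""
    fourcc, body = data[:4], data[4:]
    if fourcc != CONFD_MAGIC_FOURCC:
        raise ValueError("unknown message type %r" % fourcc)
    return json.loads(body.decode("utf-8"))
```

A completely new protocol would use a different fourcc, so old receivers reject its messages instead of misparsing them.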
-
- Feb 09, 2010
Iustin Pop authored
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This patch adds an early_release parameter to the OpReplaceDisks and OpEvacuateNode opcodes, allowing earlier release of storage and, more importantly, of internal Ganeti locks.
With early release, any locks and storage on all secondary nodes are released early. This is valid for change secondary (where we remove the storage on the old secondary, and release the locks on the old and new secondary) and for replace on secondary (where we remove the old storage and release the lock on the secondary node).
Using this, on a three-node setup:
- instance1 on nodes A:B
- instance2 on nodes C:B
it is possible to run a replace-disks -s (on secondary) for instances 1 and 2 in parallel.
Replace on primary will remove the storage, but not the locks, as we use the primary node later in the LU to check consistency. It is debatable whether to also remove the locks on the primary node, thus making replace-disks hold zero locks during the sync. While this would allow greatly enhanced parallelism, let's first see how removal of secondary locks works.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
Commit 91e0748c (Unify the “--no-ip-check” option) broke the options variable name for ‘--no-ip-check’, but since we don't have a QA test for instance rename (only a burnin test), this was not caught until issue 86 was opened.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
-
- Feb 08, 2010
Iustin Pop authored
* stable-2.1:
  TLReplaceDisks: Delay iallocator when evacuating node
  Implement debug level across OS-related RPC calls
  Second try to fix LUVerifyCluster
  LUVerifyCluster: Fix bug with offline nodes
  utils: Fix retry delay calculator
  Bump RPC protocol version to 30
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Michael Hanselmann authored
When evacuating nodes, the iallocator was run for all instances without taking planned changes into consideration. This patch delays part of CheckPrereq and running the iallocator for node evacuation.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Feb 03, 2010
Iustin Pop authored
This doesn't implement the full functionality (we also need to add the debug level to the opcodes), but at least we won't need to change the RPC calls during the 2.1 series.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Michael Hanselmann authored
My previous patch, commit 785d142e, fixed the case where a node is marked offline. With this patch, other failures are handled correctly as well:
  * Hooks Results
    - ERROR: node node2.example.com: Communication failure in hooks execution: Connection failed (111: Connection refused)
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
[…]
  * Other Notes
    - NOTICE: 1 offline node(s) found.
  * Hooks Results
Failure: command execution error: iteration over non-sequence
Commit a0c9776a introduced an error simulation mode to LUVerifyCluster. Due to a small mistake, offline nodes weren't skipped when checking the results of verification hooks, and iterating over None raises an “iteration over non-sequence” error.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
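A sketch of checking hook results so that offline nodes are skipped and other missing results are reported instead of iterated over. The result layout (node name mapped to a list of (script, status, output) tuples, or None on communication failure), the "SUCCESS" status string and `collect_hook_errors` are all hypothetical.

```python
def collect_hook_errors(hooks_results, offline_nodes):
    """Return (node, message) pairs for failed hook runs.

    hooks_results maps node name -> list of (script, status, output)
    tuples, or None when the node's results could not be retrieved.
    """
    errors = []
    for node, results in sorted(hooks_results.items()):
        if node in offline_nodes:
            continue  # no results are expected from offline nodes
        if results is None:
            # Iterating over None would raise TypeError; report it instead.
            errors.append((node, "Communication failure in hooks execution"))
            continue
        for script, status, output in results:
            if status != "SUCCESS":  # hypothetical status value
                errors.append((node, "%s failed: %s" % (script, output)))
    return errors
```

Checking offline nodes first mirrors the earlier fix; the None check covers the other communication failures.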
-