- Oct 22, 2009
-
-
Flavio Silvestrow authored
Signed-off-by:
Flavio Silvestrow <flaviops@google.com> Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
In backend.InstanceShutdown(), there is a race condition between checking that the instance exists and trying to shut it down, which sometimes results in error messages like:

Tue Oct 20 20:08:30 2009 - WARNING: Could not shutdown instance: Failed to force stop instance instance9: Failed to stop instance instance9: exited with exit code 1, Error: Domain 'instance9' does not exist.

To fix this, we ignore any hypervisor StopInstance() errors if the instance no longer exists, since our purpose (to make the instance go away) is already accomplished.
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
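
The race fix described in the entry above can be illustrated with a minimal sketch. This is not the Ganeti code: `HypervisorError`, `stop_ignoring_missing`, and the two callables are hypothetical names chosen for the example. The idea is only that a failed stop is re-raised when the instance is still listed, and swallowed when it is already gone.

```python
class HypervisorError(Exception):
    """Stand-in for the hypervisor's error type (hypothetical name)."""


def stop_ignoring_missing(stop_instance, list_instances, name):
    """Stop `name`, tolerating a failure if the instance is already gone.

    `stop_instance(name)` and `list_instances()` are assumed callables
    standing in for the hypervisor interface.
    """
    try:
        stop_instance(name)
    except HypervisorError:
        # The instance may have exited between the existence check and the
        # stop call. If it is no longer listed, the goal is already met.
        if name in list_instances():
            raise
```

Re-checking the instance list after the failure, rather than before the call, is what closes the window between "check" and "stop".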
-
Iustin Pop authored
Commit e4e9b806 introduced two problems in backend.InstanceShutdown():

- first, it reduced the check interval significantly (especially for the first few checks); very few production VMs shut down in one second, and while this does not break anything, it creates unnecessary load for the hypervisor
- second, a wrong test added to the while condition (“not tried_once”) means that we only sleep once for an instance, and after that we immediately kill it forcefully

These two together mean that any instance which is not lucky enough to finish in roughly 1-1.5 seconds (the time it takes to sleep and verify the instance list again) will have this happen:

2009-10-21 23:33:46,034: pid=16634 INFO Called for inst9 w. False/False
2009-10-21 23:33:47,440: pid=16634 ERROR Shutdown of 'inst9' unsuccessful, forcing
2009-10-21 23:33:47,440: pid=16634 INFO Called for inst9 w. True/False

The “Called…” lines are logs from the hypervisor shutdown function. This of course means that at restart time we see:

[12775866.644682] EXT3-fs: INFO: recovery required on readonly filesystem.
[12775866.644689] EXT3-fs: write access will be enabled during recovery.
[12775868.533674] kjournald starting. Commit interval 5 seconds
[12775868.533697] EXT3-fs: sda1: orphan cleanup on readonly fs
[12775868.551797] EXT3-fs: sda1: 12 orphan inodes deleted
[12775868.551803] EXT3-fs: recovery complete.
[12775868.586275] EXT3-fs: mounted filesystem with ordered data mode.

This patch reverts the broken test and changes the sleep to a fixed duration of five seconds, since it makes no sense to check that often for shutdown (and after ~20 seconds we reach a stable value of five seconds anyway).
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
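
The fixed-interval polling described in the entry above might look roughly like the following sketch. It is an illustration under assumptions, not the actual backend.InstanceShutdown() code: the function name, the `request_stop(force)` and `is_running()` callables, and the injectable `sleep`/`clock` parameters are all hypothetical.

```python
import time


def shutdown_with_polling(request_stop, is_running, timeout,
                          interval=5.0, sleep=time.sleep,
                          clock=time.monotonic):
    """Ask for a clean shutdown, poll at a fixed interval, force on timeout.

    Returns True if the instance stopped on its own, False if it had to be
    stopped forcefully.
    """
    request_stop(False)              # soft shutdown request
    deadline = clock() + timeout
    while is_running():
        if clock() >= deadline:
            request_stop(True)       # give up waiting and force the stop
            return False
        sleep(interval)              # fixed five-second wait by default
    return True
```

The loop keeps sleeping until the deadline passes; it never forces the stop after a single sleep, which is the behaviour the broken “not tried_once” test caused.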
-
Iustin Pop authored
Commit 4ad45119 changed the KVM hypervisor to send multiple shutdown requests to the monitor, but it didn't change this for the Xen hypervisor. We simply remove the return on retry model, since we do want to send multiple shutdown signals for both Xen and KVM (even if the behaviour is not perfect, they should behave the same). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Oct 21, 2009
-
-
Michael Hanselmann authored
Also add assertions to avoid missing attributes in the future. They won't be included in optimized bytecode. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- Oct 20, 2009
-
-
Guido Trotter authored
Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Olivier Tharan <olive@google.com>
-
Guido Trotter authored
This addresses issue 75. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Michael Hanselmann authored
I forgot to bump the configure.ac version before tagging the 2.1.0~beta1 release. Since we cannot remove old tags (see “On Re-tagging” in git-tag(1)), we have to call this release 2.1.0~beta2. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Olivier Tharan <olive@google.com>
-
Iustin Pop authored
This patch adds checks for /proc and /sys in cluster verify, since Ganeti relies on these special filesystems to be mounted. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
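
A check of the kind described in the entry above could be sketched as follows. The function and constant names are hypothetical, and the real cluster verify code may detect the mounts differently; this version simply relies on the standard `os.path.ismount`.

```python
import os

# Special filesystems the entry above says must be mounted.
REQUIRED_MOUNTS = ("/proc", "/sys")


def missing_special_filesystems(paths=REQUIRED_MOUNTS,
                                ismount=os.path.ismount):
    """Return the required paths that are not mount points."""
    return [path for path in paths if not ismount(path)]
```

An empty result means both filesystems are mounted; anything else can be reported as a verification error for that node.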
-
- Oct 19, 2009
-
-
Michael Hanselmann authored
Commit d22b2999 broke the serializer unittests with certain versions of simplejson. This patch removes sort_keys again and implements a slightly more efficient way of detecting simplejson functionality. The serializer unittests no longer use a partially broken mock, but rather a function to convert all tuples to lists before comparing. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
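
The tuple-to-list conversion mentioned in the entry above is needed because JSON has no tuple type, so tuples come back from a round trip as lists. A minimal sketch of such a helper follows; the name `tuples_to_lists` is hypothetical and the unittest's own function may differ.

```python
import json


def tuples_to_lists(value):
    """Recursively replace tuples with lists so JSON round trips compare equal."""
    if isinstance(value, (tuple, list)):
        return [tuples_to_lists(item) for item in value]
    if isinstance(value, dict):
        return {key: tuples_to_lists(item) for key, item in value.items()}
    return value


def roundtrip_matches(data):
    """Check that `data` survives a JSON dump/load, ignoring tuple vs. list."""
    return json.loads(json.dumps(data)) == tuples_to_lists(data)
```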
-
- Oct 16, 2009
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
This also fixes a few typos. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Oct 15, 2009
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
This class can also be used by mcpu. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Oct 13, 2009
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Michael Hanselmann authored
By moving the main code of LockSet.acquire to its own function we reduce the code complexity a bit and clarify the exception handling. This also fixes a case where a lock acquire timeout wasn't handled correctly, leading to obscure error messages. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Guido Trotter authored
gnt-* scripts were building wrong opcodes for commands which had the shutdown_timeout slot (because they were not tested after the rename). This patch fixes that. It also changes the SHUTDOWN_TIMEOUT_OPT dest field name from "timeout" to "shutdown_timeout"; the old name would still work, but could be confusing. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
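
To illustrate why the dest name in the entry above matters, here is a small sketch using the standard optparse module. The flag spelling `--shutdown-timeout`, the default of 120, and the helper names are assumptions made for the example, not taken from the gnt-* scripts.

```python
import optparse


def build_parser():
    """Build a parser whose option is stored under `shutdown_timeout`."""
    parser = optparse.OptionParser()
    # `dest` decides the attribute name on the parsed options object.
    parser.add_option("--shutdown-timeout", dest="shutdown_timeout",
                      type="int", default=120)  # default is hypothetical
    return parser


def opcode_kwargs(opts):
    """Collect keyword arguments for an opcode from parsed options."""
    # The attribute read here has to match the option's `dest`.
    return {"shutdown_timeout": opts.shutdown_timeout}
```

Keeping the dest identical to the opcode slot name means the code that builds the opcode reads the same name it passes on, which avoids the kind of mismatch the entry describes.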
-
Guido Trotter authored
Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This also clarifies the UUIDs NEWS entry. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This patch fixes the tag PUT/DELETE operations, and additionally changes the _Tags_* functions to take only positional and not keyword arguments (the defaults do not make any sense at all, and they are always called with all arguments). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Guido Trotter authored
Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Guido Trotter authored
All the LUs that shut down the instance need to be able to pass the timeout parameter as well. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Guido Trotter authored
Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Oct 12, 2009
-
-
Michael Hanselmann authored
With this patch all timeouts are pre-calculated. The interface of the _LockTimeoutStrategy class is also changed a bit; NextAttempt now returns a new instance. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
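
The interface change in the entry above (pre-calculated timeouts, with NextAttempt returning a new instance) can be pictured with the sketch below. `PrecalculatedTimeouts` and its `Current` method are hypothetical; only the `NextAttempt` name comes from the entry, and the real _LockTimeoutStrategy class may be structured differently.

```python
class PrecalculatedTimeouts(object):
    """Immutable sequence of timeouts; None means 'block without timeout'."""

    __slots__ = ("_timeouts",)

    def __init__(self, timeouts):
        # All values are computed up front and never modified afterwards.
        self._timeouts = tuple(timeouts)

    def Current(self):
        """Return the timeout for this attempt, or None once exhausted."""
        return self._timeouts[0] if self._timeouts else None

    def NextAttempt(self):
        """Return a new strategy object for the following attempt."""
        return PrecalculatedTimeouts(self._timeouts[1:])
```

Because each attempt gets its own object, a caller holding an earlier strategy is not affected when a later attempt advances.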
-
Michael Hanselmann authored
Found using pylint and epydoc. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Michael Hanselmann authored
The timeout is always between ~0.1 and ~10.0 seconds. A small variation of ±5% is added to prevent different jobs from fighting each other. After 10 attempts to acquire the locks with a timeout, a blocking acquire is made. Lock status reporting will be improved in a separate patch. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
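
The timeout scheme in the entry above (values between ~0.1 and ~10.0 seconds, ±5% variation, a blocking acquire after 10 timed attempts) can be sketched as follows. The geometric growth between the two bounds is an assumption made for the example, since the entry does not say how the values progress; the function name and parameters are hypothetical as well.

```python
import random


def attempt_timeouts(attempts=10, minimum=0.1, maximum=10.0,
                     jitter=0.05, rng=random.random):
    """Yield one timeout per timed attempt, then None for a blocking acquire.

    Requires attempts >= 2. `rng` returns a float in [0, 1).
    """
    for index in range(attempts):
        # Assumed curve: grow geometrically from `minimum` to `maximum`.
        base = minimum * (maximum / minimum) ** (index / float(attempts - 1))
        # Shift by up to +/- `jitter` so jobs do not retry in lockstep.
        variation = (2.0 * rng() - 1.0) * jitter * base
        yield base + variation
    # No timeout at all: the caller should now block until the lock is free.
    yield None
```

A caller would try the lock with each yielded timeout in turn and fall back to an unconditional acquire when it receives None.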
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-