- Oct 22, 2009
Guido Trotter authored
In routing mode we are tweaking a few parameters on the interface. With this patch we'll tweak both the IPv4 and IPv6 ones.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Guido Trotter authored
Until now we relied on traffic from instances being policy-routed via a rule based on the instance network. With this change we can enforce it on the instance interfaces. Since ip rules survive an interface disappearing and reappearing, when creating the interface we first need to remove leftover rules and then apply the new one.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Ken Wehr authored
Allows the initialization of a cluster without the creation or distribution of SSH key pairs. Includes changes for LeaveCluster and RPC.
Signed-off-by: Ken Wehr <ksw@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
-
Flavio Silvestrow authored
Signed-off-by: Flavio Silvestrow <flaviops@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
In backend.InstanceShutdown(), there is a race condition between checking that the instance exists and trying to shut it down, which sometimes results in error messages like:
  Tue Oct 20 20:08:30 2009 - WARNING: Could not shutdown instance: Failed to force stop instance instance9: Failed to stop instance instance9: exited with exit code 1, Error: Domain 'instance9' does not exist.
To fix this, we ignore any hypervisor StopInstance() errors if the instance no longer exists, since our purpose (making the instance go away) has already been accomplished.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
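A minimal Python sketch of the fix, assuming a hypervisor object with `ListInstances()` and `StopInstance(name)` methods (simplified signatures). `HypervisorError`, `shutdown_instance` and the `RacyHypervisor` fake are hypothetical names; the real `backend.InstanceShutdown()` does more. The point is the ordering: on a stop failure, re-check the instance list before treating the failure as an error.

```python
class HypervisorError(Exception):
    """Hypothetical stand-in for a hypervisor-level failure."""


def shutdown_instance(hyper, name):
    """Stop `name`, tolerating a race where it disappears on its own."""
    if name not in hyper.ListInstances():
        return "already stopped"
    try:
        hyper.StopInstance(name)
    except HypervisorError:
        # The instance may have exited between the check and the stop call.
        # If it is gone, our goal (making it go away) is already achieved.
        if name in hyper.ListInstances():
            raise
        return "vanished during stop"
    return "stopped"


class RacyHypervisor:
    """Fake hypervisor whose domain exits just as StopInstance runs."""

    def __init__(self, names):
        self.running = set(names)

    def ListInstances(self):
        return sorted(self.running)

    def StopInstance(self, name):
        self.running.discard(name)  # simulate the domain exiting by itself
        raise HypervisorError("Domain '%s' does not exist" % name)


# The error is swallowed because the instance is no longer listed.
print(shutdown_instance(RacyHypervisor(["instance9"]), "instance9"))  # vanished during stop
```

If the instance is still listed after the failure, the error is re-raised, so genuine stop failures are not hidden.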
-
Iustin Pop authored
Commit e4e9b806 introduced two problems in backend.InstanceShutdown():
- first, it reduced the check interval significantly (especially for the first few checks); very few production VMs shut down within one second, and while this doesn't break anything, it creates unnecessary load on the hypervisor
- second, a wrong test added to the while condition (“not tried_once”) means that we only sleep once for an instance, and after that we immediately kill it forcefully
Together, these two mean that any instance which is not lucky enough to finish in roughly 1-1.5 seconds (the time it takes to sleep and re-check the instance list) will have this happen:
  2009-10-21 23:33:46,034: pid=16634 INFO Called for inst9 w. False/False
  2009-10-21 23:33:47,440: pid=16634 ERROR Shutdown of 'inst9' unsuccessful, forcing
  2009-10-21 23:33:47,440: pid=16634 INFO Called for inst9 w. True/False
The “Called…” lines are logs from the hypervisor shutdown function. Of course, this means that at restart time:
  [12775866.644682] EXT3-fs: INFO: recovery required on readonly filesystem.
  [12775866.644689] EXT3-fs: write access will be enabled during recovery.
  [12775868.533674] kjournald starting. Commit interval 5 seconds
  [12775868.533697] EXT3-fs: sda1: orphan cleanup on readonly fs
  [12775868.551797] EXT3-fs: sda1: 12 orphan inodes deleted
  [12775868.551803] EXT3-fs: recovery complete.
  [12775868.586275] EXT3-fs: mounted filesystem with ordered data mode.
This patch reverts the broken test and changes the sleep to a fixed duration of five seconds, since it makes no sense to check that often for shutdown (and after ~20 seconds we reach a stable value of five seconds anyway).
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
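A sketch of the corrected polling loop, assuming hypervisor methods `StopInstance(name, force)` and `ListInstances()` (simplified signatures) and a hypothetical `shutdown_with_timeout` helper. The loop condition depends only on the deadline, so nothing plays the role of the removed “not tried_once” test, and the check interval is a fixed five seconds. The clock and sleep functions are parameters only so the loop can be exercised without real waiting.

```python
import time

CHECK_INTERVAL = 5.0  # fixed interval between checks, per the commit message


def shutdown_with_timeout(hyper, name, timeout, clock=time.time, sleep=time.sleep):
    """Request a clean stop of `name`, forcing it only after `timeout` seconds.

    Returns True if the instance went away on its own, False if it was forced.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        hyper.StopInstance(name, force=False)
        sleep(CHECK_INTERVAL)
        if name not in hyper.ListInstances():
            return True
    # Only after the whole timeout has elapsed do we force the instance off.
    hyper.StopInstance(name, force=True)
    return False
```

With a 120-second timeout, an instance that needs 12 seconds to shut down is detected as gone on the third check instead of being killed after the first.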
-
Iustin Pop authored
Commit 4ad45119 changed the KVM hypervisor to send multiple shutdown requests to the monitor, but didn't do the same for the Xen hypervisor. We simply remove the return-on-retry model, since we do want to send multiple shutdown signals for both Xen and KVM (even if the behaviour is not perfect, they should behave the same).
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
- Oct 21, 2009
Michael Hanselmann authored
Also add assertions to avoid missing attributes in the future. They won't be included in optimized bytecode.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
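The remark about optimized bytecode can be checked with the built-in `compile()` function, whose `optimize=1` level (the same as running `python -O`) drops `assert` statements. The `Instance` class below is a hypothetical stand-in using `__slots__`, not one of Ganeti's actual object classes.

```python
SOURCE = '''
class Instance:
    __slots__ = ["name", "os", "nics"]

    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)
        # Catch slots that were never filled in, e.g. after adding a field.
        for attr in self.__slots__:
            assert hasattr(self, attr), "missing attribute %r" % attr

inst = Instance(name="inst1", os="debian")
'''


def build(optimize):
    """Compile and run SOURCE at the given optimization level; return `inst`."""
    code = compile(SOURCE, "<sketch>", "exec", optimize=optimize)
    namespace = {}
    exec(code, namespace)
    return namespace["inst"]


# optimize=0 keeps the assertion, so the missing "nics" slot is reported.
try:
    build(optimize=0)
except AssertionError as err:
    print("caught:", err)

# optimize=1 strips the assertion entirely, so the object is built anyway.
print(hasattr(build(optimize=1), "nics"))  # False
```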
-
- Oct 20, 2009
Iustin Pop authored
This patch adds checks for /proc and /sys in cluster verify, since Ganeti relies on these special filesystems to be mounted.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
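A minimal sketch of such a check, assuming a hypothetical `verify_special_filesystems` helper built on `os.path.ismount()`; Ganeti's actual cluster-verify code is organized differently. The `ismount` parameter exists only so the check can be tested without depending on the host.

```python
import os

REQUIRED_MOUNTS = ("/proc", "/sys")  # filesystems named in the commit message


def verify_special_filesystems(paths=REQUIRED_MOUNTS, ismount=os.path.ismount):
    """Return one error string per required path that is not a mount point."""
    return ["%s is not mounted" % path for path in paths if not ismount(path)]
```

An empty list means the node passes this check; each entry would become one cluster-verify error.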
-
- Oct 19, 2009
Michael Hanselmann authored
Commit d22b2999 broke the serializer unittests with certain versions of simplejson. This patch removes sort_keys again and implements a slightly more efficient way of detecting simplejson functionality. The serializer unittests no longer use a partially broken mock, but rather a function to convert all tuples to lists before comparing.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
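The tuple-to-list normalization can be sketched as follows. JSON has no tuple type, so a serialization round-trip turns tuples into lists; converting the expected value the same way makes the comparison meaningful. This sketch uses the standard-library `json` module rather than simplejson, and `tuples_to_lists` is a hypothetical name.

```python
import json


def tuples_to_lists(data):
    """Recursively convert tuples to lists inside lists, tuples and dicts."""
    if isinstance(data, (tuple, list)):
        return [tuples_to_lists(item) for item in data]
    if isinstance(data, dict):
        return {key: tuples_to_lists(value) for key, value in data.items()}
    return data


original = {"nodes": ("node1", "node2"), "sizes": [(1, 2), (3,)]}
restored = json.loads(json.dumps(original))
print(restored == original)                   # False: tuples came back as lists
print(restored == tuples_to_lists(original))  # True
```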
-
- Oct 16, 2009
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Oct 15, 2009
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
This class can also be used by mcpu.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Oct 13, 2009
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
-
Michael Hanselmann authored
By moving the main code of LockSet.acquire to its own function we reduce the code complexity a bit and clarify the exception handling. This also fixes a case where a lock acquire timeout wasn't handled correctly, leading to obscure error messages.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
-
Guido Trotter authored
The gnt-* scripts were building wrong opcodes for commands which have the shutdown_timeout slot (due to missing testing after renaming). This patch fixes them. It also changes the SHUTDOWN_TIMEOUT_OPT dest field name from "timeout" to "shutdown_timeout": it would still work the old way, but could be confusing.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
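In optparse, `dest` is the attribute name under which the parsed value is stored, which is why the old "timeout" value worked but read ambiguously. In this sketch the `--timeout` spelling is taken from the `gnt-instance shutdown` entry further down and the 120-second default from the "two minutes" mentioned there; both are assumptions about the real `SHUTDOWN_TIMEOUT_OPT`.

```python
from optparse import OptionParser, make_option

# Assumed flag spelling and default; the real SHUTDOWN_TIMEOUT_OPT may differ.
SHUTDOWN_TIMEOUT_OPT = make_option(
    "--timeout", dest="shutdown_timeout", type="int", default=120,
    help="Maximum time in seconds to wait for the instance to shut down")

parser = OptionParser(option_list=[SHUTDOWN_TIMEOUT_OPT])
opts, args = parser.parse_args(["--timeout", "30", "instance1"])
print(opts.shutdown_timeout, args)  # 30 ['instance1']
```

Because the attribute name now matches the opcode slot, the parsed value can be passed to the opcode under the same name.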
-
Iustin Pop authored
This patch fixes the tag PUT/DELETE operations, and additionally changes the _Tags_* functions to take only positional and not keyword arguments (the defaults do not make any sense at all, and they are always called with all arguments).
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Guido Trotter authored
All the LUs that shut down the instance need to be able to pass the timeout parameter as well.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Guido Trotter authored
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
- Oct 12, 2009
Michael Hanselmann authored
With this patch all timeouts are pre-calculated. The interface of the _LockTimeoutStrategy class is also changed a bit; NextAttempt now returns a new instance.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
-
Michael Hanselmann authored
Found using pylint and epydoc.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
-
Michael Hanselmann authored
The timeout is always between ~0.1 and ~10.0 seconds. A small variation of ±5% is added to prevent different jobs from fighting each other. After 10 attempts to acquire the locks with a timeout, a blocking acquire is made. Lock status reporting will be improved in a separate patch.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
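The strategy described here and in the _LockTimeoutStrategy entry above can be sketched in Python. The bounds (0.1 to 10.0 seconds), the ±5% variation and the ten timed attempts come from the commit messages; the geometric growth curve between the bounds, the `GetTimeout()` method name and the injectable `rng` are assumptions, not Ganeti's actual class. As in the pre-calculation change, the timeouts are computed once and `NextAttempt()` returns a new object instead of mutating the current one.

```python
import random

MIN_TIMEOUT = 0.1   # lower bound from the commit message (seconds)
MAX_TIMEOUT = 10.0  # upper bound from the commit message (seconds)
MAX_ATTEMPTS = 10   # timed attempts before falling back to a blocking acquire
VARIATION = 0.05    # +/-5% jitter


def _calculate_timeouts(attempts=MAX_ATTEMPTS):
    """Pre-calculate timeouts growing geometrically from MIN to MAX (assumed curve)."""
    factor = (MAX_TIMEOUT / MIN_TIMEOUT) ** (1.0 / (attempts - 1))
    return tuple(MIN_TIMEOUT * factor ** i for i in range(attempts))


class LockTimeoutStrategy:
    """Immutable per-attempt strategy; NextAttempt() returns a new instance."""

    TIMEOUTS = _calculate_timeouts()

    def __init__(self, attempt=0, rng=random.random):
        self._attempt = attempt
        self._rng = rng

    def NextAttempt(self):
        return LockTimeoutStrategy(self._attempt + 1, self._rng)

    def GetTimeout(self):
        """Seconds to wait on this attempt, or None for a blocking acquire."""
        if self._attempt >= len(self.TIMEOUTS):
            return None
        base = self.TIMEOUTS[self._attempt]
        # Scale by a random factor in [0.95, 1.05) so jobs don't retry in lockstep.
        return base * (1.0 + VARIATION * (2.0 * self._rng() - 1.0))
```

A caller would try to acquire with `GetTimeout()`, and on failure continue with `strategy.NextAttempt()`; once `None` is returned it blocks.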
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
-
Michael Hanselmann authored
The timeout passed to LockSet.acquire() is measured over all lock acquires. If LockSet.acquire fails to acquire all requested locks within the specified amount of time, all locks are released again and the acquire fails.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
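A sketch of an all-or-nothing acquire with one shared deadline, using plain `threading.Lock` objects rather than Ganeti's `LockSet`; `acquire_all` is a hypothetical name. Each lock only gets the time remaining from the overall budget, and a failure releases every lock taken so far.

```python
import threading
import time


def acquire_all(locks, timeout=None, clock=time.monotonic):
    """Acquire all `locks` within `timeout` seconds in total (None = block).

    Returns True if every lock was acquired. On timeout, releases the locks
    acquired so far and returns False, so the caller holds nothing.
    """
    deadline = None if timeout is None else clock() + timeout
    acquired = []
    for lock in locks:
        if deadline is None:
            ok = lock.acquire()
        else:
            # Each acquire only gets what is left of the shared budget.
            ok = lock.acquire(timeout=max(0.0, deadline - clock()))
        if not ok:
            for held in reversed(acquired):
                held.release()
            return False
        acquired.append(lock)
    return True


a, b = threading.Lock(), threading.Lock()
b.acquire()                              # someone else holds `b`
print(acquire_all([a, b], timeout=0.2))  # False, after about 0.2 s
print(a.locked())                        # False: `a` was released again
```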
-
- Oct 09, 2009
Guido Trotter authored
Using the new --timeout option:
- gnt-instance shutdown is changed to accept a timeout
- the opcode is changed to hold one
- the LU is changed to optionally get one
- the rpc is changed to carry one
- the backend is changed to take it as a parameter rather than hardcoding it in the function
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Guido Trotter authored
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Guido Trotter authored
Currently it has lots of duplicated code and internal retries. Clean it up with the following assumptions:
- we'll probably be called more than once
- it is OK to fail to stop, unless we're called with force=True
- if we're called only once, and with force=True, it's OK not to run the chroot "cleanup" script (it's a destroy after all; why should chroots get more chances than other instances?)
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>
-
Guido Trotter authored
Since we know StopInstance is going to be called more than once (at least twice, once with force and once without, but normally quite a lot more) we don't need our own sleep/loop, and we can just send one monitor command per call.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>
-
Guido Trotter authored
1) unhardcode the timeout, abstracting it in a constant
2) use time.time() rather than hiding the timeout in a range()
3) call hyper.StopInstance multiple times -- currently all hypervisors just ignore all calls but once
4) use hyper.ListInstances() rather than GetInstanceList([hv_name]) -- it's cheaper :)
5) change the final message to "forcing" from "using destroy"
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>
-
Guido Trotter authored
It reflects the "current" two minutes we give to the instance.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Guido Trotter authored
Currently some hypervisors need the stop operations to be retried more than once, while others do it in one pass. With this change we'll handle retries outside the hypervisor code, telling the hypervisor whether this is the first try or not. Since this option is not used yet, all hypervisors just return if called with retry set, maintaining the old behavior. Since the fake hypervisor has an idempotent StopInstance call, we avoid returning in that case.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>
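A sketch of the retry flag as described: the base class returns immediately on retries (the old one-pass behavior), while the fake hypervisor's idempotent stop runs on every call. The class names, the `stop_requests` counter and `stop_repeatedly` are hypothetical, for illustration only.

```python
class BaseHypervisor:
    """Hypothetical hypervisor that still does its stop in a single pass."""

    def __init__(self):
        self.stop_requests = 0

    def StopInstance(self, name, force=False, retry=False):
        if retry:
            return  # old behavior: ignore everything but the first call
        self.stop_requests += 1


class FakeHypervisor(BaseHypervisor):
    """Its stop is idempotent, so it runs on every call, retries included."""

    def StopInstance(self, name, force=False, retry=False):
        self.stop_requests += 1


def stop_repeatedly(hyper, name, attempts):
    """Caller-side retry loop: only the first call has retry=False."""
    for attempt in range(attempts):
        hyper.StopInstance(name, retry=attempt > 0)


for hv in (BaseHypervisor(), FakeHypervisor()):
    stop_repeatedly(hv, "instance1", 3)
    print(type(hv).__name__, hv.stop_requests)
# BaseHypervisor 1
# FakeHypervisor 3
```

Moving the loop into the caller means a hypervisor can later opt into retries simply by no longer returning early.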
-
Guido Trotter authored
- We never remember to use it (5 uses vs 21 ", ".join())
- It's longer to write than ", ".join()
- The added value of the apostrophe in the string is not very much
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-