- Nov 03, 2009
-
-
Michael Hanselmann authored
Also replaces a hardcoded limit of 15 seconds with 1/4 of NET_RECONFIG_TIMEOUT. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
There are quite a few retry loops with timeouts in Ganeti's code. Duplicating code is not good, so this patch introduces a new function named “utils.Retry” to remedy this situation. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
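
A minimal sketch of what such a shared retry helper could look like. This is not the actual utils.Retry code; the names `retry`, `RetryAgain` and `RetryTimeout` and the parameter list are assumptions made for illustration.

```python
import time


class RetryAgain(Exception):
    """Raised by the callee to request another attempt (hypothetical name)."""


class RetryTimeout(Exception):
    """Raised by retry() once the deadline has passed (hypothetical name)."""


def retry(fn, delay, timeout, args=(), sleep_fn=time.sleep,
          time_fn=time.monotonic):
    """Call fn(*args) until it stops raising RetryAgain or timeout expires."""
    deadline = time_fn() + timeout
    while True:
        try:
            return fn(*args)
        except RetryAgain:
            pass
        remaining = deadline - time_fn()
        if remaining <= 0:
            raise RetryTimeout()
        # Never sleep past the deadline.
        sleep_fn(min(delay, remaining))
```

A caller raises `RetryAgain` from its check function while the condition is not yet met, so each former hand-written loop shrinks to a single call.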
-
- Nov 02, 2009
-
-
Iustin Pop authored
Currently the repair storage operation has two issues:
  - down instances abort the operation, even though they should be ignored (it's not technically possible to know their disk status unless we activate their disks)
  - if the VG is so broken that disks cannot be activated via gnt-instance activate-disks or gnt-instance startup, it's not possible to repair the VG at all

The patch makes the opcode skip down instances and also introduces an ``--ignore-consistency`` flag for forcing the execution of the LU.
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This finishes the conversion of OpPrereqError creation to two-argument style. Any leftover one-argument uses do not break anything; they just lose information about the error. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This patch adds a new ecode argument to RpcResult.Raise(). This allows specifying the error code (for both OpExec and OpPrereq errors). Note that this patch also makes the OpExecError exceptions raised from _FindFaultInstanceDisks have the error code classification. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This patch introduces a two-argument style for OpPrereqError. Only the direct raise calls in cmdlib.py are converted, other users will follow. cli.py is modified to handle both two-argument style and the current format. RAPI doesn't need modification as the way we encode errors is already using a list for the error arguments, so RAPI users only need to start checking the list length and the second argument. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
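
As an illustration of handling both styles on the consumer side, here is a small sketch. The `OpPrereqError` class below is only a stand-in, and `describe_prereq_error` and the error code string used in the usage note are hypothetical.

```python
class OpPrereqError(Exception):
    """Stand-in for the Ganeti exception of the same name."""


def describe_prereq_error(err):
    """Return (message, error_code); error_code is None for old-style errors."""
    if len(err.args) == 2:
        # New two-argument style: (message, error code).
        message, ecode = err.args
        return (message, ecode)
    # Old one-argument style (or anything unexpected): no classification.
    return (str(err), None)
```

For example, `describe_prereq_error(OpPrereqError("Instance not found", "unknown_entity"))` yields both parts, while the one-argument form yields `None` as the code. A RAPI client would do the equivalent check on the length of the argument list.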
-
Iustin Pop authored
This is only used in two places, in an error path that is no longer valid since Ganeti 2.0. We remove the try..except since we should not get it anymore (and if we do, then we should catch it in all config.Update cases) and we remove the exception class completely. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Michael Hanselmann authored
Exporting an instance not running or without activated disks will fail. This patch makes sure to activate disks before exporting an instance if it's in the ADMIN_down state. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
This patch unifies the query fields in the storage framework for all types. Note that the information is still computed on demand, so if e.g. the used disk space is not requested for the ‘file’ type, it won't be computed on nodes.

Summary of changes:
  - improve the LVM storage type to support multiple lvm fields in the LIST_FIELDS declaration, as well as constant fields (not computed via lvm commands)
  - rename utils.GetFilesystemFreeSpace to utils.GetFilesystemStats, returning a tuple of (total, free)
  - add used and free as valid fields for lvm-vg (used being computed as vg_size-vg_free)
  - make allocatable accepted for all types (ones which are always allocatable always return True)
  - add a new list field ‘type’ that gives the currently selected type; not very useful today (except for understanding what the default output is), but in the future it might help if we want to list multiple types
  - add type, size and allocatable to the default output field list
  - update the man page with details on how, for file storage, size ≠ used + free for non-mountpoint cases
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
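
A rough sketch of a function returning a (total, free) tuple for a filesystem, based on `os.statvfs`. The function name, the choice of mebibytes as unit and the use of `f_bavail` (space available to unprivileged users) are assumptions, not necessarily what utils.GetFilesystemStats does.

```python
import os


def get_filesystem_stats(path):
    """Return (total, free) in MiB for the filesystem containing path."""
    st = os.statvfs(path)
    mib = 1024 * 1024
    total = st.f_blocks * st.f_frsize // mib
    # f_bavail excludes blocks reserved for the superuser.
    free = st.f_bavail * st.f_frsize // mib
    return (total, free)
```

Because these numbers describe the whole filesystem, they only match the space consumed under a given directory when that directory is itself a mountpoint.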
-
- Oct 30, 2009
-
-
Michael Hanselmann authored
There was a race condition between starting the node daemon and sending requests to write the ssconf files. With this patch, the initialization waits up to ten seconds for the node daemon to become responsive. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- Oct 29, 2009
-
-
Michael Hanselmann authored
Before:
  $ gnt-instance failover -f inst1
  … checking disk consistency between source and target
  … - WARNING: Can't find disk on node node21.example.com
  … shutting down instance on source node

After:
  $ gnt-instance failover -f inst1
  … not checking disk consistency as instance is not running
  … shutting down instance on source node
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Michael Hanselmann authored
This new function supports two schemes for passwords:
  - Old-style cleartext passwords
  - Hashed passwords according to RFC2617 (H(A1))

Schemes are differentiated by their prefix, a concept also used in OpenLDAP. Cleartext passwords can no longer start with an opening brace ("{") unless they're prefixed with "{cleartext}" (case insensitive). Currently there's no documentation for rapi_users at all. It will follow in a subsequent patch.
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
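
The prefix scheme can be sketched as follows. The function name, its argument order and the "{HA1}" prefix for the hashed scheme are assumptions made for this example; only "{cleartext}" is named in the message above. H(A1) is computed as defined in RFC 2617, i.e. the MD5 hex digest of "username:realm:password".

```python
import hashlib


def verify_password(username, password, expected, realm):
    """Check password against a stored value with an optional {scheme} prefix."""
    if expected.startswith("{"):
        end = expected.find("}")
        if end == -1:
            return False  # malformed prefix
        scheme = expected[1:end].lower()  # prefixes are case insensitive
        value = expected[end + 1:]
    else:
        # Old-style entry without prefix: plain cleartext.
        scheme, value = "cleartext", expected

    if scheme == "cleartext":
        return password == value
    if scheme == "ha1":
        a1 = "%s:%s:%s" % (username, realm, password)
        ha1 = hashlib.md5(a1.encode("utf-8")).hexdigest()
        return ha1 == value.lower()
    return False  # unknown scheme
```

With this layout a cleartext password that itself begins with "{" has to be stored as "{cleartext}{…", which is the restriction described above.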
-
- Oct 28, 2009
-
-
Iustin Pop authored
For the Nth time, re-fix shadowing of outer-scope variable :) Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
pylint is your friend, since the compiler doesn't exist. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Oct 27, 2009
-
-
Michael Hanselmann authored
This is particularly useful for “gnt-cluster redist-conf”, but also for all other cases where the configuration files are rewritten on other nodes.

  $ gnt-cluster redist-conf
  … Copy of file /var/lib/ganeti/config.data to node … failed: Error while executing backend function: [Errno 1] Operation not permitted
  … Error while uploading ssconf files to node …: Error while executing backend function: [Errno 1] Operation not permitted

  $ gnt-node modify --offline no --force node3.example.com
  … - WARNING: Not enough master candidates (desired 10, new value will be 4)
  … Copy of file /var/lib/ganeti/config.data to node node8.example.com failed: Error while executing backend function: [Errno 1] Operation not permitted
  Modified node node3.example.com
   - offline -> True
   - master_candidate -> auto-demotion due to offline
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Iustin Pop authored
Commit 2bb5c911 moved around and changed the _RunAllocator function in the DiskReplace → TaskLet conversion, but in the process it changed the relocate_from argument from a list of nodes to just the secondary node. This breaks the protocol and current iallocator scripts. This patch fixes that but also adds a local variable 'instance' since it's not nice to write self.instance so many times. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- Oct 26, 2009
-
-
Guido Trotter authored
In 95b487bb we changed InstanceIpToNodePrimaryIpQuery to be able to query multiple instances at once. We also need to be able to query ips belonging to a specific nic link, so what we do is:
  1) Move the "query" argument to a dict, containing different fields
  2) Make the "query for a single ip" and "query for a list" options explicit
Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Guido Trotter authored
We were already half-doing it, but this completes the process.
  1) We don't maintain a list of ips or an ip->instance map
  2) We add a new link,ip->instance map (we already had the link->ips list)
  3) We add the link parameter to GetInstanceByIp (making it GetInstanceByLinkIp)
  4) We change the GetInstanceByIp caller to pass None as link (thus for now using only the default link)
Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Guido Trotter authored
GetDefaultNicParams returns the default nic parameters. GetDefaultNicLink returns the default nic link. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Guido Trotter authored
It's used by some functions defined there. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Olivier Tharan <olive@google.com>
-
Iustin Pop authored
This allows using an email address (as is) as part of a tag. The main problem that could arise is when parsing tags from a shell script, but (AFAIK) '@' is not a special character when used in values (happy to be corrected if not true). The patch also moves the regular expression so that it is compiled at class init time, which should use fewer resources; in my tests it is fine to use a compiled re from multiple threads. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
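
A sketch of a tag validator with the pattern compiled once as a class attribute. The class name, the exact character set and the length limit are assumptions for illustration; the only property taken from the message above is that '@' is accepted.

```python
import re


class TaggableObject:
    """Stand-in class; the pattern is compiled when the class is defined."""

    # Assumed character set: word characters plus a few symbols, including '@'.
    VALID_TAG_RE = re.compile(r"[\w.+*/:@-]+")
    MAX_TAG_LEN = 128  # hypothetical limit

    @classmethod
    def validate_tag(cls, tag):
        """Raise ValueError or TypeError if tag is not acceptable."""
        if not isinstance(tag, str):
            raise TypeError("tag must be a string")
        if len(tag) > cls.MAX_TAG_LEN:
            raise ValueError("tag too long")
        if not cls.VALID_TAG_RE.fullmatch(tag):
            raise ValueError("tag contains invalid characters")
```

Compiled pattern objects hold no per-match state, so sharing one across threads in this way is safe in CPython.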
-
- Oct 23, 2009
-
-
Michael Hanselmann authored
The “result” variable may not be set and/or come from the previous loop. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- Oct 22, 2009
-
-
Guido Trotter authored
The /32 suffix is useless, since the kernel already assumes a single host if no suffix is specified. Moreover, we prefer these routes to be "static" so that routing daemons, if present, won't mess with them. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Guido Trotter authored
In routing mode we are tweaking a few parameters on the interface. With this patch we'll tweak both the v4 and v6 ones. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Guido Trotter authored
Until now we relied on traffic from instances being policy routed via a rule based on the instance network. With this change we can enforce it on the instance interfaces. Since the ip rules survive the interface disappearing and reappearing, we first need to remove leftover rules and then apply the new one when creating the interface. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Ken Wehr authored
Allows the initialization of a cluster without the creation or distribution of SSH key pairs. Includes changes for LeaveCluster and RPC. Signed-off-by:
Ken Wehr <ksw@google.com> Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Flavio Silvestrow authored
Signed-off-by:
Flavio Silvestrow <flaviops@google.com> Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
In backend.InstanceShutdown(), there is a race condition between checking that the instance exists and trying to shut it down, which sometimes results in error messages like:

  Tue Oct 20 20:08:30 2009 - WARNING: Could not shutdown instance: Failed to force stop instance instance9: Failed to stop instance instance9: exited with exit code 1, Error: Domain 'instance9' does not exist.

To fix this, we ignore any hypervisor StopInstance() errors if the instance doesn't exist anymore, since our purpose (to make the instance go away) is already accomplished.
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
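
The idea of tolerating a failed stop when the instance has already vanished can be sketched like this. The `HypervisorError` class, the function name and the `stop_instance`/`list_instances` method names are hypothetical stand-ins, not the real hypervisor interface.

```python
class HypervisorError(Exception):
    """Stand-in for the hypervisor error type (hypothetical name)."""


def stop_ignoring_vanished(hv, name):
    """Stop an instance, treating 'already gone' as success.

    hv is any object providing stop_instance(name) and list_instances().
    """
    try:
        hv.stop_instance(name)
    except HypervisorError:
        # The instance may have exited between the existence check and the
        # stop request. If it is no longer listed, the goal is already met.
        if name in hv.list_instances():
            raise
```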
-
Iustin Pop authored
Commit e4e9b806 introduced two problems in backend.InstanceShutdown():
  - first, it reduced the check interval significantly (especially for the first few checks); there are very few production VMs that shut down in one second, and while not breaking anything this creates unnecessary load for the hypervisor
  - second, a wrong test added to the while condition (“not tried_once”) means that we only sleep once for an instance, and after that we immediately kill it forcefully

These two together mean that any instance which is not lucky enough to finish in roughly 1-1.5 seconds (the time it takes to sleep and verify the instance list again) will have this happen:

  2009-10-21 23:33:46,034: pid=16634 INFO Called for inst9 w. False/False
  2009-10-21 23:33:47,440: pid=16634 ERROR Shutdown of 'inst9' unsuccessful, forcing
  2009-10-21 23:33:47,440: pid=16634 INFO Called for inst9 w. True/False

The “Called…” lines are logs from the hypervisor shutdown function. This means of course that at restart time:

  [12775866.644682] EXT3-fs: INFO: recovery required on readonly filesystem.
  [12775866.644689] EXT3-fs: write access will be enabled during recovery.
  [12775868.533674] kjournald starting. Commit interval 5 seconds
  [12775868.533697] EXT3-fs: sda1: orphan cleanup on readonly fs
  [12775868.551797] EXT3-fs: sda1: 12 orphan inodes deleted
  [12775868.551803] EXT3-fs: recovery complete.
  [12775868.586275] EXT3-fs: mounted filesystem with ordered data mode.

This patch reverts the broken test and changes the sleep to a fixed duration of five seconds, since it makes no sense to check that often for shutdown (and after ~20 seconds we anyway reach a stable value of five seconds).
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
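
A simplified sketch of the intended loop shape: keep requesting a soft shutdown at a fixed interval until the deadline, and force only afterwards. The function name, the hypervisor method names, the keyword arguments and the 120-second default timeout are assumptions; only the fixed five-second interval comes from the message above.

```python
import time


def shutdown_instance(hv, name, timeout=120.0, interval=5.0,
                      sleep_fn=time.sleep, time_fn=time.monotonic):
    """Return True on clean shutdown, False if the instance had to be forced."""
    deadline = time_fn() + timeout
    tried_once = False
    while time_fn() < deadline:
        if name not in hv.list_instances():
            return True
        # Repeat the soft request on every pass; 'retry' tells the
        # hypervisor that this is not the first attempt.
        hv.stop_instance(name, retry=tried_once)
        tried_once = True
        sleep_fn(interval)
    if name in hv.list_instances():
        hv.stop_instance(name, force=True)
        return False
    return True
```

The loop condition depends only on the deadline, so the instance gets the full timeout to shut down rather than a single sleep before being killed.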
-
Iustin Pop authored
Commit 4ad45119 changed the KVM hypervisor to send multiple shutdown requests to the monitor, but it didn't change this for the Xen hypervisor. We simply remove the return-on-retry model, since we do want to send multiple shutdown signals for both Xen and KVM (even if the behaviour is not perfect, they should behave the same). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Oct 21, 2009
-
-
Michael Hanselmann authored
Also add assertions to avoid missing attributes in the future. They won't be included in optimized bytecode. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- Oct 20, 2009
-
-
Iustin Pop authored
This patch adds checks for /proc and /sys in cluster verify, since Ganeti relies on these special filesystems to be mounted. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
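
One simple way to perform such a check is `os.path.ismount`, as sketched below. How cluster verify actually tests the mounts is not stated here, so the function name and the approach are assumptions.

```python
import os

REQUIRED_MOUNTS = ("/proc", "/sys")


def find_unmounted(paths=REQUIRED_MOUNTS, ismount=os.path.ismount):
    """Return the required paths that are not currently mountpoints."""
    return [path for path in paths if not ismount(path)]
```

An empty result means both special filesystems are present; the `ismount` parameter exists only to make the function easy to test.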
-
- Oct 19, 2009
-
-
Michael Hanselmann authored
Commit d22b2999 broke the serializer unittests with certain versions of simplejson. This patch removes sort_keys again and implements a slightly more efficient way of detecting simplejson functionality. The serializer unittests no longer use a partially broken mock, but rather a function to convert all tuples to lists before comparing. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
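
The tuple-to-list normalisation used for comparing can be sketched as below. The helper name is hypothetical and the standard `json` module stands in for simplejson; the point is that JSON has no tuple type, so tuples come back as lists after a round trip.

```python
import json


def tuples_to_lists(value):
    """Recursively replace tuples by lists in nested containers."""
    if isinstance(value, (tuple, list)):
        return [tuples_to_lists(item) for item in value]
    if isinstance(value, dict):
        return {key: tuples_to_lists(item) for key, item in value.items()}
    return value


data = {"a": (1, (2, 3)), "b": [4, (5,)]}
roundtrip = json.loads(json.dumps(data))
# Compare against the normalised input instead of mocking the serializer.
assert roundtrip == tuples_to_lists(data)
```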
-
- Oct 16, 2009
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Oct 15, 2009
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-