- Feb 03, 2010
-
-
Iustin Pop authored
This doesn't implement the full functionality (we still need to add the debug level to the opcodes too), but at least it won't require changing the RPC calls during the 2.1 series. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Feb 01, 2010
-
-
Iustin Pop authored
… instead of disk size, which is not as reliable. This actually simplifies the code, but it still leaves the possibility of stack overflows if the disk data structure is corrupted. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Jan 25, 2010
-
-
Iustin Pop authored
int()/float() can raise either ValueError (in case of int("a")), or TypeError (in case of int(None)). We had many bugs over time due to this, and a recent one was just diagnosed, so we go over the codebase and replace all 'except ValueError' with 'except (TypeError, ValueError)' that protect such conversions (there were no 'except TypeError' cases that needed a ValueError added). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
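A minimal sketch of the conversion pattern described in this commit. The helper name `to_int` and its `default` parameter are illustrative assumptions, not code from the patch:

```python
def to_int(value, default=None):
    """Convert value to int, returning default when conversion is impossible."""
    try:
        return int(value)
    except (TypeError, ValueError):
        # ValueError covers int("a"); TypeError covers int(None).
        return default
```

Catching only ValueError here would let `to_int(None)` propagate a TypeError, which is the class of bug the commit describes.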
-
- Jan 04, 2010
-
-
Iustin Pop authored
The 'name' argument is not used anymore, probably since before 2.0. Since this is an internal function, we can just remove it (from its caller too). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Olivier Tharan <olive@google.com>
-
Iustin Pop authored
Many methods are simple pure functions that do not depend on the object state. We convert these to staticmethods. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Olivier Tharan <olive@google.com>
-
Iustin Pop authored
This patch should have only pylint disables, docstring changes, and whitespace changes. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Olivier Tharan <olive@google.com>
-
Iustin Pop authored
Note there are some cases left which need extra cleanup. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Olivier Tharan <olive@google.com>
-
- Dec 28, 2009
-
-
Iustin Pop authored
This patch adds targeted pylint disables, where it makes sense (either due to limitations in pylint or due to historical usage), and also a few blanket ones in rapi where all the names are… “different”. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Olivier Tharan <olive@google.com>
-
- Dec 14, 2009
-
-
Iustin Pop authored
This patch unifies the search for external script to always go through utils.FindFile and implements in that function a restriction on valid chars in file names and (additionally) that the passed name is the basename of the final (absolute) name. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
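The two restrictions described above could look roughly like the following sketch. The function name `find_file`, its signature, and the character mask are assumptions made for illustration; the actual mask used by utils.FindFile is not given in the commit message:

```python
import os
import re

# Assumed mask of valid characters; the real one may differ.
VALID_NAME_RE = re.compile(r"[a-zA-Z0-9+._-]+")


def find_file(name, search_path, test=os.path.exists):
    """Return the first match for name in search_path, or None."""
    # Restriction 1: only accept names made of valid characters.
    if not VALID_NAME_RE.fullmatch(name):
        return None
    for dir_name in search_path:
        full_name = os.path.abspath(os.path.join(dir_name, name))
        # Restriction 2: the passed name must be the basename of the
        # final absolute name (this rejects names such as "..").
        if os.path.basename(full_name) == name and test(full_name):
            return full_name
    return None
```

The `test` parameter makes the existence check replaceable, which also keeps the function easy to exercise without touching the filesystem.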
-
Iustin Pop authored
This will allow reuse of the same mask for multiple validations. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Nov 30, 2009
-
-
Michael Hanselmann authored
The warning will be generated if the clocks diverge by more than 150 seconds. Due to the way the RPC system works, we cannot get exact time differences, e.g. if one of the queried nodes is broken. The comparison is done using a time window. Confd queries will fail if the clocks on the client and server are more than 300 seconds apart. This check helps keep at least the nodes of a cluster in sync. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
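One way to read the "time window" comparison is sketched below, assuming the caller records when the RPC started and ended and accepts any node time inside that window widened by the allowed skew. The function and parameter names are hypothetical; only the 150-second limit comes from the commit message:

```python
MAX_CLOCK_SKEW = 150  # seconds, as stated in the commit message


def clock_diverges(node_time, rpc_start, rpc_end, max_skew=MAX_CLOCK_SKEW):
    """Return True if node_time lies outside the widened RPC time window."""
    # The node sampled its clock at some unknown moment between rpc_start
    # and rpc_end, so an exact difference cannot be computed.
    return not (rpc_start - max_skew <= node_time <= rpc_end + max_skew)
```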
-
- Nov 25, 2009
-
-
Iustin Pop authored
This patch removes the quotes from CommaJoin and converts most of the callers (that I could find) to it. Since CommaJoin does str(i) for i in param, we can remove the explicit str() conversions from the callers, slightly simplifying a few calls. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
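A sketch of the behaviour described, under the assumption that the helper simply stringifies each item and joins with a comma and a space. The name `comma_join` is illustrative:

```python
def comma_join(names):
    """Join items with ", ", converting each one to a string first."""
    return ", ".join(str(val) for val in names)
```

Because the conversion happens inside the helper, callers can pass integers or other objects directly instead of mapping str over them first.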
-
- Nov 11, 2009
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Nov 06, 2009
-
-
Iustin Pop authored
This patch adds some silences and tweaks the code slightly so that “pylint --rcfile pylintrc -e ganeti” doesn't give any errors. The biggest change is in jqueue.py, the move of _RequireOpenQueue out of the JobQueue class. Since that is actually a function and not a method (never used as such) this makes sense, and also silences two pylint errors. Another real code change is in utils.py, where FieldSet.Matches will return None instead of False for failure; this still works with the way this class/method is used, and makes more sense (it resembles more closely the re.match return values). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
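The return-value change for FieldSet.Matches can be illustrated with the sketch below. The class body is an assumption written for this example; only the "match object or None, like re.match" contract is taken from the commit message:

```python
import re


class FieldSet:
    """Hypothetical stand-in: a set of field-name patterns."""

    def __init__(self, *items):
        self.items = [re.compile("^%s$" % value) for value in items]

    def matches(self, field):
        """Return the first match object for field, or None."""
        for pattern in self.items:
            m = pattern.match(field)
            if m:
                return m
        return None
```

Callers that only test truthiness keep working, since both None and False are falsy.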
-
- Nov 05, 2009
-
-
Michael Hanselmann authored
Until now, Ganeti started and stopped its own daemons using custom functions. To start a daemon, it was simply executed; to stop it again, it was sent the appropriate signals. Init scripts would have to pay attention to the PID file and other things. With this patch, a new script is added (“daemon-util”, installed in $prefix/lib/ganeti/), centralizing the starting and stopping of daemons. The provided example init script is adjusted to use this new script. Ganeti's code no longer calls its own init script. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- Nov 04, 2009
-
-
Iustin Pop authored
Currently the $hypervisor.MigrateInstance takes the instance name. This patch changes it to take the instance object, such that other instance properties (especially hvparams) are available to it. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- Nov 03, 2009
-
-
Iustin Pop authored
This patch is an attempt to fix the ugly issue during migration: Cannot resync disks on node …: [True, 100] If my understanding is correct, sometimes we poll the /proc/drbd file at an inopportune moment, while it's being updated, or while the DRBD device is changing state, and we see an unexpected state. Based on the assumption that this is just a transient state, rather than aborting directly, we change the backend.DrbdWaitSync() function to retry the operation a few times, giving DRBD a chance to settle down at the end of the resync. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
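The retry idea can be sketched generically as follows. This is not the DrbdWaitSync code; the helper name, the number of attempts, and the delay are assumptions chosen for illustration:

```python
import time


def retry_transient(check_fn, attempts=5, delay=1.0, sleep_fn=time.sleep):
    """Call check_fn until it returns True or the attempts run out."""
    for attempt in range(attempts):
        if check_fn():
            return True
        # Assume the bad state is transient: wait before polling again,
        # except after the final attempt.
        if attempt < attempts - 1:
            sleep_fn(delay)
    return False
```

Only when every attempt sees the unexpected state does the caller treat it as a real failure.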
-
Iustin Pop authored
A newer version of pylint, more warnings… Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Since ':' is not a valid character in PV names (for the way Ganeti uses LVM), we need to check this and warn the user. This patch adds a new NV_PVLIST cluster verify check and verifies the PV names returned from the nodes. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
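The name check itself is small. A sketch, assuming the node returns a plain list of PV names (the function name and input shape are hypothetical):

```python
def find_invalid_pv_names(pv_names):
    """Return the PV names that contain the disallowed ':' character."""
    return [name for name in pv_names if ":" in name]
```

Cluster verify could then emit one warning per returned name.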
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Nov 02, 2009
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- Oct 22, 2009
-
-
Ken Wehr authored
Allows the initialization of a cluster without the creation or distribution of SSH key pairs. Includes changes for LeaveCluster and RPC. Signed-off-by:
Ken Wehr <ksw@google.com> Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
In backend.InstanceShutdown(), there is a race condition between checking that the instance exists and trying to shut it down, which sometimes results in error messages like: Tue Oct 20 20:08:30 2009 - WARNING: Could not shutdown instance: Failed to force stop instance instance9: Failed to stop instance instance9: exited with exit code 1, Error: Domain 'instance9' does not exist. To fix this, we ignore any hypervisor StopInstance() errors if the instance doesn't exist anymore, since our purpose (to make the instance go away) is already accomplished. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
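A sketch of the fix, with the hypervisor calls passed in as plain callables. `HypervisorError`, `stop_fn`, and `list_fn` are placeholder names for this example, not the names used in the backend:

```python
class HypervisorError(Exception):
    """Stand-in for the hypervisor layer's error type (hypothetical)."""


def shutdown_instance(name, stop_fn, list_fn):
    """Stop an instance, tolerating failures if it is already gone."""
    try:
        stop_fn(name)
    except HypervisorError:
        # The instance may have vanished between the existence check and
        # the stop call; only re-raise if it is still running.
        if name in list_fn():
            raise
    return True
```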
-
Iustin Pop authored
Commit e4e9b806 introduced two problems in backend.InstanceShutdown():
  - first, it reduced the check interval significantly (especially for the first few checks); there are very few production VMs that shut down in one second, and while not breaking anything this creates unnecessary load for the hypervisor
  - second, a wrong test added to the while condition (“not tried_once”) means that we only sleep once for an instance, and after that we immediately kill it forcefully
These two together mean that any instance which is not lucky enough to finish in roughly 1-1.5 seconds (the time it takes to sleep and verify again the instance list) will have this happen:
  2009-10-21 23:33:46,034: pid=16634 INFO Called for inst9 w. False/False
  2009-10-21 23:33:47,440: pid=16634 ERROR Shutdown of 'inst9' unsuccessful, forcing
  2009-10-21 23:33:47,440: pid=16634 INFO Called for inst9 w. True/False
The “Called…” are logs from the hypervisor shutdown function. This means of course that at restart time:
  [12775866.644682] EXT3-fs: INFO: recovery required on readonly filesystem.
  [12775866.644689] EXT3-fs: write access will be enabled during recovery.
  [12775868.533674] kjournald starting. Commit interval 5 seconds
  [12775868.533697] EXT3-fs: sda1: orphan cleanup on readonly fs
  [12775868.551797] EXT3-fs: sda1: 12 orphan inodes deleted
  [12775868.551803] EXT3-fs: recovery complete.
  [12775868.586275] EXT3-fs: mounted filesystem with ordered data mode.
This patch reverts the broken test and changes the sleep to a fixed duration of five seconds, since it makes no sense to check that often for shutdown (and after ~20 seconds we anyway reach a stable value of five seconds). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
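The intended loop shape (poll at a fixed interval, force only after the timeout expires) might be sketched like this. The 120-second default timeout and all names are assumptions; only the five-second interval comes from the commit message:

```python
import time


def wait_for_shutdown(is_running, timeout=120.0, interval=5.0,
                      time_fn=time.time, sleep_fn=time.sleep):
    """Return True if the instance stopped, False if the timeout expired."""
    deadline = time_fn() + timeout
    while is_running():
        if time_fn() >= deadline:
            # The caller should now force the shutdown.
            return False
        # Fixed-length sleep: checking more often only loads the hypervisor.
        sleep_fn(interval)
    return True
```

Unlike the broken “not tried_once” condition, this keeps sleeping and re-checking until the deadline instead of forcing after a single sleep.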
-
- Oct 20, 2009
-
-
Iustin Pop authored
This patch adds checks for /proc and /sys in cluster verify, since Ganeti relies on these special filesystems to be mounted. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
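A sketch of such a check using os.path.ismount from the standard library. The function name and the injectable `ismount` parameter are illustrative assumptions; how the actual verify check is implemented is not stated in the commit message:

```python
import os


def missing_special_filesystems(paths=("/proc", "/sys"),
                                ismount=os.path.ismount):
    """Return the special filesystems from paths that are not mounted."""
    return [path for path in paths if not ismount(path)]
```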
-
- Oct 13, 2009
-
-
Guido Trotter authored
All the LUs that shut down the instance need to be able to pass the timeout parameter as well. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Oct 12, 2009
-
-
Michael Hanselmann authored
Found using pylint and epydoc. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- Oct 09, 2009
-
-
Guido Trotter authored
Using the new --timeout option:
  - gnt-instance shutdown is changed to accept a timeout
  - the opcode is changed to hold one
  - the LU is changed to optionally get one
  - the rpc is changed to carry one
  - the backend is changed to take it as a parameter rather than hardcoding it in the function
Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Guido Trotter authored
1) Unhardcode the timeout, abstracting it in a constant
2) Use time.time() rather than hiding the timeout in a range()
3) Call hyper.StopInstance multiple times -- currently all hypervisors just ignore all calls but one
4) Use hyper.ListInstances() rather than GetInstanceList([hv_name]) -- it's cheaper :)
5) Change the final message to "forcing" from "using destroy"
Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Olivier Tharan <olive@google.com>
-
- Oct 05, 2009
-
-
Guido Trotter authored
Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Olivier Tharan <olive@google.com>
-
Guido Trotter authored
When we load an OS from disk, we need _TryOSFromDisk to get the real name, without any variant. This allows any functionality that uses the instance OS to handle a name with a variant. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Olivier Tharan <olive@google.com>
-
Guido Trotter authored
According to the design, for api_version >= 15 the OS variant is the part of the OS name after the "+" sign. If none is found, we just pass in the first variant an OS declares (which is bound to exist, as we check for it in _TryOSFromDisk). Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Olivier Tharan <olive@google.com>
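The naming rule can be sketched as below. The function name and the example OS names in the test are made up for illustration; only the "+" convention and the first-declared-variant fallback come from the commit message:

```python
def split_os_variant(os_name, declared_variants):
    """Return (base_name, variant) for an OS name such as "base+variant"."""
    if "+" in os_name:
        base, variant = os_name.split("+", 1)
    else:
        # No explicit variant: fall back to the first one the OS declares.
        base, variant = os_name, declared_variants[0]
    return base, variant
```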
-
Guido Trotter authored
Adding the file name to the os_files dict will fill in the full path and get it checked. If the file is present, we also read it and split it into lines, one per declared variant. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Olivier Tharan <olive@google.com>
-
Guido Trotter authored
Currently all checked files in the loop are os scripts, so nothing will change, but in the future we only want the +x bit on actual os scripts, not necessarily all files. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Olivier Tharan <olive@google.com>
-
Guido Trotter authored
We'll be using this dict/loop to check more than just scripts, so we're renaming the variables appropriately. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Olivier Tharan <olive@google.com>
-
Guido Trotter authored
Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Olivier Tharan <olive@google.com>
-
- Sep 25, 2009
-
-
Guido Trotter authored
Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Sep 14, 2009
-
-
Iustin Pop authored
Currently, “gnt-cluster verify” and “gnt-cluster verify-disks” use the list of LVs as returned by backend.GetVolumeList to determine whether an LV exists or not. However, LVs can also be ‘virtual’, which is handled correctly (i.e. as missing) by the bdev code, but not by this function. This patch changes GetVolumeList to simply skip virtual LVs; this makes cluster verify and verify-disks report these correctly as missing. The only downside is that a user could get confused (lvs reports the volume as existing, but ganeti as missing). However, this is better than simply considering virtual LVs as “good”. No other code besides these two gnt-cluster operations uses the GetVolumeList function, so we don't change the behaviour of the rest of the code (e.g. replace-disks, instance info, etc.). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
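A sketch of the filtering step, assuming the volume list has already been parsed into (name, attr) pairs and that a leading "v" in the lvs attribute string marks a virtual volume. The function name, the input shape, and the sample values in the test are assumptions for this example:

```python
def filter_real_lvs(lvs):
    """Return the names of non-virtual LVs from (name, attr) pairs."""
    # Virtual LVs are skipped so that callers report them as missing.
    return [name for name, attr in lvs if not attr.startswith("v")]
```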
-
- Sep 03, 2009
-
-
Michael Hanselmann authored
This survived QA, burnin and unittests. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Luca Bigliardi <shammash@google.com>
-