- Feb 28, 2008
-
-
Guido Trotter authored
A LockSet represents locking for a set of resources of the same type. A thread can acquire multiple resources at the same time, and release some or all of them, but cannot acquire more resources incrementally at different times without releasing all of them in between. Internally a LockSet uses a SharedLock for each resource to be able to grant both exclusive and shared acquisition. It also supports safe addition and removal of resources at runtime. Acquisitions are ordered alphabetically to guarantee that they are deadlock-free. A lot of assumptions about how the code interacts are made in order to guarantee both safety and speed; to document all of them the code features pretty lengthy comments. The test suite tries to catch the most common interactions but cannot really test tight race conditions, for which we still need to rely on human checking. This is the second basic building block for the Ganeti Lock Manager. Instance and Node locks will be put in LockSets to manage their acquisition and release. Reviewed-by: imsnah
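The ordering rule described in this message can be illustrated with a minimal sketch. This is not the Ganeti LockSet: the class name `SimpleLockSet` and its methods are hypothetical, it uses plain exclusive `threading.Lock` objects instead of a SharedLock, and it omits runtime addition and removal of resources. It only shows why sorting the names before acquiring avoids deadlock and how incremental acquisition is refused.

```python
import threading


class SimpleLockSet(object):
    """Toy lock set (hypothetical): one exclusive lock per named resource."""

    def __init__(self, names):
        self._locks = dict((name, threading.Lock()) for name in names)
        self._owned = threading.local()  # per-thread list of held names

    def acquire(self, names):
        # A thread that already holds something must release it all first.
        if getattr(self._owned, "names", None):
            raise RuntimeError("release all held locks before acquiring more")
        acquired = []
        try:
            # Every thread takes the locks in the same (sorted) order,
            # so two threads can never wait on each other in a cycle.
            for name in sorted(set(names)):
                self._locks[name].acquire()
                acquired.append(name)
        except KeyError:
            # Unknown resource: roll back what was taken so far.
            for name in reversed(acquired):
                self._locks[name].release()
            raise
        self._owned.names = acquired
        return acquired

    def release(self, names=None):
        held = list(getattr(self._owned, "names", []))
        wanted = held if names is None else [n for n in held if n in names]
        for name in wanted:
            self._locks[name].release()
        self._owned.names = [n for n in held if n not in wanted]
```

A thread may release a subset and keep the rest, but a further `acquire()` is rejected until everything has been released.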
-
Guido Trotter authored
Even if the target instance is down or we are not checking for IP conflicts, changing an instance name to a new one which is already in the cluster is doomed to fail, because in a lot of places (including the minds of most users/admins) instance names are assumed to be unique. Reviewed-by: imsnah
-
- Feb 27, 2008
-
-
Michael Hanselmann authored
Reviewed-by: iustinp
-
Michael Hanselmann authored
Reviewed-by: ultrotter
-
- Feb 25, 2008
-
-
Manuel Franceschini authored
This patch replaces some hardcoded strings with their corresponding constants in `_GenerateDiskTemplate()`. Reviewed-by: iustinp
-
- Feb 22, 2008
-
-
Manuel Franceschini authored
-
Manuel Franceschini authored
-
Iustin Pop authored
This patch switches the inter-node protocol from twisted to simple BaseHTTPServer/httplib. The patch has more deletions because we use no authentication and no encryption at all. As such, this is just for trunk, and only for testing. What it brings is the ability to use the rpc library from within multiple threads in parallel (or at least it should). Since the changes are very few and non-intrusive, they can be reverted without impacting the rest of the code. This passes burnin. QA was not tested. Reviewed-by: imsnah
-
- Feb 19, 2008
-
-
Guido Trotter authored
This new operation lets a lock be cleanly deleted. The lock will be exclusively held before deletion, and after it pending and future acquires will raise an exception. Other SharedLock operations are modified to deal with delete() and to avoid code duplication. This patch also adds unit testing for the new function and its interaction with the other lock features. The helper threads are slightly modified to handle and report the condition of a deleted lock. As a bonus, an unrelated unit test about not supporting non-blocking mode yet has been added as well. This feature will be used by the LockSet in order to support deadlock-free deletion of resources. This in turn will be useful to gracefully handle the removal of instances and nodes from the cluster, dealing with the fact that other operations may be pending on them. Reviewed-by: iustinp
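The delete semantics described here (wait until the lock is free, mark it deleted, then make pending and future acquires fail) might look roughly like the sketch below. It is not the actual SharedLock code: the names `DeletableSharedLock` and `LockDeleted` are hypothetical, ownership is not tracked per thread, and no fairness between shared and exclusive waiters is attempted.

```python
import threading


class LockDeleted(Exception):
    """Raised when acquiring a lock that has been deleted (hypothetical name)."""


class DeletableSharedLock(object):
    """Toy blocking shared/exclusive lock with a delete() operation."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0       # number of shared holders
        self._writer = False    # True while held exclusively
        self._deleted = False

    def _check_deleted(self):
        if self._deleted:
            raise LockDeleted("lock has been deleted")

    def acquire(self, shared=False):
        with self._cond:
            self._check_deleted()
            while self._writer or (not shared and self._readers):
                self._cond.wait()
                # A waiter woken up by delete() must fail, not proceed.
                self._check_deleted()
            if shared:
                self._readers += 1
            else:
                self._writer = True

    def release(self):
        with self._cond:
            if self._writer:
                self._writer = False
            elif self._readers:
                self._readers -= 1
            else:
                raise RuntimeError("lock is not held")
            self._cond.notify_all()

    def delete(self):
        with self._cond:
            self._check_deleted()
            # Wait until nobody holds the lock, i.e. we could own it exclusively.
            while self._writer or self._readers:
                self._cond.wait()
                self._check_deleted()
            self._deleted = True
            self._cond.notify_all()  # wake pending acquirers so they raise
```

In this simplified version `delete()` must be called by a thread that does not hold the lock itself; otherwise it would wait forever.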
-
- Feb 18, 2008
-
-
Guido Trotter authored
Use the actual class name rather than a spaced version of it. Reviewed-by: iustinp
-
- Feb 16, 2008
-
-
Guido Trotter authored
Due to an indentation error only the last instance queried got returned by LUQueryInstanceData. Move the append() call inside the for loop to fix this issue. This is a one-liner targeted at 1.2.3. Reviewed-by: iustinp
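The class of bug fixed here is easy to reproduce in isolation. The following sketch uses made-up function names and data, not the LUQueryInstanceData code; it only shows how a misplaced `append()` changes the result.

```python
def query_buggy(instances):
    result = []
    for name in instances:
        data = {"name": name}
    # Wrong indentation: this runs once, after the loop has finished,
    # so only the data of the last instance is kept.
    result.append(data)
    return result


def query_fixed(instances):
    result = []
    for name in instances:
        data = {"name": name}
        result.append(data)  # inside the loop: one entry per instance
    return result
```

With two instances, `query_buggy` returns a single entry while `query_fixed` returns both.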
-
- Feb 15, 2008
-
-
Iustin Pop authored
QA suite which tests gnt-instance modify has uncovered another issue related to mac export. Reviewed-by: imsnah
-
- Feb 14, 2008
-
-
Iustin Pop authored
This tiny patch fixes the breakage that the previous patch about activation did by removing the Close() call after activation. The initial reason for that call was that if the device is already active and open, but we need it closed, we close it automatically. This however conflicts with the 2-step open in the case the instance is already open. It makes sense to remove the call since in the current Ganeti setup, just doing Close() is not enough to change the device from (e.g.) primary to secondary, as some devices (e.g. md) might need Shutdown not Close. It also gets rid of a Close() in the CreateBlockDevice function, due to the same reasoning (although in Create the child should not have a different status anyway). Reviewed-by: imsnah
-
Iustin Pop authored
This patch adds a new field available for selection in gnt-instance list, named "status", which represents the combined value of "admin_state" and "oper_state". Since this is much easier to parse (e.g. gnt-instance list | grep ERROR), we also modify the default field list to use this instead of the admin/oper state fields. Reviewed-by: imsnah
-
- Feb 12, 2008
-
-
Guido Trotter authored
DRBD 8.2 uses a pair of integers as protocol version, rather than a single one. This patch fixes the ganeti parsing code, allowing both the old and the new version type. In order to do so the internal _GetVersion function is changed to return a dict, rather than a list, and the second protocol field is added, only if present, as proto2. This is a fix for issue 24. Reviewed-by: iustinp
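As an illustration of parsing both version formats into a dict, here is a minimal sketch. It is not the real `_GetVersion`: the function name, the regular expression and all dict keys other than `proto2` are assumptions, and the sample lines assume the usual first line of `/proc/drbd`.

```python
import re

# Matches e.g. "version: 0.7.24 (api:79/proto:74)" and
# "version: 8.2.4 (api:88/proto:86-88)"; the "-88" part is optional.
_VERSION_RE = re.compile(
    r"^version: (\d+)\.(\d+)\.(\d+) \(api:(\d+)/proto:(\d+)(?:-(\d+))?\)")


def parse_drbd_version(line):
    """Return the version fields as a dict; 'proto2' only when present."""
    match = _VERSION_RE.match(line)
    if not match:
        raise ValueError("cannot parse DRBD version line: %r" % (line,))
    k_major, k_minor, k_point, api, proto, proto2 = match.groups()
    result = {
        "k_major": int(k_major),
        "k_minor": int(k_minor),
        "k_point": int(k_point),
        "api": int(api),
        "proto": int(proto),
    }
    if proto2 is not None:
        result["proto2"] = int(proto2)
    return result
```

Returning a dict lets callers test for the optional field with `"proto2" in result` instead of relying on list positions.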
-
- Feb 10, 2008
-
-
Iustin Pop authored
Reviewed-by: ultrotter
-
- Feb 08, 2008
-
-
Guido Trotter authored
Adding a locking.py file for the ganeti locking library. Its first component is the implementation of a non-recursive blocking shared lock complete with a testing library. Reviewed-by: imsnah, iustinp
-
- Feb 05, 2008
-
-
Iustin Pop authored
This can be used for testing purposes. Reviewed-by: ultrotter,imsnah
-
Iustin Pop authored
This patch is a first step in reducing the chance of causing DRBD activation failures when the primary node has imperfect data. This issue is seen more often with DRBD8, which has an 'outdate' state (which it can enter more often). But it can (and before this patch, usually will) happen with both 7 and 8 in the case the primary has data to sync.
The error comes from the fact that, before this patch, we activate the primary DRBD device and immediately (i.e. as soon as we can run another shell command) we try to make it primary. This might fail, since the primary knows it has some data to catch up to, but we ignored this error condition. The failure was visible later, either in md failing to activate over a read-only storage or in the instance failing to start.
The patch has two parts. One, affecting bdev.py, changes failures in BlockDev.Open() from returning False to raising errors.BlockDeviceError; no one (except a generic method inside bdev.py) checked this return value, and we logged it but the master didn't know about it; now all classes raise errors from Open if they have a failure. The other part, affecting cmdlib.py, changes the activation sequence from:
  - activate on primary node as primary and secondary as secondary, in whatever order a function returns the nodes
to the following:
  - activate all drives as secondaries, on both the primary and the secondary nodes of the instance
  - after that, on the primary node, re-activate the device stack as primary
This is in order to give DRBD the chance to connect and make the handshake. As noted in the comments, this just increases the chances of a handshake/connect; it does not entirely fix the problem. However, it is a good first step and it passes all tests of starting with stale (either full or partial) primaries, with both drbd 7 and 8, and also passes a burnin. Note that the patch might make the device activation a little bit slower, but it is a reasonable trade-off.
Reviewed-by: imsnah
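The two-pass activation order described in this message can be sketched as follows. This is only an illustration of the ordering, not the cmdlib.py code: `activate_disks` and the `activate` callback are hypothetical, and the callback is assumed to raise an exception on failure (in the spirit of Open() now raising errors).

```python
def activate_disks(nodes, primary_node, activate):
    """Activate a device stack in two passes.

    nodes: all nodes holding the instance's disks (primary and secondary)
    primary_node: the instance's primary node
    activate: hypothetical callback activate(node, as_primary=...) that
              raises an exception on failure
    """
    # Pass 1: bring the devices up as secondaries everywhere, so that the
    # peers get a chance to connect and perform the handshake.
    for node in nodes:
        activate(node, as_primary=False)
    # Pass 2: only now re-activate the stack on the primary node as primary.
    activate(primary_node, as_primary=True)
```

Because failures propagate as exceptions, a failed first pass stops the sequence before anything is promoted to primary.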
-
- Feb 04, 2008
-
-
Iustin Pop authored
Reviewed-by: imsnah
-
Iustin Pop authored
This patch completes the change introduced in r566 (trunk) and r568 (branch-1.2). Reviewed-by: imsnah
-
- Jan 31, 2008
-
-
Guido Trotter authored
Currently just the bridge and ip address are passed. Add an environment variable for the mac address. Reviewed-by: iustinp
-
Alexander Schreiber authored
Reviewed-by: imsnah
-
- Jan 30, 2008
-
-
Guido Trotter authored
gnt-backup export used to export the ip and mac of each nic, but not which bridge it was connected to. Adding this information. Reviewed-by: iustinp
-
- Jan 28, 2008
-
-
Iustin Pop authored
The gnt-node and gnt-instance list commands have a customizable list of output fields, but the list is not up to date (in the man page) and not easily understandable from the ‘--help’ output. This patch updates the man pages and adds the available fields and default fields in the ‘--help’ output, as part of the description. Example:
  Usage
  =====
    gnt-node list
  Lists the nodes in the cluster.
  The available fields are (see the man page for details): name, pinst_cnt, pinst_list, sinst_cnt, sinst_list, pip, sip, dtotal, dfree, mtotal, mnode, mfree, bootid.
  The default field list is (in order): name, dtotal, dfree, mtotal, mnode, mfree, pinst_cnt, sinst_cnt.
Reviewed-by: imsnah,ultrotter
-
Iustin Pop authored
Reviewed-by: ultrotter
-
Iustin Pop authored
The new QA tests for instance modify uncovered a bug in the modify initrd operation when setting the initrd to none. Reviewed-by: imsnah
-
- Jan 25, 2008
-
-
Guido Trotter authored
It was wrongly deleted when converting `if a in dict.keys():` to `if a in dict:`. Reviewed-by: imsnah
-
- Jan 21, 2008
-
-
Guido Trotter authored
By passing a new aliases dict to the generic main, we can easily support aliases for compatibility reasons or simply for usability. Reviewed-by: iustinp
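A command lookup that honours such an aliases dict could be sketched like this. The function `resolve_command` and the sample command table are hypothetical and are not taken from the Ganeti command-line code.

```python
def resolve_command(name, commands, aliases=None):
    """Map a command name (or an alias of it) to its table entry.

    commands: dict of real command name -> handler (hypothetical layout)
    aliases:  optional dict of alias -> real command name
    """
    if aliases and name in aliases:
        name = aliases[name]
    if name not in commands:
        raise KeyError("unknown command: %s" % name)
    return name, commands[name]
```

Keeping aliases in a separate dict means the command table itself has one entry per real command, so help output does not list duplicates.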
-
Iustin Pop authored
LVM code sometimes adds an extra separator at the end of the field list. Make the code strip it if it exists. Reviewed-by: imsnah
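A minimal sketch of stripping one trailing separator before splitting is shown below. The helper name `split_lvm_fields` and the `:` separator are assumptions chosen for illustration.

```python
def split_lvm_fields(line, sep=":"):
    """Split a separator-delimited LVM output line into fields.

    A single trailing separator, if present, is removed first so that it
    does not produce an empty extra field.
    """
    line = line.strip()
    if line.endswith(sep):
        line = line[:-len(sep)]
    return line.split(sep)
```

Only one trailing separator is removed, so a genuinely empty last field (two separators in a row) is still preserved.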
-
- Jan 20, 2008
-
-
Iustin Pop authored
Currently, the function backend._GetVGInfo only checks for errors via the exit code of the 'vgs' command. However, there are other ways of failure so we need to also check for valid output before parsing. Furthermore, the checks on the exit code were reported via a 'raise LVMError', however this exception is not handled anywhere and so the remote caller will not get reasonable data. This patch does two main things:
  - change the calling protocol for this function to not raise an error, and instead return the same type of argument always (dict) with the requested keys but values changed into None; this allows in the parent rpc call node_info to have valid memory information but "error" value for disk space, if there's an error with disks
  - check the validity of the output so that in case we fail to parse it, we don't abort with a backtrace in the node daemon but instead return the default result value (containing errors), and log these cases in the node daemon log file
We also bump the protocol version to 11.
Reviewed-by: ultrotter
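The "always return a dict, with None values on failure" protocol can be sketched as below. This is not the backend._GetVGInfo implementation: the function name `parse_vg_info`, the key names, the three-field layout and the `:` separator are all assumptions made for the example.

```python
import logging


def parse_vg_info(output, sep=":"):
    """Parse 'size:free:pv_count' style output into a dict.

    On any parsing problem the same keys are returned with None values
    and the problem is logged, instead of raising to the caller.
    """
    failed = {"vg_size": None, "vg_free": None, "pv_count": None}
    fields = output.strip().rstrip(sep).split(sep)
    if len(fields) != 3:
        logging.error("unexpected volume group output: %r", output)
        return failed
    try:
        return {
            "vg_size": int(round(float(fields[0]))),
            "vg_free": int(round(float(fields[1]))),
            "pv_count": int(fields[2]),
        }
    except (ValueError, OverflowError):
        logging.error("cannot parse volume group output: %r", output)
        return failed
```

Since the shape of the result never changes, a caller can still report memory information and show an error only for the disk-space fields.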
-
Iustin Pop authored
This patch does two things:
  - checks that the result values from call_node_info are valid integer values and aborts otherwise
  - skips disk space computation for the DT_DISKLESS case
The most important point of the patch is the verification of results from the rpc call, as it prepares for a patch that allows failures to be better reported from the remote node.
Reviewed-by: ultrotter
-
Iustin Pop authored
The checking of a node's free memory (via rpc.call_node_info) is done in both start instance and failover. This patch abstracts this call, together with the appropriate error handling, into a separate function called _CheckNodeFreeMemory. The patch also has some related changes:
  - the check is done in prereq and not in exec for start instance
  - the redundant check in exec for failover has been removed
Reviewed-by: ultrotter
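A helper in the spirit of _CheckNodeFreeMemory might look like the sketch below. It is an assumption-laden illustration, not the cmdlib.py code: the function takes an already-fetched `node_info` dict instead of making the rpc call, and the `memory_free` key, the `PrereqError` class and the parameter names are hypothetical.

```python
class PrereqError(Exception):
    """Stand-in for a prerequisite failure exception (hypothetical)."""


def check_node_free_memory(node_info, node, reason, requested_mb):
    """Raise PrereqError unless `node` reports enough free memory.

    node_info: dict of node name -> dict of values, as a node info query
               is assumed to return it (layout is an assumption)
    """
    info = node_info.get(node)
    if not info:
        raise PrereqError("cannot get node information from %s" % node)
    free_mb = info.get("memory_free")
    if not isinstance(free_mb, int):
        raise PrereqError("invalid free memory value from node %s: %r"
                          % (node, free_mb))
    if requested_mb > free_mb:
        raise PrereqError("not enough memory on node %s for %s:"
                          " needed %d MiB, available %d MiB"
                          % (node, reason, requested_mb, free_mb))
```

Having one function for both callers keeps the validation and the error messages identical for start instance and failover.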
-
Iustin Pop authored
The function backend.UploadFile still uses "/etc/hosts" directly instead of the existing constant; this patch fixes this. Reviewed-by: ultrotter
-
Iustin Pop authored
Currently the fake hypervisor has hardcoded ‘/var/run’ as a base directory for its store. This patch adds a constant RUN_DIR that is used for both the fake hypervisor and for BDEV_CACHE_DIR. Reviewed-by: ultrotter
-
- Jan 16, 2008
-
-
Iustin Pop authored
This is a merge from the 1.2 branch. Reviewed-by: imsnah
-
Iustin Pop authored
This is a merge from the 1.2 branch. Reviewed-by: imsnah
-
- Jan 14, 2008
-
-
Iustin Pop authored
This patch fixes two name typos and a style issue (which makes pylint complain). Reviewed-by: ultrotter
-
Guido Trotter authored
Some new parameters of the CreateInstance opcode are optional (namely kernel_path, initrd_path and hvm_boot_order) but their absence makes the code crash. Fix this by initializing them to a default value if they're not present. Reviewed-by: iustinp
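Filling in missing optional attributes can be sketched as follows. The `FakeOpCode` class and `fill_optional_params` helper are hypothetical stand-ins, and `None` as the default is an assumption; the message does not say which default value the actual fix uses.

```python
OPTIONAL_ATTRS = ("kernel_path", "initrd_path", "hvm_boot_order")


class FakeOpCode(object):
    """Stand-in for an opcode object: attributes come from keyword args."""

    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)


def fill_optional_params(op, default=None):
    """Give every optional attribute a value so later code can read it."""
    for attr in OPTIONAL_ATTRS:
        if not hasattr(op, attr):
            setattr(op, attr, default)
    return op
```

Attributes that the caller did set are left untouched; only the missing ones receive the default.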
-
- Jan 11, 2008
-
-
Alexander Schreiber authored
This patch adds support for specifying and changing the boot device order for HVM instances. The boot device order specification is ignored for non-HVM instances. Reviewed-by: iustinp
-