- Jan 29, 2009
Oleksiy Mishchenko authored
The resources we still need were moved to rlib2. Reviewed-by: iustinp
-
Oleksiy Mishchenko authored
Reviewed-by: iustinp
-
Oleksiy Mishchenko authored
It is impossible to keep backward compatibility due to significant changes in the Ganeti core. Reviewed-by: iustinp
-
- Jan 28, 2009
Iustin Pop authored
Reviewed-by: ultrotter
-
Iustin Pop authored
This patch correctly marks the drives as read-only for Xen, and raises an exception for KVM since it doesn't support read-only drives. Reviewed-by: ultrotter
-
Iustin Pop authored
This patch fixes two issues with the cancel mechanism:
- cancelled jobs show up as such, and not in the error state (we mark them as OP_STATUS_CANCELED and not OP_STATUS_ERROR)
- queued jobs which are cancelled don't raise errors in the master (we now handle OP_STATUS_CANCELED)
Reviewed-by: imsnah
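A minimal sketch of the status handling described above, assuming a simplified job model in which a job is just a list of opcode statuses. Only the OP_STATUS_* names come from the commit message; their string values and the `cancel_job`/`calc_job_status` helpers are hypothetical, not Ganeti's job queue code.

```python
# Hypothetical status values; only the constant names come from the commit.
OP_STATUS_QUEUED = "queued"
OP_STATUS_SUCCESS = "success"
OP_STATUS_ERROR = "error"
OP_STATUS_CANCELED = "canceled"


def cancel_job(op_statuses):
    """Mark every still-queued opcode as canceled, not as errored."""
    return [OP_STATUS_CANCELED if status == OP_STATUS_QUEUED else status
            for status in op_statuses]


def calc_job_status(op_statuses):
    """Derive a simplified overall job status from its opcode statuses."""
    if OP_STATUS_ERROR in op_statuses:
        return OP_STATUS_ERROR
    if OP_STATUS_CANCELED in op_statuses:
        # Reported distinctly from OP_STATUS_ERROR.
        return OP_STATUS_CANCELED
    if all(status == OP_STATUS_SUCCESS for status in op_statuses):
        return OP_STATUS_SUCCESS
    return OP_STATUS_QUEUED


print(calc_job_status(cancel_job([OP_STATUS_QUEUED, OP_STATUS_QUEUED])))
# canceled
```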
-
- Jan 27, 2009
Guido Trotter authored
Also raise HypervisorError rather than OpExecError. Reviewed-by: iustinp
-
Guido Trotter authored
Also raise HypervisorError rather than OpExecError. Reviewed-by: iustinp
-
Iustin Pop authored
This patch adds a simple check that the 'mode' attribute of top-level disks is correct. It does not recurse over children. The framework could be extended with other checks in the future. Reviewed-by: imsnah
-
Iustin Pop authored
Currently, only the LUSetInstanceParams correctly sets up the mode attribute via a manual operation. We remove this and instead do the correct setting in the generic _GenerateDiskTemplate function, so that we set the mode correctly for all disk creations. Reviewed-by: ultrotter
-
Iustin Pop authored
This patch changes the multi-instance gnt-* commands (gnt-instance start/stop, gnt-node evacuate/failover) so that the individual operations are submitted in parallel, which should speed up execution. It does this by abstracting the job set functionality into a new class in cli.py, which takes care of job submission, job polling and error handling. Reviewed-by: ultrotter
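The job set idea can be sketched as a small class that submits every queued job first and only then polls them, collecting per-job errors instead of stopping at the first one. This is a hypothetical illustration (the `JobSet` name and the injected `submit_fn`/`poll_fn` callables are assumptions), not the actual class added to cli.py.

```python
class JobSet(object):
    """Hypothetical job set: submit all jobs, then poll each one.

    submit_fn(ops) -> job_id and poll_fn(job_id) -> result are injected;
    poll_fn is expected to raise an exception if the job failed.
    """

    def __init__(self, submit_fn, poll_fn):
        self._submit_fn = submit_fn
        self._poll_fn = poll_fn
        self._queue = []  # (name, ops) pairs waiting to be submitted

    def queue_job(self, name, *ops):
        self._queue.append((name, ops))

    def get_results(self):
        """Return a list of (success, name, result_or_error) tuples."""
        # Submit everything first so the master can run the jobs in parallel.
        submitted = [(name, self._submit_fn(ops)) for name, ops in self._queue]
        self._queue = []
        results = []
        for name, job_id in submitted:
            try:
                results.append((True, name, self._poll_fn(job_id)))
            except Exception as err:  # record the failure, keep polling others
                results.append((False, name, str(err)))
        return results


# Usage with fake submit/poll functions standing in for the master daemon.
jobs = {}


def fake_submit(ops):
    job_id = len(jobs) + 1
    jobs[job_id] = ops
    return job_id


def fake_poll(job_id):
    if "fail" in jobs[job_id]:
        raise RuntimeError("job %d failed" % job_id)
    return "ok"


js = JobSet(fake_submit, fake_poll)
js.queue_job("inst1", "start")
js.queue_job("inst2", "fail")
results = js.get_results()
print(results)  # [(True, 'inst1', 'ok'), (False, 'inst2', 'job 2 failed')]
```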
-
Iustin Pop authored
This is a simple typo from the conversion to multi-job archiving. Reviewed-by: imsnah
-
Guido Trotter authored
This parameter allows a different path to be passed to the instance kernel. The new parameter is mandatory, and defaults to the previously hardcoded value for both kvm and xen. Beta1 clusters will need to have this parameter added for their instances to be able to boot. Reviewed-by: iustinp
-
Guido Trotter authored
This is a class method, because it calls _InstanceSerial, which is also a class method. The patch changes it to a classmethod for all the hypervisor classes. Reviewed-by: iustinp
-
Guido Trotter authored
These methods need nothing from the class instance: they just manipulate strings and fetch some class-level variables, so they can be classmethods. Reviewed-by: iustinp
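A short illustration of why such helpers can be classmethods: they only combine strings with class-level variables, so they work without an instance and still honour subclass overrides. The class names, `_ROOT_DIR` and `_InstanceSerialBackup` are hypothetical; only the `_InstanceSerial` name comes from the log, and its real body is not shown here.

```python
class BaseHypervisor(object):
    """Illustrative hypervisor base class (not Ganeti's real one)."""

    _ROOT_DIR = "/var/run/example-hv"  # hypothetical class-level setting

    @classmethod
    def _InstanceSerial(cls, instance_name):
        # Only string manipulation and a class variable: no self needed.
        return "%s/%s.serial" % (cls._ROOT_DIR, instance_name)

    @classmethod
    def _InstanceSerialBackup(cls, instance_name):
        # Hypothetical helper: a classmethod can call another one via cls.
        return cls._InstanceSerial(instance_name) + ".bak"


class ExampleKvmHypervisor(BaseHypervisor):
    # Subclasses override the class variable; the classmethods pick it up.
    _ROOT_DIR = "/var/run/example-kvm"


# No instance is needed to compute the paths.
print(ExampleKvmHypervisor._InstanceSerial("inst1"))
# /var/run/example-kvm/inst1.serial
```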
-
- Jan 26, 2009
Iustin Pop authored
Even though alpha started at 0, we release beta 1 first as we did for 1.2. Reviewed-by: imsnah, ultrotter
-
Iustin Pop authored
Also import the NEWS entries from the 1.2 branch which were added since we created it. Reviewed-by: ultrotter
-
- Jan 23, 2009
Guido Trotter authored
A 'be' was missing from the error message, for both xen and kvm, used when the kernel or initrd path was not absolute. Reviewed-by: imsnah
-
Iustin Pop authored
When we submit multiple instances via batcher, it's nicer to have them sorted. Reviewed-by: imsnah
-
Iustin Pop authored
This patch fixes the gnt-instance batch-create command, and in doing so also slightly changes two other functions:
- utils.ParseUnit now also accepts integer values (both ParseUnit(5) and ParseUnit("5") return the same value)
- a bridge of 'None' in LUCreateInstance is converted to the default bridge; previously only a missing bridge was accepted to mean the default one
The main changes to batcher were the switch to a variable number of disks and NICs. The patch also adds a batcher-instances.json example file, copied from the 1.2 branch and modified accordingly.
Reviewed-by: imsnah, killerfoxi
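The ParseUnit change can be sketched as a parser that accepts either an integer or a string with an optional unit suffix. This `parse_unit` is a hypothetical stand-in: the MiB base unit and the m/g/t suffixes are assumptions, and the real utils.ParseUnit may round or validate differently.

```python
def parse_unit(value):
    """Parse a size in MiB from an int or a string such as "5", "512m", "2g".

    Hypothetical stand-in for utils.ParseUnit; the suffixes and the MiB
    base unit are assumptions.
    """
    if isinstance(value, int):
        # The behaviour added by the commit: integers are accepted as-is.
        return value
    text = str(value).strip().lower()
    multipliers = {"m": 1, "g": 1024, "t": 1024 * 1024}
    if text and text[-1] in multipliers:
        number, unit = text[:-1], text[-1]
    else:
        number, unit = text, "m"  # no suffix: assume MiB
    try:
        size = float(number)
    except ValueError:
        raise ValueError("Invalid size: %r" % (value,))
    return int(size * multipliers[unit])


print(parse_unit(5), parse_unit("5"), parse_unit("2g"))  # 5 5 2048
```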
-
Iustin Pop authored
This patch changes the iallocator framework to work with offline nodes and to export them properly to plugins. It does this by exporting only the static configuration data for those nodes, without attempting to parse their runtime data. The patch also fixes bugs in iallocator related to the RpcResult conversion, renames the should_run attribute to admin_up (as per the internals change), and adds “-I” as a short option for “--iallocator” in gnt-instance, gnt-backup and burnin. Reviewed-by: ultrotter
-
Iustin Pop authored
Currently the DRBD code checks that the metadata devices are valid before creation, initial disk attachment and adding children. However, the validity check requires a free DRBD minor, and this conflicts with parallel checking. There are at least three possible solutions:
- serialize all checks, which reduces parallelism and requires extra locks
- don't pass a valid minor number, but an invalid one like “/dev/drbd256”; this works for the current version of DRBD, but since it's not guaranteed to remain so, it isn't a clean solution
- don't do the check at all, and rely on “drbdsetup ... disk ...” to fail by itself
The reason for checking the metadata was that in 1.2 this was much cheaper than trying to activate devices (and the subsequent iteration over the minors). However, in 2.0 they have the same cost, so we can choose option 3: remove the explicit check and rely on drbdsetup and the kernel to fail.
Since DRBD8._InitMeta still requires a minor number, the two places where it is run are handled as follows:
- Create: we just use our own minor number (currently unused)
- AddChildren: we keep using FindUnusedMinor, with the caveat that this function (used by replace-disks -n ...) cannot yet be parallelized
Reviewed-by: ultrotter
-
Iustin Pop authored
This patch significantly changes the execution model in burnin:
- for all runs, (almost) all instance modifications in a single Burn* procedure are done as part of one job; for example, add disk, stop, remove disk and start are no longer separate jobs but a single job consisting of four opcodes
- for parallel runs, all Burn* procedures except rename (which uses a single target name) run in parallel; before, only the creation was done in parallel
- due to the single-job and parallel execution, the logging messages no longer happen synchronously with the execution, so they are informational rather than an actual execution log
The end result is that burnin now properly tests multi-opcode jobs, and also tests all opcodes (except rename) for parallel execution.
Note: on a test cluster, parallelization reduces burnin time from 23m to 15m.
Reviewed-by: ultrotter
-
Iustin Pop authored
Currently the restrictions are too harsh: there is a window between the moment an instance gets a new disk and the moment it is added to the configuration in which the restriction is not met. We solve this by allowing temporary DRBD minors to match existing minors (for the same instance), so that parallel creations/minor allocations are OK. This is done by adding the temporary minors to the minor map only after the instance minors are computed, and only considering them duplicates if the instance name doesn't match. Reviewed-by: ultrotter
-
Iustin Pop authored
This patch enhances the duplicate DRBD minors checks (currently just a few) and adds automatic checks of configuration consistency at configuration file writing time. In order to do so and show meaningful error messages, the _UnlockedComputeDRBDMap function is changed to not raise errors in case of duplicates, but instead return both the minors map and the duplicate list, and its callers now raise the error. This allows the VerifyConfig function to return a complete list of duplicates. The new checks required some small updates to the unittests for the config module. Reviewed-by: ultrotter
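The two DRBD commits above (temporary minors of the same instance are not duplicates, and the map function returns duplicates instead of raising) can be sketched together as follows. The data layout and the `compute_drbd_map` name are assumptions for illustration; they do not reproduce _UnlockedComputeDRBDMap.

```python
def compute_drbd_map(instance_minors, temporary):
    """Build a node -> {minor: instance} map plus a list of duplicates.

    Hypothetical sketch. instance_minors: {instance: [(node, minor), ...]}
    from the configuration; temporary: [(node, minor, instance), ...]
    pending reservations. Duplicates are returned, not raised, so that a
    verifier can report all of them at once.
    """
    minor_map = {}
    duplicates = []
    # First the minors already recorded for configured instances...
    for instance, pairs in instance_minors.items():
        for node, minor in pairs:
            node_map = minor_map.setdefault(node, {})
            if minor in node_map:
                duplicates.append((node, minor, instance, node_map[minor]))
            else:
                node_map[minor] = instance
    # ...then the temporary reservations, which only count as duplicates
    # when they clash with a *different* instance.
    for node, minor, instance in temporary:
        node_map = minor_map.setdefault(node, {})
        owner = node_map.get(minor)
        if owner is None:
            node_map[minor] = instance
        elif owner != instance:
            duplicates.append((node, minor, instance, owner))
    return minor_map, duplicates


minors, dups = compute_drbd_map(
    {"inst1": [("node1", 0), ("node2", 0)]},
    [("node1", 0, "inst1"),   # same instance: allowed
     ("node1", 0, "inst2")],  # different instance: duplicate
)
print(dups)  # [('node1', 0, 'inst2', 'inst1')]
```

Callers that previously relied on an exception can simply raise one themselves when `dups` is non-empty.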
-
Iustin Pop authored
When creating ‘fake’ results for offline nodes, we currently don't pass the call attribute. This complicates debugging, so even though this should not matter in practice, it's better to fix it. Reviewed-by: imsnah
-
Iustin Pop authored
This removes some constraints:
- only two disks were supported; this is no longer needed, as the underlying functions can now compute the size for a variable number of disks
- an error was raised when the hypervisor was not passed
- a typo
Reviewed-by: imsnah
-
- Jan 22, 2009
Iustin Pop authored
This is less of an actual issue for regular gnt-* clients, but it's easily reproducible with burnin and possible with RAPI (depending on how the program uses luxi.Client(s)). In the case of burnin, if we interrupt the client (^C) while it polls the job, it will abort and raise an error. After that, burnin issues a remove instance job; at this point we send the submit job (remove) call, but the first thing we read from the socket is the response to the previous poll job request, since the master had already queued it. To solve this, whenever we detect an error in Transport.Call(), we close that transport and create a new one, to start anew. The alternative would be to introduce a sequence number into the protocol, but that would be a design-level change and is not recommended at this stage. Reviewed-by: imsnah
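The reconnect-on-error fix can be sketched as a client that throws away its transport whenever a call fails, so a stale reply queued by the master is never read by the next request. `Client`, `send_and_receive` and the fake transport are hypothetical stand-ins, not the luxi module's API.

```python
class Client(object):
    """Hypothetical client that recreates its transport after any error."""

    def __init__(self, transport_factory):
        self._factory = transport_factory
        self._transport = None

    def call(self, message):
        if self._transport is None:
            self._transport = self._factory()
        try:
            return self._transport.send_and_receive(message)
        except BaseException:  # also covers KeyboardInterrupt (^C)
            # A half-finished exchange may leave a stale reply on the
            # socket; drop the connection so the next call starts clean.
            self._transport.close()
            self._transport = None
            raise


class FakeTransport(object):
    """Stand-in transport recording how many connections were opened."""
    instances = []

    def __init__(self):
        self.closed = False
        FakeTransport.instances.append(self)

    def send_and_receive(self, message):
        if message == "fail":
            raise RuntimeError("connection interrupted")
        return "reply to " + message

    def close(self):
        self.closed = True


client = Client(FakeTransport)
client.call("poll")
try:
    client.call("fail")
except RuntimeError:
    pass
print(client.call("submit"))         # reply to submit
print(len(FakeTransport.instances))  # 2
```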
-
- Jan 21, 2009
Guido Trotter authored
When an instance fails to shut down we currently log its whole object, rather than just the instance name. Reviewed-by: iustinp
-
Guido Trotter authored
If a KVM live migration ends up in the 'failed' state, it was aborted at the kvm level and the machine is still running locally. We also handle the 'cancelled' state, even though there should be no way of reaching it without manual intervention. Reviewed-by: iustinp
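A sketch of how the migration states described above might be handled while polling. The status strings are assumed to mirror what the KVM monitor reports; `wait_for_migration` and its `get_status` callable are hypothetical.

```python
import time


def wait_for_migration(get_status, poll_interval=1.0, sleep=time.sleep):
    """Poll a (hypothetical) status callable until migration finishes.

    get_status() is assumed to return "active", "completed", "failed"
    or "cancelled".
    """
    while True:
        status = get_status()
        if status == "completed":
            return
        if status in ("failed", "cancelled"):
            # Aborted at the kvm level: the instance still runs locally.
            raise RuntimeError("Migration %s; instance still running locally"
                               % status)
        if status != "active":
            raise RuntimeError("Unknown migration status %r" % (status,))
        sleep(poll_interval)


statuses = iter(["active", "active", "completed"])
wait_for_migration(lambda: next(statuses), sleep=lambda seconds: None)
print("migration completed")
```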
-
Guido Trotter authored
Reviewed-by: iustinp
-
Guido Trotter authored
The TCP port used for migrating KVM instances is selectable at ./configure time. We use a single port because nodes are locked during a migration anyway, so no two migrations to the same node can happen at the same time. Reviewed-by: iustinp
-
Guido Trotter authored
Throughout the kvm code we very often look for the instance pidfile name, read it, and check if the process is alive. Abstract this into a private function and use that one instead. This patch also changes RebootInstance to check whether the instance is alive before trying to reboot it. Reviewed-by: iustinp
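A minimal sketch of the kind of private helper described above, assuming the pidfile simply contains the PID as text. `instance_pid_alive` is a hypothetical name; sending signal 0 with os.kill is the standard POSIX way to test for a process without affecting it.

```python
import errno
import os
import tempfile


def instance_pid_alive(pidfile):
    """Return (pid, alive) for the process whose PID is stored in pidfile.

    Hypothetical helper; returns (None, False) when the file is missing or
    does not hold a valid positive PID.
    """
    try:
        with open(pidfile) as handle:
            pid = int(handle.read().strip())
    except (IOError, OSError, ValueError):
        return None, False
    if pid <= 0:
        return None, False
    try:
        os.kill(pid, 0)  # signal 0: only checks that the process exists
    except OSError as err:
        # EPERM: the process exists but belongs to another user.
        return pid, err.errno == errno.EPERM
    return pid, True


# Usage: record this process's PID in a temporary pidfile and check it.
with tempfile.NamedTemporaryFile("w", suffix=".pid", delete=False) as tmp:
    tmp.write("%d\n" % os.getpid())
result = instance_pid_alive(tmp.name)
os.unlink(tmp.name)
print(result)  # (<this process's PID>, True)
```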
-
Guido Trotter authored
RebootInstance was broken, because it called StartInstance with the wrong parameters. With this patch we still stop the instance, but use the saved kvm runtime to start it again. Reviewed-by: iustinp
-
Guido Trotter authored
When we ask the instance to shut down, the command sometimes doesn't work, especially if the instance isn't fully booted up. We wait for a bit, and give it a few chances before giving up. Reviewed-by: iustinp
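The retry behaviour can be sketched as a bounded loop. `send_powerdown` and `is_alive` are hypothetical callables, and the attempt count and wait time are arbitrary example values, not the ones used by the kvm code.

```python
import time


def shutdown_with_retries(send_powerdown, is_alive, attempts=5,
                          wait=2.0, sleep=time.sleep):
    """Ask a guest to shut down, retrying a few times before giving up.

    Hypothetical sketch: a guest that is still booting may ignore the
    first requests, so we re-send the request after each wait.
    """
    for _ in range(attempts):
        if not is_alive():
            return True
        send_powerdown()
        sleep(wait)
    return not is_alive()


# Usage: a simulated guest that only reacts to the third request.
requests = []
powered_off = shutdown_with_retries(
    send_powerdown=lambda: requests.append("powerdown"),
    is_alive=lambda: len(requests) < 3,
    sleep=lambda seconds: None,  # skip the real waiting in this demo
)
print(powered_off, len(requests))  # True 3
```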
-
Guido Trotter authored
For the xen hypervisor, these are used to copy the xen config file to the remote node. This breaks migration for instances that were migrated with the old code and not restarted since, because their config file was lost. Reviewed-by: iustinp
-
Iustin Pop authored
This patch converts the DRBD minors reservation protocol from explicit release to automatic release on the success paths. On the error paths, a manual release is still needed. The patch doesn't bring much by itself, but is needed for a future patch which enhances the automatic verification of configuration consistency. Reviewed-by: ultrotter
-
Iustin Pop authored
Two are real errors (invalid names) and one is a style error (overriding a name from an outer scope). Reviewed-by: ultrotter
-
Iustin Pop authored
This was forgotten in the recent “switch to explicit ignore rules”. Reviewed-by: imsnah
-
Iustin Pop authored
Currently the rpc module logs the error description and target node in rpc call logging, as such:
2009-01-21 00:50:01,456: pid=1051/Thread-21 ERROR RPC error from node node1.example.com: Connection failed (111: Connection refused)
but this doesn't help to understand which call caused it (here, it's an offline node which should not be contacted at all). This patch adds the call to the log message too, so cases like the above can be debugged more easily. Reviewed-by: imsnah, ultrotter
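A sketch of the improved log message, assuming the call name is available where the error is logged. `log_rpc_error`, the exact format string and the `instance_list` call name are illustrative assumptions; the small list handler only exists to capture the output.

```python
import logging


def log_rpc_error(logger, procedure, node, error):
    """Log a failed RPC, including the call name (hypothetical format)."""
    logger.error("RPC error in %s from node %s: %s", procedure, node, error)


class ListHandler(logging.Handler):
    """Collects formatted messages so the example can inspect them."""

    def __init__(self):
        logging.Handler.__init__(self)
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


logger = logging.getLogger("rpc-example")
handler = ListHandler()
logger.addHandler(handler)
log_rpc_error(logger, "instance_list", "node1.example.com",
              "Connection failed (111: Connection refused)")
print(handler.messages[0])
```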
-