- Nov 02, 2009
-
-
Iustin Pop authored
Currently the repair storage operation has two issues:
  - down instances abort the operation, even though they should be ignored (it's not technically possible to know their disk status unless we activate their disks)
  - if the VG is so broken that disks cannot be activated via gnt-instance activate-disks or gnt-instance startup, it's not possible to repair the VG at all

The patch makes the opcode skip down instances and also introduces an ``--ignore-consistency`` flag for forcing the execution of the LU. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Oct 13, 2009
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Guido Trotter authored
All the LUs that shut down the instance need to be able to pass the timeout parameter as well. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Oct 09, 2009
-
-
Guido Trotter authored
Using the new --timeout option:
  - gnt-instance shutdown is changed to accept a timeout
  - the opcode is changed to hold one
  - the LU is changed to optionally get one
  - the rpc is changed to carry one
  - the backend is changed to take it as a parameter rather than hardcoding it in the function

Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Oct 05, 2009
-
-
Guido Trotter authored
These two opcodes need to know whether an unknown variant must be forced through or not. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Olivier Tharan <olive@google.com>
-
- Sep 17, 2009
-
-
Iustin Pop authored
One of the issues we have in ganeti is that it's very hard to test the error-handling paths; QA and burnin only test the OK code-path, since it's hard to simulate errors.

LUVerifyCluster is special amongst the LUs in that a) it has a lot of error paths and b) the error paths only log the error, they don't do any rollback or other similar actions. Thus, it's enough for this LU to separate the testing of the error condition from the logging of the error condition.

This patch does this by replacing code blocks of the form:

  if x:
    log_error()
    [y]

with:

  log_error_if(x)
  [if x:
    y
  ]

After this change, it's simple enough to turn on logging of all errors by adding a special case inside log_error_if such that if the incoming opcode has a special ‘debug_simulate_errors’ attribute and it's true, it will log the error unconditionally.

Surprisingly this also turns into an absolute code reduction, since some of the if blocks were simplified. The only downside to this patch is that the various _VerifyX() functions are now stateful (modifying an attribute on the LU instance) instead of returning a boolean result.

Last note: yes, this discovered some error cases in the logging. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
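The refactoring described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not Ganeti's actual code: the class `VerifySketch`, the attribute `bad`, the method `verify_node`, and the 100 MB threshold are hypothetical; only `log_error_if` and `debug_simulate_errors` are names taken from the message.

```python
class VerifySketch:
    """Hypothetical stand-in for an LU such as LUVerifyCluster."""

    def __init__(self, debug_simulate_errors=False):
        # In the commit this flag lives on the incoming opcode; it is kept
        # on the object here only to make the sketch self-contained.
        self.debug_simulate_errors = debug_simulate_errors
        self.bad = False      # hypothetical stateful result attribute
        self.messages = []    # collected log lines instead of real logging

    def log_error_if(self, cond, msg):
        # In simulation mode every error is logged, whatever the condition.
        if cond or self.debug_simulate_errors:
            self.messages.append("ERROR: " + msg)
            self.bad = True

    def verify_node(self, name, free_mb):
        low_space = free_mb < 100
        # Testing the condition and logging it are now separate steps.
        self.log_error_if(low_space, "node %s: low disk space" % name)
        if low_space:
            pass  # any follow-up action ("y") still uses the real condition


healthy = VerifySketch()
healthy.verify_node("node1", 500)

simulated = VerifySketch(debug_simulate_errors=True)
simulated.verify_node("node1", 500)
```

With simulation enabled, the logging path runs even for a healthy node, which is what makes the error messages themselves testable.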
-
Iustin Pop authored
Currently the output of cluster verify can be parsed for 'ERROR' messages, but that is the only indication we get (error or no error). In order to allow monitoring tools to separate different error conditions, this patch introduces a new output format (“gnt-cluster verify --error-codes”) that changes the output from human-friendly to machine-friendly.

In this mode, an error line changes from:

  ERROR: node node1: drbd minor 1 of instance inst1 is not active

to:

  ERROR:ENODEDRBD:node:node1:drbd minor 1 of instance inst1 is not active

i.e. the error message is a ‘:’-separated list of fields, with ERROR in the first place, the error code in the second, the object type (cluster, node, instance) in the third, the name of the object (for nodes/instances) in the fourth, and then the text message.

The patch also removes some of the verbosity of the operation (“Verifying instance X”, “Verifying node X”) since on big clusters these informational messages can quickly fill up an entire screen. The original behaviour can be restored via the ‘--verbose’ option. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
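A monitoring tool could consume the machine-friendly format described in the commit message above along these lines. This is only a sketch: the function `parse_error_line` and the `VerifyError` tuple are hypothetical names, and it assumes every error line carries all five fields (with the name field possibly empty for cluster-level errors), which the commit message does not spell out.

```python
from collections import namedtuple

# Hypothetical container for the four fields that follow the ERROR marker.
VerifyError = namedtuple("VerifyError", "code obj_type obj_name message")


def parse_error_line(line):
    """Split one line of --error-codes output; return None for other lines."""
    # Limit the split to four separators so colons in the message survive.
    parts = line.rstrip("\n").split(":", 4)
    if len(parts) != 5 or parts[0] != "ERROR":
        return None
    return VerifyError(*parts[1:])


sample = "ERROR:ENODEDRBD:node:node1:drbd minor 1 of instance inst1 is not active"
parsed = parse_error_line(sample)
```

Lines that are not in the five-field form, such as informational messages, are simply skipped by returning `None`.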
-
- Aug 24, 2009
-
-
Iustin Pop authored
This patch adds a basic version of LUMoveInstance. It doesn't yet support iallocator-mode and it's implemented in old-style (non-TL) mode. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Aug 17, 2009
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Aug 14, 2009
-
-
Iustin Pop authored
This can be used for a 'plain' type instance when the underlying storage went away, to recreate the storage (and reinstall) instead of removing the instance and readding it. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Aug 10, 2009
-
-
Luca Bigliardi authored
Add an 'empty' logical unit to run hooks after cluster initialization. Signed-off-by:
Luca Bigliardi <shammash@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Aug 04, 2009
-
-
Iustin Pop authored
This patch adds a new opcode and lu for checking disk sizes. Currently it does only top-level disk verification, and also doesn't check primary/secondary node size mismatches (these two are added as TODOs in the Exec() function of the LU). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
This patch modified OpActivateDisks, LUActivateDisks and gnt-instance activate-disks to support and pass this option to _AssembleInstanceDisks. The patch is quite trivial I think; there should be no issues from it except if used when not needed. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- Aug 03, 2009
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Jul 31, 2009
-
-
Michael Hanselmann authored
It migrates all primary instances from the node to their secondaries. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Jul 22, 2009
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- Jul 17, 2009
-
-
Iustin Pop authored
This patch converts the opcode loading to a pre-built map (at import time) instead of iteration over the globals dict at each call. Microbenchmarks show that this should be around three times faster, and burnin still passes. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
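The difference between the two lookup strategies in the commit message above can be illustrated as follows. This is a sketch under stated assumptions: the classes, the `OP_ID` values, `NAMESPACE` (a small dict standing in for the module's globals), and the function names are all hypothetical and do not reproduce Ganeti's code.

```python
class OpCode:
    OP_ID = "OP_ABSTRACT"


class OpExampleA(OpCode):
    OP_ID = "OP_EXAMPLE_A"


class OpExampleB(OpCode):
    OP_ID = "OP_EXAMPLE_B"


# Stand-in for the module's globals dict, which also holds unrelated names.
NAMESPACE = {
    "OpCode": OpCode,
    "OpExampleA": OpExampleA,
    "OpExampleB": OpExampleB,
    "VERSION": 1,
}


def _is_opcode(value):
    return (isinstance(value, type) and issubclass(value, OpCode)
            and value is not OpCode)


def lookup_by_scan(op_id):
    # Old style: walk the whole namespace on every call.
    for value in NAMESPACE.values():
        if _is_opcode(value) and value.OP_ID == op_id:
            return value
    raise KeyError(op_id)


# New style: build the map once, at import time.
OP_MAPPING = {v.OP_ID: v for v in NAMESPACE.values() if _is_opcode(v)}


def lookup_by_map(op_id):
    # A single dict access per call.
    return OP_MAPPING[op_id]
```

Both functions return the same class for a given ID; the map version only moves the scanning cost from every call to a single pass at import time.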
-
- Jun 19, 2009
-
-
Iustin Pop authored
This patch adds a new (global) opcode flag 'dry_run' which, when True, causes early exit from the LU workflow, returning a special value from the LU object (initialized in the parent LogicalUnit class, and which if not overridden from child LUs will be None). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
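The early-exit behaviour described in the commit message above might look roughly like this. The sketch assumes the exit happens after the prerequisite checks and before execution, and it uses a hypothetical attribute name `dry_run_result` and a hypothetical driver function `run_lu`; none of these details are confirmed by the message itself.

```python
class LogicalUnit:
    def __init__(self, op):
        self.op = op
        # Value returned on a dry run; child LUs may override it.
        self.dry_run_result = None

    def CheckPrereq(self):
        pass

    def Exec(self):
        raise NotImplementedError


class FakeOp:
    """Hypothetical opcode carrying only the dry_run flag."""

    def __init__(self, dry_run=False):
        self.dry_run = dry_run


class LUExample(LogicalUnit):
    executed = []  # records real executions, for demonstration only

    def Exec(self):
        LUExample.executed.append(self.op)
        return "done"


def run_lu(lu):
    lu.CheckPrereq()
    if getattr(lu.op, "dry_run", False):
        # Stop before any change is made and hand back the special value.
        return lu.dry_run_result
    return lu.Exec()


dry_result = run_lu(LUExample(FakeOp(dry_run=True)))
real_result = run_lu(LUExample(FakeOp(dry_run=False)))
```

In this arrangement a dry run still exercises the checks, so it can report whether an operation would be accepted without performing it.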
-
Iustin Pop authored
This simple patch makes all opcodes extend the base opcode's __slots__ instead of defining their own from scratch. This way we can add slots across all opcodes, for example 'dry-run'. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
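The `__slots__` extension pattern mentioned in the commit message above can be shown with a small example. The class names and the `instance_name` slot are hypothetical; the point is only that a slot added to the base class becomes available on every opcode that extends the base list.

```python
class OpCode:
    # Slots shared by every opcode; adding a name here adds it everywhere.
    __slots__ = ["dry_run"]


class OpExample(OpCode):
    # Extend the parent's list rather than replacing it.
    __slots__ = OpCode.__slots__ + ["instance_name"]


op = OpExample()
op.dry_run = True
op.instance_name = "inst1"

try:
    op.unknown_field = 1
    rejected = False
except AttributeError:
    # Every class in the hierarchy defines __slots__, so there is no
    # __dict__ and undeclared attributes are refused.
    rejected = True
```

Repeating the parent's slot names in the child is allowed by Python, at the cost of a little wasted space per class.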
-
- Jun 08, 2009
-
-
Guido Trotter authored
Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- May 27, 2009
-
-
Iustin Pop authored
This (somewhat big) patch adds support for remotely rebooting the nodes via whatever support the hypervisor has for such a concept.

For KVM/fake (and containers in the future) this just uses sysrq plus a ‘reboot’ call if the sysrq method failed. For Xen, it first tries the above, and then Xen-hypervisor reboot (we first try sysrq since that just requires opening a file handle, whereas xen reboot means launching an external utility).

The user interface is:

  # gnt-node powercycle node5
  Are you sure you want to hard powercycle node node5?
  y/[n]/?: y
  Reboot scheduled in 5 seconds

The node reboots hopefully after sending the reply. In case the clock is broken, “time.sleep(5)” might take ages (but then I suspect SSL negotiation wouldn't work). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- May 19, 2009
-
-
Iustin Pop authored
This patch modifies the start instance script, opcode and logical unit to support temporary startup parameters.

Different from 1.2, where only the kernel arguments could be changed (and thus the feature was xen-pvm specific), this version supports changing all hypervisor and backend parameters (with appropriate checks).

This is much more flexible, and allows for example:
  - start with different, temporary kernel
  - start with different memory size

Note: in later versions, this should be extended to cover disk parameters as well (e.g. start with drbd without flushes, start with drbd in async mode, etc.). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- Feb 24, 2009
-
-
Iustin Pop authored
This patch removes the extra_args parameter and instead switches the instance to the HV_KERNEL_ARGS hypervisor option. This is a big change, but it's a needed cleanup: this extra parameter on all RPC calls is not generic, and we also need to have a persistent value here. Reviewed-by: imsnah
-
- Feb 10, 2009
-
-
Iustin Pop authored
This patch adds LU and cli-level support for modification of the node drained flag. It is similar to the offline changes. Reviewed-by: imsnah
-
- Feb 06, 2009
-
-
Iustin Pop authored
This patch fixes a couple of issues with the job listing:
  - in case of a non-existing job, nicely raise 404 instead of 500
  - in the job detail listing, also list the job log, the job timestamps, etc.
  - the opcode migrate instance was missing its description field

Reviewed-by: imsnah
-
- Feb 04, 2009
-
-
Iustin Pop authored
This patch adds the framework for, and enables, lockless OpQueryInstances. This means that instances will be shown in ERROR_up or ERROR_down state, even though this is not an error (but just an in-progress job).

The framework is implemented as follows:
  - the OpQueryInstances, OpQueryNodes and OpQueryExports opcodes take an additional “use_locking” flag which denotes whether to lock or not; this patch only implements this for LUQueryInstances
  - the luxi query functions take an additional argument use_locking which is passed to the master daemon, and then passed to the above opcodes
  - cli.py exports a new SYNC_OPT command line option which sets this flag to true
  - except for gnt-instance list, which uses this option, and for name-only queries (e.g. QueryNodes(fields=["names"])), all other callers set this flag to True
  - RAPI also sets the flag to True

The patch was tested with a continuous (0.2s sleep in-between) gnt-instance list during a burnin, and no problems were observed. Reviewed-by: ultrotter
-
- Jan 20, 2009
-
-
Iustin Pop authored
Reviewed-by: ultrotter
-
- Jan 13, 2009
-
-
Iustin Pop authored
This is a forward port via copy (and not a cherry-pick of individual patches) of the latest code on the 1.2 branch related to the migration. The changes compared to 1.2 are that we don't need the IdentifyDisks step anymore (the drbd rpc calls are independent now), and the rpc module improvements. Reviewed-by: ultrotter
-
- Jan 12, 2009
-
-
Iustin Pop authored
This LU can be used to force a push of the config in case it's needed, for example after an upgrade to update the ssconf_release_version file. Reviewed-by: imsnah
-
- Dec 08, 2008
-
-
Iustin Pop authored
This patch changes gnt-node modify and the associated opcode/lu to allow modification of the node offline attribute. Setting a node into offline mode automatically demotes it from the master role. Reviewed-by: ultrotter
-
- Dec 02, 2008
-
-
Iustin Pop authored
This patch adds a new cluster parameter "candidate_pool_size" which tracks the desired size of the list of nodes with the master_candidate flag set. Reviewed-by: imsnah
-
Iustin Pop authored
This patch adds the OpCode, LogicalUnit and gnt-node command for modifying node parameters, more specifically the master candidate flag for a node. Reviewed-by: imsnah
-
- Nov 25, 2008
-
-
Iustin Pop authored
This big patch adds support for:
  - changing NIC/disks in the multi-device model
  - adding/removing NICs
  - adding/removing disks

The patch is big and not very nice; the error checking paths are not very clear. The biggest problem is that from a simple instance.ATTR=VAL change (which didn't throw errors before) now we are creating and removing disks in this LU. Reviewed-by: imsnah
-
- Nov 24, 2008
-
-
Guido Trotter authored
Since the hypervisor is instance dependent we'll get one on instance creation, and use the one in the instance config on relocation. Reviewed-by: iustinp
-
- Nov 20, 2008
-
-
Iustin Pop authored
This patch adds support for multi-disk/multi-nic in:
  - instance add
  - burnin

The start/stop/failover/cluster verify work as expected. Replace disk and grow disk are TODO. There's also a change to gnt-job to allow dictionaries to be listed in gnt-job info. Reviewed-by: imsnah
-
- Oct 16, 2008
-
-
Iustin Pop authored
This patch enables the cluster modify to change:
  - enabled hypervisor list
  - hvparams (per hypervisor)
  - beparams (only the default group)

Syntax:

  gnt-cluster modify -B vcpus=3 -H xen-pvm:no_initrd_path

Validation for parameters is somewhat missing: the individual hypervisors will be checked for syntax and validation, but beparams doesn't have validation yet (anywhere); it should be added here once we have a global method (will come soon). Reviewed-by: imsnah
-
- Oct 14, 2008
-
-
Iustin Pop authored
The patch adds a new ‘--no-wait-for-sync’ parameter to grow-disk similar to the one in instance add, and changes the default to wait. This is cleaner as at the moment when the command returns, we either have a fully synced disk or there is an error. This is a forward-port of rev 1183 on the 1.2 branch. Reviewed-by: ultrotter
-
Iustin Pop authored
This big patch changes the master code to use the beparams. Errors might have crept in, but it passes a small burnin. Reviewed-by: ultrotter
-
Iustin Pop authored
This patch adds a new '-s' parameter to ‘gnt-instance info’ that makes it return only 'static' information. This is much faster, especially for drbd instances. This is a forward-port of rev 1570 on the ganeti-1.2 branch, resending due to some conflicts. Reviewed-by: imsnah
-