- 09 Oct, 2009 1 commit
-
-
Guido Trotter authored
Using the new --timeout option:
  - gnt-instance shutdown is changed to accept a timeout
  - the opcode is changed to hold one
  - the LU is changed to optionally get one
  - the rpc is changed to carry one
  - the backend is changed to take it as a parameter rather than hardcoding it in the function
Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- 05 Oct, 2009 1 commit
-
-
Guido Trotter authored
These two opcodes need to know whether an unknown variant must be forced through or not. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Olivier Tharan <olive@google.com>
-
- 17 Sep, 2009 2 commits
-
-
Iustin Pop authored
One of the issues we have in ganeti is that it's very hard to test the error-handling paths; QA and burnin only test the OK code-path, since it's hard to simulate errors.

LUVerifyCluster is special amongst the LUs in the fact that a) it has a lot of error paths and b) the error paths only log the error, they don't do any rollback or other similar actions. Thus, it's enough for this LU to separate the testing of the error condition from the logging of the error condition.

This patch does this by replacing code blocks of the form:

    if x:
      log_error()
      [y]

into:

    log_error_if(x)
    [if x:
      y
    ]

After this change, it's simple enough to turn on logging of all errors by adding a special case inside log_error_if such that if the incoming opcode has a special ‘debug_simulate_errors’ attribute and it's true, it will log the error unconditionally.

Surprisingly this also turns into an absolute code reduction, since some of the if blocks were simplified. The only downside to this patch is that the various _VerifyX() functions are now stateful (modifying an attribute on the LU instance) instead of returning a boolean result.

Last note: yes, this discovered some error cases in the logging. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
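The refactoring described in this commit can be sketched roughly as follows. This is a minimal illustration, not Ganeti's actual code: the class `VerifySketch`, the attributes `bad` and `messages`, the method `verify_node` and the memory threshold are all hypothetical; only the names `log_error_if` and `debug_simulate_errors` are taken from the commit message.

```python
class VerifySketch:
    """Hypothetical stand-in for an LU whose error paths only log."""

    def __init__(self, debug_simulate_errors=False):
        # Mirrors the opcode attribute named in the commit message.
        self.debug_simulate_errors = debug_simulate_errors
        self.bad = False     # stateful result instead of a boolean return value
        self.messages = []   # collected log lines (stands in for real logging)

    def log_error_if(self, cond, msg):
        # Log when the condition holds, or always when errors are simulated.
        should_log = bool(cond) or self.debug_simulate_errors
        if should_log:
            self.messages.append("ERROR: " + msg)
            self.bad = True
        return should_log

    def verify_node(self, name, free_mb):
        # The test is computed once and kept separate from the logging.
        test = free_mb < 128  # hypothetical threshold
        self.log_error_if(test, "node %s: low memory" % name)
        if test:
            return  # the optional follow-up action "[y]" from the message


healthy = VerifySketch()
healthy.verify_node("node1", 4096)

simulated = VerifySketch(debug_simulate_errors=True)
simulated.verify_node("node1", 4096)
```

With simulation enabled, every logging call fires even though the real test is false, which exercises the message formatting without having to provoke a real failure.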
-
Iustin Pop authored
Currently the output of cluster verify can be parsed for 'ERROR' messages, but that is the only indication we get (error or no error). In order to allow monitoring tools to separate different error conditions, this patch introduces a new output format (“gnt-cluster verify --error-codes”) that changes the output from human-friendly to machine-friendly. In this mode, an error line changes from:

    ERROR: node node1: drbd minor 1 of instance inst1 is not active

to:

    ERROR:ENODEDRBD:node:node1:drbd minor 1 of instance inst1 is not active

i.e. the error line is a ‘:’-separated list of fields, with ERROR in the first place, the error code in the second, the object type (cluster, node, instance) in the third, the name of the object (for nodes/instances) in the fourth, and then the text message.

The patch also removes some of the verbosity of the operation (“Verifying instance X”, “Verifying node X”) since on big clusters these informational messages can quickly fill up an entire screen. The original behaviour can be restored via the ‘--verbose’ option. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
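A monitoring tool could split the machine-friendly format from this commit along these lines. This is a sketch under assumptions: the function `parse_error_line` and the record type `VerifyError` are hypothetical names, and it assumes the name field is always present (possibly empty for cluster-level errors), which the commit message does not state explicitly.

```python
from collections import namedtuple

# Hypothetical record type; field names follow the order given in the message.
VerifyError = namedtuple("VerifyError", "code obj_type obj_name message")


def parse_error_line(line):
    """Parse one '--error-codes' style line, or return None if it is not one."""
    # Split at most four times so that ':' inside the text message survives.
    parts = line.rstrip("\n").split(":", 4)
    if len(parts) != 5 or parts[0] != "ERROR":
        return None
    return VerifyError(*parts[1:])


record = parse_error_line(
    "ERROR:ENODEDRBD:node:node1:drbd minor 1 of instance inst1 is not active"
)
```

Limiting the split to four separators is the main design point: only the trailing text message is free-form, so it is the only field allowed to contain further colons.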
-
- 24 Aug, 2009 1 commit
-
-
Iustin Pop authored
This patch adds a basic version of LUMoveInstance. It doesn't yet support iallocator-mode and it's implemented in old-style (non-TL) mode. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- 17 Aug, 2009 1 commit
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- 14 Aug, 2009 1 commit
-
-
Iustin Pop authored
This can be used for a 'plain' type instance when the underlying storage went away, to recreate the storage (and reinstall) instead of removing the instance and readding it. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- 10 Aug, 2009 1 commit
-
-
Luca Bigliardi authored
Add an 'empty' logical unit to run hooks after cluster initialization. Signed-off-by:
Luca Bigliardi <shammash@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- 04 Aug, 2009 3 commits
-
-
Iustin Pop authored
This patch adds a new opcode and lu for checking disk sizes. Currently it does only top-level disk verification, and also doesn't check primary/secondary node size mismatches (these two are added as TODOs in the Exec() function of the LU). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
This patch modifies OpActivateDisks, LUActivateDisks and gnt-instance activate-disks to support this option and pass it to _AssembleInstanceDisks. The patch is quite trivial, I think; there should be no issues from it unless it is used when not needed. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- 03 Aug, 2009 1 commit
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- 31 Jul, 2009 1 commit
-
-
Michael Hanselmann authored
It migrates all primary instances from the node to their secondaries. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- 22 Jul, 2009 1 commit
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- 17 Jul, 2009 1 commit
-
-
Iustin Pop authored
This patch converts the opcode loading to a pre-built map (at import time) instead of iteration over the globals dict at each call. Microbenchmarks show that this should be around three times faster, and burnin still passes. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
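The idea of a map built once at import time can be sketched as below. The class names, the `OP_ID` values, `OP_MAPPING` and `load_opcode` are hypothetical, and the sketch collects classes via `__subclasses__()` rather than scanning the module globals, purely to stay self-contained; the commit itself does not say how the map is populated.

```python
class OpCode:
    """Hypothetical base class; each subclass carries a unique OP_ID."""
    OP_ID = "OP_ABSTRACT"


class OpExampleA(OpCode):
    OP_ID = "OP_EXAMPLE_A"


class OpExampleB(OpCode):
    OP_ID = "OP_EXAMPLE_B"


def _build_op_map():
    # Runs once, when the module is imported.
    return {cls.OP_ID: cls for cls in OpCode.__subclasses__()}


OP_MAPPING = _build_op_map()


def load_opcode(op_id):
    """Resolve an opcode class with one dict lookup instead of a scan."""
    try:
        return OP_MAPPING[op_id]
    except KeyError:
        raise ValueError("Unknown opcode %r" % (op_id,))
```

Each call is then a single dictionary lookup, whereas the earlier approach described in the message walked every module-level name on every call.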
-
- 19 Jun, 2009 2 commits
-
-
Iustin Pop authored
This patch adds a new (global) opcode flag 'dry_run' which, when True, causes early exit from the LU workflow, returning a special value from the LU object (initialized in the parent LogicalUnit class, and which, if not overridden by child LUs, will be None). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
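A rough sketch of the early-exit behaviour described in this commit follows. Only the flag name `dry_run` and the class name `LogicalUnit` come from the message; `dry_run_result`, `run_lu`, `Op` and `LUExample` are hypothetical, and placing the exit right after the prerequisite check is an assumption.

```python
class Op:
    """Hypothetical opcode carrying only the global dry_run flag."""

    def __init__(self, dry_run=False):
        self.dry_run = dry_run


class LogicalUnit:
    def __init__(self, op):
        self.op = op
        # Special value returned on a dry run; child LUs may override it.
        self.dry_run_result = None

    def CheckPrereq(self):
        pass

    def Exec(self):
        raise NotImplementedError


class LUExample(LogicalUnit):
    def Exec(self):
        self.executed = True
        return "done"


def run_lu(lu):
    """Hypothetical driver: validate, then either exit early or execute."""
    lu.CheckPrereq()
    if getattr(lu.op, "dry_run", False):
        return lu.dry_run_result  # early exit, nothing is modified
    return lu.Exec()


dry = LUExample(Op(dry_run=True))
dry_value = run_lu(dry)
real_value = run_lu(LUExample(Op()))
```

Running the checks before the early exit means a dry run still reports invalid requests, while skipping everything that would change state.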
-
Iustin Pop authored
This simple patch makes every opcode extend the base opcode's __slots__. This way we can add slots across all opcodes, for example 'dry-run'. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
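The pattern of extending the base `__slots__` might look like the sketch below. The subclass `OpShutdownExample`, its `instance_name` field and the keyword-validating constructor are hypothetical; the sketch only illustrates how a slot added to the base list becomes valid on every opcode.

```python
class OpCode:
    # Slots listed here are shared by every opcode.
    __slots__ = ["dry_run"]

    def __init__(self, **kwargs):
        # __slots__ doubles as the list of accepted field names.
        for key in kwargs:
            if key not in self.__slots__:
                raise TypeError("Unknown parameter %r" % (key,))
        for name in self.__slots__:
            setattr(self, name, kwargs.get(name))


class OpShutdownExample(OpCode):
    # Extend, rather than replace, the base list. Python permits re-listing
    # an inherited slot name; it is repeated here so that __slots__ names
    # every field of the opcode.
    __slots__ = OpCode.__slots__ + ["instance_name"]


op = OpShutdownExample(instance_name="inst1", dry_run=True)
```

Adding another name to `OpCode.__slots__` would then make it available on all subclasses without touching each of them.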
-
- 08 Jun, 2009 1 commit
-
-
Guido Trotter authored
Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- 27 May, 2009 1 commit
-
-
Iustin Pop authored
This (somewhat big) patch adds support for remotely rebooting the nodes via whatever support the hypervisor has for such a concept. For KVM/fake (and containers in the future) this just uses sysrq plus a ‘reboot’ call if the sysrq method failed. For Xen, it first tries the above, and then Xen-hypervisor reboot (we first try sysrq since that just requires opening a file handle, whereas xen reboot means launching an external utility).

The user interface is:

    # gnt-node powercycle node5
    Are you sure you want to hard powercycle node node5?
    y/[n]/?: y
    Reboot scheduled in 5 seconds

The node reboots hopefully after sending the reply. In case the clock is broken, “time.sleep(5)” might take ages (but then I suspect SSL negotiation wouldn't work). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- 19 May, 2009 1 commit
-
-
Iustin Pop authored
This patch modifies the start instance script, opcode and logical unit to support temporary startup parameters. Unlike 1.2, where only the kernel arguments could be changed (which made the feature xen-pvm specific), this version supports changing all hypervisor and backend parameters (with appropriate checks). This is much more flexible, and allows for example:
  - start with different, temporary kernel
  - start with different memory size
Note: in later versions, this should be extended to cover disk parameters as well (e.g. start with drbd without flushes, start with drbd in async mode, etc.). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
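Temporary startup parameters can be thought of as a one-off overlay on the stored values. The sketch below is an assumption about the general shape only: `effective_params`, the parameter names and the "reject unknown keys" check are hypothetical and stand in for the "appropriate checks" the message mentions.

```python
def effective_params(stored, temporary, allowed_keys):
    """Return the stored parameters overlaid with temporary ones.

    The stored dict is left untouched, so the override lasts for one
    start only.
    """
    unknown = set(temporary) - set(allowed_keys)
    if unknown:
        raise ValueError("Unknown parameters: %s" % ", ".join(sorted(unknown)))
    merged = dict(stored)
    merged.update(temporary)
    return merged


# Hypothetical stored parameters for one instance.
stored = {"kernel_path": "/boot/vmlinuz-default", "memory": 512}
started_with = effective_params(stored, {"memory": 1024}, stored.keys())
```

Because the merge works on a copy, the next start without overrides falls back to the persistent configuration.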
-
- 24 Feb, 2009 1 commit
-
-
Iustin Pop authored
This patch removes the extra_args parameter and instead switches the instance to the HV_KERNEL_ARGS hypervisor option. This is a big change, but it's a needed cleanup: this extra parameter on all RPC calls is not generic, and we also need to have a persistent value here. Reviewed-by: imsnah
-
- 10 Feb, 2009 1 commit
-
-
Iustin Pop authored
This patch adds LU and cli-level support for modification of the node drained flag. It is similar to the offline changes. Reviewed-by: imsnah
-
- 06 Feb, 2009 1 commit
-
-
Iustin Pop authored
This patch fixes a couple of issues with the job listing:
  - in case of a non-existing job, nicely raise 404 instead of 500
  - in the job detail listing, also list the job log, the job timestamps, etc.
  - the opcode migrate instance was missing its description field
Reviewed-by: imsnah
-
- 04 Feb, 2009 1 commit
-
-
Iustin Pop authored
This patch adds the framework for, and enables, lockless OpQueryInstances. This means that instances will be shown in ERROR_up or ERROR_down state, even though this is not an error (but just an in-progress job). The framework is implemented as follows:
  - the OpQueryInstances, OpQueryNodes and OpQueryExports opcodes take an additional “use_locking” flag which denotes whether to lock or not; this patch only implements this for LUQueryInstances
  - the luxi query functions take an additional argument use_locking which is passed to the master daemon, and then passed to the above opcodes
  - cli.py exports a new SYNC_OPT command line option which sets this flag to true
  - except for gnt-instance list, which uses this option, and for name-only queries (e.g. QueryNodes(fields=["names"])), all other callers set this flag to True
  - RAPI also sets the flag to True
The patch was tested with a continuous (0.2s sleep in-between) gnt-instance list during a burnin, and no problems were observed. Reviewed-by: ultrotter
-
- 20 Jan, 2009 1 commit
-
-
Iustin Pop authored
Reviewed-by: ultrotter
-
- 13 Jan, 2009 1 commit
-
-
Iustin Pop authored
This is a forward port via copy (rather than cherry-picking individual patches) of the latest code on the 1.2 branch related to the migration. The changes compared to 1.2 are that we don't need the IdentifyDisks step anymore (the drbd rpc calls are independent now), and the rpc module improvements. Reviewed-by: ultrotter
-
- 12 Jan, 2009 1 commit
-
-
Iustin Pop authored
This LU can be used to force a push of the config in case it's needed, for example after an upgrade to update the ssconf_release_version file. Reviewed-by: imsnah
-
- 08 Dec, 2008 1 commit
-
-
Iustin Pop authored
This patch changes gnt-node modify and the associated opcode/lu to allow modification of the node offline attribute. Setting a node into offline mode automatically demotes it from the master role. Reviewed-by: ultrotter
-
- 02 Dec, 2008 2 commits
-
-
Iustin Pop authored
This patch adds a new cluster parameter "candidate_pool_size" which tracks the desired size of the list of nodes with the master_candidate flag set. Reviewed-by: imsnah
-
Iustin Pop authored
This patch adds the OpCode, LogicalUnit and gnt-node command for modifying node parameters, more specifically the master candidate flag for a node. Reviewed-by: imsnah
-
- 25 Nov, 2008 1 commit
-
-
Iustin Pop authored
This big patch adds support for:
  - changing NIC/disks in the multi-device model
  - adding/removing NICs
  - adding/removing disks
The patch is big and not very nice; the error checking paths are not very clear. The biggest problem is that from a simple instance.ATTR=VAL change (which didn't throw errors before) now we are creating and removing disks in this LU. Reviewed-by: imsnah
-
- 24 Nov, 2008 1 commit
-
-
Guido Trotter authored
Since the hypervisor is instance dependent we'll get one on instance creation, and use the one in the instance config on relocation. Reviewed-by: iustinp
-
- 20 Nov, 2008 1 commit
-
-
Iustin Pop authored
This patch adds support for multi-disk/multi-nic in:
  - instance add
  - burnin
The start/stop/failover/cluster verify work as expected. Replace disk and grow disk are TODO. There's also a change to gnt-job to allow dictionaries to be listed in gnt-job info. Reviewed-by: imsnah
-
- 16 Oct, 2008 1 commit
-
-
Iustin Pop authored
This patch enables the cluster modify to change:
  - enabled hypervisor list
  - hvparams (per hypervisor)
  - beparams (only the default group)
Syntax: gnt-cluster modify -B vcpus=3 -H xen-pvm:no_initrd_path
Validation for parameters is somewhat missing: the individual hypervisors will be checked for syntax and validation, but beparams doesn't have validation yet (anywhere); it should be added here once we have a global method (will come soon). Reviewed-by: imsnah
-
- 14 Oct, 2008 5 commits
-
-
Iustin Pop authored
The patch adds a new ‘--no-wait-for-sync’ parameter to grow-disk similar to the one in instance add, and changes the default to wait. This is cleaner, since by the time the command returns we either have a fully synced disk or there is an error. This is a forward-port of rev 1183 on the 1.2 branch. Reviewed-by: ultrotter
-
Iustin Pop authored
This big patch changes the master code to use the beparams. Errors might have crept in, but it passes a small burnin. Reviewed-by: ultrotter
-
Iustin Pop authored
This patch adds a new '-s' parameter to ‘gnt-instance info’ that makes it return only 'static' information. This is much faster, especially for drbd instances. This is a forward-port of rev 1570 on the ganeti-1.2 branch, resending due to some conflicts. Reviewed-by: imsnah
-
Iustin Pop authored
Reviewed-by: imsnah
-
Iustin Pop authored
This big patch changes instance create to the new hvparams structure. Old parameters are removed, so old jobs or old instance files will break current clusters. Reviewed-by: ultrotter
-
- 08 Oct, 2008 1 commit
-
-
Iustin Pop authored
This (big) patch moves the hypervisor type from the cluster to the instance level; the cluster attribute remains as the default hypervisor, and will be renamed accordingly in a later patch. The cluster also gains the ‘enable_hypervisors’ attribute, and instances can be created with any of the enabled ones (no provision yet for changing that attribute).

The many, many changes in the rpc/backend layer are due to the fact that all backend code read the hypervisor from the local copy of the config, and now we have to send it (either in the instance object, or as a separate parameter) for each function.

The node list by default will list the node free/total memory for the default hypervisor; a new flag should be added to select another hypervisor. Instance list has a new field, hypervisor, that shows the instance hypervisor. Cluster verify runs for all enabled hypervisor types.

The new FIXMEs are related to IAllocator, since now the node total/free/used memory counts are wrong (we can't reliably compute the free memory). Reviewed-by: imsnah
-