- Feb 11, 2010
Iustin Pop authored
Also automatically fix opcodes which have this missing in the LU init routine. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Feb 10, 2010
Iustin Pop authored
Commit 154b9580 changed (correctly) the __slots__ usage, but this broke dumpers/loaders since we relied directly on the class's own __slots__ field. To compensate, we introduce a simple function for computing the slots across all parent classes (if any), and use this instead of __slots__ directly. Note: the _all_slots() function is duplicated between objects.py and opcodes.py, but the only other option is to introduce a lang.py for such very basic language items. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
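The helper described here might look roughly like the following minimal sketch; the class names and the exact traversal are illustrative stand-ins, not Ganeti's actual code:

```python
# Walk the MRO and concatenate every ancestor's __slots__, instead of
# reading cls.__slots__ directly (which only covers the defining class).
class BaseOpCode(object):
    __slots__ = ["dry_run"]

class OpShutdown(BaseOpCode):
    __slots__ = ["instance_name", "timeout"]

def _all_slots(cls):
    """Collect __slots__ across the whole class hierarchy."""
    slots = []
    for parent in cls.__mro__:
        slots.extend(getattr(parent, "__slots__", []))
    return slots

print(_all_slots(OpShutdown))
# ['instance_name', 'timeout', 'dry_run']
```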
-
- Feb 09, 2010
Iustin Pop authored
This patch adds an early_release parameter to the OpReplaceDisks and OpEvacuateNode opcodes, allowing earlier release of storage and, more importantly, of internal Ganeti locks. With early release, any locks and storage on all secondary nodes are released early. This is valid for change secondary (where we remove the storage on the old secondary, and release the locks on the old and new secondary) and replace on secondary (where we remove the old storage and release the lock on the secondary node). Using this, on a three-node setup:
- instance1 on nodes A:B
- instance2 on nodes C:B
it is possible to run in parallel a replace-disks -s (on secondary) for instances 1 and 2. Replace on primary will remove the storage, but not the locks, as we use the primary node later in the LU to check consistency. It is debatable whether to also remove the locks on the primary node, and thus make replace-disks hold zero locks during the sync. While this would allow greatly enhanced parallelism, let's first see how removal of secondary locks works. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- Jan 27, 2010
Balazs Lecz authored
According to http://docs.python.org/reference/datamodel.html#slots :
- The action of a __slots__ declaration is limited to the class where it is defined. As a result, subclasses will have a __dict__ unless they also define __slots__ (which must only contain names of any *additional* slots).
- If a class defines a slot also defined in a base class, the instance variable defined by the base class slot is inaccessible (except by retrieving its descriptor directly from the base class). This renders the meaning of the program undefined. In the future, a check may be added to prevent this.
Signed-off-by:
Balazs Lecz <leczb@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com> Signed-off-by:
Iustin Pop <iustin@google.com>
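Both documented behaviours are easy to demonstrate; the classes below are illustrative only, not Ganeti code:

```python
class WithSlots(object):
    __slots__ = ["a"]

class ForgotSlots(WithSlots):
    pass                      # no __slots__ here ...

class AlsoSlots(WithSlots):
    __slots__ = ["b"]         # ... only *additional* names here

x = ForgotSlots()
x.anything = 1                # works: this subclass grew a __dict__ again

y = AlsoSlots()
y.a = 1                       # slot inherited from the base class
y.b = 2
try:
    y.c = 3
except AttributeError:
    print("no __dict__, unknown attributes rejected")
```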
-
- Dec 16, 2009
Iustin Pop authored
This adds a new opcode parameter ‘name_check’ (similar to ip_check) that is not required to be present (to ease backwards compatibility for tools). It also adds a CheckArguments to LUCreateInstance and changes the workflow related to instance IP checks and NIC initialisation based on it. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
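A minimal sketch of how such an optional opcode attribute can be defaulted for backwards compatibility; the function name, the stand-in opcode class and the chosen default are all hypothetical, not the actual CheckArguments code:

```python
class FakeOp(object):
    """Stand-in for a real opcode instance sent by an older tool."""
    pass

def check_arguments(op):
    # Older clients may omit name_check entirely; fill in a default
    # instead of failing (the default True is purely illustrative).
    if not hasattr(op, "name_check"):
        op.name_check = True

op = FakeOp()
check_arguments(op)
print(op.name_check)          # True
```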
-
- Nov 03, 2009
Iustin Pop authored
A newer version of pylint, more warnings… Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Nov 02, 2009
Iustin Pop authored
Currently repair storage has two issues:
- down instances abort the operation, even though they should be ignored (it's not technically possible to know their disk status unless we were to activate their disks)
- if the VG is so broken that disks cannot be activated via gnt-instance activate-disks or gnt-instance startup, it's not possible to repair the VG at all
The patch makes the opcode skip down instances and also introduces an ``--ignore-consistency`` flag for forcing the execution of the LU. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Oct 13, 2009
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Guido Trotter authored
All the LUs that shut down the instance need to be able to pass the timeout parameter as well. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Oct 09, 2009
Guido Trotter authored
Using the new --timeout option:
- gnt-instance shutdown is changed to accept a timeout
- the opcode is changed to hold one
- the LU is changed to optionally get one
- the rpc is changed to carry one
- the backend is changed to take it as a parameter rather than hardcoding it in the function
Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
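The shape of that plumbing might look like this rough sketch; every function, constant and name below is a stand-in for the real opcode/LU/RPC/backend layers, not Ganeti's actual code:

```python
DEFAULT_SHUTDOWN_TIMEOUT = 120  # seconds; illustrative default

def backend_instance_shutdown(instance_name, timeout):
    # The backend now receives the timeout as a parameter instead
    # of hardcoding it in the function body.
    print("shutting down %s, waiting up to %ss" % (instance_name, timeout))

def rpc_call_instance_shutdown(node, instance_name, timeout):
    # The rpc layer just carries the value through to the node.
    backend_instance_shutdown(instance_name, timeout)

class Op(object):                # stand-in for the real opcode
    instance_name = "inst1"
    timeout = 30

def lu_exec(op):
    # The LU optionally gets a timeout from the opcode.
    timeout = getattr(op, "timeout", DEFAULT_SHUTDOWN_TIMEOUT)
    rpc_call_instance_shutdown("node1", op.instance_name, timeout)

lu_exec(Op())
```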
-
- Oct 05, 2009
Guido Trotter authored
These two opcodes need to know whether an unknown variant must be forced through or not. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Olivier Tharan <olive@google.com>
-
- Sep 17, 2009
Iustin Pop authored
One of the issues we have in ganeti is that it's very hard to test the error-handling paths; QA and burnin only test the OK code-path, since it's hard to simulate errors. LUVerifyCluster is special amongst the LUs in that a) it has a lot of error paths and b) the error paths only log the error, they don't do any rollback or other similar actions. Thus, it's enough for this LU to separate the testing of the error condition from the logging of the error condition. This patch does this by replacing code blocks of the form:
  if x:
    log_error()
    [y]
with:
  log_error_if(x)
  [if x: y]
After this change, it's simple to turn on logging of all errors by adding a special case inside log_error_if such that if the incoming opcode has a special ‘debug_simulate_errors’ attribute and it's true, it will log the error unconditionally. Surprisingly, this also turns into an absolute code reduction, since some of the if blocks were simplified. The only downside to this patch is that the various _VerifyX() functions are now stateful (modifying an attribute on the LU instance) instead of returning a boolean result. Last note: yes, this discovered some error cases in the logging. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
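A minimal sketch of that refactoring: the error test and the error logging are folded into one helper, which also gives a single place to honour debug_simulate_errors. All names here are illustrative, not the LU's real code:

```python
class VerifyHelper(object):
    def __init__(self, simulate_errors=False):
        self.simulate_errors = simulate_errors
        self.bad = False             # stateful result, as noted above

    def _error_if(self, cond, msg, *args):
        """Log msg if cond holds (or if errors are being simulated)."""
        if cond or self.simulate_errors:
            self.bad = True
            print("ERROR: " + (msg % args))

v = VerifyHelper(simulate_errors=True)
v._error_if(False, "drbd minor %d not active", 1)  # logged anyway
print(v.bad)                                       # True
```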
-
Iustin Pop authored
Currently the output of cluster verify can be parsed for 'ERROR' messages, but that is the only indication we get (error or no error). In order to allow monitoring tools to separate different error conditions, this patch introduces a new output format (“gnt-cluster verify --error-codes”) that changes the output from human-friendly to machine-friendly. In this mode, an error line changes from:
  ERROR: node node1: drbd minor 1 of instance inst1 is not active
to:
  ERROR:ENODEDRBD:node:node1:drbd minor 1 of instance inst1 is not active
i.e. the error message is a list of ‘:’-separated fields, with ERROR in the first place, the error code in the second, the object type (cluster, node, instance) in the third, the name of the object (for nodes/instances) in the fourth, and then the text message. The patch also removes some of the verbosity of the operation (“Verifying instance X”, “Verifying node X”) since on big clusters these informational messages can quickly fill up an entire screen. The original behaviour can be restored via the ‘--verbose’ option. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
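A small sketch of how a monitoring tool might consume the machine-friendly line shown above; the parsing code is illustrative, not part of the patch:

```python
line = ("ERROR:ENODEDRBD:node:node1:"
        "drbd minor 1 of instance inst1 is not active")

# Split into the five documented fields; the message itself may
# contain ':' characters, so cap the number of splits at four.
tag, code, obj_type, name, message = line.split(":", 4)
assert tag == "ERROR"
print(code)       # ENODEDRBD
print(obj_type)   # node
print(name)       # node1
print(message)    # drbd minor 1 of instance inst1 is not active
```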
-
- Aug 24, 2009
Iustin Pop authored
This patch adds a basic version of LUMoveInstance. It doesn't yet support iallocator-mode and it's implemented in old-style (non-TL) mode. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Aug 17, 2009
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Aug 14, 2009
Iustin Pop authored
This can be used for a 'plain'-type instance when the underlying storage went away, to recreate the storage (and reinstall) instead of removing the instance and re-adding it. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Aug 10, 2009
Luca Bigliardi authored
Add an 'empty' logical unit to run hooks after cluster initialization. Signed-off-by:
Luca Bigliardi <shammash@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Aug 04, 2009
Iustin Pop authored
This patch adds a new opcode and lu for checking disk sizes. Currently it only does top-level disk verification, and it doesn't check primary/secondary node size mismatches (these two are added as TODOs in the Exec() function of the LU). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
This patch modifies OpActivateDisks, LUActivateDisks and gnt-instance activate-disks to support and pass this option to _AssembleInstanceDisks. The patch is quite trivial, I think; there should be no issues from it except when it is used unnecessarily. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- Aug 03, 2009
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Jul 31, 2009
Michael Hanselmann authored
It migrates all primary instances from the node to their secondaries. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Jul 22, 2009
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- Jul 17, 2009
Iustin Pop authored
This patch converts the opcode loading to a pre-built map (at import time) instead of iteration over the globals dict at each call. Microbenchmarks show that this should be around three times faster, and burnin still passes. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
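A minimal sketch of the idea: scan the module namespace once, at import time, and build an OP_ID-to-class map, so that loading an opcode becomes a dict lookup instead of an iteration over globals() on every call. The class and constant names are illustrative:

```python
class OpCode(object):
    OP_ID = None

class OpStartupInstance(OpCode):
    OP_ID = "OP_INSTANCE_STARTUP"

class OpShutdownInstance(OpCode):
    OP_ID = "OP_INSTANCE_SHUTDOWN"

# Built exactly once, when the module is imported:
OP_MAPPING = dict((v.OP_ID, v) for v in list(globals().values())
                  if isinstance(v, type) and issubclass(v, OpCode)
                  and v.OP_ID is not None)

def load_opcode(op_id):
    return OP_MAPPING[op_id]       # O(1) lookup per load

print(load_opcode("OP_INSTANCE_SHUTDOWN").__name__)
```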
-
- Jun 19, 2009
Iustin Pop authored
This patch adds a new (global) opcode flag 'dry_run' which, when True, causes early exit from the LU workflow, returning a special value from the LU object (initialized in the parent LogicalUnit class; if not overridden by child LUs it will be None). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
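A sketch of that dry-run flow, with illustrative names rather than the real processor code: after the prerequisite checks pass, the executor returns the LU's canned dry-run result instead of calling Exec().

```python
class LogicalUnit(object):
    dry_run_result = None          # default unless a child LU overrides it

    def CheckPrereq(self):
        pass

    def Exec(self):
        return "real work done"

def execute(lu, op):
    lu.CheckPrereq()
    if getattr(op, "dry_run", False):
        return lu.dry_run_result   # early exit, nothing is changed
    return lu.Exec()

class Op(object):
    dry_run = True

print(execute(LogicalUnit(), Op()))   # None
```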
-
Iustin Pop authored
This simple patch makes all opcodes extend the base opcode's __slots__. This way we can add slots across all opcodes, for example 'dry_run'. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
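The pattern this entry introduces, sketched with illustrative opcode names: each opcode builds its __slots__ on top of the base list, so a new common slot only has to be declared once. (As the Jan 27, 2010 entry above explains, repeating base slot names in subclasses was later found problematic and replaced by listing only the additional names.)

```python
class OpCode(object):
    __slots__ = ["dry_run"]

class OpRebootInstance(OpCode):
    __slots__ = OpCode.__slots__ + ["instance_name", "reboot_type"]

op = OpRebootInstance()
op.dry_run = True              # slot from the base list
op.instance_name = "inst1"     # slot added by this opcode
```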
-
- Jun 08, 2009
Guido Trotter authored
Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- May 27, 2009
Iustin Pop authored
This (somewhat big) patch adds support for remotely rebooting the nodes via whatever support the hypervisor has for such a concept. For KVM/fake (and containers in the future) this just uses sysrq plus a ‘reboot’ call if the sysrq method fails. For Xen, it first tries the above, and then a Xen-hypervisor reboot (we first try sysrq since that just requires opening a file handle, whereas xen reboot means launching an external utility). The user interface is:
  # gnt-node powercycle node5
  Are you sure you want to hard powercycle node node5? y/[n]/?: y
  Reboot scheduled in 5 seconds
The node reboots, hopefully after sending the reply. In case the clock is broken, “time.sleep(5)” might take ages (but then I suspect SSL negotiation wouldn't work). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
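A rough sketch of the sysrq-then-reboot fallback described above. The /proc paths and the 'b' (immediate reboot) trigger are standard Linux sysrq usage; everything else is illustrative, and running this as root on a Linux box really does hard-reboot it:

```python
import subprocess
import time

def powercycle(delay=5):
    time.sleep(delay)                      # "Reboot scheduled in 5 seconds"
    try:
        with open("/proc/sys/kernel/sysrq", "w") as f:
            f.write("1\n")                 # make sure sysrq is enabled
        with open("/proc/sysrq-trigger", "w") as f:
            f.write("b\n")                 # 'b' = reboot immediately
    except EnvironmentError:
        # sysrq failed (e.g. files missing): fall back to plain reboot
        subprocess.call(["reboot", "-f"])
```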
-
- May 19, 2009
Iustin Pop authored
This patch modifies the start instance script, opcode and logical unit to support temporary startup parameters. Different from 1.2, where only the kernel arguments supported changes (and were thus xen-pvm specific), this version supports changing all hypervisor and backend parameters (with appropriate checks). This is much more flexible, and allows for example:
- starting with a different, temporary kernel
- starting with a different memory size
Note: in later versions, this should be extended to cover disk parameters as well (e.g. start with drbd without flushes, start with drbd in async mode, etc.). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
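A small sketch of the temporary-override idea: parameters given at start time are overlaid on the configured ones for this run only, and never written back to the configuration. The function name, keys and values are all illustrative:

```python
def startup_params(configured, overrides):
    """Merge one-off startup overrides over the configured params."""
    merged = configured.copy()
    merged.update(overrides)     # overrides win, for this startup only
    return merged

hvparams = startup_params(
    {"kernel_path": "/boot/vmlinuz", "root_path": "/dev/sda1"},
    {"kernel_path": "/boot/vmlinuz-test"})    # temporary kernel
beparams = startup_params({"memory": 512}, {"memory": 1024})
print(hvparams["kernel_path"], beparams["memory"])
```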
-
- Feb 24, 2009
Iustin Pop authored
This patch removes the extra_args parameter and instead switches the instance to the HV_KERNEL_ARGS hypervisor option. This is a big change, but it's a needed cleanup: this extra parameter on all RPC calls is not generic, and we also need to have a persistent value here. Reviewed-by: imsnah
-
- Feb 10, 2009
Iustin Pop authored
This patch adds LU and cli-level support for modification of the node drained flag. It is similar to the offline changes. Reviewed-by: imsnah
-
- Feb 06, 2009
Iustin Pop authored
This patch fixes a couple of issues with the job listing:
- in case of a non-existing job, nicely raise 404 instead of 500
- in the job detail listing, also list the job log, the job timestamps, etc.
- the opcode migrate instance was missing its description field
Reviewed-by: imsnah
-
- Feb 04, 2009
Iustin Pop authored
This patch adds the framework for, and enables, lockless OpQueryInstances. This means that instances will be shown in ERROR_up or ERROR_down state, even though this is not an error (but just an in-progress job). The framework is implemented as follows:
- the OpQueryInstances, OpQueryNodes and OpQueryExports opcodes take an additional “use_locking” flag which denotes whether to lock or not; this patch only implements this for LUQueryInstances
- the luxi query functions take an additional argument use_locking which is passed to the master daemon, and then passed to the above opcodes
- cli.py exports a new SYNC_OPT command line option which implements setting this flag to true
- except for gnt-instance list, which uses this option, and for name-only queries (e.g. QueryNodes(fields=["names"])), all other callers set this flag to True
- RAPI also sets the flag to True
The patch was tested with a continuous (0.2s sleep in-between) gnt-instance list during a burnin, and no problems were observed. Reviewed-by: ultrotter
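A sketch of the client-side effect; FakeClient stands in for the real luxi client and all names are illustrative. With use_locking=False the master answers from its in-memory state without taking instance locks, so the query cannot block behind a long-running job, at the price of possibly seeing in-flux state:

```python
class FakeClient(object):
    def QueryInstances(self, names, fields, use_locking):
        mode = "locked" if use_locking else "lockless"
        return [(name, mode) for name in names]

client = FakeClient()
# A monitoring loop would poll like this:
print(client.QueryInstances(["inst1"], ["name", "status"], False))
# A caller needing a consistent, locked view (gnt-instance list
# with SYNC_OPT) would instead pass use_locking=True.
```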
-
- Jan 20, 2009
Iustin Pop authored
Reviewed-by: ultrotter
-
- Jan 13, 2009
Iustin Pop authored
This is a forward port via copy (and not individual patch cherry-picks) of the latest code on the 1.2 branch related to the migration. The changes compared to 1.2 are the fact that we don't need the IdentifyDisks step anymore (the drbd rpc calls are independent now), and the rpc module improvements. Reviewed-by: ultrotter
-
- Jan 12, 2009
Iustin Pop authored
This LU can be used to force a push of the config in case it's needed, for example after an upgrade to update the ssconf_release_version file. Reviewed-by: imsnah
-
- Dec 08, 2008
Iustin Pop authored
This patch changes gnt-node modify and the associated opcode/lu to allow modification of the node offline attribute. Setting a node into offline mode automatically demotes it from the master role. Reviewed-by: ultrotter
-
- Dec 02, 2008
Iustin Pop authored
This patch adds a new cluster parameter "candidate_pool_size" which tracks the desired size of the list of nodes with the master_candidate flag set. Reviewed-by: imsnah
-
Iustin Pop authored
This patch adds the OpCode, LogicalUnit and gnt-node command for modifying node parameters, more specifically the master candidate flag for a node. Reviewed-by: imsnah
-
- Nov 25, 2008
Iustin Pop authored
This big patch adds support for:
- changing NICs/disks in the multi-device model
- adding/removing NICs
- adding/removing disks
The patch is big and not very nice; the error-checking paths are not very clear. The biggest problem is that from a simple instance.ATTR=VAL change (which didn't throw errors before), we are now creating and removing disks in this LU. Reviewed-by: imsnah
-