- Aug 04, 2009
-
-
Iustin Pop authored
This patch adds a new opcode and LU for checking disk sizes. Currently it performs only top-level disk verification and does not check primary/secondary node size mismatches (both are noted as TODOs in the Exec() function of the LU). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
This patch modifies OpActivateDisks, LUActivateDisks and gnt-instance activate-disks to support and pass this option to _AssembleInstanceDisks. The patch is quite trivial I think; there should be no issues from it except if used when not needed. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
This is identical to dc30b0e4 but applied to gnt-backup. Thanks to user ocaner for catching it. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Jul 08, 2009
-
-
Guido Trotter authored
This allows failing over in certain corner cases, such as a 2 node cluster with one node down. The man page is also updated to document this dangerous option and how to recover from this situation. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Jun 30, 2009
-
-
Iustin Pop authored
This patch fixes a few node readd issues. Currently, the node readd consists of two opcodes:
  - OpSetNodeParams, which resets the offline/drained flags
  - OpAddNode (with readd=True), which reconfigures the node
The problem is that between these two, the configuration is inconsistent for certain cluster configurations. Thus, this patch removes the first opcode and modifies LUAddNode to deal with this case too.
The patch also modifies the computation of the intended master_candidate status, and actually sets the readded node to master candidate if needed. Previously, we didn't modify the existing node at all.
Finally, the patch modifies the bottom of the Exec() function for this LU to:
  - trigger a node update, which in turn redistributes the ssconf files to all nodes (and thus the new node too)
  - if the new node is not a master candidate, call the node_demote_from_mc RPC so that old master files are cleared
My testing shows this behaves correctly for various cases. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Jun 29, 2009
-
-
Iustin Pop authored
This patch adds a ‘role’ node list field, which shows a one-character node status. This is a simpler way to see the node status than selecting all the flags individually. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- Jun 17, 2009
-
-
Iustin Pop authored
Currently running “gnt-instance list -o+vcpus” fails with a cryptic message:
  Unhandled Ganeti error: vcpus
This is due to multiple issues:
  - in some corner cases cmdlib.py raises an errors.ParameterError but this is not handled by cli.py
  - LUQueryInstances declares ‘vcpus’ as a supported field, but doesn't handle it, so instead of failing with an unknown parameter error, e.g.:
      Failure: prerequisites not met for this operation: Unknown output fields selected: vcpuscd
    it raises the ParameterError message
This patch:
  - adds handling of 'vcpus' to LUQueryInstances
  - adds handling of the ParameterError exception to cli.py
  - changes the 'else: raise errors.ParameterError' in the field handling of LUQueryInstances to an assert, since it's a programmer error if we reached this step
With this, a future unhandled parameter will show:
  gnt-instance list -o+vcpus
  Unhandled protocol error while talking to the master daemon: Caught exception: Declared but unhandled parameter 'vcpus'
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
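The switch from raising ParameterError to an assert in the field handling can be illustrated with a minimal sketch. This is not the Ganeti code: the ParameterError class below is a local stand-in, and DECLARED_FIELDS, get_field and the instance dict layout are hypothetical names chosen for the example.

```python
class ParameterError(Exception):
    """Local stand-in for errors.ParameterError (assumption)."""


# Hypothetical set of fields the query declares as supported.
DECLARED_FIELDS = frozenset(["name", "vcpus"])


def get_field(instance, field):
    """Return the value of one output field for an instance dict."""
    if field not in DECLARED_FIELDS:
        # User error: the field was never declared.
        raise ParameterError("Unknown output fields selected: %s" % field)
    if field == "name":
        return instance["name"]
    elif field == "vcpus":
        return instance["vcpus"]
    else:
        # Programmer error: the field is declared but has no handler.
        assert False, "Declared but unhandled parameter '%s'" % field
```

With this split, a user typo produces a regular error, while a field that is declared but not handled fails loudly during development.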
-
Iustin Pop authored
The size of the instance's disk was not shown in “gnt-instance info”. This patch adds it and formats it nicely if possible. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- May 19, 2009
-
-
Iustin Pop authored
This patch modifies the start instance script, opcode and logical unit to support temporary startup parameters. Unlike 1.2, where only the kernel arguments could be changed (making the feature xen-pvm specific), this version supports changing all hypervisor and backend parameters (with appropriate checks). This is much more flexible, and allows for example:
  - starting with a different, temporary kernel
  - starting with a different memory size
Note: in later versions, this should be extended to cover disk parameters as well (e.g. start with drbd without flushes, start with drbd in async mode, etc.). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- May 18, 2009
-
-
Guido Trotter authored
Currently QueryJob returns "None" when a wrong job ID is passed. Handle this in gnt-job list, by printing an error for each wrong job, and still giving output for all the jobs which actually do exist. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
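The handling described for missing jobs can be sketched roughly as below. This is an illustration only: format_job_list is a hypothetical helper, and it assumes the query results come back as a list parallel to the requested job IDs, with None standing in for each job that does not exist.

```python
def format_job_list(job_ids, results):
    """Split query results into printable rows and error messages.

    Assumes results is parallel to job_ids and that a missing job
    is represented by None.
    """
    rows = []
    errors = []
    for job_id, result in zip(job_ids, results):
        if result is None:
            # Report the bad ID but keep processing the remaining jobs.
            errors.append("Job %s not found" % job_id)
        else:
            rows.append(result)
    return rows, errors
```

Collecting errors separately lets the caller print every valid job and still report each wrong ID.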
-
- May 13, 2009
-
-
Guido Trotter authored
Currently running gnt-cluster modify --no-lvm-storage is silently ignored, as it passes a None value in vg_name, which is the same as not modifying that parameter. Explicitly set the passed value to '', so that the non-true, not-None value can be evaluated to actually remove a volume group. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
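The distinction between None and '' described in this commit can be sketched as follows. This is a minimal illustration, not the actual Ganeti code; the function name apply_vg_name and the dict-based configuration are hypothetical.

```python
def apply_vg_name(config, vg_name):
    """Return a copy of config with the volume group setting applied.

    Hypothetical helper: None means "leave unchanged", '' means
    "disable LVM storage", any other string sets a new volume group.
    """
    new_config = dict(config)
    if vg_name is None:
        # Parameter not passed at all: keep the current value.
        return new_config
    if not vg_name:
        # Empty string: explicitly remove the volume group.
        new_config["vg_name"] = None
    else:
        new_config["vg_name"] = vg_name
    return new_config
```

Testing `vg_name is None` before testing truthiness is what keeps "not passed" and "explicitly cleared" apart.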
-
Guido Trotter authored
Even if we cannot modify all of them, they are useful information about the current cluster. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- May 11, 2009
-
-
Iustin Pop authored
The _TransformPath function is not used anymore in 2.0, let's remove it. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- May 04, 2009
-
-
Iustin Pop authored
Currently “gnt-debug submit-job” takes a single argument and has non-trivial startup costs; in order to exercise the job system, it is better to be able to submit multiple jobs with a single invocation of the script. This patch extends it to take multiple arguments, de-serialize the opcodes and then submit all of them as fast as possible, in order to increase pressure on the master daemon. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Alexander Schreiber <als@google.com>
-
Iustin Pop authored
The current implementation of “gnt-cluster getmaster” doesn't work on non-master nodes, which is a regression from 1.2. This patch implements it (again) via ssconf. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Alexander Schreiber <als@google.com>
-
- Apr 24, 2009
-
-
Guido Trotter authored
Don't show all instances info by default, but require --all to be passed for this time consuming operation. Reviewed-by: iustinp
-
- Apr 15, 2009
-
-
Iustin Pop authored
This patch addresses a number of issues reported both externally and internally:
  - missing SGML tags (Issue 54), report and patch by superdupont
  - wrong variable used in the init.d script, report and patch by Karsten Keil <karsten-keil@t-online.de>
  - man page for gnt-instance reinstall needs clarification (Issue 56)
  - gnt-instance man page missing --disks documentation for replace-disks
  - gnt-node modify help output is unclear about the -C/-D/-O input format, and the man page doesn't document this command at all
  - “gnt-node modify -C yes” for offline or drained nodes had the wrong error message
  - “gnt-instance reinstall --select-os” has the wrong prompt: we only accept a number for the OS and not the template name
Reviewed-by: ultrotter
-
- Apr 06, 2009
-
-
Iustin Pop authored
This patch raises an error in the master daemon in case the user requests a locking query; accordingly, all clients were modified to send only lockless queries. This is a short-term fix; for a proper fix the clients should be modified to submit a job when the user requests a locking query. The other approach would be to ignore the flag passed by the client; this would be worse, as clients wouldn't even get an error. The possible impact of this is twofold:
  - some commands might not have been converted, and thus fail; this can be remedied easily
  - the consistency of commands is lost; e.g. node failover will not lock the node *while we get the node info*, so we could miss some data; this is again in the vein of the atomic operations which are missing in the current query-and-act model of the gnt-* scripts
Reviewed-by: imsnah, ultrotter
-
- Mar 20, 2009
-
-
Guido Trotter authored
# gnt-cluster queue foo
Failure: prerequisites not met for this operation: Command 'foo' is not valid.
Reviewed-by: iustinp
-
- Mar 12, 2009
-
-
Iustin Pop authored
Similar to the --disk fixes a while ago, --net is broken too. This patch fixes it. Reviewed-by: imsnah
-
- Feb 24, 2009
-
-
Iustin Pop authored
This patch removes the extra_args parameter and instead switches the instance to the HV_KERNEL_ARGS hypervisor option. This is a big change, but it's a needed cleanup: this extra parameter on all RPC calls is not generic, and we also need to have a persistent value here. Reviewed-by: imsnah
-
Guido Trotter authored
Having hvattr descriptions is only confusing for the user, because even if they explain better what an attribute is about, they don't help in deciding what keyword should be used to actually set it. If in the future we want descriptions they should probably live in constants.py, and be displayed together with the key, rather than instead of it. This patch also changes the handling of the vnc console connection description, making it work with both kvm and xen-hvm. Reviewed-by: iustinp
-
- Feb 13, 2009
-
-
Iustin Pop authored
This patch adds back to the instance creation commands (gnt-instance add, gnt-backup import) the ‘-s’ short form option for specifying a single-disk instance. Also a small bug in gnt-backup import is fixed. Reviewed-by: ultrotter
-
- Feb 12, 2009
-
-
Iustin Pop authored
This patch changes the gnt-node and gnt-job list commands to accept arguments and list only the selected items, which is useful when having many nodes or jobs. It also removes the “--units” option from gnt-job list as we don't actually use it. Reviewed-by: imsnah
-
Iustin Pop authored
This patch changes the scripts so that the short name for the “--iallocator” option is always ‘-I’. Reviewed-by: ultrotter
-
Iustin Pop authored
Currently the batcher hypervisor parameter must be a dict with one element (e.g. {"xen-hvm": { "acpi": true }}). This is overly complex and hard to validate correctly; the patch splits it in two:
  - one "hypervisor" string parameter, with the name of the hypervisor
  - one "hvparams" dictionary, with the hypervisor parameters
The patch also changes the error handling in parsing the definition file: since this is not a long-running script, we are less concerned with safe closing of the file, and more with presenting meaningful error messages. Reviewed-by: killerfoxi
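The split into a "hypervisor" string and an "hvparams" dictionary makes each part easy to check on its own. The sketch below illustrates this under stated assumptions: validate_spec is a hypothetical helper, the JSON layout of the definition is assumed from the example in the message, and the error messages are invented for illustration.

```python
import json


def validate_spec(name, spec):
    """Check the hypervisor-related keys of one instance definition.

    Hypothetical helper: returns (hypervisor, hvparams) or raises
    ValueError with a message naming the offending instance.
    """
    hypervisor = spec.get("hypervisor")
    if hypervisor is not None and not isinstance(hypervisor, str):
        raise ValueError("Instance '%s': 'hypervisor' must be a string" % name)
    hvparams = spec.get("hvparams", {})
    if not isinstance(hvparams, dict):
        raise ValueError("Instance '%s': 'hvparams' must be a dictionary" % name)
    return hypervisor, hvparams


# Example definition in the new two-key form (assumed layout).
definition = json.loads(
    '{"inst1": {"hypervisor": "xen-hvm", "hvparams": {"acpi": true}}}')
checked = dict((name, validate_spec(name, spec))
               for name, spec in definition.items())
```

Compared with the one-element dict form, there is no need to verify that the outer dict has exactly one key before looking at the parameters.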
-
- Feb 11, 2009
-
-
Guido Trotter authored
It's hvparams, not opts.hvparams. Reviewed-by: iustinp
-
Guido Trotter authored
If hvparams is not set it will be [], so dict() will transform it to an empty dict, which is safe in all cases. Reviewed-by: iustinp
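The behaviour relied on here is plain Python: dict() applied to an empty list yields an empty dict. A minimal sketch follows; normalize_hvparams is a hypothetical name, and the assumption that a set option arrives as a list of (key, value) pairs is mine rather than something stated in the commit.

```python
def normalize_hvparams(hvparams):
    """Turn the parsed option value into a dict.

    Assumes hvparams is either [] (option not given) or a list of
    (key, value) pairs.
    """
    return dict(hvparams)
```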
-
- Feb 10, 2009
-
-
Iustin Pop authored
The patch sorts the instance list in gnt-node info output, in order to make it more readable (and stable). Reviewed-by: imsnah
-
Iustin Pop authored
The patch changes the pre-checks in node-add and re-add: - if the node is not already in the cluster, refuse to re-add - when re-adding, reuse the secondary IP from the cluster configuration - when re-adding, reset the offline and drained flags, so that RPC calls work (and we can actually upload the keys) The patch also adds a missing log entry in LUSetNodeParams. Reviewed-by: imsnah
-
Guido Trotter authored
We want all the hv/be parameters to have a known type, rather than a random mix of empty strings, boolean values, and None, so we declare the type of each variable and we enforce/convert it.
  - Add some new constants for enforceable value types
  - Add new constant dicts HVS_PARAMETER_TYPES and BES_PARAMETER_TYPES holding not only the valid parameters but also their types
  - Drop the old HVS_PARAMETERS and BES_PARAMETERS constants and calculate the values from the type dict
  - Convert all the default parameters to a valid type value
  - Create a new ForceDictType utils function, to check/enforce a dict's element value types, with relevant unit tests
  - Drop a few custom functions to check/convert the BE param types in utils and cli, in favor of ForceDictType
  - Double-check the parameter types using ForceDictType in both scripts and LogicalUnits, when possible
As a bonus:
  - Remove some old commented-out code in gnt-instance
  - Remove some already fixed FIXMEs
  - Fix a bug which prevented VALUE_DEFAULT from being applied to BE parameters in SetInstanceParams, because the value was checked for validity before that transformation was made
  - Fix a bug which prevented passing hvparams at cluster init from working at all
  - ForceDictType accepts an allowed_values argument for exceptions, which lets us do the checking even when some values must not be converted/typechecked (for example the 'default' string in SetInstanceParameters)
Reviewed-by: iustinp
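The idea behind a type-enforcing helper such as ForceDictType can be sketched as below. This is a simplified illustration and not the function added by the patch: the name force_dict_type, the VTYPE_* constants, the accepted boolean spellings and the use of ValueError are all assumptions made for the example.

```python
# Hypothetical type tags; the real constants may be named differently.
VTYPE_STRING = "string"
VTYPE_BOOL = "bool"
VTYPE_INT = "int"


def force_dict_type(target, key_types, allowed_values=None):
    """Check and convert, in place, the values of target.

    key_types maps each valid key to a type tag; values listed in
    allowed_values (e.g. 'default') are left untouched.
    """
    allowed_values = allowed_values or []
    for key in target:
        if key not in key_types:
            raise ValueError("Unknown key '%s'" % key)
        value = target[key]
        if value in allowed_values:
            continue
        ktype = key_types[key]
        if ktype == VTYPE_STRING:
            if not isinstance(value, str):
                raise ValueError("Key '%s' must be a string" % key)
        elif ktype == VTYPE_BOOL:
            if isinstance(value, str) and value:
                lowered = value.lower()
                if lowered in ("true", "yes"):
                    target[key] = True
                elif lowered in ("false", "no"):
                    target[key] = False
                else:
                    raise ValueError("Key '%s' is not a valid boolean" % key)
            else:
                target[key] = bool(value)
        elif ktype == VTYPE_INT:
            try:
                target[key] = int(value)
            except (ValueError, TypeError):
                raise ValueError("Key '%s' is not a valid integer" % key)
        else:
            raise ValueError("Unknown type '%s' for key '%s'" % (ktype, key))
```

A call might look like `force_dict_type(params, types, allowed_values=["default"])`, which converts the values it can and skips entries that are set to 'default'.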
-
Iustin Pop authored
This patch adds LU and cli-level support for modification of the node drained flag. It is similar to the offline changes. Reviewed-by: imsnah
-
Iustin Pop authored
This patch exports the drained attribute:
  - LUQueryNodes now accepts the drained field
  - RAPI exports it for node objects
  - gnt-node info now shows it (along with the newly-added master_candidate and offline flags)
  - gnt-node list can list it (but not by default)
  - it is exported to the iallocator scripts
Reviewed-by: imsnah
-
- Feb 09, 2009
-
-
Iustin Pop authored
This patch adds a new instance query flag called disk_usage that retrieves the overall space used by an instance on each of its nodes. This can be used when balancing the cluster or checking N+1 status. The flag is also exported in RAPI. Note the flag is currently broken for file-based instances, as it represents the amount of space in the cluster volume group. Reviewed-by: ultrotter
-
Iustin Pop authored
This is a hand-picked forward patch of commit 1755 on the 1.2 branch (hand-picked since the trees diverged too much since then): The patch changes the xen hypervisor to compute the number of cpu sockets/nodes and enables the command line and the RAPI to show this information (for RAPI it is enabled by default in node details, for gnt-node one can use the new “cnodes” and “csockets” fields). Originally-Reviewed-by: ultrotter For the KVM and fake hypervisors, the patch just exports 1 for both nodes and sockets. This can be fixed by looking at the /sys/devices/system/cpu/cpuN/topology directories and computing the actual information, but that should be done in a separate patch. Reviewed-by: imsnah
-
- Feb 05, 2009
-
-
Iustin Pop authored
This patch converts some more jobs with only queries into cheaper luxi queries (no job created), and fixes some fallout from the lockless queries changes. Reviewed-by: ultrotter
-
- Feb 04, 2009
-
-
Iustin Pop authored
Similar to the instance list, this patch enables lockless node queries. “gnt-node list” now accepts the “--sync” flag which enables locking; the default is lockless. Reviewed-by: imsnah
-
Iustin Pop authored
This patch adds the online node list and instance list to the ssconf keys. In order to distribute the instance list correctly, we need to update the cluster serial number on instance additions and removals. The patch also changes the permissions on the ssconf files to be 0444:
  - no write for root, in order to signal that these files should not be modified
  - read for everyone since the files don't contain sensitive data anymore (and permissions can be controlled via the parent directory if needed)
The patch also fixes a small typo in gnt-cluster. Reviewed-by: ultrotter
-
Iustin Pop authored
This patch adds the framework for, and enables, lockless OpQueryInstances. This means that instances will be shown in ERROR_up or ERROR_down state, even though this is not an error (but just an in-progress job). The framework is implemented as follows:
  - the OpQueryInstances, OpQueryNodes and OpQueryExports opcodes take an additional “use_locking” flag which denotes whether to lock or not; this patch only implements this for LUQueryInstances
  - the luxi query functions take an additional argument use_locking which is passed to the master daemon, and then passed to the above opcodes
  - cli.py exports a new SYNC_OPT command line option which implements setting this flag to true
  - except for gnt-instance list, which uses this option, and for name-only queries (e.g. QueryNodes(fields=["names"])), all other callers set this flag to True
  - RAPI also sets the flag to True
The patch was tested with a continuous (0.2s sleep in-between) gnt-instance list during a burnin, and no problems were observed. Reviewed-by: ultrotter
-
- Feb 03, 2009
-
-
Iustin Pop authored
This is a partial implementation of fully automated node evacuation: we allow passing an iallocator and all instance replace-disks will be executed via that iallocator. The individual OpReplaceDisks opcodes are submitted in a single job, which causes them to be executed serially and thus keeps the iallocator runs consistent. This also changes the behaviour so that the first reallocation that fails will stop all the reallocations. Reviewed-by: ultrotter
-