- 27 May, 2009 1 commit
-
-
Iustin Pop authored
This (somewhat big) patch adds support for remotely rebooting the nodes via whatever support the hypervisor has for such a concept.
For KVM/fake (and containers in the future) this just uses sysrq plus a ‘reboot’ call if the sysrq method failed. For Xen, it first tries the above, and then Xen-hypervisor reboot (we first try sysrq since that just requires opening a file handle, whereas xen reboot means launching an external utility).
The user interface is:
  # gnt-node powercycle node5
  Are you sure you want to hard powercycle node node5?
  y/[n]/?: y
  Reboot scheduled in 5 seconds
The node reboots hopefully after sending the reply. In case the clock is broken, “time.sleep(5)” might take ages (but then I suspect SSL negotiation wouldn't work). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- 22 May, 2009 3 commits
-
-
Guido Trotter authored
Each hypervisor can declare additional files to be shipped to all nodes. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Guido Trotter authored
This function is shared between AddNode and RedistributeConfig, and used to redistribute additional files which are inherently part of the cluster configuration. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Guido Trotter authored
Currently just for xen-hvm we copy the vnc password on node-add. This will be changed for 2.1 with a more advanced gnt-cluster redist-conf functionality which is going to be used by node-add as well. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- 19 May, 2009 3 commits
-
-
Iustin Pop authored
This patch adds a ‘disk_space_total’ key for the current instance, similar to the key for the new instance in case of new allocations. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
This patch modifies the start instance script, opcode and logical unit to support temporary startup parameters. Unlike 1.2, where only the kernel arguments could be changed (which made the feature xen-pvm specific), this version supports changing all hypervisor and backend parameters (with appropriate checks). This is much more flexible, and allows for example:
  - start with different, temporary kernel
  - start with different memory size
Note: in later versions, this should be extended to cover disk parameters as well (e.g. start with drbd without flushes, start with drbd in async mode, etc.). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
This patch modifies the rpc.call_instance_start - the master side - to take optional hv/be parameters. The noded side is unchanged and oblivious to the change. This will allow implementation of single-user capability and such on startup (temporary, as opposed to permanent). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- 15 May, 2009 2 commits
-
-
Guido Trotter authored
If the remote info rpc call fails we can't assume that the instance is up. Signed-off-by:
Guido Trotter <ultrotter@google.com>
-
Guido Trotter authored
Signed-off-by:
Guido Trotter <ultrotter@google.com>
-
- 13 May, 2009 2 commits
-
-
Guido Trotter authored
Currently LUSetClusterParams will remove the volume group if the vg_name field passed in is not true, but not None. Setting the target volume group to False or the empty string, though, is a bad idea because it's not a boolean value, and at cluster init we set it to None if --no-lvm-storage is passed. With this fix we handle '' (or any other non-None false value) as the "unset" value, but actually store None in the config. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
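The normalization described in this commit can be sketched as follows. This is a minimal illustration, not the code from the patch: the function name `resolve_vg_name` is hypothetical, and it assumes (as the message implies) that a `None` request means "leave the volume group unchanged".

```python
def resolve_vg_name(requested, current):
    """Return the volume group name to store in the config.

    Hypothetical helper: 'requested' is the value passed by the caller,
    'current' is the value already stored in the cluster configuration.
    """
    if requested is None:
        # Nothing was passed in: keep whatever is configured.
        return current
    if not requested:
        # '' or any other non-None false value means "unset";
        # the config always stores None for that state.
        return None
    return requested
```

With this shape, `resolve_vg_name("", "xenvg")` yields `None`, matching what cluster init stores when --no-lvm-storage is passed.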
-
Guido Trotter authored
Some fields can be set at cluster init, and perhaps even modified with SetClusterParams, but there's no way to query them. With this patch we export them in the cluster info query. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- 12 May, 2009 1 commit
-
-
Guido Trotter authored
LUSetInstanceParam currently assumes that the 'memory' value of a call_instance_info result is an integer, while the rest of the code explicitly converts it to int(). Converting it to int works around a bug which prevents changing the memory allocation of a live instance if the remote call returns the memory in string format. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
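As a rough illustration of why the explicit conversion matters, the sketch below computes how much extra memory a change would need. The function name `missing_memory` and the dictionary layout are assumptions made for this example; only the 'memory' key comes from the commit message.

```python
def missing_memory(requested_mb, instance_info):
    """Return how many extra MB the requested size needs.

    'instance_info' stands in for the remote call result; its 'memory'
    value may arrive as an int or as a string, so it is converted first.
    """
    current_mb = int(instance_info["memory"])
    return requested_mb - current_mb
```

Without the `int()` call, a string value such as "256" would make the subtraction fail with a TypeError.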
-
- 07 May, 2009 1 commit
-
-
Carlos Valiente authored
This is for Python 2.6 compatibility. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- 06 May, 2009 1 commit
-
-
Guido Trotter authored
Sometimes reinstalls are slightly different than new installs. For example certain partitions may need to be preserved across reinstalls. In order to do that on a per-os basis we pass in the INSTANCE_REINSTALL variable to inform the create script about when a reinstall is happening. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
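A create script could use the variable roughly as sketched below. This is only an illustration: the helper names and the partition lists are hypothetical, and treating any non-empty value of INSTANCE_REINSTALL as "reinstall in progress" is an assumption, since the message does not state the exact value passed.

```python
def is_reinstall(environ):
    """True if the (assumed non-empty) INSTANCE_REINSTALL variable is set."""
    return bool(environ.get("INSTANCE_REINSTALL"))


def partitions_to_format(all_partitions, preserved, environ):
    """Pick the partitions a create script should format.

    On a reinstall the 'preserved' partitions are skipped; on a fresh
    install everything is formatted.
    """
    if is_reinstall(environ):
        return [part for part in all_partitions if part not in preserved]
    return list(all_partitions)
```

A real script would pass `os.environ` as `environ`; taking it as a parameter keeps the logic easy to exercise in isolation.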
-
- 05 May, 2009 1 commit
-
-
Iustin Pop authored
This patch fixes two issues with LUSetClusterParams and argument checking. First, this LU used the wrong function name (CheckParameters instead of CheckArguments), which means that no parameter checking was done at all; this impacted the candidate_pool_size checks (the only one done at this stage). Second, int() can raise both ValueError and TypeError, and we should correctly handle both. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
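The second point can be illustrated with a small sketch. The exception class `ParameterError`, the function name and the lower bound of 1 are assumptions for this example and are not taken from the patch; the behaviour of `int()` is standard Python.

```python
class ParameterError(Exception):
    """Stand-in for the error type an LU would raise (hypothetical)."""


def check_candidate_pool_size(value):
    """Validate a candidate_pool_size argument and return it as an int."""
    try:
        size = int(value)
    except (ValueError, TypeError) as err:
        # int("abc") raises ValueError, int([1]) raises TypeError;
        # both must be reported as an invalid parameter.
        raise ParameterError("Invalid candidate_pool_size value: %s" % err)
    if size < 1:
        # Assumed lower bound, for illustration only.
        raise ParameterError("candidate_pool_size must be at least 1")
    return size
```

Catching only ValueError would let a TypeError escape as an internal error when a non-string, non-numeric value is passed.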
-
- 04 May, 2009 1 commit
-
-
Iustin Pop authored
The current validation routine just says "failed", without specifying the node name. This is very confusing, and we should log the node name too. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Alexander Schreiber <als@google.com>
-
- 24 Apr, 2009 3 commits
-
-
Iustin Pop authored
Since the “list OSes” call is exported via RAPI, this can be used pretty easily to DoS the master daemon during long jobs. The implementation of LUDiagnoseOS makes an RPC call to all nodes; we lock nodes here in order to prevent node removal. However, after closer examination, the worst case is:
  - we get the list of nodes from the config
  - another thread removes a node
  - our RPC queries reach the removed node
At this point, if ganeti-noded is stopped or doesn't accept our queries, the RPC call will return failed, and in the current implementation all OSes will become invalid.
If we change the ‘failed RPC’ handling to ignore such nodes, this allows us to both remove locking, and to handle transient RPC failures better (not invalidating all OSes). This patch does both these things, with a single drawback: in gnt-os diagnose, the down nodes do not appear at all. I think this is a small drawback, and the alternative is to add them with status failed; this works (3-line patch), but then the output of “list” and “diagnose” will no longer be consistent. As such, my proposal is to not list the nodes. Reviewed-by: ultrotter
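The changed handling of failed RPC results can be sketched as below. The result layout (a dict mapping node name to an `(ok, os_names)` pair) and both function names are simplifications invented for this example; they are not Ganeti's RPC result type.

```python
def collect_os_by_node(rpc_results):
    """Keep only the nodes whose RPC call succeeded.

    'rpc_results' maps node name -> (ok, list of OS names). Failed
    nodes are skipped instead of invalidating every OS.
    """
    per_node = {}
    for node, (ok, os_names) in rpc_results.items():
        if not ok:
            continue
        per_node[node] = set(os_names)
    return per_node


def valid_everywhere(per_node):
    """Return the OS names present on all responding nodes."""
    if not per_node:
        return set()
    return set.intersection(*per_node.values())
```

A node that was removed or is temporarily down then simply drops out of the result, which is the drawback the message mentions for gnt-os diagnose.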
-
Iustin Pop authored
When a remote node returns invalid LVM data, we check it, but we don't stop; instead we continue with the rest of the checks (which require a valid volume group). This raises an internal error and breaks verify disks. This seems to have been unchanged for a long while; I don't know why it surfaced just recently. Reviewed-by: ultrotter
-
Iustin Pop authored
When vg_name is not returned at all, we currently abort with an internal error. This is because we don't catch KeyError. This patch adds a custom message for this case, and also adds KeyError to the list of caught exceptions, just for safety. On the other hand, we could also just remove this piece of code, since the ["dfree"] value is not used at all. Reviewed-by: ultrotter
-
- 15 Apr, 2009 1 commit
-
-
Iustin Pop authored
This patch addresses a number of issues reported both externally and internally:
  - missing SGML tags (Issue 54), report and patch by superdupont
  - wrong variable used in the init.d script, report and patch by Karsten Keil <karsten-keil@t-online.de>
  - man page for gnt-instance reinstall needs clarification (Issue 56)
  - gnt-instance man page missing --disks documentation for replace-disks
  - gnt-node modify help output is unclear about the -C/-D/-O input format, and the man page doesn't document this command at all
  - “gnt-node modify -C yes” for offline or drained nodes had wrong error message
  - “gnt-instance reinstall --select-os” has wrong prompt, we only accept a number for the OS and not the template name
Reviewed-by: ultrotter
-
- 09 Mar, 2009 2 commits
-
-
Iustin Pop authored
Currently cluster-verify doesn't handle the (admittedly invalid) case where we have reservations for instances that were removed in the meantime. This patch adds a check for this and prevents code errors in cluster-verify in this case:
  * Verifying node node4.example.com (master candidate)
    - ERROR: ghost instance 'instance3.example.com' in temporary DRBD map
Reviewed-by: imsnah
-
Iustin Pop authored
Currently the _CreateSingleBlockDev function only raises OpExecError and not BlockDeviceError. This means that we don't release the instance's temporary minors properly, and this creates problems later if the instance is removed without master restart. We could just use OpExecError, but adding it and leaving BlockDeviceError in seems safer. Reviewed-by: imsnah
-
- 02 Mar, 2009 2 commits
-
-
Iustin Pop authored
This patch exports the cluster and node tags to the cluster verify hook scripts. The tags are exported as a space-separated list, which allows easy parsing from the shell (e.g. “for tag in $GANETI_CLUSTER_TAGS; do ...”) and therefore requires the previous “Don't allow spaces in tag names” patch. The patch also fixes a minor line length style problem. Reviewed-by: ultrotter
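The space-separated format works in both directions, as the following sketch suggests. The helper names are hypothetical and sorting the tags is a choice made only to keep the example deterministic; GANETI_CLUSTER_TAGS is the variable named in the message.

```python
def export_tags(tags):
    """Join tags into the space-separated form used for hook variables."""
    for tag in tags:
        if " " in tag:
            # A space inside a tag would make the list ambiguous.
            raise ValueError("tag %r contains a space" % tag)
    return " ".join(sorted(tags))


def parse_cluster_tags(environ):
    """Read the cluster tags back, as a hook script would."""
    return environ.get("GANETI_CLUSTER_TAGS", "").split()
```

A hook written in Python would call `parse_cluster_tags(os.environ)`; the round trip only holds because tag names cannot contain spaces.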
-
Iustin Pop authored
This updates the iallocator documentation to 2.0, bumps up the iallocator version (and moves a constant to lib/constants.py), and fixes a style issue in install.rst. Reviewed-by: ultrotter
-
- 27 Feb, 2009 2 commits
-
-
Guido Trotter authored
If we're only file based and our volume group is set to "None" there's no point in asking nodes for their volume groups, logical volumes, and drbd devices, and checking those. Reviewed-by: iustinp
-
Iustin Pop authored
99% of the epydoc return tags are "@return:", but each of the modified files had one "@returns:" line. We fix this for consistency. Reviewed-by: imsnah
-
- 25 Feb, 2009 1 commit
-
-
Iustin Pop authored
While reviewing the hooks document, I realised we are not correctly exporting the instance properties. This patch:
  - exports the disk and disk template in all LUs, not only (hardcoded) in the instance create
  - removes the instance create INSTANCE_ prefix on some non-instance variables (those are LU-related, not instance-related)
  - adds a couple more variables to other LUs
The hook document will be updated in a separate patch. Reviewed-by: ultrotter
-
- 24 Feb, 2009 2 commits
-
-
Iustin Pop authored
This patch removes the extra_args parameter and instead switches the instance to the HV_KERNEL_ARGS hypervisor option. This is a big change, but it's a needed cleanup: this extra parameter on all RPC calls is not generic, and we also need to have a persistent value here. Reviewed-by: imsnah
-
Iustin Pop authored
This simply makes LUQueryInstanceData return the same information as for a static query when one or both of the nodes are down. Reviewed-by: imsnah
-
- 16 Feb, 2009 2 commits
-
-
Iustin Pop authored
There are two issues fixed in this patch:
  - first, the recent RPC changes caused loss of data in hard reboot type; we weren't reporting any results from the stop/start instance calls;
  - second, in soft or hard reboots, we didn't initialize the disk physical ID; depending on the last state of the instance's disks, this can create a failure in identifying the disks
After this patch, burnin works again with reboot, and reports errors correctly. Reviewed-by: imsnah
-
Iustin Pop authored
If /proc/drbd can't be opened, this raises an IOError, but all the error-handling behaviour in backend treats only BlockDeviceErrors. This creates a plain failure in cluster verify and in other RPC calls. This patch simply converts EnvironmentErrors into BlockDeviceErrors, and also changes the RPC result for NV_DRBDLIST and its handling to be able to show the error. The other RPC calls work by default now, due to the existing error handling. Reviewed-by: ultrotter
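The conversion can be sketched as below. The local `BlockDeviceError` class is a stand-in for Ganeti's own exception, and the function name and message text are assumptions for this example.

```python
class BlockDeviceError(Exception):
    """Local stand-in for the block device error type (illustrative)."""


def read_drbd_status(path="/proc/drbd"):
    """Return the lines of the DRBD status file.

    Any EnvironmentError (IOError/OSError) is converted so that callers
    which only handle BlockDeviceError still see a proper failure.
    """
    try:
        with open(path) as handle:
            return handle.read().splitlines()
    except EnvironmentError as err:
        raise BlockDeviceError("Can't read DRBD status file %s: %s" %
                               (path, err))
```

Callers that already catch the block device error then report the problem without any further changes, which is why the other RPC calls work by default.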
-
- 13 Feb, 2009 1 commit
-
-
Guido Trotter authored
Currently we export the old instance "as is" and any nic changes get lost, so hooks won't know of a different ip, bridge, or mac address. This patch fixes it by putting the nics in the override dict, if any changes are done. Reviewed-by: iustinp
-
- 12 Feb, 2009 7 commits
-
-
Guido Trotter authored
CheckArguments:
  - Use constants.VALUE_NONE rather than hardcoding the string "none"
  - If we're adding a nic fill the nic_dict with default values
  - Check if the mac is syntactically valid, if we have one
  - Don't allow the mac to be 'auto' when modifying a nic
CheckPrereq:
  - Check that bridge and mac if present in the dict are not None (before this wasn't handled at all)
  - Generate the nic mac address here if demanded
Exec:
  - Do not generate nics and macs
Reviewed-by: iustin
-
Guido Trotter authored
We want the real nic to be shown to the hooks and the allocators, so we'll generate them in CheckPrereq. We also write a comment about the race condition we generate. This race condition existed even before, so moving this generation will just lengthen it a bit. A separate patch mitigates its effects. This patch also adds an ENDIF comment for a very long if, and removes a double empty line inside the CheckPrereq function of LUCreateInstance. Reviewed-by: iustin
-
Iustin Pop authored
This patch removes the admin_ram LUQueryInstances field (it is broken anyway) and fixes the VNC address checks in the Xen Hypervisor. Reviewed-by: imsnah
-
Iustin Pop authored
The query fields are now regular expressions. We need to quote the dots, otherwise invalid fields will be accepted but they will lose special formatting in the cli scripts. Reviewed-by: imsnah
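The effect of an unquoted dot is easy to reproduce with Python's `re` module. The field name "disk.count" and the helper `field_matches` are used purely as an illustration here and are not taken from the patch.

```python
import re


def field_matches(pattern, name):
    """True if the whole field name matches the given pattern."""
    return re.fullmatch(pattern, name) is not None


# Unquoted: '.' matches any character, so an invalid name is accepted.
unquoted = r"disk.count"
# Quoted: only a literal dot matches.
quoted = r"disk\.count"

accepted_by_mistake = field_matches(unquoted, "diskXcount")
rejected_correctly = not field_matches(quoted, "diskXcount")
```

For patterns built from plain strings, `re.escape("disk.count")` produces the quoted form without writing the backslash by hand.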
-
Iustin Pop authored
For (status, data)-style RPC calls, the result data is in the ‘payload’ attribute. This was missed in the conversion patch, with the only side effect that gnt-instance activate-disks didn't show a nice output anymore. Reviewed-by: ultrotter
-
Iustin Pop authored
This patch changes the return type from this RPC call to include status information and renames the backend method to match the RPC call name. The patch is a little bigger than the reboot one, since this call is used in more than one place. However, all the points of call have the same usage pattern, so the patch is trivial. Reviewed-by: ultrotter
-
Iustin Pop authored
This small patch changes the return type from this RPC call to include status information and renames the backend method to match the RPC call name. Reviewed-by: ultrotter
-
- 11 Feb, 2009 1 commit
-
-
Guido Trotter authored
Currently when adding disks the base for the index is not taken into account, and disk 0 is added twice. Reviewed-by: iustinp
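The index calculation at issue can be sketched as follows; the function name and arguments are hypothetical and only illustrate why the base has to be the number of disks the instance already has.

```python
def new_disk_indices(existing_count, added_count):
    """Return the indices for disks being added to an instance.

    The base is the number of existing disks, so the first new disk of
    an instance that already has disk 0 becomes disk 1, not disk 0.
    """
    base = existing_count
    return [base + offset for offset in range(added_count)]
```

Ignoring the base is equivalent to always passing `existing_count=0`, which is how disk 0 ends up being added twice.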
-