- May 21, 2009
-
-
Iustin Pop authored
Currently, if the instance is stopped, we still check for enough memory on the target node. This is a little too strict: if too many nodes have failed and one is out of memory, the check prevents fixing the cluster (with the instances down). We change it to do the memory checks only when the instance will be started. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
Currently the hooks are missing the instance's hypervisor, hypervisor parameters and backend parameters. This forces hooks to query back into ganeti, which is dangerous due to possible luxi socket exhaustion. This patch adds these three as INSTANCE_HYPERVISOR, INSTANCE_HV_*, INSTANCE_BE_*. The hook environment prefixes all keys with “GANETI”, so the default settings for a xen-pvm instance would be:
GANETI_INSTANCE_HV_initrd_path=
GANETI_INSTANCE_HV_kernel_args=ro
GANETI_INSTANCE_HV_kernel_path=/boot/vmlinuz-2.6-xenU
GANETI_INSTANCE_HV_root_path=/dev/sda1
Any dashes in parameter names are changed to underscores, since variables with dashes are not easy to access from the shell (alternatively we could deny those via a unittest for constants.py). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
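The naming scheme described in this commit can be sketched in a few lines of Python. This is only an illustration of the prefixing and dash-to-underscore rules stated in the message; the helper name `build_hook_env` and its arguments are hypothetical and are not taken from the Ganeti code.

```python
def build_hook_env(hypervisor, hvparams, beparams):
    """Build hook variables for one instance (hypothetical helper).

    hvparams and beparams are plain dicts of parameter name -> value.
    """
    env = {"INSTANCE_HYPERVISOR": hypervisor}
    for prefix, params in (("INSTANCE_HV_", hvparams),
                           ("INSTANCE_BE_", beparams)):
        for key, value in params.items():
            # Dashes are awkward in shell variable names, so use underscores.
            env[prefix + key.replace("-", "_")] = str(value)
    # The hook environment prefixes every key with "GANETI".
    return {"GANETI_" + name: value for name, value in env.items()}
```

A hook script would then read, for example, `GANETI_INSTANCE_HV_kernel_args` directly from its environment instead of querying the master.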
-
- May 20, 2009
-
-
Iustin Pop authored
This patch modifies the watcher to keep on-disk a file with the instance status; this can be used from outside of ganeti to react to instances being down (when the watcher cannot restart them). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- May 19, 2009
-
-
Iustin Pop authored
Currently we have bad behaviour in SafeEncode:
- binary strings are actually not handled correctly (ahem)
- the encoding is not stable, due to use of string_escape
For this reason, we replace the use of string_escape with part of the code of string escape (PyString_Repr in Objects/stringobject.c); we don't escape backslashes or single quotes, since that is what makes it non-stable. Furthermore, we only use the encode('ascii', ...) for unicode inputs. The patch also adds unittests for the function that test basic behaviour. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
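To illustrate what a stable encoding means here, the following is a rough Python 3 sketch and not the Ganeti implementation (which targets Python 2 and follows PyString_Repr). The function name `safe_encode` and the exact escape choices are assumptions. Because backslashes and quotes are left alone and the output is printable ASCII, encoding an already encoded string returns it unchanged.

```python
def safe_encode(text):
    """Return a printable-ASCII form of text (illustrative sketch only)."""
    if isinstance(text, bytes):
        # Map every byte to the code point with the same value.
        text = text.decode("latin-1")
    out = []
    for ch in text:
        code = ord(ch)
        if ch == "\t":
            out.append("\\t")
        elif ch == "\n":
            out.append("\\n")
        elif ch == "\r":
            out.append("\\r")
        elif code < 32 or code == 127:
            # Remaining control characters become \xNN.
            out.append("\\x%02x" % code)
        elif code > 127:
            # Non-ASCII characters become \xNN, \uNNNN or \UNNNNNNNN.
            out.append(ch.encode("ascii", "backslashreplace").decode("ascii"))
        else:
            # Printable ASCII, including backslash and quotes, is kept as-is.
            out.append(ch)
    return "".join(out)
```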
-
Iustin Pop authored
This patch adds constants for the mouse and boot order strings; while there are still some issues remaining, we're trying to clean up hardcoded strings from the hypervisors. Since the formatting of frozensets is currently wrong, we also add a utility function for this and change all the error messages to use it. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
This patch adds a ‘disk_space_total’ key for the current instance, similar to the key for the new instance in case of new allocations. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
This patch modifies the start instance script, opcode and logical unit to support temporary startup parameters. Unlike 1.2, where only the kernel arguments could be changed (which made the feature xen-pvm specific), this version supports changing all hypervisor and backend parameters (with appropriate checks). This is much more flexible, and allows for example:
- start with a different, temporary kernel
- start with a different memory size
Note: in later versions, this should be extended to cover disk parameters as well (e.g. start with drbd without flushes, start with drbd in async mode, etc.). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
This patch modifies the rpc.call_instance_start - the master side - to take optional hv/be parameters. The noded side is unchanged and oblivious to the change. This will allow implementation of single-user capability and such on startup (temporary, as opposed to permanent). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- May 18, 2009
-
-
Guido Trotter authored
Currently QueryJob returns "None" when a wrong job ID is passed. Handle this in gnt-job list, by printing an error for each wrong job, and still giving output for all the jobs which actually do exist. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- May 15, 2009
-
-
Guido Trotter authored
If the remote info rpc call fails we can't assume that the instance is up. Signed-off-by:
Guido Trotter <ultrotter@google.com>
-
Guido Trotter authored
Signed-off-by:
Guido Trotter <ultrotter@google.com>
-
- May 13, 2009
-
-
Guido Trotter authored
Currently LUSetClusterParams will remove the volume group if the vg_name field passed in is not true, but not None. Setting the target volume group to False or the empty string, though, is a bad idea because it's not a boolean value, and at cluster init we set it to None if --no-lvm-storage is passed. With this fix we handle '' (or any other non-None false value) as the "unset" value, but actually store None in the config. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Guido Trotter authored
Some fields can be set at cluster init, and perhaps even modified with SetClusterParams, but there's no way to query them. With this patch we export them in the cluster info query. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- May 12, 2009
-
-
Guido Trotter authored
Currently the KVM hypervisor returns strings for the memory and cpu values, while the xen hypervisor returns integers. Make this uniform by converting the values to integers in KVM as well. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Guido Trotter authored
LUSetInstanceParam currently assumes that the 'memory' value of a call_instance_info result is an integer, while the rest of the code explicitly converts it with int(). Converting it to int works around a bug which prevents changing the memory allocation of a live instance if the remote call returns the memory in string format. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- May 11, 2009
-
-
Tim Boring authored
Patch for adding network_port to the instance attributes exported by the RAPI. [iustin@google.com: slightly changed the formatting] Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- May 07, 2009
-
-
Carlos Valiente authored
This is for Python 2.6 compatibility. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- May 05, 2009
-
-
Carlos Valiente authored
Python 2.6 complains about module 'sha' being deprecated. It makes execution of Ganeti commands a bit annoying, and when you run 'ganeti-watcher' in cron jobs, you get a mail message after every execution. Tests pass under Python 2.6 and Python 2.4. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Carlos Valiente authored
Python 2.6 complains about module 'sha' being deprecated. It makes execution of Ganeti commands a bit annoying, and when you run 'ganeti-watcher' in cron jobs, you get a mail message after every execution. Tests pass under Python 2.6 and Python 2.4. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Iustin Pop authored
DRBD 8.3 changes two more things compared to 8.2:
- the /proc/drbd format changed in multiple ways; the part we're interested in is the ‘st:’ to ‘ro:’ change (named in the changelog as “Renamed 'state' to 'role'”)
- “drbdsetup /dev/drbdN show” changed the ‘device’ stanza from: device "/dev/drbd0"; to: device minor 0;
This patch fixes both of these and adds data files and unittests for DRBD 8.3.1. Signed-off-by:
Iustin Pop <iustin@google.com>
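The ‘st:’ to ‘ro:’ rename can be handled by accepting either key when parsing a status line. The sketch below is an illustration only: the function name `parse_roles` and the regular expression are assumptions, and the sample lines in the usage note are simplified, not copied from the data files added by the patch.

```python
import re

# Accept the old "st:" key (DRBD up to 8.2) and the new "ro:" key (8.3).
_ROLE_RE = re.compile(r"\s(?:st|ro):(\w+)/(\w+)")


def parse_roles(line):
    """Return (local_role, remote_role) from a /proc/drbd line, or None."""
    match = _ROLE_RE.search(line)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

For example, both `" 0: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---"` and the same line with `ro:` in place of `st:` yield `("Primary", "Secondary")`.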
-
Karsten Keil authored
This patch adds (and suppresses) the extra ipv4/ipv6 words before the actual address that newer DRBD versions add. [iustin@google.com: slightly changed the patch to conform to style guide, and changed the commit message] Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Iustin Pop authored
In case of missing programs, currently utils.RunCmd doesn't show any information to help debugging, only 'No such file or directory'. This patch adds error handling for the ENOENT case such that at least we have this information in the node daemon logs. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
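A minimal sketch of the ENOENT handling idea, written against the modern `subprocess` module rather than the actual utils.RunCmd code; the function name `run_cmd` and the log message wording are assumptions.

```python
import errno
import logging
import subprocess


def run_cmd(argv):
    """Run argv and return (exit_code, stdout); log missing programs."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True)
    except OSError as err:
        if err.errno == errno.ENOENT:
            # Name the program, so the log says more than
            # 'No such file or directory'.
            logging.error("Can't execute '%s': not found (%s)", argv[0], err)
        raise
    return proc.returncode, proc.stdout
```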
-
Iustin Pop authored
Currently both hv_fake and hv_kvm implement practically identical code to get the node information. Since future container-like hypervisors will also need this functionality, this patch moves it into the base class (as a separate function) which can then be called from classes which need this info. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
This patch fixes two issues with LUSetClusterParams and argument checking. First, this LU used the wrong function name (CheckParameters instead of CheckArguments), which means that no parameter checking was done at all; this impacted the candidate_pool_size checks (the only one done at this stage). Second, int() can raise both ValueError and TypeError, and we should correctly handle both. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
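The second point is easy to demonstrate: `int("abc")` raises ValueError, while `int(None)` raises TypeError, so a check that catches only one of them lets the other escape. A small sketch, with a hypothetical helper name and error message:

```python
def parse_candidate_pool_size(value):
    """Convert value to int, reporting any bad input the same way."""
    try:
        return int(value)
    except (ValueError, TypeError) as err:
        # int("abc") raises ValueError; int(None) raises TypeError.
        raise ValueError("Invalid candidate_pool_size value: %r (%s)"
                         % (value, err))
```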
-
- May 04, 2009
-
-
Iustin Pop authored
Currently we always try to remove the new file, even if the rename succeeded. This patch tracks the existence of the new file and doesn't try to remove it if we managed to rename it. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
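The tracking described here can be sketched with a flag that is cleared once the rename succeeds. This is an illustration under assumed names (`write_file_atomically`, `remove_new`), not the patched Ganeti function.

```python
import os
import tempfile


def write_file_atomically(path, data):
    """Write data to a temporary file, then rename it over path."""
    directory = os.path.dirname(path) or "."
    fd, new_name = tempfile.mkstemp(prefix=os.path.basename(path) + ".new",
                                    dir=directory)
    remove_new = True  # the temporary file exists until the rename succeeds
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(data)
        os.rename(new_name, path)
        remove_new = False  # renamed: nothing is left to clean up
    finally:
        if remove_new:
            os.unlink(new_name)
```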
-
Iustin Pop authored
The current validation routine just says "failed", without specifying the node name. This is very confusing, and we should log the node name too. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Alexander Schreiber <als@google.com>
-
Iustin Pop authored
The current implementation of “gnt-cluster getmaster” doesn't work on non-master nodes, which is a regression from 1.2. This patch implements it (again) via ssconf. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Alexander Schreiber <als@google.com>
-
- Apr 24, 2009
-
-
Iustin Pop authored
Since the “list OSes” call is exported via RAPI, this can be used pretty easily to DOS the master daemon during long jobs. The implementation of LUDiagnoseOS makes an RPC call to all nodes; we lock nodes here in order to prevent node removal. However, after closer examination, the worst case is:
- we get the list of nodes from the config
- another thread removes a node
- our RPC queries reach the removed node
At this point, if ganeti-noded is stopped or doesn't accept our queries, the RPC call will return failed, and in the current implementation all OSes will become invalid. If we change the ‘failed RPC’ handling to ignore such nodes, this allows us both to remove locking and to handle transient RPC failures better (not invalidating all OSes). This patch does both these things, with a single drawback: in gnt-os diagnose, the down nodes do not appear at all. I think this is a small drawback, and the alternative is to add them with status failed; this works (3-line patch), but then the output of “list” and “diagnose” will no longer be consistent. As such, my proposal is to not list the nodes. Reviewed-by: ultrotter
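The ‘ignore failed nodes’ handling can be pictured as follows. The data shape (a dict from node name to a list of OS names, with None standing for a failed RPC) and the function name `merge_os_results` are assumptions made for this sketch; the real RPC result objects differ.

```python
def merge_os_results(rpc_results):
    """Map each OS name to the nodes that reported it.

    rpc_results maps node name -> list of OS names, or None when the
    RPC to that node failed (assumed shape, for illustration only).
    """
    os_to_nodes = {}
    for node, os_list in sorted(rpc_results.items()):
        if os_list is None:
            # Failed node: skip it rather than invalidating every OS.
            continue
        for name in os_list:
            os_to_nodes.setdefault(name, []).append(node)
    return os_to_nodes
```

With this approach a down node simply does not appear in the result, which matches the drawback noted in the commit message.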
-
Iustin Pop authored
When a remote node returns invalid LVM data, we check it, but we don't stop; instead we continue with the rest of the checks (which require a valid volume group). This raises an internal error and breaks verify disks. This seems to have been unchanged for a long while; I don't know why it surfaced just recently. Reviewed-by: ultrotter
-
Iustin Pop authored
When vg_name is not returned at all, we currently abort with an internal error. This is because we don't catch KeyError. This patch adds a custom message for this case, and also adds KeyError to the list of caught exceptions, just for safety. On the other hand, we could also just remove this piece of code, since the ["dfree"] value is not used at all. Reviewed-by: ultrotter
-
- Apr 15, 2009
-
-
Iustin Pop authored
This patch fixes a number of issues reported both externally and internally:
- missing SGML tags (Issue 54), report and patch by superdupont
- wrong variable used in the init.d script, report and patch by Karsten Keil <karsten-keil@t-online.de>
- man page for gnt-instance reinstall needs clarification (Issue 56)
- gnt-instance man page missing --disks documentation for replace-disks
- gnt-node modify help output is unclear about the -C/-D/-O input format, and the man page doesn't document this command at all
- “gnt-node modify -C yes” for offline or drained nodes had the wrong error message
- “gnt-instance reinstall --select-os” has the wrong prompt; we only accept a number for the OS and not the template name
Reviewed-by: ultrotter
-
- Apr 06, 2009
-
-
Iustin Pop authored
This patch fixes the Xen soft reboot ("xm reboot") by polling for a specific time for either a changed domain ID or a decreased CPU run-time. This should prevent the race conditions discussed on the mailing list for reboots. Reviewed-by: imsnah
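The polling approach can be sketched generically. In the sketch below `get_info` is a hypothetical callable returning the current `(domain_id, cpu_time)` pair, and the default timeout and interval are placeholders; none of these names or values come from the patch itself.

```python
import time


def wait_for_reboot(get_info, old_domid, old_cpu_time,
                    timeout=60.0, interval=1.0):
    """Poll until the domain ID changes or the CPU time decreases.

    Returns True if a reboot was observed before the timeout expired.
    """
    deadline = time.monotonic() + timeout
    while True:
        domid, cpu_time = get_info()
        if domid != old_domid or cpu_time < old_cpu_time:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```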
-
Iustin Pop authored
Since the cluster tags are/should be more-or-less static, add them as an ssconf key, so that querying them is possible without creating a job/requiring the masterd to be running. Reviewed-by: imsnah
-
- Mar 20, 2009
-
-
Guido Trotter authored
Allow expressions longer than one character to match. Reviewed-by: imsnah
-
Guido Trotter authored
set timeout_needs_update to False after calculating the timeout. Reviewed-by: imsnah
-
- Mar 12, 2009
-
-
Guido Trotter authored
There is a bug in kvm, when binding vnc to a specific address the constant 'vnc_bind_address' is passed in, instead of the actual requested address. This patch fixes it. Reviewed-by: iustinp
-
- Mar 10, 2009
-
-
Guido Trotter authored
s/"vnc_bind_address"/constants.HV_VNC_BIND_ADDRESS/ Reviewed-by: imsnah
-
- Mar 09, 2009
-
-
Iustin Pop authored
Currently cluster-verify doesn't handle the (admittedly invalid) case where we have reservations for instances that were removed in the meantime. This patch adds a check for this and prevents code errors in cluster-verify in this case: * Verifying node node4.example.com (master candidate) - ERROR: ghost instance 'instance3.example.com' in temporary DRBD map Reviewed-by: imsnah
-
Iustin Pop authored
Currently the _CreateSingleBlockDev function only raises OpExecError and not BlockDeviceError. This means that we don't release the instance's temporary minors properly, and this creates problems later if the instance is removed without master restart. We could just use OpExecError, but adding it and leaving BlockDeviceError in seems safer. Reviewed-by: imsnah
-
- Mar 02, 2009
-
-
Iustin Pop authored
This patch exports the cluster and node tags to the cluster verify hook scripts. The tags are exported as a space-separated list, which allows easy parsing from the shell (e.g. “for tag in $GANETI_CLUSTER_TAGS; do ...”) and therefore requires the previous “Don't allow spaces in tag names” patch. The patch also fixes a minor line length style problem. Reviewed-by: ultrotter
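A hook written in Python can split the same variable just as easily as the shell loop quoted above. The helper name `cluster_tags` is hypothetical; only the GANETI_CLUSTER_TAGS variable comes from the commit message.

```python
def cluster_tags(environ):
    """Return the cluster tags exported to a verify hook as a list."""
    # Tags cannot contain spaces, so a whitespace split is unambiguous.
    return environ.get("GANETI_CLUSTER_TAGS", "").split()
```

Inside a hook this would typically be called as `cluster_tags(os.environ)`.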
-