- Jun 16, 2009
-
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Guido Trotter authored
Fix a typo in the man pages that used the wrong option name. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Jun 15, 2009
-
-
Iustin Pop authored
Commit cf8df3f3 "bdev: forward-port ReAttachNet/DisconnectNet" forward-ported 1.2's bdev.DRBD8.ReAttachNet() to 2.0 while renaming it to AttachNet(), but commit 6b93ec9d "Forward-port DrbdNetReconfig" didn't rename all the calls to it and left one ReAttachNet call in backend.py. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- Jun 11, 2009
-
-
Guido Trotter authored
Reported in issue 59. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Jun 04, 2009
-
-
Iustin Pop authored
This patch is an attempt at fixing some very rare occurrences of messages like:
  - "There are some degraded disks for this instance", or:
  - "Cannot resync disks on node node3.example.com: [True, 100]"
What I believe happens is that drbd has finished syncing, but not all fields are updated in 'Connected' state; maybe it's in WFBitmap[ST], or in some other transient state we don't handle well.
The patch changes the _WaitForSync method to recheck up to a hardcoded number of times if we're finished syncing but still degraded (using the same condition as the 'break' clause of the loop).
The downside of this change is that a genuinely degraded disk (due to network or disk failure) will cause an extra delay before the method aborts. For this, I'm happy to choose other values.
A better, long-term fix is to handle more DRBD states correctly (see the bdev.DRBD8Status class).
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
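A minimal sketch of the recheck logic described in this commit, not the actual _WaitForSync code. The poll_status callable, the retry count and the delay are assumptions made for illustration only.

```python
import time


def wait_for_sync(poll_status, max_degraded_retries=10, delay=1.0, sleep=time.sleep):
    """Wait until the disks are synced and not degraded.

    poll_status is a hypothetical callable returning (done, degraded).
    If syncing is done but the disks still look degraded, recheck a
    bounded number of times before giving up.
    """
    retries_left = max_degraded_retries
    while True:
        done, degraded = poll_status()
        if done and not degraded:
            return True
        if done and degraded:
            # Possibly a transient state: recheck, but only a few times.
            if retries_left == 0:
                return False
            retries_left -= 1
        sleep(delay)
```

A really degraded disk costs at most max_degraded_retries extra polls before the function reports failure, which is the trade-off the message mentions.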
-
- Jun 03, 2009
-
-
Iustin Pop authored
This patch fixes two issues related to failed snapshots during exports:
  - first, the error messages used disk.logical_id[1], which is a node name for DRBD, and it resulted in strange error messages like "cannot snapshot block device node1 on node2"
  - second, if snapshotting fails for any disk, rpc.call_finalize_export fails as it didn't handle booleans (backend.FinalizeExport does)
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- May 28, 2009
-
-
Iustin Pop authored
* next: (34 commits)
  watcher: automatically restart noded/rapi
  watcher: handle full and drained queue cases
  rapi: rework error handling
  Fix backend.OSEnvironment be/hv parameters
  rapi: make tags query not use jobs
  Change failover instance when instance is stopped
  Export more instance information in hooks
  watcher: write the instance status to a file
  Fix the SafeEncoding behaviour
  Move more hypervisor strings into constants
  Add -H/-B startup parameters to gnt-instance
  call_instance_start: add optional hv/be parameters
  Fix gnt-job list argument handling
  Instance reinstall: don't mix up errors
  Don't check memory at startup if instance is up
  gnt-cluster modify: fix --no-lvm-storage
  LUSetClusterParams: improve volume group removal
  gnt-cluster info: show more cluster parameters
  LUQueryClusterInfo: return a few more fields
  Add the new DRBD test files to the Makefile
  ...
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- May 27, 2009
-
-
Iustin Pop authored
This is simply a version bump, no changes from rc5. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- May 25, 2009
-
-
Iustin Pop authored
This patch makes the watcher automatically restart the node and rapi daemons if they are not running (as per the PID file). This is not an exhaustive test; a better one would be a TCP connect to the port, and an even better one a simple protocol ping (e.g. get / for rapi and a rpc_call_alive for noded), but since we don't know how they've been started we can't implement it today. rapi would need to write the SSL/port to a file, and noded something similar, so that we know how to connect. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
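A rough sketch of a PID-file liveness check of the kind described above, assuming a POSIX system. The function names and the start_daemon callable are hypothetical and not taken from the watcher code.

```python
import os


def daemon_is_running(pid_file):
    """Return True if the PID stored in pid_file belongs to a live process."""
    try:
        with open(pid_file) as handle:
            pid = int(handle.read().strip())
    except (OSError, ValueError):
        # Missing, unreadable or malformed PID file: treat as not running.
        return False
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # Signal 0 only checks for existence.
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # The process exists but belongs to another user.
    return True


def ensure_running(pid_file, start_daemon):
    """Call start_daemon() if the daemon looks dead; return True if started."""
    if daemon_is_running(pid_file):
        return False
    start_daemon()
    return True
```

As the message notes, this only shows that a process with that PID exists, not that the daemon answers requests.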
-
Iustin Pop authored
Currently the watcher is broken when the queue is full, thus not fulfilling its job as a queue cleaner. It also doesn't handle the queue drained status nicely. This patch makes a few changes:
  - first archive jobs, and only afterwards submit jobs; this fixes the case where the queue is already full and there are jobs suited for archiving (but not the case where the jobs are all too young to be archived)
  - handle the job queue full and drained cases nicely: instead of tracebacks, log such cases
  - reverse the initial value and special cases for update_file; we now whitelist instead of blacklist cases, since we have many more blacklist cases than whitelist ones, and we set the flag to True only after the run is successful
The last change, especially, is a significant one: now errors during the watcher run will not update the status file, and thus they won't be lost again in the logs.
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
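The ordering and the whitelist-style flag can be sketched as follows. The exception classes and the callables are hypothetical stand-ins; only the update_file name comes from the message.

```python
class JobQueueFull(Exception):
    """Hypothetical: the job queue cannot accept more jobs."""


class JobQueueDrained(Exception):
    """Hypothetical: the job queue is drained and refuses new jobs."""


def run_watcher_cycle(archive_old_jobs, submit_new_jobs, log):
    """Run one watcher cycle; return whether the status file may be updated."""
    update_file = False  # Whitelist: only a fully successful run flips this.
    try:
        archive_old_jobs()  # Archive first, so a full queue can shrink.
        submit_new_jobs()
        update_file = True
    except JobQueueFull:
        log("job queue is full, cannot submit jobs")
    except JobQueueDrained:
        log("job queue is drained, cannot submit jobs")
    return update_file
```

Because the flag starts as False, any failure path that is not handled explicitly also leaves the status file untouched.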
-
Iustin Pop authored
Currently the rapi code doesn't have any custom error handling; any exceptions raised are simply converted into an HTTP 500 error, without much explanation. This patch adds a couple of generic SubmitJob/GetClient functions that handle some errors specially so that they are transformed into HTTP errors with more detailed information. With this patch, the behaviour of rapi when the queue is full or drained, or when the master is down, is easier to understand. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
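A sketch of the idea of wrapping job submission so that known failures become descriptive HTTP errors. All class names, status codes and messages below are assumptions chosen for illustration; they are not the rapi implementation.

```python
class JobQueueFull(Exception):
    """Hypothetical: the job queue cannot accept more jobs."""


class JobQueueDrained(Exception):
    """Hypothetical: the job queue is drained."""


class MasterUnreachable(Exception):
    """Hypothetical: the master daemon cannot be contacted."""


class HttpError(Exception):
    """Hypothetical HTTP error carrying a status code and a message."""

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code
        self.message = message


# The status codes here are illustrative assumptions.
_ERROR_MAP = (
    (JobQueueFull, 503, "Job queue is full, needs archival"),
    (JobQueueDrained, 503, "Job queue is drained, cannot submit"),
    (MasterUnreachable, 502, "Master seems to be unreachable"),
)


def submit_job(submit, ops):
    """Call submit(ops), translating known failures into HttpError."""
    try:
        return submit(ops)
    except tuple(cls for cls, _, _ in _ERROR_MAP) as err:
        for cls, code, message in _ERROR_MAP:
            if isinstance(err, cls):
                raise HttpError(code, message) from err
        raise
```

Exceptions outside the map are left alone, so they would still surface as a generic server error.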
-
Iustin Pop authored
Commit 67fc3042 added some more variables to be exported to OSEnvironment, but it has two bugs:
  - wrong variable name (env vs. result)
  - in OSEnvironment we don't have the automatic conversion to strings that we do in hooks, so we must manually enforce this
With this patch instance creations work again.
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
Currently the rapi tags query implementation is similar to the command line one: it submits OpGetTags jobs. This is not good: since this is an API, it can be used a lot and can pollute the job queue with many such trivial jobs. This patch converts it to use either queries (for nodes/instances) or a direct read from ssconf (for the cluster case). For ssconf, we added a function to the ssconf.SimpleStore class for reading the tags. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- May 21, 2009
-
-
Iustin Pop authored
Currently, if the instance is stopped, we still check for enough memory on the target node. This is a little bit too strict: if too many nodes have failed and one is out of memory, this prevents fixing the cluster (with the instances down). We change it to do the memory checks only when the instance will be started. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
Currently the hooks lack the instance's hypervisor, hypervisor parameters and backend parameters. This forces hooks to query back into ganeti, which is dangerous due to possible luxi socket exhaustion. This patch adds these three as INSTANCE_HYPERVISOR, INSTANCE_HV_*, INSTANCE_BE_*. The hook environment prefixes all keys with “GANETI”, so the default settings for a xen-pvm instance would be:
  GANETI_INSTANCE_HV_initrd_path=
  GANETI_INSTANCE_HV_kernel_args=ro
  GANETI_INSTANCE_HV_kernel_path=/boot/vmlinuz-2.6-xenU
  GANETI_INSTANCE_HV_root_path=/dev/sda1
Any dashes in parameter names are changed to underscores, since variables with dashes are not easy to access from the shell (alternatively we could deny those via a unittest for constants.py).
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
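The naming scheme above can be illustrated with a short sketch. The function name is hypothetical, and the dashed parameter name in the usage lines is invented only to show the dash-to-underscore rule.

```python
def build_instance_hook_env(hypervisor, hvparams, beparams):
    """Build GANETI_-prefixed hook variables for one instance (sketch)."""
    env = {"INSTANCE_HYPERVISOR": hypervisor}
    for prefix, params in (("INSTANCE_HV_", hvparams), ("INSTANCE_BE_", beparams)):
        for key, value in params.items():
            # Dashes are awkward in shell variable names, so replace them.
            env[prefix + key.replace("-", "_")] = value
    # Every key gets the GANETI_ prefix and every value becomes a string.
    return {"GANETI_" + key: str(value) for key, value in env.items()}


env = build_instance_hook_env(
    "xen-pvm",
    {"kernel_args": "ro", "example-param": "x"},  # "example-param" is made up
    {"memory": 128},
)
print(env["GANETI_INSTANCE_HV_kernel_args"])
```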
-
- May 20, 2009
-
-
Guido Trotter authored
Signed-off-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
This patch modifies the watcher to keep a file on disk with the instance status; this can be used from outside of ganeti to react to instances being down (when the watcher cannot restart them). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- May 19, 2009
-
-
Iustin Pop authored
Currently we have bad behaviour in SafeEncode:
  - binary strings are actually not handled correctly (ahem)
  - the encoding is not stable, due to use of string_escape
For this reason, we replace the use of string_escape with part of the code of string escape (PyString_Repr in Objects/stringobject.c); we don't escape backslashes or single quotes, since that is what makes it unstable. Furthermore, we only use the encode('ascii', ...) for unicode inputs.
The patch also adds unittests for the function that test basic behaviour.
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
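To illustrate what "stable" means here, the following is a Python 3 sketch (the original code targets Python 2 byte strings) of an encoder that escapes control and non-ASCII characters but leaves backslashes and quotes untouched, so encoding an already encoded string changes nothing. It is an assumption-laden illustration, not the SafeEncode implementation.

```python
def safe_encode(text):
    """Return a printable-ASCII form of text; re-encoding is a no-op."""
    if isinstance(text, bytes):
        # Map each byte to the code point with the same value.
        text = text.decode("latin-1")
    out = []
    for ch in text:
        code = ord(ch)
        if ch == "\t":
            out.append("\\t")
        elif ch == "\n":
            out.append("\\n")
        elif ch == "\r":
            out.append("\\r")
        elif code < 32 or code == 127:
            out.append("\\x%02x" % code)
        elif code > 127:
            out.append(ch.encode("ascii", "backslashreplace").decode("ascii"))
        else:
            # Printable ASCII, including backslash and quotes, is kept as-is.
            out.append(ch)
    return "".join(out)


once = safe_encode("line one\nline two")
assert safe_encode(once) == once  # stable under repeated encoding
```

Escaping backslashes would break this property, because each pass would double them again.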
-
Iustin Pop authored
This patch adds constants for the mouse and boot order strings; while there are still some issues remaining, we're trying to clean up hardcoded strings from the hypervisors. Since the formatting of frozensets is currently wrong, we also add a utility function for this and change all the error messages to use it. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
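A small sketch of the kind of helper meant here: formatting a frozenset of allowed values for an error message instead of printing its raw repr. The function name, the sorted output and the sample values are assumptions, not the utility added by the patch.

```python
def format_choices(values):
    """Return the allowed values as a sorted, comma-separated string."""
    return ", ".join(sorted(str(value) for value in values))


VALID_MOUSE_TYPES = frozenset(["mouse", "tablet"])  # illustrative values
message = "Invalid mouse type, valid values are: %s" % format_choices(VALID_MOUSE_TYPES)
print(message)
```

Sorting makes the message deterministic, which a raw set does not guarantee.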
-
Iustin Pop authored
Bugs in either our code or in associated libraries can bring the master daemon down, and this (due to the 2.0 architecture) stops all work on the cluster. Since the watcher already does periodic checks on the cluster, we modify it to try to start the master automatically in case of failures to connect. This will be tried only once per cycle. Also, in this case, we modify the code so that the watcher status file is not updated; its timestamp will thus reflect the time of the last successful connection to the master. Side note: the except errors.ConfigurationError part could be cleaned up, since in 2.0 we don't usually get that directly, and if we do it's an error and we shouldn't touch the file anyway; but that is not an rc5 change. Signed-off-by:
Iustin Pop <iustin@google.com>
-
Iustin Pop authored
This patch adds a ‘disk_space_total’ key for the current instance, similar to the key for the new instance in case of new allocations. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
This patch modifies the start instance script, opcode and logical unit to support temporary startup parameters. Unlike 1.2, where only the kernel arguments could be changed (and the feature was thus xen-pvm specific), this version supports changing all hypervisor and backend parameters (with appropriate checks). This is much more flexible, and allows for example:
  - start with a different, temporary kernel
  - start with a different memory size
Note: in later versions, this should be extended to cover disk parameters as well (e.g. start with drbd without flushes, start with drbd in async mode, etc.).
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
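The notion of temporary startup parameters can be sketched as a merge that never touches the stored configuration. The function name and the sample values are hypothetical; the real code also validates the overrides.

```python
def effective_params(stored, overrides=None):
    """Return the stored parameters with one-off overrides applied.

    The stored dictionary is not modified, so the overrides only affect
    the startup they were passed to.
    """
    merged = dict(stored)
    merged.update(overrides or {})
    return merged


stored_hv = {"kernel_path": "/boot/vmlinuz-2.6-xenU", "kernel_args": "ro"}
temporary = effective_params(stored_hv, {"kernel_args": "ro single"})
print(temporary["kernel_args"], "|", stored_hv["kernel_args"])
```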
-
Iustin Pop authored
This patch modifies the rpc.call_instance_start - the master side - to take optional hv/be parameters. The noded side is unchanged and oblivious to the change. This will allow implementation of single-user capability and such on startup (temporary, as opposed to permanent). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- May 18, 2009
-
-
Guido Trotter authored
Currently QueryJob returns "None" when a wrong job ID is passed. Handle this in gnt-job list, by printing an error for each wrong job, and still giving output for all the jobs which actually do exist. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
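A sketch of the handling described above, assuming the query returns one entry per requested job ID with None for unknown IDs. The function name and the sample rows are hypothetical.

```python
def split_job_results(job_ids, results):
    """Separate rows for existing jobs from the IDs that were not found."""
    found, missing = [], []
    for job_id, row in zip(job_ids, results):
        if row is None:
            missing.append(job_id)
        else:
            found.append(row)
    return found, missing


rows, unknown = split_job_results([1, 2, 3], [["1", "success"], None, ["3", "error"]])
for job_id in unknown:
    print("Job %s not found" % job_id)
for row in rows:
    print(" ".join(row))
```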
-
- May 15, 2009
-
-
Guido Trotter authored
If the remote info rpc call fails we can't assume that the instance is up. Signed-off-by:
Guido Trotter <ultrotter@google.com>
-
Guido Trotter authored
Signed-off-by:
Guido Trotter <ultrotter@google.com>
-
- May 13, 2009
-
-
Guido Trotter authored
Currently doing a gnt-cluster modify --no-lvm-storage is silently ignored, as it passes a None value in vg_name, which is the same as not modifying that parameter. Explicitly set the passed value to '', so that the non-true, not-None value can be evaluated to actually remove a volume group. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Guido Trotter authored
Currently LUSetClusterParams will remove the volume group if the vg_name field passed in is not true, but not None. Setting the target volume group to False or the empty string, though, is a bad idea because it's not a boolean value, and at cluster init we set it to None if --no-lvm-storage is passed. With this fix we handle '' (or any other non-None false value) as the "unset" value, but actually store None in the config. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
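The three-way convention described in these two commits (None means "leave unchanged", any other false value means "unset", anything else is the new name) can be sketched like this. The function name and the volume group names are hypothetical.

```python
def apply_vg_name(current, requested):
    """Return the volume group name to store after a modify request."""
    if requested is None:
        return current  # Parameter not passed: keep the current value.
    if not requested:
        return None  # '' (or another false value): unset, stored as None.
    return requested


assert apply_vg_name("xenvg", None) == "xenvg"
assert apply_vg_name("xenvg", "") is None
assert apply_vg_name(None, "newvg") == "newvg"
```

Storing None rather than '' keeps the configuration consistent with what cluster init writes for --no-lvm-storage.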
-
Guido Trotter authored
Even if we cannot modify all of them, they are useful information about the current cluster. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Guido Trotter authored
Some fields can be set at cluster init, and perhaps even modified with SetClusterParams, but there's no way to know them. With this patch we export them in the cluster info query. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- May 12, 2009
-
-
Guido Trotter authored
Currently the KVM hypervisor returns strings for the memory and cpu values, while the xen hypervisor returns integers. Make this uniform by converting the values to integers in KVM as well. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Guido Trotter authored
LUSetInstanceParam currently assumes that the 'memory' value of a call_instance_info result is an integer, while the rest of the code explicitly converts it to int(). Converting it to int works around a bug which prevents changing the memory allocation of a live instance if the remote call returns the memory in string format. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
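A sketch of why the conversion matters: arithmetic on the reported memory only works once the value is an integer, whichever hypervisor produced it. The function name and the formula are assumptions for illustration; only the 'memory' key comes from the message.

```python
def missing_memory(requested_mb, instance_info, node_free_mb):
    """Return how many MB are missing to grow a live instance (sketch)."""
    # The remote call may report memory as "256" or 256; normalise first.
    current_mb = int(instance_info.get("memory", 0))
    return max(0, requested_mb - current_mb - node_free_mb)


# Both representations give the same answer after conversion.
assert missing_memory(512, {"memory": "256"}, 128) == 128
assert missing_memory(512, {"memory": 256}, 128) == 128
```

Without the int() call, the string form would raise a TypeError in the subtraction.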
-
- May 11, 2009
-
-
Iustin Pop authored
These were forgotten in commit 01e2ce3a, and caused “make distcheck” to fail. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
In Ganeti 1.2, “none” was used to signify no initrd. In 2.0 we have changed to “no_” as a prefix (i.e. “-H no_initrd_path”), and we document this in the manpage. The QA suite is changed accordingly. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
The _TransformPath function is not used anymore in 2.0, let's remove it. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Tim Boring authored
Patch for adding network_port to the instance attributes exported by the RAPI. [iustin@google.com: slightly changed the formatting] Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- May 09, 2009
-
-
Tim Boring authored
Minor patch to clarify the URL necessary for accessing the RAPI. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Iustin Pop authored
The version is 2.0, and we don't build PDFs by default, only HTML files. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- May 07, 2009
-
-
Carlos Valiente authored
This is for Python 2.6 compatibility. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-