- Jul 07, 2009
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
If a user used ^Z to stop the program, poll() in socket.recv would return EAGAIN due to SIGSTOP. This patch changes luxi.Transport.Recv to ignore EAGAIN.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
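A minimal sketch of the resulting receive behaviour (illustrative only; the real method lives in Ganeti's luxi module and differs in detail):

    import errno
    import socket

    def recv_ignoring_eagain(sock, bufsize=4096):
        # Retry reads that fail with EAGAIN (e.g. spurious wakeups after
        # the process was stopped with ^Z and continued); re-raise
        # everything else.
        while True:
            try:
                return sock.recv(bufsize)
            except socket.error as err:
                if err.args[0] == errno.EAGAIN:
                    continue
                raise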
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Jul 01, 2009
-
Iustin Pop authored
With the change to striped LVs, the actual size of a meta device (which is small) can be more than we expected for non-striped LVs. This patch increases the accepted size from 160MB to 1GB, and updates the comment with the rationale behind this change. Note that we do want even meta devices striped, since it can speed up metadata updates.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>
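A hedged illustration of the relaxed check (the function name and the raise are assumptions; only the 1GB bound comes from the message above):

    def check_meta_size(size_mb):
        # Striped allocation can round a meta LV up well past the old
        # 160 MiB expectation, so accept anything up to 1 GiB.
        if size_mb > 1024:
            raise ValueError("meta device too large: %d MiB" % size_mb)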
-
- Jun 30, 2009
-
Iustin Pop authored
Currently, when draining nodes we reset their master candidate flag, but we don't instruct them to demote themselves. This leads to “ERROR: file '/var/lib/ganeti/config.data' should not exist on non master candidates (and the file is outdated)”. This patch simply adds a call to node_demote_from_mc in this case.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This patch fixes a few node readd issues. Currently, node readd consists of two opcodes:
- OpSetNodeParms, which resets the offline/drained flags
- OpAddNode (with readd=True), which reconfigures the node
The problem is that between these two, the configuration is inconsistent for certain cluster configurations. Thus, this patch removes the first opcode and modifies LUAddNode to deal with this case too. The patch also modifies the computation of the intended master_candidate status, and actually sets the readded node to master candidate if needed. Previously, we didn't modify the existing node at all. Finally, the patch modifies the bottom of the Exec() function for this LU to:
- trigger a node update, which in turn redistributes the ssconf files to all nodes (and thus the new node too)
- if the new node is not a master candidate, call the node_demote_from_mc RPC so that old master files are cleared
My testing shows this behaves correctly for various cases.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
If the config file is missing when the DemoteFromMC() function is called, it will raise a ProgrammerError. Instead of changing the utils.CreateBackup() function, which is called from multiple places, for now we only change the DemoteFromMC() function to not call it if the file does not exist (we rely on the master to prevent race conditions here).
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>
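A minimal sketch of the guard (the helper and backup naming are illustrative, standing in for utils.CreateBackup):

    import os
    import shutil
    import time

    def backup_if_present(path):
        # A missing config.data is fine on a node that is no longer a
        # master candidate; only back the file up if it actually exists.
        if os.path.isfile(path):
            shutil.copy2(path, "%s.backup-%d" % (path, int(time.time())))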
-
Iustin Pop authored
This patch modifies ConfigWriter.GetMasterCandidateStats to allow it to ignore some nodes in the calculation, so that we can use it to predict the cluster state without those nodes (which we know we will modify, and thus whose state we should not rely on).
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>
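A sketch of the extended calculation (node attribute names follow the flags used throughout this log; the real method lives on ConfigWriter):

    def get_master_candidate_stats(nodes, exceptions=None):
        # nodes: objects with name/offline/drained/master_candidate
        # attributes; names listed in 'exceptions' are skipped entirely.
        mc_now = mc_max = 0
        for node in nodes:
            if exceptions and node.name in exceptions:
                continue
            if not (node.offline or node.drained):
                mc_max += 1
            if node.master_candidate:
                mc_now += 1
        return mc_now, mc_max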
-
Iustin Pop authored
Currently the message for extraneous files on non-master-candidates is confusing, to say the least. This hopefully makes it clearer.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>
-
- Jun 29, 2009
-
Iustin Pop authored
The code for adjusting the candidate pool size ran after the config update, which means we triggered the save of the config file without fixing the candidate pool first, and that aborts with an error. The patch simply moves it before the update. The old comment was valid, but we save the config file in MaintainCandidatePool anyway, so this should be safe.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
This patch adds a ‘role’ node list field, which shows a one-character node status. This is a simpler way to see the node status than selecting all the flags individually.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
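A plausible sketch of such a mapping (the specific letters are an assumption, based on the node flags mentioned throughout this log, not taken from the patch):

    def node_role(node, master_name):
        # Collapse the node flags into one character for list output.
        if node.name == master_name:
            return "M"  # cluster master
        if node.master_candidate:
            return "C"  # master candidate
        if node.drained:
            return "D"  # drained
        if node.offline:
            return "O"  # offline
        return "R"      # regular node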
-
- Jun 23, 2009
-
Iustin Pop authored
Currently the http library only checks credentials when authentication is required. This means that any credentials are accepted on the root resource, for example, which makes problems hard to diagnose: the user/password works for all queries until one tries to do a modification, at which point it fails. This patch changes the PreHandleRequest() function to not ignore credentials when they are passed, even if we don't require authentication. This makes the behavior of RAPI more predictable.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
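A sketch of the changed flow (the handler and helper names are illustrative, not the actual http library API):

    def pre_handle_request(auth_header, requires_auth, verify):
        # auth_header: value of the Authorization header, or None;
        # verify: callable that checks the supplied credentials.
        if auth_header is None:
            if requires_auth:
                raise PermissionError("authentication required")
            return  # anonymous access to a public resource
        # Credentials were supplied: check them even on public resources,
        # so a bad password fails early instead of only on modifications.
        if not verify(auth_header):
            raise PermissionError("invalid credentials")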
-
Iustin Pop authored
The documentation for the reboot was wrong. This patch fixes it and updates the docstring with more details.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
-
- Jun 17, 2009
-
Iustin Pop authored
Currently running “gnt-instance list -o+vcpus” fails with a cryptic message:
  Unhandled Ganeti error: vcpus
This is due to multiple issues:
- in some corner cases cmdlib.py raises an errors.ParameterError, but this is not handled by cli.py
- LUQueryInstances declares ‘vcpus’ as a supported field but doesn't handle it, so instead of failing with an unknown-parameter error, e.g.:
  Failure: prerequisites not met for this operation: Unknown output fields selected: vcpuscd
  it raises the ParameterError
This patch:
- adds handling of 'vcpus' to LUQueryInstances
- adds handling of the ParameterError exception to cli.py
- changes the 'else: raise errors.ParameterError' in the field handling of LUQueryInstances to an assert, since it's a programmer error if we reach this step
With this, a future unhandled parameter will show:
  gnt-instance list -o+vcpus
  Unhandled protocol error while talking to the master daemon: Caught exception: Declared but unhandled parameter 'vcpus'
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
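A simplified sketch of the field-dispatch change (the real loop in LUQueryInstances handles many more fields):

    def field_value(instance, field):
        if field == "name":
            return instance.name
        elif field == "vcpus":
            return instance.beparams["vcpus"]  # now actually handled
        else:
            # Every declared field must be handled above; reaching this
            # branch is a programmer error, not bad user input.
            raise AssertionError(
                "Declared but unhandled parameter '%s'" % field)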
-
Iustin Pop authored
The current check in LUCreateInstance.CheckPrereq() is wrong - it only checks if we got an OS, but not if we got a valid OS. This patch fixes it.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
The size of the instance's disk was not shown in “gnt-instance info”. This patch adds it and formats it nicely if possible.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
-
- Jun 16, 2009
-
Guido Trotter authored
It was mistakenly called --backend.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Guido Trotter authored
Currently we support querying for "mac", "ip" or "bridge", meaning "the one of the first NIC". We are not checking that there is a first NIC, though, and thus could run into errors. This patch fixes that by returning "None" should there be no such NIC, as is done when explicitly asking for a NIC field via nic.<field>/<N>.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
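A minimal sketch of the fallback (the query machinery is elided; attribute names mirror the fields above):

    def first_nic_field(instance, field):
        # "mac", "ip" and "bridge" refer to the first NIC; an instance
        # without NICs now yields None instead of an IndexError.
        if not instance.nics:
            return None
        return getattr(instance.nics[0], field)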
-
Guido Trotter authored
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Guido Trotter authored
* master:
  Update NEWS and version for 2.0.1 release
  gnt-{instance,backup}(8) --nic is actually --net
  Fix a wrong function name in backend.DrbdAttachNet
  GNT-CLUSTER(8) fix search-tags example
-
Iustin Pop authored
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
-
Guido Trotter authored
Fix a typo in the man pages that used the wrong option name.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Jun 15, 2009
-
Iustin Pop authored
Commit cf8df3f3 "bdev: forward-port ReAttachNet/DisconnectNet" forward-ported 1.2's bdev.DRBD8.ReAttachNet() to 2.0 while renaming it to AttachNet(), but commit 6b93ec9d "Forward-port DrbdNetReconfig" didn't rename all the calls to it and left one ReAttachNet call in backend.py.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
-
- Jun 11, 2009
-
Guido Trotter authored
Reported in issue 59.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Jun 08, 2009
-
Iustin Pop authored
This patch enables striped LVs, falling back to non-striped if the striped creation fails. If the configure-time lvm-stripecount is 1, this patch becomes a no-op (with an insignificant Python-level overhead, but no extra lvm calls). The effect of this patch is that new instances will get striped LVs from the start, whereas old instances will have their LVs striped as soon as replace-disks is run for them.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
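A sketch of the create-with-fallback idea (Ganeti wraps this differently in bdev.py; the lvcreate flags are standard LVM):

    import subprocess

    def create_lv(vg, name, size_mb, stripes=3):
        # Try a striped volume first; fall back to a plain one if striping
        # fails (e.g. the VG has fewer PVs than requested stripes).
        base = ["lvcreate", "-L", "%dm" % size_mb, "-n", name]
        if stripes > 1 and subprocess.call(base + ["-i", str(stripes), vg]) == 0:
            return
        if subprocess.call(base + [vg]) != 0:
            raise RuntimeError("lvcreate failed for %s/%s" % (vg, name))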
-
Iustin Pop authored
This patch adds a configure-time customizable parameter that will be used to enable striped LVs. The default value of the parameter is 3.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
This patch adds constants for the connection status, peer roles and disk status, and it changes the rules for when the disk is considered to be “resyncing” - previously that was only syncsource/synctarget, but there are many other transient statuses which could be misinterpreted as ‘degraded’ (because they were not considered resyncing, yet the disk is not consistent in these statuses). Furthermore, cmdlib.py:WaitForSync determines whether a device is syncing based on sync_percent not being None. Not all DRBD resync statuses report a sync percentage, so if we are syncing but don't have one, we'll report a zero sync percent (and no time estimate). The patch also removes a few unused variables (is_sync_target, peer_sync_target, is_resync) whose values no longer make sense with the new sync rules.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
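A sketch of such constants (these are standard DRBD connection state names; the exact set Ganeti treats as resyncing lives in bdev.DRBD8Status):

    # Connection states in which the disk contents are in flux and the
    # device should be reported as resyncing rather than degraded.
    SYNC_STATES = frozenset([
        "SyncSource", "SyncTarget",
        "WFBitMapS", "WFBitMapT", "WFSyncUUID",
        "PausedSyncS", "PausedSyncT",
    ])

    def is_in_resync(connection_state):
        return connection_state in SYNC_STATES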
-
Iustin Pop authored
* master:
  Wait for a while in failed resyncs
  Fix two issues with exports and snapshot errors
-
- Jun 04, 2009
-
Iustin Pop authored
This patch is an attempt at fixing some very rare occurrences of messages like:
- "There are some degraded disks for this instance", or:
- "Cannot resync disks on node node3.example.com: [True, 100]"
What I believe happens is that DRBD has finished syncing, but not all fields are updated in 'Connected' state; maybe it's in WFBitMap[ST], or in some other transient state we don't handle well. The patch changes the _WaitForSync method to recheck, up to a hardcoded number of times, whether we're finished syncing but still degraded (using the same condition as the 'break' clause of the loop). The downside of this change is that a genuinely degraded disk (due to network or disk failure) will incur an extra delay before we abort; I'm happy to choose other values here. A better, long-term fix is to handle more DRBD states correctly (see the bdev.DRBD8Status class).
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
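A sketch of the recheck idea (the retry bound and pause below are placeholders, not the values from the patch):

    import time

    RECHECKS = 10  # placeholder retry bound
    PAUSE = 5      # placeholder pause between rechecks, in seconds

    def wait_until_clean(get_status):
        # get_status() -> (sync_done, degraded). If DRBD reports the sync
        # as finished but the device still looks degraded, poll a few more
        # times: the degraded flag can lag in transient states.
        for _ in range(RECHECKS):
            sync_done, degraded = get_status()
            if sync_done and not degraded:
                return True
            time.sleep(PAUSE)
        return False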
-
- Jun 03, 2009
-
Iustin Pop authored
This patch changes DRBD disk attachment to force the wanted size, as opposed to letting the device auto-discover its size. This should make the disks more resilient with regard to small differences in size (e.g. due to LVM rounding). This still works with regard to disk growth, but the instance needs to be fully restarted (including disks) in that case. This passes a full burnin without problems, but it's still a tricky change - if config.data is not in sync with reality, we might tell DRBD a wrong size. At least this will fail outright (and not introduce silent errors), as DRBD (per a quick check of the sources) tracks the size in the meta device and also does not allow shrinking consistent devices.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
This patch fixes two issues related to failed snapshots during exports:
- first, the error messages used disk.logical_id[1], which is a node name for DRBD, and it resulted in strange error messages like "cannot snapshot block device node1 on node2"
- second, if snapshotting fails for any disk, rpc.call_finalize_export fails as it didn't handle booleans (backend.FinalizeExport does)
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
-
- May 28, 2009
-
Iustin Pop authored
Currently the code in cmdlib doesn't set the device size on new DRBD devices in replace-secondary, but we need to do it, otherwise it gets initialized to None.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This patch changes all the bdev.BlockDev constructors to take an additional ‘size’ parameter, changes all the backend functions that call them to pass it, and also changes backend.BlockdevCreate() to not use the size passed via the RPC call but instead use disk.size directly (this is the only way it's called). Note that this patch doesn't do anything with this parameter yet; it just stores it on the blockdev objects. With the patch, we actually have a more uniform init sequence (before, create had the parameter but the other functions didn't).
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
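A simplified sketch of the new constructor contract:

    class BlockDev(object):
        # Every concrete device class now receives the intended size at
        # construction time and stores it for later use; nothing reads it
        # yet as of this patch.
        def __init__(self, unique_id, children, size):
            self.unique_id = unique_id
            self._children = children
            self.size = size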
-
Iustin Pop authored
* next: (34 commits)
  watcher: automatically restart noded/rapi
  watcher: handle full and drained queue cases
  rapi: rework error handling
  Fix backend.OSEnvironment be/hv parameters
  rapi: make tags query not use jobs
  Change failover instance when instance is stopped
  Export more instance information in hooks
  watcher: write the instance status to a file
  Fix the SafeEncoding behaviour
  Move more hypervisor strings into constants
  Add -H/-B startup parameters to gnt-instance
  call_instance_start: add optional hv/be parameters
  Fix gnt-job list argument handling
  Instance reinstall: don't mix up errors
  Don't check memory at startup if instance is up
  gnt-cluster modify: fix --no-lvm-storage
  LUSetClusterParams: improve volume group removal
  gnt-cluster info: show more cluster parameters
  LUQueryClusterInfo: return a few more fields
  Add the new DRBD test files to the Makefile
  ...
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
-
- May 27, 2009
-
Iustin Pop authored
This is simply a version bump, no changes from rc5.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
- May 25, 2009
-
Iustin Pop authored
This patch makes the watcher automatically restart the node and rapi daemons if they are not running (as per the PID file). This is not an exhaustive test; a better one would be a TCP connect to the port, and an even better one a simple protocol ping (e.g. GET / for rapi and an rpc_call_alive for noded), but since we don't know how they've been started we can't implement it today. rapi would need to write the SSL/port to a file, and noded something similar, so that we know how to connect to them.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
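A sketch of the PID-file liveness check (the paths and restart command are illustrative):

    import os
    import subprocess

    def ensure_daemon(pidfile, restart_cmd):
        # A daemon counts as alive if its PID file names a running
        # process; signal 0 only checks for existence.
        try:
            with open(pidfile) as fh:
                pid = int(fh.read().strip())
            os.kill(pid, 0)
        except (OSError, IOError, ValueError):
            subprocess.call(restart_cmd)

    # e.g.: ensure_daemon("/var/run/ganeti-noded.pid",   # assumed path
    #                     ["/usr/sbin/ganeti-noded"])    # assumed command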
-
Iustin Pop authored
Currently the watcher is broken when the queue is full, thus not fulfilling its job as a queue cleaner. It also doesn't handle the queue drained status nicely. This patch makes a few changes:
- first archive jobs, and only then submit new ones; this fixes the case where the queue is already full and contains jobs suited for archiving (but not the case where the jobs are all too young to be archived)
- handle the queue-full and queue-drained cases nicely: instead of tracebacks, log such cases cleanly
- reverse the initial value and special cases for update_file; we now whitelist instead of blacklist cases, since we have many more blacklist cases than the reverse, and we set the flag to True only after a successful run
The last change, especially, is significant: now errors during the watcher run will not update the status file, and thus they won't be lost again in the logs.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Currently the rapi code doesn't have any custom error handling; any exceptions raised are simply converted into an HTTP 500 error, without much explanation. This patch adds a couple of generic SubmitJob/GetClient functions that handle some errors specially, so that they are transformed into HTTP errors with more detailed information. With this patch, the behaviour of rapi when the queue is full or drained, or when the master is down, is more readable.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
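A sketch of the translation layer (the exception class names are assumptions standing in for Ganeti's luxi/errors exceptions, defined here only to keep the example self-contained):

    class HttpError(Exception):
        def __init__(self, code, message):
            super(HttpError, self).__init__(message)
            self.code = code

    class JobQueueFull(Exception): pass
    class JobQueueDrainError(Exception): pass
    class NoMasterError(Exception): pass

    def submit_job(client, ops):
        # Translate well-known backend failures into informative HTTP
        # errors instead of a generic 500.
        try:
            return client.SubmitJob(ops)
        except JobQueueFull:
            raise HttpError(503, "Job queue is full")
        except JobQueueDrainError:
            raise HttpError(503, "Job queue is drained")
        except NoMasterError:
            raise HttpError(502, "Master daemon is not available")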
-
Iustin Pop authored
Commit 67fc3042 added some more variables to be exported to OSEnvironment, but it has two bugs:
- a wrong variable name (env vs. result)
- in OSEnvironment we don't have the automatic conversion to strings that we have in hooks, so we must enforce it manually
With this patch instance creations work again.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
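A sketch of the manual conversion (the helper name is illustrative):

    def make_os_env(values):
        # Hooks convert exported values to strings automatically;
        # OSEnvironment has to do it explicitly, hence the str() here.
        return dict((name, str(val)) for name, val in values.items())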
-