- 11 Apr, 2012 3 commits
-
-
Iustin Pop authored
Sorry, didn't catch this before… Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Iustin Pop <iustin@google.com> (cherry picked from commit 54b010ca ) Signed-off-by:
Michael Hanselmann <hansmi@google.com>
-
Dimitris Aragiorgis authored
Commit 3b3b1bca does not entirely fix the bug introduced in commit f396ad8c . It fixes the consistency of config data in permanent storage, but does not ensure consistency of the data held in masterd's runtime memory. The duplicate-ports bug is still triggered when LUInstanceRemove() invokes _RemoveDisks() and this returns False (in case the call_blockdev_remove RPC fails): the DRBD ports get returned to the pool, but execution is aborted and RemoveInstance() is never invoked. Since port handling is not done with the TemporaryReservationManager, ensure that ports are released only if the disk-related config data is deleted. In _RemoveDisks(), release ports only if all RPCs succeed. Extend _RemoveDisks() with an ignore_failures argument, passed by _RemoveInstance(), to handle the ports appropriately. Signed-off-by:
Dimitris Aragiorgis <dimara@grnet.gr> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
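A minimal Python sketch of the release-only-on-success idea; the helper names (remove_disk_rpc, release_port) and the disk objects are hypothetical stand-ins, not the actual Ganeti code:

    def remove_disks(disks, remove_disk_rpc, release_port, ignore_failures=False):
        """Remove instance disks; give DRBD ports back only when it is safe."""
        all_ok = True
        for disk in disks:
            if not remove_disk_rpc(disk):      # RPC failed on some node
                all_ok = False
        if all_ok or ignore_failures:
            # Only now is the disk-related config data really going away,
            # so returning the ports cannot create duplicates.
            for disk in disks:
                for port in getattr(disk, "ports", []):
                    release_port(port)
        return all_ok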
-
Dimitris Aragiorgis authored
Commit f396ad8c returns the TCP port used by a DRBD disk to the TCP/UDP port pool using AddTcpUdpPort(). However, AddTcpUdpPort() writes the config on every invocation, using _WriteConfig(). This causes two problems: * critical errors are logged by VerifyConfig() after the DRBD disk removal and until the actual instance removal. * if the code following AddTcpUdpPort() fails, the port has already been returned to the pool, which can result in duplicate ports (an inconsistent config). AddTcpUdpPort() is invoked in three cases: * during InstanceRemove(), through _RemoveDisks(). * during InstanceSetParams(), in case of disk removal. * during InstanceSetParams(), through _ConvertDrbdToPlain(). This commit fixes the problem by removing the _WriteConfig() call from AddTcpUdpPort(), delegating the write to Update() via the TemporaryReservationManager, and ensuring that AddTcpUdpPort() precedes Update(). Signed-off-by:
Dimitris Aragiorgis <dimara@grnet.gr> [iustin@google.com: small comment adjustments] Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Iustin Pop <iustin@google.com> (cherry picked from commit 3b3b1bca)
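A toy sketch of the same split, under the assumption that returning a port only touches in-memory state and the caller persists it with a single Update()-style call (class and method names are illustrative, not the real ConfigWriter API):

    class PortPool:
        def __init__(self):
            self._free_ports = set()

        def add_tcp_udp_port(self, port):
            # In-memory only; no _WriteConfig()-style call here any more.
            self._free_ports.add(port)

        def update(self, write_config):
            # The one place where the new pool state reaches permanent storage.
            write_config(sorted(self._free_ports))

    pool = PortPool()
    pool.add_tcp_udp_port(11000)                        # must precede update()
    pool.update(lambda ports: print("persisting free ports:", ports))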
-
- 24 Nov, 2011 1 commit
-
-
Michael Hanselmann authored
Note: This bug only manifests itself in Ganeti 2.5, but since the problematic code also exists in 2.4, I decided to fix it there. If a node was assigned to a new group using “gnt-group assign-nodes” the node object's group would be changed, but not the duplicate member list in the group object. The latter is an optimization to require fewer locks for other operations. The per-group member list is only kept in memory and not written to disk. Ganeti 2.5 starts to make use of the data kept in the per-group member list and consequently fails when it is out of date. The following commands can be used to reproduce the issue in 2.5 (in 2.4 the issue was confirmed using additional logging): $ gnt-group add foo $ gnt-group assign-nodes foo $(gnt-node list --no-header -o name) $ gnt-cluster verify # Fails with KeyError This patch moves the code modifying node and group objects into “config.ConfigWriter” to do the complete operation under the config lock, and also to avoid making use of side-effects of modifying objects without calling “ConfigWriter.Update”. A unittest is included. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
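A self-contained sketch of doing the whole move under one lock so the node's group and the per-group member list can never disagree; the classes below are simplified stand-ins, not the real config.ConfigWriter:

    import threading
    from types import SimpleNamespace

    class ConfigWriter:
        def __init__(self, nodes, groups):
            self._lock = threading.Lock()
            self._nodes = nodes      # name -> node object
            self._groups = groups    # group name -> group object

        def assign_group_nodes(self, node_names, target_group):
            with self._lock:
                for name in node_names:
                    node = self._nodes[name]
                    self._groups[node.group].members.remove(name)
                    self._groups[target_group].members.append(name)
                    node.group = target_group   # both structures change together

    node1 = SimpleNamespace(group="default")
    groups = {"default": SimpleNamespace(members=["node1"]),
              "foo": SimpleNamespace(members=[])}
    cfg = ConfigWriter({"node1": node1}, groups)
    cfg.assign_group_nodes(["node1"], "foo")
    assert node1.group == "foo" and groups["foo"].members == ["node1"]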
-
- 14 Nov, 2011 1 commit
-
-
Vangelis Koukis authored
Ensure ports previously allocated by calling ConfigWriter's AllocatePort() are returned to the pool of free ports when no longer needed: * Return the network_port of an instance when it is removed * Return the port used by a DRBD-based disk when it is removed Signed-off-by:
Vangelis Koukis <vkoukis@grnet.gr> Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- 23 Aug, 2011 1 commit
-
-
René Nussbaumer authored
Signed-off-by:
Agata Murawska <agatamurawska@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com> (cherry picked from commit b7d7876b) Conflicts: lib/cmdlib.py (easily fixed)
-
- 22 Jul, 2011 1 commit
-
-
Michael Hanselmann authored
Commit 84d7e26b changed “objects.Instance.MapLVsByN” to not just return the LV name, but to include the volume group name (e.g. “xenvg/d67e8700….disk0_data”). This in turn broke the mapping of volume names in LUNodeQueryvols, stopping instance names from being displayed in “gnt-node volumes”. This patch fixes the issue and does some cleanup. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
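A small sketch of the mapping fix, assuming the per-instance LV lists now carry a "vg/lv" prefix as described above; the helper name and data layout are hypothetical:

    def map_volumes_to_instances(lvs_by_instance, node_volumes):
        """Map (vg, lv) pairs reported by a node back to the owning instance."""
        owner = {}
        for inst, volumes in lvs_by_instance.items():
            for vol in volumes:
                vg, lv = vol.split("/", 1)      # strip the volume group prefix
                owner[(vg, lv)] = inst
        return {vol: owner.get(vol, "-") for vol in node_volumes}

    print(map_volumes_to_instances(
        {"inst1": ["xenvg/0123abcd.disk0_data"]},
        [("xenvg", "0123abcd.disk0_data"), ("xenvg", "orphan_lv")]))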
-
- 28 Jun, 2011 2 commits
-
-
Iustin Pop authored
The new functionality in 2.4.2 for recreate-disks to change nodes is broken for DRBD instances: it simply changes the nodes without caring about the DRBD minor mapping, which will lead to conflicts in non-empty clusters. This patch changes the Exec() method of this LU significantly, to both fix the DRBD minor usage and make sure that we don't have partial modifications to the instance objects: - the first half of the method makes all the checks and computes the needed configuration changes - the second half then performs the configuration changes and recreates the disks This way, instances will either be fully modified or not at all; whether the disks are successfully recreated is another matter, but at least we'll have a sane configuration. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
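The two-phase structure reduces to something like the following sketch (the helper callables are hypothetical):

    def recreate_disks(instance, check_and_compute, apply_changes):
        # Phase 1: all checks and the new configuration (nodes, DRBD minors)
        # are computed without touching the instance object.
        planned = check_and_compute(instance)   # may raise; nothing modified yet
        # Phase 2: only reached if planning succeeded, so the instance is
        # either fully updated or left untouched.
        apply_changes(instance, planned)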
-
Iustin Pop authored
Patch db8e5f1c removed the use of feedback_fn, hence pylint now warns. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
- 27 Jun, 2011 1 commit
-
-
Iustin Pop authored
Currently the drbd8 replace-disks on the same node (i.e. -p or -s) has a bug in that it modifies the instance disk temporarily before changing it back to the same value. However, we don't need to, and shouldn't, do that: what this operation does is simply change the LVM configuration on the node; otherwise the instance disks keep the same configuration as before. In the current code, this change back-and-forth is fine *unless* we fail while attaching the new LVs to DRBD, in which case we're left with a half-modified disk, which is entirely wrong. So we change the code in two ways: - use temporary copies of the disk children in the old_lvs var - stop updating disk.children This means that the instance should not be modified anymore (except maybe for SetDiskID, which is a legacy and unfortunate decision that will have to be cleaned up sometime). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
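A minimal sketch of the "work on copies" approach; create_new_lvs and attach_to_drbd stand in for the real node RPCs:

    import copy

    def replace_lvs_on_node(disk, create_new_lvs, attach_to_drbd):
        # Operate on deep copies so a failure while attaching the new LVs
        # cannot leave the instance's disk object half-modified.
        old_lvs = copy.deepcopy(disk.children)
        new_lvs = create_new_lvs(old_lvs)
        attach_to_drbd(new_lvs)        # may raise; disk.children stays untouched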
-
- 23 Jun, 2011 1 commit
-
-
Guido Trotter authored
Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- 17 Jun, 2011 4 commits
-
-
Guido Trotter authored
This was left out during the fix/refactoring. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Guido Trotter authored
- Move the calculation to the beginning of CheckPrereq, since it doesn't modify any state but still runs with the locks held - Only perform the calculation if the actual disk template is file-based - Error out if there is no defined file storage dir - Only join the optional --file-storage-dir extra path if one is passed Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
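Put together, the checks amount to roughly this sketch (the function name and error type are illustrative, not the actual LU code):

    import os

    def compute_file_storage_dir(disk_template, cluster_storage_dir, extra_dir=None):
        if disk_template != "file":
            return None                     # nothing to compute for other templates
        if not cluster_storage_dir:
            raise ValueError("no file storage dir defined for this cluster")
        if extra_dir:
            return os.path.join(cluster_storage_dir, extra_dir)
        return cluster_storage_dir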
-
Guido Trotter authored
Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Guido Trotter authored
As the manpage says, and the code does, self.op.file_storage_dir is an additional relative path under the cluster file storage dir. As such it should not be absolute. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
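A tiny sketch of the corresponding check (names are illustrative):

    import os.path

    def check_file_storage_dir(value):
        # The value is a path relative to the cluster file storage dir,
        # so an absolute path is rejected.
        if value and os.path.isabs(value):
            raise ValueError("file storage dir %r must not be absolute" % value)
        return value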
-
- 26 May, 2011 1 commit
-
-
Michael Hanselmann authored
Commit 1bee66f3 added assertions for ensuring only the necessary locks are kept while replacing disks. One of them makes sure locks have been released during the operation. Unfortunately the commit added the check as part of a “finally” branch, which is also run when an exception is thrown (in which case the locks may not have been released yet). Errors could be masked by the assertion error. Moving the check out of the “finally” branch fixes the issue. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
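The shape of the fix, as a generic sketch rather than the actual LU code:

    def replace_disks_checked(do_replace, release_unneeded, owned_locks, expected):
        try:
            do_replace()
        finally:
            release_unneeded()   # cleanup belongs in finally ...
        # ... but the lock assertion does not: here it only runs when
        # do_replace() did not raise, so it cannot mask the original error.
        assert owned_locks() == expected, "unneeded locks still held"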
-
- 24 May, 2011 1 commit
-
-
Iustin Pop authored
Currently we generate an empty list only for the '-n node' invocation, but for iallocator we still call the iallocator (which needs an RPC call, etc.). By moving the computation of instances outside of the if block, we can return early from the LU. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
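A sketch of the early return; the callables are hypothetical stand-ins for the LU internals:

    def compute_evacuation(instances, use_iallocator, run_iallocator, target_node):
        if not instances:
            return []                       # early return: no iallocator RPC needed
        if use_iallocator:
            return run_iallocator(instances)
        return [(inst, target_node) for inst in instances]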
-
- 16 May, 2011 1 commit
-
-
Iustin Pop authored
This will allow stopping or starting an instance without changing the remembered state. While this seems counter-intuitive at first (it will create cluster verify errors), it can help in a few corner cases: - shutting down an entire cluster for maintenance but without having to remember state - doing testing of Ganeti itself Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
- 11 May, 2011 1 commit
-
-
Iustin Pop authored
The code checking for N+1 failures during instance memory changes has multiple bugs and needs significant changes; in the meantime we can at least: - change the warning message into an error (--force will skip the checks) - only make the checks when we increase the memory Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
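In sketch form, assuming a simple free-memory figure for the node (the real N+1 checks are more involved):

    def check_memory_change(old_mem, new_mem, free_mem, force, warn_fn):
        if new_mem <= old_mem:
            return                          # only growing the memory is checked
        if new_mem - old_mem > free_mem:
            msg = "not enough free memory on the node for the new size"
            if force:
                warn_fn("WARNING: %s" % msg)    # --force downgrades to a warning
            else:
                raise RuntimeError(msg)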
-
- 09 May, 2011 2 commits
-
-
Iustin Pop authored
Currently, when converting an instance from plain to DRBD, the instance is blocked during the entire resync period. This patch adds the --no-wait-for-sync option so that the operation finishes as soon as the DRBD sync has started, without waiting for the entire sync. This makes the instance available much faster. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This patch introduces the option of changing an instance's nodes when doing the disk recreation. The rationale is that currently if an instance lives on a node that has gone down and is marked offline, it's not possible to re-create the disks and reinstall the instance on a different node without hacking the config file. Additionally, the LU now locks the instance's nodes (which was not done before), as we most likely allocate new resources on them. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
- 06 May, 2011 2 commits
-
-
Iustin Pop authored
It makes no sense to show messages like: Fri May 6 02:04:01 2011 - INFO: Resolved given name 'instance18' to 'instance18' So we'll skip the message if the resolved name is identical to the requested one. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
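The change boils down to a guard like this (feedback_fn mirrors the usual LU logging callback):

    def log_resolved_name(requested, resolved, feedback_fn):
        # Only worth a message when resolution actually changed the name.
        if resolved != requested:
            feedback_fn("INFO: Resolved given name '%s' to '%s'"
                        % (requested, resolved))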
-
Michael Hanselmann authored
The original code would get all node information and their groups before acquiring the necessary locks. With this patch the node information is only retrieved once all locks have been acquired. Groups are locked optimistically and verified after acquiring the node locks. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
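A rough sketch of the "lock optimistically, verify later" pattern; the helpers are hypothetical and the real code's reaction to a mismatch may differ:

    def get_nodes_with_group_check(acquire_node_locks, get_node_objects,
                                   locked_groups):
        acquire_node_locks()
        nodes = get_node_objects()                 # read only after locking
        owned_groups = set(node.group for node in nodes)
        if not owned_groups.issubset(locked_groups):
            raise RuntimeError("node group changed while waiting for locks")
        return nodes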
-
- 03 May, 2011 2 commits
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Iustin Pop authored
This removes (count of instances + count of nodes) lock acquires/releases. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- 02 May, 2011 2 commits
-
-
Iustin Pop authored
At least one generates an epydoc error :) Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Currently cluster verify doesn't check bridge information; the only checks are done at instance creation and failover/migration time. This means a cluster that seems healthy may still fail creation jobs. This patch implements a simple verification that all nodes (in the entire cluster, so it doesn't work well for multi-group setups) have all the required bridges: the default one plus any instance bridge. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
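The set of bridges every node is expected to have could be gathered roughly like this; the nic_bridges attribute is an assumption made for the sketch:

    def required_bridges(default_bridge, instances):
        bridges = {default_bridge}
        for inst in instances:
            bridges.update(b for b in inst.nic_bridges if b)
        return bridges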
-
- 29 Apr, 2011 2 commits
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
If an iallocator is used, “gnt-instance replace-disks” would acquire the locks of all nodes (only the allocator will decide which node to use). Unfortunately the unneeded locks were not released during the operation, causing unnecessary delays for other jobs. This patch changes the LU to release unneeded locks and adds assertions. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
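A sketch of the narrowing step; release_fn stands in for the real lock-manager call:

    def release_unneeded_node_locks(owned, needed, release_fn):
        assert needed <= owned, "iallocator picked a node we never locked"
        for node in sorted(owned - needed):
            release_fn(node)        # let other jobs use this node again
        return needed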
-
- 28 Apr, 2011 2 commits
-
-
Iustin Pop authored
This is a simple change to allow specifying a different VG for the meta device during the creation of instances and addition of disks via gnt-instance modify. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This is a small change to make this function take a list of VG names, instead of a single one. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- 27 Apr, 2011 5 commits
-
-
Iustin Pop authored
This patch enhances the multi-VG support in replace disks, by keeping the meta device in the same VG, as opposed to moving it to the data device VG (note that we don't have a way to create the meta in a different VG in the first place, but at least we correctly handle a custom config). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Doug Dumitru authored
When converting an instance from 'plain' to 'drbd', the old code would create the DRBD volumes in the default VG and the subsequent renames would then fail. This fix pulls the VG names from the existing plain volumes and places them into the new disk template. Running 'replace-disks' has a similar issue, with the new disks going into the wrong VG and the rename then failing. There might be a similar issue with 'recreate-disks', but I actually have no idea what recreate-disks does, so did not look into it. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
A few issues in the clarity of the error messages are fixed: - "ERROR: node node3: OS API version lenny-image": no preposition between the parameter type and the OS name, changed to "for lenny-image" - "API version lenny-image differs from reference node node1: 10, 5 vs. 10, 20, 5, 15": parameters not sorted in display - "OS variants list lenny-image differs from reference node node1: vs. default, i386": empty sets are not clearly delimited, changed to add [] around the sets: "node node1: [] vs. [default, i386]" - "OS parameters lenny-image differs from reference node node1: vs. (u'dhcp', u'Whether to enable (yes) or disable (dhcp)')": ugly formatting in the OS parameters list, as we used to just "%s" the tuple; now it is "reference node node1: [] vs. [dhcp: Whether to enable (yes) or disable (dhcp)]" Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This breaks Ganeti in multiple ways. If we don't make the check in gnt-node itself, then bootstrap.SetupNodeDaemon will restart the master daemon, making the operation fail: node1# gnt-node add --readd node1 Cannot communicate with the master daemon. Is it running and listening for connections? The check in cmdlib is more of a safety check, as we shouldn't reach it. If we do (via a bad client), then it will prevent breakage in the job queue/config handling. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
IIRC we don't use punctuation at the end of error messages. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- 20 Apr, 2011 1 commit
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- 19 Apr, 2011 1 commit
-
-
Iustin Pop authored
The current wipe_chunk_size computation is doing min(int_value, float_value). For small disks (below 10GiB), the actual formula will result in the float value being chosen. This results in very interesting behaviour: Wiping disk 0, offset 102.4, chunk 102.4 Wiping disk 0, offset 204.8, chunk 102.4 … Wiping disk 0, offset 921.6, chunk 102.4 Wiping disk 0, offset 1024.0, chunk 1.13686837722e-13 Since these are passed to dd via %d, this will result in the call to dd specifying offset 1024 and count 0, which will fail. We just need to enforce conversion to int, in order not to get bitten by floating point rounding errors. The patch also reorders some logging messages in order to log the chunk size. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
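A sketch of the fix with illustrative constants (the real values live in Ganeti's constants module):

    MAX_WIPE_CHUNK = 1024          # MiB
    MIN_WIPE_CHUNK_PERCENT = 10

    def wipe_chunk_size(disk_size):
        # Force int: min() of an int and a float returns a float for small
        # disks, and accumulated float offsets leave a ~1e-13 MiB last chunk,
        # which dd (invoked via %d) turns into count=0.
        return int(min(MAX_WIPE_CHUNK,
                       disk_size / 100.0 * MIN_WIPE_CHUNK_PERCENT))

    print(wipe_chunk_size(1024))   # 102, not 102.4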
-
- 14 Apr, 2011 1 commit
-
-
Michael Hanselmann authored
Ganeti 2.3 introduced an optional feature to overwrite an instance's disks on creation. Unfortunately the code kept all locks while doing the wipe, slowing down the creation of multiple instances in parallel. This patch changes the code to wipe the disks only after releasing the locks. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
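Schematically, the ordering change looks like this (the callables are placeholders for the LU steps):

    def create_instance(acquire_locks, create_disks, release_locks, wipe_disks):
        acquire_locks()
        try:
            disks = create_disks()
        finally:
            release_locks()
        # The slow wipe now runs without holding the locks, so parallel
        # instance creations are no longer serialized behind it.
        wipe_disks(disks)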
-
- 13 Apr, 2011 1 commit
-
-
Michael Hanselmann authored
Before this patch the message would look like “Some groups do not exist: [u'foo', u'bar']”, now it's “Some groups do not exist: foo, bar”. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
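The formatting change amounts to joining the names instead of printing the list's repr, e.g.:

    def missing_groups_message(group_names):
        # ", ".join() gives "bar, foo" instead of the repr "[u'foo', u'bar']".
        return "Some groups do not exist: %s" % ", ".join(sorted(group_names))

    print(missing_groups_message(["foo", "bar"]))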
-