- 10 May, 2011 2 commits
-
-
Michael Hanselmann authored
The “acquired_locks” attribute in LUs is used to keep a list of acquired locks at each lock level. This information is already known in the lock manager, which also happens to be the authoritative source. Removing the attribute and directly talking to the lock manager saves us from having to maintain the duplicate information when releasing locks.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
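The pattern this describes can be sketched as follows; `lockmanager` and `list_owned` are illustrative names, not Ganeti's actual API. The point is simply that the LU queries the lock manager on demand instead of mirroring the information in its own attribute.
    class LogicalUnit(object):
        def __init__(self, lockmanager):
            # The lock manager is the single authoritative source for the
            # locks held by this LU; no duplicate "acquired_locks" attribute.
            self._lockmanager = lockmanager

        def owned_locks(self, level):
            # Ask the lock manager on demand instead of consulting a cached
            # list that could drift out of sync when locks are released.
            return frozenset(self._lockmanager.list_owned(level))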
-
Michael Hanselmann authored
Saves some typing and we'll use it more often in the future.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- 06 May, 2011 1 commit
-
-
Iustin Pop authored
Commit 1c6e5787 removed the iallocator and target_node keyword parameters from TLMigrateInstance, but I didn't update their use in LUInstanceFailover and (not fully) in LUInstanceMigrate.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
- 05 May, 2011 3 commits
-
-
Apollon Oikonomopoulos authored
Commit faaabe3c fixed failover behaviour for DTS_INT_MIRROR instances; however, it broke migration for DTS_EXT_MIRROR instances by moving the iallocator and node checks from LUInstanceMigrate to TLMigrateInstance. This has the side effect that the LU called the TL with None for both node and iallocator when the default iallocator was being used. This patch keeps the iallocator checks in TLMigrateInstance and fixes the LU-TL integration.
Signed-off-by: Iustin Pop <iustin@google.com> [iustin@google.com: rebased patch on current HEAD]
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
This is one of the first opcodes to make use of node group locking. To get an instance's node groups, the instance's nodes need to be looked at. Due to a previous design decision, nodes are locked after the group, hence there's no clean locking order. This patch works around that by first getting the instance's groups without locks, and then verifying them after actually getting all locks. Rough overview:
- Lock instance
- Get groups of instance's nodes
- Lock groups
- Lock all nodes in groups
- Verify node groups
- Run iallocator
- Release group and unused nodes
- Replace disks, etc.
There are probably too many assertions in the code, but it's locking and we've been bitten in the past.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
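A minimal sketch of this lock-optimistically-then-re-verify pattern, with invented helper names (`get_instance_groups`, `get_group_nodes`, `acquire`) standing in for the real configuration and locking APIs:
    def lock_instance_groups(cfg, locker, instance):
        # 1. Look up the instance's groups *without* holding group locks;
        #    the result may be stale by the time the locks are acquired.
        groups = set(cfg.get_instance_groups(instance))

        # 2. Lock the groups first, then every node in those groups.
        locker.acquire("nodegroup", sorted(groups))
        nodes = set()
        for group in groups:
            nodes.update(cfg.get_group_nodes(group))
        locker.acquire("node", sorted(nodes))

        # 3. Verify under the locks: if the instance changed groups in the
        #    meantime, the optimistic lookup was wrong and we must retry.
        current = set(cfg.get_instance_groups(instance))
        if not current.issubset(groups):
            raise RuntimeError("Instance changed node groups while locking")
        return current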
-
Apollon Oikonomopoulos authored
Commit 77fcff4a unintentionally incorporated code from TLMigrateInstance.CheckPrereq into TLMigrateInstance._RunAllocator, presumably during a rebase from earlier versions of the patch to the 2.5 codebase. As a result all migrations running without an iallocator were performed non-live :-( This patch moves the affected code back to CheckPrereq.
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- 04 May, 2011 1 commit
-
-
Michael Hanselmann authored
- Clarify some error messages
- Remove unnecessary punctuation
- Merge two if conditions in one place
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- 29 Apr, 2011 4 commits
-
-
Michael Hanselmann authored
There will be more lock releasing with upcoming changes, so this will centralize the logic behind it (what locks to keep, which variables to update, etc.).
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
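A hedged sketch of what such a centralized release helper might look like; `glm`, `list_owned`, `release` and `needed_locks` are placeholders for the real lock-manager interface and LU bookkeeping:
    def release_locks(lu, level, names=None, keep=None):
        # Exactly one of 'names' (locks to drop) or 'keep' (locks to retain)
        # must be given; everything else held at 'level' is released.
        assert (names is None) != (keep is None), "pass either names or keep"

        owned = set(lu.glm.list_owned(level))
        if keep is not None:
            names = [name for name in owned if name not in frozenset(keep)]

        lu.glm.release(level, names=names)

        # Keep the LU's declared lock list consistent with what is still held.
        remaining = owned - set(names)
        lu.needed_locks[level] = [n for n in lu.needed_locks.get(level, [])
                                  if n in remaining]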
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
If an iallocator is used, “gnt-instance replace-disks” would acquire the locks of all nodes (since only the allocator decides which node will be used). Unfortunately the unneeded locks were not released during the operation, causing unnecessary delays for other jobs. This patch changes the LU to release unneeded locks and adds assertions.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
It is no longer used.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- 28 Apr, 2011 6 commits
-
-
Adeodato Simo authored
Signed-off-by: Adeodato Simo <dato@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Michael Hanselmann authored
Commit d5cafd31 changed this error message, swapping the text parts in the process.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Also add an assertion.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Quoting from iallocator.rst: “[…] ``relocate`` request is used when an existing instance needs to be moved within its node group […]”.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Iustin Pop authored
This is a simple change to allow specifying a different VG for the meta device during the creation of instances and addition of disks via gnt-instance modify.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This is a small change to make this function take a list of VG names, instead of a single one.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
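A sketch of this kind of generalization, using an invented helper name and data layout rather than the actual function changed here; the per-VG logic stays the same, only the signature now accepts several VG names:
    def check_vg_free_space(nodeinfo, vg_names, requested_mib):
        # 'nodeinfo' maps node name -> {"vg_free": {vg_name: free MiB}};
        # previously the check took a single VG name, now a list of them.
        for vg in vg_names:
            for node, info in nodeinfo.items():
                free_mib = info["vg_free"].get(vg)
                if free_mib is None:
                    raise ValueError("Node %s did not report VG %s" % (node, vg))
                if free_mib < requested_mib:
                    raise ValueError("Not enough space in VG %s on node %s:"
                                     " %d MiB free, %d MiB required"
                                     % (vg, node, free_mib, requested_mib))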
-
- 27 Apr, 2011 5 commits
-
-
Iustin Pop authored
This patch enhances the multi-VG support in replace disks, by keeping the meta device in the same VG, as opposed to moving it to the data device VG (note that we don't have a way to create the meta in a different VG in the first place, but at least we correctly handle a custom config).
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Doug Dumitru authored
When converting an instance from 'plain' to 'drbd', the old code would create the drbd volumes in the default VG and then the renames would fail. This fix pulls the plain VG names from the existing volumes and places them into the new disk template. Running 'replace-disks' has a similar issue with the new disks going into the wrong VG and then the rename failing. There might be a similar issue with 'recreate-disks', but I actually have no idea what recreate-disks does, so I did not look into it.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
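A rough illustration of the fix's idea, with an invented data layout (treating logical_id as a (vg_name, lv_name) tuple is an assumption): the VG of each existing plain volume is carried over into the new DRBD disk layout instead of falling back to the default VG.
    def drbd_disks_from_plain(plain_disks, default_vg):
        # Reuse each plain LV's volume group for the corresponding DRBD disk
        # instead of creating the new volumes in the cluster default VG.
        drbd_disks = []
        for disk in plain_disks:
            vg_name = disk.logical_id[0] if disk.logical_id else default_vg
            drbd_disks.append({
                "size": disk.size,
                "data_vg": vg_name,  # data LV stays in the original VG
                "meta_vg": vg_name,  # meta LV follows the data LV
            })
        return drbd_disks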
-
Iustin Pop authored
A few issues in the clarity of the error messages are fixed:
- "ERROR: node node3: OS API version lenny-image": no preposition between the parameter type and the OS name, changed to "for lenny-image"
- "API version lenny-image differs from reference node node1: 10, 5 vs. 10, 20, 5, 15": parameters not sorted in display
- "OS variants list lenny-image differs from reference node node1: vs. default, i386": empty sets are not clearly delimited, changed to add [] around the sets: "node node1: [] vs. [default, i386]"
- "OS parameters lenny-image differs from reference node node1: vs. (u'dhcp', u'Whether to enable (yes) or disable (dhcp)')": ugly formatting in the OS parameters list, as we used to just "%s" the tuple; now it is "reference node node1: [] vs. [dhcp: Whether to enable (yes) or disable (dhcp)]"
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
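A small sketch of the formatting convention these fixes converge on (the helper name and the call site are invented): values are sorted, comma-joined and wrapped in [], so an empty set is clearly visible.
    def format_set(values):
        # Sorted, comma-joined and bracketed; an empty input renders as "[]",
        # so "[] vs. [default, i386]" reads unambiguously.
        return "[%s]" % ", ".join(str(v) for v in sorted(values))

    print("OS variants for lenny-image differ from reference node node1:"
          " %s vs. %s" % (format_set([]), format_set(["default", "i386"])))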
-
Iustin Pop authored
This breaks Ganeti in multiple ways. If we don't make the check in gnt-node itself, then bootstrap.SetupNodeDaemon will restart the master daemon, making the operation fail:
  node1# gnt-node add --readd node1
  Cannot communicate with the master daemon. Is it running and listening for connections?
The check in cmdlib is more of a safety check, as we shouldn't reach it. If we do (via a bad client), then it will prevent breakage in the job queue/config handling.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
IIRC we don't use punctuation at the end of error messages.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
- 21 Apr, 2011 1 commit
-
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- 20 Apr, 2011 2 commits
-
-
Apollon Oikonomopoulos authored
TLMigrateInstance._ExecMigration contains two 10-second sleeps between individual migration steps. Apart from prolonging the migration duration by 20s, the second sleep causes FinalizeMigration to be called 10 seconds after the real migration completion; since FinalizeMigration is used for configuring KVM network interfaces of “incoming” instances, this incurs a 10-to-12-second-long network downtime for migrated instances. This patch removes both calls.
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- 19 Apr, 2011 1 commit
-
-
Iustin Pop authored
The current wipe_chunk_size computation is doing min(int_value, float_value). For small disks (below 10GiB), the actual formula will result in the float value being chosen. This results in very interesting behaviour:
  Wiping disk 0, offset 102.4, chunk 102.4
  Wiping disk 0, offset 204.8, chunk 102.4
  …
  Wiping disk 0, offset 921.6, chunk 102.4
  Wiping disk 0, offset 1024.0, chunk 1.13686837722e-13
Since these are passed to dd via %d, this will result in the call to dd specifying offset 1024 and count 0, which will fail. We just need to enforce conversion to int, in order not to get bitten by floating point rounding errors. The patch also reorders some logging messages in order to log the chunk size.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
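A minimal reproduction of the rounding trap and the int() fix; the constants and the loop are illustrative, not Ganeti's exact wipe code:
    MAX_WIPE_CHUNK = 1024      # upper bound in MiB (illustrative constant)
    WIPE_CHUNK_PERCENT = 10    # wipe at most 10% of the disk per chunk

    def wipe_chunk_size(disk_size_mib):
        # Without int(), a 1024 MiB disk gives min(1024, 102.4) == 102.4;
        # adding 102.4 ten times never reaches exactly 1024.0, so the loop
        # below runs once more with a remainder around 1.1e-13, and "%d"
        # turns that final chunk into count=0 for dd, which fails.
        return int(min(MAX_WIPE_CHUNK,
                       disk_size_mib * WIPE_CHUNK_PERCENT / 100.0))

    offset, size = 0, 1024
    chunk = wipe_chunk_size(size)
    while offset < size:
        wipe = min(chunk, size - offset)
        print("Wiping disk 0, offset %d, chunk %d" % (offset, wipe))
        offset += wipe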
-
- 14 Apr, 2011 1 commit
-
-
Michael Hanselmann authored
Ganeti 2.3 introduced an optional feature to overwrite an instance's disks on creation. Unfortunately the code kept all locks while doing the wipe, slowing down the creation of multiple instances in parallel. This patch changes the code to wipe the disks only after releasing the locks.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- 13 Apr, 2011 1 commit
-
-
Michael Hanselmann authored
Before this patch the message would look like “Some groups do not exist: [u'foo', u'bar']”, now it's “Some groups do not exist: foo, bar”.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- 06 Apr, 2011 4 commits
-
-
Michael Hanselmann authored
Until now LUInstanceQueryData always acquired locks for the instance(s) and nodes involved. In combination with long-running operations this prevented the use of “gnt-instance info”, even with the “--static” option. With this patch, locks are only acquired when explicitly requested in the opcode (like all query operations).
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
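A sketch of the lock-declaration pattern this implies; the attribute names (`use_locking`, `needed_locks`) are used here for illustration only:
    class LUInstanceQueryDataSketch(object):
        def __init__(self, op):
            self.op = op
            self.needed_locks = {}

        def ExpandNames(self):
            if self.op.use_locking:
                # Live data was explicitly requested: lock the instances and
                # (later) the nodes they live on.
                self.needed_locks = {"instance": list(self.op.instances),
                                     "node": []}
            # Otherwise leave needed_locks empty and answer from the
            # in-memory configuration, so "gnt-instance info --static"
            # never blocks on locks held by long-running jobs.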
-
René Nussbaumer authored
As the failover checking code is almost identical, it's an easy task to switch it over to TLMigrateInstance. This allows us to fall back to failover if migration fails its prereq check for some reason. Please note that everything from LUInstanceFailover.Exec is taken over unchanged into TLMigrateInstance._ExecFailover, adapted only to the opcode fields and variable references, but not changed in logic. Some effort still needs to go into merging the logic with the migration path (for example DRBD handling), but this should happen in a separate iteration.
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Until now “gnt-cluster verify” (LUClusterVerify) would compute its own list of files to check for consistency. This list was not complete and certain inconsistencies were missed. With this patch the code is changed to use the list of files used by LUClusterRedistConf. The new check needs to be on a whole-cluster level, and no longer per node.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
… and change the logic in _RedistributeAncillaryFiles. Virtually the same list of files will be used to verify the files' consistency.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- 04 Apr, 2011 3 commits
-
-
Michael Hanselmann authored
Commit 75c7520f used the wrong constant. I double-checked all other changes made in the commit.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Iustin Pop authored
This changes the display from:
  Mon Apr 4 02:29:46 2011 * Verifying N+1 Memory redundancy
  Mon Apr 4 02:29:46 2011   - ERROR: node node2: not enough memory to accomodate instance failovers should node node1 fail
to:
  Mon Apr 4 02:32:50 2011 * Verifying N+1 Memory redundancy
  Mon Apr 4 02:32:50 2011   - ERROR: node node2: not enough memory to accomodate instance failovers should node node1 fail (33536MiB needed, 27910MiB available)
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
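A hedged sketch of the kind of N+1 check that produces such a message; the data layout is invented, only the needed-vs-available arithmetic and the message format follow the commit above:
    def check_n_plus_one(node_mem_free, failover_targets):
        # 'failover_targets' maps a candidate failed node to a dict of
        # {target node: memory needed in MiB to absorb its instances}.
        errors = []
        for failed_node, per_target in failover_targets.items():
            for target, needed_mib in per_target.items():
                available_mib = node_mem_free[target]
                if needed_mib > available_mib:
                    errors.append("node %s: not enough memory to accommodate"
                                  " instance failovers should node %s fail"
                                  " (%dMiB needed, %dMiB available)"
                                  % (target, failed_node, needed_mib,
                                     available_mib))
        return errors

    print("\n".join(check_n_plus_one({"node2": 27910},
                                     {"node1": {"node2": 33536}})))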
-
- 28 Mar, 2011 1 commit
-
-
René Nussbaumer authored
This fixes an issue where a stopped instance is reported as an ERROR in cluster verify if it lives on an offline node. As the instance is down, this shouldn't happen.
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- 25 Mar, 2011 1 commit
-
-
Michael Hanselmann authored
The design details can be seen in the design document (doc/design-lu-generated-jobs.rst).
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
-
- 18 Mar, 2011 1 commit
-
-
Michael Hanselmann authored
Commit dd7f6776 added another call to BuildHooksEnv to provide post-phase status variables. Since BuildHooksEnv also built the node lists, that meant they had to be built twice. At first a rather strict check was used, but it turned out to be more tricky. Commit b423c513 had to remove the strict check again. With this patch the function is split into two parts, one generating the actual environment variables and another returning the node lists. The former is called twice. Unittests are updated.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
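A sketch of the described split, modeled on the commit message rather than the actual patch: one method builds only the environment variables (and can safely be called twice), the other returns the node lists once.
    class LUSketch(object):
        def __init__(self, op, cfg):
            self.op = op    # opcode carrying the operation's parameters
            self.cfg = cfg  # cluster configuration accessor

        def BuildHooksEnv(self):
            # Called for both the pre- and the post-phase, so it computes
            # only the (cheap) environment variables and nothing else.
            return {"OP_TARGET": self.op.target,
                    "INSTANCE_NAME": self.op.instance_name}

        def BuildHooksNodes(self):
            # Built once: the node lists on which pre/post hooks run.
            master = self.cfg.GetMasterNode()
            return ([master], [master] + list(self.op.nodes))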
-
- 17 Mar, 2011 1 commit
-
-
Michael Hanselmann authored
This broke QA (and everyone trying to add a node) by complaining about different node lists.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- 16 Mar, 2011 1 commit
-
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-