- May 03, 2011
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Iustin Pop authored
This removes (count of instances + count of nodes) lock acquires/releases. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- May 02, 2011
-
-
Iustin Pop authored
At least one generates an epydoc error :) Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
With the current code, it's possible to mistake a ^C for a protocol error:

  node1# gnt-job info 221691
  [press ^C]
  Unhandled protocol error while talking to the master daemon: Error while deserializing response:

(and note the empty error message).
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This handles EPIPE errors in two places: ToStream (to catch logging done in GenericMain itself) and in GenericMain (to cover also plain print statements). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
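A minimal sketch of the two-level EPIPE handling described in this commit, written in modern Python rather than the Python 2 of the original patch. The names `to_stream` and `generic_main` are hypothetical stand-ins for ToStream and GenericMain (the real functions take different arguments), and the exit code is an assumption.

```python
import errno


def to_stream(stream, text):
    """Write one line to stream, silently ignoring a closed pipe."""
    try:
        stream.write(text + "\n")
        stream.flush()
    except OSError as err:
        # The reader (e.g. "| head") went away; anything else is a real error.
        if err.errno != errno.EPIPE:
            raise


def generic_main(func):
    """Run func() and turn a broken pipe from plain prints into an exit code."""
    try:
        return func()
    except OSError as err:
        if err.errno == errno.EPIPE:
            return 1  # assumed exit code, not taken from the patch
        raise
```

Handling the error in both places covers output that goes through the logging helper as well as output written directly by the command function.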
-
Iustin Pop authored
Currently cluster verify doesn't check for bridge information; the only checks are done at instance create and failover/migrate time. This means a cluster that seems healthy will fail creation jobs. This patch implements a simple verification that all nodes (in the entire cluster, so it doesn't work well for multi-group) have all the required bridges: the default one plus any instance bridge. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Apr 29, 2011
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
If an iallocator is used, “gnt-instance replace-disks” would acquire the locks of all nodes (only the allocator will decide which node to use). Unfortunately the unneeded locks were not released during the operation, causing unnecessary delays for other jobs. This patch changes the LU to release unneeded locks and adds assertions. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
This is analogous to “is_owned” and will be used for assertions. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
The iallocator parameter is “-I”, not “-i”. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Iustin Pop authored
This allows noded to continue instead of blowing up if the libc major number changes. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Apr 28, 2011
-
-
Iustin Pop authored
This is a simple change to allow specifying a different VG for the meta device during the creation of instances and addition of disks via gnt-instance modify. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This is a small change to make this function take a list of VG names, instead of a single one. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Unicode is fun, indeed:

  >>> len(buffer("abc"))
  3
  >>> len(buffer(u"abc"))
  12

So we can't pass unicode data to buffer(), as the result will be to write the in-memory (usually UTF-32) representation to disk.
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
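The same pitfall can be sketched in Python 3, where buffer() no longer exists and text has to be encoded explicitly before it reaches a byte-oriented write. The helper name `to_bytes` and the choice of UTF-8 are assumptions; the commit does not say which encoding the patch uses.

```python
def to_bytes(data, encoding="utf-8"):
    """Return data as bytes, encoding text explicitly instead of relying
    on the interpreter's in-memory representation."""
    if isinstance(data, str):
        return data.encode(encoding)
    return bytes(data)
```

With this, the length of the data written matches the encoded text rather than the internal string layout.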
-
- Apr 27, 2011
-
-
Iustin Pop authored
This patch enhances the multi-VG support in replace disks, by keeping the meta device in the same VG, as opposed to moving it to the data device VG (note that we don't have a way to create the meta in a different VG in the first place, but at least we correctly handle a custom config). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Doug Dumitru authored
When converting an instance from 'plain' to 'drbd', the old code would create the drbd volumes in the default VG and then the renames would fail. This fix pulls the plain VG names from the existing volumes and places them into the new disk template. Running 'replace-disks' has a similar issue, with the new disks going into the wrong VG and then the rename failing. There might be a similar issue with 'recreate-disks', but I actually have no idea what recreate-disks does, so did not look into it. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
os.write can do incomplete writes, as long as at least some bytes have been written (like write(2)):

  >>> os.write(fd, " " * 1300)
  1300
  >>> os.write(fd, " " * 1300)
  1300
  >>> os.write(fd, " " * 1300)
  1300
  >>> os.write(fd, " " * 1300)
  980
  >>> os.write(fd, " " * 1300)
  Traceback (most recent call last):
    File "<stdin>", line 1, in ?
  OSError: [Errno 28] No space left on device

Note the incomplete write that only wrote 980 bytes, before the exception. To work around this, we simply iterate until all data is written. Unittests could be written by using a parameter instead of hardcoding os.write and checking for incomplete writes.
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
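The retry loop described in this commit can be sketched as follows. This is an illustration in modern Python, not the patch itself; `write_all` and its `write_fn` parameter are hypothetical names, with the parameter following the commit's suggestion for making short writes testable.

```python
import os


def write_all(fd, data, write_fn=os.write):
    """Write all of data (bytes) to fd, retrying after short writes."""
    view = memoryview(data)
    while len(view) > 0:
        # write_fn returns how many bytes it actually accepted.
        written = write_fn(fd, view)
        view = view[written:]
```

A test can pass a `write_fn` that accepts only a few bytes per call and check that every byte still arrives in order.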
-
Iustin Pop authored
A few issues in the clarity of the error messages are fixed:
  - "ERROR: node node3: OS API version lenny-image": no preposition between the parameter type and the OS name, changed to "for lenny-image"
  - "API version lenny-image differs from reference node node1: 10, 5 vs. 10, 20, 5, 15": parameters not sorted in display
  - "OS variants list lenny-image differs from reference node node1: vs. default, i386": empty sets are not clearly delimited, changed to add [] around the sets: "node node1: [] vs. [default, i386]"
  - "OS parameters lenny-image differs from reference node node1: vs. (u'dhcp', u'Whether to enable (yes) or disable (dhcp)')": ugly formatting in the OS parameters list, as we used to just "%s" the tuple; now it is "reference node node1: [] vs. [dhcp: Whether to enable (yes) or disable (dhcp)]"
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
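The formatting changes listed in this commit can be illustrated with a short sketch. The helper names `format_set` and `format_os_params` are hypothetical and the code is not taken from the patch; it only shows sorted, bracket-delimited output and "name: description" rendering for parameter tuples.

```python
def format_set(values):
    """Render a collection as a sorted, bracket-delimited list."""
    return "[%s]" % ", ".join(str(v) for v in sorted(values))


def format_os_params(params):
    """Render (name, description) pairs as "name: description" entries."""
    return "[%s]" % ", ".join(
        "%s: %s" % (name, doc) for name, doc in sorted(params)
    )
```

An empty collection now renders as `[]`, so it stays visible on either side of "vs." in the message.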
-
Iustin Pop authored
This breaks Ganeti in multiple ways. If we don't make the check in gnt-node itself, then bootstrap.SetupNodeDaemon will restart the master daemon, making the operation fail:

  node1# gnt-node add --readd node1
  Cannot communicate with the master daemon. Is it running and listening for connections?

The check in cmdlib is more of a safety check, as we shouldn't reach it. If we do (via a bad client), then it will prevent breakage in the job queue/config handling.
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
IIRC we don't use punctuation at the end of error messages. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Apr 21, 2011
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Apr 20, 2011
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Iustin Pop authored
Commit dae661a4 added support for controlling the locking, but it didn't modify the gnt-instance info code, which leads to this command always showing:

  Wed Apr 20 04:10:48 2011 - WARNING: Non-static data requested, locks need to be acquired

We simply change gnt-instance to request locks whenever we don't use the static mode.
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Apr 19, 2011
-
-
Iustin Pop authored
Thanks to net.for.hub@gmail.com for reporting this. The logic in masterd.CheckMasterd did an early return in case of no_voting, hence skipping the master IP activation. We just change the ifs to not return but simply continue through the function. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
The current wipe_chunk_size computation is doing min(int_value, float_value). For small disks (below 10GiB), the actual formula will result in the float value being chosen. This results in very interesting behaviour:

  Wiping disk 0, offset 102.4, chunk 102.4
  Wiping disk 0, offset 204.8, chunk 102.4
  …
  Wiping disk 0, offset 921.6, chunk 102.4
  Wiping disk 0, offset 1024.0, chunk 1.13686837722e-13

Since these are passed to dd via %d, this will result in the call to dd specifying offset 1024 and count 0, which will fail. We just need to enforce conversion to int, in order to not get bitten by floating point rounding errors. The patch also reorders some logging messages in order to log the chunk size.
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
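A minimal sketch of the integer chunk computation described in this commit. The constant names and values (a 1024 MiB cap and 10 percent of the disk) are assumptions inferred from the "below 10GiB" remark and the 102.4 chunk in the log; `wipe_chunks` is a hypothetical helper, not the code from the patch.

```python
MAX_WIPE_CHUNK = 1024  # MiB; assumed value
MIN_WIPE_CHUNK_PERCENT = 10  # percent of the disk size; assumed value


def wipe_chunks(disk_size):
    """Return (offset, size) pairs in MiB covering a disk of disk_size MiB."""
    # Convert to int once, so offsets and sizes never accumulate float error.
    chunk = int(min(MAX_WIPE_CHUNK, disk_size / 100.0 * MIN_WIPE_CHUNK_PERCENT))
    chunk = max(chunk, 1)  # guard for tiny disks; not part of the described patch
    result = []
    offset = 0
    while offset < disk_size:
        size = min(chunk, disk_size - offset)
        result.append((offset, size))
        offset += size
    return result
```

For a 1024 MiB disk this yields ten chunks of 102 MiB and a final chunk of 4 MiB, all integers, so a "%d" format never sees a zero-length tail.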
-
Michael Hanselmann authored
If “utils.RunParts” were to raise an exception, a log message was written and the code continued to run. Due to the exception the “results” variable would not be defined. Also change the code to log a backtrace (getting an exception is rather unlikely and having a backtrace is useful) and update one comment. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
- Apr 14, 2011
-
-
Michael Hanselmann authored
Ganeti 2.3 introduced an optional feature to overwrite an instance's disks on creation. Unfortunately the code kept all locks while doing the wipe, slowing down the creation of multiple instances in parallel. This patch changes the code to wipe the disks only after releasing the locks. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Apr 13, 2011
-
-
Michael Hanselmann authored
Issue 154 (http://code.google.com/p/ganeti/issues/detail?id=154) reported an “Operation not supported” error when writing instance exports to a mounted CIFS filesystem. Experimentation showed the error to only occur when using rename(2) on an opened file. Various references on the web confirmed this observation. Whether or not the problem occurs can also depend on the CIFS server implementation. In issue 154 it was Windows 2008 R2. While not solving all cases, closing the file before renaming helps alleviate the issue a bit. Unittests are updated. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
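The close-before-rename ordering described in this commit can be sketched as below. The function name `write_then_rename` is hypothetical and the code is an illustration of the ordering only, not the utility that the patch changes.

```python
import os
import tempfile


def write_then_rename(path, data):
    """Write data (bytes) to a temporary file, close it, then rename it."""
    dir_name = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # The file is closed here; rename(2) on a still-open file is what
        # was reported to fail on some CIFS servers.
        os.rename(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Creating the temporary file in the destination directory keeps the rename on a single filesystem.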
-
Michael Hanselmann authored
README is not copied to the build tree. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Before this patch the message would look like “Some groups do not exist: [u'foo', u'bar']”, now it's “Some groups do not exist: foo, bar”. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
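The difference between the two messages comes down to joining the names instead of formatting the list object. A small sketch, where `format_missing_groups` is a hypothetical helper and not the function touched by the patch:

```python
def format_missing_groups(names):
    """Build the error text from group names without list/repr noise."""
    return "Some groups do not exist: %s" % ", ".join(names)
```

Formatting the list directly with "%s" would show brackets and quotes, which is the output this commit removes.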
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Apr 08, 2011
-
-
Michael Hanselmann authored
Also add a check to Makefile's check-local target. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Apr 07, 2011
-
-
Michael Hanselmann authored
* stable-2.4:
  Add error checking and merging for cluster params
  Clarify --force-join parameter message
  Treat empty oob_program param as default
  Fix bug in instance listing with orphan instances
  Fix bug related to log opening failures
  Bump version for 2.4.1 release
  cfgupgrade: Fix critical bug overwriting RAPI users file
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Apr 06, 2011
-
-
Michael Hanselmann authored
Until now LUInstanceQueryData always acquired locks for the instance(s) and nodes involved. In combination with long-running operations this prevented the use of “gnt-instance info”, even with the “--static” option. With this patch, locks are only acquired when explicitly requested in the opcode (like all query operations). Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Iustin Pop authored
This has been observed to cause problems on real clusters via the following mechanism:
  - a long job (e.g. a replace-disks) is keeping an exclusive lock on an instance
  - the watcher starts and submits its query instances opcode which wants shared locks for all instances
  - after about an hour, the watcher job falls back to blocking acquire, after having acquired all other locks
  - any instance opcode that wants an exclusive lock for an instance cannot start until the watcher has finished, even though there's no actual operation on that instance

In order to alleviate this problem, we simply increase the max timeout until lock acquires are sent back to either blocking acquire or priority increase. The timeout is computed such that we wait ~10 hours (instead of one) for this to happen, which should be within the maximum lifetime of a reasonable opcode on a healthy cluster. The timeout also means that priority increases will happen every half hour. We also increase the max wait interval to 15 seconds, otherwise we'd have too many retries with the increased interval.
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Apr 04, 2011
-
-
Iustin Pop authored
Before this, the output in the rapi daemon log was:

  2011-04-04 03:09:51,026: ganeti-rapi pid=17447 INFO Reading users file at /var/lib/ganeti/rapi/users
  2011-04-04 03:09:51,027: ganeti-rapi pid=17447 INFO ganeti-rapi daemon startup

Which is confusing, as it might look like the read of the users file is part of the previous run. This is because we log the 'daemon startup' message after the prepare_fn, which can log things on its own. The patch simply moves the 'daemon startup' message to just before the prepare_fn call.
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This changes the display from:

  Mon Apr 4 02:29:46 2011 * Verifying N+1 Memory redundancy
  Mon Apr 4 02:29:46 2011 - ERROR: node node2: not enough memory to accomodate instance failovers should node node1 fail

To:

  Mon Apr 4 02:32:50 2011 * Verifying N+1 Memory redundancy
  Mon Apr 4 02:32:50 2011 - ERROR: node node2: not enough memory to accomodate instance failovers should node node1 fail (33536MiB needed, 27910MiB available)

Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Mar 31, 2011
-
-
Iustin Pop authored
This is not needed for this function, and can interfere with debugging of ssh failures. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-