- Apr 06, 2011
-
-
Iustin Pop authored
This has been observed to cause problems on real clusters via the following mechanism:

  - a long job (e.g. a replace-disks) is keeping an exclusive lock on an instance
  - the watcher starts and submits its query instances opcode, which wants shared locks for all instances
  - after about an hour, the watcher job falls back to blocking acquire, after having acquired all other locks
  - any instance opcode that wants an exclusive lock for an instance cannot start until the watcher has finished, even though there's no actual operation on that instance

In order to alleviate this problem, we simply increase the max timeout until lock acquires are sent back to either blocking acquire or priority increase. The timeout is computed such that we wait ~10 hours (instead of one) for this to happen, which should be within the maximum lifetime of a reasonable opcode on a healthy cluster. The timeout also means that priority increases will happen every half hour.

We also increase the max wait interval to 15 seconds, otherwise we'd have too many retries with the increased interval.

Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
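To make the effect of the larger interval concrete, here is a rough sketch of a retry schedule in which per-attempt waits grow up to a cap until a total budget is used up. Only the 15-second cap and the half-hour between priority increases come from the message above; the function name, the starting wait and the growth factor are illustrative assumptions, and this is not Ganeti's actual formula.

```python
def attempt_timeouts(total=30 * 60, min_wait=1.0, max_wait=15.0, growth=1.5):
    """Return per-attempt timeouts (seconds) whose sum reaches `total`.

    `min_wait` and `growth` are assumed values chosen for illustration.
    """
    timeouts = [min_wait]
    elapsed = min_wait
    while elapsed < total:
        # Grow the previous wait, but never beyond the maximum wait interval
        nxt = min(max(timeouts[-1] * growth, min_wait), max_wait)
        timeouts.append(nxt)
        elapsed += nxt
    return timeouts


schedule = attempt_timeouts()
print(len(schedule), max(schedule), sum(schedule))
```

With a small cap, most of the half hour is spent in attempts of the maximum length, so raising the cap directly reduces the number of retries.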
-
- Mar 16, 2011
-
-
Michael Hanselmann authored
In some rare cases it can happen that a lock is re-created very soon after deletion, while the old instance hasn't been destructed yet. In such a case the code would detect a duplicate name and raise an exception. We have seen at least one case where this happened during the creation of many instances. It is not exactly clear how it came to be, but it appears to have occurred while different jobs fought for locks with short timeouts (in the case of instance creation locks are added at this stage and removed shortly after if not all locks can be acquired). The issue is fixed by removing the check for duplicate names. To still guarantee a stable sort order for the lock information as shown by “gnt-debug locks”, a registration number is recorded for each lock in the monitor. A unittest is included to check for the situation. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
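The registration-number idea can be sketched as follows. This is a minimal illustration, not the monitor from the patch: the class and method names are hypothetical, and sorting by name first and registration number second is an assumption about how ties are broken.

```python
import itertools


class LockMonitor:
    """Hypothetical monitor that tolerates two live locks with the same name."""

    def __init__(self):
        self._counter = itertools.count()
        self._locks = {}  # registration number -> lock name

    def register(self, name):
        # No duplicate-name check: the number alone identifies the entry
        regno = next(self._counter)
        self._locks[regno] = name
        return regno

    def unregister(self, regno):
        self._locks.pop(regno, None)

    def list_locks(self):
        # Name first, registration number second, so equal names keep a
        # deterministic order between calls
        return sorted((name, regno) for regno, name in self._locks.items())


monitor = LockMonitor()
old = monitor.register("instance1")
new = monitor.register("instance1")  # re-created before the old one is gone
print(monitor.list_locks())
```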
-
- Mar 15, 2011
-
-
Michael Hanselmann authored
The ability to split a string into a list of strings and integers can be handy elsewhere and is necessary for sorting query results by names. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com> (cherry picked from commit f47941f8)
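As an illustration of why such a split helps with sorting names, here is a small sketch. The function name `split_name` is hypothetical and the approach (splitting on digit runs with a regular expression) is an assumption, not necessarily how the patch does it.

```python
import re

_DIGITS = re.compile(r"(\d+)")


def split_name(text):
    """Split text into alternating string and integer parts."""
    parts = _DIGITS.split(text)
    # With a capturing group, even indices hold non-digit text (possibly
    # empty) and odd indices hold digit runs, so keys compare type-safely
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


names = ["node10", "node2", "node1"]
print(sorted(names))                  # plain string order
print(sorted(names, key=split_name))  # numeric parts compared as integers
```

Keeping the empty strings that `re.split` produces means every key alternates between string and integer positions, which avoids comparing a string with an integer.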
-
- Feb 23, 2011
-
-
René Nussbaumer authored
Commit e431074f introduced a bug that went uncaught. This patch fixes it. The set expects a list or other iterable to work on, so it split the provided instance name into a set of characters. As a result, exp_status was never set, and the problem was not caught by the assert further below that checks that every status was tested. Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
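The pitfall described above is easy to reproduce in isolation. The instance name below is made up for the example.

```python
instance_name = "inst1.example.com"  # hypothetical name

wrong = set(instance_name)    # iterates over the string: a set of characters
right = set([instance_name])  # a one-element set holding the whole name

print(instance_name in wrong)
print(instance_name in right)
```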
-
- Feb 18, 2011
-
-
Iustin Pop authored
And also enable verbose display via the, well, verbose option. Man page and tests are updated, and the formatting is moved from 4 if statements to a data structure. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Feb 17, 2011
-
-
Iustin Pop authored
Since we don't have the data per design, UNAVAIL is appropriate here, while NODATA is not. The patch also adds a comment: if we extend the live fields list to contain other data in the future, we need to reevaluate this solution. This should fix issue 143.

The listing now shows (node2==offline, node3==not vm_capable):

  Node  DTotal    DFree     MTotal    MNode     MFree     Pinst Sinst
  node1 698.6G    630.5G    32.0G     1.0G      30.0G         8     7
  node2 (offline) (offline) (offline) (offline) (offline)     9     4
  node3 (unavail) (unavail) (unavail) (unavail) (unavail)     0     0

Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Feb 02, 2011
-
-
Michael Hanselmann authored
This function can be used from a SIGHUP handler to reopen log files. Initial, simple unittests are included. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
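A minimal sketch of how a reopen function might be wired to SIGHUP, using only the standard library. The function names are hypothetical and this is not the code from the patch; it relies on `logging.FileHandler` reopening its file on the next record once its stream has been set to `None`.

```python
import logging
import signal


def reopen_log_files(logger):
    """Close each FileHandler's stream; the next emitted record reopens it."""
    reopened = 0
    for handler in logger.handlers:
        if isinstance(handler, logging.FileHandler):
            handler.acquire()
            try:
                if handler.stream is not None:
                    handler.stream.close()
                    handler.stream = None
                reopened += 1
            finally:
                handler.release()
    return reopened


def install_sighup_handler(logger):
    """Register a SIGHUP handler (Unix only, must run in the main thread)."""
    def _on_sighup(signum, frame):
        reopen_log_files(logger)

    signal.signal(signal.SIGHUP, _on_sighup)
    return _on_sighup
```

A log rotation tool can then rename the old file and send SIGHUP; subsequent records go to a freshly created file under the original name.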
-
- Jan 31, 2011
-
-
Michael Hanselmann authored
This patch adds a new log handler class based on the standard library's BaseRotatingHandler. This new class allows the log file to be re-opened, e.g. upon receiving a SIGHUP signal. The latter will be implemented in forthcoming patches. The patch does not change the behaviour regarding writing to /dev/console. Quite a bit of code had to be changed to unittest the log handlers. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
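The approach can be sketched roughly as below: a handler derived from `BaseRotatingHandler` that treats a pending reopen request as its "rollover" condition. The class and method names are hypothetical; this is an illustration of the technique, not the handler added by the patch.

```python
import logging
import logging.handlers


class ReopenableFileHandler(logging.handlers.BaseRotatingHandler):
    """File handler that reopens its file when asked to (sketch)."""

    def __init__(self, filename):
        super().__init__(filename, "a")
        self._reopen = False

    def request_reopen(self):
        # Only set a flag; the file is reopened when the next record arrives
        self._reopen = True

    def shouldRollover(self, record):
        return self._reopen

    def doRollover(self):
        if self.stream is not None:
            self.stream.close()
        self.stream = self._open()
        self._reopen = False
```

Deferring the actual reopen to the next `emit` call keeps `request_reopen` cheap enough to be called from a signal handler.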
-
- Jan 28, 2011
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
This makes it possible to get the console information via a LUXI query. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
René Nussbaumer authored
If for some reason (e.g. a failed migration) an instance is running on multiple nodes, the output can become inconsistent. To detect this error and make the output consistent between runs, we also make the call on the secondary node and check whether the instance is running there. If so, we report the instance as ERROR_wrongnode. Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Jan 27, 2011
-
-
Michael Hanselmann authored
Commit 70b0d2a2 broke unittests on Python 2.4 and 2.5. Turns out that Python 2.6 and above allow classes to be passed as custom test runners, whereas earlier versions don't. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
- Jan 21, 2011
-
-
René Nussbaumer authored
This patch renames QRFS_* to RS_* fields so they can be used in other places (i.e. LUs) without confusion, as this was initially meant for query operations. Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Jan 20, 2011
-
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
- Jan 18, 2011
-
-
Iustin Pop authored
While looking at the query library, I realized that while we have five field statuses, making this a 5-dimensional space, four of them are shrunk to a single possible value (None). Hence it should be possible to convert this into a single value space plus extra 4 special constants. This patch implements this, making (IMHO) the return value of normal functions much simpler: you simply return the desired value, instead of (QRFS_NORMAL, value); for the special results, you simply return _FS_UNAVAIL, instead of (QRFS_UNAVAIL, None). This I believe does simplify the code. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Apollon Oikonomopoulos authored
This patch introduces network configuration for KVM in Ganeti. There are three problems with having KVM perform network configuration via ifup scripts:

  a) Ganeti never gets to know the tap interface that is associated with an instance's NIC.
  b) Migration of routed instances will cause network problems because the incoming KVM side configures the network as soon as it is spawned and not as soon as the migration finishes. This means that all routing configuration will be present on both the primary and secondary nodes at the same time, possibly causing network disruption during the migration.
  c) We never get to know if the network configuration succeeded or not.

This patch moves network configuration from KVM to Ganeti, using KVM's ability to receive already open tap devices as file descriptors. _WriteNetScript is removed from hv_kvm.py, together with its unit tests. Minor modifications are made to _ExecKVMRuntime to handle tap device initialization. NIC <-> tap associations are stored under a new directory, _ROOT_DIR/nic, in a file-per-NIC fashion.

The end-user semantics remain the same: the user can override the network configuration by providing _KVM_NET_SCRIPT. If this is not present or not executable, the default constants.KVM_IFUP script is run.

Signed-off-by:
Apollon Oikonomopoulos <apollon@noc.grnet.gr> Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
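The script selection rule in the last paragraph of the message can be sketched like this. The function name is hypothetical, the paths in the usage line are placeholders rather than Ganeti's real values, and the actual check in hv_kvm.py may differ.

```python
import os


def choose_net_script(custom_script, default_script):
    """Return the user's script if it exists and is executable, else the default."""
    if os.path.isfile(custom_script) and os.access(custom_script, os.X_OK):
        return custom_script
    return default_script


# Placeholder paths, for illustration only
print(choose_net_script("/nonexistent/kvm-vif-bridge", "/placeholder/kvm-ifup"))
```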
-
Iustin Pop authored
As the class names should now be consistent with the OP_IDs, we add a check for wrongly-defined OP_IDs. However, the future removal of the hand-coded OP_IDs will render this obsolete, so this check is introduced just to make sure that the previous renaming patches did the right job, and it will then be removed. The consistency checks require renaming the test opcodes, which were using arbitrary names, depending on the test author. They are now all standardized on OpTest (local scope). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
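A consistency check of this kind might look like the sketch below. The mapping from a CamelCase class name to an upper-case OP_ID with underscores is an assumption about the naming scheme, and the helper names and example classes are hypothetical.

```python
import re

_UPPER_AFTER_LOWER = re.compile(r"([a-z0-9])([A-Z])")


def expected_op_id(class_name):
    """Derive an OP_ID from a class name (assumed naming scheme)."""
    return _UPPER_AFTER_LOWER.sub(r"\1_\2", class_name).upper()


def wrongly_defined(classes):
    """Return the names of classes whose OP_ID does not match their name."""
    return [cls.__name__ for cls in classes
            if cls.OP_ID != expected_op_id(cls.__name__)]


class OpTest:
    OP_ID = "OP_TEST"


class OpInstanceCreate:
    OP_ID = "OP_INSTANCE_CREATE"


class OpBadlyNamed:
    OP_ID = "OP_SOMETHING_ELSE"  # deliberately inconsistent


print(wrongly_defined([OpTest, OpInstanceCreate, OpBadlyNamed]))
```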
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
- Jan 14, 2011
-
-
Michael Hanselmann authored
Update the version in all necessary places. Update NEWS with release date. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Jan 13, 2011
-
-
René Nussbaumer authored
Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Jan 12, 2011
-
-
Michael Hanselmann authored
This patch fixes a number of typos and standardizes RAPI resource docstrings. A unittest is added. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Jan 11, 2011
-
-
Apollon Oikonomopoulos authored
Passing tap devices to KVM as file descriptors requires that the respective file descriptors remain open during utils.RunCmd execution. To this end, we add a “noclose_fds” keyword argument to utils.RunCmd, accepting a list of file descriptors to keep open. The actual fd handling is implemented in _RunCmdPipe and _RunCmdFile using subprocess.Popen's “preexec_fn”[1], since subprocess.Popen provides no other way to selectively handle fds.

A small modification is also made to test/ganeti.utils_unittest.py to comply with _RunCmdPipe's new API and a new unit test is added to test the selective fd retention functionality.

[1] “If preexec_fn is set to a callable object, this object will be called in the child process just before the child is executed. (Unix only)” (subprocess documentation)

Signed-off-by:
Apollon Oikonomopoulos <apollon@noc.grnet.gr> Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
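For readers on Python 3, selective fd retention can be sketched with the `pass_fds` argument of `subprocess.Popen`. Note that this swaps in a different mechanism from the `preexec_fn` approach described above, which was needed on the Python versions of the time. The function name `run_cmd_keep_fds` is hypothetical; only the `noclose_fds` parameter name is taken from the message.

```python
import os
import subprocess
import sys


def run_cmd_keep_fds(cmd, noclose_fds=()):
    """Run cmd, closing all inherited fds except those in noclose_fds."""
    proc = subprocess.Popen(
        cmd,
        close_fds=True,
        pass_fds=tuple(noclose_fds),  # these stay open in the child (Unix only)
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    out, err = proc.communicate()
    return proc.returncode, out, err


# The child writes to a pipe whose write end it only knows by number
read_fd, write_fd = os.pipe()
child = "import os, sys; os.write(int(sys.argv[1]), b'ok')"
returncode, _, _ = run_cmd_keep_fds(
    [sys.executable, "-c", child, str(write_fd)], noclose_fds=[write_fd])
os.close(write_fd)
received = os.read(read_fd, 2)
os.close(read_fd)
print(returncode, received)
```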
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-