- Apr 06, 2011
-
-
Michael Hanselmann authored
Until now LUInstanceQueryData always acquired locks for the instance(s) and nodes involved. In combination with long-running operations this prevented the use of “gnt-instance info”, even with the “--static” option. With this patch, locks are only acquired when explicitly requested in the opcode (as with all other query operations). Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Iustin Pop authored
This has been observed to cause problems on real clusters via the following mechanism:

- a long job (e.g. a replace-disks) is keeping an exclusive lock on an instance
- the watcher starts and submits its query instances opcode which wants shared locks for all instances
- after about an hour, the watcher job falls back to blocking acquire, after having acquired all other locks
- any instance opcode that wants an exclusive lock for an instance cannot start until the watcher has finished, even though there's no actual operation on that instance

In order to alleviate this problem, we simply increase the max timeout until lock acquires are sent back to either blocking acquire or priority increase. The timeout is computed such that we wait ~10 hours (instead of one) for this to happen, which should be within the maximum lifetime of a reasonable opcode on a healthy cluster. The timeout also means that priority increases will happen every half hour.

We also increase the max wait interval to 15 seconds, otherwise we'd have too many retries with the increased interval.

Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
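
The effect of the larger total timeout and the 15-second cap on the number of retries can be illustrated with a minimal sketch. The function name, the 1-second minimum wait, and the growth formula below are assumptions made for illustration; only the ~10 hour total and the 15-second maximum wait come from the commit message above.

```python
def lock_attempt_timeouts(total=10 * 3600.0, min_wait=1.0, max_wait=15.0):
    """Return per-attempt timeouts (seconds) whose sum first reaches `total`.

    Hypothetical helper: each timeout grows slightly over the previous one
    (assumed growth rule) and is clamped to [min_wait, max_wait].
    """
    timeouts = [min_wait]
    elapsed = min_wait
    while elapsed < total:
        # Assumed growth rule; any slowly increasing function would do here.
        nxt = (timeouts[-1] * 1.05) ** 1.25
        nxt = max(min_wait, min(nxt, max_wait))
        timeouts.append(nxt)
        elapsed += nxt
    return timeouts


if __name__ == "__main__":
    attempts = lock_attempt_timeouts()
    print("attempts: %d, longest wait: %.1fs" % (len(attempts), max(attempts)))
```

A lower cap on the per-attempt wait would need more attempts to cover the same total time, which is the reason given above for raising the maximum wait interval.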
-
- Apr 04, 2011
-
-
Iustin Pop authored
Before this, the output in the rapi daemon log was:

  2011-04-04 03:09:51,026: ganeti-rapi pid=17447 INFO Reading users file at /var/lib/ganeti/rapi/users
  2011-04-04 03:09:51,027: ganeti-rapi pid=17447 INFO ganeti-rapi daemon startup

Which is confusing, as it might look like the read of the users file is part of the previous run. This is because we log the 'daemon startup' message after the prepare_fn, which can log things on its own.

The patch simply moves the 'daemon startup' message just before the prepare_fn call.

Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
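
The ordering change can be sketched as follows. This is a simplified illustration, not the daemon code itself: `run_daemon`, `exec_fn` and the `log` parameter are hypothetical names, and only `prepare_fn` and the 'daemon startup' message are taken from the commit message above.

```python
import logging


def run_daemon(name, prepare_fn, exec_fn, log=logging.info):
    """Log the startup message first, then run the preparation step.

    Hypothetical sketch: anything prepare_fn logs now appears after the
    startup line, so it is clearly attributed to the current run.
    """
    log("%s daemon startup" % name)
    prepared = prepare_fn()
    return exec_fn(prepared)
```

Passing the logging callable as a parameter is only a convenience for this sketch; it makes the emitted order easy to inspect.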
-
Iustin Pop authored
This changes the display from:

  Mon Apr 4 02:29:46 2011 * Verifying N+1 Memory redundancy
  Mon Apr 4 02:29:46 2011 - ERROR: node node2: not enough memory to accomodate instance failovers should node node1 fail

To:

  Mon Apr 4 02:32:50 2011 * Verifying N+1 Memory redundancy
  Mon Apr 4 02:32:50 2011 - ERROR: node node2: not enough memory to accomodate instance failovers should node node1 fail (33536MiB needed, 27910MiB available)

Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Mar 31, 2011
-
-
Iustin Pop authored
This is not needed for this function, and can interfere with debugging of ssh failures. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Mar 24, 2011
-
-
Michael Hanselmann authored
This was added to the NEWS file in commit ab221ddf, but never documented properly. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
If the result of an opcode was a non-empty dictionary, it would be impossible to differentiate between input and result:

Input fields:
  […]
  debug_level: 0
  fields: cluster_name,master_node,volume_group_name
  jobs: [[True, u'37922'], [True, u'37923'], [True, u'37924']]

Expected output:

Input fields:
  […]
  debug_level: 0
  fields: cluster_name,master_node,volume_group_name
Result:
  jobs: [[True, u'37922'], [True, u'37923'], [True, u'37924']]

Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
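
A minimal sketch of the separated layout described above, assuming a hypothetical `format_opcode` helper (not the function changed by this commit) that prints input fields and result under their own headers:

```python
def format_opcode(input_fields, result, indent="  "):
    """Format opcode input and result under separate headers.

    Hypothetical helper for illustration only.
    """
    lines = ["Input fields:"]
    for key in sorted(input_fields):
        lines.append("%s%s: %s" % (indent, key, input_fields[key]))
    # The explicit header is what keeps a dictionary result from blending
    # into the input fields listed above it.
    lines.append("Result:")
    if isinstance(result, dict):
        for key in sorted(result):
            lines.append("%s%s: %s" % (indent, key, result[key]))
    else:
        lines.append("%s%s" % (indent, result))
    return "\n".join(lines)
```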
-
- Mar 17, 2011
-
-
Michael Hanselmann authored
When “ganeti-watcher” was called with an argument, it would hint at a nonexistent “-f” parameter. With this patch the separate usage string is no longer necessary. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Mar 16, 2011
-
-
Michael Hanselmann authored
In some rare cases it can happen that a lock is re-created very soon after deletion, while the old instance hasn't been destroyed yet. In such a case the code would detect a duplicate name and raise an exception.

We have seen at least one case where this happened during the creation of many instances. It is not exactly clear how it came to be, but it appears to have occurred while different jobs fought for locks with short timeouts (in the case of instance creation, locks are added at this stage and removed shortly after if not all locks can be acquired).

The issue is fixed by removing the check for duplicate names. To still guarantee a stable sort order for the lock information as shown by “gnt-debug locks”, a registration number is recorded for each lock in the monitor. A unit test is included to check for the situation.

Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
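
The idea of a per-lock registration number can be sketched like this. The class and method names are hypothetical, and sorting by name first and registration number second is an assumption about how the stable order is obtained; the commit message only states that a registration number is recorded for each lock.

```python
import itertools


class LockMonitorSketch:
    """Toy monitor that tolerates duplicate lock names (hypothetical)."""

    def __init__(self):
        self._counter = itertools.count()
        self._locks = []  # list of (name, registration number)

    def register(self, name):
        # No duplicate-name check: a re-created lock may briefly coexist
        # with the old object of the same name.
        self._locks.append((name, next(self._counter)))

    def lock_info(self):
        # The registration number breaks ties between equal names, so the
        # order is deterministic across calls.
        return sorted(self._locks)
```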
-
- Mar 15, 2011
-
-
Michael Hanselmann authored
The ability to split a string into a list of strings and integers can be handy elsewhere and is necessary for sorting query results by names. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com> (cherry picked from commit f47941f8)
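
A minimal sketch of splitting a string into strings and integers and using the result for name sorting. The function names here are hypothetical and are not the ones introduced by the commit.

```python
import re

_DIGITS = re.compile(r"([0-9]+)")


def split_text_and_ints(text):
    """Split text into a list of strings and integers, e.g. "node10" -> ["node", 10]."""
    parts = _DIGITS.split(text)
    # With one capturing group, odd indices hold the digit runs.
    return [int(p) if i % 2 else p for i, p in enumerate(parts) if p]


def nice_key(text):
    """Sort key placing numbers before text and comparing numbers by value."""
    return [(0, p, "") if isinstance(p, int) else (1, 0, p)
            for p in split_text_and_ints(text)]


if __name__ == "__main__":
    print(sorted(["node10", "node2", "node1"], key=nice_key))
```

The tagged tuples in `nice_key` avoid comparing integers with strings directly, which Python 3 does not allow.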
-
- Mar 11, 2011
-
-
Guido Trotter authored
This reverts commit 288f240f.

That commit was buggy at various levels:

- broke ssh access to the second cluster, making cluster-merge unusable (unless ssh keys were previously set up?)
- filtered away offline nodes from being added to the cluster config (wrong, they should be kept, as offline)
- broke commit-check

The previous commit makes the code work again with what this commit tried to achieve.

Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Guido Trotter authored
The node list in MergerData is used only to:

- stop ganeti on the nodes
- readd the nodes to the cluster

As such, offline nodes should be excluded from it.

Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Mar 10, 2011
-
-
Stephen Shirley authored
Otherwise the readd will fail, breaking the merge. Signed-off-by:
Stephen Shirley <diamond@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Mar 07, 2011
-
-
Iustin Pop authored
NEWS update and version bump. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
* devel-2.3:
  Fix LUClusterRepairDiskSizes and rpc result usage
  Fix RPC mismatch in blockdev_getsize[s]
  RAPI: fix evacuate node resource

Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Iustin Pop authored
Also specifies the comma-escaping feature. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Iustin Pop authored
* devel-2.2:
  Fix LUClusterRepairDiskSizes and rpc result usage
  Fix RPC mismatch in blockdev_getsize[s]

Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
- Mar 04, 2011
-
-
Iustin Pop authored
This LU was introduced before the RPC result conversion from .data to .payload, and it has managed to keep the old-style usage (how? it's the only LU that does so). Fix by changing to payload, and add some extra logging for easier diagnosis. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Stephen Shirley <diamond@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com> (cherry picked from commit 043beb38)
-
Iustin Pop authored
Commit 92fd2250 added consistency checks in the RPC layer, which broke the call_blockdev_getsizes RPC call (declared with 's' at the end in rpc.py, without 's' in the node daemon). The immediate fix is to correct the RPC function name; the long-term one will be to remove this duplication. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Stephen Shirley <diamond@google.com> (cherry picked from commit ccfbbd2d)
-
Iustin Pop authored
PollJob returns the whole op_results, hence a list of opcode results. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
- Mar 02, 2011
-
-
Guido Trotter authored
* origin/stable-2.4:
  Fix typo in kvm-ifup script
  NEWS: Replace smartquotes, start lines with uppercase
  Update NEWS and release 2.4.0 rc3
  Fix potential data-loss bug in disk wipe routines

Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Michael Hanselmann authored
Reported-by:
Bas Tichelaar <bas@30loops.net> Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- Mar 01, 2011
-
-
Michael Hanselmann authored
- Sphinx converts ASCII quotes ("") to smartquotes (“”) automatically
- Sentences or list items start with an uppercase letter
- Changed description of non-verbose “gnt-* list” output slightly

Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Feb 28, 2011
-
-
Michael Hanselmann authored
The exception was never actually raised. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Adeodato Simo <dato@google.com>
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Iustin Pop authored
* devel-2.4:
  1-char comment typo fix
  Expand some acronyms, add to glossary
  query_unittest: Fix argument to set()

Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Iustin Pop authored
For the 2.4 release, we only add the missing RPC calls. However, this needs to be fixed properly, by preventing usage of mis-configured disks. Also add a bit more logging so that it's directly clear on which node the wipe is being done. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
- Feb 25, 2011
-
-
Stephen Shirley authored
Signed-off-by:
Stephen Shirley <diamond@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Feb 24, 2011
-
-
Stephen Shirley authored
Signed-off-by:
Stephen Shirley <diamond@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Feb 23, 2011
-
-
René Nussbaumer authored
Commit e431074f introduced a bug that went uncaught; this patch fixes it. set() expects a list or other iterable, so it split the provided instance name into a set of characters. As a result, exp_status was never set, and the problem was not caught by an assert further below that checks that every status was tested. Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
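
The pitfall described above is standard Python behavior and can be reproduced in a few lines. The instance name used here is a made-up example value.

```python
instance_name = "inst1"  # hypothetical example value

# Passing a string directly makes set() iterate over its characters.
wrong = set(instance_name)

# Wrapping the name in a list yields a one-element set holding the name.
right = set([instance_name])

print(sorted(wrong))
print(right)
```

A membership test such as `instance_name in wrong` is false, which is how a check against the set could silently never match.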
-
- Feb 22, 2011
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Feb 21, 2011
-
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Iustin Pop authored
* devel-2.4: (23 commits)
  Fix pylint warnings
  Change the list formatting to a 'special' chars
  Add support for merging node groups
  Add option to rename groups on conflict
  Fix minor docstring typo
  Fix HV/OS parameter validation on non-vm nodes
  NodeQuery: mark live fields as UNAVAIL for non-vm_capable nodes
  NodeQuery: don't query non-vm_capable nodes
  Remove superfluous redundant requirement
  Don't remove master_candidate flag from merged nodes
  Use a consistent ECID base
  listrunner: convert from getopt to optparse
  listrunner: fix agent usage
  Revert "Disable the cluster-merge tool for the moment"
  Fix cluster-merging by not stopping noded
  Fix error msg for instances on offline nodes
  Minor reordering to match param order
  cluster verify and instance disks on offline nodes
  Cluster verify and N+1 warnings for offline nodes
  Handle gnt-instance shutdown --all for empty clusters
  Use gnt-node add --force-join to add foreign nodes
  Add --force-join option to gnt-node add
  Fix iterating over node groups

Of the above commits present in the devel-2.4 branch, only “Add --force-join option to gnt-node add” is a potential issue, but this has been QA-ed successfully. The other fixes fall into three groups:

- non-core changes (cluster-merge, listrunner)
- trivial fixes (docstrings, etc.)
- bugs that we want fixed

As such, instead of cherry-picking only individual patches, I propose that we unify stable and devel 2.4 and make a new RC out of the result.

Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Feb 18, 2011
-
-
Stephen Shirley authored
- 1 80-char line infraction
- 4 changes in how arguments are passed to logging functions
- 3 pylint disable-msg's because cluster-merge needs to access ganeti config internals

Signed-off-by:
Stephen Shirley <diamond@google.com> Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Guido Trotter authored
Currently the QA rename job wrongly passes the whole info dict to the client. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Iustin Pop authored
And also enable verbose display via the, well, verbose option. Man page and tests are updated, and the formatting is moved from 4 if statements to a data structure. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Stephen Shirley authored
Signed-off-by:
Stephen Shirley <diamond@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Stephen Shirley authored
Signed-off-by:
Stephen Shirley <diamond@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Stephen Shirley authored
Signed-off-by:
Stephen Shirley <diamond@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Guido Trotter authored
This tests at least the basic case. Unfortunately, there is no way to check all possibilities using the provided rapi client, as that will use the new method unless the cluster doesn't support it. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-