- Aug 04, 2011
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Michael Hanselmann authored
This patch brings a huge change to ganeti-watcher to make it aware of node groups. Each node group is processed in its own subprocess, reducing the impact of long-running operations. The global watcher state file, $datadir/ganeti/watcher.data, is replaced with a state file per node group ($datadir/ganeti/watcher.${uuid}.data).

Previously a lock on the state file was used to ensure only one instance of the watcher was running at a time. Some operations, e.g. “gnt-cluster renew-crypto”, blocked the watcher by acquiring an exclusive lock on the state file. Since the watcher processes now use different files, and locking multiple files isn't atomic, this method is no longer usable. Instead, a dedicated lock file is used and every watcher process acquires a shared lock on it. If a Ganeti command wants to block the watcher, it acquires the lock in exclusive mode. Each per-nodegroup watcher process also acquires an exclusive lock on its state file. This prevents multiple watchers from running for the same nodegroup.

The code is heavily reorganized to clear up dependencies between functions and to get rid of the global “client” variable. The utility class “Watcher” is removed in favour of stand-alone utility functions.

Since the parent watcher process doesn't wait for its children by default, a new option (--wait-children) was added. It is used, for example, by QA.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
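The two-level locking scheme described in this commit can be sketched in Python with `fcntl.flock`: every per-group watcher takes a shared lock on one global lock file (which a blocking command would take exclusively) and an exclusive lock on its own group's state file. This is a minimal illustration under stated assumptions, not Ganeti's actual implementation: the helper names `try_lock` and `start_group_watcher` and the file names are hypothetical, and it assumes BSD-style `flock(2)` semantics as on Linux.

```python
import fcntl
import os
import tempfile


def try_lock(path, exclusive):
    """Open path and take a non-blocking flock; return the fd, or None if busy.

    Hypothetical helper. flock locks belong to the open file description, so
    two separate os.open() calls conflict even within one process.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    mode = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    try:
        fcntl.flock(fd, mode | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def start_group_watcher(lock_path, state_path):
    """Return (global_fd, state_fd) if this per-group watcher may run, else None."""
    # All watchers share the global lock; an exclusive holder blocks every one.
    global_fd = try_lock(lock_path, exclusive=False)
    if global_fd is None:
        return None
    # At most one watcher per node group: exclusive lock on its state file.
    state_fd = try_lock(state_path, exclusive=True)
    if state_fd is None:
        os.close(global_fd)
        return None
    return global_fd, state_fd


if __name__ == "__main__":
    # Two groups may be watched concurrently; a second watcher for the same
    # group is refused.
    tmp = tempfile.mkdtemp()
    lock = os.path.join(tmp, "watcher.lock")
    group_a = start_group_watcher(lock, os.path.join(tmp, "watcher.a.data"))
    again_a = start_group_watcher(lock, os.path.join(tmp, "watcher.a.data"))
    group_b = start_group_watcher(lock, os.path.join(tmp, "watcher.b.data"))
    print(group_a is not None, again_a is None, group_b is not None)
```

Because shared locks are compatible with each other, watchers for different groups never block one another; only an exclusive holder of the global lock file stops them all, which avoids having to lock many state files atomically.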
- Jul 29, 2011
Michael Hanselmann authored
WATCHER_STATEFILE will be removed at the end of this patch series.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
- May 16, 2011
Iustin Pop authored
Instead of hardcoded Xen commands. This will make it work for all hypervisors, instead of duplicating hypervisor functionality in QA itself. The timeout has been removed, as gnt-instance stop itself makes sure the instance is down before returning; we just double-check that it is indeed down.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
- Nov 29, 2010
Iustin Pop authored
Currently, 95% of the QA commands are executed in the same way: on the master, based on a command list and expecting success:

  AssertEqual(StartSSH(master['primary'], utils.ShellQuoteArgs(cmd)).wait(), 0)

The remaining 5% are variations on this theme (maybe the command needs to fail, or the node is different, etc.). Based on this, we can simplify the code significantly by abstracting the common pattern into a new AssertCommand() function. This saves ~250 lines of code in the QA suite, around 8% of the entire QA code size.

Additionally, the output was very cryptic before (the famous "QA error: 1 != 0" messages), whereas now we show a clear error message (node, command, exit code and failure mode).

The patch replaces single quotes with double quotes in all the parts of the code that I touch; let me know if that's not OK…

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
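The idea behind AssertCommand(), one helper that runs a command, checks the exit code against the expected outcome, and reports node, command, exit code and failure mode, can be sketched as follows. This is a hypothetical local stand-in, not the real QA helper: the name `assert_command`, its `fail` and `node` parameters and the `QAError` class are assumptions, and it runs commands locally with `subprocess` instead of over SSH on a cluster node.

```python
import subprocess


class QAError(Exception):
    """Raised when a QA command does not end the way the test expected."""


def assert_command(cmd, fail=False, node="localhost"):
    """Run cmd (a list of arguments) and check its exit code.

    Hypothetical sketch: `node` is only used in the error message here,
    because the command runs locally rather than via SSH.
    Returns the exit code when it matches the expectation.
    """
    exit_code = subprocess.call(cmd)
    if fail and exit_code == 0:
        mode = "expected failure, but the command succeeded"
    elif not fail and exit_code != 0:
        mode = "expected success, but the command failed"
    else:
        return exit_code
    # Clear message instead of a bare "1 != 0".
    raise QAError("Command %r on node %s: %s (exit code %d)"
                  % (" ".join(cmd), node, mode, exit_code))
```

A call such as `assert_command(["gnt-cluster", "verify"])` would then replace the repeated `AssertEqual(StartSSH(...).wait(), 0)` pattern, and `fail=True` covers the commands that are expected to fail.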
- Nov 01, 2010
Michael Hanselmann authored
… instead of an object. This allows it to be used in places where only the name is available.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
- Oct 14, 2010
Iustin Pop authored
The interaction with the cron-launched watcher is a well-known failure mode of QA:

  ---- 2010-10-14 06:54:55.464839 time=0:00:56.764827 Test tools/move-instance
  For the following tests it's recommended to turn off the ganeti-watcher cronjob.
  ---- 2010-10-14 06:54:55.465255 start Test automatic restart of instance by ganeti-watcher
  …
  Error: Domain 'instance1' does not exist.
  Command: ssh -oEscapeChar=none -oBatchMode=yes -l root -t -oStrictHostKeyChecking=yes -oClearAllForwardings=yes -oForwardAgent=yes node2 'ganeti-watcher -d'
  2010-10-13 23:55:04,479: pid=1659 ganeti-watcher:626 ERROR Can't acquire lock on state file /var/lib/ganeti/watcher.data: File already locked
  ---- 2010-10-14 06:55:04.513948 time=0:00:09.048693 Test automatic restart of instance by ganeti-watcher

To fix this, we disable the watcher during these tests and re-enable it afterwards. To protect against the watcher already being disabled, we enable it unconditionally at the start of the QA (we do want it enabled, in order to see the interaction between the watcher and the many creation/disk replace jobs, etc.).

Note: even after this patch, if a cron-launched watcher was started and is still running during the test, we'll have locking issues. I think this is OK for now; we'll have to see how often that happens.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
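The "disable during the test, re-enable afterwards" pattern maps naturally onto a Python context manager. A minimal sketch, assuming hypothetical `disable` and `enable` callables that stand in for however the QA actually toggles the watcher; the point is that `try`/`finally` re-enables it even when a test raises, while a single unconditional `enable()` at QA start covers a watcher that is already disabled when QA begins.

```python
import contextlib


@contextlib.contextmanager
def watcher_disabled(disable, enable):
    """Disable the watcher for the duration of the block, then re-enable it.

    `disable` and `enable` are hypothetical zero-argument callables.
    The finally clause runs on normal exit and on exceptions alike.
    """
    disable()
    try:
        yield
    finally:
        enable()
```

Usage would look like `with watcher_disabled(pause, resume): run_watcher_tests()`, where `pause`, `resume` and `run_watcher_tests` are again placeholders. As the commit note says, this does not help if a cron-launched watcher is already running when the block starts.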
- Aug 15, 2008
Michael Hanselmann authored
To my knowledge they're used nowhere and it's at least slightly confusing to people adding new QA checks.

Reviewed-by: ultrotter
- Feb 14, 2008
Michael Hanselmann authored
Reviewed-by: iustinp
- Dec 03, 2007
Michael Hanselmann authored
- When line wrapping is needed, move spaces to the next line.
- Remove embedded line breaks from error messages.

Reviewed-by: schreiberal
- Nov 22, 2007
Michael Hanselmann authored
Reviewed-by: schreiberal
- Nov 13, 2007
Michael Hanselmann authored
This makes the tests much more reliable because it avoids race conditions. It also speeds them up a lot.

Reviewed-by: iustinp
- Nov 01, 2007
Michael Hanselmann authored
Make the code somewhat smaller. Disable disk failure test for master for now.

Reviewed-by: schreiberal
- Oct 10, 2007
Michael Hanselmann authored
- Implement colours in qa_utils.
- Print warning for cron script.

Reviewed-by: iustinp
- Sep 26, 2007
Michael Hanselmann authored
- Test “gnt-backup export” and “gnt-backup import”.
- Move “ResolveInstanceName” to qa_utils.py.
- Fix tests for “ganeti-watcher”.
- Make instance shutdown and startup configurable.

Reviewed-by: schreiberal
- Sep 13, 2007
Michael Hanselmann authored
Reviewed-by: iustinp