- Feb 18, 2011
-
-
Guido Trotter authored
This tests at least the basic case. Unfortunately, there is no way to check all possibilities using the provided RAPI client, as it will use the new method unless the cluster doesn't support it. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Jan 28, 2011
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Add “cluster-oob” to sample configuration file. Don't run RAPI group tests if disabled. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Jan 12, 2011
-
-
Iustin Pop authored
Right now, the QA code is not covered by pylint, and this shows at least one low-impact bug. This patch makes the changes necessary to make QA pylint-clean, and then changes the makefile to run pylint for it. Notable changes:
  - qa_utils.GenericQueryTest: randfields was not used at all, and my belief is that it was intended to be used in order not to modify the input list; so I replaced randfields with fields, so we only shuffle our local copy
  - qa_node.TestOutOfBand was using its own copy of AcquireNode(), so I replaced it with the existing version
  - qa_os: was using 'dir' in a couple of places, replaced with dirname
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
The recent additions to QA (many more tests) make QA slow if the machine on which the QA runs is not very close to the tested nodes or, more generally, when the SSH handshake is costly. We have discussed using a persistent connection before, and here is the patch that implements it. On a very small QA (very very small), it cuts down a lot of time (almost half), so it should be useful even for a full QA. I've also thought about changing from external ssh to paramiko, but I estimated that it would be more work to correctly interleave the IO from the remote process than just running a background SSH. Also note that yes, the global dict is ugly, but I don't know of another simple way to implement this. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Patch f55312bd added the OOB tests to TestClusterVerify, which is not actually a test for cluster verify, but a runner for cluster verify that is called multiple times, for each instance type, etc. This led to running the OOB commands multiple times, which is painful especially as this is a slow test. The patch moves this to a separate test, that is run only once. Furthermore, the way that data files are copied around is very inefficient: touch + mv + chmod + mv + rm for each node (5 times number of nodes), whereas it could be simply: touch on master, chmod on master, cluster copyfile, chmod on master, cluster copyfile, cluster command rm, i.e. only 5 fixed ssh calls to the master. The code is changed as such, for increased speed. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Jan 10, 2011
-
-
Adeodato Simo authored
Signed-off-by:
Adeodato Simo <dato@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Jan 06, 2011
-
-
Adeodato Simo authored
Now that group queries use query2 infrastructure, update the QA tests to use the generic functions in qa_utils.py. Signed-off-by:
Adeodato Simo <dato@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Dec 20, 2010
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Dec 17, 2010
-
-
Michael Hanselmann authored
“gnt-cluster verify” looks at some per-instance information as well, so it should be run for each instance type in the QA tests. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Dec 16, 2010
-
-
René Nussbaumer authored
Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Dec 14, 2010
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
- Dec 13, 2010
-
-
Adeodato Simo authored
This adds QA tests for the SetGroupParams operation, both for CLI and RAPI. Additionally, it adds tests for add/rename/remove groups via RAPI, which had not been included in a previous patch series. Finally, it also tests setting "alloc_policy" (and, for the CLI, "ndparams") at group creation time. Signed-off-by:
Adeodato Simo <dato@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Dec 10, 2010
-
-
Michael Hanselmann authored
  - Query all known fields
  - Random combinations (using a PRNG with a fixed seed) of fields
  - Order of result names
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
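
The three checks listed in this commit can be sketched as follows. This is a minimal illustration under assumptions: check_query, the query callable and the seed value are hypothetical names and values, and the real QA code runs the command-line tools instead of calling a Python function.

```python
import random


def check_query(query, all_fields, rounds=5, seed=12345):
    """Run the three checks against query(fields) -> list of result names.

    Both this function and the 'query' callable are hypothetical stand-ins.
    """
    rnd = random.Random(seed)  # fixed seed, so every run tests the same sets

    # 1. Query all known fields at once.
    if query(list(all_fields)) != list(all_fields):
        raise AssertionError("full query returned unexpected names")

    for _ in range(rounds):
        # 2. Random combination: shuffle a local copy (the caller's list
        #    stays untouched) and keep a random-length prefix of it.
        fields = list(all_fields)
        rnd.shuffle(fields)
        fields = fields[:rnd.randint(1, len(fields))]

        # 3. Result names must come back in the requested order.
        if query(fields) != fields:
            raise AssertionError("wrong result order for %r" % (fields,))


def fake_query(fields):
    """Stand-in for a real query; echoes the requested names in order."""
    return list(fields)
```

Because the PRNG is seeded with a constant, a failure seen in one QA run can be reproduced exactly in the next one.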
-
- Dec 09, 2010
-
-
Guido Trotter authored
Use the simplified command and RAPI version to perform an instance rename to the same name. This is performed whenever the rename test is enabled, while the "other-name" rename is performed only when an alternative name is also provided. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Guido Trotter authored
The current instance rename qa testing function can only perform back-and-forth renames, both for command line and rapi. In order to be able to perform same-name rename tests we change it to be able to perform simple renames, and then we change qa to call it to perform both sides of the renaming. The same change is applied both to the local and the rapi test. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Dec 08, 2010
-
-
Adeodato Simo authored
This is a single function that tests all of the following:
  - creating groups
  - creating groups that exist fails
  - renaming an empty group
  - renaming a group with nodes
  - renaming to a name that already exists fails
  - removing an empty group works
  - removing a group with nodes fails
The "default" group is only used for the "rename group with nodes" test. Signed-off-by:
Adeodato Simo <dato@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Dec 01, 2010
-
-
Adeodato Simo authored
This adds QA tests for both CLI and RAPI. Signed-off-by:
Adeodato Simo <dato@google.com> Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Nov 30, 2010
-
-
Iustin Pop authored
This is more of an RFC. The patch attempts to address two issues:
  - running conditional tests is ugly right now
  - we don't know what tests we skipped
By using the new RunTestIf, we solve both. But a significant number of test decisions are more complex than just “is test enabled”, so those remain to be run via RunTest, which means we don't get logging of when they're not run. Hence the logging is not complete… Suggestions on how to solve it are welcome. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
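
The idea behind RunTestIf can be sketched as below. This is not the actual implementation: the snake_case helpers, the dict of enabled tests and the skipped list are assumptions made for the example.

```python
# Hypothetical record of skipped tests, so they can be reported at the end.
skipped = []


def run_test(fn, *args):
    """Unconditional runner: log the start, then call the test."""
    print("start", fn.__name__)
    return fn(*args)


def run_test_if(enabled, name, fn, *args):
    """Run fn only if the config enables 'name'; otherwise log the skip."""
    if enabled.get(name, True):
        return run_test(fn, *args)
    print("skipped %s (disabled by %s)" % (fn.__name__, name))
    skipped.append(fn.__name__)
    return None


def test_group_list():
    return "ok"
```

A call such as run_test_if(config, "group-list", test_group_list) replaces an explicit if block around run_test and, unlike that block, leaves a trace in the log when the test does not run.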
-
- Nov 17, 2010
-
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Nov 03, 2010
-
-
Michael Hanselmann authored
This tests some parts of the disk information collection. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Luca Bigliardi <shammash@google.com>
-
- Oct 28, 2010
-
-
Michael Hanselmann authored
To remove the instance after an export it needs to be stopped. This can be achieved using the parameter “shutdown”, or by explicitly shutting down the instance before exporting. The latter would still require the “shutdown” parameter to be set. To make it more intuitive, this requirement is changed with this patch. Instances already stopped are accepted for automatic removal. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
On my machine it takes over 30 seconds; disabling it can speed up the QA. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Oct 25, 2010
-
-
Iustin Pop authored
There are two node tests that are run from RunCommonInstanceTests, which is the wrong place: it causes these node tests to be run three times instead of once. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Oct 20, 2010
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Oct 14, 2010
-
-
Iustin Pop authored
I forgot this in the original patch. Sorry!!!! Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
The interaction with cron-launched watcher is a well-known failure mode of QA:

  ---- 2010-10-14 06:54:55.464839 time=0:00:56.764827 Test tools/move-instance
  For the following tests it's recommended to turn off the ganeti-watcher cronjob.
  ---- 2010-10-14 06:54:55.465255 start Test automatic restart of instance by ganeti-watcher
  …
  Error: Domain 'instance1' does not exist.
  Command: ssh -oEscapeChar=none -oBatchMode=yes -l root -t -oStrictHostKeyChecking=yes -oClearAllForwardings=yes -oForwardAgent=yes node2 'ganeti-watcher -d'
  2010-10-13 23:55:04,479: pid=1659 ganeti-watcher:626 ERROR Can't acquire lock on state file /var/lib/ganeti/watcher.data: File already locked
  ---- 2010-10-14 06:55:04.513948 time=0:00:09.048693 Test automatic restart of instance by ganeti-watcher

In order to fix this, we disable the watcher during these tests, and re-enable it afterwards. To protect against watcher being disabled, we enable it unconditionally at the start of the QA (we do want it enabled, in order to see the interaction between the watcher and many creation/disk replace jobs, etc.).

Note: even after this patch, if a cron-watcher was started and is still running during the test, we'll have locking issues. I think for now this is OK, we'll have to see how often that happens.
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
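
The disable/re-enable pattern described above can be sketched as a context manager. This is an illustration under assumptions, not the patch itself: it assumes the cluster provides the "gnt-cluster watcher pause" and "gnt-cluster watcher continue" subcommands, the "4h" duration is an arbitrary example value, and "run" is a hypothetical callable that executes a command on the master node.

```python
from contextlib import contextmanager


def enable_watcher(run):
    """Enable the watcher unconditionally, e.g. at the start of the QA."""
    run(["gnt-cluster", "watcher", "continue"])


@contextmanager
def watcher_paused(run, duration="4h"):
    """Pause the cron-launched watcher for the duration of a test block.

    'run' is a hypothetical command runner; 'duration' is an assumed value.
    """
    run(["gnt-cluster", "watcher", "pause", duration])
    try:
        yield
    finally:
        # Re-enable even if the test fails, so later tests still see the
        # normal interaction between watcher and jobs.
        run(["gnt-cluster", "watcher", "continue"])
```

The watcher tests would then run inside a "with watcher_paused(run):" block, so that a failing test cannot leave the watcher disabled for the rest of the QA.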
-
- Oct 08, 2010
-
-
Iustin Pop authored
Currently, the logging in QA doesn't show the duration of the various steps, and if it is needed one has to perform log manipulation. This patch changes the output so that the log information is line-based (as opposed to block-based), such that it's easy to grep for all log lines:

  ./qa/ganeti-qa.py --yes-do-it qa.json 2>&1|grep ^----
  ---- 2010-10-08 14:40:21.730382 start Test SSH connection --------------
  ---- 2010-10-08 14:40:23.156633 time=0:00:01.426251 Test SSH connection
  ---- 2010-10-08 14:40:23.156735 start ICMP ping each node --------------
  ---- 2010-10-08 14:40:24.230479 time=0:00:01.073744 ICMP ping each node
  ---- 2010-10-08 14:40:24.230583 start Test availibility of Ganeti commands
  ---- 2010-10-08 14:40:32.314586 time=0:00:08.084003 Test availibility of Ganeti commands
  ---- 2010-10-08 14:40:32.314734 start gnt-node info --------------------
  ---- 2010-10-08 14:40:32.860884 time=0:00:00.546150 gnt-node info
  ------

or just for the duration of the steps:

  ./qa/ganeti-qa.py --yes-do-it ../qa-mpgntac5.fra.json 2>&1|grep ^----.*time=
  ---- 2010-10-08 14:42:12.630067 time=0:00:01.239256 Test SSH connection
  ---- 2010-10-08 14:42:14.204393 time=0:00:01.574221 ICMP ping each node
  ---- 2010-10-08 14:42:22.170828 time=0:00:07.966331 Test availibility of Ganeti commands
  ---- 2010-10-08 14:42:22.701030 time=0:00:00.530037 gnt-node info
  ------

This will help with identifying slow steps or even graphing the QA duration.
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
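
As a rough sketch of how this line-based format lends itself to post-processing, the snippet below formats a "time=" line and extracts the durations again. The helper names are made up for the example; only the line layout is taken from the log excerpts in this commit message.

```python
import re
from datetime import datetime, timedelta


def format_end(now, elapsed, desc):
    """Format an end-of-step line like the ones quoted in the commit."""
    return "---- %s time=%s %s" % (now, elapsed, desc)


# "---- <date> <time> time=H:MM:SS[.ffffff] <description>"
_TIME_RE = re.compile(
    r"^---- \S+ \S+ time=(\d+):(\d+):(\d+)(?:\.(\d+))? (.*)$")


def parse_durations(lines):
    """Return (description, timedelta) pairs for every 'time=' line."""
    result = []
    for line in lines:
        m = _TIME_RE.match(line)
        if not m:
            continue  # start lines and ordinary output are ignored
        hours, minutes, seconds, frac, desc = m.groups()
        # Pad or cut the fraction to exactly six digits (microseconds).
        micro = int((frac or "0").ljust(6, "0")[:6])
        result.append((desc, timedelta(hours=int(hours),
                                       minutes=int(minutes),
                                       seconds=int(seconds),
                                       microseconds=micro)))
    return result
```

Sorting the result of parse_durations() by duration gives the slow steps directly, and the same pairs could feed a graph of QA duration over time.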
-
- Oct 07, 2010
-
-
Iustin Pop authored
This time, we re-establish the old pri/sec nodes correctly. Unfortunately this will now require at least a 3-node cluster for DRBD instances, hence it's somewhat suboptimal, but… The other option would be to move it simply from p:s to s:p and then back to p:s, without involving a third node (for the DRBD case), but I think that moving it to a completely separate node is slightly better for testing. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Oct 06, 2010
-
-
Iustin Pop authored
The instance move tests were moving the instance from node pair (A,_) to (B,A), and left it there. This patch makes sure that the first step moves the instance to (B,A) but the second one back to (A,B), so that the instance is left on the same primary node. The original secondary node is lost though, if I read the code correctly. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Sep 30, 2010
-
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Aug 19, 2010
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Aug 18, 2010
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Aug 10, 2010
-
-
Michael Hanselmann authored
“gnt-backup export” requires the target node. Until now, the master daemon would complain that the “parameter 'OP_BACKUP_EXPORT.target_node' fails validation”. With this patch, an additional check is done in the client program. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Manuel Franceschini <livewire@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Jul 29, 2010
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Jul 26, 2010
-
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
- Jul 01, 2010
-
-
Michael Hanselmann authored
Currently the RAPI client uses the urllib2 and httplib modules from Python's standard library. They're used with pyOpenSSL in a very fragile way, and there are known issues when receiving large responses from a RAPI server. By switching to PycURL we leverage the power and stability of the widely-used curl library (libcurl). This brings us much more flexibility than before, and timeouts were easily implemented (something that would have involved a lot of work with the built-in modules). There's one small drawback: Programs using libcurl have to call curl_global_init(3) (available as pycurl.global_init) while exactly one thread is running (e.g. before other threads) and are supposed to call curl_global_cleanup(3) (available as pycurl.global_cleanup) upon exiting. See the manpages for details. A decorator is provided to simplify this. Unittests for the new code are provided, increasing the test coverage of the RAPI client from 74% to 89%. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
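
The decorator mentioned for the libcurl global init/cleanup requirement could look roughly like this. It is a generic sketch, not the decorator shipped with the patch: the name with_global_state is hypothetical, and the init and cleanup callables are injected so that the example runs without PycURL installed.

```python
import functools


def with_global_state(init, cleanup):
    """Return a decorator that wraps a function in init()/cleanup() calls.

    Hypothetical helper; with PycURL, 'init' would be something like
    lambda: pycurl.global_init(pycurl.GLOBAL_ALL) and 'cleanup' would be
    pycurl.global_cleanup.
    """
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # Must run while exactly one thread exists, i.e. before the
            # wrapped function starts any other threads.
            init()
            try:
                return fn(*args, **kwargs)
            finally:
                # Runs on normal exit and on exceptions alike.
                cleanup()
        return wrapper
    return decorator
```

Applied to a program's main function, this keeps the global initialization and cleanup in one place, so individual RAPI client calls do not have to care about them.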
-
Guido Trotter authored
Because we have to. :) Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-