1. 16 May, 2011 1 commit
  2. 18 Feb, 2011 1 commit
  3. 28 Jan, 2011 2 commits
  4. 12 Jan, 2011 3 commits
    • Run pylint over QA code too · 3582eef6
      Iustin Pop authored
      
      
      Right now, the QA code is not covered by pylint, and running pylint on
      it reveals at least one low-impact bug.

      This patch makes the changes needed to make QA pylint-clean, and then
      changes the Makefile to run pylint on it as well.
      
      Notable changes:
      
      - qa_utils.GenericQueryTest: randfields was not used at all, and my
        belief is that it was intended to be used in order not to modify the
        input list; so I replaced randfields with fields, so that we only
        shuffle our local copy
      - qa_node.TestOutOfBand was using its own copy of AcquireNode(), so I
        replaced it with the existing version
      - qa_os: was using 'dir' (which shadows the Python built-in) as a
        variable name in a couple of places; replaced it with dirname
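The GenericQueryTest fix boils down to shuffling a copy rather than the caller's list. A minimal sketch of that idea follows; `shuffled_fields` is a hypothetical name, not the QA code itself.

```python
import random


def shuffled_fields(fields, rng=random):
    """Return the query fields in random order without mutating `fields`.

    Hypothetical helper: the shuffle is applied to a local copy only.
    """
    randfields = list(fields)  # local copy; the caller's list keeps its order
    rng.shuffle(randfields)
    return randfields
```

Passing `rng` (any object with a `shuffle` method, such as `random.Random(seed)`) makes the order reproducible in tests.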
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
    • QA: use a persistent SSH connection to the master · f7e6f3c8
      Iustin Pop authored
      
      
      The recent additions to QA (many more tests) make QA slow if the
      machine on which the QA runs is not very close to the tested nodes,
      or in general when the SSH handshake is costly.
      
      We have discussed using a persistent connection before, and this
      patch implements it. On a very small QA run, it cuts the total time
      almost in half, so it should be useful even for a full QA.
      
      I've also thought about changing from external ssh to paramiko, but I
      estimated that it would be more work to correctly interleave the IO
      from the remote process than just running a background SSH.
      
      Also note that yes, the global dict is ugly, but I don't know of
      another simple way to implement this.
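The message does not say how the background SSH is set up. One common way, assumed here, is OpenSSH connection multiplexing: a background `ssh -N` with `ControlMaster=yes` owns a control socket, and later `ssh` calls that name the same `ControlPath` reuse that connection instead of doing a new handshake. In this sketch the function names, the socket location and the `_MULTIPLEXERS` dict (standing in for the "global dict") are all hypothetical.

```python
import os
import subprocess
import tempfile

# Hypothetical module-level registry: host name -> background master process.
_MULTIPLEXERS = {}


def control_path(host):
    """Path of the OpenSSH control socket for `host` (location is an assumption)."""
    return os.path.join(tempfile.gettempdir(), "qa-ssh-%s.sock" % host)


def master_command(host, user="root"):
    """ssh invocation that opens the shared connection and keeps it open."""
    return ["ssh", "-oBatchMode=yes", "-l", user,
            "-oControlMaster=yes",
            "-oControlPath=%s" % control_path(host),
            "-N",  # run no remote command, just hold the connection
            host]


def client_command(host, cmd, user="root"):
    """ssh invocation that reuses the master's connection via its socket."""
    return ["ssh", "-oBatchMode=yes", "-l", user,
            "-oControlPath=%s" % control_path(host),
            host, cmd]


def start_multiplexer(host, spawn=subprocess.Popen):
    """Start the background master once per host and remember it."""
    if host not in _MULTIPLEXERS:
        _MULTIPLEXERS[host] = spawn(master_command(host))
    return _MULTIPLEXERS[host]
```

A real implementation would also have to wait until the master has authenticated and created the socket, and terminate the cached processes when QA exits.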
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
    • QA: Fix duplicated OOB tests · 69df9d2b
      Iustin Pop authored
      Patch f55312bd added the OOB tests to TestClusterVerify, which is not
      actually a test for cluster verify, but a runner for cluster verify
      that is called multiple times, for each instance type, etc. This led
      to running the OOB commands multiple times, which is painful
      especially as this is a slow test.
      
      The patch moves this into a separate test, which is run only once.
      
      Furthermore, the way that data files are copied around is very
      inefficient: touch + mv + chmod + mv + rm for each node (5 times the
      number of nodes), whereas it could simply be: touch on master, chmod
      on master, cluster copyfile, chmod on master, cluster copyfile,
      cluster command rm, i.e. only 5 fixed ssh calls to the master. The
      code is changed accordingly, for increased speed.
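The speed-up comes from the call count: the old scheme scales with the number of nodes, the new one does not. The sketch below only models that arithmetic. The step names are copied from the message, and the two functions are hypothetical, not QA code.

```python
def per_node_steps(nodes):
    """Old scheme: touch + mv + chmod + mv + rm, each an ssh call to every node."""
    return [(node, step)
            for node in nodes
            for step in ("touch", "mv", "chmod", "mv", "rm")]


def master_steps(master):
    """New scheme: all calls go to the master; gnt-cluster fans out to nodes."""
    steps = ("touch", "chmod", "gnt-cluster copyfile",
             "chmod", "gnt-cluster copyfile", "gnt-cluster command rm")
    return [(master, step) for step in steps]
```

With three nodes the old scheme already needs 15 ssh calls, while `master_steps` returns the same short list regardless of cluster size.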
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
  5. 10 Jan, 2011 1 commit
  6. 06 Jan, 2011 1 commit
  7. 20 Dec, 2010 1 commit
  8. 17 Dec, 2010 2 commits
  9. 16 Dec, 2010 1 commit
  10. 14 Dec, 2010 1 commit
  11. 13 Dec, 2010 1 commit
  12. 10 Dec, 2010 1 commit
  13. 09 Dec, 2010 2 commits
  14. 08 Dec, 2010 1 commit
  15. 01 Dec, 2010 1 commit
  16. 30 Nov, 2010 1 commit
    • Further cleanups on QA · 7d88f255
      Iustin Pop authored
      
      
      This is more of an RFC. The patch attempts to address two issues:
      
      - running conditional tests is ugly right now
      - we don't know what tests we skipped
      
      By using the new RunTestIf, we solve both. But a significant number of
      test decisions are more complex than just “is test enabled”, so those
      are still run via RunTest, which means we don't get logging of when
      they're not run. Hence the logging is not complete… Suggestions on how
      to solve it are welcome.
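A minimal sketch of the RunTestIf idea: run a test only when all of its named switches are enabled, and log a line when it is skipped. The signature, the `enabled` mapping (standing in for the QA configuration) and the trivial `run_test` are assumptions, not the actual qa code.

```python
import logging


def run_test(fn, *args, **kwargs):
    """Stand-in for the QA runner: just call the test function."""
    return fn(*args, **kwargs)


def run_test_if(enabled, testnames, fn, *args, **kwargs):
    """Run `fn` only if every name in `testnames` is enabled; log skips.

    `enabled` is a hypothetical mapping of test name -> bool.
    """
    if isinstance(testnames, str):
        testnames = [testnames]
    disabled = [name for name in testnames if not enabled.get(name, False)]
    if disabled:
        logging.info("Skipping %s (disabled: %s)",
                     getattr(fn, "__name__", repr(fn)), ", ".join(disabled))
        return None
    return run_test(fn, *args, **kwargs)
```

Tests gated by more complex conditions still bypass this helper, which is exactly the logging gap the message points out.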
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: René Nussbaumer <rn@google.com>
  17. 17 Nov, 2010 1 commit
  18. 03 Nov, 2010 1 commit
  19. 28 Oct, 2010 2 commits
  20. 25 Oct, 2010 1 commit
  21. 20 Oct, 2010 1 commit
  22. 14 Oct, 2010 2 commits
    • Brown-bag fix for leftover comment · 76917d97
      Iustin Pop authored
      
      
      I forgot this in the original patch. Sorry!
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
    • Rework QA interaction with the watcher · 8201b996
      Iustin Pop authored
      
      
      The interaction with the cron-launched watcher is a well-known failure mode of QA:
      
      ---- 2010-10-14 06:54:55.464839 time=0:00:56.764827 Test tools/move-instance
      
      For the following tests it's recommended to turn off the ganeti-watcher cronjob.
      
      ---- 2010-10-14 06:54:55.465255 start Test automatic restart of instance by ganeti-watcher
      …
      Error: Domain 'instance1' does not exist.
      Command: ssh -oEscapeChar=none -oBatchMode=yes -l root -t -oStrictHostKeyChecking=yes
        -oClearAllForwardings=yes -oForwardAgent=yes node2 'ganeti-watcher -d'
      2010-10-13 23:55:04,479:  pid=1659 ganeti-watcher:626
       ERROR Can't acquire lock on state file /var/lib/ganeti/watcher.data: File already locked
      ---- 2010-10-14 06:55:04.513948 time=0:00:09.048693 Test automatic restart of instance by ganeti-watcher
      
      In order to fix this, we disable the watcher during these tests and
      re-enable it afterwards. To protect against the watcher having been
      left disabled, we enable it unconditionally at the start of the QA (we
      do want it enabled, in order to see the interaction between the
      watcher and the many creation/disk replace jobs, etc.).
      
      Note: even after this patch, if a cron-launched watcher was started and
      is still running during the test, we'll have locking issues. I think
      this is OK for now; we'll have to see how often that happens.
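The disable/re-enable pattern maps naturally onto a context manager whose `finally` clause turns the watcher back on even if a test fails. In this sketch `disable` and `enable` are hypothetical callables standing in for whatever mechanism QA uses to stop the cron-launched watcher.

```python
import contextlib


@contextlib.contextmanager
def watcher_disabled(disable, enable):
    """Keep the watcher disabled for the duration of a block of tests."""
    disable()
    try:
        yield
    finally:
        enable()  # re-enable even if a test inside the block raised
```

Calling `enable()` once unconditionally at QA start, as the message describes, covers the case where an earlier run was killed before the `finally` ran.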
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
  23. 08 Oct, 2010 1 commit
    • Change QA log output · f89d59b9
      Iustin Pop authored
      
      
      Currently, the logging in QA doesn't show the duration of the various
      steps, and getting it requires manual log processing. This patch
      changes the output so that the log information is line-based (as
      opposed to block-based), making it easy to grep for all log lines:
      
      ./qa/ganeti-qa.py --yes-do-it qa.json  2>&1|grep ^----
      ---- 2010-10-08 14:40:21.730382 start Test SSH connection --------------
      ---- 2010-10-08 14:40:23.156633 time=0:00:01.426251 Test SSH connection
      ---- 2010-10-08 14:40:23.156735 start ICMP ping each node --------------
      ---- 2010-10-08 14:40:24.230479 time=0:00:01.073744 ICMP ping each node
      ---- 2010-10-08 14:40:24.230583 start Test availibility of Ganeti commands
      ---- 2010-10-08 14:40:32.314586 time=0:00:08.084003 Test availibility of Ganeti commands
      ---- 2010-10-08 14:40:32.314734 start gnt-node info --------------------
      ---- 2010-10-08 14:40:32.860884 time=0:00:00.546150 gnt-node info ------
      
      or, to see only the step durations:
      ./qa/ganeti-qa.py --yes-do-it ../qa-mpgntac5.fra.json  2>&1|grep ^----.*time=
      ---- 2010-10-08 14:42:12.630067 time=0:00:01.239256 Test SSH connection
      ---- 2010-10-08 14:42:14.204393 time=0:00:01.574221 ICMP ping each node
      ---- 2010-10-08 14:42:22.170828 time=0:00:07.966331 Test availibility of Ganeti commands
      ---- 2010-10-08 14:42:22.701030 time=0:00:00.530037 gnt-node info ------
      
      This will help with identifying slow steps or even graphing the QA
      duration.
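A minimal sketch of both sides of this format, assuming only what the sample output shows: a helper that prints greppable `----` start and `time=` lines around a step, and a parser that turns `time=` lines back into durations for graphing. The names `run_logged` and `step_durations` are hypothetical, and the dash padding of the real start lines is not reproduced.

```python
import datetime
import re
import sys


def _log_line(stream, text):
    # Every marker line starts with "----" so `grep ^----` finds them all.
    stream.write("---- %s %s\n" % (datetime.datetime.now(), text))


def run_logged(desc, fn, *args, stream=None):
    """Run fn(*args) between a 'start' line and a 'time=' line."""
    stream = sys.stdout if stream is None else stream
    _log_line(stream, "start %s" % desc)
    started = datetime.datetime.now()
    try:
        return fn(*args)
    finally:
        # str(timedelta) gives the H:MM:SS.ffffff form seen in the log above
        _log_line(stream, "time=%s %s" % (datetime.datetime.now() - started, desc))


# A "time=" line: date, clock time, duration, description; trailing
# dashes are padding and are dropped.
_TIME_LINE = re.compile(
    r"^---- \S+ \S+ time=(\d+):(\d{2}):(\d{2}(?:\.\d+)?) (.*?)[ -]*$")


def step_durations(lines):
    """Yield (description, seconds) for every 'time=' line."""
    for line in lines:
        match = _TIME_LINE.match(line)
        if match:
            hours, minutes, seconds, desc = match.groups()
            yield desc, int(hours) * 3600 + int(minutes) * 60 + float(seconds)
```

For example, feeding the `time=` lines shown above to `step_durations` yields `("Test SSH connection", 1.239256)` first.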
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
  24. 07 Oct, 2010 1 commit
    • Try again to fix the inter-cluster move QA test · 638a7266
      Iustin Pop authored
      
      
      This time, we re-establish the old pri/sec nodes correctly. Unfortunately this
      now requires a cluster of at least 3 nodes for DRBD instances, hence it's
      somewhat suboptimal, but… The other option would be to simply move it from p:s
      to s:p and then back to p:s, without involving a third node (for the DRBD case),
      but I think that moving it to a completely separate node is slightly better for
      testing.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
  25. 06 Oct, 2010 1 commit
    • QA: Fix instance move tests · 677e16eb
      Iustin Pop authored
      
      
      The instance move tests were moving the instance from node pair (A,_) to
      (B,A) and leaving it there. This patch makes sure that the first step
      moves the instance to (B,A) but the second one back to (A,B), so that
      the instance is left on the same primary node.
      
      The original secondary node is lost though, if I read the code
      correctly.
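The two-step plan above can be written down as a list of (primary, secondary) pairs. This hypothetical helper (not the QA code) makes both properties explicit: the instance returns to its original primary, but its final secondary is the intermediate node rather than the original one.

```python
def plan_move_and_back(primary, other):
    """Node pairs for the two moves described above.

    Starting from (primary, any secondary): first move to (other, primary),
    then back to (primary, other). The instance ends on its original
    primary, but its secondary is now `other`, not the original secondary.
    """
    if primary == other:
        raise ValueError("the target node must differ from the primary")
    return [(other, primary), (primary, other)]
```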
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
  26. 30 Sep, 2010 1 commit
  27. 19 Aug, 2010 1 commit
  28. 18 Aug, 2010 1 commit
  29. 10 Aug, 2010 2 commits
  30. 29 Jul, 2010 1 commit
  31. 26 Jul, 2010 1 commit
  32. 01 Jul, 2010 1 commit
    • RAPI client: Switch to pycURL · 2a7c3583
      Michael Hanselmann authored
      
      
      Currently the RAPI client uses the urllib2 and httplib modules from
      Python's standard library. They're used with pyOpenSSL in a very fragile
      way, and there are known issues when receiving large responses from a RAPI
      server.
      
      By switching to PycURL we leverage the power and stability of the
      widely-used curl library (libcurl). This brings us much more flexibility
      than before, and timeouts were easily implemented (something that would
      have involved a lot of work with the built-in modules).
      
      There's one small drawback: programs using libcurl have to call
      curl_global_init(3) (available as pycurl.global_init) while exactly one
      thread is running (i.e. before any other threads are started) and are
      supposed to call curl_global_cleanup(3) (available as
      pycurl.global_cleanup) upon exiting. See the man pages for details. A
      decorator is provided to simplify this.
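The decorator itself is not shown here. The sketch below is a generic, hypothetical version (`with_global_init` is not the name used in Ganeti) that wraps a program's entry point so initialisation happens before any threads exist and cleanup happens on exit. The init/cleanup functions are passed in so the example runs without PycURL installed.

```python
import functools


def with_global_init(init, cleanup):
    """Decorator factory: call init() before and cleanup() after the function.

    With PycURL installed, one would pass
    init=lambda: pycurl.global_init(pycurl.GLOBAL_ALL) and
    cleanup=pycurl.global_cleanup, and decorate main() before it starts
    any threads.
    """
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            init()
            try:
                return fn(*args, **kwargs)
            finally:
                cleanup()  # runs even if fn raises
        return wrapper
    return decorator
```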
      
      Unittests for the new code are provided, increasing the test coverage of
      the RAPI client from 74% to 89%.
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>