    Rework QA interaction with the watcher · 8201b996
    Iustin Pop authored
    The interaction with the cron-launched watcher is a well-known failure mode of QA:
    ---- 2010-10-14 06:54:55.464839 time=0:00:56.764827 Test tools/move-instance
    For the following tests it's recommended to turn off the ganeti-watcher cronjob.
    ---- 2010-10-14 06:54:55.465255 start Test automatic restart of instance by ganeti-watcher
    Error: Domain 'instance1' does not exist.
    Command: ssh -oEscapeChar=none -oBatchMode=yes -l root -t -oStrictHostKeyChecking=yes
      -oClearAllForwardings=yes -oForwardAgent=yes node2 'ganeti-watcher -d'
    2010-10-13 23:55:04,479:  pid=1659 ganeti-watcher:626
     ERROR Can't acquire lock on state file /var/lib/ganeti/watcher.data: File already locked
    ---- 2010-10-14 06:55:04.513948 time=0:00:09.048693 Test automatic restart of instance by ganeti-watcher
    To fix this, we disable the watcher while these tests run and re-enable
    it afterwards. To guard against the watcher having been left disabled,
    we also enable it unconditionally at the start of the QA run (we do want
    it enabled for the rest of QA, in order to see how the watcher interacts
    with the many instance creation and disk replacement jobs, etc.).
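    As a rough illustration of the ordering described above (not the code
    in this patch): the sketch below assumes two hypothetical zero-argument
    callables, pause and resume, that disable and enable the watcher. How
    they do so is deliberately left out.

```python
import contextlib


@contextlib.contextmanager
def watcher_paused(pause, resume):
    """Keep the watcher paused for the duration of the with-block.

    `pause` and `resume` are hypothetical callables supplied by the caller.
    """
    pause()
    try:
        yield
    finally:
        # Re-enable even if a test fails, so later tests still see the watcher.
        resume()


def run_qa(resume, pause, other_tests, watcher_tests):
    """Run QA with the watcher enabled, except around watcher-specific tests."""
    # Enable unconditionally first, in case the watcher was left disabled.
    resume()
    for test in other_tests:
        test()
    with watcher_paused(pause, resume):
        for test in watcher_tests:
            test()
```

    The try/finally is the point of the sketch: the watcher is re-enabled
    even when one of the watcher tests raises.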
    Note: even after this patch, if a cron-launched watcher was already
    started and is still running during the test, we'll still hit the locking
    issue. I think this is OK for now; we'll have to see how often it happens.
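    To illustrate the remaining race, here is a minimal sketch of a
    non-blocking exclusive lock on a state file. It assumes a flock-style
    lock on a Unix system. The helper name try_lock is hypothetical, and
    this is not the watcher's actual locking code.

```python
import fcntl
import os


def try_lock(path):
    """Return an fd holding an exclusive lock on path, or None if it is taken."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Someone else (e.g. an already running watcher) holds the lock.
        os.close(fd)
        return None
    return fd
```

    A second caller gets None for as long as the first holder keeps its
    descriptor open, which is the situation a still-running cron-launched
    watcher creates for the test.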
    Signed-off-by: Iustin Pop <iustin@google.com>
    Reviewed-by: Michael Hanselmann <hansmi@google.com>