Skip to content
Snippets Groups Projects
  1. Aug 04, 2011
    • Michael Hanselmann's avatar
      173dbf05
    • Michael Hanselmann's avatar
      ganeti-watcher: Split for node groups · 16e0b9c9
      Michael Hanselmann authored
      
      This patch brings a huge change to ganeti-watcher to make it aware of
      node groups. Each node group is processed in its own subprocess,
      reducing the impact of long-running operations.
      
      The global watcher state file, $datadir/ganeti/watcher.data, is replaced
      with a state file per node group ($datadir/ganeti/watcher.${uuid}.data).
      
      Previously a lock on the state file was used to ensure only one instance
      of watcher was running at the same time. Some operations, e.g.
      “gnt-cluster renew-crypto”, blocked the watcher by acquiring an
      exclusive lock on the state file. Since the watcher processes now use
      different files, this method is no longer usable. Locking multiple files
      isn't atomic. Instead a dedicated lock file is used and every watcher
      process acquires a shared lock on it. If a Ganeti command wants to block
      the watcher it acquires the lock in exclusive mode.
      
      Each per-nodegroup watcher process also acquires an exclusive lock on
      its state file. This prevents multiple watchers from running for the
      same nodegroup.
      
      The code is reorganized heavily to clear up dependencies between
      functions and to get rid of the global “client” variable. The utility
      class “Watcher” is removed in favour of stand-alone utility functions.
      
      Since the parent watcher process won't wait for its children by
      default, a new option (--wait-children) was added. It is used, for
      example, by QA.
      
      Signed-off-by: default avatarMichael Hanselmann <hansmi@google.com>
      Reviewed-by: default avatarIustin Pop <iustin@google.com>
      16e0b9c9
  2. Jul 29, 2011
  3. May 16, 2011
  4. Nov 29, 2010
    • Iustin Pop's avatar
      Simplify QA commands · 2f4b4f78
      Iustin Pop authored
      
      Currently, 95% of the QA commands are executed in the same way: on the
      master, based on a command list and with expectancies for succes:
      
          AssertEqual(StartSSH(master['primary'],
                               utils.ShellQuoteArgs(cmd)).wait(), 0)
      
      The rest 5% are variations on this theme (maybe the command needs to
      fail, or the node is different, etc.). Based on this, we can simplify
      the code significantly if we abstract the common theme into a new
      AssertCommand() function. This saves ~250 lines of code in the QA suite,
      around 8% of the entire QA code size.
      
      Additionally, the output was very cryptic before (the famous "QA error:
      1 != 0" messages), whereas now we show a clear error message (node,
      command, exit code and failure mode).
      
      The patch replaces single quotes with double quotes in all the parts of
      the code that I touch; let me know if that's not OK…
      
      Signed-off-by: default avatarIustin Pop <iustin@google.com>
      Reviewed-by: default avatarMichael Hanselmann <hansmi@google.com>
      2f4b4f78
  5. Nov 01, 2010
  6. Oct 14, 2010
    • Iustin Pop's avatar
      Rework QA interaction with the watcher · 8201b996
      Iustin Pop authored
      
      The interaction with cron-launched watcher is a well-known failure mode of QA:
      
      ---- 2010-10-14 06:54:55.464839 time=0:00:56.764827 Test tools/move-instance
      
      For the following tests it's recommended to turn off the ganeti-watcher cronjob.
      
      ---- 2010-10-14 06:54:55.465255 start Test automatic restart of instance by ganeti-watcher
      …
      Error: Domain 'instance1' does not exist.
      Command: ssh -oEscapeChar=none -oBatchMode=yes -l root -t -oStrictHostKeyChecking=yes
        -oClearAllForwardings=yes -oForwardAgent=yes node2 'ganeti-watcher -d'
      2010-10-13 23:55:04,479:  pid=1659 ganeti-watcher:626
       ERROR Can't acquire lock on state file /var/lib/ganeti/watcher.data: File already locked
      ---- 2010-10-14 06:55:04.513948 time=0:00:09.048693 Test automatic restart of instance by ganeti-watcher
      
      In order to fix this, we disable the watcher during these tests, and
      re-enable it afterwards. To protect against watcher being disabled, we
      enable it unconditionally at the start of the QA (we do want it enabled,
      in order to see the interaction between the watcher and many
      creation/disk replace jobs, etc.).
      
      Note: even after this patch, if a cron-watcher was started and is still
      running during the test, we'll have locking issues. I think for now this
      is OK, we'll have to see how often that happens.
      
      Signed-off-by: default avatarIustin Pop <iustin@google.com>
      Reviewed-by: default avatarMichael Hanselmann <hansmi@google.com>
      8201b996
  7. Aug 15, 2008
  8. Feb 14, 2008
  9. Dec 03, 2007
  10. Nov 22, 2007
  11. Nov 13, 2007
  12. Nov 01, 2007
  13. Oct 10, 2007
  14. Sep 26, 2007
    • Michael Hanselmann's avatar
      Enhance QA. · 5d640672
      Michael Hanselmann authored
      - Test “gnt-backup export” and “gnt-backup import”.
      - Move “ResolveInstanceName” to qa_utils.py.
      - Fix tests for “ganeti-watcher”.
      - Make instance shutdown and startup configurable.
      
      Reviewed-by: schreiberal
      
      5d640672
  15. Sep 13, 2007
Loading