1. 29 May, 2012 1 commit
  2. 06 Jan, 2012 1 commit
  3. 17 Nov, 2011 1 commit
    • Adapt watcher for ENABLE_CONFD · aa224134
      Iustin Pop authored
      If confd is disabled, do not automatically restart it. Furthermore, we
      can't run maintenance actions if it is disabled so log a warning.
      Note that I haven't completely disabled the NodeMaintenance class with
      ENABLE_CONFD = False because I think they are at two different levels
      (e.g. we might have other maintenance actions done even with confd
      disabled).
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
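A minimal sketch of the two behaviours this commit describes: confd is not restarted when it is disabled, and confd-dependent maintenance is skipped with a warning. `ENABLE_CONFD` is the constant named in the commit. The helper names `daemons_to_check` and `run_node_maintenance`, and the daemon list itself, are hypothetical and do not come from the actual watcher code.

```python
import logging

# Stand-in for the build-time constant named in the commit message.
ENABLE_CONFD = False


def daemons_to_check(enable_confd):
    """Return the daemons the watcher should keep alive (hypothetical list)."""
    daemons = ["ganeti-noded"]
    if enable_confd:
        # Only restart confd automatically when it is enabled.
        daemons.append("ganeti-confd")
    return daemons


def run_node_maintenance(enable_confd, log=logging.getLogger("watcher")):
    """Run confd-based maintenance, or warn and skip it when confd is off."""
    if not enable_confd:
        log.warning("Confd is disabled, skipping node maintenance")
        return False
    # ... confd-based maintenance actions would run here ...
    return True
```

The check lives in the maintenance step rather than disabling the whole class, which matches the commit's note that other maintenance actions may still run without confd.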
  4. 12 Oct, 2011 1 commit
    • Rename filter and filter_ to qfilter · 2e5c33db
      Iustin Pop authored
      We currently use 'filter' as the OpCode, QueryRequest and RAPI field
      name for representing a query filter. However, since 'filter' is a
      built-in function, we actually have to use filter_ throughout the code
      in order to not override the built-in function.
      This patch simply goes and does a global sed over the code. Due to the
      fact that the RAPI interface already exposed this field, we add
      compatibility code for now which handles both forms.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
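The compatibility code mentioned above accepts both field names in a RAPI request. A minimal sketch, assuming the request body is a plain dict. The helper name `get_query_filter` is hypothetical, and rejecting requests that set both keys is an assumed policy rather than documented Ganeti behaviour.

```python
def get_query_filter(body):
    """Return the query filter from a RAPI request body (hypothetical helper).

    Accepts the new "qfilter" key as well as the legacy "filter" key.
    Rejecting a body that sets both is an assumption made for this sketch.
    """
    qfilter = body.get("qfilter")
    legacy = body.get("filter")
    if qfilter is not None and legacy is not None:
        raise ValueError("Only one of 'qfilter' and 'filter' may be given")
    return qfilter if qfilter is not None else legacy
```

Using `qfilter` as the local name also avoids shadowing Python's built-in `filter`, which is the motivation for the rename.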
  5. 30 Aug, 2011 2 commits
  6. 22 Aug, 2011 1 commit
  7. 12 Aug, 2011 1 commit
  8. 05 Aug, 2011 2 commits
  9. 04 Aug, 2011 1 commit
    • ganeti-watcher: Split for node groups · 16e0b9c9
      Michael Hanselmann authored
      This patch brings a huge change to ganeti-watcher to make it aware of
      node groups. Each node group is processed in its own subprocess,
      reducing the impact of long-running operations.
      The global watcher state file, $datadir/ganeti/watcher.data, is replaced
      with a state file per node group ($datadir/ganeti/watcher.${uuid}.data).
      Previously a lock on the state file was used to ensure only one instance
      of watcher was running at the same time. Some operations, e.g.
      “gnt-cluster renew-crypto”, blocked the watcher by acquiring an
      exclusive lock on the state file. Since the watcher processes now use
      different files, this method is no longer usable. Locking multiple files
      isn't atomic. Instead a dedicated lock file is used and every watcher
      process acquires a shared lock on it. If a Ganeti command wants to block
      the watcher it acquires the lock in exclusive mode.
      Each per-nodegroup watcher process also acquires an exclusive lock on
      its state file. This prevents multiple watchers from running for the
      same nodegroup.
      The code is reorganized heavily to clear up dependencies between
      functions and to get rid of the global “client” variable. The utility
      class “Watcher” is removed in favour of stand-alone utility functions.
      Since the parent watcher process won't wait for its children by
      default, a new option (--wait-children) was added. It is used, for
      example, by QA.
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
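The locking scheme above can be sketched with POSIX `flock` from Python's `fcntl` module. Each watcher process takes a shared lock on one dedicated lock file, and a command that wants to block the watcher takes it exclusively. Each per-group watcher also takes an exclusive lock on its own state file. The helper names `acquire_lock` and `group_state_file` are hypothetical, and the real watcher may use different locking utilities.

```python
import fcntl
import os


def acquire_lock(path, exclusive, blocking=True):
    """Open `path` (creating it if needed) and flock it.

    exclusive=False takes a shared lock, as each watcher process would on
    the dedicated lock file. exclusive=True takes an exclusive lock, as a
    blocking command (or a per-group watcher on its state file) would.
    The lock is held until the returned file object is closed.
    With blocking=False, BlockingIOError is raised if the lock is busy.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    lock_file = os.fdopen(fd, "r+")
    mode = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    if not blocking:
        mode |= fcntl.LOCK_NB
    try:
        fcntl.flock(lock_file.fileno(), mode)
    except OSError:
        lock_file.close()
        raise
    return lock_file


def group_state_file(datadir, group_uuid):
    """Per-node-group state file path, following the commit's naming."""
    return os.path.join(datadir, "ganeti", "watcher.%s.data" % group_uuid)
```

A single lock file sidesteps the atomicity problem described above. A blocking command needs only one exclusive lock, instead of locks on every per-group state file.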
  10. 29 Jul, 2011 7 commits
  11. 28 Jul, 2011 1 commit
  12. 26 Jul, 2011 1 commit
  13. 19 Apr, 2011 1 commit
    • Fix bug in watcher · a0aa6b49
      Michael Hanselmann authored
      If “utils.RunParts” were to raise an exception, a log message was
      written and the code continued to run. Due to the exception the
      “results” variable would not be defined.
      Also change the code to log a backtrace (getting an exception is rather
      unlikely and having a backtrace is useful) and update one comment.
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: René Nussbaumer <rn@google.com>
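The bug pattern is a `try`/`except` that logs and falls through, so the code after it uses a name that was never bound. A minimal sketch of the corrected shape: log with a traceback, then return early. `run_parts` stands in for `utils.RunParts`; its one-argument signature and list return value are assumptions, and `run_hooks` is a hypothetical name.

```python
import logging


def run_hooks(run_parts, hooks_dir, log=logging.getLogger("watcher")):
    """Run hook scripts, returning None if the runner itself fails.

    `run_parts` stands in for utils.RunParts (assumed here to take a
    directory and return a list of results).
    """
    try:
        results = run_parts(hooks_dir)
    except Exception:
        # log.exception records the message together with the traceback.
        log.exception("RunParts %s failed", hooks_dir)
        # Return early so `results` is never used unbound below.
        return None
    for result in results:
        log.info("Hook result: %s", result)
    return results
```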
  14. 24 Mar, 2011 1 commit
  15. 17 Mar, 2011 1 commit
  16. 23 Feb, 2011 1 commit
  17. 02 Feb, 2011 1 commit
  18. 27 Jan, 2011 1 commit
  19. 18 Jan, 2011 5 commits
  20. 29 Oct, 2010 1 commit
  21. 14 Oct, 2010 1 commit
    • Add a new watcher option --ignore-pause · 46c8a6ab
      Iustin Pop authored
      During cluster maintenance, when the watcher is disabled, it's useful to
      run it just once. This is inconvenient to do currently, as the watcher
      needs to be unpaused, then run, then paused again.
      This patch adds an option “--ignore-pause” that can be used to ignore
      the cluster-level setting. Also the man page is updated as it was
      missing the options available.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
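The option's effect reduces to one condition: run when the cluster is not paused, or when `--ignore-pause` is given. A minimal sketch using `argparse`. The flag name comes from the commit, but `parse_args`, `should_run` and the `paused` argument are hypothetical, and the real watcher may parse options differently.

```python
import argparse


def parse_args(argv):
    """Parse watcher-style options (sketch; only --ignore-pause is shown)."""
    parser = argparse.ArgumentParser(prog="ganeti-watcher")
    parser.add_argument("--ignore-pause", dest="ignore_pause",
                        action="store_true", default=False,
                        help="Ignore the cluster-level watcher pause setting")
    return parser.parse_args(argv)


def should_run(opts, paused):
    """Run unless paused, or always when --ignore-pause was given."""
    return opts.ignore_pause or not paused
```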
  22. 02 Sep, 2010 1 commit
  23. 18 Aug, 2010 1 commit
  24. 26 Jul, 2010 1 commit
    • watcher: smarter handling of instance records · f5116c87
      Iustin Pop authored
      This patch implements a few changes to the instance handling. First, old
      instances which no longer exist on the cluster are removed from the
      state file, to keep things clean.
      Second, the instance restart counters are reset every 8 hours, since
      some error cases might be transient (e.g. networking issues, or machine
      temporarily down), and if the problem takes more than 5 restarts but is
      not permanent, watcher will not restart the instance. The value of 8
      hours is, I think, both conservative (so as not to hammer the cluster too
      often with restarts) and fast enough to clear semi-transient problems.
      And last, if an instance is not restarted due to exhausted retries, a
      warning should be logged, otherwise it's hard to understand why watcher
      doesn't want to restart an ERROR_down instance.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: René Nussbaumer <rn@google.com>
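The three changes can be sketched as a small in-memory state object. It prunes records of vanished instances, resets a counter once 8 hours have passed, and warns when the retry budget of 5 is exhausted. Measuring the 8-hour window from the first restart in the current window is an assumption, as are the class and function names. The real watcher persists this state in its state file.

```python
import logging

MAX_RESTARTS = 5              # retry budget from the commit message
RESET_INTERVAL = 8 * 60 * 60  # counters reset after 8 hours (seconds)


class InstanceState:
    """Hypothetical in-memory restart bookkeeping."""

    def __init__(self):
        self._records = {}  # name -> {"restarts": int, "since": float}

    def prune(self, existing_names):
        """Drop records for instances no longer on the cluster."""
        for name in list(self._records):
            if name not in existing_names:
                del self._records[name]

    def restart_count(self, name, now):
        """Current count; the record is reset once RESET_INTERVAL passed."""
        rec = self._records.get(name)
        if rec is None:
            return 0
        if now - rec["since"] >= RESET_INTERVAL:
            del self._records[name]
            return 0
        return rec["restarts"]

    def record_restart(self, name, now):
        rec = self._records.setdefault(name, {"restarts": 0, "since": now})
        rec["restarts"] += 1


def maybe_restart(state, name, now, log=logging.getLogger("watcher")):
    """Return True if a restart should be attempted, warning otherwise."""
    if state.restart_count(name, now) >= MAX_RESTARTS:
        log.warning("Not restarting %s: retries exhausted", name)
        return False
    state.record_restart(name, now)
    return True
```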
  25. 09 Jul, 2010 1 commit
  26. 01 Jul, 2010 1 commit
    • RAPI client: Switch to pycURL · 2a7c3583
      Michael Hanselmann authored
      Currently the RAPI client uses the urllib2 and httplib modules from
      Python's standard library. They're used with pyOpenSSL in a very fragile
      way, and there are known issues when receiving large responses from a RAPI
      server.
      By switching to PycURL we leverage the power and stability of the
      widely-used curl library (libcurl). This brings us much more flexibility
      than before, and timeouts were easily implemented (something that would
      have involved a lot of work with the built-in modules).
      There's one small drawback: Programs using libcurl have to call
      curl_global_init(3) (available as pycurl.global_init) while exactly one
      thread is running (e.g. before other threads) and are supposed to call
      curl_global_cleanup(3) (available as pycurl.global_cleanup) upon exiting.
      See the manpages for details. A decorator is provided to simplify this.
      Unittests for the new code are provided, increasing the test coverage of
      the RAPI client from 74% to 89%.
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
  27. 30 Jun, 2010 1 commit
  28. 03 Jun, 2010 1 commit