Commit 16e0b9c9 authored by Michael Hanselmann's avatar Michael Hanselmann
Browse files

ganeti-watcher: Split for node groups



This patch brings a huge change to ganeti-watcher to make it aware of
node groups. Each node group is processed in its own subprocess,
reducing the impact of long-running operations.

The global watcher state file, $datadir/ganeti/watcher.data, is replaced
with a state file per node group ($datadir/ganeti/watcher.${uuid}.data).

Previously a lock on the state file was used to ensure only one instance
of watcher was running at the same time. Some operations, e.g.
“gnt-cluster renew-crypto”, blocked the watcher by acquiring an
exclusive lock on the state file. Since the watcher processes now use
different files, this method is no longer usable. Locking multiple files
isn't atomic. Instead a dedicated lock file is used and every watcher
process acquires a shared lock on it. If a Ganeti command wants to block
the watcher it acquires the lock in exclusive mode.

Each per-nodegroup watcher process also acquires an exclusive lock on
its state file. This prevents multiple watchers from running for the
same nodegroup.

The code is reorganized heavily to clear up dependencies between
functions and to get rid of the global “client” variable. The utility
class “Watcher” is removed in favour of stand-alone utility functions.

Since the parent watcher process won't wait for its children by
default, a new option (--wait-children) was added. It is used, for
example, by QA.
Signed-off-by: default avatarMichael Hanselmann <hansmi@google.com>
Reviewed-by: default avatarIustin Pop <iustin@google.com>
parent de9c12f7
......@@ -2261,7 +2261,7 @@ class _RunWhileClusterStoppedHelper:
"""
# Pause watcher by acquiring an exclusive lock on watcher state file
self.feedback_fn("Blocking watcher")
watcher_block = utils.FileLock.Open(constants.WATCHER_STATEFILE)
watcher_block = utils.FileLock.Open(constants.WATCHER_LOCK_FILE)
try:
# TODO: Currently, this just blocks. There's no timeout.
# TODO: Should it be a shared lock?
......
This diff is collapsed.
......@@ -76,7 +76,7 @@ def _RunWatcherDaemon():
"""Runs the ganeti-watcher daemon on the master node.
"""
AssertCommand(["ganeti-watcher", "-d", "--ignore-pause"])
AssertCommand(["ganeti-watcher", "-d", "--ignore-pause", "--wait-children"])
def TestPauseWatcher():
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment