watcher: fix startup sequence locking the master (cc962d58) · Commits · itminedu / snf-ganeti

Commit cc962d58 authored 16 years ago by Iustin Pop

watcher: fix startup sequence locking the master

Currently, the watcher startup sequence does:
  - open a luxi client
  - get the instance list
  - get the node boot ids
  - open and lock the status file, and:
    - archive jobs
    - restart the down instances
    - check disks

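This can lead to problems when a node is (genuinely or not) locked for
longer than (watcher interval * maximum query clients). Each new watcher
blocks in its luxi queries before it ever reaches the status file lock,
so stuck watchers pile up until they occupy every query client slot.
From that point the master is completely unresponsive until the node is
unlocked, at which point the queued watchers exit with an error because
the status file is locked by the first instance.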
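
As a rough illustration of the threshold above, the sketch below multiplies the two quantities. The function name and the sample values (a 300 second watcher interval and 20 query clients) are hypothetical and are not taken from the Ganeti sources.

```python
def seconds_until_master_saturated(watcher_interval_s, max_query_clients):
    """Time after which blocked watchers occupy every query client slot.

    Assumes one new watcher starts per interval and each one holds exactly
    one query client while it waits on the locked node.
    """
    return watcher_interval_s * max_query_clients


# Hypothetical values, for illustration only.
threshold = seconds_until_master_saturated(300, 20)
print(threshold / 60, "minutes")
```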

This patch reworks the startup sequence to first open/lock the status
file, and only then open a luxi client. This should prevent the above
case.

Reviewed-by: ultrotter

parent c614e5fb

Showing with 21 additions and 16 deletions
