    watcher: fix startup sequence locking the master · cc962d58
    Iustin Pop authored
    Currently, the watcher startup sequence does:
      - open a luxi client
      - get the instance list
      - get the node boot ids
      - open and lock the status file, and:
        - archive jobs
        - restart the down instances
        - check disks
    
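    This can lead to problems when a node is locked (genuinely or not)
    for longer than (watcher interval * maximum query clients). Each new
    watcher run opens a luxi client and blocks in its queries before it
    reaches the status file lock, so the blocked watchers accumulate until
    they use up all the query client slots. From then on, the master is
    completely unresponsive until the node is unlocked, at which point the
    queued watchers exit with an error because the status file is locked
    by the first instance.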
    
    This patch reworks the startup sequence to first open/lock the status
    file, and only then open a luxi client. This should prevent the above
    case.
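
    The reworked ordering can be sketched roughly as follows. This is a
    minimal illustration, not the code from the patch: it assumes an
    exclusive, non-blocking fcntl.flock() on the status file, and the
    names open_and_lock, run_watcher, StatusFileLocked and the connect
    callable (a stand-in for opening a luxi client) are all hypothetical.

```python
import fcntl
import os


class StatusFileLocked(Exception):
    """Another watcher run already holds the status file lock."""


def open_and_lock(path):
    """Open path and take an exclusive, non-blocking lock on it."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # LOCK_NB makes flock fail immediately instead of waiting.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        raise StatusFileLocked(path)
    return fd


def run_watcher(status_path, connect):
    """Lock the status file first, and only then contact the master.

    connect is a hypothetical callable standing in for opening a luxi
    client; it is never called if the lock cannot be taken.
    """
    fd = open_and_lock(status_path)  # raises before any master connection
    try:
        client = connect()
        # ... query instances and node boot ids, archive jobs,
        # restart down instances, check disks ...
        return client
    finally:
        os.close(fd)  # closing the descriptor releases the lock
```

    With this ordering, a second watcher started while the first is still
    blocked fails at the lock and exits without ever occupying a query
    client slot on the master.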
    
    Reviewed-by: ultrotter