    node daemon: allow working with broken queue dir · 81198f6e
    Iustin Pop authored
    
    
    If the queue dir cannot be created/initialized, ganeti-noded currently
    exits. This means that a read-only filesystem or a permission error
    breaks all node daemon functionality, including powercycle. This is
    not good, as these are the usual failure cases for nodes.
    
    To work around this, we don't require successful initialization at node
    daemon startup; if we can't init the queue dir/lock, we retry at every
    RPC call requiring a job queue lock, and if we still can't acquire the
    lock, we raise an exception (which is caught in HandleRequest and
    transformed into an RPC failure).
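
    The retry-on-demand approach described above can be sketched roughly
    as follows. This is a minimal illustration, not the code from this
    commit: QueueLockHolder, QueueUnavailableError, requires_queue_lock,
    handle_request, jobqueue_update and powercycle are hypothetical names,
    and opening a lock file stands in for the real queue locking.

```python
import functools
import os


class QueueUnavailableError(Exception):
    """Hypothetical error: the queue dir/lock could not be initialized."""


class QueueLockHolder:
    """Holds a lazily initialized queue lock file (illustrative only)."""

    def __init__(self, queue_dir):
        self.queue_dir = queue_dir
        self.lock = None  # stays None until initialization succeeds

    def try_init(self):
        """Try to create the queue dir and open the lock file.

        Returns True on success; filesystem errors are swallowed so that
        daemon startup can continue without a working queue.
        """
        try:
            os.makedirs(self.queue_dir, exist_ok=True)
            self.lock = open(os.path.join(self.queue_dir, "lock"), "w")
        except OSError:
            self.lock = None
        return self.lock is not None


def requires_queue_lock(fn):
    """Retry queue initialization before each call that needs the lock."""

    @functools.wraps(fn)
    def wrapper(holder, *args, **kwargs):
        if holder.lock is None and not holder.try_init():
            raise QueueUnavailableError(
                "cannot initialize queue dir %s" % holder.queue_dir)
        return fn(holder, *args, **kwargs)

    return wrapper


def handle_request(fn, holder, *args):
    """Turn a queue failure into a (False, message) RPC-style result."""
    try:
        return (True, fn(holder, *args))
    except QueueUnavailableError as err:
        return (False, str(err))


@requires_queue_lock
def jobqueue_update(holder, name, content):
    """Example RPC that needs the queue: write a file into the queue dir."""
    with open(os.path.join(holder.queue_dir, name), "w") as fobj:
        fobj.write(content)
    return True


def powercycle(holder):
    """Example RPC that does not need the queue; the holder is unused."""
    return "powercycle scheduled"
```

    In this sketch, startup would call try_init() once and ignore a False
    result. Queue-related calls then fail individually with an RPC error,
    while calls such as powercycle keep working.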
    
    This allows the node daemon to start in the face of queue issues, and
    the master node to power-cycle it.
    
    Signed-off-by: Iustin Pop <iustin@google.com>
    Reviewed-by: Michael Hanselmann <hansmi@google.com>