Stop acquiring BGL for LUXI queries

Short description: This fixes an issue whereby masterd would become unresponsive on the LUXI socket, leading to client timeouts. While made worse in 2.5, the underlying issue was already present in 2.4. Longer description: Until now all LUXI queries would acquire the BGL (big Ganeti lock) in shared mode. With the exception of OpNodeAdd and OpNodeRemove, this was also the case for all opcodes before version 2.5. In 2.5 we split OpClusterVerify into multiple opcodes, one of which (OpClusterVerifyConfig) now acquires the BGL in exclusive mode. Whether or not doing so is good is a separate discussion: OpNodeAdd and OpNodeRemove, as of this writing, still require an exclusive BGL. OpClusterVerifyConfig is run more often than OpNodeAdd or OpNodeRemove in normal clusters, which is why we only recognized this issue in 2.5. What would happen is that once OpClusterVerifyConfig tried to acquire its exclusive BGL while it was actually held by other opcodes (e.g. OpInstanceReplaceDisks), the locking code would not grant shared acquires for the BGL, even when the exclusive acquire is removed from the queue for a short amount of time after a timeout. This is necessary to prevent lock starvation. In this situation further LUXI queries requiring the BGL in shared mode, e.g. OpClusterQuery, would block and the client eventually time out. Over time they fill the client request workerpool's queue and at that point even requests not requiring the BGL stop working. Once the long-running operation(s) holding the BGL in shared mode finished, OpClusterVerifyConfig gets it in exclusive mode and everything returns to normal. LUXI recovers very soon too. I'd like to thank Bernardo Dal Seno for his contribution to this bugfix. Signed-off-by: Michael Hanselmann <hansmi@google.com> Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>

Stop acquiring BGL for LUXI queries
Short description: This fixes an issue whereby masterd would become unresponsive on the LUXI socket, leading to client timeouts. While made worse in 2.5, the underlying issue was already present in 2.4. Longer description: Until now all LUXI queries would acquire the BGL (big Ganeti lock) in shared mode. With the exception of OpNodeAdd and OpNodeRemove, this was also the case for all opcodes before version 2.5. In 2.5 we split OpClusterVerify into multiple opcodes, one of which (OpClusterVerifyConfig) now acquires the BGL in exclusive mode. Whether or not doing so is good is a separate discussion: OpNodeAdd and OpNodeRemove, as of this writing, still require an exclusive BGL. OpClusterVerifyConfig is run more often than OpNodeAdd or OpNodeRemove in normal clusters, which is why we only recognized this issue in 2.5. What would happen is that once OpClusterVerifyConfig tried to acquire its exclusive BGL while it was actually held by other opcodes (e.g. OpInstanceReplaceDisks), the locking code would not grant shared acquires for the BGL, even when the exclusive acquire is removed from the queue for a short amount of time after a timeout. This is necessary to prevent lock starvation. In this situation further LUXI queries requiring the BGL in shared mode, e.g. OpClusterQuery, would block and the client eventually time out. Over time they fill the client request workerpool's queue and at that point even requests not requiring the BGL stop working. Once the long-running operation(s) holding the BGL in shared mode finished, OpClusterVerifyConfig gets it in exclusive mode and everything returns to normal. LUXI recovers very soon too. I'd like to thank Bernardo Dal Seno for his contribution to this bugfix. Signed-off-by: Michael Hanselmann <hansmi@google.com> Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>
0fa753ba · Michael Hanselmann · 666e013f · 0fa753ba · 0fa753ba
Commit 0fa753ba authored 13 years ago by Michael Hanselmann
--- a/lib/mcpu.py
+++ b/lib/mcpu.py
@@ -176,7 +176,7 @@ class Processor(object):
  """Object which runs OpCodes"""
  DISPATCH_TABLE = _ComputeDispatchTable()

-  def __init__(self, context, ec_id):
+  def __init__(self, context, ec_id, enable_locks=True):
    """Constructor for Processor

    @type context: GanetiContext
@@ -190,6 +190,16 @@ class Processor(object):
    self._cbs = None
    self.rpc = rpc.RpcRunner(context.cfg)
    self.hmclass = HooksMaster
+    self._enable_locks = enable_locks
+
+  def _CheckLocksEnabled(self):
+    """Checks if locking is enabled.
+
+    @raise errors.ProgrammerError: In case locking is not enabled
+
+    """
+    if not self._enable_locks:
+      raise errors.ProgrammerError("Attempted to use disabled locks")

  def _AcquireLocks(self, level, names, shared, timeout, priority):
    """Acquires locks via the Ganeti lock manager.
@@ -206,6 +216,8 @@ class Processor(object):
        amount of time

    """
+    self._CheckLocksEnabled()
+
    if self._cbs:
      self._cbs.CheckCancel()

@@ -291,6 +303,8 @@ class Processor(object):
                                " others")

    elif adding_locks or acquiring_locks:
+      self._CheckLocksEnabled()
+
      lu.DeclareLocks(level)
      share = lu.share_locks[level]

@@ -361,12 +375,17 @@ class Processor(object):

    self._cbs = cbs
    try:
-      # Acquire the Big Ganeti Lock exclusively if this LU requires it,
-      # and in a shared fashion otherwise (to prevent concurrent run with
-      # an exclusive LU.
-      self._AcquireLocks(locking.LEVEL_CLUSTER, locking.BGL,
-                          not lu_class.REQ_BGL, calc_timeout(),
-                          priority)
+      if self._enable_locks:
+        # Acquire the Big Ganeti Lock exclusively if this LU requires it,
+        # and in a shared fashion otherwise (to prevent concurrent run with
+        # an exclusive LU.
+        self._AcquireLocks(locking.LEVEL_CLUSTER, locking.BGL,
+                            not lu_class.REQ_BGL, calc_timeout(),
+                            priority)
+      elif lu_class.REQ_BGL:
+        raise errors.ProgrammerError("Opcode '%s' requires BGL, but locks are"
+                                     " disabled" % op.OP_ID)
+
      try:
        lu = lu_class(self, op, self.context, self.rpc)
        lu.ExpandNames()
@@ -379,7 +398,10 @@ class Processor(object):
          if self._ec_id:
            self.context.cfg.DropECReservations(self._ec_id)
      finally:
-        self.context.glm.release(locking.LEVEL_CLUSTER)
+        # Release BGL if owned
+        if self.context.glm.is_owned(locking.LEVEL_CLUSTER):
+          assert self._enable_locks
+          self.context.glm.release(locking.LEVEL_CLUSTER)
    finally:
      self._cbs = None


--- a/lib/server/masterd.py
+++ b/lib/server/masterd.py
@@ -361,7 +361,7 @@ class ClientOps:

    """
    # Queries don't have a job id
-    proc = mcpu.Processor(self.server.context, None)
+    proc = mcpu.Processor(self.server.context, None, enable_locks=False)

    # TODO: Executing an opcode using locks will acquire them in blocking mode.
    # Consider using a timeout for retries.