diff --git a/Makefile.am b/Makefile.am
index 4bd4bdd340928dd574070654588607c3a39fe22f..5b59bd8d2bd7f653fbc524e23dc1a51e65fad6da 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -371,6 +371,7 @@ docrst = \
 	doc/design-node-add.rst \
 	doc/design-oob.rst \
 	doc/design-ovf-support.rst \
+	doc/design-opportunistic-locking.rst \
 	doc/design-partitioned.rst \
 	doc/design-query-splitting.rst \
 	doc/design-query2.rst \
diff --git a/doc/design-draft.rst b/doc/design-draft.rst
index a76822eb922b8a135dd7dd4e9fbcf226b4d039d7..f659323962ac9b8c6aac92398faf0e6d907f508f 100644
--- a/doc/design-draft.rst
+++ b/doc/design-draft.rst
@@ -18,6 +18,7 @@ Design document drafts
    design-monitoring-agent.rst
    design-remote-commands.rst
    design-linuxha.rst
+   design-opportunistic-locking.rst
 
 .. vim: set textwidth=72 :
 .. Local Variables:
diff --git a/doc/design-opportunistic-locking.rst b/doc/design-opportunistic-locking.rst
new file mode 100644
index 0000000000000000000000000000000000000000..cd3da444de87a7cec2cca372cabd04def29b590e
--- /dev/null
+++ b/doc/design-opportunistic-locking.rst
@@ -0,0 +1,131 @@
+Design for parallelized instance creations and opportunistic locking
+====================================================================
+
+.. contents:: :depth: 3
+
+
+Current state and shortcomings
+------------------------------
+
+As of Ganeti 2.6, instance creations acquire all node locks when an
+:doc:`instance allocator <iallocator>` (henceforth "iallocator") is
+used. In situations where many instances should be created in a short
+timeframe, there is a lot of congestion on node locks. Effectively
+all instance creations are serialized, even on big clusters with
+multiple groups.
+
+The situation gets worse when disk wiping is enabled (see
+:manpage:`gnt-cluster(8)`), as wiping can take, depending on disk
+size and hardware performance, from minutes to hours. Not waiting for
+DRBD disks to synchronize (``wait_for_sync=false``) makes instance
+creations slightly faster, but there is a risk of impacting the I/O
+of other instances.
+
+
+Proposed changes
+----------------
+
+The goal is to speed up instance creations in combination with an
+iallocator, even if the cluster's balance is sacrificed in the
+process. The cluster can later be re-balanced using ``hbal``. The
+main objective is to reduce the number of node locks acquired for
+creation and to release unused locks as quickly as possible (the
+latter is already being done). To do this safely, several changes are
+necessary.
+
+Locking library
+~~~~~~~~~~~~~~~
+
+Instead of forcibly acquiring all node locks for creating an instance
+using an iallocator, only those currently available will be acquired.
+
+To this end, the locking library must be extended to implement
+opportunistic locking. Lock sets must be able to acquire only those
+locks available at the time, ignoring and not waiting for locks held
+by other threads.
+
+Locks (``SharedLock``) already support a timeout of zero, which is
+distinct from a blocking acquisition, for which the timeout is
+``None``.
+
+Lock sets can essentially be acquired in two different modes. One is
+to acquire the whole set, which in turn also blocks the addition of
+new locks from other threads, and the other is to acquire specific
+locks by name. The function to acquire locks in a set accepts a
+timeout which, if not ``None`` for blocking acquisitions, counts for
+the whole duration of acquiring, if necessary, the lock set's
+internal lock as well as the member locks. For opportunistic
+acquisitions the timeout is only meaningful when acquiring the whole
+set, in which case it is used solely for acquiring the set's internal
+lock (used to block lock additions). For acquiring member locks the
+timeout is effectively zero to make them opportunistic.
+
+A new and optional boolean parameter named ``opportunistic`` is added
+to ``LockSet.acquire`` and re-exported through
+``GanetiLockManager.acquire`` for use by ``mcpu``. Internally, lock
+sets do the lock acquisition using a helper function,
+``__acquire_inner``, which will be extended to support opportunistic
+acquisitions. The algorithm is very similar to acquiring the whole
+set, the difference being that acquisitions timing out are ignored
+(the timeout in this case is zero).
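+
+The following minimal sketch illustrates these semantics; it is not
+the actual implementation. Plain ``threading.Lock`` objects stand in
+for ``SharedLock``, and a free-standing ``acquire_inner`` function
+for the lock set's ``__acquire_inner`` helper::
+
+  import threading
+
+  def acquire_inner(locks, names, opportunistic):
+    """Acquires member locks, returning the names actually acquired.
+
+    With ``opportunistic=True`` each member lock is tried with a
+    timeout of (effectively) zero; locks held by other threads are
+    skipped instead of waited for.
+
+    """
+    acquired = []
+    for name in names:
+      if opportunistic:
+        if locks[name].acquire(False):  # non-blocking, zero timeout
+          acquired.append(name)
+      else:
+        locks[name].acquire()  # blocking, as with timeout=None
+        acquired.append(name)
+    return acquired
+
+  # A lock held elsewhere is skipped rather than waited for
+  locks = dict((n, threading.Lock()) for n in ["n1", "n2", "n3"])
+  locks["n2"].acquire()  # simulates another thread holding "n2"
+  print(acquire_inner(locks, sorted(locks), True))  # ['n1', 'n3']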
+
+
+New lock level
+~~~~~~~~~~~~~~
+
+With opportunistic locking used for instance creations (controlled by
+a parameter), multiple such requests can start at (essentially) the
+same time and compete for node locks. Some logical units, such as
+``LUClusterVerifyGroup``, need to acquire all node locks, in which
+case all concurrent instance allocations would fail to get their
+locks. The same applies when multiple instance creations are started
+at roughly the same time.
+
+To avoid situations where an opcode holding all or many node locks
+causes allocations to fail, a new lock level must be added to control
+allocations. The logical units for instance failover and migration
+can only safely determine whether they need all node locks after the
+instance lock has been acquired. Therefore the new lock level, named
+"node-alloc" (short for "node-allocation"), will be inserted after
+instances (``LEVEL_INSTANCE``) and before node groups
+(``LEVEL_NODEGROUP``). Similar to the "big cluster lock" ("BGL"),
+there is only a single lock at this level, named the "node allocation
+lock" ("NAL").
+
+As a rule of thumb, the node allocation lock must be acquired in the
+same mode as node and/or node resource locks. If all or a large
+number of node locks are acquired, the node allocation lock should be
+acquired as well. Special attention should be given to logical units
+started for all node groups, such as ``LUGroupVerifyDisks``, as they
+also block many nodes over a short amount of time.
+
+
+iallocator
+~~~~~~~~~~
+
+The :doc:`iallocator interface <iallocator>` does not need any
+modification. When an instance is created, the information for all
+nodes is passed to the iallocator plugin. Nodes for which the lock
+could not be acquired, and which therefore should not be used for the
+instance in question, are shown as offline.
+
+
+Opcodes
+~~~~~~~
+
+The opcodes ``OpInstanceCreate`` and ``OpInstanceMultiAlloc`` will
+gain a new parameter to enable opportunistic locking. By default this
+mode is disabled so as not to break backward compatibility.
+
+A new error type is added to describe a temporary lack of resources,
+named ``ECODE_TEMP_NORES``. With opportunistic locks the opcodes
+mentioned before have only a partial view of the cluster and can no
+longer decide whether an instance could not be allocated due to the
+locks they were given or because the whole cluster is lacking
+resources. Therefore, upon encountering this error code, the job
+submitter must make that decision, either by re-submitting the job or
+by redirecting it to another cluster.
+
+.. vim: set textwidth=72 :
+.. Local Variables:
+.. mode: rst
+.. fill-column: 72
+.. End:
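+
+The following minimal sketch shows the resulting submitter-side
+pattern. The ``submit`` and ``wait`` callables, the parameter name
+``opportunistic_locking``, the error code's string value and the
+back-off policy are illustrative assumptions, not part of this
+design::
+
+  import time
+
+  ECODE_TEMP_NORES = "temp_insufficient_resources"  # assumed value
+
+  def create_with_retries(submit, wait, spec, max_retries=3):
+    """Re-submits an instance creation while resources are lacking
+    only temporarily.
+
+    """
+    for attempt in range(max_retries):
+      job_id = submit(spec, opportunistic_locking=True)
+      (success, ecode) = wait(job_id)
+      if success:
+        return job_id
+      if ecode != ECODE_TEMP_NORES:
+        raise RuntimeError("creation failed for a permanent reason")
+      time.sleep(2 ** attempt)  # back off before re-submitting
+    # Still failing: fall back to a blocking, non-opportunistic
+    # attempt, or redirect the request to another cluster
+    return submit(spec, opportunistic_locking=False)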