From ae9b5e0f77ce3ada7a3b6741e95024e6d1d35679 Mon Sep 17 00:00:00 2001
From: Tsachy Shacham <>
Date: Wed, 18 May 2011 19:00:00 +0200
Subject: [PATCH] Design doc for CPU pinning

Signed-off-by: Tsachy Shacham <>
Reviewed-by: Michael Hanselmann <>
Reviewed-by: Iustin Pop <>
---                |   1 +
 doc/design-cpu-pinning.rst | 225 +++++++++++++++++++++++++++++++++++++
 doc/design-draft.rst       |   1 +
 3 files changed, 227 insertions(+)
 create mode 100644 doc/design-cpu-pinning.rst

diff --git a/ b/
index 62e259de1..2c0b25db3 100644
--- a/
+++ b/
@@ -278,6 +278,7 @@ docrst = \
 	doc/design-2.4.rst \
 	doc/design-draft.rst \
 	doc/design-oob.rst \
+	doc/design-cpu-pinning.rst \
 	doc/design-query2.rst \
 	doc/design-x509-ca.rst \
 	doc/design-http-server.rst \
diff --git a/doc/design-cpu-pinning.rst b/doc/design-cpu-pinning.rst
new file mode 100644
index 000000000..f1b3de140
--- /dev/null
+++ b/doc/design-cpu-pinning.rst
@@ -0,0 +1,225 @@
+Ganeti CPU Pinning
+==================
+
+This document defines Ganeti's support for CPU pinning (aka CPU
+affinity).
+CPU pinning enables mapping and unmapping an entire virtual machine, or
+a specific virtual CPU (vCPU), to a physical CPU or a range of CPUs.
+At this stage pinning will be implemented for Xen and KVM.
+
+Command Line
+------------
+
+Suggested command line parameters for controlling CPU pinning are as
+follows::
+  gnt-instance modify -H cpu_mask=<cpu-pinning-info> <instance>
+cpu-pinning-info can be any of the following:
+* One vCPU mapping, which can be the word "all" or a combination
+  of CPU numbers and ranges separated by comma. In this case, all
+  vCPUs will be mapped to the indicated list.
+* A list of vCPU mappings, separated by a colon ':'. In this case
+  each vCPU is mapped to an entry in the list, and the size of the
+  list must match the number of vCPUs defined for the instance. This
+  is enforced when setting CPU pinning or when setting the number of
+  vCPUs using ``-B vcpus=#``.
+  The mapping list is matched to consecutive virtual CPUs, so the first entry
+  would be the CPU pinning information for vCPU 0, the second entry
+  for vCPU 1, etc.
+The default setting for new instances is "all", which maps the entire
+instance to all CPUs, thus effectively turning off CPU pinning.
+Here are some usage examples::
+  # Map vCPU 0 to physical CPU 1 and vCPU 1 to CPU 3 (assuming 2 vCPUs)
+  gnt-instance modify -H cpu_mask=1:3 my-inst
+  # Pin vCPU 0 to CPUs 1 or 2, and vCPU 1 to any CPU
+  gnt-instance modify -H cpu_mask=1-2:all my-inst
+  # Pin vCPU 0 to any CPU, vCPU 1 to CPUs 1, 3 or 4, and vCPU 2 to
+  # CPU 0
+  gnt-instance modify -H cpu_mask=all:1\\,3-4:0 my-inst
+  # Pin entire VM to CPU 0
+  gnt-instance modify -H cpu_mask=0 my-inst
+  # Turn off CPU pinning (default setting)
+  gnt-instance modify -H cpu_mask=all my-inst
+Assuming an instance has 3 vCPUs, the following commands will fail::
+  # not enough mappings
+  gnt-instance modify -H cpu_mask=0:1 my-inst
+  # too many
+  gnt-instance modify -H cpu_mask=2:1:1:0 my-inst
+CPU pinning information is validated by making sure it matches the
+number of vCPUs. This validation happens when changing either the
+cpu_mask or vcpus parameters.
+Changing either parameter in a way that conflicts with the other will
+fail with a proper error message.
+To make such a change, both parameters should be modified at the same
+time. For example:
+``gnt-instance modify -B vcpus=4 -H cpu_mask=1:1:2-3:4\\,6 my-inst``
+Besides validating the CPU configuration, i.e. that the number of vCPUs
+matches the requested CPU pinning, Ganeti will also verify that the
+number of physical CPUs is enough to support the required
+configuration. For example, trying to run a configuration of
+vcpus=2,cpu_mask=0:4 on a node with 4 cores will fail, because CPU
+numbers are 0-based and such a node only has CPUs 0 to 3.
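+
+The two checks described above could be sketched as follows. This is
+only an illustration, not Ganeti's implementation: the function name
+``check_cpu_mask`` and its parameters are hypothetical, and the mask is
+assumed to be passed with the comma not escaped::
+
+  def check_cpu_mask(cpu_mask, vcpus, node_cpus):
+      """Return a list of problems; an empty list means no conflict."""
+      errors = []
+      entries = cpu_mask.split(":")
+      # A single entry applies to all vCPUs; a list needs one per vCPU.
+      if len(entries) > 1 and len(entries) != vcpus:
+          errors.append("%d mappings given for %d vCPUs"
+                        % (len(entries), vcpus))
+      for entry in entries:
+          if entry == "all":
+              continue
+          for part in entry.split(","):
+              # A part is one CPU number or a "low-high" range.
+              highest = int(part.split("-")[-1])
+              if highest >= node_cpus:  # CPU numbers are 0-based
+                  errors.append("CPU %d not present (node has %d CPUs)"
+                                % (highest, node_cpus))
+      return errors
+
+  # The example from the text: vcpus=2, cpu_mask=0:4 on a 4-core node
+  print(check_cpu_mask("0:4", 2, 4))
+
+For ``0:4`` on a 4-core node the returned list contains one problem,
+for the nonexistent CPU 4.
+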
+This validation should repeat every time an instance is started or
+migrated live. See more details under Migration below.
+Cluster verification should also test the compatibility of other nodes
+in the cluster with the required configuration and alert if a minimum
+requirement is not met.
+CPU pinning configuration can be transferred from node to node, unless
+the number of physical CPUs is smaller than what the configuration calls
+for.  It is suggested that unless this is the case, all transfers and
+migrations will succeed.
+In case the number of physical CPUs is smaller than the numbers
+indicated by CPU pinning information, instance failover will fail.
+In case of emergency, to force failover to ignore mismatching CPU
+information, the following switch can be used:
+``gnt-instance failover --ignore-cpu-mismatch my-inst``.
+This command will try to fail over the instance with the current CPU
+mask, but if that fails, it will change the mask to be "all".
+In case of live migration, and in addition to failover considerations,
+it is required to remap CPU pinning after migration. This can be done in
+real time for both Xen and KVM instances, and only depends on the
+number of physical CPUs being sufficient to support the migrated
+instance.
+Pinning information will be kept as a list of integers per vCPU.
+To mark a mapping of any CPU, we will use (-1).
+A single entry, no matter what the number of vCPUs is, will always mean
+that all vCPUs have the same mapping.
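+
+As an illustration of this representation, a mask string could be
+converted as sketched below. The function name ``parse_cpu_mask`` is
+hypothetical, input validation is left out, and the mask is assumed to
+be passed with the comma not escaped::
+
+  def parse_cpu_mask(cpu_mask):
+      """Turn a mask string into one list of integers per entry."""
+      result = []
+      for entry in cpu_mask.split(":"):
+          if entry == "all":
+              result.append([-1])  # -1 marks "any CPU"
+              continue
+          cpus = []
+          for part in entry.split(","):
+              if "-" in part:
+                  low, high = part.split("-")
+                  cpus.extend(range(int(low), int(high) + 1))
+              else:
+                  cpus.append(int(part))
+          result.append(cpus)
+      return result
+
+  print(parse_cpu_mask("all:1,3-4:0"))
+
+A result with a single inner list, such as the one for ``0``, is then
+read as the same mapping for every vCPU.
+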
+
+Configuration file
+------------------
+
+The pinning information is kept in each instance's hypervisor
+params section of the configuration file as:
+``cpu_mask: [ [ a ], [ b, c ], [ d ] ]``
+There are 2 ways to control pinning in Xen, either via the command line
+or through the configuration file.
+The commands to make direct pinning changes are the following::
+  # To pin a vCPU to a specific CPU
+  xm vcpu-pin <domain> <vcpu> <cpu>
+  # To unpin a vCPU
+  xm vcpu-pin <domain> <vcpu> all
+  # To get the current pinning status
+  xm vcpu-list <domain>
+Since Ganeti currently controls Xen through the configuration file, it
+is straightforward to use the same method for CPU pinning.
+There are 2 different parameters that control Xen's CPU pinning and
+vCPU count:
+
+vcpus
+  controls the number of vCPUs
+cpus
+  maps vCPUs to physical CPUs
+When no pinning is required (pinning information is "all"), the
+"cpus" entry is removed from the configuration file.
+For all other cases, the configuration is "translated" to Xen, which
+expects either ``cpus = "a"`` or ``cpus = [ "a", "b", "c", ...]``,
+where each of a, b or c is a physical CPU number, a CPU range, or a
+combination of both. If a list is used, the number of entries must
+match the number of vCPUs, and the entries are mapped in order.
+For example, CPU pinning information of ``1:2,4-7:0-1`` is translated
+to this entry in Xen's configuration: ``cpus = [ "1", "2,4-7", "0-1" ]``.
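+
+A minimal sketch of this translation is shown below. The function name
+``xen_cpus_line`` is hypothetical, and passing an "all" entry inside a
+list through unchanged is an assumption, since the text above does not
+say how such an entry should be written::
+
+  def xen_cpus_line(cpu_mask):
+      """Return the cpus line for a Xen config, or None for "all"."""
+      if cpu_mask == "all":
+          return None  # no pinning: the entry is left out entirely
+      entries = cpu_mask.split(":")
+      if len(entries) == 1:
+          return 'cpus = "%s"' % entries[0]
+      quoted = ", ".join('"%s"' % entry for entry in entries)
+      return "cpus = [ %s ]" % quoted
+
+  print(xen_cpus_line("1:2,4-7:0-1"))
+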
+Controlling pinning in KVM is a little more complicated as there is no
+configuration to control pinning before instances are started.
+The way to change or assign CPU pinning under KVM is to use ``taskset`` or
+its underlying system call ``sched_setaffinity``. Setting the affinity for
+the VM process will change CPU pinning for the entire VM, and setting it
+for specific vCPU threads will control specific vCPUs.
+The sequence of commands to control pinning is this: start the instance
+with the ``-S`` switch, so it halts before starting execution, get the
+process ID or identify thread IDs of each vCPU by sending ``info cpus``
+to the monitor, map vCPUs as required by the cpu-pinning information,
+and issue a ``cont`` command on the KVM monitor to allow the instance
+to start execution.
+For example, a sequence of commands to control CPU affinity under KVM
+may be:
+* Start KVM: ``/usr/bin/kvm … <kvm-command-line-options> … -S``
+* Use socat to connect to the monitor
+* Send ``info cpus`` to the monitor to get thread/vCPU information
+* Call ``sched_setaffinity`` for each thread with the CPU mask
+* Send ``cont`` to KVM's monitor
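+
+The thread lookup step could be sketched as follows. The sample monitor
+output is made up for illustration, and the exact line format of
+``info cpus`` differs between KVM/QEMU versions, so the regular
+expression only assumes a ``CPU #<n>:`` prefix followed by
+``thread_id=<id>`` on the same line. The name ``parse_info_cpus`` is
+hypothetical::
+
+  import re
+
+  CPU_LINE = re.compile(r"CPU #(\d+):.*thread_id=(\d+)")
+
+  def parse_info_cpus(output):
+      """Map each vCPU number to the host thread ID that runs it."""
+      threads = {}
+      for line in output.splitlines():
+          match = CPU_LINE.search(line)
+          if match:
+              threads[int(match.group(1))] = int(match.group(2))
+      return threads
+
+  # Hypothetical monitor reply for an instance with 2 vCPUs
+  sample = ("* CPU #0: pc=0x00000000000fd3a9 thread_id=2345\n"
+            "  CPU #1: pc=0x00000000000fd3a9 thread_id=2346\n")
+  print(parse_info_cpus(sample))
+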
+A CPU mask is a hexadecimal bit mask where each bit represents one
+physical CPU. See man page for :manpage:`sched_setaffinity(2)` for more
+details.
+For example, to run a specific thread-id on CPUs 1 or 3 the mask is
+0xA (binary 1010, i.e. bits 1 and 3 set).
+We will control process and thread affinity using the Python affinity
+package. This package is a Python wrapper around the two affinity
+system calls, and has no other dependencies.
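+
+As a rough illustration of the mask arithmetic and of pinning a single
+thread, the sketch below uses ``os.sched_setaffinity`` from the Python
+standard library (Python 3.3 or later, Linux only) as a stand-in for
+the affinity package. The helper names are hypothetical. Note that the
+standard library call takes a set of CPU numbers, not a bit mask::
+
+  import os
+
+  def cpu_list_to_mask(cpus):
+      """Return the bit mask with one bit set per listed CPU."""
+      mask = 0
+      for cpu in cpus:
+          mask |= 1 << cpu
+      return mask
+
+  def pin_thread(thread_id, cpus):
+      """Restrict one process or thread to the given physical CPUs."""
+      os.sched_setaffinity(thread_id, set(cpus))
+
+  print(hex(cpu_list_to_mask([1, 3])))  # prints 0xa
+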
+
+Alternative Design Options
+--------------------------
+
+1. There's an option to ignore the limitations of the underlying
+   hypervisor and instead of requiring explicit pinning information
+   for *all* vCPUs, assume a mapping of "all" to vCPUs not mentioned.
+   This can lead to inadvertent missing information, but either way,
+   since using cpu-pinning options is probably not going to be
+   frequent, there's no real advantage.
+.. vim: set textwidth=72 :
+.. Local Variables:
+.. mode: rst
+.. fill-column: 72
+.. End:
diff --git a/doc/design-draft.rst b/doc/design-draft.rst
index 2f736ca5b..63759db3f 100644
--- a/doc/design-draft.rst
+++ b/doc/design-draft.rst
@@ -10,6 +10,7 @@ Design document drafts
+   design-cpu-pinning.rst
 .. vim: set textwidth=72 :
 .. Local Variables: