Commit 333bd799 authored by Jose A. Lopes's avatar Jose A. Lopes

Design document for KVM daemon

Design document for KVM daemon which is needed by the instance
shutdown detection for KVM.
Signed-off-by: default avatarJose A. Lopes <jabolopes@google.com>
Reviewed-by: default avatarMichele Tartara <mtartara@google.com>
parent eea5e916
......@@ -531,6 +531,7 @@ docinput = \
doc/design-hugepages-support.rst \
doc/design-impexp2.rst \
doc/design-internal-shutdown.rst \
doc/design-kvmd.rst \
doc/design-linuxha.rst \
doc/design-lu-generated-jobs.rst \
doc/design-monitoring-agent.rst \
......
==========
KVM daemon
==========
.. toctree::
:maxdepth: 2
This design document describes the KVM daemon, which is responsible for
determining whether a given KVM instance was shutdown by an
administrator or a user.
Current state and shortcomings
==============================
This design document describes the KVM daemon which addresses the KVM
side of the user-initiated shutdown problem introduced in
:doc:`design-internal-shutdown`. We are also interested in keeping this
functionality optional. That is, an administrator does not necessarily
have to run the KVM daemon if either he is running Xen or even, if he
is running KVM, he is not interested in instance shutdown detection.
This requirement is important because it means the KVM daemon should
be a modular component in the overall Ganeti design, i.e., it should
be easy to enable and disable it.
Proposed changes
================
The instance shutdown feature for KVM requires listening on events from
the Qemu Machine Protocol (QMP) Unix socket, which is created together
with a KVM instance. A QMP socket typically looks like
``/var/run/ganeti/kvm-hypervisor/ctrl/<instance>.qmp`` and implements
the QMP protocol. This is a bidirectional protocol that allows Ganeti
to send commands, such as, system powerdown, as well as, receive events,
such as, the powerdown and shutdown events.
Listening in on these events allows Ganeti to determine whether a given
KVM instance was shutdown by an administrator, either through
``gnt-instance stop|remove <instance>`` or ``kill -KILL
<instance-pid>``, or by a user, through ``poweroff`` from inside the
instance. Upon an administrator powerdown, the QMP protocol sends two
events, namely, a powerdown event and a shutdown event, whereas upon a
user shutdown only the shutdown event is sent. This is enough to
distinguish between an administrator and a user shutdown. However,
there is one limitation, which is, ``kill -TERM <instance-pid>``. Even
though this is an action performed by the administrator, it will be
considered a user shutdown by the approach described in this document.
Several design strategies were considered. Most of these strategies
consisted of spawning some process listening on the QMP socket when a
KVM instance is created. However, having a listener process per KVM
instance is not scalable. Therefore, a different strategy is proposed,
namely, having a single process, called the KVM daemon, listening on the
QMP sockets of all KVM instances within a node. That also means there
is an instance of the KVM daemon on each node.
In order to implement the KVM daemon, two problems need to be addressed,
namely, how the KVM daemon knows when to open a connection to a given
QMP socket and how the KVM daemon communicates with Ganeti whether a
given instance was shutdown by an administrator or a user.
QMP connections management
--------------------------
As mentioned before, the QMP sockets reside in the KVM control
directory, which is usually located under
``/var/run/ganeti/kvm-hypervisor/ctrl/``. When a KVM instance is
created, a new QMP socket for this instance is also created in this
directory.
In order to simplify the design of the KVM daemon, instead of having
Ganeti communicate to this daemon through a pipe or socket the creation
of a new KVM instance, and thus a new QMP socket, this daemon will
monitor the KVM control directory using ``inotify``. As a result, the
daemon is not only able to deal with KVM instances being created and
removed, but also capable of overcoming other problematic situations
concerning the filesystem, such as, the case when the KVM control
directory does not exist because, for example, Ganeti was not yet
started, or the KVM control directory was removed, for example, as a
result of a Ganeti reinstallation.
Shutdown detection
------------------
As mentioned before, the KVM daemon is responsible for opening a
connection to the QMP socket of a given instance and listening in on the
shutdown and powerdown events, which allow the KVM daemon to determine
whether the instance stopped because of an administrator or user
shutdown. Once the instance is stopped, the KVM daemon needs to
communicate to Ganeti whether the user was responsible for shutting down
the instance.
In order to achieve this, the KVM daemon writes an empty file, called
the shutdown file, in the KVM control directory with a name similar to
the QMP socket file but with the extension ``.qmp`` replaced with
``.shutdown``. The presence of this file indicates that the shutdown
was initiated by a user, whereas the absence of this file indicates that
the shutdown was caused by an administrator. This strategy also handles
crashes and signals, such as, ``SIGKILL``, to be handled correctly,
given that in these cases the KVM daemon never receives the powerdown
and shutdown events and, therefore, never creates the shutdown file.
KVM daemon launch
-----------------
With the above issues addressed, a question remains as to when the KVM
daemon should be started. The KVM daemon is different from other Ganeti
daemons, which start together with the Ganeti service, because the KVM
daemon is optional, given that it is specific to KVM and should not be
run on installations containing only Xen, and, even in a KVM
installation, the user might still choose not to enable it. And finally
because the KVM daemon is not really necessary until the first KVM
instance is started. For these reasons, the KVM daemon is started from
within Ganeti when a KVM instance is started. And the job process
spawned by the node daemon is responsible for starting the KVM daemon.
Given the current design of Ganeti, in which the node daemon spawns a
job process to handle the creation of the instance, when launching the
KVM daemon it is necessary to first check whether an instance of this
daemon is already running and, if this is not the case, then the KVM
daemon can be safely started.
Design alternatives
===================
At first, it might seem natural to include the instance shutdown
detection for KVM in the node daemon. After all, the node daemon is
already responsible for managing instances, for example, starting and
stopping an instance. Nevertheless, the node daemon is more complicated
than it might seem at first.
The node daemon is composed of the main loop, which runs in the main
thread and is responsible for receiving requests and spawning jobs for
handling these requests, and the jobs, which are independent processes
spawned for executing the actual tasks, such as, creating an instance.
Including instance shutdown detection in the node daemon is not viable
because adding it to the main loop would cause KVM specific code to
taint the generality of the node daemon. In order to add it to the job
processes, it would be possible to spawn either a foreground or a
background process. However, these options are also not viable because
they would lead to the situation described before where there would be a
monitoring process per instance, which is not scalable. Moreover, the
foreground process has an additional disadvantage: it would require
modifications the node daemon in order not to expect a terminating job,
which is the current node daemon design.
There is another design issue to have in mind. We could reconsider the
place where to write the data that tell Ganeti whether an instance was
shutdown by an administrator or the user. Instead of using the KVM
shutdown files presented above, in which the presence of the file
indicates a user shutdown and its absence an administrator shutdown, we
could store a value in the KVM runtime state file, which is where the
relevant KVM state information is. The advantage of this approach is
that it would keep the KVM related information in one place, thus making
it easier to manage. However, it would lead to a more complex
implementation and, in the context of the general transition in Ganeti
from Python to Haskell, a simpler implementation is preferred.
Finally, it should be noted that the KVM runtime state file benefits
from automatic migration. That is, when an instance is migrated so is
the KVM state file. However, the instance shutdown detection for KVM
does not require this feature and, in fact, migrating the instance
shutdown state would be incorrect.
Further considerations
======================
There are potential race conditions between Ganeti and the KVM daemon,
however, in practice they seem unlikely. For example, the KVM daemon
needs to add and remove watches to the parent directories of the KVM
control directory until this directory is finally created. It is
possible that Ganeti creates this directory and a KVM instance before
the KVM daemon has a chance to add a watch to the KVM control directory,
thus causing this daemon to miss the ``inotify`` creation event for the
QMP socket.
There are other problems which arise from the limitations of
``inotify``. For example, if the KVM daemon is started after the first
Ganeti instance has been created, then the ``inotify`` will not produce
any event for the creation of the QMP socket. This can happen, for
example, if the KVM daemon needs to be restarted or upgraded. As a
result, it might be necessary to have an additional mechanism that runs
at KVM daemon startup or at regular intervals to ensure that the current
KVM internal state is consistent with the actual contents of the KVM
control directory.
Another race condition occurs when Ganeti shuts down a KVM instance
using force. Ganeti uses ``TERM`` signals to stop KVM instances when
force is specified or ACPI is not enabled. However, as mentioned
before, ``TERM`` signals are interpreted by the KVM daemon as a user
shutdown. As a result, the KVM daemon creates a shutdown file which
then must be removed by Ganeti. The race condition occurs because the
KVM daemon might create the shutdown file after the hypervisor code that
tries to remove this file has already run. In practice, the race
condition seems unlikely because Ganeti stops the KVM instance in a
retry loop, which allows Ganeti to stop the instance and cleanup its
runtime information.
It is possible to determine if a process, in this particular case the
KVM process, was terminated by a ``TERM`` signal, using the `proc
connector and socket filters
<https://web.archive.org/web/20121025062848/http://netsplit.com/2011/02/09/the-proc-connector-and-socket-filters/>`_.
The proc connector is a socket connected between a userspace process and
the kernel through the netlink protocol and can be used to receive
notifications of process events, and the socket filters is a mechanism
for subscribing only to events that are relevant. There are several
`process events <http://lwn.net/Articles/157150/>`_ which can be
subscribed to, however, in this case, we are interested only in the exit
event, which carries information about the exit signal.
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
......@@ -116,6 +116,7 @@ Draft designs
design-device-uuid-name.rst
design-hroller.rst
design-hotplug.rst
design-kvmd.rst
design-linuxha.rst
design-lu-generated-jobs.rst
design-monitoring-agent.rst
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment