Commit 982dc0e0 authored by Klaus Aehlig

Merge branch 'stable-2.9' into master

* stable-2.9
  Version bump for 2.9.0 rc1
  Update NEWS for 2.9.0 rc1
  configure: check for hslogger
  Document hslogger dependency in NEWS
  Update INSTALL: hslogger is mandatory
  Update installation instruction for Debian
  Add documentation for diskstats collector
  Make the inst-status-xen collector more resilient
  Use secondary IP when moving instances

* stable-2.8
  Fix wrong release date in the NEWS file
  Version bump for 2.8.0
  Add daemon split design doc

* stable-2.7
  Version bump for 2.7.2

	doc/design-draft.rst: ignore version bump on stable-2.9
rest: take both additions.
Signed-off-by: Klaus Aehlig <>
Reviewed-by: Guido Trotter <>
parents 5c40076b 8451bfdb
@@ -57,21 +57,22 @@ Debian/Ubuntu, you can use this command line to install all required
packages, except for RBD, DRBD and Xen::
$ apt-get install lvm2 ssh bridge-utils iproute iputils-arping make \
ndisc6 python python-pyopenssl openssl \
ndisc6 python python-openssl openssl \
python-pyparsing python-simplejson python-bitarray \
python-pyinotify python-pycurl python-ipaddr socat fping
If bitarray is missing it can be installed from easy-install::
$ easy_install bitarray
Or on newer distributions (eg. Debian Wheezy) the above becomes::
For older distributions (eg. Debian Squeeze) the package names are
$ apt-get install lvm2 ssh bridge-utils iproute iputils-arping make \
ndisc6 python python-openssl openssl \
ndisc6 python python-pyopenssl openssl \
python-pyparsing python-simplejson python-bitarray \
python-pyinotify python-pycurl python-ipaddr socat fping
If bitarray is missing it can be installed from easy-install::
$ easy_install bitarray
Note that this does not install optional packages::
$ apt-get install python-paramiko python-affinity qemu-img
@@ -145,19 +146,22 @@ deploy Ganeti on production machines). More specifically:
- `deepseq <>`_
- `curl <>`_, tested with
versions 1.3.4 and above
- `hslogger <>`_, version 1.1 and
above (note that Debian Squeeze only has version 1.0.9)
Some of these are also available as package in Debian/Ubuntu::
$ apt-get install ghc libghc-json-dev libghc-network-dev \
libghc-parallel-dev libghc-deepseq-dev \
libghc-utf8-string-dev libghc-curl-dev \
Or in older versions of these distributions (using GHC 6.x)::
$ apt-get install ghc6 libghc6-json-dev libghc6-network-dev \
libghc6-parallel-dev libghc6-deepseq-dev \
Or in newer versions of these distributions (using GHC 7.x)::
$ apt-get install ghc libghc-json-dev libghc-network-dev \
libghc-parallel-dev libghc-deepseq-dev \
libghc-utf8-string-dev libghc-curl-dev
In Fedora, some of them are available via packages as well::
$ yum install ghc ghc-json-devel ghc-network-devel \
@@ -172,7 +176,7 @@ the Haskell platform. You can also install ``cabal`` manually::
Then install the additional libraries (only the ones not available in your
distribution packages) via ``cabal``::
$ cabal install json network parallel utf8-string curl
$ cabal install json network parallel utf8-string curl hslogger
Haskell optional features
@@ -182,8 +186,6 @@ a few more Haskell libraries enabled: the ``ganeti-confd`` and
``ganeti-luxid`` daemon (``--enable-confd``) and the monitoring daemon
(``--enable-mond``). The list of extra dependencies for these is:
- `hslogger <>`_, version 1.1 and
above (note that Debian Squeeze only has version 1.0.9)
- `Crypto <>`_, tested with
version 4.2.4
- `text <>`_
@@ -201,7 +203,7 @@ a few more Haskell libraries enabled: the ``ganeti-confd`` and
These libraries are available in Debian Wheezy (but not in Squeeze), so you
can use either apt::
$ apt-get install libghc-hslogger-dev libghc-crypto-dev libghc-text-dev \
$ apt-get install libghc-crypto-dev libghc-text-dev \
libghc-hinotify-dev libghc-regex-pcre-dev \
libghc-attoparsec-dev libghc-vector-dev \
@@ -209,7 +211,7 @@ can use either apt::
or ``cabal``, after installing a required non-Haskell dependency::
$ apt-get install libpcre3-dev libcurl4-openssl-dev
$ cabal install hslogger Crypto text hinotify==0.3.2 regex-pcre \
$ cabal install Crypto text hinotify==0.3.2 regex-pcre \
attoparsec vector snap-server
to install them.
@@ -510,6 +510,7 @@ docinput = \
doc/design-draft.rst \
doc/design-hotplug.rst \
doc/design-glusterfs-ganeti-support.rst \
doc/design-daemons.rst \
doc/design-htools-2.3.rst \
doc/design-http-server.rst \
doc/design-impexp2.rst \
@@ -14,10 +14,10 @@ Incompatible/important changes
default. Specify --no-wait-for-sync to override this behavior.
Version 2.9.0 beta1
Version 2.9.0 rc1
*(Released Thu, 29 Aug 2013)*
*(Released Tue, 1 Oct 2013)*
Incompatible/important changes
@@ -78,11 +78,33 @@ Python
- ``python-mock`` ( is now a required
for the unit tests (and only used for testing).
Version 2.8.0 rc3
- ``hslogger`` ( is now always
required, even if confd is not enabled.
Since 2.9.0 beta1
- various bug fixes
- update of the documentation, in particular installation instructions
- merging of LD_* constants into DT_* constants
- python style changes to be compatible with newer versions of pylint
Version 2.9.0 beta1
*(Released Thu, 29 Aug 2013)*
This was the first beta release of the 2.9 series. All important changes
are listed in the latest 2.9 entry.
*(Released Tue, 17 Sep 2013)*
Version 2.8.0
*(Released Mon, 30 Sep 2013)*
Incompatible/important changes
@@ -172,8 +194,16 @@ For Python:
- The minimum Python version needed to run Ganeti is now 2.6.
- ``yaml`` library (only for running the QA).
Since 2.8.0 rc2
Since 2.8.0 rc3
- Perform proper cleanup on termination of Haskell daemons
- Fix corner-case in handling of remaining retry time
Version 2.8.0 rc3
*(Released Tue, 17 Sep 2013)*
- To simplify the work of packaging frameworks that want to add the needed users
and groups in a split-user setup themselves, at build time three files in
@@ -239,6 +269,19 @@ This was the first beta release of the 2.8 series. All important changes
are listed in the latest 2.8 entry.
Version 2.7.2
*(Released Thu, 26 Sep 2013)*
- Change the connected groups format in ``gnt-network info`` output; it
was previously displayed as a raw list by mistake
- Check disk template in right dict when copying
- Support multi-instance allocs without iallocator
- Fix some errors in the documentation
- Fix formatting of tuple in an error message
Version 2.7.1
@@ -528,6 +528,7 @@ AC_GHC_PKG_REQUIRE(network)
# extra modules for confd functionality
@@ -536,7 +537,6 @@ if test "$enable_confd" != no; then
AC_GHC_PKG_CHECK([regex-pcre], [HS_REGEX_PCRE=],
[CONFD_PKG="$CONFD_PKG regex-pcre"])
AC_GHC_PKG_CHECK([hslogger], [], [CONFD_PKG="$CONFD_PKG hslogger"])
AC_GHC_PKG_CHECK([Crypto], [], [CONFD_PKG="$CONFD_PKG Crypto"])
AC_GHC_PKG_CHECK([text], [], [CONFD_PKG="$CONFD_PKG text"])
AC_GHC_PKG_CHECK([hinotify], [], [CONFD_PKG="$CONFD_PKG hinotify"])
Ganeti daemons refactoring
.. contents:: :depth: 2
This is a design document detailing the plan for refactoring the internal
structure of Ganeti, and particularly the set of daemons it is divided into.
Current state and shortcomings
Ganeti is composed of a growing number of daemons, each dealing with part of
the tasks the cluster has to face, and communicating with the other daemons
using a variety of protocols.
Specifically, as of Ganeti 2.8, the situation is as follows:
``Master daemon (MasterD)``
It is responsible for managing the entire cluster, and it's written in Python.
It is executed on a single node (the master node). It receives the commands
given by the cluster administrator (through the remote API daemon or the
command line tools) over the LUXI protocol. The master daemon is responsible
for creating and managing the jobs that will execute such commands, and for
managing the locks that protect the cluster from race conditions.
Each job is managed by a separate Python thread, which interacts with the node
daemons via RPC calls.
The master daemon is also responsible for managing the configuration of the
cluster, changing it when required by some job. It is also responsible for
copying the configuration to the other master candidates after updating it.
``RAPI daemon (RapiD)``
It is written in Python and runs on the master node only. It waits for
requests issued remotely through the remote API protocol. Then, it forwards
them, using the LUXI protocol, to the master daemon (if they are commands) or
to the query daemon if they are queries about the configuration (including
live status) of the cluster.
``Node daemon (NodeD)``
It is written in Python. It runs on all the nodes. It is responsible for
receiving the master requests over RPC and executing them, using the appropriate
backend (hypervisors, DRBD, LVM, etc.). It also receives requests over RPC for
the execution of queries gathering live data on behalf of the query daemon.
``Configuration daemon (ConfD)``
It is written in Haskell. It runs on all the master candidates. Since the
configuration is replicated only on the master candidates, this daemon exists
in order to provide information about the configuration to the nodes needing
it. The requests use ConfD's own HMAC-signed protocol over UDP, which is
meant to be used by querying all the master candidates (or a subset thereof)
in parallel and taking the most up-to-date answer. This provides a robust
service even when the master is temporarily unavailable.
``Query daemon (QueryD)``
It is written in Haskell. It runs on all the master candidates. It replies
to LUXI queries about the current status of the system, including live data it
obtains by querying the node daemons through RPCs.
``Monitoring daemon (MonD)``
It is written in Haskell. It runs on all nodes, including the ones that are
not vm-capable. It is meant to provide information on the status of the
system. Such information is related only to the specific node the daemon is
running on, and it is provided as JSON encoded data over HTTP, to be easily
readable by external tools.
The monitoring daemon communicates with ConfD to get information about the
configuration of the cluster. The choice of communicating with ConfD instead
of MasterD allows it to obtain configuration information even when the cluster
is heavily degraded (e.g. when the master and some, but not all, of the master
candidates are unreachable).
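ConfD's client-side strategy described above (query several master candidates in parallel and keep the most up-to-date answer) can be sketched as a pure selection step. This is a minimal sketch, not Ganeti's confd client: it assumes each reply carries the serial number of the configuration the candidate answered from, and the `Reply` type and `freshest` function are hypothetical names.

```haskell
import Data.List (maximumBy)
import Data.Ord (comparing)

-- | Hypothetical reply from one master candidate: the serial number of the
-- configuration it answered from, plus the answer itself.
data Reply a = Reply
  { replySerial :: Int
  , replyAnswer :: a
  } deriving (Show, Eq)

-- | Keep the reply built from the newest configuration. Candidates that
-- did not answer (timeout, unreachable) are represented by 'Nothing'.
freshest :: [Maybe (Reply a)] -> Maybe (Reply a)
freshest replies =
  case [r | Just r <- replies] of
    [] -> Nothing
    rs -> Just (maximumBy (comparing replySerial) rs)
```

Because only one answering candidate is needed, the result stays available as long as any master candidate replies, while freshness is best effort.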
The current structure of the Ganeti daemons is inefficient because many
different protocols are involved: each daemon needs to be able to use several
of them and has to deal with different kinds of tasks, which sometimes makes
it unclear which daemon is responsible for performing a specific task.
Also, with the current structure, jobs are managed by the master daemon
using Python threads. This makes terminating a job after it has started a
difficult operation, and it is the main reason why this is not possible yet.
The master daemon currently has too many different tasks, which could be handled
better if split among different daemons.
Proposed changes
In order to improve on the current situation, a new daemon subdivision is
proposed, and presented hereafter.
.. digraph:: "new-daemons-structure"
{rank=same; RConfD LuxiD;}
{rank=same; Jobs rconfigdata;}
node [shape=box]
RapiD [label="RapiD [M]"]
LuxiD [label="LuxiD [M]"]
WConfD [label="WConfD [M]"]
Jobs [label="Jobs [M]"]
RConfD [label="RConfD [MC]"]
MonD [label="MonD [All]"]
NodeD [label="NodeD [All]"]
Clients [label="gnt-*\nclients [M]"]
p1 [shape=none, label=""]
p2 [shape=none, label=""]
p3 [shape=none, label=""]
p4 [shape=none, label=""]
configdata [shape=none, label=""]
rconfigdata [shape=none, label="\n[MC copy]"]
locksdata [shape=none, label=""]
RapiD -> LuxiD [label="LUXI"]
LuxiD -> WConfD [label="WConfD\nproto"]
LuxiD -> Jobs [label="fork/exec"]
Jobs -> WConfD [label="WConfD\nproto"]
Jobs -> NodeD [label="RPC"]
LuxiD -> NodeD [label="RPC"]
rconfigdata -> RConfD
configdata -> rconfigdata [label="sync via\nNodeD RPC"]
WConfD -> NodeD [label="RPC"]
WConfD -> configdata
WConfD -> locksdata
MonD -> RConfD [label="RConfD\nproto"]
Clients -> LuxiD [label="LUXI"]
p1 -> MonD [label="MonD proto"]
p2 -> RapiD [label="RAPI"]
p3 -> RConfD [label="RConfD\nproto"]
p4 -> Clients [label="CLI"]
``LUXI daemon (LuxiD)``
It will be written in Haskell. It will run on the master node and it will be
the only LUXI server, replying to all the LUXI queries. These include both
the queries about the live configuration of the cluster, previously served by
QueryD, and the commands actually changing the status of the cluster by
submitting jobs. Therefore, this daemon will also be the one responsible for
managing the job queue. When a job needs to be executed, LuxiD will spawn
a separate process tasked with the execution of that specific job, thus making
it easier to terminate the job itself, if needed. When a job requires locks,
LuxiD will request them from WConfD.
In order to keep the cluster available in case of failure of the master
node, LuxiD will replicate the job queue to the other master candidates, via
RPCs to the NodeD running there (the choice of RPCs for this task might be
revisited after implementing this design).
``Configuration management daemon (WConfD)``
It will run on the master node and it will be responsible for the management
of the authoritative copy of the cluster configuration (that is, it will be
the daemon actually modifying the ```` file). All configuration change
requests will have to pass through this daemon, and will be performed using
a LUXI-like protocol ("WConfD proto" in the graph; the exact protocol will be
defined in a separate design document detailing the WConfD separation).
Having a single point of configuration management will also allow Ganeti to
get rid of possible race conditions due to concurrent modifications of the
configuration. When the configuration is updated, WConfD will have to push
the received changes to the other master candidates, via RPCs, so that
RConfD daemons and (in case of a failure on the master node) the WConfD
daemon on the new master can access an up-to-date version of it (the choice
of RPCs for this task might be revisited later). This daemon will also be the
one responsible for managing the locks, granting them to the jobs requesting
them, and taking care of freeing them up if the jobs holding them crash or
are terminated before releasing them. In order to do this, each job, after
being spawned by LuxiD, will open a local Unix socket that will be used to
communicate with it, and will be destroyed when the job terminates. LuxiD
will be able to check, after a timeout, whether the job is still running by
connecting to this socket, and to ask WConfD to forcefully remove the locks
if the socket is closed.
Also, WConfD should hold a serialized list of the locks and their owners in a
file (````), so that it can keep track of their status in case it
crashes and needs to be restarted (by asking LuxiD which of them are still
running).
Interaction with this daemon will be performed using Unix sockets.
``Configuration query daemon (RConfD)``
It is written in Haskell, and it corresponds to the old ConfD. It will run on
all the master candidates and it will serve information about the static
configuration of the cluster (the one contained in ````). The
provided information will be highly available (as in: a response will be
available as long as a stable-enough connection between the client and at
least one working master candidate is available) and its freshness will be
best effort (the most recent reply from any of the master candidates will be
returned, but it might still be older than the one available through WConfD).
The information will be served through the ConfD protocol.
``Rapi daemon (RapiD)``
It remains basically unchanged, with the only difference that all of its LUXI
queries are directed towards LuxiD instead of being split between MasterD and
QueryD.
``Monitoring daemon (MonD)``
It remains unaffected by the changes in this design document. It will just get
some of the data it needs from RConfD instead of the old ConfD, but the
interfaces of the two are identical.
``Node daemon (NodeD)``
It remains unaffected by the changes proposed in the design document. The only
difference is that it will receive its RPCs from LuxiD (for job queue
replication), from WConfD (for configuration replication) and from the
processes executing single jobs (for all the operations to be performed by
nodes) instead of receiving them just from MasterD.
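The lock bookkeeping assigned to WConfD above (granting locks, freeing those of crashed or terminated jobs, and persisting the owner list so that it survives a restart) can be sketched as a pure table from lock names to owning job IDs. This is a simplified sketch under stated assumptions: all names are hypothetical, locks are exclusive only (no shared mode or lock levels), the per-job socket check is abstracted into a set of job IDs known to be alive, and `show`/`read` stand in for whatever on-disk format the real lock file would use.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set
import Text.Read (readMaybe)

-- Hypothetical identifiers; real Ganeti lock names and job IDs differ.
type LockName = String
type JobId = Int

-- | Owner of each currently held (exclusive) lock.
type LockTable = Map.Map LockName JobId

-- | Grant all requested locks atomically: either none of them is held by
-- another job and all are granted, or nothing changes and the
-- conflicting locks are returned.
acquire :: JobId -> [LockName] -> LockTable -> Either [LockName] LockTable
acquire job wanted table
  | null conflicts = Right (foldr (\l -> Map.insert l job) table wanted)
  | otherwise = Left conflicts
  where
    conflicts =
      [l | l <- wanted, Just owner <- [Map.lookup l table], owner /= job]

-- | Release every lock held by one job.
releaseAll :: JobId -> LockTable -> LockTable
releaseAll job = Map.filter (/= job)

-- | Drop the locks of every job not in the given set of live jobs (in the
-- design, liveness would come from connecting to the job's socket).
reapDead :: Set.Set JobId -> LockTable -> LockTable
reapDead alive = Map.filter (`Set.member` alive)

-- | Serialize the table for the lock file (format is an assumption).
serializeLocks :: LockTable -> String
serializeLocks = show . Map.toList

-- | Restore the table after a restart, keeping only the locks of jobs
-- that LuxiD reports as still running; Nothing on a corrupt file.
recoverLocks :: Set.Set JobId -> String -> Maybe LockTable
recoverLocks running saved =
  reapDead running . Map.fromList <$> readMaybe saved
```

Keeping the table pure makes the single-writer property easy to preserve: the daemon only needs to apply these functions to one shared value, one request at a time.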
This restructuring will allow us to reorganize and improve the codebase,
introducing cleaner interfaces and giving well defined and more restricted tasks
to each daemon.
Furthermore, having more well-defined interfaces will allow us to have easier
upgrade procedures, and to work towards the possibility of upgrading single
components of a cluster one at a time, without the need for immediately
upgrading the entire cluster in a single step.
While performing this refactoring, we aim to increase the amount of
Haskell code, thus benefiting from the additional type safety provided by its
extensive compile-time checks. In particular, all the job queue management and the
configuration management daemon will be written in Haskell, taking over the role
currently fulfilled by Python code executed as part of MasterD.
The changes described by this design document are quite extensive, therefore they
will not be implemented all at the same time, but through a sequence of steps,
each leaving the codebase in a consistent and usable state.
#. Rename QueryD to LuxiD.
A part of LuxiD, the one replying to configuration
queries including live information about the system, already exists in the
form of QueryD. This is being renamed to LuxiD, and will form the first part
of the new daemon. NB: this is happening starting from Ganeti 2.8. At the
beginning, only the already existing queries will be replied to by LuxiD.
More queries will be implemented in the next versions.
#. Let LuxiD be the interface for the queries and MasterD be their executor.
Currently, MasterD is solely responsible for receiving and executing LUXI
queries, and for managing the jobs they create.
Receiving the queries and managing the job queue will be extracted from
MasterD into LuxiD.
Actually executing jobs will still be done by MasterD, which contains all the
logic for doing that and for properly managing locks and the configuration.
A separate design document will detail how the system will decide which jobs
to send over for execution, and how to rate-limit them.
#. Extract WConfD from MasterD.
The logic for managing the configuration file is factored out to the
dedicated WConfD daemon. All configuration changes, currently executed
directly by MasterD, will be changed to be IPC requests sent to the new
daemon.
#. Extract locking management from MasterD.
The logic for managing and granting locks is extracted to WConfD as well.
Locks will not be taken directly anymore, but asked via IPC to WConfD.
This step can be executed on its own or at the same time as the previous one.
#. Jobs are executed as processes.
The logic for running jobs is rewritten so that each job can be managed by an
independent process. LuxiD will spawn a new (Python) process for every single
job. The RPCs will remain unchanged, and the LU code will stay as is as much
as possible.
MasterD will cease to exist as a daemon on its own at this point, but not
Further considerations
There is a possibility that a job will finish performing its task while LuxiD
and/or WConfD are not available.
In order to deal with this situation, each job will write the results of its
execution to a file. The name of this file will be known to LuxiD before
starting the job, and will be stored together with the job ID, and the
name of the job-unique socket.
The job, upon ending its execution, will signal LuxiD (through the socket), so
that it can read the result of the execution and release the locks as needed.
In case LuxiD is not available at that time, the job will just terminate without
signalling it, writing the results to the file as usual. When a new LuxiD
becomes available, it will have the most up-to-date list of running jobs
(received via replication from the former LuxiD), and go through it, cleaning up
all the terminated jobs.
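The recovery step above, where a new LuxiD walks the replicated list of running jobs and cleans up those that already terminated, can be sketched as a pure partition. Assumptions, all hypothetical: a job counts as terminated once its result file exists, reading that file is abstracted into a lookup function, and the `JobEntry` record and `cleanupJobs` function are illustrative names. A real implementation would also have to handle jobs that crashed without writing a result, for example by probing their socket.

```haskell
import qualified Data.Map.Strict as Map

-- | Hypothetical per-job record that LuxiD stores (and replicates) before
-- starting a job: the result file name and the job-unique socket.
data JobEntry = JobEntry
  { jobResultFile :: FilePath
  , jobSocket     :: FilePath
  } deriving (Show, Eq)

-- | Split the replicated list of running jobs into jobs that have already
-- written their result (returned with that result, ready to be cleaned
-- up) and jobs that are presumably still running. The lookup function
-- abstracts reading the result file: Nothing means "not written yet".
cleanupJobs :: (FilePath -> Maybe String)
            -> Map.Map Int JobEntry
            -> (Map.Map Int String, Map.Map Int JobEntry)
cleanupJobs readResult = Map.mapEither classify
  where
    classify entry =
      maybe (Right entry) Left (readResult (jobResultFile entry))
```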
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
@@ -22,6 +22,7 @@ Design document drafts
.. vim: set textwidth=72 :
.. Local Variables:
@@ -27,6 +27,20 @@ print its output to stdout, in JSON format.
| diskstats [ [ **-f** | **\--file** ] = *input-file* ]
Collects the information about the status of the disks of the system, as listed
by /proc/diskstats, or by an alternate file with the same syntax specified on
the command line.
The options that can be passed to the diskstats collector are as follows:
-f *input-file*, \--file=*input-file*
Where to read the data from. Default if not specified: /proc/diskstats
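For reference, each /proc/diskstats line starts with the device's major and minor numbers and name, followed by the eleven I/O counters described in the Linux kernel's iostats documentation; newer kernels append further columns for discards and flushes. The sketch below parses that layout. It is not the collector's implementation, and the record and function names are hypothetical; extra trailing columns are ignored and malformed lines are skipped.

```haskell
import Data.Maybe (mapMaybe)
import Text.Read (readMaybe)

-- | Device identification plus the eleven classic I/O counters of one
-- /proc/diskstats line (hypothetical record, not Ganeti's own type).
data DiskStats = DiskStats
  { dsMajor          :: Int
  , dsMinor          :: Int
  , dsName           :: String
  , dsReadsDone      :: Integer  -- reads completed successfully
  , dsReadsMerged    :: Integer  -- adjacent reads merged
  , dsSectorsRead    :: Integer
  , dsMsReading      :: Integer  -- time spent reading (ms)
  , dsWritesDone     :: Integer  -- writes completed
  , dsWritesMerged   :: Integer
  , dsSectorsWritten :: Integer
  , dsMsWriting      :: Integer  -- time spent writing (ms)
  , dsIosInProgress  :: Integer  -- I/Os currently in progress
  , dsMsDoingIo      :: Integer  -- time spent doing I/Os (ms)
  , dsWeightedMsIo   :: Integer  -- weighted time spent doing I/Os (ms)
  } deriving (Show, Eq)

-- | Parse one line; Nothing if it is too short or a field is not a number.
-- Columns beyond the first eleven counters are ignored.
parseDiskStatsLine :: String -> Maybe DiskStats
parseDiskStatsLine line =
  case words line of
    (ma : mi : name : rest) | length rest >= 11 -> do
      major <- readMaybe ma
      minor <- readMaybe mi
      counters <- mapM readMaybe (take 11 rest)
      case counters of
        [a, b, c, d, e, f, g, h, i, j, k] ->
          Just (DiskStats major minor name a b c d e f g h i j k)
        _ -> Nothing
    _ -> Nothing

-- | Parse a whole file's contents (e.g. the one given with --file),
-- skipping malformed lines.
parseDiskStats :: String -> [DiskStats]
parseDiskStats = mapMaybe parseDiskStatsLine . lines
```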
@@ -71,8 +85,8 @@ one:
| lv [ [ **-a** | **\--address** ] = *ip-address* ] [ [ **-p** | **\--port** ]
= *port-number* ] [ [ **-f** | **\--file** ] = *input-file* ]
[ [ **-i** | **\--instances** ] = *instances-file* ]
Collects the information about the logical volumes of the current node.
@@ -95,10 +109,10 @@ serialized on files (mainly for testing purposes). Namely:
The name of the file containing a recorded output of the ``lvs`` tool.
-i *instances-file*, \--instances=*instances-file*
The name of the file containing a JSON serialization of instances the
current node is primary and secondary for, listed as::
([Instance], [Instance])
where the first list contains the instances the node is primary for, the
second list those the node is secondary for.
@@ -43,6 +43,7 @@ import qualified Data.Map as Map
import Network.BSD (getHostName)
import qualified Text.JSON as J
import Ganeti.BasicTypes as BT
import Ganeti.Confd.ClientFunctions
import Ganeti.Common
import Ganeti.DataCollectors.CLI
@@ -181,12 +182,19 @@ buildInstStatusReport srvAddr srvPort = do
node <- getHostName
answer <- getInstances node srvAddr srvPort
inst <- exitIfBad "Can't get instance info from ConfD" answer
domains <- getInferredDomInfo
uptimes <- getUptimeInfo
let primaryInst = fst inst
iStatus <- mapM (buildStatus domains uptimes) primaryInst
let globalStatus = computeGlobalStatus iStatus
jsonReport = J.showJSON $ ReportData iStatus globalStatus
d <- getInferredDomInfo
reportData <-
case d of
BT.Ok domains -> do
uptimes <- getUptimeInfo
let primaryInst = fst inst
iStatus <- mapM (buildStatus domains uptimes) primaryInst
let globalStatus = computeGlobalStatus iStatus
return $ ReportData iStatus globalStatus
BT.Bad m ->
return . ReportData [] . DCStatus DCSCBad $
"Unable to receive the list of instances: " ++ m
let jsonReport = J.showJSON reportData
buildReport dcName dcVersion dcFormatVersion dcCategory dcKind jsonReport
-- | Main function.
@@ -40,21 +40,25 @@ import qualified Ganeti.BasicTypes as BT
import qualified Ganeti.Constants as C
import Ganeti.Hypervisor.Xen.Types
import Ganeti.Hypervisor.Xen.XmParser
import Ganeti.Logging
import Ganeti.Utils
-- | Get information about the current Xen domains as a map where the domain
-- name is the key. This only includes the information made available by Xen
-- itself.
getDomainsInfo :: IO (Map.Map String Domain)
getDomainsInfo :: IO (BT.Result (Map.Map String Domain))
getDomainsInfo = do
contents <-
((E.try $ readProcess C.xenCmdXm ["list", "--long"] "")
:: IO (Either IOError String)) >>=
exitIfBad "running command" . either (BT.Bad . show) BT.Ok
case A.parseOnly xmListParser $ pack contents of
Left msg -> exitErr msg
Right dom -> return dom
(E.try $ readProcess C.xenCmdXm ["list", "--long"] "")
:: IO (Either IOError String)
return $
either (BT.Bad . show) (
\c ->
case A.parseOnly xmListParser $ pack c of
Left msg -> BT.Bad msg
Right dom -> BT.Ok dom
) contents
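The new `getDomainsInfo` above no longer exits on failure: both an `IOError` from running `C.xenCmdXm` with `list --long` and a parse error become a `Bad` value that the caller can handle. The same pattern is sketched below in isolation, using `Either String` in place of Ganeti's `BT.Result` (assuming `Bad`/`Ok` correspond to `Left`/`Right`) and a generic action and parser in place of `readProcess` and `xmListParser`; `runAndParse` is a hypothetical name.

```haskell
import Control.Exception (IOException, try)

-- | Run an action that may throw an IO exception (standing in for
-- 'readProcess') and a parser over its output, turning both kinds of
-- failure into a 'Left' instead of exiting the program.
runAndParse :: IO String -> (String -> Either String a) -> IO (Either String a)
runAndParse action parser = do
  out <- try action :: IO (Either IOException String)
  -- IO failure: report the exception text; success: hand over to the parser.
  return $ either (Left . show) parser out
```

Returning the error instead of calling an exit helper lets a caller such as a data collector decide whether a failure is fatal or can be degraded into a partial report.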
-- | Given a domain and a map containing information about multiple domains,
-- infer additional information about that domain (specifically, whether it is
@@ -70,11 +74,19 @@ inferDomInfos domMap dom1 =
-- name is the key. This includes information made available by Xen itself as
-- well as further information that can be inferred by querying Xen multiple
-- times and comparing the results.
getInferredDomInfo :: IO (Map.Map String Domain)
getInferredDomInfo :: IO (BT.Result (Map.Map String Domain))
getInferredDomInfo = do
domMap1 <- getDomainsInfo
domMap2 <- getDomainsInfo
return $ fmap (inferDomInfos domMap2) domMap1
case (domMap1, domMap2) of
(BT.Bad m1, BT.Bad m2) -> return . BT.Bad $ m1 ++ "\n" ++ m2
(BT.Bad m, BT.Ok d) -> do
logWarning $ "Unable to retrieve domains info the first time: " ++ m
return $ BT.Ok d
(BT.Ok d, BT.Bad m) -> do
logWarning $ "Unable to retrieve domains info the second time: " ++ m
return $ BT.Ok d
(BT.Ok d1, BT.Ok d2) -> return . BT.Ok $ fmap (inferDomInfos d2) d1
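The new `getInferredDomInfo` queries Xen twice and only fails if both queries fail; if exactly one fails, it logs a warning and uses the other result. A pure sketch of that decision table follows, again with `Either String` standing in for `BT.Result` and the warnings returned as a list instead of being passed to `logWarning`. The name `combineAttempts` is hypothetical, and its `merge` argument plays the role of `fmap (inferDomInfos d2) d1`.

```haskell
-- | Combine two independent attempts: fail only if both failed (joining
-- the messages), fall back to the one that worked, and merge when both
-- worked. Warnings are returned instead of logged, to keep it pure.
combineAttempts :: (a -> a -> a)
                -> Either String a
                -> Either String a
                -> ([String], Either String a)
combineAttempts _ (Left m1) (Left m2) = ([], Left (m1 ++ "\n" ++ m2))
combineAttempts _ (Left m) (Right d) = (["first attempt failed: " ++ m], Right d)
combineAttempts _ (Right d) (Left m) = (["second attempt failed: " ++ m], Right d)
combineAttempts merge (Right d1) (Right d2) = ([], Right (merge d1 d2))
```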
-- | Get information about the uptime of domains, as a map where the domain ID
-- is the key.