Commit 3648dad3 authored by Klaus Aehlig's avatar Klaus Aehlig
Browse files

Merge branch 'stable-2.11' into stable-2.12

* stable-2.11
  Add hsqueeze to NEWS
  kvm: use a dedicated QMP socket for kvmd
  Add design-node-security.rst to docinput

* stable-2.10
  Test parallel instance ops and plain instances
  Test parallel creation of DRBD instances
  Test parallel job submission performance
  Test parallel instance query operations
  Test parallel instance operations
  Test parallel instance modification
  Test parallel node-count instance creation
  Test parallel instance creation and removal
  Fail in replace-disks if attaching disks fails
  Add a basic test for --restricted-migration
  Describe the --restricted-migration option
  Support restricted migration
  Add an option for restricted migration
  Add an example for node evacuation
  Add a test for parsing version strings
  Set correct Ganeti version on setup commands
  Add a utility to combine shell commands
  Add design doc for performance tests

* stable-2.9
  KVM: set IFF_ONE_QUEUE on created tap interfaces
  Add configure option to pass GHC flags

	trivial, following code moves
Signed-off-by: default avatarKlaus Aehlig <>
Reviewed-by: default avatarPetr Pudlak <>
parents 00828204 866e1f76
......@@ -598,6 +598,7 @@ docinput = \
doc/design-multi-version-tests.rst \
doc/design-network.rst \
doc/design-node-add.rst \
doc/design-node-security.rst \
doc/design-oob.rst \
doc/design-openvswitch.rst \
doc/design-opportunistic-locking.rst \
......@@ -605,6 +606,8 @@ docinput = \
doc/design-os.rst \
doc/design-ovf-support.rst \
doc/design-partitioned.rst \
doc/design-performance-tests.rst \
doc/design-query-splitting.rst \
doc/design-query2.rst \
doc/design-query-splitting.rst \
doc/design-reason-trail.rst \
......@@ -713,6 +716,8 @@ HTEST_FLAGS = $(HFLAGS) -fhpc -itest/hs \
# internal extra flags (used for test/hs/htest mainly)
# combination of HEXTRA and HEXTRA_CONFIGURE
# exclude options for coverage reports
HPCEXCL = --exclude Main \
--exclude Ganeti.Constants \
......@@ -1126,26 +1131,26 @@ endif
Makefile.ghc: $(HS_MAKEFILE_GHC_SRCS) Makefile \
| $(built_base_sources) $(HS_BUILT_SRCS)
$(GHC) -M -dep-makefile $@ -dep-suffix $(HTEST_SUFFIX) $(HFLAGS) -itest/hs \
@echo '[GHC]: $@ <- $^'
@$(GHC) -c $(HFLAGS) \
$(HS_PARALLEL3) $(HS_REGEX_PCRE) $(HEXTRA) $(@:%.o=%.hs)
@echo '[GHC]: $@ <- $^'
@$(GHC) -c $(HTEST_FLAGS) \
%.hi: %.o ;
$(HS_SRC_PROGS): %: %.o | stamp-directories
$(GHC) $(HFLAGS) \
$(HS_PARALLEL3) $(HS_REGEX_PCRE) $(HEXTRA) --make $(@:%=%.hs)
$(HS_PARALLEL3) $(HS_REGEX_PCRE) $(HEXTRA_COMBINED) --make $(@:%=%.hs)
@rm -f $(notdir $@).tix
@touch "$@"
......@@ -1157,7 +1162,7 @@ $(HS_TEST_PROGS): %: %.$(HTEST_SUFFIX)_o \
exit 1; \
$(HS_PARALLEL3) $(HS_REGEX_PCRE) $(HEXTRA) --make $(@:%=%.hs)
$(HS_PARALLEL3) $(HS_REGEX_PCRE) $(HEXTRA_COMBINED) --make $(@:%=%.hs)
@rm -f $(notdir $@).tix
@touch "$@"
......@@ -1379,6 +1384,7 @@ TEST_FILES = \
test/data/htools/hail-reloc-drbd.json \
test/data/htools/ \
test/data/htools/ \
test/data/htools/ \
test/data/htools/ \
test/data/htools/ \
test/data/htools/ \
......@@ -1411,6 +1417,7 @@ TEST_FILES = \
test/hs/shelltests/htools-dynutil.test \
test/hs/shelltests/htools-excl.test \
test/hs/shelltests/htools-hail.test \
test/hs/shelltests/htools-hbal-evac.test \
test/hs/shelltests/htools-hroller.test \
test/hs/shelltests/htools-hspace.test \
test/hs/shelltests/htools-hsqueeze.test \
......@@ -83,6 +83,9 @@ New features
``--gluster-storage-dir`` switch.
- Job scheduling is now handled by luxid, and the maximal number of jobs running
in parallel is a run-time parameter of the cluster.
- A new tool for planning dynamic power management, called ``hsqueeze``, has
been added. It suggests nodes to power up or down and corresponding instance
New dependencies
......@@ -108,6 +108,15 @@ AC_ARG_ENABLE([haskell-tests],
AM_CONDITIONAL([HTEST], [test "$HTEST" = yes])
# --with-haskell-flags=
[Extra flags to pass to GHC]
AC_SUBST(HEXTRA_CONFIGURE, $hextra_configure)
# --with-ssh-initscript=...
......@@ -23,6 +23,7 @@ Design document drafts
.. vim: set textwidth=72 :
.. Local Variables:
Performance tests for QA
.. contents:: :depth: 4
This design document describes performance tests to be added to QA in
order to measure performance changes over time.
Current state and shortcomings
Currently, only functional QA tests are performed. Those tests verify
the correct behaviour of Ganeti in various configurations, but are not
designed to continuously monitor the performance of Ganeti.
The current QA tests don't execute multiple tasks/jobs in parallel.
Therefore, the locking part of Ganeti does not really receive any
testing, neither functional nor performance wise.
On the plus side, Ganeti's QA code does already measure the runtime of
individual tests, which is leveraged in this design.
Proposed changes
The tests to be added in the context of this design document focus on
two areas:
* Job queue performance. How does Ganeti handle a lot of submitted
* Parallel job execution performance. How well does Ganeti
parallelize jobs?
Jobs are submitted to the job queue in sequential order, but the
execution of the jobs runs in parallel. All job submissions must
complete within a reasonable timeout.
In order to make it easier to recognize performance related tests, all
tests added in the context of this design get a description with a
"PERFORMANCE: " prefix.
Job queue performance
Tests targeting the job queue should eliminate external factors (like
network/disk performance or hypervisor delays) as much as possible, so
they are designed to run in a vcluster QA environment.
The following tests are added to the QA:
* Submit the maximum amount of instance create jobs in parallel. As
soon as a creation job succeeds, submit a removal job for this
* Submit as many instance create jobs as there are nodes in the
cluster in parallel (for non-redundant instances). Removal jobs
as above.
* For the maximum amount of instances in the cluster, submit modify
jobs (modify hypervisor and backend parameters) in parallel.
* For the maximum amount of instances in the cluster, submit stop,
start, reboot and reinstall jobs in parallel.
* For the maximum amount of instances in the cluster, submit multiple
list and info jobs in parallel.
* For the maximum amount of instances in the cluster, submit move
jobs in parallel. While the move operations are running, get
instance information using info jobs. Those jobs are required to
return within a reasonable low timeout.
* For the maximum amount of instances in the cluster, submit add-,
remove- and list-tags jobs.
* Submit 200 `gnt-debug delay` jobs with a delay of 0.1 seconds. To
speed up submission, perform multiple job submissions in parallel.
Verify that submitting jobs doesn't significantly slow down during
the process. Verify that querying cluster information over CLI and
RAPI succeeds in a timely fashion with the delay jobs
Parallel job execution performance
Tests targeting the performance of parallel execution of "real" jobs
in close-to-production clusters should actually perform all operations,
such as creating disks and starting instances. This way, real world
locking or waiting issues can be reproduced. Performing all those
operations does requires quite some time though, so only a smaller
number of instances and parallel jobs can be tested realistically.
The following tests are added to the QA:
* Submitting twice as many instance creation request as there are
nodes in the cluster, using DRBD as disk template. As soon as a
creation job succeeds, submit a removal job for this instance.
* Submitting twice as many instance creation request as there are
nodes in the cluster, using Plain as disk template. As soon as a
creation job succeeds, submit a removal job for this instance.
This test can make better use of parallelism because only one
node must be locked for an instance creation.
* Create an instance using DRBD. Fail it over, migrate it, change
its secondary node, reboot it and reinstall it while creating an
additional instance in parallel to each of those operations.
Future work
Based on test results of the tests listed above, additional tests can
be added to cover more real-world use-cases. Also, based on user
requests, specially crafted performance tests modeling those workloads
can be added too.
Additionally, the correlations between job submission time and job
queue size could be detected. Therefore, a snapshot of the job queue
before job submission could be taken to measure job submission time
based on the jobs in the queue.
.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End:
......@@ -308,6 +308,30 @@ def RunNodeSetupCmd(cluster_name, node, basecmd, debug, verbose,
if verbose:
logging.debug("Node setup command: %s", cmd)
version = constants.DIR_VERSION
all_cmds = [["test", "-d", os.path.join(pathutils.PKGLIBDIR, version)]]
if constants.HAS_GNU_LN:
all_cmds.extend([["ln", "-s", "-f", "-T",
os.path.join(pathutils.PKGLIBDIR, version),
os.path.join(pathutils.SYSCONFDIR, "ganeti/lib")],
["ln", "-s", "-f", "-T",
os.path.join(pathutils.SHAREDIR, version),
os.path.join(pathutils.SYSCONFDIR, "ganeti/share")]])
all_cmds.extend([["rm", "-f",
os.path.join(pathutils.SYSCONFDIR, "ganeti/lib")],
["ln", "-s", "-f",
os.path.join(pathutils.PKGLIBDIR, version),
os.path.join(pathutils.SYSCONFDIR, "ganeti/lib")],
["rm", "-f",
os.path.join(pathutils.SYSCONFDIR, "ganeti/share")],
["ln", "-s", "-f",
os.path.join(pathutils.SHAREDIR, version),
os.path.join(pathutils.SYSCONFDIR, "ganeti/share")]])
if port is None:
port = netutils.GetDaemonPort(constants.SSH)
......@@ -315,7 +339,8 @@ def RunNodeSetupCmd(cluster_name, node, basecmd, debug, verbose,
srun = ssh.SshRunner(cluster_name,
ipv6=(family ==
scmd = srun.BuildCmd(node, constants.SSH_LOGIN_USER,
batch=False, ask_key=ask_key, quiet=False,
......@@ -2686,10 +2686,10 @@ class TLReplaceDisks(Tasklet):
for to_node, to_result in result.items():
msg = to_result.fail_msg
if msg:"Can't attach drbd disks on node %s: %s",
self.cfg.GetNodeName(to_node), msg,
hint=("please do a gnt-instance info to see the"
" status of disks"))
raise errors.OpExecError(
"Can't attach drbd disks on node %s: %s (please do a gnt-instance "
"info to see the status of disks)" %
(self.cfg.GetNodeName(to_node), msg))
cstep = itertools.count(5)
......@@ -552,6 +552,13 @@ class KVMHypervisor(hv_base.BaseHypervisor):
return utils.PathJoin(cls._CTRL_DIR, "%s.qmp" % instance_name)
def _InstanceKvmdMonitor(cls, instance_name):
"""Returns the instance kvm daemon socket name
return utils.PathJoin(cls._CTRL_DIR, "%s.kvmd" % instance_name)
def _InstanceShutdownMonitor(cls, instance_name):
"""Returns the instance QMP output filename
......@@ -1593,6 +1600,9 @@ class KVMHypervisor(hv_base.BaseHypervisor):
logging.debug("Enabling QMP")
kvm_cmd.extend(["-qmp", "unix:%s,server,nowait" %
# Add a second monitor for kvmd
kvm_cmd.extend(["-qmp", "unix:%s,server,nowait" %
# Configure the network now for starting instances and bridged interfaces,
# during FinalizeMigration for incoming instances' routed interfaces
......@@ -39,6 +39,7 @@ TUNGETIFF = 0x800454d2
IFF_TAP = 0x0002
IFF_NO_PI = 0x1000
IFF_ONE_QUEUE = 0x2000
IFF_VNET_HDR = 0x4000
......@@ -112,7 +113,7 @@ def OpenTap(vnet_hdr=True, name=""):
except EnvironmentError:
raise errors.HypervisorError("Failed to open /dev/net/tun")
flags = IFF_TAP | IFF_NO_PI
if vnet_hdr and _ProbeTapVnetHdr(tapfd):
flags |= IFF_VNET_HDR
......@@ -242,6 +242,13 @@ def ShellQuoteArgs(args):
return " ".join([ShellQuote(i) for i in args])
def ShellCombineCommands(cmdlist):
"""Out of a list of shell comands construct a single one.
return ["/bin/sh", "-c", " && ".join(ShellQuoteArgs(c) for c in cmdlist)]
class ShellWriter:
"""Helper class to write scripts with indentation.
......@@ -33,6 +33,7 @@ Algorithm options:
**[ \--ignore-dynu ]**
**[ \--mond *yes|no* ]**
**[ \--evac-mode ]**
**[ \--restricted-migration ]**
**[ \--select-instances *inst...* ]**
**[ \--exclude-instances *inst...* ]**
......@@ -307,6 +308,15 @@ The options that can be passed to the program are as follows:
(bulk) replacement for Ganeti's own *gnt-node evacuate*, with the
note that it doesn't guarantee full evacuation.
This parameter disallows any replace-primary moves (frf), as well as
those replace-and-failover moves (rf) where the primary node of the
instance is not drained. If used together with the ``--evac-mode``
option, the only migrations that hbal will do are migrations of
instances off a drained node. This can be useful if during a reinstall
of the base operating system migration is only possible from the old
OS to the new OS.
This parameter marks the given instances (as a comma-separated list)
as the only ones being moved during the rebalance.
......@@ -44,6 +44,7 @@ import qa_monitoring
import qa_network
import qa_node
import qa_os
import qa_performance
import qa_job
import qa_rapi
import qa_tags
......@@ -851,6 +852,43 @@ def RunMonitoringTests():
def RunPerformanceTests():
if qa_config.TestEnabled("jobqueue-performance"):
instances = qa_performance.CreateAllInstances()
RunTest(qa_performance.TestParallelModify, instances)
RunTest(qa_performance.TestParallelInstanceOSOperations, instances)
RunTest(qa_performance.TestParallelInstanceQueries, instances)
if qa_config.TestEnabled("parallel-performance"):
if qa_config.IsTemplateSupported(constants.DT_DRBD8):
inodes = qa_config.AcquireManyNodes(2)
instance = qa_instance.TestInstanceAddWithDrbdDisk(inodes)
RunTest(qa_performance.TestParallelInstanceFailover, instance)
RunTest(qa_performance.TestParallelInstanceMigration, instance)
RunTest(qa_performance.TestParallelInstanceReplaceDisks, instance)
RunTest(qa_performance.TestParallelInstanceReboot, instance)
RunTest(qa_performance.TestParallelInstanceReinstall, instance)
RunTest(qa_performance.TestParallelInstanceRename, instance)
def RunQa():
"""Main QA body.
......@@ -990,6 +1028,8 @@ def RunQa():
RunTestIf("create-cluster", qa_node.TestNodeRemoveAll)
RunTestIf("cluster-destroy", qa_cluster.TestClusterDestroy)
......@@ -66,9 +66,12 @@ def ExecuteJobProducingCommand(cmd):
@param cmd: The command to execute, broken into constituent components.
job_id_output = _GetOutputFromMaster(cmd)
job_id_output = GetOutputFromMaster(cmd)
possible_job_ids = re.findall("JobID: ([0-9]+)", job_id_output)
# Usually, the output contains "JobID: <job_id>", but for instance related
# commands, the output is of the form "<job_id>: <instance_name>"
possible_job_ids = re.findall("JobID: ([0-9]+)", job_id_output) or \
re.findall("([0-9]+): .+", job_id_output)
if len(possible_job_ids) != 1:
raise qa_error.Error("Cannot parse command output to find job id: output "
"is %s" % job_id_output)
......@@ -150,7 +153,7 @@ def _GetNodeUUIDMap(nodes):
cmd = ["gnt-node", "list", "--no-header", "-o", "name,uuid"]
output = _GetOutputFromMaster(cmd)
output = GetOutputFromMaster(cmd)
return dict(map(lambda x: x.split(), output.splitlines()))
......@@ -200,7 +203,7 @@ def _GetBlockingLocks():
# Due to mysterious issues when a SSH multiplexer is being used by two
# threads, we turn it off, and block most of the logging to improve the
# visibility of the other thread's output
locks_output = _GetOutputFromMaster("gnt-debug locks", use_multiplexer=False,
locks_output = GetOutputFromMaster("gnt-debug locks", use_multiplexer=False,
# The first non-empty line is the header, which we do not need
This diff is collapsed.
......@@ -24,6 +24,7 @@
import copy
import datetime
import operator
import os
import random
......@@ -142,7 +143,7 @@ def _AssertRetCode(rcode, fail, cmdstr, nodename):
(cmdstr, nodename, rcode))
def AssertCommand(cmd, fail=False, node=None, log_cmd=True):
def AssertCommand(cmd, fail=False, node=None, log_cmd=True, max_seconds=None):
"""Checks that a remote command succeeds.
@param cmd: either a string (the command to execute) or a list (to
......@@ -154,6 +155,9 @@ def AssertCommand(cmd, fail=False, node=None, log_cmd=True):
dict or a string)
@param log_cmd: if False, the command won't be logged (simply passed to
@type max_seconds: double
@param max_seconds: fail if the command takes more than C{max_seconds}
@return: the return code of the command
@raise qa_error.Error: if the command fails when it shouldn't or vice versa
......@@ -168,9 +172,17 @@ def AssertCommand(cmd, fail=False, node=None, log_cmd=True):
cmdstr = utils.ShellQuoteArgs(cmd)
start =
rcode = StartSSH(nodename, cmdstr, log_cmd=log_cmd).wait()
duration_seconds = TimedeltaToTotalSeconds( - start)
_AssertRetCode(rcode, fail, cmdstr, nodename)
if max_seconds is not None:
if duration_seconds > max_seconds:
raise qa_error.Error(
"Cmd '%s' took %f seconds, maximum of %f was exceeded" %
(cmdstr, duration_seconds, max_seconds))
return rcode
......@@ -890,3 +902,19 @@ def UsesIPv6Connection(host, port):
return any(t[0] == socket.AF_INET6 for t in socket.getaddrinfo(host, port))
def TimedeltaToTotalSeconds(td):
"""Returns the total seconds in a C{datetime.timedelta} object.
This performs the same task as the C{datetime.timedelta.total_seconds()}
method which is present in Python 2.7 onwards.
@type td: datetime.timedelta
@param td: timedelta object to convert
@rtype float
@return: total seconds in the timedelta object
return ((td.microseconds + (td.seconds + td.days * 24.0 * 3600.0) * 10 ** 6) /
10 ** 6)
......@@ -52,6 +52,7 @@ module Ganeti.HTools.CLI
, oMonD
, oMonDDataFile
, oEvacMode
, oRestrictedMigrate
, oExInst
, oExTags
, oExecJobs
......@@ -132,6 +133,7 @@ data Options = Options
, optMonDFile :: Maybe FilePath -- ^ Optional file with data provided
-- ^ by MonDs
, optEvacMode :: Bool -- ^ Enable evacuation mode
, optRestrictedMigrate :: Bool -- ^ Disallow replace-primary moves
, optExInst :: [String] -- ^ Instances to be excluded
, optExTags :: Maybe [String] -- ^ Tags to use for exclusion
, optExecJobs :: Bool -- ^ Execute the commands via Luxi
......@@ -191,6 +193,7 @@ defaultOptions = Options
, optMonD = False
, optMonDFile = Nothing
, optEvacMode = False
, optRestrictedMigrate = False
, optExInst = []
, optExTags = Nothing
, optExecJobs = False
......@@ -368,6 +371,14 @@ oEvacMode =
\ instances away from offline and drained nodes",
oRestrictedMigrate :: OptType
oRestrictedMigrate =
(Option "" ["restricted-migration"]
(NoArg (\opts -> Ok opts { optRestrictedMigrate = True }))
"disallow replace-primary moves (aka frf-moves); in evacuation mode, this\
\ will ensure that the only migrations are off the drained nodes",
oExInst :: OptType
oExInst =
(Option "" ["exclude-instances"]
......@@ -551,27 +551,39 @@ checkSingleStep ini_tbl target cur_tbl move =
possibleMoves :: MirrorType -- ^ The mirroring type of the instance
-> Bool -- ^ Whether the secondary node is a valid new node
-> Bool -- ^ Whether we can change the primary node
-> (Bool, Bool) -- ^ Whether migration is restricted and whether
-- the instance primary is offline
-> Ndx -- ^ Target node candidate
-> [IMove] -- ^ List of valid result moves
possibleMoves MirrorNone _ _ _ = []
possibleMoves MirrorNone _ _ _ _ = []
possibleMoves MirrorExternal _ False _ = []
possibleMoves MirrorExternal _ False _ _ = []
possibleMoves MirrorExternal _ True tdx =
possibleMoves MirrorExternal _ True _ tdx =
[ FailoverToAny tdx ]
possibleMoves MirrorInternal _ False tdx =
possibleMoves MirrorInternal _ False _ tdx =
[ ReplaceSecondary tdx ]
possibleMoves MirrorInternal True True tdx =
possibleMoves MirrorInternal _ _ (True, False) tdx =
[ ReplaceSecondary tdx
possibleMoves MirrorInternal True True (False, _) tdx =
[ ReplaceSecondary tdx
, ReplaceAndFailover tdx
, ReplacePrimary tdx
, FailoverAndReplace tdx
possibleMoves MirrorInternal False True tdx =
possibleMoves MirrorInternal True True (True, True) tdx =
[ ReplaceSecondary tdx
, ReplaceAndFailover tdx
, FailoverAndReplace tdx
possibleMoves MirrorInternal False True _ tdx =
[ ReplaceSecondary tdx
, ReplaceAndFailover tdx
......@@ -580,10 +592,12 @@ possibleMoves MirrorInternal False True tdx =
checkInstanceMove :: [Ndx] -- ^ Allowed target node indices
-> Bool -- ^ Whether disk moves are allowed
-> Bool -- ^ Whether instance moves are allowed
-> Bool -- ^ Whether migration is restricted
-> Table -- ^ Original table
-> Instance.Instance -- ^ Instance to move
-> Table -- ^ Best new table for this instance
checkInstanceMove nodes_idx disk_moves inst_moves ini_tbl target =
checkInstanceMove nodes_idx disk_moves inst_moves rest_mig