Commit b274708c authored by Klaus Aehlig's avatar Klaus Aehlig

Add a design document on improving allocation

This design describes an addition to the cluster
metrics that improves allocation efficiency. So
far, only the balancedness of the amount of reserved
memory was taken into account, but not its total
amount. The reserved memory, however, is not preserved
by instance moves; it can be lowered by spreading the
secondaries more widely over the cluster.
Signed-off-by: default avatarKlaus Aehlig <aehlig@google.com>
Reviewed-by: default avatarPetr Pudlak <pudlak@google.com>
parent 1197e08e
......@@ -653,6 +653,7 @@ docinput = \
doc/design-2.12.rst \
doc/design-2.13.rst \
doc/design-2.14.rst \
doc/design-allocation-efficiency.rst \
doc/design-autorepair.rst \
doc/design-bulk-create.rst \
doc/design-ceph-ganeti-support.rst \
......
=========================================================================
Improving allocation efficiency by considering the total reserved memory
=========================================================================
This document describes a change to the cluster metric to enhance
the allocation efficiency of Ganeti's ``htools``.
.. contents:: :depth: 4
Current state and shortcomings
==============================
Ganeti's ``htools``, which typically make all allocation and balancing
decisions, greedily try to improve the cluster metric. So it is important
that the cluster metric faithfully reflects the objectives of these operations.
Currently the cluster metric is composed of counting violations (instances on
offline nodes, nodes that are not N+1 redundant, etc) and the sum of standard
deviations of relative resource usage of the individual nodes. The latter
component is to ensure that all nodes equally bear the load of the instances.
This is reasonable for resources where the total usage is independent of
its distribution, as it is the case for CPU, disk, and total RAM. It is,
however, not true for reserved memory. By distributing its secondaries
more widespread over the cluster, a node can reduce its reserved memory
without increasing it on other nodes. Not taking this aspect into account
has lead to quite inefficient allocation of instances on the cluster (see
example below).
Proposed changes
================
A new additive component is added to the cluster metric. It is the sum over
all nodes of the fraction of reserved memory. This way, moves and allocations
that reduce the amount of memory reserved to ensure N+1 redundancy are favored.
Note that this component does not have the scaling of standard deviations of
fractions, but, instead counts nodes reserved for N+1 redundancy. In an ideal
allocation, this will not exceed 1. But bad allocations will violate this
property. As waste of reserved memory is a more future-oriented problem than,
e.g., current N+1 violations, we give the new component a relatively small
weight of 0.25, so that counting current violations still dominate.
Another consequence of this metric change is that the value 0 is no longer
obtainable: as soon as we have DRBD instance, we have to reserve memory.
However, in most cases only differences of scores influence decissions made.
In the few cases, were absolute values of the cluster score are specified,
they are interpreted as relative to the theoretical minimum of the reserved
memory score.
Example
=======
Consider the capacity of an empty cluster of 6 nodes, each capable of holding
10 instances; this can be measured, e.g., by
``hspace --simulate=p,6,204801,10241,21 --disk-template=drbd
--standard-alloc=10240,1024,2``. Without the metric change 34 standard
instances are allocated. With the metric change, 48 standard instances
are allocated. This is a 41% increase in utilization.
......@@ -26,6 +26,7 @@ Design document drafts
design-configlock.rst
design-multi-storage-htools.rst
design-dedicated-allocation.rst
design-allocation-efficiency.rst
.. vim: set textwidth=72 :
.. Local Variables:
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment