Commit 1e89d8ec authored by Klaus Aehlig

Add a design for redundancy with plain instances

While we cannot avoid data loss on node crashes if we
have plain instances, we can ensure that the cluster
has enough capacity to reinstall the instances on a
new node. Add a design describing how we ensure this.
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Petr Pudlak <pudlak@google.com>
parent 62ba2be9
@@ -704,6 +704,7 @@ docinput = \
doc/design-os.rst \
doc/design-ovf-support.rst \
doc/design-partitioned.rst \
doc/design-plain-redundancy.rst \
doc/design-performance-tests.rst \
doc/design-query-splitting.rst \
doc/design-query2.rst \
@@ -27,6 +27,7 @@ Design document drafts
design-multi-storage-htools.rst
design-shared-storage-redundancy.rst
design-repaird.rst
design-plain-redundancy.rst
.. vim: set textwidth=72 :
.. Local Variables:
doc/design-plain-redundancy.rst
========================================
Redundancy for the plain disk template
========================================

.. contents:: :depth: 4

This document describes how N+1 redundancy is achieved
for instances using the plain disk template.

Current state and shortcomings
==============================

Ganeti has long considered N+1 redundancy for DRBD, making sure that
enough memory is reserved on the secondary nodes to host the instances,
should one node fail. Recently, ``htools`` have been extended to
also take :doc:`design-shared-storage-redundancy` into account.

For plain instances, there is no direct notion of redundancy: if the
node the instance is running on dies, the instance is lost. However,
if the instance can be reinstalled (e.g., because it is providing a
stateless service), it does make sense to ask whether the remaining
nodes have enough free capacity for the instances to be recreated.
This form of capacity planning is currently not addressed by Ganeti.

Proposed changes
================

The basic considerations follow those of
:doc:`design-shared-storage-redundancy`. Also, the changes to the
tools follow the same pattern.

Definition of N+1 redundancy in the presence of shared and plain storage
--------------------------------------------------------------------------------

A cluster is considered N+1 redundant if, for every node, the following
steps can be carried out. First, all DRBD instances are migrated out.
Then, all shared-storage instances of that node are relocated to another
node in the same node group. Finally, all plain instances of that node
are reinstalled on a different node in the same node group; in the
search for new nodes for the plain instances, the instances are
considered in order of decreasing memory size.

Note that the first two steps are those in the definition of N+1
redundancy for shared storage. In particular, this notion of redundancy
strictly extends the one for shared storage. Again, checking this notion
of redundancy is computationally expensive, and the non-DRBD part is
mainly a capacity property, in the sense that we expect the majority of
instance moves that are fine from a DRBD point of view not to lead from
a redundant to a non-redundant situation.
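
To make the capacity aspect concrete, the following is a minimal,
self-contained sketch in Haskell (the implementation language of
``htools``) of the plain-instance part of the check only. All names and
the simplified, memory-only data model are purely illustrative and do
not correspond to actual ``htools`` types; DRBD migration and
shared-storage relocation are left out::

  -- Illustrative sketch only: memory is the only resource tracked,
  -- and the types below are not the actual htools data structures.
  module PlainRedundancySketch where

  import Data.List (sortBy)
  import Data.Ord (Down (..), comparing)

  data Node = Node
    { nodeName    :: String
    , nodeFreeMem :: Int  -- spare memory, in MiB
    } deriving Show

  newtype PlainInstance = PlainInstance { instMem :: Int }
    deriving Show

  -- Place the instances greedily, largest first, each onto the first
  -- node with enough free memory; succeed only if all can be placed.
  canRecreate :: [Node] -> [PlainInstance] -> Bool
  canRecreate nodes = go nodes . sortBy (comparing (Down . instMem))
    where
      go _  []       = True
      go ns (i : is) =
        case break (\n -> nodeFreeMem n >= instMem i) ns of
          (_, [])             -> False
          (before, n : after) ->
            let n' = n { nodeFreeMem = nodeFreeMem n - instMem i }
            in go (before ++ n' : after) is

  -- A node group passes the plain-instance part of the N+1 check if,
  -- for every node, its plain instances fit onto the remaining nodes.
  plainRedundant :: [(Node, [PlainInstance])] -> Bool
  plainRedundant grp = all check grp
    where
      check (node, insts) =
        let others = [n | (n, _) <- grp, nodeName n /= nodeName node]
        in canRecreate others insts

For example, a two-node group with 2048 MiB and 4096 MiB of spare
memory, a single 1024 MiB plain instance on the first node, and two
512 MiB plain instances on the second node passes this check.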

Modifications to existing tools
-------------------------------

The changes to the existing tools are literally the same as
for :doc:`design-shared-storage-redundancy`, with the above definition
of N+1 redundancy substituted in for that of redundancy for shared
storage. In particular, ``gnt-cluster verify`` will not be changed, and
``hbal`` will use N+1 redundancy as a final filter step to disallow
moves that lead from a redundant to a non-redundant situation.
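
As an illustration of that filter step (again only a sketch with
hypothetical names, not the actual ``hbal`` code), a candidate move is
kept only if the cluster state it leads to is still N+1 redundant; if
the cluster is already non-redundant, no move is rejected on these
grounds::

  -- Hypothetical helper; `c` stands for a cluster state and `m` for
  -- a candidate move, neither being an actual htools type.
  module BalancingFilterSketch where

  -- Keep only candidate moves whose resulting cluster state is still
  -- N+1 redundant; if the current state is already non-redundant,
  -- no move is filtered out.
  keepRedundant :: (c -> Bool)   -- redundancy predicate
                -> c             -- current cluster state
                -> [(m, c)]      -- moves with their resulting states
                -> [(m, c)]
  keepRedundant redundant current candidates
    | redundant current = filter (redundant . snd) candidates
    | otherwise         = candidates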