Commit 6559d7f8 authored by Helga Velroyen

Extension of storage reporting design doc



This patch rewrites and extends the design doc about storage reporting
with respect to disk templates and storage types. In contrast to the
previous version, we now consider disk templates as the user-facing
entity that the user can dis/enable for the cluster. Storage types
on the other hand describe the underlying technology used by the various
disk templates. Storage reporting will use a mapping from disk templates
to storage types to pick the correct method to report the storage for
the respective disk templates.

Note that the design doc in this state still contains some questions and
FIXMEs. Feel free to comment on those. I will complete them directly or
in future patches.
Signed-off-by: Helga Velroyen <helgav@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
parent f7f03738
...@@ -7,66 +7,173 @@ Storage free space reporting ...@@ -7,66 +7,173 @@ Storage free space reporting
Background Background
========== ==========
Currently Space reporting is broken for all storage types except drbd or Currently, there is no consistent management of different variants of storage
lvm (plain). This design looks at the root causes and proposes a way to in Ganeti. One direct consequence is that storage space reporting is currently
fix it. broken for all storage that is not based on lvm technology. This design looks at
the root causes and proposes a way to fix it.
FIXME: rename the design doc to make clear that space reporting is not the only
thing covered here?
Proposed changes Proposed changes
================ ================
The changes below will streamline Ganeti to properly support We propose to streamline handling of different storage types and disk templates.
interaction with different storage types. Currently, there is no consistent implementation for dis/enabling of disk
templates and/or storage types.
Our idea is to introduce a list of enabled disk templates, which can be
used by instances in the cluster. Based on this list, we want to provide
storage reporting mechanisms for the available disk templates. Since some
disk templates share the same underlying storage technology (for example
``drbd`` and ``plain`` are based on ``lvm``), we map disk templates to storage
types and implement storage space reporting for each storage type.
Configuration changes Configuration changes
--------------------- ---------------------
Add a new attribute "enabled_storage_types" (type: list of strings) to the Add a new attribute "enabled_disk_templates" (type: list of strings) to the
cluster config which holds the types of storages, for example, "file", "rados", cluster config which holds disk templates, for example, "drbd", "file",
or "ext". We consider the first one of the list as the default type. or "ext". This attribute represents the list of disk templates that are enabled
cluster-wide for usage by the instances. It will not be possible to create
instances with a disk template that is not enabled, and it will not be
possible to remove a disk template from the list if there are still instances
using it.
The list of enabled disk templates can contain any non-empty subset of
the currently implemented disk templates: ``blockdev``, ``diskless``, ``drbd``,
``ext``, ``file``, ``plain``, ``rbd``, and ``sharedfile``. See
``DISK_TEMPLATES`` in ``constants.py``.
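The constraints on the list described above can be illustrated with a minimal sketch. The function name ``check_enabled_disk_templates`` and the ``in_use`` parameter are hypothetical and are not part of Ganeti; the set of names is copied from the text, whereas the authoritative list is ``DISK_TEMPLATES`` in ``constants.py``.

```python
# Disk template names as listed in this design doc. The authoritative list
# is DISK_TEMPLATES in constants.py; this copy is for illustration only.
DISK_TEMPLATES = frozenset([
    "blockdev", "diskless", "drbd", "ext",
    "file", "plain", "rbd", "sharedfile",
])


def check_enabled_disk_templates(enabled, in_use=()):
    """Return a list of error strings; an empty list means the value is valid.

    enabled: proposed list of enabled disk templates (hypothetical input).
    in_use: disk templates used by existing instances (hypothetical input).
    """
    errors = []
    # The list must be a non-empty subset of the implemented templates.
    if not enabled:
        errors.append("at least one disk template must be enabled")
    unknown = sorted(set(enabled) - DISK_TEMPLATES)
    if unknown:
        errors.append("unknown disk templates: " + ", ".join(unknown))
    # A template cannot be removed while instances still use it.
    missing = sorted(set(in_use) - set(enabled))
    if missing:
        errors.append("disk templates still used by instances: "
                      + ", ".join(missing))
    return errors
```

Under these assumptions, a cluster modification would be rejected whenever the returned list is non-empty.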
Note that the abovementioned list of enabled disk templates is just a "mechanism"
parameter that defines which disk templates the cluster can use. Further
filtering about what's allowed can go in the ipolicy, which is not covered in
this design doc. Note that it is possible to force an instance to use a disk
template that is not allowed by the ipolicy. This is not possible if the
template is not enabled by the cluster.
FIXME: In what way should verification between the enabled disk templates in
the cluster and in the ipolicy take place?
We consider the first disk template in the list to be the default template for
instance creation and storage reporting. This will remove the need to specify
the disk template with ``-t`` on instance creation.
Currently, cluster-wide dis/enabling of disk templates is not implemented
consistently. ``lvm`` based disk templates are enabled by specifying a volume
group name on cluster initialization and can only be disabled by explicitly
using the option ``--no-lvm-storage``. This will be replaced by adding/removing
``drbd`` and ``plain`` from the set of enabled disk templates.
Up till now, file storage and shared file storage could be dis/enabled at
``./configure`` time. This will also be replaced by adding/removing the
respective disk templates from the set of enabled disk templates.
There is currently no possibility to dis/enable the disk templates
``diskless``, ``blockdev``, ``ext``, and ``rbd``. By introducing the set of
enabled disk templates, we will require these disk templates to be explicitly
enabled in order to be used. The idea is that the administrator of the cluster
can tailor the cluster configuration to what is actually needed in the cluster.
There is hope that this will lead to cleaner code, better performance and fewer
bugs.
When upgrading the configuration from a version that did not have the list
of enabled disk templates, we have to decide which disk templates are enabled
based on the current configuration of the cluster. We propose the following
update logic to be implemented in the online update of the config in
the ``Cluster`` class in ``objects.py``:
- If a ``volume_group_name`` exists, then enable ``drbd`` and ``plain``.
(TODO: can we narrow that down further?)
- If ``file`` or ``sharedfile`` was enabled at configure time, add the
respective disk template to the list of enabled disk templates.
- For disk templates ``diskless``, ``blockdev``, ``ext``, and ``rbd``, we
inspect the current cluster configuration regarding whether or not there
are instances that use one of those disk templates. We will add only those
that are currently in use.
The order in which the list of enabled disk templates is built up will be
determined by a preference order based on when in the history of Ganeti the
disk templates were introduced (thus being a heuristic for which are used
more than others).
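The proposed update logic could look roughly like the following sketch. It is not the implementation in ``objects.py``: the function name, its parameters, and in particular the concrete ``PREFERENCE_ORDER`` are assumptions made for illustration, since the design only says that the order follows the history of the disk templates.

```python
# Hypothetical preference order. The design only states that the order is
# based on when each template was introduced; this ordering is a placeholder.
PREFERENCE_ORDER = [
    "plain", "drbd", "file", "sharedfile",
    "blockdev", "diskless", "rbd", "ext",
]


def upgrade_enabled_disk_templates(volume_group_name, file_enabled,
                                   sharedfile_enabled, instance_templates):
    """Derive the enabled disk templates for a config that lacks the list.

    volume_group_name: the cluster's volume group name, or None.
    file_enabled / sharedfile_enabled: what was chosen at configure time.
    instance_templates: disk templates used by existing instances.
    """
    enabled = set()
    # A configured volume group implies the lvm-based templates.
    if volume_group_name:
        enabled.update(["drbd", "plain"])
    # File-based templates follow the configure-time switches.
    if file_enabled:
        enabled.add("file")
    if sharedfile_enabled:
        enabled.add("sharedfile")
    # The remaining templates are enabled only if instances use them.
    for template in ("diskless", "blockdev", "ext", "rbd"):
        if template in instance_templates:
            enabled.add(template)
    # Emit the result in preference order, so the first entry is the default.
    return [t for t in PREFERENCE_ORDER if t in enabled]
```

For example, a cluster with a volume group, file storage enabled at configure time, and one ``rbd`` instance would end up with ``plain`` as its default template under this placeholder ordering.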
The list of enabled disk templates can be specified on cluster initialization
with ``gnt-cluster init`` using the optional parameter
``--enabled-disk-templates``. If it is not set, it will be set to a default
set of enabled disk templates, which includes the following disk templates:
``drbd`` and ``plain``. The list can be shrunk or extended by
``gnt-cluster modify`` using the same parameter.
Storage reporting
-----------------
For file storage, we'll report the storage space on the file storage dir, The storage reporting in ``gnt-node list`` will be the first user of the
which is currently limited to one directory. In the future, if we'll have newly introduced list of enabled disk templates. Currently, storage reporting
works only for lvm-based storage. We want to extend that and report storage
for the enabled disk templates. The default of ``gnt-node list`` will only
report on storage of the default disk template (the first in the list of enabled
disk templates). One can explicitly ask for storage reporting on the other
enabled disk templates with the ``-o`` option.
Some of the currently implemented disk templates share the same base storage
technology. Since the storage reporting is based on the underlying technology
rather than on the user-facing disk templates, we introduce storage types to
represent the underlying technology. There will be a mapping from disk templates
to storage types, which will be used by the storage reporting backend to pick
the right method for estimating the storage for the different disk templates.
The proposed storage types are ``blockdev``, ``diskless``, ``ext``, ``file``,
``lvm-pv``, ``lvm-vg``, ``rados``.
The mapping from disk templates to storage types will be: ``drbd`` and ``plain``
to ``lvm-vg``, ``file`` and ``sharedfile`` to ``file``, and all others to their
obvious counterparts.
Note that there is no disk template mapping to ``lvm-pv``, because this storage
type is currently only used to enable the user to mark it as (un)allocatable.
(See ``man gnt-node``.) It is not possible to create an instance on a storage
unit that is of type ``lvm-pv`` directly, therefore it is not included in the
mapping.
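As an illustration, the mapping could be written as a constant dictionary. The names ``DISK_TEMPLATE_TO_STORAGE_TYPE`` and ``storage_types_for`` are hypothetical, and the entry for ``rbd`` assumes that its "obvious counterpart" is ``rados``.

```python
# Sketch of the mapping from disk templates to storage types. Note that no
# disk template maps to "lvm-pv".
DISK_TEMPLATE_TO_STORAGE_TYPE = {
    "blockdev": "blockdev",
    "diskless": "diskless",
    "drbd": "lvm-vg",
    "ext": "ext",
    "file": "file",
    "plain": "lvm-vg",
    "rbd": "rados",  # assumption: rbd's counterpart is the rados type
    "sharedfile": "file",
}


def storage_types_for(enabled_disk_templates):
    """Return the distinct storage types behind the given disk templates,
    in order of first appearance."""
    result = []
    for template in enabled_disk_templates:
        storage_type = DISK_TEMPLATE_TO_STORAGE_TYPE[template]
        if storage_type not in result:
            result.append(storage_type)
    return result
```

Because several disk templates share one storage type, a reporting backend built on such a mapping needs only one method per storage type.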
The storage reporting for file storage will report space on the file storage
dir, which is currently limited to one directory. In the future, if we'll have
support for more directories, or for per-nodegroup directories this can be support for more directories, or for per-nodegroup directories this can be
changed. changed.
Note that the abovementioned enabled_storage_types are just "mechanisms" For now, we will implement only the storage reporting for non-shared storage,
parameters that define which storage types the cluster can use. Further that is disk templates ``file``, ``lvm``, and ``drbd``. For disk template
filtering about what's allowed can go in the ipolicy, but these changes are ``diskless``, there is obviously nothing to report about. When implementing
not covered in this design doc. storage reporting for file, we can also use it for ``sharedfile``, since it
uses the same file system mechanisms to determine the free space. In the
Since the ipolicy currently has a list of enabled storage types, we'll future, we can optimize storage reporting for shared storage by not querying
use that to decide which storage type is the default, and to self-select all nodes that use a common shared file for the same space information.
it for new instance creations, and reporting.
In the future, we will extend storage reporting for shared storage types like
Enabling/disabling of storage types at ``./configure`` time will be ``rados`` and ``ext``. Note that it will not make sense to query each node for
eventually removed. storage reporting on a storage unit that is used by several nodes.
We will not implement storage reporting for the ``blockdev`` disk template,
because block devices are always adopted after being provided by the system
administrator, thus coming from outside Ganeti. There is no point in storage
reporting for block devices, because Ganeti will never try to allocate storage
inside a block device.
RPC changes RPC changes
----------- -----------
The noded RPC call that reports node storage space will be changed to The noded RPC call that reports node storage space will be changed to
accept a list of <type>,<key> string tuples. For each of them, it will accept a list of <disktemplate>,<key> string tuples. For each of them, it will
report the free amount of storage space found on storage <key> as known report the free amount of storage space found on storage <key> as known
by the requested storage type types. For example types are ``lvm``, by the requested disk template. Depending on the disk template, the key would
``filesystem``, or ``rados``, and the key would be a volume group name, in be a volume group name, in case of lvm-based disk templates, a directory name
the case of lvm, a directory name for the filesystem and a rados pool name for the file and shared file storage, and a rados pool name for rados storage.
for rados storage.
For now, we will implement only the storage reporting for non-shared storage,
that is ``filesystem`` and ``lvm``. For shared storage types like ``rados``
and ``ext`` we will not implement a free space calculation, because it does
not make sense to query each node for the free space of a commonly used
storage.
Masterd will know (through a constant map) which storage type uses which Masterd will know through the mapping of disk templates to storage types which
type for storage calculation (i.e. ``plain`` and ``drbd`` use ``lvm``, storage type uses which mechanism for storage calculation and invoke only the
``file`` uses ``filesystem``, etc) and query the one needed (or all of the needed ones.
needed ones).
Note that for file and sharedfile the node knows which directories are Note that for file and sharedfile the node knows which directories are allowed
allowed and won't allow any other directory to be queried for security and won't allow any other directory to be queried for security reasons. The
reasons. The actual path still needs to be passed to distinguish the actual path still needs to be passed to distinguish the two, as the type will
two, as the type will be the same for both. be the same for both.
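The argument of the changed RPC call can be sketched as follows. This only shows how the list of <disktemplate>,<key> tuples might be assembled; the helper name ``storage_report_request``, the ``keys`` dictionary, and the example path are assumptions, not the actual RPC code.

```python
def storage_report_request(disk_templates, keys):
    """Build the list of (disktemplate, key) tuples for the node RPC.

    disk_templates: templates to report on, default template first.
    keys: hypothetical lookup from disk template to its storage key, for
        example a volume group name or a file storage directory.
    Templates without a key (such as diskless) are skipped.
    """
    return [(template, keys[template])
            for template in disk_templates
            if template in keys]
```

A call such as ``storage_report_request(["drbd", "file"], {"drbd": "myvg", "file": "/srv/ganeti/file-storage"})`` would yield one tuple per template, and the node would then still verify that the passed directory is an allowed one.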
These calculations will be implemented in the node storage system These calculations will be implemented in the node storage system
(currently lib/storage.py) but querying will still happen through the (currently lib/storage.py) but querying will still happen through the
...@@ -75,9 +182,9 @@ These calculations will be implemented in the node storage system ...@@ -75,9 +182,9 @@ These calculations will be implemented in the node storage system
Ganeti reporting Ganeti reporting
---------------- ----------------
``gnt-node list`` can be queried for the different storage types, if they ``gnt-node list`` can be queried for the different disk templates, if they
are enabled. By default, it will just report information about the default are enabled. By default, it will just report information about the default
storage type. Examples:: disk template. Examples::
> gnt-node list > gnt-node list
Node DTotal DFree MTotal MNode MFree Pinst Sinst Node DTotal DFree MTotal MNode MFree Pinst Sinst
...@@ -85,10 +192,10 @@ storage type. Examples:: ...@@ -85,10 +192,10 @@ storage type. Examples::
mynode2 3.6T 3.6T 64.0G 1023M 62.0G 2 1 mynode2 3.6T 3.6T 64.0G 1023M 62.0G 2 1
mynode3 3.6T 3.6T 64.0G 1023M 62.3G 0 2 mynode3 3.6T 3.6T 64.0G 1023M 62.3G 0 2
> gnt-node list -o dtotal/lvm,dfree/rados > gnt-node list -o dtotal/drbd,dfree/file
Node DTotal (Lvm, myvg) DFree (Rados, myrados) Node DTotal (drbd, myvg) DFree (file, mydir)
mynode1 3.6T - mynode1 3.6T -
mynode2 3.6T - mynode2 3.6T -
Note that for drbd, we only report the space of the vg and only if it was not Note that for drbd, we only report the space of the vg and only if it was not
renamed to something different than the default volume group name. With this renamed to something different than the default volume group name. With this
...@@ -97,15 +204,18 @@ restrict the design here to make the transition to storage pools easier (as it ...@@ -97,15 +204,18 @@ restrict the design here to make the transition to storage pools easier (as it
is an interim state only). It is the administrator's responsibility to ensure is an interim state only). It is the administrator's responsibility to ensure
that there is enough space for the meta volume group. that there is enough space for the meta volume group.
When storage pools are implemented, we switch from referencing the storage When storage pools are implemented, we switch from referencing the disk template
type to referencing the storage pool name. For that, of course, the pool to referencing the storage pool name. For that, of course, the pool names need
names need to be unique over all storage types. For drbd, we will use the to be unique over all storage types. For drbd, we will use the default 'drbd'
default 'lvm' storage pool and possibly a second lvm-based storage pool for storage pool and possibly a second lvm-based storage pool for the metavg. It
the metavg. It will be possible to rename storage pools (thus also the default will be possible to rename storage pools (thus also the default lvm storage
lvm storage pool). There will be new functionality to ask about what storage pool). There will be new functionality to ask about what storage pools are
pools are available and of what type. available and of what type. Storage pools will have a storage pool type which is
one of the disk templates. There can be more than one storage pool based on the
``gnt-cluster info`` will report which storage types are enabled, i.e. same disk template, therefore we will then start referencing the storage pool
name instead of the disk template.
``gnt-cluster info`` will report which disk templates are enabled, i.e.
which ones are supported according to the cluster configuration. Example which ones are supported according to the cluster configuration. Example
output:: output::
...@@ -113,25 +223,26 @@ output:: ...@@ -113,25 +223,26 @@ output::
[...] [...]
Cluster parameters: Cluster parameters:
- [...] - [...]
- enabled storage types: plain (default), drbd, lvm, rados - enabled disk templates: plain, drbd, sharedfile, rados
- [...] - [...]
``gnt-node list-storage`` will not be affected by any changes, since this design ``gnt-node list-storage`` will not be affected by any changes, since this design
describes only free storage reporting for non-shared storage types. is restricted only to free storage reporting for non-shared storage types.
Allocator changes Allocator changes
----------------- -----------------
The iallocator protocol doesn't need to change: since we know which The iallocator protocol doesn't need to change: since we know which
storage type an instance has, we'll pass only the "free" value for that disk template an instance has, we'll pass only the "free" value for that
storage type to the iallocator, when asking for an allocation to be disk template to the iallocator, when asking for an allocation to be
made. Note that for DRBD nowadays we ignore the case when vg and metavg made. Note that for DRBD nowadays we ignore the case when vg and metavg
are different, and we only consider the main VG. Fixing this is outside are different, and we only consider the main volume group. Fixing this is
the scope of this design. outside the scope of this design.
With this design, we ensure forward-compatibility with respect to storage With this design, we ensure forward-compatibility with respect to storage
pools. For now, we'll report space for all available (non-shared) storage pools. For now, we'll report space for all available disk templates that
types, in the future, for all available storage pools. are based on non-shared storage types, in the future, for all available
storage pools.
Rebalancing changes Rebalancing changes
------------------- -------------------
...@@ -143,7 +254,7 @@ Space reporting changes ...@@ -143,7 +254,7 @@ Space reporting changes
----------------------- -----------------------
Hspace will by default report by assuming the allocation will happen on Hspace will by default report by assuming the allocation will happen on
the default storage for the cluster/nodegroup. An option will be added the default disk template for the cluster/nodegroup. An option will be added
to manually specify a different storage. to manually specify a different storage.
Interactions with Partitioned Ganeti Interactions with Partitioned Ganeti
...@@ -152,14 +263,15 @@ Interactions with Partitioned Ganeti ...@@ -152,14 +263,15 @@ Interactions with Partitioned Ganeti
Also the design for :doc:`Partitioned Ganeti <design-partitioned>` deals Also the design for :doc:`Partitioned Ganeti <design-partitioned>` deals
with reporting free space. Partitioned Ganeti has a different way to with reporting free space. Partitioned Ganeti has a different way to
report free space for LVM on nodes where the ``exclusive_storage`` flag report free space for LVM on nodes where the ``exclusive_storage`` flag
is set. That doesn't interact directly with this design, as the specific is set. That doesn't interact directly with this design, as the specifics
of how the free space is computed is not in the scope of this design. of how the free space is computed is not in the scope of this design.
But the ``node info`` call contains the value of the But the ``node info`` call contains the value of the
``exclusive_storage`` flag, which is currently only meaningful for the ``exclusive_storage`` flag, which is currently only meaningful for the
LVM back-end. Additional flags like the ``exclusive_storage`` flag LVM storage type. Additional flags like the ``exclusive_storage`` flag
for lvm might be useful for other storage types as well. We therefore for lvm might be useful for other disk templates / storage types as well.
extend the RPC call with <type>,<key> to <type>,<key>,<params> to We therefore extend the RPC call with <disktemplate>,<key> to
include any storage-type specific parameters in the RPC call. <disktemplate>,<key>,<params> to include any disk-template-specific
(or storage-type specific) parameters in the RPC call.
The reporting of free spindles, also part of Partitioned Ganeti, is not The reporting of free spindles, also part of Partitioned Ganeti, is not
concerned with this design doc, as those are seen as a separate resource. concerned with this design doc, as those are seen as a separate resource.
......