=================================
Conversion between disk templates
=================================

.. contents:: :depth: 4

This design document describes the support for generic disk template
conversion in Ganeti. The logic used is disk template agnostic and
aims to cover the majority of conversions among the supported disk
templates.


Current state and shortcomings
==============================

Currently, Ganeti supports choosing among different disk templates when
creating an instance. However, converting the disk template of an
existing instance is possible only between the ``plain`` and ``drbd``
templates. This feature was added in early versions of Ganeti, when the
number of supported disk templates was limited. Now that Ganeti supports
many more templates, this feature should be extended to provide more
flexibility to the user.

The procedure for converting from the plain to the drbd disk template
works as follows. First, a completely new disk template is generated
matching the size, mode, and count of the current instance's disks.
The missing volumes are created manually on both the primary node (meta
disk) and the secondary node. The original LVs on the primary node are
renamed to match the new names. The last step is to manually associate
the DRBD devices with their mirror block device pairs. The conversion
from the drbd to the plain disk template is much simpler than the
reverse. First, the DRBD mirroring is manually disabled. Then the
volumes that are no longer needed, namely the meta disk(s) of the
primary node and the meta and data disk(s) of the former secondary
node, are removed.


Proposed changes
================

This design proposes the creation of a unified interface for handling
the disk template conversions in Ganeti. Currently, there is no such
interface and each one of the supported conversions uses a separate code
path.

The new interface is disk-agnostic and is meant to be as generic as
possible. The exception is the currently supported conversions between
the LVM-based disk templates: their basic functionality will not be
affected, and they will keep a code path separate from the rest of the
disk template conversions. The goal is to support conversions among the
majority of the available disk templates, and to create a mechanism that
will easily accommodate any new templates that may be added to Ganeti in
the future.


Design decisions
================

Currently, the supported conversions for the LVM-based templates are
handled by the ``LUInstanceSetParams`` LU. Our implementation will
follow the same approach. From a high-level point of view this design
can be split into two parts:

* The extension of the LU's checks to cover all the supported template
  conversions

* The new functionality which will be introduced to provide the new
  feature

The instance must be stopped before starting the disk template
conversion, as is already the case; otherwise the operation will fail.
The new mechanism will need to copy the disk's data for the conversion
to be possible. We propose using the Unix ``dd`` command to copy the
instance's data. It copies data from source to destination block by
block, regardless of the filesystem type, which makes it a convenient
tool for this case. Since the conversion is done via a data copy, the
time the instance needs to switch to the new template grows with the
size of its disks and can be long for large disks.

Some template conversions can be done faster, without explicitly copying
the disks' data. One such case is the conversion between the LVM-based
templates, i.e., ``drbd`` and ``plain``, which will be done as it is now
and will not use the ``dd`` command. Also, this implementation will
provide partial support for the ``blockdev`` disk template, which will
act only as a source template. Since those volumes are adopted
pre-existing block devices, we will not support conversions targeting
this template. Another exception is the ``diskless`` template. Since it
is a testing template that creates instances with no disks, we will not
support conversions that involve this template.
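
The exceptions described above can be summarised as a small predicate.
The following is only a sketch: the ``classify_conversion`` helper and
its return values are hypothetical and not part of Ganeti; only the
template names come from this design.

::

  # Hypothetical helper; only the template names come from this design.
  DISKLESS = "diskless"
  BLOCKDEV = "blockdev"
  LVM_FAST_PATH = frozenset([("plain", "drbd"), ("drbd", "plain")])


  def classify_conversion(src, dst):
      """Return "unsupported", "lvm" (existing path) or "copy" (dd-based)."""
      if src == dst:
          return "unsupported"  # nothing to convert
      if DISKLESS in (src, dst):
          return "unsupported"  # diskless instances have no disks to copy
      if dst == BLOCKDEV:
          return "unsupported"  # blockdev volumes are adopted, not created
      if (src, dst) in LVM_FAST_PATH:
          return "lvm"  # handled as today, without a data copy
      return "copy"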


We divide the design into the following parts:

* Block device changes, that include the new methods which will be
  introduced and will be responsible for building the commands for the
  data copy from/to the requested devices

* Backend changes, that include a new RPC call which will concatenate
  the output of the above two methods and will execute the data copy
  command

* Core changes, that include the modifications in the Logical Unit

* User interface changes, i.e., command line changes


Block device changes
--------------------

The block device abstract class will be extended with two new methods,
named ``Import`` and ``Export``. Those methods will be responsible for
building the commands that will be used for the data copy between the
corresponding devices. The ``Export`` method will build the command
which will export the data from the source device, while the ``Import``
method will build the command which will import the data to the newly
created target device. Those two methods will not perform the actual
data copy; they will simply return the requested commands for
transferring the data from/to the individual devices. The output of the
two methods will be combined using a pipe ("|") by the caller at the
backend level.
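
A minimal sketch of the two methods and of how a caller might join
their output is shown below. The class is a stand-in for the real block
device base class, and the ``dev_path`` attribute, the block size and
the ``build_pipeline`` helper are assumptions made for illustration.

::

  import shlex


  class BlockDev(object):
      """Minimal stand-in for the block device base class."""

      def __init__(self, dev_path):
          self.dev_path = dev_path

      def Export(self):
          """Return the command that writes the device's data to stdout."""
          return ["dd", "if=%s" % self.dev_path, "bs=1048576"]

      def Import(self):
          """Return the command that reads stdin into the device."""
          return ["dd", "of=%s" % self.dev_path, "bs=1048576"]


  def build_pipeline(src, dst):
      """Join both commands with a pipe, quoting every argument."""
      export_cmd = " ".join(shlex.quote(arg) for arg in src.Export())
      import_cmd = " ".join(shlex.quote(arg) for arg in dst.Import())
      return "%s | %s" % (export_cmd, import_cmd)

Neither method touches the device; both only describe the command to
run, which keeps the actual execution in the backend.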

By default the data import and export will be done using the ``dd``
command. All derived classes will use the base functionality unless a
faster way to transfer their data exists. In that case the underlying
block device will override those methods with its specific
functionality. One example is the Ceph/RADOS block devices, which will
make use of the ``rbd import`` and ``rbd export`` commands to copy
their data instead of using the default ``dd`` command.
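
As an illustration, an RBD device could override both methods roughly
as follows. This is a sketch under assumptions: the class layout and
the ``pool`` and ``image`` attributes are hypothetical. Note also that
``rbd import`` creates the target image itself, so a real
implementation would have to handle an image that already exists.

::

  class BlockDev(object):
      """Minimal stand-in for the base class, using dd by default."""

      def __init__(self, dev_path):
          self.dev_path = dev_path

      def Export(self):
          return ["dd", "if=%s" % self.dev_path, "bs=1048576"]

      def Import(self):
          return ["dd", "of=%s" % self.dev_path, "bs=1048576"]


  class RADOSBlockDevice(BlockDev):
      """Overrides the defaults with the native rbd commands."""

      def __init__(self, dev_path, pool, image):
          super(RADOSBlockDevice, self).__init__(dev_path)
          self.pool = pool
          self.image = image

      def Export(self):
          # "-" as the destination makes ``rbd export`` write to stdout.
          return ["rbd", "export", "-p", self.pool, self.image, "-"]

      def Import(self):
          # "-" as the source makes ``rbd import`` read from stdin.
          return ["rbd", "import", "-p", self.pool, "-", self.image]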

Keeping the data copy functionality in the block device layer provides
us with a generic mechanism that works for almost all conversions and
can be easily extended for new disk templates. It also covers the
devices that support the ``access=userspace`` parameter in a generic
way, by implementing the logic at the level where we know what is best
to do for each device.


Backend changes
---------------

Introduce a new RPC call:

* blockdev_convert(src_disk, dest_disk)

where ``src_disk`` and ``dest_disk`` are the original and the new disk
objects respectively. First, the actual device instances will be
computed and then they will be used to build the export and import
commands for the data copy. The output of those methods will be
concatenated using a pipe, similar to the approach used by the impexp
daemon. Finally, the unified data copy command will be executed, at this
level, by the ``nodeD``.
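
One possible way to run the two commands as a pipeline, without going
through a shell, is sketched below. The ``run_piped_copy`` name is
hypothetical and the error handling is deliberately minimal; the real
RPC implementation may differ.

::

  import subprocess


  def run_piped_copy(export_cmd, import_cmd):
      """Run ``export_cmd | import_cmd``; raise if either side fails."""
      exporter = subprocess.Popen(export_cmd, stdout=subprocess.PIPE)
      importer = subprocess.Popen(import_cmd, stdin=exporter.stdout)
      # Drop our copy of the pipe so that the exporter receives SIGPIPE
      # if the importer exits early.
      exporter.stdout.close()
      import_rc = importer.wait()
      export_rc = exporter.wait()
      if export_rc != 0 or import_rc != 0:
          raise RuntimeError("data copy failed (export=%s, import=%s)" %
                             (export_rc, import_rc))

Checking both exit codes matters here: a failing export can still let
the import side exit successfully after writing partial data.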


Core changes
------------

The main modifications will be made in the ``LUInstanceSetParams`` LU.
The implementation of the conversion mechanism will be split into the
following parts:

* The generation of the new disk template for the instance. The new
  disks will match the size, mode, and name of the original volumes.
  Those parameters and any others needed, i.e., the provider's name for
  the ExtStorage conversions, will be computed by a new method which we
  will introduce, named ``ComputeDisksInfo``. The output of that
  function will be used as the ``disk_info`` argument of the
  ``GenerateDiskTemplate`` method.

* The creation of the new block devices. We will make use of the
  ``CreateDisks`` method which creates and attaches the new block
  devices.

* The data copy for each disk of the instance from the original to the
  newly created volume. The data copy will be made by the ``nodeD`` with
  the RPC call introduced earlier in this design. If any disk fails to
  copy its data, the operation will fail and the newly created disks
  will be removed. The instance will remain intact.

* The detachment of the original disks of the instance when the data
  copy operation successfully completes by calling the
  ``RemoveInstanceDisk`` method for each instance's disk.

* The attachment of the new disks to the instance by calling the
  ``AddInstanceDisk`` method for each disk we have created.

* The update of the configuration file with the new values.

* The removal of the original block devices from the node using the
  ``BlockdevRemove`` method for each one of the old disks.
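
The create, copy and roll-back part of the steps above can be sketched
as follows. The function and its callable arguments are hypothetical
placeholders for ``CreateDisks``, the new RPC call and the disk removal
logic; the sketch only shows the ordering and the failure handling.

::

  def convert_disks(old_disks, create_disks, copy_data, remove_disks):
      """Create the new disks and copy the data of every old disk.

      Returns the new disks. If any copy fails, all new disks are
      removed and the error is re-raised, so the instance keeps its
      original disks.
      """
      new_disks = create_disks(old_disks)
      try:
          for old, new in zip(old_disks, new_disks):
              copy_data(old, new)
      except Exception:
          remove_disks(new_disks)
          raise
      return new_disks

Only after this function returns would the caller detach the original
disks, attach the new ones, update the configuration and remove the old
block devices.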


User interface changes
----------------------

The ``-t`` (``--disk-template``) option of the ``gnt-instance modify``
command will specify the disk template to convert *to*, as it does now.
The remaining disk options, such as size, mode, and name, will be
computed from the original volumes by the conversion mechanism, and the
user will not explicitly provide them.


ExtStorage conversions
~~~~~~~~~~~~~~~~~~~~~~

When converting to an ExtStorage disk template the
``provider=*PROVIDER*`` option which specifies the ExtStorage provider
will be mandatory. Also, arbitrary parameters can be passed to the
ExtStorage provider. Those parameters are optional and can be passed as
additional comma-separated options. Since it is not allowed to convert
the disk template of an instance and make use of the ``--disk`` option
at the same time, we propose to introduce a new option named
``--ext-params`` to handle the ``ext`` template conversions.

::

  gnt-instance modify -t ext --ext-params provider=pvdr1 test_vm
  gnt-instance modify -t ext --ext-params provider=pvdr1,param1=val1,param2=val2 test_vm
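
The value of ``--ext-params`` could be parsed along these lines. This
is only a sketch: the ``parse_ext_params`` helper is hypothetical and
the actual option parsing in Ganeti may be implemented differently.

::

  def parse_ext_params(value):
      """Split "provider=NAME,key=val,..." into (provider, params)."""
      params = {}
      for item in value.split(","):
          key, sep, val = item.partition("=")
          if not sep or not key:
              raise ValueError("malformed option %r, expected key=value"
                               % item)
          params[key] = val
      if "provider" not in params:
          raise ValueError("the 'provider' option is mandatory")
      provider = params.pop("provider")
      return provider, params

For the second command above this would yield the provider ``pvdr1``
and the two optional parameters as a dictionary.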


File-based conversions
~~~~~~~~~~~~~~~~~~~~~~

For conversions *to* a file-based template the ``--file-storage-dir``
and the ``--file-driver`` options can be used, similarly to the
**add** command, to manually configure the storage directory and the
preferred driver for the file-based disks.

::

  gnt-instance modify -t file --file-storage-dir=mysubdir test_vm


Supported template conversions
==============================

This is a summary of the disk template conversions that the conversion
mechanism will support:

+--------------+-----------------------------------------------------------------------------------+
| Source       |                                 Target Disk Template                              |
| Disk         +---------+-------+------+------------+---------+------+------+----------+----------+
| Template     |  Plain  |  DRBD | File | Sharedfile | Gluster | RBD  | Ext  | BlockDev | Diskless |
+==============+=========+=======+======+============+=========+======+======+==========+==========+
| Plain        |    -    |  Yes. | Yes. |    Yes.    |   Yes.  | Yes. | Yes. |    No.   |   No.    |
+--------------+---------+-------+------+------------+---------+------+------+----------+----------+
| DRBD         |   Yes.  |   -   | Yes. |    Yes.    |   Yes.  | Yes. | Yes. |    No.   |   No.    |
+--------------+---------+-------+------+------------+---------+------+------+----------+----------+
| File         |   Yes.  |  Yes. |   -  |    Yes.    |   Yes.  | Yes. | Yes. |    No.   |   No.    |
+--------------+---------+-------+------+------------+---------+------+------+----------+----------+
| Sharedfile   |   Yes.  |  Yes. | Yes. |     -      |   Yes.  | Yes. | Yes. |    No.   |   No.    |
+--------------+---------+-------+------+------------+---------+------+------+----------+----------+
| Gluster      |   Yes.  |  Yes. | Yes. |    Yes.    |    -    | Yes. | Yes. |    No.   |   No.    |
+--------------+---------+-------+------+------------+---------+------+------+----------+----------+
| RBD          |   Yes.  |  Yes. | Yes. |    Yes.    |   Yes.  |  -   | Yes. |    No.   |   No.    |
+--------------+---------+-------+------+------------+---------+------+------+----------+----------+
| Ext          |   Yes.  |  Yes. | Yes. |    Yes.    |   Yes.  | Yes. |  -   |    No.   |   No.    |
+--------------+---------+-------+------+------------+---------+------+------+----------+----------+
| BlockDev     |   Yes.  |  Yes. | Yes. |    Yes.    |   Yes.  | Yes. | Yes. |     -    |   No.    |
+--------------+---------+-------+------+------------+---------+------+------+----------+----------+
| Diskless     |   No.   |  No.  | No.  |    No.     |   No.   | No.  | No.  |    No.   |    -     |
+--------------+---------+-------+------+------------+---------+------+------+----------+----------+


Future Work
===========

Expand the conversion mechanism to provide a visual indication of the
data copy operation. We could monitor the progress of the data sent via
a pipe, and provide the user with information such as the time elapsed,
percentage completed (probably with a progress bar), total data
transferred, and so on, similar to the progress tracking that is
currently done by the impexp daemon.


.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: