Commit 855f9bad, authored by Constantinos Venetsanopoulos: small updates to the extstorage design document and interface.
Signed-off-by: Constantinos Venetsanopoulos <cven@grnet.gr>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Ganeti shared storage support for 2.3+
This document describes the changes introduced in Ganeti 2.3+ compared to the Ganeti 2.3 storage model.
Contents
- Objective
- Background
- Use cases
- Design Overview
- Refactoring of all code referring to constants.DTS_NET_MIRROR
- Obsolescence of the primary-secondary node model
- Introduction of a shared file disk template
- Introduction of a shared block device template
- Introduction of the External Storage Interface
- Long-term shared storage goals
Objective
The aim is to introduce support for externally mirrored, shared storage. This includes two distinct disk templates:
- A shared filesystem containing instance disks as regular files, typically residing on a networked or cluster filesystem (e.g. NFS, AFS, Ceph, OCFS2, etc.).
- Instance images being shared block devices, typically LUNs residing on a SAN appliance.
Background
DRBD is currently the only shared storage backend supported by Ganeti. DRBD offers the advantages of high availability while running on commodity hardware at the cost of high network I/O for block-level synchronization between hosts. DRBD's master-slave model has greatly influenced Ganeti's design, primarily by introducing the concept of primary and secondary nodes and thus defining an instance's “mobility domain”.
Although DRBD has many advantages, many sites choose networked storage appliances for virtual machine hosting, such as SAN and/or NAS, which provide shared storage without the administrative overhead of DRBD or the limitation of a 1:1 master-slave setup. Furthermore, new distributed filesystems such as Ceph are becoming viable alternatives to expensive storage appliances. Support for both modes of operation, i.e. a shared block storage and a shared file storage backend, would make Ganeti a robust choice for high-availability virtualization clusters.
Throughout this document, the term “externally mirrored storage” will refer to both modes of shared storage, meaning that Ganeti does not need to take care of the mirroring process from one host to another.
Use cases
We consider the following use cases:
- A virtualization cluster with FibreChannel shared storage, mapping at least one LUN per instance, accessible by the whole cluster.
- A virtualization cluster with instance images stored as files on an NFS server.
- A virtualization cluster storing instance images on a Ceph volume.
Design Overview
The design addresses the following procedures:
- Refactoring of all code referring to constants.DTS_NET_MIRROR.
- Obsolescence of the primary-secondary concept for externally mirrored storage.
- Introduction of a shared file storage disk template for use with networked filesystems.
- Introduction of a shared block device disk template with device adoption.
- Introduction of the External Storage Interface.
Additionally, mid- to long-term goals include:
- Support for external “storage pools”.
Refactoring of all code referring to constants.DTS_NET_MIRROR
Currently, all storage-related decision-making depends on a number of frozensets in lib/constants.py, typically constants.DTS_NET_MIRROR. However, constants.DTS_NET_MIRROR is used to signify two different attributes:
- A storage device that is shared
- A storage device whose mirroring is supervised by Ganeti
We propose the introduction of two new frozensets to ease decision-making:
- constants.DTS_EXT_MIRROR, holding externally mirrored disk templates
- constants.DTS_MIRRORED, the union of constants.DTS_EXT_MIRROR and constants.DTS_NET_MIRROR.
Additionally, DTS_NET_MIRROR will be renamed to DTS_INT_MIRROR to reflect the status of the storage as internally mirrored by Ganeti.
Thus, checks could be grouped into the following categories:
- Mobility checks, such as whether an instance failover or migration is possible, should check against constants.DTS_MIRRORED.
- Syncing actions should be performed only for templates in constants.DTS_INT_MIRROR (the renamed DTS_NET_MIRROR).
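The grouping above can be sketched in Python. This is a minimal illustration of the proposed frozensets, not the actual lib/constants.py contents; the DT_* template names and the two helper functions are assumptions made for the example:

```python
# Hypothetical disk template names, following Ganeti's DT_* convention.
DT_DRBD8 = "drbd"
DT_SHARED_FILE = "sharedfile"
DT_BLOCK = "blockdev"

# Internally mirrored templates (the renamed DTS_NET_MIRROR).
DTS_INT_MIRROR = frozenset([DT_DRBD8])

# Externally mirrored templates (new).
DTS_EXT_MIRROR = frozenset([DT_SHARED_FILE, DT_BLOCK])

# Any mirrored template, internal or external.
DTS_MIRRORED = DTS_INT_MIRROR | DTS_EXT_MIRROR


def CanMigrate(disk_template):
  """Mobility check: failover/migration needs any mirrored template."""
  return disk_template in DTS_MIRRORED


def NeedsSync(disk_template):
  """Syncing actions apply only to internally mirrored templates."""
  return disk_template in DTS_INT_MIRROR
```

With this split, code that previously tested membership in DTS_NET_MIRROR can pick the frozenset that matches its actual intent.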
Obsolescence of the primary-secondary node model
The primary-secondary node concept has primarily evolved through the use of DRBD. In a globally shared storage framework without the need for external synchronization (e.g. SAN, NAS, etc.), such a notion does not apply for the following reasons:
- Access to the storage does not necessarily imply different roles for the nodes (e.g. primary vs. secondary).
- The same storage is available to potentially more than two nodes. Thus, an instance backed by a SAN LUN, for example, may migrate to any of the other nodes, not just a pre-designated failover node.
The proposed solution is to use the iallocator framework for run-time decision-making during migration and failover, for instances with disk templates in constants.DTS_EXT_MIRROR. Modifications to gnt-instance and gnt-node will be required to accept a target node and/or an iallocator specification for these operations. Modifications to the iallocator protocol will be required to address at least the following needs:
- Allocation tools must be able to distinguish between internal and external storage
- Migration/failover decisions must take into account shared storage availability
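A possible shape for such a protocol extension is sketched below. The "disk_template" and "target_nodes" fields and the helper function are assumptions for illustration only; only the "type" and "name" fields mirror the existing iallocator request format:

```python
def BuildRelocateRequest(instance_name, disk_template, candidate_nodes):
  """Build a relocation request that exposes storage information.

  For externally mirrored templates the allocator may choose any node
  that can reach the shared storage, so all candidate nodes are passed
  rather than a single pre-designated secondary.
  """
  return {
    "type": "relocate",
    "name": instance_name,
    # Lets allocation tools distinguish internal from external mirroring.
    "disk_template": disk_template,
    # With shared storage, every node that sees the storage is a valid
    # migration/failover target.
    "target_nodes": sorted(candidate_nodes),
  }
```

An allocator receiving such a request could then rank all reachable nodes instead of being constrained to a fixed secondary.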
Introduction of a shared file disk template
Basic shared file storage support can be implemented by creating a new disk template based on the existing FileStorage class, with only minor modifications in lib/bdev.py. The shared file disk template relies on a shared filesystem (e.g. NFS, AFS, Ceph, OCFS2 over SAN or DRBD) being mounted on all nodes under the same path, where instance images will be saved.
A new cluster initialization option is added to specify the mountpoint of the shared filesystem.
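Because every node mounts the shared filesystem at the same path, a cluster-wide disk path can be computed locally on any node. The following is a minimal sketch of that idea; the mountpoint value, the helper name, and the "disk%d" naming scheme are assumptions for illustration:

```python
import os.path

# Assumed cluster-wide mountpoint, as given at cluster initialization.
SHARED_FILE_STORAGE_DIR = "/srv/ganeti/shared-file"


def GetSharedFilePath(instance_name, disk_index):
  """Compute the path of an instance disk on the shared filesystem.

  All nodes mount the shared filesystem under the same path, so the
  returned path is valid on every node in the cluster.
  """
  return os.path.join(SHARED_FILE_STORAGE_DIR, instance_name,
                      "disk%d" % disk_index)
```

This is what makes live migration trivial for this template: the destination node opens exactly the same path the source node was using.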
The remainder of this document deals with shared block storage.
Introduction of a shared block device template
Basic shared block device support will be implemented with an additional disk template. This disk template will not feature any kind of storage control (provisioning, removal, resizing, etc.), but will instead rely on the adoption of already-existing block devices (e.g. SAN LUNs, NBD devices, remote iSCSI targets, etc.).
The shared block device template will make the following assumptions:
- The adopted block device has a consistent name across all nodes, enforced e.g. via udev rules.
- The device is available at the same path on all nodes in the node group.