diff --git a/INSTALL b/INSTALL index a3bc7a73cd9ee0b71a1da30d2cc7c0af91b9dede..767c16034a49185a666ee45131bf41e06ce20eab 100644 --- a/INSTALL +++ b/INSTALL @@ -19,6 +19,8 @@ Before installing, please verify that you have the following programs: versions 0.11.X or above have shown good behavior). - `DRBD <http://www.drbd.org/>`_, kernel module and userspace utils, version 8.0.7 or above +- `RBD <http://ceph.newdream.net/>`_, kernel modules (rbd.ko/libceph.ko) + and userspace utils (ceph-common) - `LVM2 <http://sourceware.org/lvm2/>`_ - `OpenSSH <http://www.openssh.com/portable.html>`_ - `bridge utilities <http://www.linuxfoundation.org/en/Net:Bridge>`_ @@ -50,7 +52,7 @@ These programs are supplied as part of most Linux distributions, so usually they can be installed via the standard package manager. Also many of them will already be installed on a standard machine. On Debian/Ubuntu, you can use this command line to install all required -packages, except for DRBD and Xen:: +packages, except for RBD, DRBD and Xen:: $ apt-get install lvm2 ssh bridge-utils iproute iputils-arping \ ndisc6 python python-pyopenssl openssl \ diff --git a/doc/admin.rst b/doc/admin.rst index 0f910ef1a1b071aadcc7f7b1abb93e82025890b6..c9cbd96c01d453525f8488a73959732d55629807 100644 --- a/doc/admin.rst +++ b/doc/admin.rst @@ -115,7 +115,7 @@ The are multiple options for the storage provided to an instance; while the instance sees the same virtual drive in all cases, the node-level configuration varies between them. -There are four disk templates you can choose from: +There are five disk templates you can choose from: diskless The instance has no disks. Only used for special purpose operating @@ -138,6 +138,10 @@ drbd to obtain a highly available instance that can be failed over to a remote node should the primary one fail. +rbd + The instance will use volumes inside a RADOS cluster as the backend for + its disks. It will access them using the RADOS block device (RBD). 
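As an illustration, an rbd-backed instance is created like any other, just by passing ``-t rbd`` (a sketch; the OS name, node name and instance name below are placeholders)::

  gnt-instance add -t rbd -s 10G -o debootstrap -n node1.example.com instance1.example.com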
+ IAllocator ~~~~~~~~~~ @@ -510,6 +514,13 @@ The instance will be started with an amount of memory between its target node, or the operation will fail if that's not possible. See :ref:`instance-startup-label` for details. +If the instance's disk template is of type rbd, then you can specify +the target node (which can be any node) explicitly, or specify an +iallocator plugin. If you omit both, the default iallocator will be +used to determine the target node:: + + gnt-instance failover -n TARGET_NODE INSTANCE_NAME + Live migrating an instance ~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -530,6 +541,13 @@ migrating it, unless the ``--no-runtime-changes`` option is passed, in which case the target node should have at least the instance's current runtime memory free. +If the instance's disk template is of type rbd, then you can specify +the target node (which can be any node) explicitly, or specify an +iallocator plugin. If you omit both, the default iallocator will be +used to determine the target node:: + + gnt-instance migrate -n TARGET_NODE INSTANCE_NAME + Moving an instance (offline) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -1247,6 +1265,10 @@ of a cluster installation by following these steps on all of the nodes: 6. Remove the ganeti state directory (``rm -rf /var/lib/ganeti/*``), replacing the path with the correct path for your installation. +7. If using RBD, run ``rbd unmap /dev/rbdN`` to unmap the RBD disks. + Then remove the RBD disk images used by Ganeti, identified by their + UUIDs (``rbd rm uuid.rbd.diskN``). 
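Step 7 above can be sketched as follows (the ``/dev/rbd0`` device name and the disk image name are placeholders; ``rbd showmapped`` lists the images currently mapped on the node)::

  rbd showmapped
  rbd unmap /dev/rbd0
  rbd rm uuid.rbd.disk0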
+ On the master node, remove the cluster from the master-netdev (usually ``xen-br0`` for bridged mode, otherwise ``eth0`` or similar), by running ``ip a del $clusterip/32 dev xen-br0`` (use the correct cluster ip and diff --git a/doc/iallocator.rst b/doc/iallocator.rst index 57a4388d631dee4e360cba293142022246a1cfae..723b948d740837483bf5b3b66739d6f55573f2b3 100644 --- a/doc/iallocator.rst +++ b/doc/iallocator.rst @@ -41,7 +41,7 @@ using the first one whose filename matches the one given by the user. Command line interface changes ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -The node selection options in instanece add and instance replace disks +The node selection options in instance add and instance replace disks can be replace by the new ``--iallocator=NAME`` option (shortened to ``-I``), which will cause the auto-assignement of nodes with the passed iallocator. The selected node(s) will be show as part of the diff --git a/doc/install.rst b/doc/install.rst index d899abda427b7d03bd17d8d5538b6318391aed16..d46cf8e2f4d00460c7771957d2d296cf1ab2c4a4 100644 --- a/doc/install.rst +++ b/doc/install.rst @@ -69,6 +69,10 @@ all Ganeti features. The volume group name Ganeti uses (by default) is You can also use file-based storage only, without LVM, but this setup is not detailed in this document. +If you choose to use RBD-based instances, there's no need for LVM +provisioning. However, this feature is experimental, and is not +recommended for production clusters. + While you can use an existing system, please note that the Ganeti installation is intrusive in terms of changes to the system configuration, and it's best to use a newly-installed system without @@ -300,6 +304,88 @@ instances on a node. } } +Installing RBD ++++++++++++++++ + +Recommended on all nodes: RBD_ is required if you want to create +instances with RBD disks residing inside a RADOS cluster (that is, if +you make use of the rbd disk template). 
RBD-based instances can fail over or migrate to +any other node in the Ganeti cluster, enabling you to exploit all of +Ganeti's high availability (HA) features. + +.. attention:: + Be careful though: rbd is still experimental! For now it is + recommended only for testing purposes. No sensitive data should be + stored there. + +.. _RBD: http://ceph.newdream.net/ + +You will need the ``rbd`` and ``libceph`` kernel modules, the RBD/Ceph +userspace utils (ceph-common Debian package) and an appropriate +Ceph/RADOS configuration file on every VM-capable node. + +You will also need a working RADOS Cluster accessible by the above +nodes. + +RADOS Cluster +~~~~~~~~~~~~~ + +You will need a working RADOS Cluster accessible by all VM-capable nodes +to use the RBD template. For more information on setting up a RADOS +Cluster, refer to the `official docs <http://ceph.newdream.net/>`_. + +If you want to use a pool for storing RBD disk images other than the +default (``rbd``), you should first create the pool in the RADOS +Cluster, and then set the corresponding rbd disk parameter named +``pool``. + +Kernel Modules +~~~~~~~~~~~~~~ + +Unless your distribution already provides them, you might need to +compile the ``rbd`` and ``libceph`` modules from source. You will need +Linux Kernel 3.2 or above for the kernel modules. Alternatively, if you +want to run an older kernel, or your kernel doesn't include them, you +will have to build them as external modules (from Linux Kernel source +3.2 or above). + +Userspace Utils +~~~~~~~~~~~~~~~ + +The RBD template has been tested with ``ceph-common`` v0.38 and +above. We recommend using the latest version of ``ceph-common``. + +.. admonition:: Debian + + On Debian, you can just install the RBD/Ceph userspace utils with + the following command:: + + apt-get install ceph-common + +Configuration file +~~~~~~~~~~~~~~~~~~ + +You should also provide an appropriate configuration file +(``ceph.conf``) in ``/etc/ceph``. 
For the rbd userspace utils, you'll +only need to specify the IP addresses of the RADOS Cluster monitors. + +.. admonition:: ceph.conf + + Sample configuration file:: + + [mon.a] + host = example_monitor_host1 + mon addr = 1.2.3.4:6789 + [mon.b] + host = example_monitor_host2 + mon addr = 1.2.3.5:6789 + [mon.c] + host = example_monitor_host3 + mon addr = 1.2.3.6:6789 + +For more information, please see the `Ceph Docs +<http://ceph.newdream.net/docs/latest/>`_. + Other required software +++++++++++++++++++++++ diff --git a/man/gnt-cluster.rst b/man/gnt-cluster.rst index 81a05b92bd6a41395c7017dc4507238d8129d4c0..cfd994f93ee70d598cb4b98f4bd05a01b7104d30 100644 --- a/man/gnt-cluster.rst +++ b/man/gnt-cluster.rst @@ -445,6 +445,13 @@ List of parameters available for the **plain** template: stripes Number of stripes to use for new LVs. +List of parameters available for the **rbd** template: + +pool + The RADOS cluster pool, inside which all rbd volumes will reside. + When a new RADOS cluster is deployed, the default pool in which to + put rbd volumes (Images in RADOS terminology) is ``rbd``. + The option ``--maintain-node-health`` allows one to enable/disable automatic maintenance actions on nodes. Currently these include automatic shutdown of instances and deactivation of DRBD devices on diff --git a/man/gnt-instance.rst b/man/gnt-instance.rst index 2102452a34438cf71068f7660be6934205926347..09f3105fe2d951db9766168b71bbae09858f522b 100644 --- a/man/gnt-instance.rst +++ b/man/gnt-instance.rst @@ -27,7 +27,7 @@ ADD ^^^ | **add** -| {-t|--disk-template {diskless | file \| plain \| drbd}} +| {-t|--disk-template {diskless | file \| plain \| drbd \| rbd}} | {--disk=*N*: {size=*VAL* \| adopt=*LV*}[,vg=*VG*][,metavg=*VG*][,mode=*ro\|rw*] | \| {-s|--os-size} *SIZE*} | [--no-ip-check] [--no-name-check] [--no-start] [--no-install] @@ -588,6 +588,9 @@ plain drbd Disk devices will be drbd (version 8.x) on top of lvm volumes. 
+rbd + Disk devices will be rbd volumes residing inside a RADOS cluster. + The optional second value of the ``-n (--node)`` is used for the drbd template type and specifies the remote node. @@ -1321,7 +1324,7 @@ GROW-DISK {*amount*} Grows an instance's disk. This is only possible for instances having a -plain or drbd disk template. +plain, drbd or rbd disk template. Note that this command only change the block device size; it will not grow the actual filesystems, partitions, etc. that live on that @@ -1341,10 +1344,10 @@ amount to increase the disk with in mebibytes) or can be given similar to the arguments in the create instance operation, with a suffix denoting the unit. -Note that the disk grow operation might complete on one node but fail -on the other; this will leave the instance with different-sized LVs on -the two nodes, but this will not create problems (except for unused -space). +For instances with a drbd template, note that the disk grow operation +might complete on one node but fail on the other; this will leave the +instance with different-sized LVs on the two nodes, but this will not +create problems (except for unused space). If you do not want gnt-instance to wait for the new disk region to be synced, use the ``--no-wait-for-sync`` option. @@ -1401,16 +1404,25 @@ Recovery FAILOVER ^^^^^^^^ -**failover** [-f] [--ignore-consistency] [--shutdown-timeout=*N*] -[--submit] [--ignore-ipolicy] {*instance*} +| **failover** [-f] [--ignore-consistency] [--ignore-ipolicy] +| [--shutdown-timeout=*N*] +| [{-n|--target-node} *node* \| {-I|--iallocator} *name*] +| [--submit] +| {*instance*} Failover will stop the instance (if running), change its primary node, and if it was originally running it will start it again (on the new primary). 
This only works for instances with drbd template (in which case you can only fail to the secondary node) and for externally -mirrored templates (shared storage) (which can change to any other +mirrored templates, blockdev and rbd (which can change to any other node). + +If the instance's disk template is of type blockdev or rbd, then you +can explicitly specify the target node (which can be any node) using +the ``-n`` or ``--target-node`` option, or specify an iallocator plugin +using the ``-I`` or ``--iallocator`` option. If you omit both, the default +iallocator will be used to determine the target node. + Normally the failover will check the consistency of the disks before failing over the instance. If you are trying to migrate instances off a dead node, this will fail. Use the ``--ignore-consistency`` option @@ -1443,11 +1455,19 @@ MIGRATE **migrate** [-f] [--allow-failover] [--non-live] [--migration-mode=live\|non-live] [--ignore-ipolicy] -[--no-runtime-changes] {*instance*} - -Migrate will move the instance to its secondary node without -shutdown. It only works for instances having the drbd8 disk template -type. +[--no-runtime-changes] +[{-n|--target-node} *node* \| {-I|--iallocator} *name*] {*instance*} + +Migrate will move the instance to its secondary node without shutdown. +As with failover, it only works for instances having the drbd disk +template or an externally mirrored disk template type such as blockdev +or rbd. + +If the instance's disk template is of type blockdev or rbd, then you can +explicitly specify the target node (which can be any node) using the +``-n`` or ``--target-node`` option, or specify an iallocator plugin +using the ``-I`` or ``--iallocator`` option. If you omit both, the +default iallocator will be used to determine the target node. + The migration command needs a perfectly healthy instance, as we rely on the dual-master capability of drbd8 and the disks of the instance