Commit b2d72ffe authored by Iustin Pop

Add iallocator documentation

Reviewed-by: imsnah
parent 66f93869
@@ -4,8 +4,10 @@ SUBDIRS = examples
 dist_doc_DATA = \
 hooks.html hooks.pdf \
 install.html install.pdf \
-admin.html admin.pdf
-EXTRA_DIST = hooks.sgml install.sgml admin.sgml
+admin.html admin.pdf \
+iallocator.html iallocator.pdf
+EXTRA_DIST = hooks.sgml install.sgml admin.sgml iallocator.sgml
 MAINTAINERCLEANFILES = *.html *.pdf
 %.sgmltmp: %.sgml
......
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook V4.2//EN" [
]>
<article class="specification">
<articleinfo>
<title>Ganeti automatic instance allocation</title>
</articleinfo>
<para>Documents Ganeti version 1.2</para>
<sect1>
<title>Introduction</title>
<para>Currently in Ganeti the admin has to specify the exact
locations for an instance's node(s). This prevents a completely
automatic node evacuation, and is in general a nuisance.</para>
<para>The <acronym>iallocator</acronym> framework will enable
automatic placement via external scripts, which allows
customization of the cluster layout per the site's
requirements.</para>
</sect1>
<sect1>
<title>User-visible changes</title>
<para>Auto-allocation affects two parts of Ganeti's operation:
how the cluster knows which allocator algorithms are available,
and how the admin uses them when creating
instances.</para>
<para>An allocation algorithm is just the filename of a program
installed in a defined list of directories.</para>
<sect2>
<title>Cluster configuration</title>
<para>At configure time, the list of the directories can be
selected via the
<option>--with-iallocator-search-path=LIST</option> option,
where <userinput>LIST</userinput> is a comma-separated list of
directories. If not given, this defaults to
<constant>$libdir/ganeti/iallocators</constant>, i.e. for an
installation under <filename class="directory">/usr</filename>,
this will be <filename
class="directory">/usr/lib/ganeti/iallocators</filename>.</para>
<para>Ganeti will then search for the allocator script in the
configured list of directories, using the first one whose filename
matches the one given by the user.</para>
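<para>As a rough illustration of the lookup described above, the
following Python sketch walks a list of directories in order and
returns the first file whose name matches. The function name
<function>find_allocator</function> and its parameters are
hypothetical and not part of Ganeti; the sketch also assumes that a
plain file-existence check is sufficient.</para>
<programlisting>
```python
import os


def find_allocator(name, search_path):
    """Return the full path of the first file called `name`, or None.

    `search_path` is a list of directories, searched in order
    (hypothetical helper, shown only to illustrate the lookup).
    """
    for directory in search_path:
        candidate = os.path.join(directory, name)
        # The first match wins; later directories are not consulted.
        if os.path.isfile(candidate):
            return candidate
    return None
```
</programlisting>
<para>For instance, with the default search path, a call such as
<literal>find_allocator("dumb-allocator",
["/usr/lib/ganeti/iallocators"])</literal> would return the script's
full path if it is installed there, and
<literal>None</literal> otherwise.</para>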
</sect2>
<sect2>
<title>Command line interface changes</title>
<para>The node selection options in instance add and instance
replace disks can be replaced by the new <option>--iallocator
<replaceable>NAME</replaceable></option> option, which will
cause the nodes to be assigned automatically. The selected node(s)
will be shown as part of the command output.</para>
</sect2>
</sect1>
<sect1>
<title>IAllocator API</title>
<para>The protocol for communication between Ganeti and an
allocator script will be the following:</para>
<orderedlist>
<listitem>
<simpara>Ganeti launches the program with a single argument, a
filename that contains a JSON-encoded structure (the input
message);</simpara>
</listitem>
<listitem>
<simpara>if the script finishes with exit code different from
zero, it is considered a general failure and the full output
will be reported to the users; this can be the case when the
allocator can't parse the input message;</simpara>
</listitem>
<listitem>
<simpara>if the allocator finishes with exit code zero, it is
expected to output (on its stdout) a JSON-encoded structure
(the response)</simpara>
</listitem>
</orderedlist>
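<para>The three steps above can be sketched from the caller's side
roughly as follows. This is only an illustration under stated
assumptions, not Ganeti's own code: the name
<function>run_allocator</function> is hypothetical, the program is
passed as a list so that an interpreter can be prepended when
testing, and the temporary-file handling assumes a POSIX
system.</para>
<programlisting>
```python
import json
import subprocess
import tempfile


def run_allocator(program, input_message):
    """Run an allocator following the three protocol steps.

    `program` is a list holding the allocator path (hypothetical
    helper); the input file name is appended as the single argument.
    """
    # Step 1: store the JSON-encoded input message in a file and pass
    # its name to the program.
    with tempfile.NamedTemporaryFile("w", suffix=".json") as handle:
        json.dump(input_message, handle)
        handle.flush()
        result = subprocess.run(
            program + [handle.name], capture_output=True, text=True
        )
    # Step 2: a nonzero exit code is a general failure; report all output.
    if result.returncode != 0:
        raise RuntimeError("allocator failed: " + result.stdout + result.stderr)
    # Step 3: on exit code zero, stdout carries the JSON-encoded response.
    return json.loads(result.stdout)
```
</programlisting>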
<sect2>
<title>Input message</title>
<para>The input message will be the JSON encoding of a
dictionary containing the following:</para>
<variablelist>
<varlistentry>
<term>version</term>
<listitem>
<simpara>the version of the protocol; this document
specifies version 1</simpara>
</listitem>
</varlistentry>
<varlistentry>
<term>cluster_name</term>
<listitem>
<simpara>the cluster name</simpara>
</listitem>
</varlistentry>
<varlistentry>
<term>cluster_tags</term>
<listitem>
<simpara>the list of cluster tags</simpara>
</listitem>
</varlistentry>
<varlistentry>
<term>request</term>
<listitem>
<simpara>a dictionary containing the request data:</simpara>
<variablelist>
<varlistentry>
<term>type</term>
<listitem>
<simpara>the request type; this can be either
<literal>allocate</literal> or
<literal>relocate</literal>; the
<literal>allocate</literal> request is used when a
new instance needs to be placed on the cluster,
while the <literal>relocate</literal> request is
used when an existing instance needs to be moved
within the cluster</simpara>
</listitem>
</varlistentry>
<varlistentry>
<term>name</term>
<listitem>
<simpara>the name of the instance; if the request is
a relocation, then this name will be found in the
list of instances (see below), otherwise it is the
<acronym>FQDN</acronym> of the new
instance</simpara>
</listitem>
</varlistentry>
<varlistentry>
<term>required_nodes</term>
<listitem>
<simpara>how many nodes the algorithm should return;
while this information can be deduced from the
instance's disk template, it is better to leave this
computation to Ganeti, as allocator scripts are then
less sensitive to changes to the disk
templates</simpara>
</listitem>
</varlistentry>
<varlistentry>
<term>disk_space_total</term>
<listitem>
<simpara>the total disk space that will be used by
this instance on the (new) nodes; again, this
information can be computed from the list of
instance disks and its template type, but Ganeti is
better suited to compute it</simpara>
</listitem>
</varlistentry>
</variablelist>
<simpara>If the request is an allocation, then there are
extra fields in the request dictionary:</simpara>
<variablelist>
<varlistentry>
<term>disks</term>
<listitem>
<simpara>list of dictionaries holding the disk
definitions for this instance (in the order they are
exported to the hypervisor):</simpara>
<variablelist>
<varlistentry>
<term>mode</term>
<listitem>
<simpara>either <literal>r</literal> or
<literal>w</literal>, denoting whether the disk is
read-only or writable; for Ganeti 1.2, this
will always be <literal>w</literal></simpara>
</listitem>
</varlistentry>
<varlistentry>
<term>size</term>
<listitem>
<simpara>the size of this disk in mebibytes</simpara>
</listitem>
</varlistentry>
</variablelist>
</listitem>
</varlistentry>
<varlistentry>
<term>nics</term>
<listitem>
<simpara>a list of dictionaries holding the network
interfaces for this instance, containing:</simpara>
<variablelist>
<varlistentry>
<term>ip</term>
<listitem>
<simpara>the IP address that Ganeti knows for
this instance, or null</simpara>
</listitem>
</varlistentry>
<varlistentry>
<term>mac</term>
<listitem>
<simpara>the MAC address for this interface</simpara>
</listitem>
</varlistentry>
<varlistentry>
<term>bridge</term>
<listitem>
<simpara>the bridge to which this interface
will be connected</simpara>
</listitem>
</varlistentry>
</variablelist>
</listitem>
</varlistentry>
<varlistentry>
<term>vcpus</term>
<listitem>
<simpara>the number of VCPUs for the instance</simpara>
</listitem>
</varlistentry>
<varlistentry>
<term>disk_template</term>
<listitem>
<simpara>the disk template for the instance</simpara>
</listitem>
</varlistentry>
<varlistentry>
<term>memory</term>
<listitem>
<simpara>the memory size for the instance</simpara>
</listitem>
</varlistentry>
<varlistentry>
<term>os</term>
<listitem>
<simpara>the OS type for the instance</simpara>
</listitem>
</varlistentry>
<varlistentry>
<term>tags</term>
<listitem>
<simpara>the list of the instance's tags</simpara>
</listitem>
</varlistentry>
</variablelist>
<simpara>If the request is of type relocate, then there is
one more entry in the request dictionary, named
<varname>relocate_from</varname>, and it contains a list
of nodes to move the instance away from; note that with
Ganeti 1.2, this list will always contain a single node,
the current secondary of the instance.</simpara>
</listitem>
</varlistentry>
<varlistentry>
<term>instances</term>
<listitem>
<simpara>a dictionary with the data for the instances
currently existing on the cluster, indexed by instance
name; the contents are similar to the instance definitions
for the allocate mode, with the addition of:</simpara>
<variablelist>
<varlistentry>
<term>should_run</term>
<listitem>
<simpara>if this instance is set to run (but not the
actual status of the instance)</simpara>
</listitem>
</varlistentry>
<varlistentry>
<term>nodes</term>
<listitem>
<simpara>list of nodes on which this instance is
placed; the primary node of the instance is always
the first one</simpara>
</listitem>
</varlistentry>
</variablelist>
</listitem>
</varlistentry>
<varlistentry>
<term>nodes</term>
<listitem>
<simpara>dictionary with the data for the nodes in the
cluster, indexed by the node name; the dict
contains:</simpara>
<variablelist>
<varlistentry>
<term>total_disk</term>
<listitem>
<simpara>the total disk size of this node
(mebibytes)</simpara>
</listitem>
</varlistentry>
<varlistentry>
<term>free_disk</term>
<listitem>
<simpara>the free disk space on the node</simpara>
</listitem>
</varlistentry>
<varlistentry>
<term>total_memory</term>
<listitem>
<simpara>the total memory size</simpara>
</listitem>
</varlistentry>
<varlistentry>
<term>free_memory</term>
<listitem>
<simpara>free memory on the node; note that
currently this does not take into account the
instances which are down on the node</simpara>
</listitem>
</varlistentry>
<varlistentry>
<term>primary_ip</term>
<listitem>
<simpara>the primary IP address of the
node</simpara>
</listitem>
</varlistentry>
<varlistentry>
<term>secondary_ip</term>
<listitem>
<simpara>the secondary IP address of the node (the
one used for the DRBD replication); note that this
can be the same as the primary one</simpara>
</listitem>
</varlistentry>
<varlistentry>
<term>tags</term>
<listitem>
<simpara>list with the tags of the node</simpara>
</listitem>
</varlistentry>
</variablelist>
</listitem>
</varlistentry>
</variablelist>
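<para>To show how an allocator might consume these fields, here is
a deliberately simple Python sketch for an
<literal>allocate</literal> request. It is not the algorithm of any
shipped allocator: the function name
<function>allocate</function> and the selection rule (keep nodes
with enough free memory and disk, prefer those with the most free
memory) are assumptions made for illustration, as is the reading
that each chosen node must hold
<varname>disk_space_total</varname>.</para>
<programlisting>
```python
def allocate(message):
    """Build a response dict for an 'allocate' request (illustrative only)."""
    request = message["request"]
    nodes = message["nodes"]
    # Keep only the nodes that can hold the instance's memory and disks.
    fits = [
        name
        for name, node in nodes.items()
        if node["free_memory"] >= request["memory"]
        and node["free_disk"] >= request["disk_space_total"]
    ]
    # Prefer nodes with the most free memory; break ties by name so the
    # result is deterministic.
    fits.sort(key=lambda name: (-nodes[name]["free_memory"], name))
    wanted = request["required_nodes"]
    if len(fits) >= wanted:
        return {
            "success": True,
            "info": "Allocation successful",
            "nodes": fits[:wanted],
        }
    return {
        "success": False,
        "info": "Only %d of %d required nodes fit" % (len(fits), wanted),
        "nodes": [],
    }
```
</programlisting>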
</sect2>
<sect2>
<title>Response message</title>
<para>The response message is much simpler than the input
one. It is also a dict, with three keys:</para>
<variablelist>
<varlistentry>
<term>success</term>
<listitem>
<simpara>a boolean value denoting whether the allocation was
successful or not</simpara>
</listitem>
</varlistentry>
<varlistentry>
<term>info</term>
<listitem>
<simpara>a string with information from the scripts; if
the allocation fails, this will be shown to the
user</simpara>
</listitem>
</varlistentry>
<varlistentry>
<term>nodes</term>
<listitem>
<simpara>the list of nodes computed by the algorithm; even
if the algorithm failed (i.e. success is false), this must
be returned as an empty list; also note that, for a
successful result, the length of this list must equal the
<varname>required_nodes</varname> entry in the input
message, otherwise Ganeti will consider the result as
failed</simpara>
</listitem>
</varlistentry>
</variablelist>
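<para>The rules for the three keys can be checked mechanically. The
following Python sketch is one possible reading of them, not
Ganeti's actual validation code; the name
<function>validate_response</function> and the wording of the
messages are hypothetical. It assumes the length check against
<varname>required_nodes</varname> applies to successful responses,
since failed ones must carry an empty list.</para>
<programlisting>
```python
def validate_response(response, required_nodes):
    """Return (ok, reason) for a decoded response dict (illustrative only)."""
    for key in ("success", "info", "nodes"):
        if key not in response:
            return False, "missing key: " + key
    if not isinstance(response["success"], bool):
        return False, "'success' must be a boolean"
    if not isinstance(response["nodes"], list):
        return False, "'nodes' must be a list"
    if not response["success"]:
        # A failed allocation must still return an empty node list.
        if response["nodes"]:
            return False, "failed response must have an empty 'nodes' list"
        return False, response["info"]
    if len(response["nodes"]) != required_nodes:
        return False, "expected %d nodes, got %d" % (
            required_nodes,
            len(response["nodes"]),
        )
    return True, response["info"]
```
</programlisting>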
</sect2>
</sect1>
<sect1>
<title>Examples</title>
<sect2>
<title>Input messages to scripts</title>
<simpara>Input message, new instance allocation</simpara>
<screen>
{
"cluster_tags": [],
"request": {
"required_nodes": 2,
"name": "instance3.example.com",
"tags": [
"type:test",
"owner:foo"
],
"type": "allocate",
"disks": [
{
"mode": "w",
"size": 1024
},
{
"mode": "w",
"size": 2048
}
],
"nics": [
{
"ip": null,
"mac": "00:11:22:33:44:55",
"bridge": null
}
],
"vcpus": 1,
"disk_template": "drbd",
"memory": 2048,
"disk_space_total": 3328,
"os": "etch-image"
},
"cluster_name": "cluster1.example.com",
"instances": {
"instance1.example.com": {
"tags": [],
"should_run": false,
"disks": [
{
"mode": "w",
"size": 64
},
{
"mode": "w",
"size": 512
}
],
"nics": [
{
"ip": null,
"mac": "aa:00:00:00:60:bf",
"bridge": "xen-br0"
}
],
"vcpus": 1,
"disk_template": "plain",
"memory": 128,
"nodes": [
"node1.example.com"
],
"os": "etch-image"
},
"instance2.example.com": {
"tags": [],
"should_run": false,
"disks": [
{
"mode": "w",
"size": 512
},
{
"mode": "w",
"size": 256
}
],
"nics": [
{
"ip": null,
"mac": "aa:00:00:55:f8:38",
"bridge": "xen-br0"
}
],
"vcpus": 1,
"disk_template": "drbd",
"memory": 512,
"nodes": [
"node2.example.com",
"node3.example.com"
],
"os": "etch-image"
}
},
"version": 1,
"nodes": {
"node1.example.com": {
"total_disk": 858276,
"primary_ip": "192.168.1.1",
"secondary_ip": "192.168.2.1",
"tags": [],
"free_memory": 3505,
"free_disk": 856740,
"total_memory": 4095
},
"node2.example.com": {
"total_disk": 858240,
"primary_ip": "192.168.1.3",
"secondary_ip": "192.168.2.3",
"tags": ["test"],
"free_memory": 3505,
"free_disk": 848320,
"total_memory": 4095
},
"node3.example.com": {
"total_disk": 572184,
"primary_ip": "192.168.1.3",
"secondary_ip": "192.168.2.3",
"tags": [],
"free_memory": 3505,
"free_disk": 570648,
"total_memory": 4095
}
}
}
</screen>
<simpara>Input message, relocation. Since only the request
entry in the input message is changed, the following shows only
this entry:</simpara>
<screen>
"request": {
"relocate_from": [
"node3.example.com"
],
"required_nodes": 1,
"type": "relocate",
"name": "instance2.example.com",
"disk_space_total": 832
},
</screen>
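<para>A relocation request like the one above could be handled
along the following lines. This Python sketch is only one possible
approach under stated assumptions: the name
<function>relocate</function> is hypothetical, candidates are
limited to nodes that neither appear in
<varname>relocate_from</varname> nor already host the instance, and
only free disk space is compared against
<varname>disk_space_total</varname>.</para>
<programlisting>
```python
def relocate(message):
    """Build a response dict for a 'relocate' request (illustrative only)."""
    request = message["request"]
    instance = message["instances"][request["name"]]
    nodes = message["nodes"]
    # Skip the nodes being vacated and the ones already used by the instance.
    excluded = set(request["relocate_from"]) | set(instance["nodes"])
    candidates = [
        name
        for name, node in nodes.items()
        if name not in excluded
        and node["free_disk"] >= request["disk_space_total"]
    ]
    # Prefer the node with the most free disk; break ties by name.
    candidates.sort(key=lambda name: (-nodes[name]["free_disk"], name))
    wanted = request["required_nodes"]
    if len(candidates) >= wanted:
        return {
            "success": True,
            "info": "Relocation successful",
            "nodes": candidates[:wanted],
        }
    return {"success": False, "info": "No suitable node found", "nodes": []}
```
</programlisting>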
</sect2>
<sect2>
<title>Response messages</title>
<simpara>Successful response message:</simpara>
<screen>
{
"info": "Allocation successful",
"nodes": [
"node2.example.com",
"node1.example.com"
],
"success": true
}
</screen>
<simpara>Failed response message:</simpara>
<screen>
{
"info": "Can't find a suitable node for position 2 (already selected: node2.example.com)",
"nodes": [],
"success": false
}
</screen>
</sect2>
<sect2>
<title>Command line messages</title>
<screen>
# gnt-instance add -t plain -m 2g --os-size 1g --swap-size 512m --iallocator dumb-allocator -o etch-image instance3
Selected nodes for the instance: node1.example.com
* creating instance disks...
[...]
# gnt-instance add -t plain -m 3400m --os-size 1g --swap-size 512m --iallocator dumb-allocator -o etch-image instance4
Failure: prerequisites not met for this operation:
Can't compute nodes using iallocator 'dumb-allocator': Can't find a suitable node for position 1 (already selected: )
# gnt-instance add -t drbd -m 1400m --os-size 1g --swap-size 512m --iallocator dumb-allocator -o etch-image instance5
Failure: prerequisites not met for this operation:
Can't compute nodes using iallocator 'dumb-allocator': Can't find a suitable node for position 2 (already selected: node1.example.com)
</screen>
</sect2>
</sect1>
</article>