Add iallocator documentation

Reviewed-by: imsnah

Commit b2d72ffe authored 16 years ago by Iustin Pop
--- a/doc/Makefile.am
+++ b/doc/Makefile.am
@@ -4,8 +4,10 @@ SUBDIRS = examples
 dist_doc_DATA = \
  hooks.html hooks.pdf \
  install.html install.pdf \
-  admin.html admin.pdf
-EXTRA_DIST = hooks.sgml install.sgml admin.sgml
+  admin.html admin.pdf \
+  iallocator.html iallocator.pdf
+
+EXTRA_DIST = hooks.sgml install.sgml admin.sgml iallocator.sgml
 MAINTAINERCLEANFILES = *.html *.pdf

 %.sgmltmp: %.sgml

--- a/doc/iallocator.sgml
+++ b/doc/iallocator.sgml
+<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook V4.2//EN" [
+]>
+  <article class="specification">
+  <articleinfo>
+    <title>Ganeti automatic instance allocation</title>
+  </articleinfo>
+  <para>Documents Ganeti version 1.2</para>
+  <sect1>
+    <title>Introduction</title>
+
+    <para>Currently in Ganeti the admin has to specify the exact
+    locations for an instance's node(s). This prevents a completely
+    automatic node evacuation, and is in general a nuisance.</para>
+
+    <para>The <acronym>iallocator</acronym> framework will enable
+    automatic placement via external scripts, which allows
+    customization of the cluster layout per the site's
+    requirements.</para>
+
+  </sect1>
+
+  <sect1>
+    <title>User-visible changes</title>
+
+    <para>Auto-allocation affects two parts of Ganeti's operation:
+    how the cluster knows which allocator algorithms are available,
+    and how the admin uses them when creating instances.</para>
+
+    <para>An allocation algorithm is just the filename of a program
+    installed in a defined list of directories.</para>
+
+    <sect2>
+      <title>Cluster configuration</title>
+
+      <para>At configure time, the list of the directories can be
+      selected via the
+      <option>--with-iallocator-search-path=LIST</option> option,
+      where <userinput>LIST</userinput> is a comma-separated list of
+      directories. If not given, this defaults to
+      <constant>$libdir/ganeti/iallocators</constant>, i.e. for an
+      installation under <filename class="directory">/usr</filename>,
+      this will be <filename
+      class="directory">/usr/lib/ganeti/iallocators</filename>.</para>
+
+      <para>Ganeti will then search for the allocator script in the
+      configured list of directories, in order, using the first one
+      whose filename matches the one given by the user.</para>
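+
+      <para>As a rough illustration of this lookup (not Ganeti's
+      actual code), the Python sketch below walks a list of
+      directories and returns the first file matching the requested
+      name. The function name <literal>find_allocator</literal> and
+      the check that the file is executable are assumptions made for
+      the example.</para>

```python
import os


def find_allocator(name, search_path):
    """Return the path of the first executable file called `name`.

    `search_path` is a list of directories, tried in order. Returns
    None when no directory contains a matching executable file.
    (Hypothetical helper; requiring the executable bit is an assumption.)
    """
    for directory in search_path:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

+      <para>With the default search path, a call such as
+      <literal>find_allocator("dumb-allocator",
+      ["/usr/lib/ganeti/iallocators"])</literal> would return the full
+      path of the script if it is installed there.</para>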
+
+    </sect2>
+
+    <sect2>
+      <title>Command line interface changes</title>
+
+      <para>The node selection options in instance add and instance
+      replace disks can be replaced by the new <option>--iallocator
+      <replaceable>NAME</replaceable></option> option, which causes
+      the nodes to be assigned automatically. The selected node(s)
+      will be shown as part of the command output.</para>
+
+    </sect2>
+
+  </sect1>
+
+  <sect1>
+    <title>IAllocator API</title>
+
+    <para>The protocol for communication between Ganeti and an
+    allocator script will be the following:</para>
+
+    <orderedlist>
+      <listitem>
+        <simpara>Ganeti launches the program with a single argument, a
+        filename that contains a JSON-encoded structure (the input
+        message);</simpara>
+      </listitem>
+      <listitem>
+        <simpara>if the script finishes with exit code different from
+        zero, it is considered a general failure and the full output
+        will be reported to the users; this can be the case when the
+        allocator can't parse the input message;</simpara>
+      </listitem>
+      <listitem>
+        <simpara>if the allocator finishes with exit code zero, it is
+        expected to output (on its stdout) a JSON-encoded structure
+        (the response)</simpara>
+      </listitem>
+    </orderedlist>
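+
+    <para>The three steps above could be sketched on the Ganeti side
+    roughly as follows. This is a minimal Python 3.7+ illustration,
+    not the actual implementation: <literal>run_allocator</literal>
+    and <literal>AllocatorError</literal> are hypothetical names, and
+    reporting stderr together with stdout on failure is an
+    assumption.</para>

```python
import json
import os
import subprocess
import tempfile


class AllocatorError(Exception):
    """Raised when the allocator exits with a non-zero exit code."""


def run_allocator(argv, message):
    """Run an allocator script and return its decoded response.

    `argv` is the command to launch, e.g. ["/path/to/script"]; the name
    of a temporary file holding the JSON-encoded `message` is appended
    as the single argument. (Hypothetical helper, for illustration.)
    """
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as handle:
        json.dump(message, handle)
        input_file = handle.name
    try:
        result = subprocess.run(
            list(argv) + [input_file],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            text=True,
        )
    finally:
        os.unlink(input_file)
    if result.returncode != 0:
        # General failure: hand the full output back to the caller.
        raise AllocatorError(result.stdout + result.stderr)
    # Exit code zero: stdout must hold the JSON-encoded response.
    return json.loads(result.stdout)
```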
+
+    <sect2>
+      <title>Input message</title>
+
+      <para>The input message will be the JSON encoding of a
+      dictionary containing the following:</para>
+
+      <variablelist>
+        <varlistentry>
+          <term>version</term>
+          <listitem>
+            <simpara>the version of the protocol; this document
+            specifies version 1</simpara>
+          </listitem>
+        </varlistentry>
+        <varlistentry>
+          <term>cluster_name</term>
+          <listitem>
+            <simpara>the cluster name</simpara>
+          </listitem>
+        </varlistentry>
+        <varlistentry>
+          <term>cluster_tags</term>
+          <listitem>
+            <simpara>the list of cluster tags</simpara>
+          </listitem>
+        </varlistentry>
+        <varlistentry>
+          <term>request</term>
+          <listitem>
+            <simpara>a dictionary containing the request data:</simpara>
+            <variablelist>
+              <varlistentry>
+                <term>type</term>
+                <listitem>
+                  <simpara>the request type; this can be either
+                  <literal>allocate</literal> or
+                  <literal>relocate</literal>; the
+                  <literal>allocate</literal> request is used when a
+                  new instance needs to be placed on the cluster,
+                  while the <literal>relocate</literal> request is
+                  used when an existing instance needs to be moved
+                  within the cluster</simpara>
+                </listitem>
+              </varlistentry>
+              <varlistentry>
+                <term>name</term>
+                <listitem>
+                  <simpara>the name of the instance; if the request is
+                  a relocation, then this name will be found in the
+                  list of instances (see below), otherwise it is the
+                  <acronym>FQDN</acronym> of the new
+                  instance</simpara>
+                </listitem>
+              </varlistentry>
+              <varlistentry>
+                <term>required_nodes</term>
+                <listitem>
+                  <simpara>how many nodes the algorithm should return;
+                  while this information can be deduced from the
+                  instance's disk template, it's better if this
+                  computation is left to Ganeti, as allocator
+                  scripts are then less sensitive to changes to the
+                  disk templates</simpara>
+                </listitem>
+              </varlistentry>
+              <varlistentry>
+                <term>disk_space_total</term>
+                <listitem>
+                  <simpara>the total disk space that will be used by
+                  this instance on the (new) nodes; again, this
+                  information can be computed from the list of
+                  instance disks and its template type, but Ganeti is
+                  better suited to compute it</simpara>
+                </listitem>
+              </varlistentry>
+            </variablelist>
+            <simpara>If the request is an allocation, then there are
+            extra fields in the request dictionary:</simpara>
+            <variablelist>
+              <varlistentry>
+                <term>disks</term>
+                <listitem>
+                  <simpara>list of dictionaries holding the disk
+                  definitions for this instance (in the order they are
+                  exported to the hypervisor):</simpara>
+                  <variablelist>
+                    <varlistentry>
+                      <term>mode</term>
+                      <listitem>
+                        <simpara>either <literal>r</literal> or
+                        <literal>w</literal> denoting if the disk is
+                        read-only or writable; for Ganeti 1.2, this
+                        will always be <literal>w</literal></simpara>
+                      </listitem>
+                    </varlistentry>
+                    <varlistentry>
+                      <term>size</term>
+                      <listitem>
+                        <simpara>the size of this disk in mebibytes</simpara>
+                      </listitem>
+                    </varlistentry>
+                  </variablelist>
+                </listitem>
+              </varlistentry>
+              <varlistentry>
+                <term>nics</term>
+                <listitem>
+                  <simpara>a list of dictionaries holding the network
+                  interfaces for this instance, containing:</simpara>
+                  <variablelist>
+                    <varlistentry>
+                      <term>ip</term>
+                      <listitem>
+                        <simpara>the IP address that Ganeti knows for
+                        this instance, or null</simpara>
+                      </listitem>
+                    </varlistentry>
+                    <varlistentry>
+                      <term>mac</term>
+                      <listitem>
+                        <simpara>the MAC address for this interface</simpara>
+                      </listitem>
+                    </varlistentry>
+                    <varlistentry>
+                      <term>bridge</term>
+                      <listitem>
+                        <simpara>the bridge to which this interface
+                        will be connected</simpara>
+                      </listitem>
+                    </varlistentry>
+                  </variablelist>
+                </listitem>
+              </varlistentry>
+              <varlistentry>
+                <term>vcpus</term>
+                <listitem>
+                  <simpara>the number of VCPUs for the instance</simpara>
+                </listitem>
+              </varlistentry>
+              <varlistentry>
+                <term>disk_template</term>
+                <listitem>
+                  <simpara>the disk template for the instance</simpara>
+                </listitem>
+              </varlistentry>
+              <varlistentry>
+                <term>memory</term>
+                <listitem>
+                  <simpara>the memory size for the instance</simpara>
+                </listitem>
+              </varlistentry>
+              <varlistentry>
+                <term>os</term>
+                <listitem>
+                  <simpara>the OS type for the instance</simpara>
+                </listitem>
+              </varlistentry>
+              <varlistentry>
+                <term>tags</term>
+                <listitem>
+                  <simpara>the list of the instance's tags</simpara>
+                </listitem>
+              </varlistentry>
+            </variablelist>
+            <simpara>If the request is of type relocate, then there is
+            one more entry in the request dictionary, named
+            <varname>relocate_from</varname>, and it contains a list
+            of nodes to move the instance away from; note that with
+            Ganeti 1.2, this list will always contain a single node,
+            the current secondary of the instance.</simpara>
+          </listitem>
+        </varlistentry>
+        <varlistentry>
+          <term>instances</term>
+          <listitem>
+            <simpara>a dictionary with the data for the instances
+            currently existing on the cluster, indexed by instance
+            name; the contents are similar to the instance definitions
+            for the allocate mode, with the addition of:</simpara>
+            <variablelist>
+              <varlistentry>
+                <term>should_run</term>
+                <listitem>
+                  <simpara>if this instance is set to run (but not the
+                  actual status of the instance)</simpara>
+                </listitem>
+              </varlistentry>
+              <varlistentry>
+                <term>nodes</term>
+                <listitem>
+                  <simpara>list of nodes on which this instance is
+                  placed; the primary node of the instance is always
+                  the first one</simpara>
+                </listitem>
+              </varlistentry>
+            </variablelist>
+          </listitem>
+        </varlistentry>
+        <varlistentry>
+          <term>nodes</term>
+          <listitem>
+            <simpara>dictionary with the data for the nodes in the
+            cluster, indexed by the node name; the dict
+            contains:</simpara>
+            <variablelist>
+              <varlistentry>
+                <term>total_disk</term>
+                <listitem>
+                  <simpara>the total disk size of this node
+                  (mebibytes)</simpara>
+                </listitem>
+              </varlistentry>
+              <varlistentry>
+                <term>free_disk</term>
+                <listitem>
+                  <simpara>the free disk space on the node</simpara>
+                </listitem>
+              </varlistentry>
+              <varlistentry>
+                <term>total_memory</term>
+                <listitem>
+                  <simpara>the total memory size</simpara>
+                </listitem>
+              </varlistentry>
+              <varlistentry>
+                <term>free_memory</term>
+                <listitem>
+                  <simpara>free memory on the node; note that
+                  currently this does not take into account the
+                  instances which are down on the node</simpara>
+                </listitem>
+              </varlistentry>
+              <varlistentry>
+                <term>primary_ip</term>
+                <listitem>
+                  <simpara>the primary IP address of the
+                  node</simpara>
+                </listitem>
+              </varlistentry>
+              <varlistentry>
+                <term>secondary_ip</term>
+                <listitem>
+                  <simpara>the secondary IP address of the node (the
+                  one used for the DRBD replication); note that this
+                  can be the same as the primary one</simpara>
+                </listitem>
+              </varlistentry>
+              <varlistentry>
+                <term>tags</term>
+                <listitem>
+                  <simpara>list with the tags of the node</simpara>
+                </listitem>
+              </varlistentry>
+            </variablelist>
+          </listitem>
+        </varlistentry>
+      </variablelist>
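+
+      <para>To show how an allocator script might consume these
+      fields, the Python sketch below filters the
+      <literal>nodes</literal> dictionary down to those with enough
+      free memory and disk for the request. The function name
+      <literal>candidate_nodes</literal> and the selection policy are
+      assumptions for illustration only; in particular, it assumes
+      that memory and disk values are in the same units for instances
+      and nodes, and that a relocation must avoid every node already
+      holding the instance.</para>

```python
def candidate_nodes(message):
    """Return the sorted names of nodes that can hold the instance.

    `message` is the decoded input message. A node qualifies when its
    free memory and free disk cover the request (hypothetical policy).
    """
    request = message["request"]
    if request["type"] == "allocate":
        memory = request["memory"]
        excluded = set()
    else:
        # relocate: the instance data lives in the "instances" dictionary
        instance = message["instances"][request["name"]]
        memory = instance["memory"]
        # nodes already holding the instance (this covers relocate_from)
        excluded = set(instance["nodes"])
    disk = request["disk_space_total"]
    return sorted(
        name
        for name, node in message["nodes"].items()
        if name not in excluded
        and node["free_memory"] >= memory
        and node["free_disk"] >= disk
    )
```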
+
+    </sect2>
+
+    <sect2>
+      <title>Response message</title>
+
+      <para>The response message is much simpler than the input
+      one. It is also a dict, with three keys:</para>
+      <variablelist>
+        <varlistentry>
+          <term>success</term>
+          <listitem>
+            <simpara>a boolean value denoting if the allocation was
+            successful or not</simpara>
+          </listitem>
+        </varlistentry>
+        <varlistentry>
+          <term>info</term>
+          <listitem>
+            <simpara>a string with information from the scripts; if
+            the allocation fails, this will be shown to the
+            user</simpara>
+          </listitem>
+        </varlistentry>
+        <varlistentry>
+          <term>nodes</term>
+          <listitem>
+            <simpara>the list of nodes computed by the algorithm; if
+            the algorithm failed (i.e. success is false), this must
+            still be returned, as an empty list; otherwise the length
+            of this list must equal the
+            <varname>required_nodes</varname> entry of the request in
+            the input message, or Ganeti will consider the result as
+            failed</simpara>
+          </listitem>
+        </varlistentry>
+      </variablelist>
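+
+      <para>The checks implied by these three keys could be sketched
+      as below. This is only an illustration of the rules stated
+      above, not Ganeti's own validation code;
+      <literal>check_response</literal> and its return convention are
+      hypothetical.</para>

```python
def check_response(response, required_nodes):
    """Return (ok, reason) for a decoded allocator response.

    `required_nodes` is the value sent in the request. The reason is
    an empty string when the response is acceptable.
    """
    if not isinstance(response, dict):
        return False, "response is not a dictionary"
    for key in ("success", "info", "nodes"):
        if key not in response:
            return False, "missing key: %s" % key
    if not isinstance(response["nodes"], list):
        return False, "nodes is not a list"
    if not response["success"]:
        # Failed allocation: the info string is what the user sees.
        return False, str(response["info"])
    if len(response["nodes"]) != required_nodes:
        return False, "expected %d nodes, got %d" % (
            required_nodes, len(response["nodes"]))
    return True, ""
```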
+    </sect2>
+  </sect1>
+
+  <sect1>
+    <title>Examples</title>
+    <sect2>
+      <title>Input messages to scripts</title>
+      <simpara>Input message, new instance allocation</simpara>
+      <screen>
+{
+  "cluster_tags": [],
+  "request": {
+    "required_nodes": 2,
+    "name": "instance3.example.com",
+    "tags": [
+      "type:test",
+      "owner:foo"
+    ],
+    "type": "allocate",
+    "disks": [
+      {
+        "mode": "w",
+        "size": 1024
+      },
+      {
+        "mode": "w",
+        "size": 2048
+      }
+    ],
+    "nics": [
+      {
+        "ip": null,
+        "mac": "00:11:22:33:44:55",
+        "bridge": null
+      }
+    ],
+    "vcpus": 1,
+    "disk_template": "drbd",
+    "memory": 2048,
+    "disk_space_total": 3328,
+    "os": "etch-image"
+  },
+  "cluster_name": "cluster1.example.com",
+  "instances": {
+    "instance1.example.com": {
+      "tags": [],
+      "should_run": false,
+      "disks": [
+        {
+          "mode": "w",
+          "size": 64
+        },
+        {
+          "mode": "w",
+          "size": 512
+        }
+      ],
+      "nics": [
+        {
+          "ip": null,
+          "mac": "aa:00:00:00:60:bf",
+          "bridge": "xen-br0"
+        }
+      ],
+      "vcpus": 1,
+      "disk_template": "plain",
+      "memory": 128,
+      "nodes": [
+        "node1.example.com"
+      ],
+      "os": "etch-image"
+    },
+    "instance2.example.com": {
+      "tags": [],
+      "should_run": false,
+      "disks": [
+        {
+          "mode": "w",
+          "size": 512
+        },
+        {
+          "mode": "w",
+          "size": 256
+        }
+      ],
+      "nics": [
+        {
+          "ip": null,
+          "mac": "aa:00:00:55:f8:38",
+          "bridge": "xen-br0"
+        }
+      ],
+      "vcpus": 1,
+      "disk_template": "drbd",
+      "memory": 512,
+      "nodes": [
+        "node2.example.com",
+        "node3.example.com"
+      ],
+      "os": "etch-image"
+    }
+  },
+  "version": 1,
+  "nodes": {
+    "node1.example.com": {
+      "total_disk": 858276,
+      "primary_ip": "192.168.1.1",
+      "secondary_ip": "192.168.2.1",
+      "tags": [],
+      "free_memory": 3505,
+      "free_disk": 856740,
+      "total_memory": 4095
+    },
+    "node2.example.com": {
+      "total_disk": 858240,
+      "primary_ip": "192.168.1.2",
+      "secondary_ip": "192.168.2.2",
+      "tags": ["test"],
+      "free_memory": 3505,
+      "free_disk": 848320,
+      "total_memory": 4095
+    },
+    "node3.example.com": {
+      "total_disk": 572184,
+      "primary_ip": "192.168.1.3",
+      "secondary_ip": "192.168.2.3",
+      "tags": [],
+      "free_memory": 3505,
+      "free_disk": 570648,
+      "total_memory": 4095
+    }
+  }
+}
+</screen>
+      <simpara>Input message, relocation. Since only the request
+      entry in the input message is changed, the following shows only
+      this entry:</simpara>
+      <screen>
+  "request": {
+    "relocate_from": [
+      "node3.example.com"
+    ],
+    "required_nodes": 1,
+    "type": "relocate",
+    "name": "instance2.example.com",
+    "disk_space_total": 832
+  },
+</screen>
+
+    </sect2>
+    <sect2>
+      <title>Response messages</title>
+      <simpara>Successful response message:</simpara>
+      <screen>
+{
+  "info": "Allocation successful",
+  "nodes": [
+    "node2.example.com",
+    "node1.example.com"
+  ],
+  "success": true
+}
+</screen>
+      <simpara>Failed response message:</simpara>
+      <screen>
+{
+  "info": "Can't find a suitable node for position 2 (already selected: node2.example.com)",
+  "nodes": [],
+  "success": false
+}
+</screen>
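+      <para>An allocator script could produce responses of this shape
+      with something like the Python sketch below. The helper
+      <literal>build_response</literal> is hypothetical, and simply
+      taking the first candidates in order is an assumed policy; the
+      message texts are modelled on the examples above.</para>

```python
def build_response(candidates, required_nodes):
    """Build a response dictionary from an ordered list of candidates.

    Takes the first `required_nodes` names; if there are too few, a
    failed response with an empty node list is returned instead.
    """
    selected = list(candidates[:required_nodes])
    if len(selected) == required_nodes:
        return {
            "success": True,
            "info": "Allocation successful",
            "nodes": selected,
        }
    info = "Can't find a suitable node for position %d (already selected: %s)" % (
        len(selected) + 1, ", ".join(selected))
    # On failure the node list must still be present, but empty.
    return {"success": False, "info": info, "nodes": []}
```

+      <para>The script would then write the result to its stdout as
+      JSON, for example with
+      <literal>print(json.dumps(response))</literal>, and exit with
+      code zero.</para>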
+    </sect2>
+
+    <sect2>
+      <title>Command line messages</title>
+      <screen>
+# gnt-instance add -t plain -m 2g --os-size 1g --swap-size 512m --iallocator dumb-allocator -o etch-image instance3
+Selected nodes for the instance: node1.example.com
+* creating instance disks...
+[...]
+
+# gnt-instance add -t plain -m 3400m --os-size 1g --swap-size 512m --iallocator dumb-allocator -o etch-image instance4
+Failure: prerequisites not met for this operation:
+Can't compute nodes using iallocator 'dumb-allocator': Can't find a suitable node for position 1 (already selected: )
+
+# gnt-instance add -t drbd -m 1400m --os-size 1g --swap-size 512m --iallocator dumb-allocator -o etch-image instance5
+Failure: prerequisites not met for this operation:
+Can't compute nodes using iallocator 'dumb-allocator': Can't find a suitable node for position 2 (already selected: node1.example.com)
+
+</screen>
+    </sect2>
+  </sect1>
+
+  </article>