HAREP(1) Ganeti | Version @GANETI_VERSION@
==========================================

NAME
----

harep - Ganeti auto-repair tool

SYNOPSIS
--------

**harep** [ [**-L** | **\--luxi** ] = *socket* ] [ **\--job-delay** = *seconds* ]

**harep** \--version

DESCRIPTION
-----------

Harep is the Ganeti auto-repair tool. It is able to detect that an instance is
broken and to generate a sequence of jobs that will fix it, in accordance with
the policies set by the administrator.

Harep is able to recognize what state an instance is in (healthy, suspended,
needs repair, repair disallowed, pending repair, repair failed)
and to lead it through a sequence of steps that will bring the instance
back to the healthy state. Therefore, harep is mainly meant to be run regularly
and frequently using a cron job, so that it can actually follow the instance
through the whole process. At every run, harep updates the tags it adds to
instances to describe their repair status, and submits the jobs that actually
perform the required repair operations.

By default, harep only reports on the health status of instances and does not
perform any action, since repair actions are potentially dangerous. It will
only touch instances that it has been explicitly authorized to work on.

The tags that enable harep can be attached to single instances, to a
nodegroup, or to the whole cluster; in the latter two cases they affect all the
instances the object contains. The possible tags share the common structure::

 ganeti:watcher:autorepair:<type>

where ``<type>`` can have the following values:

* ``fix-storage``: allow disk replacement or fix the backend without affecting
  the instance itself (for example, a broken DRBD secondary)
* ``migrate``: allow instance migration
* ``failover``: allow instance reboot on the secondary node
* ``reinstall``: allow disks to be recreated and the instance to be reinstalled

Each element in the list of tags includes all the authorizations of the
previous one, with ``fix-storage`` being the least powerful and ``reinstall``
being the most powerful.
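
The tag structure and the ordering of the repair types can be sketched in a
few lines of Python. This is only an illustration of the rules described above,
not harep's actual implementation: the names ``TAG_PREFIX``, ``REPAIR_TYPES``,
``parse_autorepair_tag`` and ``allows`` are hypothetical.

.. code-block:: python

    # Illustrative sketch only; these names are hypothetical, not part of Ganeti.
    TAG_PREFIX = "ganeti:watcher:autorepair:"

    # Ordered from least to most powerful, as in the list above.
    REPAIR_TYPES = ["fix-storage", "migrate", "failover", "reinstall"]


    def parse_autorepair_tag(tag):
        """Return the repair type named by an enabling tag, or None."""
        if not tag.startswith(TAG_PREFIX):
            return None
        repair_type = tag[len(TAG_PREFIX):]
        return repair_type if repair_type in REPAIR_TYPES else None


    def allows(granted, requested):
        """Return True if the granted type also authorizes the requested one."""
        return REPAIR_TYPES.index(granted) >= REPAIR_TYPES.index(requested)

With these definitions, ``allows("failover", "migrate")`` is true because
``failover`` comes later in the list, while ``allows("fix-storage",
"reinstall")`` is false.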

If multiple autorepair tags apply to the same instance, only one of them can
actually be active. The conflict is resolved according to the following rules:

#. if multiple tags are set on the same object, the least destructive takes
   precedence.

#. if the tags are set on different objects, the nearest tag wins.

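Example:
A cluster has instances I1 and I2, where I1 has the ``failover`` tag, and
the cluster has both ``fix-storage`` and ``reinstall``.
The I1 instance will be allowed to ``failover``, because its own tag is the
nearest one. The I2 instance has no tag of its own, so the cluster tags apply
and it is only allowed to ``fix-storage``, the least destructive of the two.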
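
The two rules can be sketched as follows. This is a minimal illustration, not
harep's real code: the function ``effective_repair_type`` and its arguments are
hypothetical, and it assumes that "nearest" means instance first, then
nodegroup, then cluster.

.. code-block:: python

    # Illustrative sketch only; the function and its arguments are hypothetical.
    # Repair types ordered from least to most destructive.
    REPAIR_TYPES = ["fix-storage", "migrate", "failover", "reinstall"]


    def effective_repair_type(instance_types, group_types, cluster_types):
        """Pick the single active repair type for an instance, or None.

        Each argument is a list of the repair types set on that object.
        """
        # Rule 2: the nearest object carrying any tag wins
        # (assumed order: instance, then nodegroup, then cluster).
        for types in (instance_types, group_types, cluster_types):
            if types:
                # Rule 1: on a single object, the least destructive type wins.
                return min(types, key=REPAIR_TYPES.index)
        return None


    cluster = ["fix-storage", "reinstall"]
    print(effective_repair_type(["failover"], [], cluster))  # I1
    print(effective_repair_type([], [], cluster))            # I2

Under these assumptions the two calls print ``failover`` and ``fix-storage``,
matching the example above.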


OPTIONS
-------

The options that can be passed to the program are as follows:

-L *socket*, \--luxi=*socket*
  collect data via Luxi, optionally using the given *socket* path.

\--job-delay=*seconds*
  insert this much delay before the execution of repair jobs to allow the tool
  to continue processing instances.

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: