Ganeti Node OOB Management Framework
====================================

Objective
---------

Extend Ganeti with Out of Band (:term:`OOB`) Cluster Node Management
Capabilities.

Background
----------

Ganeti currently has no support for Out of Band management of the nodes
in a cluster. It relies on the OS running on the nodes and therefore
has limited options when the OS is not responding. The command
``gnt-node powercycle`` can be issued to attempt a reboot of a node that
crashed, but there is no way to power a node off and power it back
on. Supporting this is useful in the following situations:

  * **Emergency Power Off**: During emergencies, time is critical and
    manual tasks just add latency which can be avoided through
    automation. If a server room overheats, halting the OS on the nodes
    is not enough. The nodes need to be powered off cleanly to prevent
    damage to equipment.
  * **Repairs**: In most cases, repairing a node means that the node has
    to be powered off.
  * **Crashes**: Software bugs may crash a node. Having an OS
    independent way to power-cycle a node helps to recover the node
    without human intervention.

Overview
--------

Ganeti will be extended with OOB capabilities through adding a new
**Cluster Parameter** (``--oob-program``), a new **Node Property**
(``--oob-program``), a new **Node State (powered)** and support in
``gnt-node`` for invoking an **External Helper Command** which executes
the actual OOB command (``gnt-node <command> nodename ...``). The
supported commands are: ``power on``, ``power off``, ``power cycle``,
``power status`` and ``health``.

.. note::
  The new **Node State (powered)** is a **State of Record**
  (:term:`SoR`), not a **State of World** (:term:`SoW`).  The maximum
  execution time of the **External Helper Command** will be limited to
  60s to prevent the cluster from getting locked for an undefined amount
  of time.

Detailed Design
---------------

New ``gnt-cluster`` Parameter
+++++++++++++++++++++++++++++

| Program: ``gnt-cluster``
| Command: ``modify|init``
| Parameters: ``--oob-program``
| Options: ``--oob-program``: executable OOB program (absolute path)

New ``gnt-cluster epo`` Command
+++++++++++++++++++++++++++++++

| Program: ``gnt-cluster``
| Command: ``epo``
| Parameter: ``--on`` ``--force`` ``--groups`` ``--all``
| Options: ``--on``: By default epo turns off, with ``--on`` it tries to get the
|                    cluster back online
|          ``--force``: To force the operation without asking for confirmation
|          ``--groups``: To operate on groups instead of nodes
|          ``--all``: To operate on the whole cluster

This is a convenience command to allow easy emergency power off of a
whole cluster or part of it. It takes care of all steps needed to get
the cluster into a sane state to turn off the nodes.

With ``--on`` it does the reverse and tries to bring the rest of the
cluster back to life.

.. note::
  The master node is not able to shut itself down cleanly. On multi-node
  clusters the command tries to find another master; if that is not
  possible, it prepares everything up to the point where the user has to
  shut down the master node manually. The same applies to single-node
  clusters, so there the command cannot do all of the work either.

New ``gnt-node`` Property
+++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``modify|add``
| Parameters: ``--oob-program``
| Options: ``--oob-program``: executable OOB program (absolute path)

.. note::
  If ``--oob-program`` is set to ``!`` then the node has no OOB
  capabilities. Otherwise, the node inherits the value set on its node
  group or, failing that, the cluster-wide value. In other words, nodes
  have to opt out of OOB capabilities.
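
The inheritance rule in the note above could be sketched as follows. This
is only an illustration: the function name, the ``None`` convention for
"not set", and the precedence order (node, then node group, then cluster)
are assumptions made here, not part of the design.

```python
from typing import Optional

OPT_OUT = "!"  # marker from the design: no OOB capabilities


def effective_oob_program(node_value: Optional[str],
                          group_value: Optional[str],
                          cluster_value: Optional[str]) -> Optional[str]:
    """Return the OOB program to use for a node, or None if it has none.

    Assumption: the first level that has a value wins, checked in the
    order node, node group, cluster. None means "not set" at that level.
    """
    for value in (node_value, group_value, cluster_value):
        if value is not None:
            # "!" explicitly opts out of OOB management.
            return None if value == OPT_OUT else value
    return None
```

With this sketch, ``effective_oob_program("!", None, "/usr/bin/oob")``
yields ``None``, while a node without its own setting picks up the group
or cluster value.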

Addition to ``gnt-cluster verify``
++++++++++++++++++++++++++++++++++

| Program: ``gnt-cluster``
| Command: ``verify``
| Parameter: None
| Option: None
| Additional Checks:

  1. Existence and execution flag of the OOB program on all Master
     Candidates, if the cluster parameter ``--oob-program`` is set or at
     least one node has the property ``--oob-program`` set. The OOB
     helper itself is only invoked on the master.
  2. Whether the node state powered matches the actual power state of
     the machine, for those nodes where ``--oob-program`` is set.
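
The first check can be illustrated with a small sketch. The function name
and the message texts are hypothetical; the design only requires that the
program exists and has its execution flag set.

```python
import os


def check_oob_program(path):
    """Return a list of problems found with an OOB program path.

    An empty list means the path passed all checks. The message texts
    are illustrative only.
    """
    if not os.path.isabs(path):
        return ["%s is not an absolute path" % path]
    if not os.path.isfile(path):
        return ["%s does not exist or is not a regular file" % path]
    if not os.access(path, os.X_OK):
        return ["%s is not executable" % path]
    return []
```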

New Node State
++++++++++++++

Ganeti supports the following two boolean states related to the nodes:

**drained**
  The cluster still communicates with drained nodes but excludes them
  from allocation operations

**offline**
  if offline, the cluster does not communicate with offline nodes;
  useful for nodes that are not reachable in order to avoid delays

This list will be extended with the following boolean state:

**powered**
  if not powered, the cluster does not communicate with the node; if
  the node property ``--oob-program`` is not set, the state powered is
  not displayed

Additionally modify the meaning of the offline state as follows:

**offline**
  if offline, the cluster does not communicate with offline nodes
  (**with the exception of OOB commands for nodes where**
  ``--oob-program`` **is set**); useful for nodes that are not reachable
  in order to avoid delays

The corresponding command extensions are:

| Program: ``gnt-node``
| Command: ``info``
| Parameter:  [ ``nodename`` ... ]
| Option: None

Additional Output (:term:`SoR`, omitted if node property
``--oob-program`` is not set):

powered: ``[True|False]``

| Program: ``gnt-node``
| Command: ``modify``
| Parameter: nodename
| Option: [ ``--powered=yes|no`` ]
| Reasoning: sometimes you will need to sync the :term:`SoR` with the :term:`SoW` manually
| Caveat: ``--powered`` can only be modified if ``--oob-program`` is set for
|         the node in question

New ``gnt-node`` commands: ``power [on|off|cycle|status]``
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``power [on|off|cycle|status]``
| Parameters: [ ``nodename`` ... ]
| Options: None
| Caveats:

  * If no nodenames are passed to ``power [on|off|cycle]``, the user
    will be prompted with ``"Do you really want to power [on|off|cycle]
    the following nodes: <display list of OOB capable nodes in the
    cluster>? (y/n)"``
  * For ``power-status``, nodename is optional. If omitted, we list the
    power-status of all OOB capable nodes in the cluster (:term:`SoW`)
  * The user should be warned and needs to confirm with yes when trying
    to ``power [off|cycle]`` a node with running instances.

Error Handling
^^^^^^^^^^^^^^

+-----------------------------+----------------------------------------------+
| Exception                   | Error Message                                |
+=============================+==============================================+
| OOB program return code != 0| OOB program execution failed ($ERROR_MSG)    |
+-----------------------------+----------------------------------------------+
| OOB program execution time  | OOB program execution timeout exceeded, OOB  |
| exceeds 60s                 | program execution aborted                    |
+-----------------------------+----------------------------------------------+

Node State Changes
^^^^^^^^^^^^^^^^^^

+----------------+---------------+----------------+--------------------------+
| State before   |Command        | State after    | Comment                  |
| execution      |               | execution      |                          |
+================+===============+================+==========================+
| powered: False |``power off``  | powered: False | FYI: IPMI will complain  |
|                |               |                | if you try to power off  |
|                |               |                | a machine that is already|
|                |               |                | powered off              |
+----------------+---------------+----------------+--------------------------+
| powered: False |``power cycle``| powered: False | FYI: IPMI will complain  |
|                |               |                | if you try to cycle a    |
|                |               |                | machine that is already  |
|                |               |                | powered off              |
+----------------+---------------+----------------+--------------------------+
| powered: False |``power on``   | powered: True  |                          |
+----------------+---------------+----------------+--------------------------+
| powered: True  |``power off``  | powered: False |                          |
+----------------+---------------+----------------+--------------------------+
| powered: True  |``power cycle``| powered: True  |                          |
+----------------+---------------+----------------+--------------------------+
| powered: True  |``power on``   | powered: True  | FYI: IPMI will complain  |
|                |               |                | if you try to power on   |
|                |               |                | a machine that is already|
|                |               |                | powered on               |
+----------------+---------------+----------------+--------------------------+

.. note::

  * If the command fails, the Node State remains unchanged.
  * We will not prevent the user from trying to power off a node that is
    already powered off since the powered state represents the
    :term:`SoR` only and not the :term:`SoW`. This can however create
    problems when the cluster administrator wants to bring the
    :term:`SoR` in sync with the :term:`SoW` without actually having to
    touch the node(s). For this case, we allow direct modification
    of the powered state through the ``gnt-node modify
    --powered=[yes|no]`` command as long as the node has OOB
    capabilities (i.e. ``--oob-program`` is set).
  * All node power state changes will be logged.
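
The state table and the failure rule above amount to a small transition
function on the recorded (:term:`SoR`) state. The sketch below is one way
to express it; the function name and the ``succeeded`` flag are
illustrative assumptions, not Ganeti code.

```python
def powered_after(powered_before, command, succeeded=True):
    """Return the recorded powered state after a power command.

    command is one of "on", "off" or "cycle". If the command failed,
    the recorded state stays unchanged, as stated in the note above.
    """
    if not succeeded:
        return powered_before
    if command == "on":
        return True
    if command == "off":
        return False
    if command == "cycle":
        # Per the table, a power cycle leaves the recorded state as is.
        return powered_before
    raise ValueError("unknown power command: %r" % (command,))
```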

Node Power Status Listing (:term:`SoW`)
+++++++++++++++++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``power-status``
| Parameters: [ ``nodename`` ... ]

Example output (represents :term:`SoW`)::

  gnt-node oob power-status
  Node                      Power Status
  node1.example.com         on
  node2.example.com         off
  node3.example.com         on
  node4.example.com         unknown

.. note::

  * We use ``unknown`` in case the Helper Program could not determine
    the power state.
  * If no nodenames are provided, we will list the power state of all
    nodes which are not opted out from OOB management.
  * Only nodes which are not opted out from OOB management will be
    listed.  Invoking the command on a node that does not meet this
    condition will result in an error message "Node X does not support
    OOB commands".

Node Power Status Listing (:term:`SoR`)
+++++++++++++++++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``info``
| Parameter:  [ ``nodename`` ... ]
| Option: None

Example output (represents :term:`SoR`)::

  gnt-node info node1.example.com
  Node name: node1.example.com
    primary ip: 192.168.1.1
    secondary ip: 192.168.2.1
    master candidate: True
    drained: False
    offline: False
    powered: True
    primary for instances:
      - inst1.example.com
      - inst2.example.com
      - inst3.example.com
    secondary for instances:
      - inst4.example.com
      - inst5.example.com
      - inst6.example.com
      - inst7.example.com

.. note::
  Only nodes which are not opted out from OOB management will report the
  powered state.

New ``gnt-node`` oob subcommand: ``health``
+++++++++++++++++++++++++++++++++++++++++++

| Program: ``gnt-node``
| Command: ``health``
| Parameters: [ ``nodename`` ... ]
| Options: None
| Example: ``/usr/bin/oob health node5.example.com``

Caveats:

  * If no nodename(s) are provided, we will report the health of all
    nodes in the cluster which have ``--oob-program`` set.
  * Only nodes which are not opted out from OOB management will report
    their health. Invoking the command on a node that does not meet this
    condition will result in an error message "Node does not support OOB
    commands".

For error handling see `Error Handling`_

OOB Program (Helper Program) Parameters, Return Codes and Data Format
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

| Program: executable OOB program (absolute path)
| Parameters: command nodename
| Command: [power-{on|off|cycle|status}|health]
| Options: None
| Example: ``/usr/bin/oob power-on node1.example.com``
| Caveat: maximum runtime is limited to 60s
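
To make the calling convention concrete, here is a minimal sketch of the
core of a helper program written in Python. It is not a real helper:
``query_power`` and ``handle`` are hypothetical names, and the power and
health values are hard-coded placeholders where a real helper would talk
to the management hardware.

```python
import json

COMMANDS = ("power-on", "power-off", "power-cycle", "power-status", "health")


def query_power(nodename):
    """Placeholder: a real helper would query the management board here."""
    return True


def handle(command, nodename):
    """Return (exit_code, stdout_text, stderr_text) for one helper call."""
    if command not in COMMANDS:
        return 1, "", "unsupported command: %s" % command
    if command == "power-status":
        return 0, json.dumps({"powered": query_power(nodename)}), ""
    if command == "health":
        # Placeholder data; the item list is up to the helper author.
        return 0, json.dumps([["Ambient Temp", "OK"]]), ""
    # power-on, power-off and power-cycle produce no output on success.
    return 0, "", ""
```

A complete script would read the command and nodename from ``sys.argv``,
write the two texts to stdout and stderr, and exit with the returned
code.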

Return Codes
^^^^^^^^^^^^

+-------------+-------------------------+
| Return code | Meaning                 |
+=============+=========================+
| 0           | Command succeeded       |
+-------------+-------------------------+
| 1           | Command failed          |
+-------------+-------------------------+
| others      | Unsupported/undefined   |
+-------------+-------------------------+

Error messages are passed from the helper program to Ganeti through
:manpage:`stderr(3)` (return code == 1).  On :manpage:`stdout(3)`, the
helper program will send data back to Ganeti (return code == 0). The
format of the data is JSON.

+-----------------+------------------------------+
| Command         | Expected output              |
+=================+==============================+
| ``power-on``    | None                         |
+-----------------+------------------------------+
| ``power-off``   | None                         |
+-----------------+------------------------------+
| ``power-cycle`` | None                         |
+-----------------+------------------------------+
| ``power-status``| ``{ "powered": true|false }``|
+-----------------+------------------------------+
| ``health``      | ::                           |
|                 |                              |
|                 |   [[item, status],           |
|                 |    [item, status],           |
|                 |    ...]                      |
+-----------------+------------------------------+
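
On the calling side, the return code, the 60s limit and the JSON output
could be handled roughly as sketched below. ``OobError`` and
``run_oob_helper`` are hypothetical names, not existing Ganeti code; the
error texts are taken from the `Error Handling`_ table.

```python
import json
import subprocess

OOB_TIMEOUT = 60  # seconds, the limit stated in this design


class OobError(Exception):
    """Hypothetical error type for failed helper invocations."""


def run_oob_helper(program, command, nodename, timeout=OOB_TIMEOUT):
    """Run the helper and return its decoded JSON output, or None."""
    try:
        result = subprocess.run([program, command, nodename],
                                capture_output=True, text=True,
                                timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising this.
        raise OobError("OOB program execution timeout exceeded, "
                       "OOB program execution aborted")
    if result.returncode != 0:
        raise OobError("OOB program execution failed (%s)"
                       % result.stderr.strip())
    output = result.stdout.strip()
    # power-on, power-off and power-cycle are expected to print nothing.
    return json.loads(output) if output else None
```

For example, ``run_oob_helper("/usr/bin/oob", "power-status",
"node1.example.com")`` would return a dictionary such as ``{"powered":
True}`` if the helper behaves as specified.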

Data Format
^^^^^^^^^^^

For the health output, the fields are:

+--------+------------------------------------------------------------------+
| Field  | Meaning                                                          |
+========+==================================================================+
| item   | String identifier of the item we are querying the health of,     |
|        | examples:                                                        |
|        |                                                                  |
|        |   * Ambient Temp                                                 |
|        |   * PS Redundancy                                                |
|        |   * FAN 1 RPM                                                    |
+--------+------------------------------------------------------------------+
| status | String; Can take one of the following four values:               |
|        |                                                                  |
|        |   * OK                                                           |
|        |   * WARNING                                                      |
|        |   * CRITICAL                                                     |
|        |   * UNKNOWN                                                      |
+--------+------------------------------------------------------------------+

.. note::

  * The item output list is defined by the Helper Program. It is up to
    the author of the Helper Program to decide which items should be
    monitored and what each corresponding return status is.
  * Ganeti will currently not take any actions based on the item
    status. It will however create log entries for items with status
    WARNING or CRITICAL for each run of the ``gnt-node oob health
    nodename`` command. Automatic actions (regular monitoring of the
    item status) are considered a new service and will be treated in a
    separate design document.
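
A sketch of how the health data could be validated and filtered for
logging is shown below. The function names are illustrative, and
rejecting malformed input with ``ValueError`` is an assumption; the
design does not say how invalid helper output is treated.

```python
import json

VALID_STATUSES = frozenset(["OK", "WARNING", "CRITICAL", "UNKNOWN"])
LOGGED_STATUSES = frozenset(["WARNING", "CRITICAL"])


def parse_health(raw):
    """Decode helper output into a list of (item, status) tuples."""
    data = json.loads(raw)
    if not isinstance(data, list):
        raise ValueError("health output must be a JSON list")
    items = []
    for entry in data:
        if not isinstance(entry, list) or len(entry) != 2:
            raise ValueError("malformed health entry: %r" % (entry,))
        item, status = entry
        if not isinstance(item, str) or not isinstance(status, str):
            raise ValueError("item and status must be strings")
        if status not in VALID_STATUSES:
            raise ValueError("invalid status: %r" % (status,))
        items.append((item, status))
    return items


def items_to_log(items):
    """Return only the entries that warrant a log entry."""
    return [(item, status) for item, status in items
            if status in LOGGED_STATUSES]
```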

Logging
-------

The ``gnt-node power-[on|off]`` (power state changes) commands will
create log entries following current Ganeti logging practices. In
addition, health items with status WARNING or CRITICAL will be logged
for each run of ``gnt-node health``.

.. vim: set textwidth=72 :
.. Local Variables:
.. mode: rst
.. fill-column: 72
.. End: