diff --git a/doc/admin.rst b/doc/admin.rst
index 51a879462f6b024add94384fa5a9c45124879dbe..d0ed24ba2ba9337eae18cccfbff7117fb8ba212b 100644
--- a/doc/admin.rst
+++ b/doc/admin.rst
@@ -1058,6 +1058,20 @@ in the manpage.
 .. note:: this command only stores a local flag file, and if you
    failover the master, it will not have effect on the new master.

+Node auto-maintenance
++++++++++++++++++++++
+
+If the cluster parameter ``maintain_node_health`` is enabled (see the
+manpage for :command:`gnt-cluster`, the init and modify subcommands),
+then the following will happen automatically:
+
+- the watcher will shutdown any instances running on offline nodes
+- the watcher will deactivate any DRBD devices on offline nodes
+
+In the future, more actions are planned, so only enable this parameter
+if the nodes are completely dedicated to Ganeti; otherwise it might be
+possible to lose data due to auto-maintenance actions.
+
 Removing a cluster entirely
 +++++++++++++++++++++++++++

diff --git a/man/ganeti-watcher.sgml b/man/ganeti-watcher.sgml
index b6e94658574d7ca12629fc592cdf790839c3da25..9048baa717c8c437196f8eea0b74f47fc20d2d55 100644
--- a/man/ganeti-watcher.sgml
+++ b/man/ganeti-watcher.sgml
@@ -48,31 +48,68 @@
 <para>
 The <command>&dhpackage;</command> is a periodically run script
 which is responsible for keeping the instances in the correct
- status.
+ status. It has two separate functions, one for the master node
+ and another one that runs on every node.
 </para>

- <para>
- Its primary function is to try to keep running all instances
- which are marked as <emphasis>up</emphasis> in the configuration
- file, by trying to start them a limited number of times.
- </para>
+ <refsect2>
+ <title>Master operations</title>

- <para>
- Its other function is to <quote>repair</quote> DRBD links by
- reactivating the block devices of instances which have
- secondaries on nodes that have been rebooted.
- </para>
+ <para>
+ Its primary function is to try to keep running all instances
+ which are marked as <emphasis>up</emphasis> in the configuration
+ file, by trying to start them a limited number of times.
+ </para>

- <para>
- The watcher does synchronous queries but will submit jobs for
- executing the changes. Due to locking, it could be that the jobs
- execute much later than the watcher executes them.
- </para>
+ <para>
+ Its other function is to <quote>repair</quote> DRBD links by
+ reactivating the block devices of instances which have
+ secondaries on nodes that have been rebooted.
+ </para>
+
+ </refsect2>
+
+ <refsect2>
+
+ <title>Node operations</title>
+
+ <para>
+ The watcher will restart any down daemons that are appropriate
+ for the current node.
+ </para>
+
+ <para>
+ In addition, it will execute any scripts which exist under the
+ <quote>watcher</quote> directory in the ganeti hooks directory
+ (@SYSCONFDIR@/ganeti/hooks). This should be used for
+ lightweight actions, like starting any extra daemons.
+ </para>
+
+ <para>
+ If the cluster
+ parameter <literal>maintain_node_health</literal> is enabled,
+ then the watcher will also shutdown instances and DRBD devices
+ if the node is declared as offline by known master candidates.
+ </para>
+
+ <para>
+ The watcher does synchronous queries but will submit jobs for
+ executing the changes. Due to locking, it could be that the jobs
+ execute much later than the watcher executes them.
+ </para>
+
+ </refsect2>
+
+
+ </refsect1>
+
+ <refsect1>
+ <title>FILES</title>

 <para>
 The command has a state file located at
- <filename>@LOCALSTATEDIR@/lib/ganeti/watcher.data</filename> and a log
- file at
+ <filename>@LOCALSTATEDIR@/lib/ganeti/watcher.data</filename>
+ (only used on the master) and a log file at
 <filename>@LOCALSTATEDIR@/log/ganeti/watcher.log</filename>. Removal
 of either file will not affect correct operation; the removal of
 the state file will just cause the restart counters for the