Commit d2ac5526 authored by Iustin Pop

Documentation updates

This patch adds a man page for hscan and updates the README and other
man pages with the latest changes.
parent 0ee8fd76
......@@ -242,7 +242,7 @@ should be run::
gnt-node list -oname,mtotal,mnode,mfree,dtotal,dfree \
--separator '|' --no-headers > nodes
gnt-instance list -oname,admin_ram,sda_size,pnode,snodes \
gnt-instance list -oname,admin_ram,sda_size,status,pnode,snodes \
--separator '|' --no-head > instances
These two files should be saved under the names of 'nodes' and 'instances'.
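As an illustration of the file format these commands produce, here is a minimal Python sketch that parses one line from each file. The field order follows the `-o` lists above. The class and function names are hypothetical, and the sketch assumes that all numeric fields are plain integers and that `snodes` is a comma-separated list; neither assumption is confirmed by this patch.

```python
from typing import List, NamedTuple


class Node(NamedTuple):
    # Field order follows: -oname,mtotal,mnode,mfree,dtotal,dfree
    name: str
    mtotal: int
    mnode: int
    mfree: int
    dtotal: int
    dfree: int


class Instance(NamedTuple):
    # Field order follows: -oname,admin_ram,sda_size,status,pnode,snodes
    name: str
    admin_ram: int
    sda_size: int
    status: str
    pnode: str
    snodes: List[str]


def parse_node(line: str) -> Node:
    """Parse one '|'-separated line of the 'nodes' file."""
    name, mtotal, mnode, mfree, dtotal, dfree = line.rstrip("\n").split("|")
    # Assumption: every numeric field is a plain integer.
    return Node(name, int(mtotal), int(mnode), int(mfree),
                int(dtotal), int(dfree))


def parse_instance(line: str) -> Instance:
    """Parse one '|'-separated line of the 'instances' file."""
    name, ram, disk, status, pnode, snodes = line.rstrip("\n").split("|")
    # Assumption: secondary nodes are comma-separated, possibly empty.
    secondaries = [s for s in snodes.split(",") if s]
    return Instance(name, int(ram), int(disk), status, pnode, secondaries)
```

A line that does not have exactly six fields raises `ValueError`, which is usually preferable to silently loading misaligned data.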
......
.TH HBAL 1 2009-03-14 htools "Ganeti H-tools"
.TH HBAL 1 2009-03-22 htools "Ganeti H-tools"
.SH NAME
hbal \- Cluster balancer for Ganeti
......@@ -7,10 +7,11 @@ hbal \- Cluster balancer for Ganeti
.B "[-C]"
.B "[-p]"
.B "[-o]"
.B "-l"
.BI "[ -m " cluster "]"
.BI "[-l" limit "]"
.BI "[-O" name... "]"
.BI "[-m " cluster "]"
.BI "[-n " nodes-file " ]"
.BI "[ -i " instances-file "]"
.BI "[-i " instances-file "]"
.B hbal
.B --version
......@@ -61,6 +62,8 @@ components:
- coefficient of variance of the percent of reserved memory
- coefficient of variance of the percent of free disk
- percentage of nodes failing N+1 check
- percentage of instances living (either as primary or secondary) on
offline nodes
The free memory and free disk values help ensure that all nodes are
somewhat balanced in their resource usage. The reserved memory helps
......@@ -69,11 +72,12 @@ instances, and that no node keeps too much memory reserved for
N+1. And finally, the N+1 percentage helps guide the algorithm towards
eliminating N+1 failures, if possible.
Except for the N+1 failures, we use the coefficient of variance since
this brings the values into the same unit, so to speak, and with a
restricted domain of values (between zero and one). The percentage of
N+1 failures, while also in this numeric range, doesn't actually have
the same meaning, but it has been shown to work well.
Except for the N+1 failures and offline instances percentage, we use
the coefficient of variance since this brings the values into the same
unit, so to speak, and with a restricted domain of values (between zero
and one). The percentage of N+1 failures, while also in this numeric
range, doesn't actually have the same meaning, but it has been shown to
work well.
The other alternative, using for N+1 checks the coefficient of
variance of (N+1 fail=1, N+1 pass=0) across nodes could hint the
......@@ -82,6 +86,12 @@ already. Since this (making N+1 failures) is not allowed by other
rules of the algorithm, so the N+1 checks would simply not work
anymore in this case.
The offline instances percentage (meaning the percentage of instances
living on offline nodes) will cause the algorithm to actively move
instances away from offline nodes. This, coupled with the restriction
on placement given by offline nodes, will cause evacuation of such
nodes.
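The offline instances metric described above can be sketched in Python as follows. This is only an illustration, not the code used by hbal: the function name and the input layout (a mapping from instance name to its primary node and list of secondary nodes) are assumptions, and the value is returned as a fraction between zero and one to match the numeric range mentioned earlier.

```python
def offline_instance_fraction(instances, offline_nodes):
    """Fraction of instances with a primary or secondary on an offline node.

    instances: dict mapping instance name -> (pnode, [snodes])
    offline_nodes: iterable of node names marked as offline
    """
    if not instances:
        return 0.0
    offline = set(offline_nodes)
    affected = sum(
        1
        for pnode, snodes in instances.values()
        if pnode in offline or offline.intersection(snodes)
    )
    return affected / len(instances)
```

With no offline nodes the value is zero, so this component only influences the score while some instance still lives on an offline node.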
On a perfectly balanced cluster (all nodes the same size, all
instances the same size and spread across the nodes equally), all
values would be zero. This doesn't happen too often in practice :)
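To make the "all values would be zero" statement concrete, here is a small Python sketch of a coefficient of variance over per-node values. It assumes the usual definition (population standard deviation divided by the mean); the exact formula used by hbal is not shown in this patch, and the function name is hypothetical.

```python
import statistics


def coefficient_of_variance(values):
    """Population standard deviation divided by the mean.

    Assumption: this is the textbook definition, not necessarily
    the exact computation performed by hbal.
    """
    values = list(values)
    mean = statistics.mean(values)
    if mean == 0:
        # Avoid dividing by zero when every value is zero.
        return 0.0
    return statistics.pstdev(values) / mean


# Identical free-memory percentages on every node give zero.
balanced = coefficient_of_variance([0.5, 0.5, 0.5])

# Uneven free-memory percentages give a positive value.
unbalanced = coefficient_of_variance([0.2, 0.8])
```

Here `balanced` is zero because every node has the same value, while `unbalanced` is positive; a balancer would prefer moves that reduce the latter.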
......@@ -106,21 +116,54 @@ Prints the before and after node status, in a format designed to allow
the user to understand the node's most important parameters.
The node list will contain this information:
- a character denoting the status of the node, with '-' meaning an
offline node, '*' meaning N+1 failure and blank meaning a good
node
- the node name
- the total node memory
- the memory used by the node itself
- the free node memory
- the reserved node memory, which is the amount of free memory
needed for N+1 compliance
- total disk
- free disk
- number of primary instances
- number of secondary instances
- percent of free memory
- percent of free disk
.RS
.TP
.B F
a character denoting the status of the node, with '-' meaning an
offline node, '*' meaning N+1 failure and blank meaning a good node
.TP
.B Name
the node name
.TP
.B t_mem
the total node memory
.TP
.B n_mem
the memory used by the node itself
.TP
.B i_mem
the memory used by instances
.TP
.B x_mem
amount of memory which seems to be in use, but for which it cannot be
determined why or by which instance; usually this means that the
hypervisor has some overhead or that there are other reporting errors
.TP
.B f_mem
the free node memory
.TP
.B r_mem
the reserved node memory, which is the amount of free memory needed
for N+1 compliance
.TP
.B t_dsk
total disk
.TP
.B f_dsk
free disk
.TP
.B pri
number of primary instances
.TP
.B sec
number of secondary instances
.TP
.B p_fmem
percent of free memory
.TP
.B p_fdsk
percent of free disk
.RE
.TP
.B -o, --oneline
......@@ -134,6 +177,22 @@ The line will contain four fields:
- final cluster score
- improvement in the cluster score
.TP
.BI "-O " name
This option (which can be given multiple times) will mark nodes as
being \fIoffline\fR. This means a couple of things:
.RS
.TP
-
instances won't be placed on these nodes, not even temporarily;
e.g. the \fIreplace primary\fR move is not available if the secondary
node is offline, since this move requires a failover.
.TP
-
these nodes will not be included in the score calculation (except for
the percentage of instances on offline nodes)
.RE
.TP
.BI "-n" nodefile ", --nodes=" nodefile
The name of the file holding node information (if not collecting via
......@@ -188,6 +247,9 @@ input file).
.SH EXAMPLE
Note that these examples are not for the latest version (they don't
have full node data).
.SS Default output
With the default options, the program shows each individual step and
......@@ -362,4 +424,5 @@ changed in a way that the program will output a different solution
list (but hopefully will end in the same state).
.SH SEE ALSO
hn1(1), ganeti(7), gnt-instance(8), gnt-node(8)
.BR hn1 "(1), " hscan "(1), " ganeti "(7), " gnt-instance "(8), "
.BR gnt-node "(8)"
.TH HN1 1 2009-03-14 htools "Ganeti H-tools"
.TH HN1 1 2009-03-22 htools "Ganeti H-tools"
.SH NAME
hn1 \- N+1 fixer for Ganeti
......@@ -92,21 +92,54 @@ Prints the before and after node status, in a format designed to allow
the user to understand the node's most important parameters.
The node list will contain this information:
- a character denoting the status of the node, with '-' meaning an
offline node, '*' meaning N+1 failure and blank meaning a good
node
- the node name
- the total node memory
- the memory used by the node itself
- the free node memory
- the reserved node memory, which is the amount of free memory
needed for N+1 compliance
- total disk
- free disk
- number of primary instances
- number of secondary instances
- percent of free memory
- percent of free disk
.RS
.TP
.B F
a character denoting the status of the node, with '-' meaning an
offline node, '*' meaning N+1 failure and blank meaning a good node
.TP
.B Name
the node name
.TP
.B t_mem
the total node memory
.TP
.B n_mem
the memory used by the node itself
.TP
.B i_mem
the memory used by instances
.TP
.B x_mem
amount of memory which seems to be in use, but for which it cannot be
determined why or by which instance; usually this means that the
hypervisor has some overhead or that there are other reporting errors
.TP
.B f_mem
the free node memory
.TP
.B r_mem
the reserved node memory, which is the amount of free memory needed
for N+1 compliance
.TP
.B t_dsk
total disk
.TP
.B f_dsk
free disk
.TP
.B pri
number of primary instances
.TP
.B sec
number of secondary instances
.TP
.B p_fmem
percent of free memory
.TP
.B p_fdsk
percent of free disk
.RE
.TP
.BI "-n" nodefile ", --nodes=" nodefile
......@@ -171,4 +204,5 @@ feed moves directly into Ganeti (either via RAPI or via a gnt-debug
input file).
.SH SEE ALSO
hbal(1), ganeti(7), gnt-instance(8), gnt-node(8)
.BR hbal "(1), " hscan "(1), " ganeti "(7), " gnt-instance "(8), "
.BR gnt-node "(8)"
.TH HSCAN 1 2009-03-22 htools "Ganeti H-tools"
.SH NAME
hscan \- Scan clusters via RAPI and save node/instance data
.SH SYNOPSIS
.B hscan
.B "[-p]"
.B "[--no-headers]"
.BI "[-d " path "]"
.I cluster...
.B hscan
.B --version
.SH DESCRIPTION
hscan is a tool for scanning clusters via RAPI and saving their data
in the input format used by
.BR hbal "(1) and " hn1 "(1)."
It will also show a one-line score for each cluster scanned or, if
desired, the cluster state as shown by the \fB-p\fR option to the other
tools.
For each cluster, two files named \fIcluster\fB.instances\fR and
\fIcluster\fB.nodes\fR will be generated holding the instance and node
data. These files can then be used in \fBhbal\fR(1) or \fBhn1\fR(1)
via the \fB-i\fR and \fB-n\fR options.
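As a sketch of how the generated files might be fed back into hbal, the following Python helper builds the file names and the corresponding command line. The helper names are hypothetical; only the `.nodes` and `.instances` suffixes and the `-n` and `-i` options come from the text above. The command is returned as a list rather than executed.

```python
import os


def hscan_output_files(cluster, path="."):
    """Return the (nodes, instances) file names written for a cluster.

    path corresponds to the directory given with -d
    (the current directory by default).
    """
    nodes = os.path.join(path, cluster + ".nodes")
    instances = os.path.join(path, cluster + ".instances")
    return nodes, instances


def hbal_command(cluster, path="."):
    """Build an hbal argument list that reads the saved files."""
    nodes, instances = hscan_output_files(cluster, path)
    return ["hbal", "-n", nodes, "-i", instances]
```

The resulting list can be passed to `subprocess.run` on a machine where hbal is installed.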
.SH OPTIONS
The options that can be passed to the program are as follows:
.TP
.B -p, --print-nodes
Prints the node status for each cluster after the cluster's one-line
status display, in a format designed to allow the user to understand
the node's most important parameters. For details, see the man page
for \fBhbal\fR(1).
.TP
.BI "-d " path
Save the node and instance data for each cluster under \fIpath\fR,
instead of the current directory.
.TP
.B -V, --version
Just show the program version and exit.
.SH EXIT STATUS
The exit status of the command will be zero, unless for some reason
loading the input data failed fatally (e.g. wrong node or instance
data).
.SH BUGS
The program does not check its input data for consistency, and aborts
with cryptic error messages in this case.
.SH SEE ALSO
.BR hbal "(1), " hn1 "(1), " ganeti "(7), " gnt-instance "(8), " gnt-node "(8)"