diff --git a/hbal.1 b/hbal.1
index 352a8c7a297b8cbc4d1a30dc5362ad4b0afbdcaf..2d98c9598ed2f384bea13138faf31747cfe6f008 100644
--- a/hbal.1
+++ b/hbal.1
@@ -1,4 +1,4 @@
-.TH HBAL 1 2009-03-22 htools "Ganeti H-tools"
+.TH HBAL 1 2009-03-23 htools "Ganeti H-tools"
 .SH NAME
 hbal \- Cluster balancer for Ganeti
 
@@ -41,11 +41,23 @@ The possible move type for an instance are combinations of
 failover/migrate and replace-disks such that we change one of the
 instance nodes, and the other one remains (but possibly with changed
 role, e.g. from primary it becomes secondary). The list is:
-  - failover (f)
-  - replace secondary (r)
-  - replace primary, a composite move (f, r, f)
-  - failover and replace secondary, also composite (f, r)
-  - replace secondary and failover, also composite (r, f)
+.RS 4
+.TP 3
+\(em
+failover (f)
+.TP
+\(em
+replace secondary (r)
+.TP
+\(em
+replace primary, a composite move (f, r, f)
+.TP
+\(em
+failover and replace secondary, also composite (f, r)
+.TP
+\(em
+replace secondary and failover, also composite (r, f)
+.RE
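The five move types above decompose into sequences of the two primitive operations. A small illustrative Python table (the names and this representation are mine, not part of hbal):

```python
# Move types expressed as sequences of the two primitive operations:
# "f" = failover/migrate, "r" = replace secondary disks.
MOVES = {
    "failover": ("f",),
    "replace-secondary": ("r",),
    "replace-primary": ("f", "r", "f"),          # composite: f, r, f
    "failover-and-replace-secondary": ("f", "r"),
    "replace-secondary-and-failover": ("r", "f"),
}
```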
 
 We don't do the only remaining possibility of replacing both nodes
 (r,f,r,f or the equivalent f,r,f,r) since this move needs an
@@ -58,12 +70,24 @@ give better scores but will result in more disk replacements.
 As said before, the algorithm tries to minimise the cluster score at
 each step. Currently this score is computed as a sum of the following
 components:
-  - coefficient of variance of the percent of free memory
-  - coefficient of variance of the percent of reserved memory
-  - coefficient of variance of the percent of free disk
-  - percentage of nodes failing N+1 check
-  - percentage of instances living (either as primary or secondary) on
-    offline nodes
+.RS 4
+.TP 3
+\(em
+coefficient of variance of the percent of free memory
+.TP
+\(em
+coefficient of variance of the percent of reserved memory
+.TP
+\(em
+coefficient of variance of the percent of free disk
+.TP
+\(em
+percentage of nodes failing N+1 check
+.TP
+\(em
+percentage of instances living (either as primary or secondary) on
+offline nodes
+.RE
 
 The free memory and free disk values help ensure that all nodes are
 somewhat balanced in their resource usage. The reserved memory helps
@@ -96,13 +120,29 @@ On a perfectly balanced cluster (all nodes the same size, all
 instances the same size and spread across the nodes equally), all
 values would be zero. This doesn't happen too often in practice :)
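The score computation can be sketched as follows in Python. This is only an illustration of the components listed above, assuming equal weighting; hbal's actual weights and implementation may differ:

```python
import statistics

def coeff_of_variance(values):
    """Coefficient of variance: population stddev divided by the mean."""
    mean = statistics.mean(values)
    return statistics.pstdev(values) / mean if mean else 0.0

def cluster_score(free_mem_pct, reserved_mem_pct, free_disk_pct,
                  n1_fail_pct, offline_inst_pct):
    """Sum the five components; the first three are per-node percentage
    lists, the last two are cluster-wide percentages."""
    return (coeff_of_variance(free_mem_pct)
            + coeff_of_variance(reserved_mem_pct)
            + coeff_of_variance(free_disk_pct)
            + n1_fail_pct
            + offline_inst_pct)
```

On a perfectly balanced cluster every per-node list is constant and the failure percentages are zero, so the score is zero, as described above.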
 
+.SS OFFLINE INSTANCES
+
+Since current Ganeti versions do not report the memory used by offline
+(down) instances, ignoring the run status of instances would lead to
+incorrect calculations. For this reason, the algorithm subtracts the
+memory size of down instances from the free node memory of their
+primary node, in effect simulating the startup of such instances.
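The adjustment described above can be sketched as follows. This is an illustrative Python rendering (the data layout is mine), not hbal's actual code:

```python
def adjust_free_memory(free_mem, instances):
    """Simulate starting down instances: charge each down instance's
    memory to its primary node's free memory.

    free_mem: dict of node name -> free memory (MiB)
    instances: list of dicts with 'primary', 'mem', 'running' keys
    """
    adjusted = dict(free_mem)
    for inst in instances:
        if not inst["running"]:
            adjusted[inst["primary"]] -= inst["mem"]
    return adjusted
```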
+
 .SS OTHER POSSIBLE METRICS
 
 It would be desirable to add more metrics to the algorithm, especially
 dynamically-computed metrics, such as:
-  - CPU usage of instances, combined with VCPU versus PCPU count
-  - Disk IO usage
-  - Network IO
+.RS 4
+.TP 3
+\(em
+CPU usage of instances, combined with VCPU versus PCPU count
+.TP
+\(em
+Disk IO usage
+.TP
+\(em
+Network IO
+.RE
 
 .SH OPTIONS
 The options that can be passed to the program are as follows:
@@ -172,26 +212,40 @@ when one wants to look at multiple clusters at once and check their
 status.
 
 The line will contain four fields:
-  - initial cluster score
-  - number of steps in the solution
-  - final cluster score
-  - improvement in the cluster score
+.RS
+.RS 4
+.TP 3
+\(em
+initial cluster score
+.TP
+\(em
+number of steps in the solution
+.TP
+\(em
+final cluster score
+.TP
+\(em
+improvement in the cluster score
+.RE
+.RE
 
 .TP
 .BI "-O " name
 This option (which can be given multiple times) will mark nodes as
 being \fIoffline\fR. This means a couple of things:
 .RS
-.TP
--
+.RS 4
+.TP 3
+\(em
 instances won't be placed on these nodes, not even temporarily;
 e.g. the \fIreplace primary\fR move is not available if the secondary
 node is offline, since this move requires a failover.
 .TP
--
+\(em
 these nodes will not be included in the score calculation (except for
 the percentage of instances on offline nodes)
 .RE
+.RE
 
 .TP
 .BI "-n" nodefile ", --nodes=" nodefile
@@ -241,6 +295,9 @@ with cryptic errors messages in this case.
 
 The algorithm is not perfect.
 
+The algorithm doesn't deal with non-\fBdrbd\fR instances, and chokes
+on input data which has such instances.
+
 The output format is not easily scriptable, and the program should
 feed moves directly into Ganeti (either via RAPI or via a gnt-debug
 input file).
diff --git a/hn1.1 b/hn1.1
index 34455f5b759d0edf958acdd8c0d72c44d0abef66..648d5cdaeb5dcee29b59c37223424ef0adc1eabd 100644
--- a/hn1.1
+++ b/hn1.1
@@ -1,4 +1,4 @@
-.TH HN1 1 2009-03-22 htools "Ganeti H-tools"
+.TH HN1 1 2009-03-23 htools "Ganeti H-tools"
 .SH NAME
 hn1 \- N+1 fixer for Ganeti
 
@@ -47,38 +47,113 @@ candidate for replacement (and only these!).
 
 The program then starts with \fIdepth\fR one (unless overridden via the
 \fB-d\fR option), and at each round:
-  - it tries to remove from the cluster as many instances as the
-    current depth in order to make the cluster N+1 compliant
-  - then, for each of the possible instance combinations that allow
-    this (unless the total size is reduced via the \fB-r\fR option),
-    it tries to put them back on the cluster while maintaining N+1
-    compliance
+.RS 4
+.TP 3
+\(em
+it tries to remove from the cluster a number of instances equal to the
+current depth, in order to make the cluster N+1 compliant
+
+.TP
+\(em
+then, for each of the possible instance combinations that allow this
+(unless the total size is reduced via the \fB-r\fR option), it tries
+to put them back on the cluster while maintaining N+1 compliance
+.RE
 
 At a given round, the results might be:
-  - no instance combination that can be put back; this means it is not
-    possible to make the cluster N+1 compliant with this number of
-    instances being moved, so we increase the depth and go on to the
-    next round
-  - one or more successful result, in which case we take the one that
-    has as few changes as possible (by change meaning a replace-disks
-    needed)
+.RS 4
+.TP 3
+\(em
+no instance combination that can be put back; this means it is not
+possible to make the cluster N+1 compliant with this number of
+instances being moved, so we increase the depth and go on to the next
+round
+.TP
+\(em
+one or more successful results, in which case we take the one with
+the fewest changes (a change meaning a needed replace-disks operation)
+.RE
 
 The main problem with the algorithm is that, being an exhaustive
 search, the CPU time required grows very quickly with the
 depth. On a 20-node, 80-instance cluster, depths up to 5-6 are
 quickly computed, and depth 10 could already take days.
 
-Since the algorithm is designed to prune the search space as quickly
-as possible, is by luck we find a good solution early at a given
-depth, then the other solutions which would result in a bigger delta
-(the number of changes) will not be investigated, and the program will
-finish fast. Since this is random and depends on where in the full
-solution space the good solution will be, there are two options for
-cutting down the time needed:
-  - \fB-l\fR makes any solution that has delta lower than its
-    parameter succeed instantly
-  - \fB-L\fR makes any solution with delta higher than its parameter
-    being rejected instantly (and not descend on the search tree)
+The main factors that influence the run time are:
+.RS 4
+.TP 3
+\(em
+the removal depth; for each increase of the depth by one, the solution
+space grows by the number of nodes squared (since a new instance can
+live on any two nodes as primary/secondary, therefore (almost) N times
+N); i.e., depth one will create an N^2 solution space, depth two will
+make this N^4, depth three will be N^6, etc.
+
+.TP
+\(em
+the removal depth again; for each increase in the depth, there will be
+more valid removal sets, and the space of solutions increases linearly
+with the number of removal sets
+.RE
+
+Therefore, the smaller the depth, the faster the algorithm will be; it
+doesn't seem like this algorithm will work for clusters of 100 nodes
+and many small instances (e.g. 256MB instances on 16GB nodes).
+
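The growth in the first factor above is simple arithmetic; a tiny Python illustration:

```python
def solution_space(nodes, depth):
    """Approximate phase-two solution space: each of the `depth` removed
    instances can be placed on ~N*N primary/secondary node pairs, so the
    space grows as N^(2*depth)."""
    return (nodes * nodes) ** depth

# For a 20-node cluster: depth 1 gives 400, depth 2 gives 160000,
# depth 3 gives 64000000 candidate placements.
```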
+As an optimisation, since the algorithm is designed to prune the
+search space as quickly as possible, if by luck we find a good
+solution early at a given depth, then the other solutions which would
+result in a bigger delta (the number of changes) will not be
+investigated, and the program will finish fast. Since this is random
+and depends on where in the full solution space the good solution
+lies, there are two options for cutting down the time needed:
+.RS 4
+.TP 3
+\(em
+\fB-l\fR makes any solution that has delta lower than its parameter
+succeed instantly; the default value for this parameter is zero, so
+once we find a "perfect" solution we finish early
+
+.TP
+\(em
+\fB-L\fR makes any solution with delta higher than its parameter be
+rejected instantly (without descending further into the search tree);
+this can reduce the depth of the search tree, with sometimes
+significant speedups; by default, this optimisation is not used
+.RE
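The effect of the two cut-offs can be sketched as follows. The real search is a recursive tree walk; this flat Python sketch (function and parameter names are mine) only illustrates how \fB-l\fR and \fB-L\fR act on candidate deltas:

```python
def pick_solution(candidate_deltas, min_delta=0, max_delta=None):
    """Return the smallest acceptable delta from a stream of candidates.

    min_delta models -l: as soon as a solution at or below it is found,
    succeed instantly. max_delta models -L: candidates above it are
    rejected outright. candidate_deltas stands in for the real search.
    """
    best = None
    for delta in candidate_deltas:
        if max_delta is not None and delta > max_delta:
            continue  # -L: reject instantly, don't consider further
        if best is None or delta < best:
            best = delta
        if best <= min_delta:
            return best  # -l: good enough, stop searching
    return best
```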
+
+The algorithm also has some other internal optimisations:
+.RS 4
+.TP 3
+\(em
+when choosing where to place an instance in phase two, there are
+N*(N-1) possible primary/secondary options; however, if instead of
+iterating over all p * s pairs we first determine the set of primary
+nodes that can hold this instance (without failing N+1), we can cut
+(N-1) secondary placements for each primary node removed; since this
+applies at every iteration of phase 2, it linearly decreases the
+solution space, and on full clusters this can mean a four- to
+five-fold reduction of the solution space
+
+.TP
+\(em
+since the number of solutions is very high even for smaller depths (on
+the test data, depth=4 results in 1.8M solutions) we can't compare
+them at the end, so at each iteration in phase 2 we only promote the
+best solution out of our own set of solutions
+
+.TP
+\(em
+since the placement of instances can only increase the delta of the
+solution (placing a new instance will add zero or more replace-disks
+steps), the delta will only increase while recursing during phase 2;
+therefore, if at any point the current delta is equal to or higher
+than the delta of the best solution so far, we can abort the
+recursion; this cuts a tremendous number of branches; further
+promotion of the best solution from one removal set to another can cut
+entire removal sets after a few recursions
+
+.RE
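The last of these optimisations is classic branch-and-bound on the delta. A minimal Python sketch under assumed names (`place`, `placements_for` are mine, not hn1's):

```python
def place(remaining, delta, best_delta, placements_for):
    """Return the best achievable total delta for placing `remaining`
    instances, given the running `delta` and the best complete solution
    found so far. Since placing an instance never decreases the delta,
    recursion is aborted as soon as it can no longer beat the best."""
    if best_delta is not None and delta >= best_delta:
        return best_delta  # bound: this branch cannot improve on the best
    if not remaining:
        return delta  # complete solution, becomes the new best
    inst, rest = remaining[0], remaining[1:]
    for increment in placements_for(inst):  # per-placement delta cost
        best_delta = place(rest, delta + increment, best_delta,
                           placements_for)
    return best_delta
```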
 
 .SH OPTIONS
 The options that can be passed to the program are as follows:
@@ -193,6 +268,9 @@ the algorithm fatally failed (e.g. wrong node or instance data).
 The program does not check its input data for consistency, and aborts
 with cryptic error messages in this case.
 
+The algorithm doesn't deal with non-\fBdrbd\fR instances, and chokes
+on input data which has such instances.
+
 The algorithm doesn't know when it won't be possible to reach N+1
 compliance at all, and will happily churn CPU for ages without
 realising it won't reach a solution.
diff --git a/hscan.1 b/hscan.1
index b7ae7513d7d61f3afea3aebdbfe83629cd89cc2e..c4bec29c3a6830bc11053fdb4b09a63b54261254 100644
--- a/hscan.1
+++ b/hscan.1
@@ -1,4 +1,4 @@
-.TH HSCAN 1 2009-03-22 htools "Ganeti H-tools"
+.TH HSCAN 1 2009-03-23 htools "Ganeti H-tools"
 .SH NAME
 hscan \- Scan clusters via RAPI and save node/instance data
 
@@ -25,6 +25,49 @@ For each cluster, two files named \fIcluster\fB.instances\fR and
 data. These files can then be used in \fBhbal\fR(1) or \fBhn1\fR(1)
 via the \fB-i\fR and \fB-n\fR options.
 
+The one-line output for each cluster will show the following:
+.RS
+.TP
+.B Name
+The name of the cluster (or the IP address that was given, etc.)
+.TP
+.B Nodes
+The number of nodes in the cluster
+.TP
+.B Inst
+The number of instances in the cluster
+.TP
+.B BNode
+The number of nodes failing N+1
+.TP
+.B BInst
+The number of instances living on N+1-failed nodes
+.TP
+.B t_mem
+Total memory in the cluster
+.TP
+.B f_mem
+Free memory in the cluster
+.TP
+.B t_disk
+Total disk in the cluster
+.TP
+.B f_disk
+Free disk space in the cluster
+.TP
+.B Score
+The score of the cluster, as would be reported by \fBhbal\fR(1) if
+run on the generated data files.
+
+.RE
+
+In case of errors while collecting data, all fields after the name of
+the cluster are replaced with the error display.
+
+.B Note:
+this output format is not yet final, so it should not be used for
+scripting.
+
 .SH OPTIONS
 The options that can be passed to the program are as follows:
 
@@ -55,5 +98,21 @@ data).
 The program does not check its input data for consistency, and aborts
 with cryptic error messages in this case.
 
+The RAPI collection doesn't deal with non-\fBdrbd\fR instances, and
+chokes on input data which has such instances.
+
+.SH EXAMPLE
+
+.in +4n
+.nf
+.RB "$ " "hscan cluster1"
+Name     Nodes  Inst BNode BInst  t_mem  f_mem t_disk f_disk      Score
+cluster1     2     2     0     0   1008    652    255    253 0.24404762
+.RB "$ " "ls -l cluster1.*"
+-rw-r--r-- 1 root root 163 2009-03-23 07:26 cluster1.instances
+-rw-r--r-- 1 root root  90 2009-03-23 07:26 cluster1.nodes
+.fi
+.in
+
 .SH SEE ALSO
 .BR hbal "(1), " hn1 "(1), " ganeti "(7), " gnt-instance "(8), " gnt-node "(8)"