  1. Oct 25, 2010
  2. Oct 21, 2010
  3. Oct 07, 2010
  4. Oct 06, 2010
  5. Sep 15, 2010
    • hbal: implement user-friendly termination requests · 03cb89f0
      Iustin Pop authored
      Currently, hbal aborts immediately when interrupted (^C, SIGINT,
      etc.). This is not nice, since the jobs that have already been
      started then need to be tracked manually.
      
      This patch adds a signal handler for SIGINT and SIGTERM: the first
      signal simply records the shutdown request (hbal will then exit
      once all jobs in the current jobset finish), while a second signal
      causes an immediate exit.
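      
      A minimal sketch of such a two-stage handler (an illustration of
      the approach, not the actual hbal code; the helper name and the
      use of an MVar are assumptions):
      
        import Control.Concurrent.MVar (MVar, newMVar, swapMVar)
        import System.Exit (ExitCode(..))
        import System.Posix.Process (exitImmediately)
        import System.Posix.Signals (Handler(Catch), installHandler,
                                     sigINT, sigTERM)
        
        -- Two-stage termination: the first SIGINT/SIGTERM only records
        -- the request, a second one terminates the process at once.
        setupShutdownHandler :: IO (MVar Bool)
        setupShutdownHandler = do
          shutdown <- newMVar False
          let handler = do
                already <- swapMVar shutdown True
                if already
                  then exitImmediately (ExitFailure 1)
                  else return ()
          _ <- installHandler sigINT (Catch handler) Nothing
          _ <- installHandler sigTERM (Catch handler) Nothing
          return shutdown  -- the main loop checks this between jobsets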
  6. Sep 03, 2010
  7. Sep 02, 2010
    • Makefile: make the rst2html converter more strict · d78ceb9e
      Iustin Pop authored
      This will make the automated builds flag any problems.
    • Add some more debugging functions · adc5c176
      Iustin Pop authored
      These are just variations of the standard debug helper, provided
      to simplify calling code, since laziness can otherwise cause debug
      statements never to be evaluated at all.
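      
      A sketch of what such variations can look like (the names debugFn
      and debugXy are illustrative), using seq to force evaluation of
      the traced value:
      
        import Debug.Trace (trace)
        
        -- The standard helper: print a value's representation, return it.
        debug :: Show a => a -> a
        debug x = trace (show x) x
        
        -- Trace a derived value but return the original; the seq forces
        -- the trace, which laziness would otherwise be free to skip.
        debugFn :: Show b => (a -> b) -> a -> a
        debugFn fn x = debug (fn x) `seq` x
        
        -- Trace the first value strictly while evaluating to the second.
        debugXy :: Show a => a -> b -> b
        debugXy a b = debug a `seq` b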
    • Fix ReplaceSecondary moves for offline nodes · 74e89a14
      Iustin Pop authored
      Adding a new secondary instance on a node performs two memory tests:
      - in strict mode, reject if we get into N+1 failure
      - reject if the new instance memory is greater than the free memory (not
        available memory) on the node
      
      The last check is designed to ensure that, irrespective of the other
      secondary instances on this node, we are able to failover/migrate the
      newly-added instance.
      
      However, we should allow this if the instance comes from an offline
      node, which doesn't offer anything (not even disk replication).
      Therefore this patch makes this check conditional on the strict mode.
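      
      A sketch of the relaxed check (names and types are illustrative,
      not the actual Node.addSec code):
      
        data FailMode = FailMem | FailN1 deriving Show
        
        -- Toy version of the secondary-addition memory tests: in
        -- non-strict mode, i.e. when rescuing an instance from an
        -- offline node, both rejections are skipped.
        checkAddSec :: Bool  -- strict mode?
                    -> Int   -- memory of the new instance
                    -> Int   -- free memory on the target node
                    -> Bool  -- would the node fail N+1 afterwards?
                    -> Either FailMode ()
        checkAddSec strict instMem freeMem failsN1
          | strict && failsN1           = Left FailN1
          | strict && instMem > freeMem = Left FailMem
          | otherwise                   = Right ()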
    • Update NEWS file · 49d977db
      Iustin Pop authored
  8. Aug 30, 2010
  9. Aug 25, 2010
    • Add a new option --save-cluster · 02da9d07
      Iustin Pop authored
      This option will in the future be used to serialize the cluster state in
      hbal and hspace after the rebalance/allocation steps.
    • Add unittest for Node text serialization · 50811e2c
      Iustin Pop authored
      This checks that the Node text serialization and deserialization
      operations are inverses of each other: serializing a node and
      loading it back yields the original node.
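      
      A self-contained sketch of such a round-trip property, using a toy
      stand-in for the real Node type and its text loader:
      
        import Test.QuickCheck
        import Text.Read (readMaybe)
        
        -- Toy stand-in for the htools Node type and its text format.
        data Node = Node { nName :: String, nMem :: Int }
          deriving (Eq, Show)
        
        serializeNode :: Node -> String
        serializeNode n = nName n ++ "|" ++ show (nMem n)
        
        loadNode :: String -> Maybe Node
        loadNode s = case break (== '|') s of
                       (nm, '|':mem) -> Node nm <$> readMaybe mem
                       _             -> Nothing
        
        -- Round-trip property: serializing and parsing back must yield
        -- the node we started from (the filter keeps names well-formed).
        prop_serializeRoundTrip :: String -> Int -> Bool
        prop_serializeRoundTrip nm mem =
          let n = Node (filter (/= '|') nm) mem
          in loadNode (serializeNode n) == Just n
        
        main :: IO ()
        main = quickCheck prop_serializeRoundTrip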
    • Switch unittest to custom hostnames · a070c426
      Iustin Pop authored
      Currently, the generated hostnames are strings of almost fully
      arbitrary characters, which breaks the assumption that node and
      instance names are normal DNS hostnames.
      
      This patch adds some custom generators for these hostnames, which
      will allow better testing of the text loader serialization and
      deserialization.
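      
      A sketch of such QuickCheck generators (the label lengths and the
      alphabet are illustrative):
      
        import Data.List (intercalate)
        import Test.QuickCheck
        
        -- A single DNS-like label: a short string of lowercase letters
        -- and digits, instead of fully arbitrary characters.
        genLabel :: Gen String
        genLabel = do
          n <- choose (1, 8)
          vectorOf n (elements (['a'..'z'] ++ ['0'..'9']))
        
        -- A fully-qualified name: a few labels joined by dots.
        genFQDN :: Gen String
        genFQDN = do
          k <- choose (1, 4)
          labels <- vectorOf k genLabel
          return (intercalate "." labels)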
  10. Aug 24, 2010
  11. Jul 29, 2010
  12. Jul 27, 2010
  13. Jul 23, 2010
  14. Jul 22, 2010
  15. Jul 21, 2010
    • Use --union for hpc sum · 7e9e8245
      Iustin Pop authored
      … which fixes the issue noted in the previous commit (almost a brown
      paper bag change).
    • Preliminary support for coverage during live-test · dc61c50b
      Iustin Pop authored
      While this doesn't work correctly yet (hpc sum seems to take only
      the modules common to all inputs, not their union?), it prepares
      for gathering coverage data during live-test (as an alternative to
      unittest coverage data).
    • Add some more imports to QC.hs · 223dbe53
      Iustin Pop authored
      This is needed so that the coverage report lists all modules, even
      the ones we don't test at all, giving us complete results.
    • Change the meaning of the N+1 fail metric · c3c7a0c1
      Iustin Pop authored
      Currently, this metric counts the nodes failing the N+1 check.
      While this helps (in some cases) to evacuate such nodes, it's not
      a good metric, since it rarely changes during a balancing step
      (only when the last instance moves away). Therefore we replace it
      with the count of instances living on such nodes, which is much
      better because:
      - moving an instance away while the node still fails N+1 is still
        reflected in the score as an improvement
      - moving the last instance causing an N+1 failure results in a
        large decrease of this score, thus giving the right bonus for
        clearing the status
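      
      A sketch of the new metric (toy Node type; the real code works on
      the cluster's node list):
      
        -- Toy stand-in: each node knows its primary instances and
        -- whether it currently fails the N+1 check.
        data Node = Node { pList :: [Int], failN1 :: Bool }
        
        -- New metric: count instances on N+1-failing nodes, so every
        -- move away from such a node improves the score, not just the
        -- last one.
        countN1Instances :: [Node] -> Int
        countN1Instances nodes =
          sum [ length (pList n) | n <- nodes, failN1 n ]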
    • Introduce per-metric weights · 8a3b30ca
      Iustin Pop authored
      Currently all metrics have the same weight (we just sum them together).
      However, for the hard constraints (N+1 failures, offline nodes, etc.)
      we should handle the metrics differently based on their meaning. For
      example, an instance living on a primary offline node is worse than an
      instance having its secondary node offline, which in turn is worse than
      an instance having its secondary node failing N+1.
      
      To express this case in our code, we introduce a table of weights for
      the metrics, with which we can influence their relative importance.
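      
      A sketch of the idea (the weight values here are illustrative, not
      the ones from the patch):
      
        -- Per-metric weights: hard constraints get large weights so
        -- they dominate the softer balance metrics.
        metricWeights :: [Double]
        metricWeights = [1.0, 1.0, 4.0, 8.0, 16.0]
        
        -- The cluster score becomes a weighted sum instead of a plain
        -- sum of the individual metrics.
        clusterScore :: [Double] -> Double
        clusterScore = sum . zipWith (*) metricWeights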
    • Allow balancing moves to introduce N+1 errors · 2cae47e9
      Iustin Pop authored
      This patch switches the applyMove function to the extended
      versions of Node.addPri and Node.addSec, passing the override flag
      based on the state of the node that we're moving away from.
    • Introduce a relaxed add instance mode · 3e3c9393
      Iustin Pop authored
      In case an instance is living on an offline node, it doesn't make
      sense to refuse moving it just because that would create N+1
      failures; failing N+1 is still much better than not running at
      all. Similarly, if the secondary node of an instance is offline,
      meaning the instance has no redundancy at all, that is worse than
      having a secondary that fails N+1: such a secondary could not
      accept the instance as primary, but it still provides disk
      redundancy for it.
      
      To allow this, we rename Node.addPri to addPriEx and introduce an
      extra parameter (addPri becomes a partial application of addPriEx,
      keeping the same signature). Node.addSec gets the same treatment.
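      
      A self-contained sketch of the extended interface (toy types; the
      real functions operate on full node state):
      
        data FailMode = FailMem | FailN1 deriving Show
        data Instance = Instance { mem :: Int }
        data Node = Node { fMem :: Int, wouldFailN1 :: Int -> Bool }
        
        -- Extended primary-add: the extra flag lets callers override
        -- the N+1 rejection, e.g. when rescuing from an offline node.
        addPriEx :: Bool -> Node -> Instance -> Either FailMode Node
        addPriEx override node inst
          | mem inst > fMem node                        = Left FailMem
          | not override && wouldFailN1 node (mem inst) = Left FailN1
          | otherwise = Right node { fMem = fMem node - mem inst }
        
        -- The old entry point keeps its strict signature as a partial
        -- application, so existing callers are unaffected.
        addPri :: Node -> Instance -> Either FailMode Node
        addPri = addPriEx False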