Commits · dc61c50b72d607d118b3ae01430c6ac805aa1701 · itminedu / snf-ganeti

Jul 21, 2010

Preliminary support for coverage during live-test · dc61c50b

Iustin Pop authored 14 years ago

While this doesn't work correctly yet (hpc sum seems to only take common
modules, not the sum of modules?), it prepares for gathering coverage
data during live-test (as an alternative to unittest coverage data).

dc61c50b

Add some more imports to QC.hs · 223dbe53

Iustin Pop authored 14 years ago

This is needed so that in the coverage report we list all modules, even
the ones we don't test at all, such that we get the complete results.

223dbe53

Change the meaning of the N+1 fail metric · c3c7a0c1

Iustin Pop authored 14 years ago

Currently, this metric tracks the nodes failing the N+1 check. While
this helps (in some cases) to evacuate such nodes, it's not a good
metric since it will rarely change during a step (only when the last
instance moves away). Therefore we replace it with the count of
instances living on such nodes, which is much better because:
- moving an instance away while the node is still N+1 failing will still
  reflect in the score as an optimization
- moving the last instance causing an N+1 failure will result in a heavy
  decrease of this score, thus giving the right bonus to clear this
  status
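
The difference between the two metrics can be sketched as follows. This
is only an illustration: the Node record and its fields are hypothetical
and much simpler than the real node type.

```haskell
-- Hypothetical, simplified node record (not the real Node type).
data Node = Node
  { failN1    :: Bool  -- True if the node fails the N+1 check
  , instCount :: Int   -- number of instances living on the node
  }

-- Old metric: the number of nodes failing the N+1 check.
oldMetric :: [Node] -> Int
oldMetric = length . filter failN1

-- New metric: the number of instances living on N+1-failing nodes.
newMetric :: [Node] -> Int
newMetric = sum . map instCount . filter failN1
```

Moving one instance off a failing node leaves oldMetric unchanged but
lowers newMetric by one, so every step in the right direction shows up
in the score.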

c3c7a0c1

Introduce per-metric weights · 8a3b30ca

Iustin Pop authored 14 years ago

Currently all metrics have the same weight (we just sum them together).
However, for the hard constraints (N+1 failures, offline nodes, etc.)
we should handle the metrics differently based on their meaning. For
example, an instance living on a primary offline node is worse than an
instance having its secondary node offline, which in turn is worse than
an instance having its secondary node failing N+1.

To express this case in our code, we introduce a table of weights for
the metrics, with which we can influence their relative importance.
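
A minimal sketch of such a weight table, assuming the score is a
weighted sum. The metric names and weight values below are made up for
illustration; only their relative order follows the example above.

```haskell
-- Hypothetical weight table: (weight, metric name). The values are
-- illustrative only; they just preserve the ordering described above.
weights :: [(Double, String)]
weights =
  [ (16, "instances with offline primary")
  , (4,  "instances with offline secondary")
  , (1,  "instances with N+1-failing secondary")
  ]

-- Combine the raw metric values (given in the same order as the table)
-- into a single score; lower is better.
score :: [Double] -> Double
score = sum . zipWith (*) (map fst weights)
```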

8a3b30ca

Allow balancing moves to introduce N+1 errors · 2cae47e9

Iustin Pop authored 14 years ago

This patch switches the applyMove function to the extended versions of
Node.addPri and addSec, and passes the override flag based on the state
of the node that we're moving away from.

2cae47e9

Introduce a relaxed add instance mode · 3e3c9393

Iustin Pop authored 14 years ago

In case an instance is living on an offline node, it doesn't make sense
to refuse moving it because that would create N+1 failures; failing N+1
is still much better than not running at all. Similarly, if the
secondary node of an instance is offline, the instance has no
redundancy at all, which is worse than having a secondary that fails
N+1: such a node could not accept the instance as primary, but it
still provides redundancy for it.

To allow this, we rename Node.addPri to addPriEx and introduce an extra
parameter (addPri is a partial application of addPriEx and keeps the
same signature). Node.addSec gets the same treatment.
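
The shape of this change can be sketched as below. The Node record, the
FailMode type and the memory-based checks are simplified assumptions,
not the real definitions; only the names addPri and addPriEx and the
partial application come from this patch.

```haskell
-- Simplified stand-ins for the real types (assumptions).
data Node = Node
  { freeMem     :: Int  -- currently free memory
  , reservedMem :: Int  -- memory that must stay free to pass N+1
  } deriving (Show, Eq)

data FailMode = FailMem | FailN1 deriving (Show, Eq)

-- Extended version: when the override flag is True, the N+1 check is
-- skipped, but hard resource limits still apply.
addPriEx :: Bool -> Node -> Int -> Either FailMode Node
addPriEx force node instMem
  | newFree < 0                             = Left FailMem
  | not force && newFree < reservedMem node = Left FailN1
  | otherwise                               = Right node { freeMem = newFree }
  where newFree = freeMem node - instMem

-- The strict version keeps its old signature.
addPri :: Node -> Int -> Either FailMode Node
addPri = addPriEx False
```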

3e3c9393

Jul 19, 2010

Remove obsolete Container.maxNameLen · 2849670b

Iustin Pop authored 14 years ago

This was only used in one place (hbal), and has been made obsolete by
the change to the dual name/alias structure.

2849670b

hbal: print short names in steps list · 14c972c7

Iustin Pop authored 14 years ago

This was a regression from the name handling changes, as we started
using the original names for the solution list (which is not designed
for parsing/feeding back into ganeti).

14c972c7

Remove an obsolete function · fb33aaaf

Iustin Pop authored 14 years ago

printSolution is no longer used, as we print the solution iteratively
now.

fb33aaaf

Jul 18, 2010

Allow '+' in node list fields · 6dfa04fd

Iustin Pop authored 14 years ago

When the field list is prefixed with a plus sign, this will extend the
default field list, instead of replacing it entirely.
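
A minimal sketch of this behaviour, assuming a comma-separated field
specification. The function names and the default field list shown here
are hypothetical.

```haskell
-- Hypothetical default field list.
defaultFields :: [String]
defaultFields = ["name", "pcnt", "scnt"]

-- Split a comma-separated string into its components.
splitComma :: String -> [String]
splitComma s =
  case break (== ',') s of
    (x, [])     -> [x]
    (x, _:rest) -> x : splitComma rest

-- A leading '+' extends the defaults; otherwise the given list
-- replaces them entirely.
selectFields :: String -> [String]
selectFields ('+':rest) = defaultFields ++ splitComma rest
selectFields spec       = splitComma spec
```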

6dfa04fd

Update the node list fields · 16f08e82

Iustin Pop authored 14 years ago

This patch renames the pri/sec to pcnt/scnt, and adds the real primary
and secondary instance lists, the peermap and the index of a node as
selectable options.

16f08e82

Cleanup a node's peer map when possible · 124b7cd7

Iustin Pop authored 14 years ago

If the last secondary instance of a peer is deleted (detected by the new
peer memory value being equal to zero), then the pair (pdx, 0) should be
deleted completely. This is not an optimization per se, but rather
cleanup (the speedup is at most a percent, and only in some corner
cases).
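
The cleanup can be sketched like this. Using Data.Map for the peer map
and the name removeSec are assumptions made for the example; the real
peer map type may differ.

```haskell
import qualified Data.Map as Map

-- Assumed representation: peer node index -> memory reserved for that
-- peer's instances.
type PeerMap = Map.Map Int Int

-- Subtract an instance's memory from a peer's entry, dropping the
-- entry entirely when the new value reaches zero.
removeSec :: Int -> Int -> PeerMap -> PeerMap
removeSec pdx mem = Map.update shrink pdx
  where shrink old =
          let new = old - mem
          in if new == 0 then Nothing else Just new
```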

124b7cd7

Jul 16, 2010

Fix handling of offline options and short names · f9acea10

Iustin Pop authored 14 years ago


This needs to be abstracted in a separate function, but in the meantime
we fix the issue in both places.

Signed-off-by: Iustin Pop <iustin@google.com>

f9acea10

Jun 21, 2010

Fix another haddock special-char issue · 95446d7a

Iustin Pop authored 14 years ago

95446d7a

Remove JOB_STATUS_GONE and add unittests · db079755

Iustin Pop authored 14 years ago

… for the serialization/deserialization of the job and opcode status.

Job status 'gone' was not actually used. It can be reintroduced if
needed.

db079755

Add opcode status constants/type · 41065165

Iustin Pop authored 14 years ago

These mirror, again, the Ganeti constants, and are added for future use.

41065165

Rename the job status constants · 7e98f782

Iustin Pop authored 14 years ago

The rename is done such that we match Ganeti's own constants.

7e98f782

Jun 08, 2010

Optimise the Luxi.recvMsg function · 95f490de

Iustin Pop authored 15 years ago

Since the current buffer cannot contain (during network reads) an EOM,
we should look for the EOM only in the newly-received string.  While
this shouldn't make much difference, in some tests it cuts the recvMsg
total time by around half.

On entering recvMsg, however, we do have to search the old buffer for a
message, since we could have received two Luxi messages on the last
network query; this is a one-off cost, compared to continuously
looking for the EOM in the old string (at each receive loop).
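
The idea can be sketched as a pure function over strings. The name feed
and the use of String are assumptions for illustration, and the EOM
marker is assumed here to be the character with code 3.

```haskell
-- End-of-message marker (assumed to be character code 3).
eom :: Char
eom = '\3'

-- Add a newly received chunk to a buffer that is known to contain no
-- EOM. Only the chunk is scanned. Returns either the grown buffer
-- (no complete message yet) or the message plus the leftover data.
feed :: String -> String -> Either String (String, String)
feed buf chunk =
  case break (== eom) chunk of
    (_, [])          -> Left (buf ++ chunk)
    (before, _:rest) -> Right (buf ++ before, rest)
```

The leftover returned on success may itself contain a complete message,
which is why the old buffer has to be checked once on entry.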

95f490de

Jun 07, 2010

Complete the client Luxi implementation · 04282772

Iustin Pop authored 14 years ago

All current Luxi calls are supported after this patch. A bug in
ArchiveJob is also fixed (Ganeti's job IDs are strings).

04282772

Add support for more LUXI calls · 9622919d

Iustin Pop authored 14 years ago

While not all are directly useful, having them will open some
possibilities (e.g. polling for job changes in hbal's -X mode, and
auto-archiving the jobs once they are successful).

9622919d

Jun 02, 2010

Fix some lint errors in the unit tests · 4a007641

Iustin Pop authored 14 years ago

4a007641

Change the Luxi operations structure · 683b1ca7

Iustin Pop authored 14 years ago

Currently, we define the LuxiOp type as a simple enumeration, and leave
the arguments structure to the users of the Ganeti.Luxi module. This is
suboptimal for a couple of reasons: first, we decouple the operation
type from operation arguments, and that means we don't use the type
system for validation of the arguments; second, the clients themselves
have to know about the JSON encoding of the protocol.

For the above reasons, we change the operation type to contain the
arguments too, and then the entire conversion/serialization is
restricted to the Ganeti.Luxi module. Also, the removal of the JSON
encoding from the clients results in an overall simplification of the
code.
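
A rough sketch of the new shape, using only two operations. The
constructor arguments, the helper names and the hand-rolled string
rendering are simplifying assumptions; the real module covers more
calls and uses a proper JSON encoder.

```haskell
-- Operations now carry their own arguments (simplified shapes).
data LuxiOp
  = QueryNodes [String] [String]  -- node names, fields
  | ArchiveJob String             -- job ID
  deriving (Show, Eq)

-- Method name sent on the wire.
method :: LuxiOp -> String
method (QueryNodes _ _) = "QueryNodes"
method (ArchiveJob _)   = "ArchiveJob"

-- Argument list rendered as JSON-like text (ASCII strings only).
args :: LuxiOp -> String
args (QueryNodes names fields) = "[" ++ show names ++ "," ++ show fields ++ "]"
args (ArchiveJob jid)          = show [jid]

-- The only place that knows about the encoding.
serialize :: LuxiOp -> String
serialize op =
  "{\"method\": " ++ show (method op) ++ ", \"args\": " ++ args op ++ "}"
```

Clients build a LuxiOp value and never touch the encoding, so a call
with the wrong number or type of arguments is rejected by the compiler.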

683b1ca7

Jun 01, 2010

Fix a warning in Loader tests · 9c0a748f

Iustin Pop authored 14 years ago

Incomplete pattern match…

9c0a748f

Add a few Loader tests · c088674b

Iustin Pop authored 14 years ago

These are not comprehensive, but at least we have a start.

c088674b

May 30, 2010

Modify the test runner to show test exceptions · 8c5652f6

Iustin Pop authored 14 years ago

QuickCheck's batch driver (at least v1) doesn't show the test aborts,
but simply discards the specific exception and increases the abort
count. This makes it hard to debug the tests, so we modify our own test
wrapper (which so far only tracked total failures) to show any
exceptions.

8c5652f6

May 28, 2010

Reduce the warnings during the unittests · 9e35522c

Iustin Pop authored 14 years ago

Since the unittests are not 'clean' from the p.o.v. of type
declarations, and cannot be made clean in all respects (e.g. orphan
instances), we silence some warnings for the test target, to have a
cleaner output.

9e35522c

May 27, 2010

Improve the test driver · 06fe0cea

Iustin Pop authored 14 years ago

The tests are moved to a separate data structure, and we can select a
subset of tests to run.
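
One way such a selection could look, as a sketch. The function name and
the rule that an empty selection means "run everything" are
assumptions, not taken from the patch.

```haskell
-- Pick the named test groups from a table of (group name, tests).
-- An empty selection is assumed to mean "all groups".
selectGroups :: [String] -> [(String, a)] -> [(String, a)]
selectGroups []    groups = groups
selectGroups names groups = filter ((`elem` names) . fst) groups
```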

06fe0cea

Introduce OpCode unittests · 88f25dd0

Iustin Pop authored 14 years ago

88f25dd0

Introduce support for optional keys in JObjects · f36a8028

Iustin Pop authored 14 years ago

Some keys are optional in the Ganeti opcodes (e.g. ‘node’ in the
OpReplaceDisks), and as such we need to transform them into a Maybe
value, instead of failing.

The patch reworks fromObj a bit and adds maybeFromObj, which parses
such optional values. It then uses it in the opcode reading.
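
The difference between the two lookups can be sketched with a tiny
stand-in JSON type. JValue, JObject, readString and the explicit reader
argument are assumptions made to keep the example self-contained; the
real functions work on the JSON library's own types.

```haskell
-- Minimal stand-in for a JSON value and object (assumption).
data JValue = JString String | JInt Int deriving (Show, Eq)
type JObject = [(String, JValue)]

-- Example reader for string values.
readString :: JValue -> Either String String
readString (JString s) = Right s
readString v           = Left ("Cannot read String from " ++ show v)

-- Mandatory key: a missing key is an error.
fromObj :: (JValue -> Either String a) -> String -> JObject -> Either String a
fromObj rd key obj =
  case lookup key obj of
    Nothing -> Left ("key '" ++ key ++ "' not found")
    Just v  -> rd v

-- Optional key: a missing key yields Nothing, but a present key that
-- fails to parse is still an error.
maybeFromObj :: (JValue -> Either String a) -> String -> JObject
             -> Either String (Maybe a)
maybeFromObj rd key obj =
  case lookup key obj of
    Nothing -> Right Nothing
    Just v  -> fmap Just (rd v)
```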

f36a8028

Replace fromJResult with annotateJResult · c96d44df

Iustin Pop authored 14 years ago

This patch replaces all old uses of fromJResult with the annotated
version, and removes the non-annotated version. All JSON parsing points
should now have annotated errors.

c96d44df

Add annotations to loadJSArray · c8b662f1

Iustin Pop authored 14 years ago

This allows, for example, the RAPI backend to detail which information
(instance or node data) fails to parse.

c8b662f1

Change fromObj error messages · 50d26669

Iustin Pop authored 14 years ago

Currently fromObj doesn't detail what we're trying to read, which can
lead to cryptic messages: "Cannot read Int". The patch changes this
function to annotate the error messages with the key/value we're trying
to convert, by using a new version of fromJResult.

Since the display of the key in tryFromObj is now redundant (it was
already redundant in the 'not found' case), we remove it.

The new version of fromJResult (annotateJResult) simply prepends a
description string to the actual error message.
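
The prepending step can be sketched as follows, using Either String as
a stand-in for the JSON library's result type. The name annotateResult
and the ": " separator are assumptions.

```haskell
-- Prepend a description to the error message, if there is one;
-- successful results pass through unchanged.
annotateResult :: String -> Either String a -> Either String a
annotateResult desc = either (Left . ((desc ++ ": ") ++)) Right
```

With this, a bare "Cannot read Int" becomes, for example,
"key 'memory': Cannot read Int".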

50d26669

May 26, 2010

A few more small Node unit-tests · 82ea2874

Iustin Pop authored 14 years ago

82ea2874

May 25, 2010

Add more unittests · 39d11971

Iustin Pop authored 14 years ago

Instance, Node and Text modules have improved coverage.

39d11971

May 20, 2010

Add more unit tests for allocation/balance · 3fea6959

Iustin Pop authored 14 years ago

The patch adds some simple unit-tests for both the allocation function
(we can allocate small instances on an empty cluster, we can allocate in
tiered mode starting from any size) and the balancing functions (one
single instance is placed optimally, a full cluster plus an empty node
can be rebalanced). The coverage has increased greatly, since this is
the bulk of the algorithm/code.

Also, the cluster tests are now being run with different options, since
they are much slower.

3fea6959

Move two functions from hspace to Cluster.hs · 3ce8009a

Iustin Pop authored 14 years ago

This is done so we can test a longer pipeline.

3ce8009a

Make CStats an instance of Show · 8423f76b

Iustin Pop authored 14 years ago

This helps debugging via ghci.

8423f76b

Clarify options related to name passing · ada2fc6d

Iustin Pop authored 14 years ago

After the name patches, we can pass in either the short or the full
name, so update the hbal man page accordingly.

ada2fc6d

Another haddock fix… · 381be58a

Iustin Pop authored 14 years ago

381be58a

Accept both full and short names in CLI · c854092b

Iustin Pop authored 15 years ago

This patch introduces some new functionality in the base Element type
and in Container which supports searching for all 'known' names of an
element, such that both short and full names are accepted for various
options like '-O' and '--excluded-instances'.
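
A minimal sketch of such a lookup. The Elem record, its fields and the
function names are hypothetical; the real Element type and Container
functions are not shown in this log.

```haskell
import Data.List (find)

-- Hypothetical element with a full name and a short alias.
data Elem = Elem
  { fullName :: String
  , alias    :: String
  } deriving (Show, Eq)

-- All names under which an element is known.
allNames :: Elem -> [String]
allNames e = [fullName e, alias e]

-- Find the first element known under the given name.
findByName :: [Elem] -> String -> Maybe Elem
findByName es n = find ((n `elem`) . allNames) es
```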

c854092b