- Aug 30, 2010
-
-
Iustin Pop authored
This also uncovered a few issues with the allocation model (instances not being marked up, etc.). Compared to hbal, hspace will generate either one or two files (for both the standard and the tiered allocation mode), depending on the input parameters.
-
Iustin Pop authored
The Cluster.iterateAlloc and tieredAlloc functions are changed to also return the updated instance list, since it is needed to have a “full” cluster view.
-
Iustin Pop authored
Also move the LUXI execution (-X) to the end, after all the output messages are printed. There is no point in waiting a long while for the messages, especially as they are not up-to-date stats after the job execution, just an estimate of what the state will be.
-
Iustin Pop authored
This is currently hardcoded in an internal function in hscan.hs, and we move it to Text.hs for later use.
-
- Aug 25, 2010
-
-
Iustin Pop authored
This option will in the future be used to serialize the cluster state in hbal and hspace after the rebalance/allocation steps.
-
Iustin Pop authored
This checks that the Node text serialization and deserialization operations are idempotent when combined with each other.
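A minimal sketch of such a round-trip check, under stated assumptions: the `Node` record, the `|` field separator and the names `serializeNode`, `parseNode` and `prop_roundTrip` are hypothetical stand-ins, not the actual htools definitions. The property is written as a plain predicate so that it could be handed to a property-testing library.

```haskell
import Data.List (intercalate)
import Text.Read (readMaybe)

-- | Hypothetical, heavily simplified node record (not the real htools type).
data Node = Node
  { nodeName :: String
  , totalMem :: Int
  , totalDisk :: Int
  } deriving (Eq, Show)

-- | Split a line on the assumed '|' field separator.
splitBar :: String -> [String]
splitBar s = case break (== '|') s of
  (field, []) -> [field]
  (field, _ : rest) -> field : splitBar rest

-- | Serialize a node as one text line.
serializeNode :: Node -> String
serializeNode (Node name mem disk) =
  intercalate "|" [name, show mem, show disk]

-- | Parse a line back; Nothing on a wrong field count or a bad number.
parseNode :: String -> Maybe Node
parseNode line = case splitBar line of
  [name, mem, disk] -> Node name <$> readMaybe mem <*> readMaybe disk
  _ -> Nothing

-- | Deserializing a serialized node should give back the same node.
prop_roundTrip :: Node -> Bool
prop_roundTrip node = parseNode (serializeNode node) == Just node
```

In this sketch the property only holds for names that do not contain the separator, which is one reason why hostname-like test data is easier to work with than arbitrary characters.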
-
Iustin Pop authored
Currently, the hostnames are made of almost fully arbitrary characters, which breaks the assumption that nodes/instances will be normal DNS hostnames. This patch adds some custom generators for these hostnames, which will allow better testing of text loader serialization/deserialization.
-
- Aug 24, 2010
-
-
Iustin Pop authored
Currently these are in hscan, and cannot be reused easily.
-
- Jul 29, 2010
-
-
Iustin Pop authored
Again, thanks to lintian.
-
- Jul 27, 2010
-
-
Iustin Pop authored
Currently we show the instance index, but this makes no sense outside the current running program. Instead, we show the instance name.
-
Iustin Pop authored
-
Iustin Pop authored
This looks better for text-only viewing…
-
- Jul 23, 2010
-
-
Iustin Pop authored
If some clusters failed during RAPI collection, exit with exit code 2 so that tests can detect this failure.
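A minimal sketch of this exit-code behaviour, assuming each cluster's collection result is modelled as an `Either` value. `ClusterResult`, `exitCodeFor` and `finish` are hypothetical names chosen for illustration; only `System.Exit` comes from the standard library.

```haskell
import System.Exit (ExitCode (..), exitWith)

-- | Hypothetical per-cluster outcome: Left carries an error message.
type ClusterResult = Either String ()

-- | Exit code 2 if any cluster failed, success otherwise.
exitCodeFor :: [ClusterResult] -> ExitCode
exitCodeFor results
  | any isFailure results = ExitFailure 2
  | otherwise = ExitSuccess
  where
    isFailure (Left _) = True
    isFailure (Right _) = False

-- | Terminate the program with the code computed above.
finish :: [ClusterResult] -> IO a
finish = exitWith . exitCodeFor
```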
-
Iustin Pop authored
-
- Jul 22, 2010
-
-
Iustin Pop authored
-
Iustin Pop authored
-
Iustin Pop authored
The (recently-enabled) live test coverage stats found a few low-hanging fruits in the tests we do…
-
- Jul 21, 2010
-
-
Iustin Pop authored
… which fixes the issue noted in the previous commit (almost a brown paper bag change).
-
Iustin Pop authored
While this doesn't work correctly yet (hpc sum seems to only take common modules, not the sum of modules?), it prepares for gathering coverage data during live-test (as an alternative to unittest coverage data).
-
Iustin Pop authored
This is needed so that in the coverage report we list all modules, even the ones we don't test at all, such that we get the complete results.
-
Iustin Pop authored
Currently, this metric tracks the nodes failing the N+1 check. While this helps (in some cases) to evacuate such nodes, it's not a good metric since it will rarely change during a step (only when the last instance moves away). Therefore we replace it with the count of instances living on such nodes, which is much better because:
  - moving an instance away while the node is still N+1 failing will still be reflected in the score as an optimization
  - moving the last instance causing an N+1 failure will result in a heavy decrease of this score, thus giving the right bonus for clearing this status
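The difference between the two metrics can be sketched as follows. The `Node` record here is a hypothetical simplification with a single instance list, and both function names are made up for this example.

```haskell
-- | Hypothetical simplified node: an N+1 failure flag plus the names
-- of the instances living on the node.
data Node = Node
  { failN1 :: Bool
  , instances :: [String]
  }

-- | Old metric: number of nodes failing the N+1 check.
nodesFailingN1 :: [Node] -> Int
nodesFailingN1 = length . filter failN1

-- | New metric: number of instances living on N+1 failing nodes.
instancesOnFailingN1 :: [Node] -> Int
instancesOnFailingN1 = sum . map (length . instances) . filter failN1
```

Moving one of three instances off a failing node leaves `nodesFailingN1` unchanged while `instancesOnFailingN1` drops from 3 to 2, so the intermediate move already registers as an improvement.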
-
Iustin Pop authored
Currently all metrics have the same weight (we just sum them together). However, for the hard constraints (N+1 failures, offline nodes, etc.) we should handle the metrics differently based on their meaning. For example, an instance living on a primary offline node is worse than an instance having its secondary node offline, which in turn is worse than an instance having its secondary node failing N+1. To express this case in our code, we introduce a table of weights for the metrics, with which we can influence their relative importance.
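A weight table of this kind might look like the sketch below. The metric names and the numeric weights are invented for illustration and are not the values used in the code; the point is only that the relative ordering described above can be expressed as data.

```haskell
import Data.Maybe (fromMaybe)

-- | Hypothetical hard-constraint metrics, from worst to least bad.
data Metric = PrimaryOffline | SecondaryOffline | SecondaryFailN1
  deriving (Eq, Show)

-- | Made-up weights; a larger weight means a worse condition.
metricWeights :: [(Metric, Double)]
metricWeights =
  [ (PrimaryOffline, 16)
  , (SecondaryOffline, 4)
  , (SecondaryFailN1, 1)
  ]

-- | Look up a weight, defaulting to 1 (the old "plain sum" behaviour).
weightOf :: Metric -> Double
weightOf m = fromMaybe 1 (lookup m metricWeights)

-- | Weighted sum of the metric values.
score :: [(Metric, Double)] -> Double
score values = sum [weightOf m * v | (m, v) <- values]
```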
-
Iustin Pop authored
This patch switches the applyMove function to the extended versions of Node.addPri and addSec, and passes the override flag based on the state of the node that we're moving away from.
-
Iustin Pop authored
In case an instance is living on an offline node, it doesn't make sense to refuse moving it because that would create N+1 failures; failing N+1 is still much better than not running at all. Similarly, if the secondary node of an instance is offline, meaning the instance doesn't have any redundancy, that is worse than having a secondary that is failing N+1: such a node could not accept the instance as primary, but it still provides redundancy for it. To allow this, we rename Node.addPri to addPriEx and introduce an extra parameter (addPri is a partial application of addPriEx and keeps the same signature). Node.addSec gets the same treatment.
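The shape of this change can be sketched as below. Only the names `addPri` and `addPriEx` and the idea of an extra override parameter come from the text; the `Node`, `Instance` and `FailMode` types and the memory model (a reserved amount that must stay free to pass N+1) are simplifying assumptions.

```haskell
-- | Hypothetical failure reasons.
data FailMode = FailMem | FailN1
  deriving (Eq, Show)

-- | Hypothetical simplified node.
data Node = Node
  { freeMem :: Int        -- memory currently free
  , reservedMem :: Int    -- memory that must stay free to pass N+1
  , primaries :: [String] -- names of primary instances
  } deriving (Eq, Show)

-- | Hypothetical simplified instance.
data Instance = Instance
  { instName :: String
  , instMem :: Int
  } deriving (Eq, Show)

-- | Add a primary instance; the flag overrides the N+1 check only.
-- Running out of memory is still a hard failure.
addPriEx :: Bool -> Node -> Instance -> Either FailMode Node
addPriEx force node inst
  | newFree < 0 = Left FailMem
  | newFree < reservedMem node && not force = Left FailN1
  | otherwise =
      Right node { freeMem = newFree
                 , primaries = instName inst : primaries node }
  where
    newFree = freeMem node - instMem inst

-- | Same signature as before: a partial application of addPriEx.
addPri :: Node -> Instance -> Either FailMode Node
addPri = addPriEx False
```

A caller that moves an instance away from an offline node would pass `True` as the flag, accepting an N+1 failure on the target in exchange for the instance running at all.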
-
- Jul 19, 2010
-
-
Iustin Pop authored
This was only used in one place (hbal), and is made obsolete by the change to the dual name/alias structure.
-
Iustin Pop authored
This was a regression from the name handling changes, as we started using the original names for the solution list (which is not designed for parsing/feeding back into ganeti).
-
Iustin Pop authored
printSolution is no longer used, as we print the solution iteratively now.
-
- Jul 18, 2010
-
-
Iustin Pop authored
When the field list is prefixed with a plus sign, this will extend the default field list, instead of replacing it entirely.
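A minimal sketch of the plus-sign handling, assuming a comma-separated field specification. The default field names and the functions `selectFields` and `splitComma` are hypothetical and only illustrate the extend-versus-replace behaviour.

```haskell
-- | Hypothetical default field list.
defaultFields :: [String]
defaultFields = ["name", "pcnt", "scnt"]

-- | Split a comma-separated list, dropping empty entries.
splitComma :: String -> [String]
splitComma s = filter (not . null) (go s)
  where
    go str = case break (== ',') str of
      (field, []) -> [field]
      (field, _ : rest) -> field : go rest

-- | A leading '+' extends the defaults; anything else replaces them.
selectFields :: String -> [String]
selectFields ('+' : extra) = defaultFields ++ splitComma extra
selectFields spec = splitComma spec
```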
-
Iustin Pop authored
This patch renames the pri/sec to pcnt/scnt, and adds the real primary and secondary instance lists, the peermap and the index of a node as selectable options.
-
Iustin Pop authored
If the last secondary instance of a peer is deleted (detected by the new peer memory value being equal to zero), then the pair (pdx, 0) should be deleted completely. This is not an optimization per se, but rather a cleanup (the speedup is at most a percent, and only in some corner cases).
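The cleanup can be sketched with the peer map modelled as an association list from peer index to memory. The `PeerMap` alias and the `setPeer` function are assumptions made for this example, not the real definitions.

```haskell
-- | Hypothetical peer map: (peer node index, memory used by its secondaries).
type PeerMap = [(Int, Int)]

-- | Set the memory recorded for a peer. A value of zero removes the
-- pair entirely instead of keeping a useless (pdx, 0) entry.
setPeer :: Int -> Int -> PeerMap -> PeerMap
setPeer pdx mem pmap
  | mem == 0 = others
  | otherwise = (pdx, mem) : others
  where
    others = filter ((/= pdx) . fst) pmap
```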
-
- Jul 16, 2010
-
-
Iustin Pop authored
This needs to be abstracted in a separate function, but in the meantime we fix the issue in both places.
Signed-off-by: Iustin Pop <iustin@google.com>
-
- Jun 21, 2010
-
-
Iustin Pop authored
-
Iustin Pop authored
… for the serialization/deserialization of the job and opcode status. Job status 'gone' was not actually used. It can be reintroduced if needed.
-
Iustin Pop authored
These mirror, again, the Ganeti constants, and are added for future use.
-
Iustin Pop authored
The rename is done such that we match Ganeti's own constants.
-
- Jun 08, 2010
-
-
Iustin Pop authored
Since the current buffer cannot contain (during network reads) an EOM, we should look for the EOM only in the newly-received string. While this shouldn't make much difference, in some tests it cuts the recvMsg total time by around half. On entering recvMsg, however, we do have to search the old buffer for a message, since we could have received two Luxi messages on the last network read; this is a one-off cost, compared to continuously looking for the EOM in the old string (at each receive loop).
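The buffering idea can be sketched in pure code, leaving out the actual socket reads. The end-of-message marker value and the names `feedChunk` and `takeBuffered` are assumptions for illustration; the real recvMsg works on the network handle.

```haskell
-- | Assumed end-of-message marker.
eom :: Char
eom = '\ETX'

-- | Feed a newly received chunk. The old buffer is known to contain
-- no EOM, so only the chunk is searched. Returns the complete message
-- (if any) together with the new buffer contents.
feedChunk :: String -> String -> (Maybe String, String)
feedChunk buffer chunk =
  case break (== eom) chunk of
    (_, []) -> (Nothing, buffer ++ chunk)
    (msgTail, _ : leftover) -> (Just (buffer ++ msgTail), leftover)

-- | One-off check on entry: the leftover from the previous read may
-- already hold a full message, so it is searched once.
takeBuffered :: String -> (Maybe String, String)
takeBuffered = feedChunk ""
```

A receive loop built on this would call `takeBuffered` once on entry and then `feedChunk` for each network read until a message is returned.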
-
- Jun 07, 2010
-
-
Iustin Pop authored
All current Luxi calls are supported after this patch. A bug in ArchiveJob is also fixed (Ganeti's job IDs are strings).
-
Iustin Pop authored
While not all of them are directly useful, having them will open some possibilities (e.g. polling for job changes in hbal's -X mode, and auto-archiving the jobs once they are successful).
-
- Jun 02, 2010
-
-
Iustin Pop authored
-
Iustin Pop authored
Currently, we define the LuxiOp type as a simple enumeration, and leave the arguments structure to the users of the Ganeti.Luxi module. This is suboptimal for a couple of reasons: first, we decouple the operation type from the operation arguments, and that means we don't use the type system for validation of the arguments; second, the clients themselves have to know about the JSON encoding of the protocol. For the above reasons, we change the operation type to contain the arguments too, and then the entire conversion/serialization is restricted to the Ganeti.Luxi module. Also, the removal of the JSON encoding from the clients results in an overall simplification of the code.
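The idea can be sketched as follows. The two constructors, their argument lists and the hand-rolled encoding are illustrative assumptions, not the actual Ganeti.Luxi definitions or wire format; `show` on ASCII strings stands in for a real JSON encoder.

```haskell
-- | Hypothetical operation type that carries its own arguments.
data LuxiOp
  = QueryNodes [String] [String] -- node names, requested fields
  | ArchiveJob String            -- job ID (a string)
  deriving (Eq, Show)

-- | Method name for each operation.
opMethod :: LuxiOp -> String
opMethod (QueryNodes _ _) = "QueryNodes"
opMethod (ArchiveJob _) = "ArchiveJob"

-- | Argument encoding, kept next to the type so that clients never
-- build the encoded form themselves.
opArgs :: LuxiOp -> String
opArgs (QueryNodes names fields) =
  "[" ++ show names ++ "," ++ show fields ++ "]"
opArgs (ArchiveJob jid) = "[" ++ show jid ++ "]"

-- | Full request encoding (illustrative layout only).
encodeOp :: LuxiOp -> String
encodeOp op =
  "{\"method\":" ++ show (opMethod op) ++ ",\"args\":" ++ opArgs op ++ "}"
```

With this layout a client writes `encodeOp (ArchiveJob "42")`, and passing a number where a job ID string is expected is rejected by the type checker rather than at run time.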
-