- Dec 23, 2010
-
-
Iustin Pop authored
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Adeodato Simo <dato@google.com>
-
Iustin Pop authored
Currently, LUXI job failures only display a warning message, while hbal still returns a success exit code. We change hbal so that execJobSet/runJobSet return true/false, and add a wrapper for simpler code.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Adeodato Simo <dato@google.com>
-
Iustin Pop authored
Currently the balancing function is a modified version of the standard deviation (stddev divided by list length), for historical reasons. While this works fine for small clusters, for big clusters it makes the balancing effect too "weak", and in some cases it fails to balance some clusters correctly. It also makes the balancing behaviour dependent on the cluster size, which is a big no-no. Therefore we revert to the normal version of standard deviation, and we also rename the function to reflect what it does. The new version correctly balances some corner cases that the previous version didn't, and passes the current balancing unittests.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Adeodato Simo <dato@google.com>
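To illustrate why the size-scaled variant weakens on big clusters, here is a minimal sketch of the two variants described above. The names `stdDev` and `scaledStdDev` are hypothetical and not taken from the htools source, and the population form of the standard deviation is an assumption.

```haskell
-- Hypothetical names; a sketch, not the htools implementation.

-- | Plain (population) standard deviation; 0 for an empty list.
stdDev :: [Double] -> Double
stdDev [] = 0
stdDev xs = sqrt (sum [(x - mean) * (x - mean) | x <- xs] / n)
  where
    n    = fromIntegral (length xs)
    mean = sum xs / n

-- | The older, size-dependent variant: stddev divided by list length.
scaledStdDev :: [Double] -> Double
scaledStdDev [] = 0
scaledStdDev xs = stdDev xs / fromIntegral (length xs)
```

Doubling a list (`xs ++ xs`) keeps the same spread, so `stdDev` is unchanged, while `scaledStdDev` is halved. That is the cluster-size dependence the message refers to.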
-
- Nov 23, 2010
-
-
Iustin Pop authored
This does just two passes, instead of three, over the list. This reduces the overall runtime noticeably (~25%) in some tests, but the gain is not reproducible using profiling, so I don't know how much the function itself is being sped up. Note: this is written via `seq`s, and not BangPatterns. Since it's just one case, adding BangPatterns just for it wasn't a big gain. Thanks to Lécz Balázs for the impetus to improve this!
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Balazs Lecz <leczb@google.com>
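The commit title is not preserved here, so the following is only a sketch of the general technique, assuming the function in question is a standard deviation. It computes the sum and the length in one pass (instead of two), forcing both accumulators with `seq` rather than BangPatterns, and then does a second pass for the squared deviations. The names `sumAndLength` and `stdDevTwoPass` are hypothetical.

```haskell
import Data.List (foldl')

-- | Pass 1: sum and length together. The `seq`s force both
-- accumulators at each step so no chain of thunks builds up.
sumAndLength :: [Double] -> (Double, Int)
sumAndLength = foldl' step (0, 0)
  where
    step (s, n) x =
      let s' = s + x
          n' = n + 1
      in s' `seq` n' `seq` (s', n')

-- | Two-pass standard deviation (assumed example; population form).
stdDevTwoPass :: [Double] -> Double
stdDevTwoPass [] = 0
stdDevTwoPass xs = sqrt (sqSum / fromIntegral n)
  where
    (total, n) = sumAndLength xs
    mean       = total / fromIntegral n
    -- Pass 2: sum of squared deviations from the mean.
    sqSum      = foldl' (\acc x -> let d = x - mean in acc + d * d) 0 xs
```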
-
Iustin Pop authored
Currently, hbal does a one-two signal handling, where the first signal causes graceful termination, and the second one an immediate one (either SIGINT or SIGTERM can be used, interchangeably). However, this poses a timing problem: if two programs want to send a graceful termination request, they cannot do that without careful coordination. To fix this, we change the code to handle the signals separately: SIGINT (^C) requests graceful termination, while SIGTERM requests immediate termination. This should allow easier control of hbal.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Balazs Lecz <leczb@google.com>
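A minimal sketch of the split handling described above, using `System.Posix.Signals` from the `unix` package. The function names and the exit code used for SIGTERM are assumptions for illustration; this is not the hbal code.

```haskell
import Control.Monad (void)
import Data.IORef (IORef, atomicWriteIORef, newIORef, readIORef)
import System.Exit (ExitCode (ExitFailure))
import System.Posix.Process (exitImmediately)
import System.Posix.Signals (Handler (Catch), installHandler, sigINT, sigTERM)

-- | Install both handlers and return the "graceful stop requested" flag.
installShutdownHandlers :: IO (IORef Bool)
installShutdownHandlers = do
  stopRequested <- newIORef False
  -- SIGINT (^C): only record the request; the main loop decides when to stop.
  void $ installHandler sigINT (Catch (atomicWriteIORef stopRequested True)) Nothing
  -- SIGTERM: leave at once (exit code 1 is an assumption).
  void $ installHandler sigTERM (Catch (exitImmediately (ExitFailure 1))) Nothing
  return stopRequested

-- | To be polled between job sets: has a graceful stop been requested?
shouldStop :: IORef Bool -> IO Bool
shouldStop = readIORef
```

With this split, any number of senders can issue SIGINT without accidentally escalating to an immediate exit.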
-
- Nov 09, 2010
-
-
Iustin Pop authored
Currently, the tag exclusion metric has a weight of one, which means there might be cases where we won't move instances around because it upsets the cluster metrics. However, we do want to make a greater effort to clean up tag collisions, so we increase the weight to an empirically-determined value of 2.
-
- Oct 21, 2010
-
-
Iustin Pop authored
This is a work in progress and will be modified as Ganeti 2.3 progresses.
-
- Oct 07, 2010
-
-
Iustin Pop authored
-
Iustin Pop authored
-
- Oct 06, 2010
-
-
Iustin Pop authored
Currently, the key metrics/tiered spec computations show the virtual cpu count. However, since we do have a maximum ratio Vcpu/Pcpu, we can also show the “normalized” cpu count, i.e. the equivalent physical cpu count corresponding to the virtual ones.
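As a sketch of the arithmetic, assuming the normalized count is simply the virtual count divided by the maximum Vcpu/Pcpu ratio (the function name is hypothetical):

```haskell
-- | Equivalent physical CPU count for a given virtual CPU count,
-- assuming normalization divides by the maximum Vcpu/Pcpu ratio.
normalizedCpus :: Double  -- ^ maximum Vcpu/Pcpu ratio
               -> Int     -- ^ virtual CPU count
               -> Double
normalizedCpus maxRatio vcpus = fromIntegral vcpus / maxRatio
```

For example, with a maximum ratio of 4, 64 virtual cpus correspond to 16 physical ones.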
-
Iustin Pop authored
-
- Sep 15, 2010
-
-
Iustin Pop authored
Currently, hbal will abort immediately when requested (^C, or SIGINT, etc.). This is not nice, since then the already started jobs need to be tracked manually. This patch adds a signal handler for SIGINT and SIGTERM, which will, the first time, simply record the shutdown request (and hbal will then exit once all jobs in the current jobset finish), and at the second request, will cause an immediate exit.
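The two-stage behaviour can be sketched as a small state machine. The types and names below are hypothetical, and the real handler would additionally need to be wired to the signals.

```haskell
-- | Whether a shutdown has already been requested.
data ShutdownState = Running | StopRequested
  deriving (Eq, Show)

-- | What the program should do in response to a signal.
data Action = FinishJobSetThenExit | ExitNow
  deriving (Eq, Show)

-- | First request: record it and let the current jobset finish.
-- Second request: exit immediately.
onSignal :: ShutdownState -> (ShutdownState, Action)
onSignal Running       = (StopRequested, FinishJobSetThenExit)
onSignal StopRequested = (StopRequested, ExitNow)
```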
-
- Sep 03, 2010
-
-
Iustin Pop authored
-
Iustin Pop authored
Also adds them in hbal.
-
Iustin Pop authored
Recent hbal seems to run many steps for small improvements (< 1e-3), so we should stop early in this case. We add a new option (-g), that will be used for the minimum gain during balancing. This check will only become active when the cluster score is below a threshold (--min-gain-limit), so as to not stop rebalances too early.
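One way to read the rule above, as a sketch: the function name and argument order are hypothetical, and it assumes that a lower cluster score is better and that the threshold is compared against the score after the step.

```haskell
-- | Decide whether another balancing step is worthwhile.
shouldContinue :: Double  -- ^ minimum gain (-g)
               -> Double  -- ^ score threshold (--min-gain-limit)
               -> Double  -- ^ cluster score before the step
               -> Double  -- ^ cluster score after the step
               -> Bool
shouldContinue minGain gainLimit before after
  | after >= before   = False  -- no improvement at all
  | after > gainLimit = True   -- score still high: gain check not active
  | otherwise         = before - after >= minGain
```

With a minimum gain of 0.01 and a threshold of 0.1, a step from 5.0 to 4.9995 is still accepted, while a step from 0.05 to 0.0495 stops the balancing.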
-
- Sep 02, 2010
-
-
Iustin Pop authored
This will make the automated builds flag any problems.
-
Iustin Pop authored
These are just variations of the standard debug, but are provided for simpler code, since laziness can cause debug statements to never be evaluated.
-
Iustin Pop authored
The addition of a new secondary on a node is doing two memory tests:
  - in strict mode, reject if we get into N+1 failure
  - reject if the new instance memory is greater than the free memory (not available memory) on the node
The last check is designed to ensure that, irrespective of the other secondary instances on this node, we are able to failover/migrate the newly-added instance. However, we should allow this if the instance comes from an offline node, which doesn't offer anything (not even disk replication). Therefore this patch makes this check conditional on the strict mode.
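The resulting logic can be sketched as follows. The type, constructor, and parameter names are hypothetical; in particular, the N+1 test is reduced to a precomputed flag.

```haskell
-- | Outcome of trying to add a secondary instance to a node.
data AddSecResult = Accepted | FailN1 | FailMem
  deriving (Eq, Show)

-- | After the patch, both memory checks apply only in strict mode.
canAddSecondary :: Bool  -- ^ strict mode?
                -> Bool  -- ^ would the node fail N+1 after the addition?
                -> Int   -- ^ memory of the new instance
                -> Int   -- ^ free memory on the node
                -> AddSecResult
canAddSecondary strict wouldFailN1 instMem nodeFree
  | strict && wouldFailN1        = FailN1
  | strict && instMem > nodeFree = FailMem
  | otherwise                    = Accepted
```

In non-strict mode (for example, when moving an instance off an offline node) the addition is accepted even if the instance memory exceeds the node's free memory.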
-
Iustin Pop authored
-
- Aug 30, 2010
-
-
Iustin Pop authored
-
Iustin Pop authored
Otherwise the saved cluster state and the in-memory one are wrong.
-
Iustin Pop authored
This also uncovered a few issues with the allocation model (instances not being marked up, etc.). Compared to hbal, hspace will generate either one or two files (for both the standard and the tiered allocation mode), depending on the input parameters.
-
Iustin Pop authored
The Cluster.iterateAlloc and tieredAlloc functions are changed to also return the updated instance list, since it is needed to have a “full” cluster view.
-
Iustin Pop authored
Also move the LUXI execution (-X) to the end, after all the output messages are printed. No good in waiting for the messages for a long while, especially as they are not up-to-date stats after the job execution, just an estimation of what the state will be.
-
Iustin Pop authored
This is currently hardcoded in an internal function in hscan.hs, and we move it to Text.hs for later use.
-
- Aug 25, 2010
-
-
Iustin Pop authored
This option will in the future be used to serialize the cluster state in hbal and hspace after the rebalance/allocation steps.
-
Iustin Pop authored
This checks that the Node text serialization and deserialization operations are idempotent when combined with each other.
-
Iustin Pop authored
Currently, the hostnames are almost fully arbitrary chars, which breaks the assumption that nodes/instances will be normal DNS hostnames. This patch adds some custom generators for these hostnames, that will allow better testing of text loader serialization/deserialization.
-
- Aug 24, 2010
-
-
Iustin Pop authored
Currently these are in hscan, and cannot be reused easily.
-
- Jul 29, 2010
-
-
Iustin Pop authored
Again, thanks to lintian.
-
- Jul 27, 2010
-
-
Iustin Pop authored
Currently we show the instance index, but this makes no sense outside the current running program. Instead, we show the instance name.
-
Iustin Pop authored
-
Iustin Pop authored
This looks better for text-only viewing…
-
- Jul 23, 2010
-
-
Iustin Pop authored
If some clusters failed during RAPI collection, exit with exit code 2 so that tests can detect this failure.
-
Iustin Pop authored
-
- Jul 22, 2010
-
-
Iustin Pop authored
-
Iustin Pop authored
-
Iustin Pop authored
The (recently-enabled) live test coverage stats found a few low-hanging fruits in the tests we do…
-
- Jul 21, 2010
-
-
Iustin Pop authored
… which fixes the issue noted in the previous commit (almost a brown paper bag change).
-
Iustin Pop authored
While this doesn't work correctly yet (hpc sum seems to only take common modules, not the sum of modules?), it prepares for gathering coverage data during live-test (as an alternative to unittest coverage data).
-