- Oct 03, 2011
-
-
Iustin Pop authored
This replaces the hand-coded opcode serialisation code with auto-generation based on TemplateHaskell. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Agata Murawska <agatamurawska@google.com>
-
Iustin Pop authored
This replaces the hand-coded opID with one automatically generated from the constructor names, similar to the way Python does it, except it's done at compilation time as opposed to runtime. Again, the code line delta does not favour this patch, but this replaces error-prone manual code with auto-generated code; if we add more opcode support, this will help a lot. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Agata Murawska <agatamurawska@google.com>
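Note: the mapping from constructor names to opcode IDs can be illustrated with a small sketch. This is not the patch's Template Haskell code: it does the conversion at runtime via `Data.Data`, purely to show the naming rule. The `OpCode` constructors listed and the helper name `constructorToId` are assumptions made for the example.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Char (isUpper, toUpper)
import Data.Data (Data, showConstr, toConstr)

-- Illustrative opcode type; the constructor list is an assumption.
data OpCode
  = OpTestDelay
  | OpInstanceFailover
  | OpInstanceMigrate
  deriving (Show, Data)

-- | Convert a CamelCase constructor name into an upper-case,
-- underscore-separated ID ("OpTestDelay" becomes "OP_TEST_DELAY").
constructorToId :: String -> String
constructorToId [] = []
constructorToId (c:cs) = toUpper c : concatMap step cs
  where
    step x
      | isUpper x = ['_', x]      -- a new word starts: insert a separator
      | otherwise = [toUpper x]

-- | Runtime stand-in for the generated opID function.
opID :: OpCode -> String
opID = constructorToId . showConstr . toConstr

main :: IO ()
main = mapM_ (putStrLn . opID) [OpTestDelay, OpInstanceFailover, OpInstanceMigrate]
```

A compile-time version would perform the same string conversion inside a Template Haskell splice, so that a mistyped or missing ID cannot occur.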
-
Iustin Pop authored
This patch replaces the current hard-coded JSON instances (all alike, just manual conversion to/from string) with auto-generated code based on Template Haskell (http://www.haskell.org/haskellwiki/Template_Haskell). The reduction in code lines is not big, as the helper module is well documented and thus overall we gain about 70 code lines; however, if we ignore comments we're in good shape, and any future addition of such data types will be much simpler and less error-prone. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Agata Murawska <agatamurawska@google.com>
-
Iustin Pop authored
This changes the names of some helper functions so that future patches touch less unrelated code. The change replaces shortened prefixes with the full type name. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Agata Murawska <agatamurawska@google.com>
-
Iustin Pop authored
Utils is a bit big, let's split the JSON stuff (not all of it) into a separate module that doesn't have any other dependencies. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Agata Murawska <agatamurawska@google.com>
-
- Sep 29, 2011
-
-
Iustin Pop authored
This is very useful for testing/benchmarking. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Agata Murawska <agatamurawska@google.com>
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Agata Murawska <agatamurawska@google.com>
-
Iustin Pop authored
Currently, the node pairs used for allocation are a simple [(primary, secondary)] list of tuples, as this is how they were used before the previous patch. However, for that patch, we use them separately per primary node, and we have to unpack this list right after generation. Therefore it makes sense to directly generate the list in the correct form, and remove the split from tryAlloc. This should not be slower than the previous patch, at least, possibly even faster. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Agata Murawska <agatamurawska@google.com>
-
Iustin Pop authored
This patch finally enables parallelisation in instance placement.

My original try for enabling this didn't work well, but it took a while (and liberal use of threadscope) to understand why. The attempt was to simply `parMap rwhnf` over allocateOnPair, however this is not good as for a 100-node cluster, this will create roughly 100*100 sparks, which is way too much: each individual spark is too small, and there are too many sparks. Furthermore, the combining of the allocateOnPair results was done single-threaded, losing even more parallelism. So we had O(n²) sparks to run in parallel, each spark of size O(1), and we combine single-threadedly a list of O(n²) length.

The new algorithm does a two-stage process: we group the list of valid pairs per primary node, relying on the fact that usually the secondary nodes are somewhat balanced (it's definitely true for 'blank' cluster computations). We then run in parallel over all primary nodes, doing both the individual allocateOnPair calls *and* the concatAllocs summarisation. This leaves only the summing of the primary group results together for the main execution thread. The new numbers are: O(n) sparks, each of size O(n), and we combine single-threadedly a list of O(n) length.

This translates directly into a reasonable speedup (relative numbers for allocation of 3 instances on a 120-node cluster):

  - original code (non-threaded): 1.00 (baseline)
  - first attempt (2 threads): 0.81 (20% slowdown!)
  - new code (non-threaded): 1.00 (no slowdown)
  - new code (threaded/1 thread): 1.00
  - new code (2 threads): 1.65 (65% faster)

We don't get a 2x speedup, because the GC time increases. Fortunately the code should scale well to more cores, so on many-core machines we should get a nice overall speedup. On a different machine with 4 cores, we get 3.29x. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Agata Murawska <agatamurawska@google.com>
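Note: the two-stage scheme from this commit message can be sketched as follows. This is a minimal illustration, not the actual htools code: the scoring in `allocateOnPair`, the `min`-based `concatAllocs`, and the hand-rolled `parMapWhnf` (built on `par`/`pseq` from base instead of the `parallel` library's `parMap`) are all assumptions chosen to keep the example self-contained.

```haskell
import Data.List (foldl')
import GHC.Conc (par, pseq)

type Node = Int
type Score = Int

-- Hypothetical stand-in for allocateOnPair: score one (primary, secondary)
-- placement; lower is better in this sketch.
allocateOnPair :: Node -> Node -> Maybe Score
allocateOnPair p s
  | p == s = Nothing
  | otherwise = Just (abs (p - s) * 1000 + p)

-- Hypothetical stand-in for concatAllocs: keep the best score so far.
-- The ($!) forces the score so that the spark does the real work.
concatAllocs :: Maybe Score -> Maybe Score -> Maybe Score
concatAllocs Nothing b = b
concatAllocs a Nothing = a
concatAllocs (Just a) (Just b) = Just $! min a b

-- Valid pairs, generated directly in per-primary form.
groupedPairs :: [Node] -> [(Node, [Node])]
groupedPairs nodes = [(p, [s | s <- nodes, s /= p]) | p <- nodes]

-- Work unit for one primary: O(n) allocations plus their summarisation.
solveGroup :: (Node, [Node]) -> Maybe Score
solveGroup (p, ss) = foldl' concatAllocs Nothing (map (allocateOnPair p) ss)

-- Minimal parallel map: one spark per element, evaluated to WHNF.
parMapWhnf :: (a -> b) -> [a] -> [b]
parMapWhnf _ [] = []
parMapWhnf f (x:xs) = y `par` (ys `pseq` (y : ys))
  where
    y = f x
    ys = parMapWhnf f xs

-- O(n) sparks of O(n) work each; only the final O(n) fold is sequential.
tryAlloc :: [Node] -> Maybe Score
tryAlloc nodes =
  foldl' concatAllocs Nothing (parMapWhnf solveGroup (groupedPairs nodes))

main :: IO ()
main = print (tryAlloc [1 .. 120])  -- prints: Just 1001
```

To actually run the sparks on several cores, the program has to be built with `-threaded` and started with `+RTS -N2` (or higher); without that it still gives the same result, just sequentially.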
-
Iustin Pop authored
This is moved outside of the concatAllocs as it will be needed in another place in the future. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Agata Murawska <agatamurawska@google.com>
-
Iustin Pop authored
Originally, this data type was used both by instance allocation (1 result), and by instance relocation (many results, one per instance). As such, the field 'asSolutions' was a list, and the various code paths checked whether the length of the list matched the current mode. This is very ugly, as we can't guarantee this matching via the type system; hence the FIXME in the code. However, commit 6804faa0 removed the instance evacuation code, and thus we now always use just one allocation solution. Hence we can change the data type to a simple Maybe type, and get rid of many 'otherwise barf out' conditions. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Agata Murawska <agatamurawska@google.com>
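Note: a small sketch of why the Maybe version is simpler. The type and field names other than 'asSolutions' (`OldSolution`, `NewSolution`, `asSolution`, `Placement`) are assumptions for illustration, not the real definitions.

```haskell
-- Hypothetical placement: (primary, secondary) node names.
type Placement = (String, String)

-- Before: a list whose length must match the mode, checked at run time.
newtype OldSolution = OldSolution { asSolutions :: [Placement] }

-- After: at most one solution, enforced by the type.
newtype NewSolution = NewSolution { asSolution :: Maybe Placement }

-- The list version needs an extra "should never happen" branch.
extractOld :: OldSolution -> Either String Placement
extractOld sol = case asSolutions sol of
  [p] -> Right p
  []  -> Left "no solution"
  _   -> Left "internal error: multiple solutions in allocate mode"

-- The Maybe version has exactly the two meaningful cases.
extractNew :: NewSolution -> Either String Placement
extractNew = maybe (Left "no solution") Right . asSolution

main :: IO ()
main = do
  print (extractOld (OldSolution [("node1", "node2"), ("node3", "node4")]))
  print (extractNew (NewSolution (Just ("node1", "node2"))))
```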
-
- Sep 23, 2011
-
-
Iustin Pop authored
This adds shortened versions of the allocation policies, as writing out the whole name on the command line can become tedious. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Agata Murawska <agatamurawska@google.com>
-
- Sep 22, 2011
-
-
Iustin Pop authored
This will be used to implement more easily 'choice' parsing of input data, without resorting to syntax (case … of Bad _ -> …). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Agata Murawska <agatamurawska@google.com>
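Note: the commit title is not shown here, so what exactly was added is not known from this text. One common way to get 'choice' parsing without explicit `case … of Bad _ -> …` is an Alternative (or MonadPlus) instance for an Ok/Bad result type; the sketch below assumes that approach. The `Result` definition and the `parseLimit` example are hypothetical.

```haskell
import Control.Applicative (Alternative (..))

-- Assumed shape of the result type: either an error message or a value.
data Result a = Bad String | Ok a
  deriving (Show, Eq)

instance Functor Result where
  fmap _ (Bad e) = Bad e
  fmap f (Ok x) = Ok (f x)

instance Applicative Result where
  pure = Ok
  Bad e <*> _ = Bad e
  Ok f <*> r = fmap f r

-- Choice: take the first success, otherwise fall through to the next parser.
instance Alternative Result where
  empty = Bad "no alternatives"
  Bad _ <|> r = r
  ok <|> _ = ok

-- Hypothetical parsers for a numeric limit that may also be a keyword.
parseNumber :: String -> Result Int
parseNumber s = case reads s of
  [(n, "")] -> Ok n
  _ -> Bad ("not a number: " ++ s)

parseKeyword :: String -> Result Int
parseKeyword "unlimited" = Ok maxBound
parseKeyword s = Bad ("unknown keyword: " ++ s)

-- The choice is expressed with (<|>) instead of a nested case expression.
parseLimit :: String -> Result Int
parseLimit s = parseNumber s <|> parseKeyword s

main :: IO ()
main = mapM_ (print . parseLimit) ["42", "unlimited", "bogus"]
```

With this instance the error reported on total failure is the one from the last alternative tried, which is a design choice worth keeping in mind when ordering the parsers.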
-
- Sep 14, 2011
-
-
Iustin Pop authored
The tryEvac/evacuateInstance functions are no longer used in the new multi-group world order, so we remove them and change the unit-test to test the actual IAllocator function. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
This just adds the primary node of the instance as 'non-allocable' during the choosing of the new secondary. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
If we select the primary as new secondary, better to fail than return wrong data to Ganeti. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- Aug 08, 2011
-
-
Iustin Pop authored
As discussed offline, the new node-change mode could be used for evacuation, but it's not directly useful as it returns a list of opcodes; therefore, we need to partially revert commits fbe5fcf6 and 5b53ca79 that removed it (and multi-evacuate, which remains removed). The new version of relocate is actually just a wrapper over the tryNodeEvac (which does the node evacuate); we run that and then we do some extra checks that the nodes we got from that function are consistent with the instance's new state. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Jul 22, 2011
-
-
Iustin Pop authored
I think I've identified the problem with the current ChangeAll mode. The current algorithm works as follows:

  - identify a new primary by choosing the node which gives best score as new secondary
  - failover to it
  - identify a new secondary by choosing the node which gives best score as new secondary

This means that the future primary is 'fixed' after the first iteration, leading to possibly suboptimal results. This patch changes the algorithm to do what, in hindsight, seems the obvious thing to do:

  - generate all pairs (primary, secondary)
  - identify the pair that after the above sequence (r:np, f, r:ns) gives the best group score

This fixes some of the corner cases I've seen in relocation, but not all; the remaining cases are related to multi-instance relocation and while they can't be fixed in the current framework, the needed rebalancing is much smaller than with the current algorithm. The patch also fixes an issue with the docstring of another function. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
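Note: the exhaustive pair search described in this commit message can be sketched as below. The node loads and the scoring function `scoreAfterMove` are invented for the example, and the sketch assumes that a lower score is better; the real code computes the group score after the (r:np, f, r:ns) sequence.

```haskell
import Data.List (minimumBy)
import Data.Maybe (fromMaybe)
import Data.Ord (comparing)

type Node = Int

-- Hypothetical per-node load used to build a toy score.
loads :: [(Node, Int)]
loads = [(0, 5), (1, 1), (2, 4), (3, 2)]

nodeLoad :: Node -> Int
nodeLoad n = fromMaybe 100 (lookup n loads)

-- Hypothetical score of the group after moving the instance to
-- (new primary, new secondary); the primary is weighted more heavily.
scoreAfterMove :: Node -> Node -> Int
scoreAfterMove np ns = 2 * nodeLoad np + nodeLoad ns

-- Generate all (primary, secondary) pairs and keep the best-scoring one,
-- instead of fixing the primary first and only then choosing a secondary.
bestPair :: [Node] -> Maybe (Node, Node)
bestPair nodes =
  case [(np, ns) | np <- nodes, ns <- nodes, np /= ns] of
    [] -> Nothing
    pairs -> Just (minimumBy (comparing (uncurry scoreAfterMove)) pairs)

main :: IO ()
main = print (bestPair (map fst loads))  -- prints: Just (1,3)
```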
-
Iustin Pop authored
These two cases replace explicit uses of the primary and secondary nodes with Instance.allNodes, which means the code is more flexible if the internal layout of the instance changes. I've verified that the output of involvedNodes is not required to be 4 elements long, and as such the function docstring has been updated. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
… and failover too. Not many changes otherwise except for serialisation and unittests. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
These will be used in Node.hs for proper add/remove instance code. Furthermore, we restrict the movable status to the right disk templates only, so that we don't attempt to move the 'wrong' instance types. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- Jul 21, 2011
-
-
Iustin Pop authored
This adds tests for the opToResult and eitherToResult functions from Types.hs, and changes two other tests for the same module to test JSON serialisation (which automatically also tests the lower-level to/from string conversion functions). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Tested only on GHC 7.x, will test on 6.1x too before commit. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This adds parameter documentation for Cluster.iMoveToJob (I think it was not clear if the new or old node list is needed) and fixes other docstring style issues. After this patch, all modules except for CLI.hs (which has many obvious declarations for command-line options) and QC.hs (unittests) have 100% doc-strings. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This abstracts the JSON parsing of the type EvacMode near its definition, and simplifies its conversion in IAlloc.parseData. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Currently, hspace can only output a machine-readable format that (while detailed) is hard to parse quickly by people. This patch adds (and enables by default) a human-readable output that shows the most important metrics in a simple format. Most of the work of the patch is in moving the display of various metrics from the 'main' function to separate functions, each of which can output either a machine or human intended format. The patch also corrects a bug in the CPU efficiency display: before, the efficiency was computed as instance virtual CPUs divided by total physical CPUs, which is almost always supra-unitary. More correct is to divide by the total virtual CPUs, which shows a more meaningful number (when the p-to-v CPU ratio has been defined correctly). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
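Note: a worked example of the CPU efficiency fix. The function names are hypothetical, and the sketch assumes that the total virtual CPU count is the physical count multiplied by the configured physical-to-virtual ratio.

```haskell
-- Old computation: instance vCPUs over physical CPUs (usually above 1).
cpuEffOld :: Int -> Int -> Double
cpuEffOld instVcpus physCpus =
  fromIntegral instVcpus / fromIntegral physCpus

-- New computation: instance vCPUs over total virtual CPUs, where the
-- total is assumed to be physical CPUs times the p-to-v ratio.
cpuEffNew :: Int -> Int -> Double -> Double
cpuEffNew instVcpus physCpus ratio =
  fromIntegral instVcpus / (fromIntegral physCpus * ratio)

main :: IO ()
main = do
  -- 48 instance vCPUs on 16 physical CPUs with a ratio of 4 (64 vCPUs).
  print (cpuEffOld 48 16)    -- prints: 3.0
  print (cpuEffNew 48 16 4)  -- prints: 0.75
```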
-
Iustin Pop authored
Commit 56c094b4 added use of job constants, but I didn't pay attention and ended up mixing things: job constants were used for opcode ones, and the job ones didn't get converted. This patch corrects it and uses only C.* constants throughout the Jobs module. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Michael Hanselmann authored
This patch renames the {JOB,OP}_STATUS_WAITLOCK constants to {JOB,OP}_STATUS_WAITING, as per design document for chained jobs. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Jul 20, 2011
-
-
Guido Trotter authored
hspace and hbal treat -O differently, and use aliases for short names (although hbal succeeds in that, and hspace doesn't). Unify this with a name lookup, using the same functions we used for instance selection/exclusion. Some of the code is by the way a bit repetitive, and could probably be merged into a single function. That needs to be a monadic one, though, so I promise to do it as soon as I realize how to write them! ;) Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Jul 19, 2011
-
-
Iustin Pop authored
This will be used in hspace to toggle between "human" readable and machine readable output formats. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
This is used just in hspace, so let's help in making Cluster.hs smaller. We also split the function in two, as computing the spec map and formatting it are two different tasks. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
This adds the binaries code to the coverage, and thus the coverage finally shows the real coverage over all logic code (except for the htools.hs code, which is not logic code related to the algorithms, so it doesn't matter — plus it's also very small). Next steps will be to actually add coverage for this code, especially for hbal and hspace, which are relatively big compared to hail and hscan (around 800 expressions versus 200-300 expressions). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
This is the last patch of the binaries conversion. For information, we now have a single binary that is approx. 5.4MiB in size, compared to 4 binaries that were each approx. 5.1-5.2MiB in size; this will result in a smaller package and install size, and the single compilation phase should also help. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
In addition, the patch adds a separate Makefile variable for holding the binary roles to make it more clear what we symlink. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
This converts the first binary to the generic 'htools' binary. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
This is the start of a series of patches that will unify all the binaries currently in use into a single one, which can perform different roles based on the name it is installed as. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
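Note: a minimal sketch of a multi-role binary that picks its behaviour from the name it was invoked as. The role table and the placeholder actions are assumptions; only the role names (hail, hbal, hscan, hspace) come from this log.

```haskell
import System.Environment (getProgName)
import System.Exit (exitFailure)
import System.IO (hPutStrLn, stderr)

-- Hypothetical role table; each action stands in for a tool's real main.
roles :: [(String, IO ())]
roles =
  [ ("hail", putStrLn "would run hail")
  , ("hbal", putStrLn "would run hbal")
  , ("hscan", putStrLn "would run hscan")
  , ("hspace", putStrLn "would run hspace")
  ]

-- Look up the action for the name the binary was started under.
selectRole :: String -> Maybe (IO ())
selectRole name = lookup name roles

main :: IO ()
main = do
  name <- getProgName
  case selectRole name of
    Just action -> action
    Nothing -> do
      hPutStrLn stderr ("Unrecognised role '" ++ name ++ "'; known roles: "
                        ++ unwords (map fst roles))
      exitFailure
```

Installing the binary once and symlinking it under each role name (for example `ln -s htools hbal`) is then enough to expose all the tools.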
-
Iustin Pop authored
When compiling with the parallel-3.x library, we get a deprecation warning, which makes understanding any other error messages harder. This patch adds a compatibility module that will hold such code for transitioning libraries. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
… which was deprecated by the previous patch. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-