- Oct 04, 2011
-
-
Iustin Pop authored
Include QCHelper.hs in the distributed files, and also exclude it and the THH.hs file from coverage reports. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Andrea Spadaccini <spadaccio@google.com>
-
- Oct 03, 2011
-
-
Iustin Pop authored
Now that the basic code works, let's use some aliases for simpler code and less ))))))))). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
This patch adds a few niceties to the test suite:
- allow matching test groups case-insensitively, and emit a warning when a given test group name doesn't match anything
- add a new operator, similar to assertEqual in Python, that tests for equality and emits the two values in case of error
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Agata Murawska <agatamurawska@google.com>
-
Iustin Pop authored
This makes error message change from "Test 4 failed …" to "Test prop_Loader_mergeData failed", which is much more readable. It also removes the duplication of test suite names in the test.hs file. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Agata Murawska <agatamurawska@google.com>
-
Iustin Pop authored
This replaces the hand-coded opcode serialisation code with auto-generation based on TemplateHaskell. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Agata Murawska <agatamurawska@google.com>
-
Iustin Pop authored
This replaces the hand-coded opID with one automatically generated from the constructor names, similar to the way Python does it, except it's done at compilation time as opposed to runtime. Again, the code line delta does not favour this patch, but this eliminates error-prone, manual code with auto-generated one; in case we add more opcode support, this will help a lot. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Agata Murawska <agatamurawska@google.com>
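The Python-side idea this commit refers to, deriving the opcode ID from the class name, can be sketched roughly as below. This is only an illustration: `name_to_id`, `OpcodeMeta` and the exact naming rule are assumptions made for the example, not code taken from the Ganeti sources. The difference described in the commit is that the Haskell version does the equivalent work at compilation time.

```python
import re

# Hypothetical helper: matches each lower-case/upper-case boundary in a name.
_BOUNDARY_RE = re.compile(r"([a-z])([A-Z])")


def name_to_id(name):
    """Convert a CamelCase name such as "OpTestDelay" to "OP_TEST_DELAY"."""
    return _BOUNDARY_RE.sub(r"\1_\2", name).upper()


class OpcodeMeta(type):
    """Metaclass that fills in OP_ID when the class is created (at runtime)."""

    def __new__(mcs, name, bases, attrs):
        # Only derive the ID if the class body did not set one explicitly.
        attrs.setdefault("OP_ID", name_to_id(name))
        return super().__new__(mcs, name, bases, attrs)


class OpTestDelay(metaclass=OpcodeMeta):
    """Example opcode class; no hand-written OP_ID is needed."""
```

With this in place, `OpTestDelay.OP_ID` is computed from the class name, so adding a new opcode class cannot leave a stale or mistyped ID behind.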
-
Iustin Pop authored
This patch replaces the current hard-coded JSON instances (all alike, just manual conversion to/from string) with auto-generated code based on Template Haskell (http://www.haskell.org/haskellwiki/Template_Haskell). The reduction in code lines is not big, as the helper module is well documented and thus overall we gain about 70 code lines; however, if we ignore comments we're in good shape, and any future addition of such data types will be much simpler and less error-prone. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Agata Murawska <agatamurawska@google.com>
-
Iustin Pop authored
This changes the names for some helper functions so that future patches are touching less unrelated code. The change replaces shortened prefixes with the full type name. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Agata Murawska <agatamurawska@google.com>
-
Iustin Pop authored
Utils is a bit big, let's split the JSON stuff (not all of it) into a separate module that doesn't have any other dependencies. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Agata Murawska <agatamurawska@google.com>
-
- Sep 30, 2011
-
-
Andrea Spadaccini authored
* devel-2.5:
  Use --yes to deactivate master ip in cluster merge
  Use deactivate-master-ip in cluster-merge
  Add gnt-cluster commands to toggle the master IP
  Split starting and stopping master IP and daemons
  listrunner: Don't pass arguments if there are none
  ssh: Quote strings in error message
  utils.log: Write error messages to stderr
  Add signal handling doc to hbal man page
  Migration: warn the user about hv version mismatch
  Fix handling of cluster verify hooks
  Redistribute the RAPI certificate
  QA: Add tests for instance start/stop via RAPI
  RAPI: Fix wrong check on instance shutdown
  baserlib: Accept empty body in FillOpcode
Conflicts:
  lib/backend.py - no real conflicts
  lib/constants.py - preserve both changes
  lib/rapi/rlib2.py - keep master
  lib/rpc.py - no real conflicts
  tools/cluster-merge - keep devel-2.5
Signed-off-by:
Andrea Spadaccini <spadaccio@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Andrea Spadaccini authored
* stable-2.5:
  listrunner: Don't pass arguments if there are none
  ssh: Quote strings in error message
  utils.log: Write error messages to stderr
  Add signal handling doc to hbal man page
  Fix handling of cluster verify hooks
  Redistribute the RAPI certificate
  QA: Add tests for instance start/stop via RAPI
  RAPI: Fix wrong check on instance shutdown
  baserlib: Accept empty body in FillOpcode
Signed-off-by:
Andrea Spadaccini <spadaccio@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Guido Trotter authored
Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Andrea Spadaccini <spadaccio@google.com>
-
Andrea Spadaccini authored
Use the gnt-cluster deactivate-master-ip command in cluster-merge to disable the master IP. Signed-off-by:
Andrea Spadaccini <spadaccio@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com> (cherry picked from commit e87e5afb)
-
Andrea Spadaccini authored
lib/client/gnt_cluster.py:
* Add activate-master-ip and deactivate-master-ip commands
man/gnt-cluster.rst:
* Document the new commands
lib/opcodes.py lib/cmdlib.py
* Add two opcodes and the LU that call the relevant RPCs
test/docs_unittest.py
* Silence an error about RAPI not implemented for the two new opcodes
Signed-off-by:
Andrea Spadaccini <spadaccio@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com> (cherry picked from commit fb926117) Conflicts: test/docs_unittest.py - kept devel-2.5 version, without the RAPI opcode checks
-
Andrea Spadaccini authored
lib/backend.py
* split StartMaster() into ActivateMasterIp() and StartMasterDaemons()
* split StopMaster() into DeactivateMasterIp() and StopMasterDaemons()
lib/server/noded.py, lib/rpc.py
* adapt the call chains to the new functions, define new RPCs
lib/bootstrap.py, lib/cmdlib.py, lib/server/masterd.py
* use the new RPCs
Signed-off-by:
Andrea Spadaccini <spadaccio@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com> (cherry picked from commit fb460cf7)
-
Andrea Spadaccini authored
Use the gnt-cluster deactivate-master-ip command in cluster-merge to disable the master IP. Signed-off-by:
Andrea Spadaccini <spadaccio@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Andrea Spadaccini authored
lib/client/gnt_cluster.py:
* Add activate-master-ip and deactivate-master-ip commands
man/gnt-cluster.rst:
* Document the new commands
lib/opcodes.py lib/cmdlib.py
* Add two opcodes and the LU that call the relevant RPCs
test/docs_unittest.py
* Silence an error about RAPI not implemented for the two new opcodes
Signed-off-by:
Andrea Spadaccini <spadaccio@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Andrea Spadaccini authored
lib/backend.py
* split StartMaster() into ActivateMasterIp() and StartMasterDaemons()
* split StopMaster() into DeactivateMasterIp() and StopMasterDaemons()
lib/server/noded.py, lib/rpc.py
* adapt the call chains to the new functions, define new RPCs
lib/bootstrap.py, lib/cmdlib.py, lib/server/masterd.py
* use the new RPCs
Signed-off-by:
Andrea Spadaccini <spadaccio@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Michael Hanselmann authored
If no arguments were specified the “exec_args” variable was “None”, leading to the command being run as “… ./… None”. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
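The kind of bug described here can be illustrated with a small sketch. The function names and the string-formatting approach are assumptions made for the example; the actual listrunner code is not shown in this log.

```python
def build_command_buggy(executable, exec_args=None):
    """Formats the arguments unconditionally, so None becomes the text "None"."""
    return "%s %s" % (executable, exec_args)


def build_command(executable, exec_args=None):
    """Appends the arguments only when some were actually given."""
    if exec_args:
        return "%s %s" % (executable, exec_args)
    return executable
```

The fixed variant treats both `None` and an empty string as "no arguments", which is one plausible reading of "don't pass arguments if there are none".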
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Michael Hanselmann authored
When “gnt-cluster copyfile” failed it would only print “Copy of file … to node … failed”. A detailed message is written using logging.error. Writing error messages to stderr can be helpful in figuring out what went wrong (the messages also go to the log file, but not everyone might know about it). Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
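A minimal sketch of the idea, using only the standard `logging` module: detailed messages keep going to the log file, while errors are also copied to stderr. The function name, logger name and level choices are assumptions for illustration and do not reproduce the actual utils.log code.

```python
import logging
import sys


def setup_logging(logfile, stderr_level=logging.ERROR):
    """Log everything to a file and additionally echo errors to stderr."""
    logger = logging.getLogger("copyfile-demo")  # hypothetical logger name
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()  # avoid duplicate handlers on repeated setup
    logger.propagate = False

    # The log file receives all messages.
    file_handler = logging.FileHandler(logfile)
    file_handler.setLevel(logging.DEBUG)
    logger.addHandler(file_handler)

    # stderr receives only messages at or above stderr_level.
    stderr_handler = logging.StreamHandler(sys.stderr)
    stderr_handler.setLevel(stderr_level)
    logger.addHandler(stderr_handler)
    return logger
```

After `log = setup_logging("/tmp/demo.log")`, a call such as `log.error("copy failed")` is written to the file and shown on the terminal, whereas `log.info(...)` only reaches the file.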
-
Iustin Pop authored
Also remove a bug note, since hbal has long been able to execute jobs directly. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Sep 29, 2011
-
-
Andrea Spadaccini authored
Signed-off-by:
Andrea Spadaccini <spadaccio@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Andrea Spadaccini authored
* hypervisor/hv_kvm.py
  - parse the memory transfer status
* cmdlib.py
  - represent memory transfer info, if available
Signed-off-by:
Andrea Spadaccini <spadaccio@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Andrea Spadaccini authored
To add status reporting for the KVM migration, the instance_migrate RPC must be non-blocking. Moreover, there must be a way to represent the migration status and a way to fetch it.
* constants.py:
  - add constants representing the migration statuses
* objects.py:
  - add the MigrationStatus object
* hypervisor/hv_base.py
  - change the FinalizeMigration method name to FinalizeMigrationDst
  - add the FinalizeMigrationSource method
  - add the GetMigrationStatus method
* hypervisor/hv_kvm.py
  - change the implementation of MigrateInstance to be non-blocking (i.e. do not poll the status of the migration)
  - implement the new methods defined in BaseHypervisor
* backend.py, server/noded.py, rpc.py
  - add methods to call the new hypervisor methods
  - fix documentation of the existing methods to reflect the changes
* cmdlib.py
  - adapt the logic of TLMigrateInstance._ExecMigration to reflect the changes
Signed-off-by:
Andrea Spadaccini <spadaccio@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
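The control flow this commit describes (start the migration without blocking, then poll a status call from the caller's side) might look roughly like the sketch below. Everything here is hypothetical: the status strings, the `FakeHypervisor` stand-in and the function names are invented for illustration and are not the Ganeti API.

```python
import time

# Hypothetical status values; the real constants are defined in constants.py.
MIGRATION_ACTIVE = "active"
MIGRATION_COMPLETED = "completed"


class FakeHypervisor:
    """Stand-in hypervisor whose migration completes after a few polls."""

    def __init__(self, polls_needed):
        self._remaining = polls_needed
        self.started = None

    def migrate_instance(self, name):
        # Non-blocking: only starts the transfer and returns immediately.
        self.started = name

    def get_migration_status(self, name):
        if self._remaining > 0:
            self._remaining -= 1
            return MIGRATION_ACTIVE
        return MIGRATION_COMPLETED


def run_migration(hv, name, poll_interval=1.0, report=print):
    """Start a migration, then poll until it is no longer active."""
    hv.migrate_instance(name)
    while True:
        status = hv.get_migration_status(name)
        if status != MIGRATION_ACTIVE:
            return status
        report("migration of %s still in progress" % name)
        time.sleep(poll_interval)
```

Moving the polling loop out of the hypervisor call is what lets the caller report progress between polls instead of waiting silently for the whole transfer.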
-
Andrea Spadaccini authored
Signed-off-by:
Andrea Spadaccini <spadaccio@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This is very useful for testing/benchmarking. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Agata Murawska <agatamurawska@google.com>
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Agata Murawska <agatamurawska@google.com>
-
Iustin Pop authored
Currently, the node pairs used for allocation are a simple [(primary, secondary)] list of tuples, as this is how they were used before the previous patch. However, for that patch, we use them separately per primary node, and we have to unpack this list right after generation. Therefore it makes sense to directly generate the list in the correct form, and remove the split from tryAlloc. This should not be slower than the previous patch, at least, possibly even faster. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Agata Murawska <agatamurawska@google.com>
-
Iustin Pop authored
This patch finally enables parallelisation in instance placement.

My original try for enabling this didn't work well, but it took a while (and liberal use of threadscope) to understand why. The attempt was to simply `parMap rwhnf` over allocateOnPair; however, this is not good, as for a 100-node cluster this will create roughly 100*100 sparks, which is way too much: each individual spark is too small, and there are too many sparks. Furthermore, the combining of the allocateOnPair results was done single-threaded, losing even more parallelism. So we had O(n²) sparks to run in parallel, each spark of size O(1), and we combine single-threadedly a list of O(n²) length.

The new algorithm does a two-stage process: we group the list of valid pairs per primary node, relying on the fact that usually the secondary nodes are somewhat balanced (it's definitely true for 'blank' cluster computations). We then run in parallel over all primary nodes, doing both the individual allocateOnPair calls *and* the concatAllocs summarisation. This leaves only the summing of the primary group results together for the main execution thread. The new numbers are: O(n) sparks, each of size O(n), and we combine single-threadedly a list of O(n) length.

This translates directly into a reasonable speedup (relative numbers for allocation of 3 instances on a 120-node cluster):
- original code (non-threaded): 1.00 (baseline)
- first attempt (2 threads): 0.81 (20% slowdown!!)
- new code (non-threaded): 1.00 (no slowdown)
- new code (threaded/1 thread): 1.00
- new code (2 threads): 1.65 (65% faster)

We don't get a 2x speedup, because the GC time increases. Fortunately the code should scale well to more cores, so on many-core machines we should get a nice overall speedup. On a different machine with 4 cores, we get 3.29x.

Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Agata Murawska <agatamurawska@google.com>
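The two-stage structure described in this commit can be sketched in Python for readers unfamiliar with Haskell sparks. This is an illustration of the grouping idea only: the function names, the `score` callback and the "lowest score wins" summary are assumptions that stand in for allocateOnPair and concatAllocs, and CPython threads will not reproduce the speedups quoted above.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def group_by_primary(pairs):
    """Stage 1: turn [(primary, secondary)] into {primary: [secondaries]}."""
    groups = defaultdict(list)
    for primary, secondary in pairs:
        groups[primary].append(secondary)
    return groups


def best_for_primary(primary, secondaries, score):
    """One task per primary: evaluate all its pairs and summarise locally."""
    return min((score(primary, s), primary, s) for s in secondaries)


def best_allocation(pairs, score, workers=2):
    """Stage 2: run O(n) tasks in parallel, then combine O(n) partial results."""
    groups = group_by_primary(pairs)
    if not groups:
        return None
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(
            lambda item: best_for_primary(item[0], item[1], score),
            groups.items()))
    # Only this final combination runs in the calling thread.
    return min(partial)
```

The point of the design is the granularity: each task covers a whole primary node (O(n) work) instead of a single pair (O(1) work), and the per-group summary is computed inside the task rather than afterwards.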
-
Iustin Pop authored
This is moved outside of the concatAllocs as it will be needed in another place in the future. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Agata Murawska <agatamurawska@google.com>
-
Iustin Pop authored
Originally, this data type was used both by instance allocation (1 result), and by instance relocation (many results, one per instance). As such, the field 'asSolutions' was a list, and the various code paths checked whether the length of the list matches the current mode. This is very ugly, as we can't guarantee this matching via the type system; hence the FIXME in the code. However, commit 6804faa0 removed the instance evacuation code, and thus we now always use just one allocation solution. Hence we can change the data type to a simple Maybe type, and get rid of many 'otherwise barf out' conditions. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Agata Murawska <agatamurawska@google.com>
-
- Sep 28, 2011
-
-
Andrea Spadaccini authored
* hv_kvm.py, hv_xen.py
  - return the hypervisor version (if available) from GetNodeInfo
* cmdlib.py
  - if hypervisor version is available during the migration, and the versions differ, warn the user
Signed-off-by:
Andrea Spadaccini <spadaccio@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
The change to enforce boolean results for cluster verify group opcode missed the HooksCallBack, which uses a very ugly 1/0 logic. Furthermore, the logic is wrong, since it unconditionally resets the verify result to true. The patch is changed to simply treat hook failures as failures, and do nothing for offline/nodes. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Michael Hanselmann authored
In the context of the lock monitor a “pending” item does not yet own the requested resource. Since these HTTP requests are already in progress, they should be shown as owners. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Michael Hanselmann authored
With this change a node name instead of the IP address can be shown for pending RPC requests:
  Name                               Pending
  rpc/node18.example.com/test_delay  thread:Jq1/Job692/TEST_DELAY
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Michael Hanselmann authored
Not all requests use an instance of RpcRunner yet and therefore won't show up (only instances have access to the global Ganeti context). Currently only the IP address is accessible. Another patch will add a nicer name for requests. Example output (gnt-debug locks -o name,pending):
  Name                       Pending
  rpc/192.0.2.18/test_delay  thread:Jq12/Job683/TEST_DELAY
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Michael Hanselmann authored
This simplifies HttpClientPool.ProcessRequests significantly and will be handy for showing pending RPC requests in the lock monitor. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Iustin Pop authored
This reverts to the old behaviour in Ganeti 2.4 and before. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Sep 27, 2011
-
-
Agata Murawska authored
Signed-off-by:
Agata Murawska <agatamurawska@google.com> Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-