Commits · 9822b1dddd4d7f2eb54984fd7f213b2ab7e3112b · itminedu / snf-ganeti

Oct 04, 2011

Fix Makefile rules for QCHelper.hs · 9822b1dd

Iustin Pop authored 13 years ago


Include QCHelper.hs in the distributed files, and also exclude it and
the THH.hs file from coverage reports.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Andrea Spadaccini <spadaccio@google.com>

9822b1dd

Oct 03, 2011

Some TH simplifications · 53664e15

Iustin Pop authored 13 years ago


Now that the basic code works, let's use some aliases for simpler code
and less ))))))))).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

53664e15

A few minor test improvements · 72bb6b4e

Iustin Pop authored 13 years ago


This patch adds a few niceties to the test suite:

- allow matching test groups case-insensitively, and emit warnings when
  given test group names that don't match anything
- add a new operator that is similar to assertEqual in Python: it
  tests for equality and emits the two values in case of error
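
As a rough illustration of the first point, here is a minimal Python sketch
(the actual change is in the Haskell test suite). The function name
select_groups, the group names and the warning text are all assumptions made
for this example, not taken from the patch.

```python
def select_groups(available, requested):
    """Match requested test group names case-insensitively.

    Returns (selected, warnings); every name here is hypothetical.
    """
    by_lower = {name.lower(): name for name in available}
    selected, warnings = [], []
    for name in requested:
        match = by_lower.get(name.lower())
        if match is None:
            # A requested name that matches nothing produces a warning
            warnings.append("Unknown test group '%s'" % name)
        else:
            selected.append(match)
    return selected, warnings


selected, warnings = select_groups(["Loader", "Utils"], ["loader", "nosuch"])
```

Here "loader" selects the "Loader" group despite the different case, and
"nosuch" only produces a warning.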

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Agata Murawska <agatamurawska@google.com>

72bb6b4e

Use TemplateHaskell to decorate tests with names · 23fe06c2

Iustin Pop authored 13 years ago


This changes error messages from "Test 4 failed …" to "Test
prop_Loader_mergeData failed", which is much more readable. It also
removes the duplication of test suite names in the test.hs file.
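
The effect can be sketched in Python, assuming tests are plain functions
whose names start with prop_ (the patch itself does this at compile time with
Template Haskell). The helpers collect_props and run_props, and the second
property name, are hypothetical.

```python
def collect_props(namespace):
    """Pair every prop_* callable in a namespace with its own name."""
    return sorted(
        ((name, fn) for name, fn in namespace.items()
         if name.startswith("prop_") and callable(fn)),
        key=lambda item: item[0],
    )


def run_props(props):
    """Run each property; report failures by name rather than by index."""
    return ["Test %s failed" % name for name, fn in props if not fn()]


def prop_Loader_mergeData():
    return False  # stand-in for a failing property


def prop_Utils_alwaysTrue():
    return True  # stand-in for a passing property


tests = {
    "prop_Loader_mergeData": prop_Loader_mergeData,
    "prop_Utils_alwaysTrue": prop_Utils_alwaysTrue,
}
failures = run_props(collect_props(tests))
```

Because the name is derived from the function itself, there is no separate
list of test names to keep in sync by hand.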

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Agata Murawska <agatamurawska@google.com>

23fe06c2

Use TemplateHaskell to generate opcode serialisation · 12c19659

Iustin Pop authored 13 years ago


This replaces the hand-coded opcode serialisation code with
auto-generation based on TemplateHaskell.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Agata Murawska <agatamurawska@google.com>

12c19659

Use TemplateHaskell to build the opID function · 6111e296

Iustin Pop authored 13 years ago


This replaces the hand-coded opID with one automatically generated
from the constructor names, similar to the way Python does it, except
it's done at compilation time as opposed to runtime.

Again, the code line delta does not favour this patch, but this
replaces error-prone manual code with auto-generated code; if
we add more opcode support, this will help a lot.
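
For comparison, a minimal sketch of deriving an opcode ID from a class name
at runtime in Python. This is an illustration under assumptions, not the
actual Ganeti code: the base class, the regular expression and the example
class name are hypothetical.

```python
import re

# Boundary between a lower-case letter (or digit) and an upper-case letter
_CAMEL_BOUNDARY = re.compile(r"([a-z0-9])([A-Z])")


def name_to_op_id(name):
    """Turn a CamelCase class name into an UPPER_SNAKE_CASE identifier."""
    return _CAMEL_BOUNDARY.sub(r"\1_\2", name).upper()


class OpCode:
    """Hypothetical base class: subclasses get OP_ID from their own name."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        cls.OP_ID = name_to_op_id(cls.__name__)


class OpTestDelay(OpCode):
    pass


op_id = OpTestDelay.OP_ID
```

In this sketch the derivation runs when the class is defined, whereas the
Template Haskell version described above does the equivalent work at
compilation time.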

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Agata Murawska <agatamurawska@google.com>

6111e296

Use TemplateHaskell instead of hand-coded instances · e9aaa3c6

Iustin Pop authored 13 years ago

This patch replaces the current hard-coded JSON instances (all alike,
just manual conversion to/from string) with auto-generated code based
on Template Haskell
(http://www.haskell.org/haskellwiki/Template_Haskell).

The reduction in code lines is not big, as the helper module is well
documented and thus overall we gain about 70 code lines; however, if
we ignore comments we're in good shape, and any future addition of
such data types will be much simpler and less error-prone.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Agata Murawska <agatamurawska@google.com>

e9aaa3c6

Rename some helper functions for consistency · 2c9336a4

Iustin Pop authored 13 years ago


This changes the names for some helper functions so that future
patches are touching less unrelated code. The change replaces
shortened prefixes with the full type name.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Agata Murawska <agatamurawska@google.com>

2c9336a4

Split part of Utils.hs into JSON.hs · f047f90f

Iustin Pop authored 13 years ago


Utils is a bit big, let's split the JSON stuff (not all of it) into a
separate module that doesn't have any other dependencies.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Agata Murawska <agatamurawska@google.com>

f047f90f

Sep 30, 2011

Merge branch 'devel-2.5' · 3398bff1

Andrea Spadaccini authored 13 years ago


* devel-2.5:
  Use --yes to deactivate master ip in cluster merge
  Use deactivate-master-ip in cluster-merge
  Add gnt-cluster commands to toggle the master IP
  Split starting and stopping master IP and daemons
  listrunner: Don't pass arguments if there are none
  ssh: Quote strings in error message
  utils.log: Write error messages to stderr
  Add signal handling doc to hbal man page
  Migration: warn the user about hv version mismatch
  Fix handling of cluster verify hooks
  Redistribute the RAPI certificate
  QA: Add tests for instance start/stop via RAPI
  RAPI: Fix wrong check on instance shutdown
  baserlib: Accept empty body in FillOpcode

Conflicts:
	lib/backend.py
   - no real conflicts
	lib/constants.py
   - preserve both changes
	lib/rapi/rlib2.py
   - keep master
	lib/rpc.py
   - no real conflicts
	tools/cluster-merge
   - keep devel-2.5

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

3398bff1

Merge branch 'stable-2.5' into devel-2.5 · cea3abbd

Andrea Spadaccini authored 13 years ago


* stable-2.5:
  listrunner: Don't pass arguments if there are none
  ssh: Quote strings in error message
  utils.log: Write error messages to stderr
  Add signal handling doc to hbal man page
  Fix handling of cluster verify hooks
  Redistribute the RAPI certificate
  QA: Add tests for instance start/stop via RAPI
  RAPI: Fix wrong check on instance shutdown
  baserlib: Accept empty body in FillOpcode

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

cea3abbd

Use --yes to deactivate master ip in cluster merge · aeb24d97

Guido Trotter authored 13 years ago


Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Andrea Spadaccini <spadaccio@google.com>

aeb24d97

Use deactivate-master-ip in cluster-merge · a3fad332

Andrea Spadaccini authored 13 years ago


Use the gnt-cluster deactivate-master-ip command in cluster-merge to
disable the master IP.

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
(cherry picked from commit e87e5afb)

a3fad332

Add gnt-cluster commands to toggle the master IP · fb44c6db

Andrea Spadaccini authored 13 years ago


lib/client/gnt_cluster.py:
* Add activate-master-ip and deactivate-master-ip commands

man/gnt-cluster.rst:
* Document the new commands

lib/opcodes.py lib/cmdlib.py
* Add two opcodes and the LU that call the relevant RPCs

test/docs_unittest.py
* Silence an error about RAPI not implemented for the two new opcodes

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
(cherry picked from commit fb926117)

Conflicts:

	test/docs_unittest.py
	  - kept devel-2.5 version, without the RAPI opcode checks

fb44c6db

Split starting and stopping master IP and daemons · c06e0c83

Andrea Spadaccini authored 13 years ago


lib/backend.py
* split StartMaster() into ActivateMasterIp() and StartMasterDaemons()
* split StopMaster() into DeactivateMasterIp() and StopMasterDaemons()

lib/server/noded.py, lib/rpc.py
* adapt the call chains to the new functions, define new RPCs

lib/bootstrap.py, lib/cmdlib.py, lib/server/masterd.py
* use the new RPCs

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
(cherry picked from commit fb460cf7)

c06e0c83

Use deactivate-master-ip in cluster-merge · e87e5afb

Andrea Spadaccini authored 13 years ago


Use the gnt-cluster deactivate-master-ip command in cluster-merge to
disable the master IP.

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

e87e5afb

Add gnt-cluster commands to toggle the master IP · fb926117

Andrea Spadaccini authored 13 years ago


lib/client/gnt_cluster.py:
* Add activate-master-ip and deactivate-master-ip commands

man/gnt-cluster.rst:
* Document the new commands

lib/opcodes.py lib/cmdlib.py
* Add two opcodes and the LU that call the relevant RPCs

test/docs_unittest.py
* Silence an error about RAPI not implemented for the two new opcodes

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

fb926117

Split starting and stopping master IP and daemons · fb460cf7

Andrea Spadaccini authored 13 years ago


lib/backend.py
* split StartMaster() into ActivateMasterIp() and StartMasterDaemons()
* split StopMaster() into DeactivateMasterIp() and StopMasterDaemons()

lib/server/noded.py, lib/rpc.py
* adapt the call chains to the new functions, define new RPCs

lib/bootstrap.py, lib/cmdlib.py, lib/server/masterd.py
* use the new RPCs

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

fb460cf7

listrunner: Don't pass arguments if there are none · 0c009cc5

Michael Hanselmann authored 13 years ago


If no arguments were specified the “exec_args” variable was “None”,
leading to the command being run as “… ./… None”.
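
The failure mode can be reproduced with a small sketch. It assumes the
command is built by string formatting; the function names and the script name
are hypothetical, and the real listrunner code may look different.

```python
def build_command_buggy(executable, exec_args=None):
    # Formatting None with %s yields the literal string "None"
    return "./%s %s" % (executable, exec_args)


def build_command(executable, exec_args=None):
    # Only append the arguments when some were actually given
    command = "./%s" % executable
    if exec_args:
        command += " " + exec_args
    return command


buggy = build_command_buggy("script.sh")   # "./script.sh None"
fixed = build_command("script.sh")         # "./script.sh"
with_args = build_command("script.sh", "--verbose")
```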

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

0c009cc5

ssh: Quote strings in error message · 9dc45ab1

Michael Hanselmann authored 13 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

9dc45ab1

utils.log: Write error messages to stderr · 34aa8b7c

Michael Hanselmann authored 13 years ago


When “gnt-cluster copyfile” failed it would only print “Copy of file …
to node … failed”. A detailed message is written using logging.error.
Writing error messages to stderr can be helpful in figuring out what
went wrong (the messages also go to the log file, but not everyone might
know about it).
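
One way to get this behaviour with the standard logging module is sketched
below: a second handler that only passes messages of level ERROR and above.
The function name and the in-memory streams are assumptions for the example;
in practice the second stream would be sys.stderr and the first a log file.

```python
import io
import logging


def make_logger(name, log_stream, err_stream):
    """Send everything to log_stream, and errors additionally to err_stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    logger.handlers.clear()

    log_handler = logging.StreamHandler(log_stream)
    log_handler.setLevel(logging.DEBUG)
    logger.addHandler(log_handler)

    err_handler = logging.StreamHandler(err_stream)  # e.g. sys.stderr
    err_handler.setLevel(logging.ERROR)
    logger.addHandler(err_handler)
    return logger


log_file, stderr = io.StringIO(), io.StringIO()
logger = make_logger("copyfile-sketch", log_file, stderr)
logger.info("Copying file")
logger.error("Copy failed: permission denied")
```

After this runs, the error message is present in both streams, while the
informational message only reaches the log stream.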

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

34aa8b7c

Add signal handling doc to hbal man page · 2b634302

Iustin Pop authored 13 years ago


Also remove a bug note, since hbal has long been able to execute jobs
directly.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

2b634302

Sep 29, 2011

Adapt non-KVM hypervisors to new migration RPCs · 60af751d

Andrea Spadaccini authored 13 years ago


Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

60af751d

Add memory transfer progress info to migration · 61643226

Andrea Spadaccini authored 13 years ago


* hypervisor/hv_kvm.py
  - parse the memory transfer status

* cmdlib.py
  - represent memory transfer info, if available

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

61643226

Make migration RPC non-blocking · 6a1434d7

Andrea Spadaccini authored 13 years ago


To add status reporting for the KVM migration, the instance_migrate RPC
must be non-blocking. Moreover, there must be a way to represent the
migration status and a way to fetch it.

* constants.py:
  - add constants representing the migration statuses

* objects.py:
  - add the MigrationStatus object

* hypervisor/hv_base.py
  - change the FinalizeMigration method name to FinalizeMigrationDst
  - add the FinalizeMigrationSource method
  - add the GetMigrationStatus method

* hypervisor/hv_kvm.py
  - change the implementation of MigrateInstance to be non-blocking
    (i.e. do not poll the status of the migration)
  - implement the new methods defined in BaseHypervisor

* backend.py, server/noded.py, rpc.py
  - add methods to call the new hypervisor methods
  - fix documentation of the existing methods to reflect the changes

* cmdlib.py
  - adapt the logic of TLMigrateInstance._ExecMigration to reflect
    the changes
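
The resulting control flow can be sketched as follows: the caller starts the
migration, then polls for its status until it is no longer active. This is a
simplified illustration; the status strings, the FakeHypervisor class and the
snake_case method names are assumptions, not the real Ganeti interfaces.

```python
import time

# Hypothetical status values; the real constants live in constants.py
MIGRATION_ACTIVE = "active"
MIGRATION_COMPLETED = "completed"
MIGRATION_FAILED = "failed"


class FakeHypervisor:
    """Stand-in hypervisor that replays a fixed sequence of statuses."""

    def __init__(self, statuses):
        self._statuses = iter(statuses)
        self.started = []

    def migrate_instance(self, instance):
        # Non-blocking: start the migration and return immediately
        self.started.append(instance)

    def get_migration_status(self, instance):
        return next(self._statuses)


def exec_migration(hypervisor, instance, poll_interval=0.0, sleep=time.sleep):
    """Start a migration, then poll until it leaves the active state."""
    hypervisor.migrate_instance(instance)
    seen = []
    while True:
        status = hypervisor.get_migration_status(instance)
        seen.append(status)  # a caller could report progress here
        if status != MIGRATION_ACTIVE:
            return status, seen
        sleep(poll_interval)


hv = FakeHypervisor([MIGRATION_ACTIVE, MIGRATION_ACTIVE, MIGRATION_COMPLETED])
final_status, history = exec_migration(hv, "instance1.example.com")
```

Moving the polling loop to the caller is what makes it possible to report
progress between polls.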

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

6a1434d7

Move _TimeoutExpired to utils · f8326fca

Andrea Spadaccini authored 13 years ago


Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

f8326fca

Add an allocation limit to hspace · b8a2c0ab

Iustin Pop authored 13 years ago


This is very useful for testing/benchmarking.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Agata Murawska <agatamurawska@google.com>

b8a2c0ab

Small simplification in tryAlloc · 1bf6d813

Iustin Pop authored 13 years ago


Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Agata Murawska <agatamurawska@google.com>

1bf6d813

Change how node pairs are generated/used · b0631f10

Iustin Pop authored 13 years ago

Currently, the node pairs used for allocation are a simple [(primary,
secondary)] list of tuples, as this is how they were used before the
previous patch. However, for that patch, we use them separately per
primary node, and we have to unpack this list right after generation.

Therefore it makes sense to directly generate the list in the correct
form, and remove the split from tryAlloc. This should be at least as
fast as the previous patch, and possibly faster.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Agata Murawska <agatamurawska@google.com>

b0631f10

Parallelise instance allocation/capacity computation · f828f4aa

Iustin Pop authored 13 years ago

This patch finally enables parallelisation in instance placement.

My original try for enabling this didn't work well, but it took a
while (and liberal use of threadscope) to understand why. The attempt
was to simply `parMap rwhnf` over allocateOnPair; however, this does
not work well, because for a 100-node cluster it creates roughly
100*100 sparks: each individual spark is too small, and there are too
many of them. Furthermore, the combining of the allocateOnPair results
was done single-threaded, losing even more parallelism. So we had
O(n²) sparks to run in parallel, each spark of size O(1), and we
combined a list of O(n²) length in a single thread.

The new algorithm does a two-stage process: we group the list of valid
pairs per primary node, relying on the fact that usually the secondary
nodes are somewhat balanced (it's definitely true for 'blank' cluster
computations). We then run in parallel over all primary nodes, doing
both the individual allocateOnPair calls *and* the concatAllocs
summarisation. This leaves only the summing of the primary group
results together for the main execution thread. The new numbers are:
O(n) sparks, each of size O(n), and we combine single-threadedly a
list of O(n) length.

This translates directly into a reasonable speedup (relative numbers
for allocation of 3 instances on a 120-node cluster):

- original code (non-threaded): 1.00 (baseline)
- first attempt (2 threads):    0.81 (20% slowdown!!)
- new code (non-threaded):      1.00 (no slowdown)
- new code (threaded/1 thread): 1.00
- new code (2 threads):         1.65 (65% faster)

We don't get a 2x speedup, because the GC time increases. Fortunately
the code should scale well to more cores, so on many-core machines we
should get a nice overall speedup. On a different machine with 4
cores, we get 3.29x.
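
The two-stage structure can be sketched in Python. This only illustrates how
the work is split into tasks: Python threads do not give the CPU parallelism
that GHC sparks do, and score_pair is a hypothetical stand-in for
allocateOnPair, with min() playing the role of the concatAllocs
summarisation.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def group_by_primary(pairs):
    """Group (primary, secondary) pairs into {primary: [secondaries]}."""
    groups = defaultdict(list)
    for primary, secondary in pairs:
        groups[primary].append(secondary)
    return dict(groups)


def score_pair(primary, secondary):
    # Hypothetical scoring function: lower is better
    return (abs(primary - secondary), primary, secondary)


def best_in_group(item):
    """One task per primary node: score its pairs *and* reduce them."""
    primary, secondaries = item
    return min(score_pair(primary, s) for s in secondaries)


def best_allocation(pairs, workers=2):
    groups = group_by_primary(pairs)  # O(n) tasks, each of size O(n)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(best_in_group, groups.items()))
    return min(partial)  # the main thread combines only O(n) results


nodes = range(4)
pairs = [(p, s) for p in nodes for s in nodes if p != s]
best = best_allocation(pairs)
```

With 4 nodes there are 12 pairs but only 4 tasks, and the final reduction in
the main thread looks at 4 partial results instead of 12.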

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Agata Murawska <agatamurawska@google.com>

f828f4aa

Abstract comparison of AllocElements · d7339c99

Iustin Pop authored 13 years ago


This is moved outside of the concatAllocs as it will be needed in
another place in the future.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Agata Murawska <agatamurawska@google.com>

d7339c99

Change type of Cluster.AllocSolution · 129734d3

Iustin Pop authored 13 years ago


Originally, this data type was used both by instance allocation (1
result), and by instance relocation (many results, one per
instance). As such, the field 'asSolutions' was a list, and the
various code paths checked whether the length of the list matches the
current mode. This is very ugly, as we can't guarantee this matching
via the type system; hence the FIXME in the code.

However, commit 6804faa0 removed the instance evacuation code, and thus
we now always use just one allocation solution. Hence we can change
the data type to a simple Maybe type, and get rid of many 'otherwise
barf out' conditions.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Agata Murawska <agatamurawska@google.com>

129734d3

Sep 28, 2011

Migration: warn the user about hv version mismatch · 34fbc862

Andrea Spadaccini authored 13 years ago


* hv_kvm.py, hv_xen.py
  - return the hypervisor version (if available) from GetNodeInfo

* cmdlib.py
  - if hypervisor version is available during the migration, and the
    versions differ, warn the user
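
A minimal sketch of such a check, assuming each node's information is a
dictionary and the version is stored under a key named "hv_version" (the key
name, function name and message text are assumptions for this example):

```python
def version_mismatch_warning(source_info, target_info):
    """Return a warning string if both versions are known and differ."""
    source = source_info.get("hv_version")
    target = target_info.get("hv_version")
    if source is None or target is None:
        # Version not available on at least one node: nothing to compare
        return None
    if source != target:
        return ("Hypervisor version mismatch between source (%s) and"
                " target (%s) node" % (source, target))
    return None


warning = version_mismatch_warning({"hv_version": (0, 14)},
                                   {"hv_version": (0, 15)})
```

The check stays silent when either node does not report a version, matching
the "if available" condition above.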

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

34fbc862

Fix handling of cluster verify hooks · 3656c889

Iustin Pop authored 13 years ago


The change to enforce boolean results for cluster verify group opcode
missed the HooksCallBack, which uses a very ugly 1/0
logic. Furthermore, the logic is wrong, since it unconditionally
resets the verify result to true.

The patch is changed to simply treat hook failures as failures, and do
nothing for offline nodes.
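
The kind of bug described can be illustrated with a simplified sketch. This
is a reconstruction for illustration only, not the actual HooksCallBack code;
the function names and the shape of the per-node results (dictionaries with
"offline" and "failed" keys) are assumptions.

```python
def hooks_callback_buggy(hook_results, verify_ok):
    verify_ok = True  # bug: an earlier verification failure is forgotten
    for res in hook_results.values():
        if res["failed"]:
            verify_ok = False
    return verify_ok


def hooks_callback(hook_results, verify_ok):
    """Keep the incoming boolean; hook failures can only turn it False."""
    for res in hook_results.values():
        if res["offline"]:
            continue  # nothing to do for offline nodes
        if res["failed"]:
            verify_ok = False
    return verify_ok


clean_hooks = {"node1": {"offline": False, "failed": False}}
buggy_result = hooks_callback_buggy(clean_hooks, False)
fixed_result = hooks_callback(clean_hooks, False)
```

With a verification that had already failed and hooks that all succeed, the
buggy variant reports success while the fixed one preserves the failure.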

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

3656c889

http.client: Show pending requests as “owner” · 90b2eeb0

Michael Hanselmann authored 13 years ago


In the context of the lock monitor a “pending” item does not yet own the
requested resource. Since these HTTP requests are already under way,
they should be shown as owners.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

90b2eeb0

http.client: Add nice name to requests · 7cb2d205

Michael Hanselmann authored 13 years ago


With this change a node name instead of the IP address can be shown for
pending RPC requests:
Name                              Pending
rpc/node18.example.com/test_delay thread:Jq1/Job692/TEST_DELAY

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

7cb2d205

rpc/http: Show pending RPC requests in lock monitor · aea5caef

Michael Hanselmann authored 13 years ago


Not all requests use an instance of RpcRunner yet and therefore won't
show up (only instances have access to the global Ganeti context).
Currently only the IP address is accessible. Another patch will add a
nicer name for requests.

Example output (gnt-debug locks -o name,pending):
Name                      Pending
rpc/192.0.2.18/test_delay thread:Jq12/Job683/TEST_DELAY

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

aea5caef

http.client: Factorize code interacting with cURL · ecd61b4e

Michael Hanselmann authored 13 years ago


This simplifies HttpClientPool.ProcessRequests significantly and will be
handy for showing pending RPC requests in the lock monitor.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

ecd61b4e

Redistribute the RAPI certificate · 835f8b23

Iustin Pop authored 13 years ago


This reverts to the old behaviour in Ganeti 2.4 and before.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

835f8b23

Sep 27, 2011

Adding qemu-img dependency to INSTALL · 6567f1d9

Agata Murawska authored 13 years ago


Signed-off-by: Agata Murawska <agatamurawska@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

6567f1d9