Commits · 980d133064ca978e54e6b8ece7ed0b31869daa83 · itminedu / snf-ganeti

Oct 05, 2011

Demote to warnings the errors in --ignore-errors · 980d1330


Treat the gnt-cluster verify errors identified by the error codes in
--ignore-errors as warnings; just print a warning message for the user.

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
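
The demotion described in this commit can be pictured with a minimal sketch. This is not Ganeti's code: the `report` helper, its return shape and the error code strings are hypothetical, chosen only to show an ignored code being reported as a warning instead of an error.

```python
def report(ecode, message, ignore_errors):
    """Format one verify finding (hypothetical helper, not Ganeti's API).

    Returns a (line, is_error) tuple. Codes listed in ignore_errors are
    still shown to the user, but only as warnings.
    """
    if ecode in ignore_errors:
        return ("WARNING: %s: %s" % (ecode, message), False)
    return ("ERROR: %s: %s" % (ecode, message), True)


# Made-up error codes, for illustration only.
ignored = frozenset(["ECODE_A"])
demoted = report("ECODE_A", "something is off", ignored)
kept = report("ECODE_B", "something else is off", ignored)
print(demoted[0])
print(kept[0])
```

A caller would then fail the overall verification only if at least one finding still has `is_error` set.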


Add --ignore-errors parameter to cluster verify · 93f2399e

Andrea Spadaccini authored 13 years ago


lib/cli.py
- add IGNORE_ERROR_OPT;

lib/client/gnt_cluster.py
- pass the ignore_errors parameter to the opcodes

lib/opcodes.py
- update OpClusterVerifyConfig, OpClusterVerify and OpClusterVerifyGroup
  to accept the ignore_errors parameter

lib/cmdlib.py
- pass the ignore_errors parameter to the opcodes that need it

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>


Move cluster verify error codes to constants · eedf99b5

Andrea Spadaccini authored 13 years ago


- move the cluster verify error codes from cmdlib._VerifyErrors to
  constants;
- add to each of them the CV (Cluster Verify) prefix;
- add the CV_ALL_ECODES and CV_ALL_ECODES_STRINGS constants;
- wrap the lines that exceed 80 characters after changing the error
  code names to the new ones.

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
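
As a rough illustration of the layout this commit describes, the sketch below defines two made-up error codes with the CV prefix and derives the two aggregate constants from them. Only the names CV_ALL_ECODES and CV_ALL_ECODES_STRINGS come from the commit message; the example codes and the tuple shape are assumptions.

```python
# Hypothetical error codes; the real ones live in Ganeti's constants module.
# Each is assumed here to be an (item type, identifier string) tuple.
CV_EEXAMPLE_A = ("cluster", "EEXAMPLE_A")
CV_EEXAMPLE_B = ("node", "EEXAMPLE_B")

# Every known code, and just the identifier strings, which is the form a
# user would type on the command line.
CV_ALL_ECODES = frozenset([CV_EEXAMPLE_A, CV_EEXAMPLE_B])
CV_ALL_ECODES_STRINGS = frozenset(name for (_, name) in CV_ALL_ECODES)

print(sorted(CV_ALL_ECODES_STRINGS))
```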


Restore backend.GetMasterInfo return values order · 909b3a0e

Andrea Spadaccini authored 13 years ago


Commit 5a8648eb changed the order of the return values of
backend.GetMasterInfo(), which broke the users of the master_info RPC.

This change restores the original order, and adds a comment in
bootstrap.py about the new value added to the return values of
master_info.

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
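
Why the order matters can be shown with a small sketch. The field names and values are hypothetical, and it assumes the new value ends up last in the returned tuple; the point is only that callers unpacking by position keep working when existing positions are left alone.

```python
def get_master_info_old():
    """Assumed original shape: (netdev, ip, master_node)."""
    return ("eth0", "192.0.2.1", "node1.example.com")


def get_master_info_new():
    """New value placed last, so the first three positions are unchanged."""
    return ("eth0", "192.0.2.1", "node1.example.com", 32)


# A caller that only knows the original fields still sees them in the
# positions it expects.
netdev, ip, master_node = get_master_info_new()[:3]
print(netdev, ip, master_node)
```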


Add cluster netmask parameter · 5a8648eb

Andrea Spadaccini authored 13 years ago


Add the master_netmask cluster parameter, which represents the netmask
of the master IP, encoded as a CIDR suffix.

This parameter can be set via the --master-netmask option of
gnt-cluster init and gnt-cluster modify. The default behaviour is
consistent with the old default (/32 for IPv4 and /128 for IPv6).

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
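
The default described above (a CIDR suffix covering exactly one address) can be reproduced with Python's standard ipaddress module. This is only an illustration of the /32 and /128 defaults; the function name is hypothetical and Ganeti's own implementation may differ.

```python
import ipaddress


def default_master_netmask(master_ip):
    """Return the CIDR suffix that covers exactly one address.

    Hypothetical helper: 32 for an IPv4 address, 128 for an IPv6 one.
    """
    return ipaddress.ip_address(master_ip).max_prefixlen


v4_default = default_master_netmask("192.0.2.1")
v6_default = default_master_netmask("2001:db8::1")
print(v4_default, v6_default)
```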


Add ValidateNetmask and GetClass IPAddress methods · 7df2c4f0

Andrea Spadaccini authored 13 years ago


Add the following methods to netutils.IPAddress:
* ValidateNetmask
* GetClassFromIpVersion
* GetClassFromIpFamily

Also, add related tests to the test suite.

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
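
A possible shape for these three methods is sketched below. Only the method names come from the commit message; the signatures, the subclass names, the class attributes and the exact validity range (here 0 < netmask <= address length) are assumptions.

```python
import socket


class IPAddress(object):
    """Base class; subclasses set iplen, family and version."""
    iplen = None
    family = None
    version = None

    @classmethod
    def ValidateNetmask(cls, netmask):
        # Assumed rule: a CIDR suffix is an integer in (0, iplen].
        return isinstance(netmask, int) and 0 < netmask <= cls.iplen

    @staticmethod
    def GetClassFromIpVersion(version):
        for klass in (IP4Address, IP6Address):
            if klass.version == version:
                return klass
        raise ValueError("Unknown IP version %r" % (version,))

    @staticmethod
    def GetClassFromIpFamily(family):
        for klass in (IP4Address, IP6Address):
            if klass.family == family:
                return klass
        raise ValueError("Unknown address family %r" % (family,))


class IP4Address(IPAddress):
    iplen = 32
    family = socket.AF_INET
    version = 4


class IP6Address(IPAddress):
    iplen = 128
    family = socket.AF_INET6
    version = 6


print(IPAddress.GetClassFromIpVersion(6).ValidateNetmask(64))
```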


Oct 04, 2011

Fix issue when verifying cluster files · 170b02b7

Michael Hanselmann authored 13 years ago


If a cluster has any non-master-candidate nodes, those don't contain all
files (e.g. config.data). With commit aef59ae7 (March 31st, 2011)
the logic was changed and subsequently verifying a cluster with non-mc
nodes would complain.

This patch fixes this issue by changing the algorithm. It also adds an
additional check for files which shouldn't exist on a machine. A newly
added unittest is included.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
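
One way to picture the two checks mentioned in this commit (files missing where they are expected, and files present where they should not exist) is the sketch below. It is not the algorithm from the patch; the function name, data shapes and node names are assumptions.

```python
def verify_files(expected, actual):
    """Compare expected file placement with what nodes report.

    expected: filename -> set of node names that should hold the file
    actual:   node name -> set of filenames found on that node
    Returns a sorted list of (node, filename, problem) tuples.
    """
    problems = []
    for node, present in actual.items():
        for filename, nodes in expected.items():
            if node in nodes and filename not in present:
                problems.append((node, filename, "missing"))
            elif node not in nodes and filename in present:
                problems.append((node, filename, "unexpected"))
    return sorted(problems)


# config.data is only expected on the master candidate "mc1".
expected = {"config.data": {"mc1"}, "known_hosts": {"mc1", "n2"}}
clean = verify_files(expected, {"mc1": {"config.data", "known_hosts"},
                                "n2": {"known_hosts"}})
stray = verify_files(expected, {"mc1": {"known_hosts"},
                                "n2": {"config.data", "known_hosts"}})
print(clean)
print(stray)
```

With this shape, a node that is not a master candidate lacking config.data is not reported, which is the behaviour the commit restores.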


Oct 03, 2011

Revert "utils.log: Write error messages to stderr" · d728ac75

Michael Hanselmann authored 13 years ago


This reverts commit 34aa8b7c. Writing error messages to stderr would
also include backtraces, something we tried to avoid in the past.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


Fix adding nodes after commit · ca6b16e5

Michael Hanselmann authored 13 years ago


Commit 64c7b383 changed the RPC call for verifying SSH connections.
Unfortunately this case in adding nodes was missed.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


Sep 30, 2011

LUClusterVerifyGroup: Spread SSH checks over more nodes · 64c7b383

Michael Hanselmann authored 13 years ago


When verifying a group the code would always check SSH to all nodes in
the same group, as well as the first node for every other group. On big
clusters this can cause issues since many nodes will try to connect to
the first node of another group at the same time. This patch changes the
algorithm to choose a different node every time.

A unittest for the selection algorithm is included.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
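
The idea of spreading the checks can be sketched with itertools.cycle: each node of the verified group gets the next node from every other group, instead of always the first one. This is an illustrative sketch under assumed data shapes, not the selection code from the patch.

```python
import itertools


def select_ssh_check_nodes(group_nodes, other_groups):
    """Pick, for each node, one remote node per other group.

    group_nodes:  node names in the group being verified
    other_groups: group name -> list of node names in that group
    Returns node name -> list of remote nodes to check via SSH.
    """
    cycles = dict((name, itertools.cycle(sorted(nodes)))
                  for name, nodes in other_groups.items() if nodes)
    result = {}
    for node in sorted(group_nodes):
        # Advancing every cycle once per node rotates through the
        # remote group's members instead of hammering the first one.
        result[node] = [next(cycles[name]) for name in sorted(cycles)]
    return result


selection = select_ssh_check_nodes(["a1", "a2", "a3"], {"b": ["b1", "b2"]})
print(selection)
```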


Optimise cli.JobExecutor with many pending jobs · 11705e3d

Iustin Pop authored 13 years ago


When we submit many pending jobs (> 100) to masterd, the JobExecutor
'spams' the master daemon with status requests for all the jobs, even
though in the end it will only choose a single job for polling.

This is very sub-optimal: when the master is busy processing
small/fast jobs, this query forces reading all the job files.
Restricting the 'window' of jobs that we query from the entire set to
a smaller subset makes a huge difference (masterd only, 0s delay jobs,
all jobs on tmpfs thus no I/O involved):

- submitting/waiting for 500 jobs:
  - before: ~21 s
  - after:   ~5 s
- submitting/waiting for 1K jobs:
  - before: ~76 s
  - after:   ~8 s

These numbers are with a batch of 25 jobs; with a batch of 50 jobs, the
time goes from 8s to 12s. I think that choosing the 'best' job for nice
output only matters with a small number of jobs, and that for more than
that people will not actually watch the jobs. So changing from 'perfect
job' to 'best job in the first 25' should be OK.

Note that most jobs won't execute as fast as 0 delay, but this is
still a good improvement.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
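
The windowing idea can be sketched as follows. The function name, the status strings and the preference order (finished first, then running, else the first job) are assumptions made for illustration; only the window of 25 comes from the commit message.

```python
def choose_job(pending, query_status, window=25):
    """Pick the job to poll next, looking only at the first `window` jobs.

    pending:      non-empty ordered list of job IDs
    query_status: callable taking a list of IDs, returning ID -> status
    """
    candidates = pending[:window]
    # Only the window is queried, not the whole pending set.
    statuses = query_status(candidates)
    for wanted in ("finished", "running"):
        for job_id in candidates:
            if statuses[job_id] == wanted:
                return job_id
    return candidates[0]


queried = []


def fake_query(job_ids):
    """Stand-in for the master daemon; records how many IDs were asked for."""
    queried.append(len(job_ids))
    return dict((job_id, "running" if job_id == 7 else "queued")
                for job_id in job_ids)


chosen = choose_job(list(range(1000)), fake_query)
print(chosen, queried)
```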


Add gnt-cluster commands to toggle the master IP · fb44c6db

Andrea Spadaccini authored 13 years ago


lib/client/gnt_cluster.py:
* Add activate-master-ip and deactivate-master-ip commands

man/gnt-cluster.rst:
* Document the new commands

lib/opcodes.py lib/cmdlib.py
* Add two opcodes and the LU that call the relevant RPCs

test/docs_unittest.py
* Silence an error about RAPI not implemented for the two new opcodes

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
(cherry picked from commit fb926117)

Conflicts:

	test/docs_unittest.py
	  - kept devel-2.5 version, without the RAPI opcode checks


Split starting and stopping master IP and daemons · c06e0c83

Andrea Spadaccini authored 13 years ago


lib/backend.py
* split StartMaster() into ActivateMasterIp() and StartMasterDaemons()
* split StopMaster() into DeactivateMasterIp() and StopMasterDaemons()

lib/server/noded.py, lib/rpc.py
* adapt the call chains to the new functions, define new RPCs

lib/bootstrap.py, lib/cmdlib.py, lib/server/masterd.py
* use the new RPCs

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
(cherry picked from commit fb460cf7)


Add gnt-cluster commands to toggle the master IP · fb926117

Andrea Spadaccini authored 13 years ago


lib/client/gnt_cluster.py:
* Add activate-master-ip and deactivate-master-ip commands

man/gnt-cluster.rst:
* Document the new commands

lib/opcodes.py lib/cmdlib.py
* Add two opcodes and the LU that call the relevant RPCs

test/docs_unittest.py
* Silence an error about RAPI not implemented for the two new opcodes

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>


Split starting and stopping master IP and daemons · fb460cf7

Andrea Spadaccini authored 13 years ago


lib/backend.py
* split StartMaster() into ActivateMasterIp() and StartMasterDaemons()
* split StopMaster() into DeactivateMasterIp() and StopMasterDaemons()

lib/server/noded.py, lib/rpc.py
* adapt the call chains to the new functions, define new RPCs

lib/bootstrap.py, lib/cmdlib.py, lib/server/masterd.py
* use the new RPCs

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>


ssh: Quote strings in error message · 9dc45ab1

Michael Hanselmann authored 13 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>


utils.log: Write error messages to stderr · 34aa8b7c

Michael Hanselmann authored 13 years ago


When “gnt-cluster copyfile” failed it would only print “Copy of file …
to node … failed”. A detailed message is written using logging.error.
Writing error messages to stderr can be helpful in figuring out what
went wrong (the messages also go to the log file, but not everyone might
know about it).

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


Sep 29, 2011

Adapt non-KVM hypervisors to new migration RPCs · 60af751d

Andrea Spadaccini authored 13 years ago


Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>


Add memory transfer progress info to migration · 61643226

Andrea Spadaccini authored 13 years ago


* hypervisor/hv_kvm.py
  - parse the memory transfer status

* cmdlib.py
  - represent memory transfer info, if available

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>


Make migration RPC non-blocking · 6a1434d7

Andrea Spadaccini authored 13 years ago


To add status reporting for the KVM migration, the instance_migrate RPC
must be non-blocking. Moreover, there must be a way to represent the
migration status and a way to fetch it.

* constants.py:
  - add constants representing the migration statuses

* objects.py:
  - add the MigrationStatus object

* hypervisor/hv_base.py
  - change the FinalizeMigration method name to FinalizeMigrationDst
  - add the FinalizeMigrationSource method
  - add the GetMigrationStatus method

* hypervisor/hv_kvm.py
  - change the implementation of MigrateInstance to be non-blocking
    (i.e. do not poll the status of the migration)
  - implement the new methods defined in BaseHypervisor

* backend.py, server/noded.py, rpc.py
  - add methods to call the new hypervisor methods
  - fix documentation of the existing methods to reflect the changes

* cmdlib.py
  - adapt the logic of TLMigrateInstance._ExecMigration to reflect
    the changes

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
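
With a non-blocking migration call, the caller has to poll for the status itself. The sketch below shows one such polling loop; the status names, the fields of MigrationStatus and the helper function are assumptions, with only the MigrationStatus name taken from the commit message.

```python
import time

# Assumed status names; the real constants are defined in constants.py.
MIGRATION_ACTIVE = "active"
MIGRATION_COMPLETED = "completed"


class MigrationStatus(object):
    """Minimal stand-in for the status object (fields are assumptions)."""

    def __init__(self, status, transferred_ram=None, total_ram=None):
        self.status = status
        self.transferred_ram = transferred_ram
        self.total_ram = total_ram


def wait_for_migration(get_status, on_progress, interval=1.0,
                       sleep=time.sleep):
    """Poll get_status() until the migration is no longer active."""
    while True:
        result = get_status()
        if result.status != MIGRATION_ACTIVE:
            return result
        if result.transferred_ram is not None and result.total_ram:
            on_progress(100.0 * result.transferred_ram / result.total_ram)
        sleep(interval)


# Canned statuses stand in for the RPC that fetches the real status.
_canned = iter([MigrationStatus(MIGRATION_ACTIVE, 256, 1024),
                MigrationStatus(MIGRATION_ACTIVE, 768, 1024),
                MigrationStatus(MIGRATION_COMPLETED)])
progress = []
final = wait_for_migration(lambda: next(_canned), progress.append,
                           sleep=lambda _seconds: None)
print(progress, final.status)
```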


Move _TimeoutExpired to utils · f8326fca

Andrea Spadaccini authored 13 years ago


Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>


Sep 28, 2011

Migration: warn the user about hv version mismatch · 34fbc862

Andrea Spadaccini authored 13 years ago


* hv_kvm.py, hv_xen.py
  - return the hypervisor version (if available) from GetNodeInfo

* cmdlib.py
  - if hypervisor version is available during the migration, and the
    versions differ, warn the user

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
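
The check described for cmdlib.py amounts to comparing two optional values. A minimal sketch, assuming the node information is a dictionary and that the version is stored under a hypothetical "hv_version" key:

```python
def hv_version_warning(src_info, dst_info, key="hv_version"):
    """Return a warning string if both versions are known and differ.

    The dictionary layout and the key name are assumptions.
    """
    src = src_info.get(key)
    dst = dst_info.get(key)
    if src is None or dst is None or src == dst:
        return None
    return "Hypervisor version mismatch: source %s, target %s" % (src, dst)


warning = hv_version_warning({"hv_version": (0, 14)}, {"hv_version": (0, 15)})
print(warning)
```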


Fix handling of cluster verify hooks · 3656c889

Iustin Pop authored 13 years ago


The change to enforce boolean results for the cluster verify group
opcode missed the HooksCallBack, which uses a very ugly 1/0 logic.
Furthermore, the logic is wrong, since it unconditionally resets the
verify result to true.

This patch changes the callback to simply treat hook failures as
failures, and to do nothing for offline nodes.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
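
The corrected logic can be sketched as a function that may only turn the verify result from true to false, never back. The function name and the shape of the hook results are assumptions, not the actual callback.

```python
def apply_hook_results(verify_ok, hook_results):
    """Fold hook outcomes into the boolean verify result.

    hook_results: node name -> (offline, failed) tuple (assumed shape).
    The result is never reset to True; it can only become False.
    """
    for _node, (offline, failed) in sorted(hook_results.items()):
        if offline:
            # Offline nodes are skipped entirely.
            continue
        if failed:
            verify_ok = False
    return verify_ok


print(apply_hook_results(True, {"n1": (False, True)}))
```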


http.client: Show pending requests as “owner” · 90b2eeb0

Michael Hanselmann authored 13 years ago


In the context of the lock monitor a “pending” item does not yet own the
requested resource. Since these HTTP requests are already in progress,
they should be shown as owners.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>


http.client: Add nice name to requests · 7cb2d205

Michael Hanselmann authored 13 years ago


With this change a node name instead of the IP address can be shown for
pending RPC requests:

  Name                              Pending
  rpc/node18.example.com/test_delay thread:Jq1/Job692/TEST_DELAY

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>


rpc/http: Show pending RPC requests in lock monitor · aea5caef

Michael Hanselmann authored 13 years ago


Not all requests use an instance of RpcRunner yet and therefore won't
show up (only instances have access to the global Ganeti context).
Currently only the IP address is accessible. Another patch will add a
nicer name for requests.

Example output (gnt-debug locks -o name,pending):

  Name                      Pending
  rpc/192.0.2.18/test_delay thread:Jq12/Job683/TEST_DELAY

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>


http.client: Factorize code interacting with cURL · ecd61b4e

Michael Hanselmann authored 13 years ago


This simplifies HttpClientPool.ProcessRequests significantly and will be
handy for showing pending RPC requests in the lock monitor.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>


Redistribute the RAPI certificate · 835f8b23

Iustin Pop authored 13 years ago


This reverts to the old behaviour in Ganeti 2.4 and before.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>


Sep 27, 2011

http.client: Reduce performance impact by assertion · a3c10d31

Michael Hanselmann authored 13 years ago


Call dict.values once instead of N times.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
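
The optimisation is the usual one of hoisting a repeated call out of a loop. In Python 2, which Ganeti targeted at the time, dict.values() built a new list on every call, so calling it once per checked item cost N list constructions. The function and variable names below are illustrative assumptions.

```python
def all_known(handles, clients):
    """Check that every handle is one of the values of `clients`."""
    # Computed once, outside the loop, instead of once per handle.
    values = clients.values()
    return all(handle in values for handle in handles)


clients = {"a": 1, "b": 2}
print(all_known([1, 2], clients))
```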


rpc: Overhaul client structure · 00267bfe

Michael Hanselmann authored 13 years ago


- Clearly separate node name to IP address resolution into separate
  functions
- Simplified code structure (one code path instead of several)
- Fully unittested
- Preparation for more RPC improvements

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


rpc: Make compression function module-global · 30474135

Michael Hanselmann authored 13 years ago


No need to keep it in the class.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


Keep only one global RPC runner in Ganeti context · 87b3cb26

Michael Hanselmann authored 13 years ago


Instead of having one RPC runner per mcpu processor this will keep only
one instance as part of the masterd-wide Ganeti context. Upcoming
patches will change the RPC runner to report pending requests to the
lock manager.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


Sep 26, 2011

TemporaryFilesManager implementation · 0c1a5b1e

Agata Murawska authored 13 years ago


Signed-off-by: Agata Murawska <agatamurawska@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>


Export: saving data to ovf file · 7432d332

Agata Murawska authored 13 years ago


Signed-off-by: Agata Murawska <agatamurawska@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>


Export: parsing data from config file · b179ce72

Agata Murawska authored 13 years ago


Signed-off-by: Agata Murawska <agatamurawska@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>


Export: initial commit - manifest, ova creation etc · 0963b26a

Agata Murawska authored 13 years ago


Signed-off-by: Agata Murawska <agatamurawska@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>


Import: backend, hypervisor and os · 7bde29b5

Agata Murawska authored 13 years ago


Signed-off-by: Agata Murawska <agatamurawska@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>


Import: networks · 24b9469d

Agata Murawska authored 13 years ago


Signed-off-by: Agata Murawska <agatamurawska@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>


Import: disk conversion · 99381e3b

Agata Murawska authored 13 years ago


Signed-off-by: Agata Murawska <agatamurawska@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>


Import: reading ovf file · 864cf6bf

Agata Murawska authored 13 years ago


Signed-off-by: Agata Murawska <agatamurawska@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
