Commits · 340f4757caf256f0c0223b9a3634c042c08eae3c · itminedu / snf-ganeti

Jul 26, 2010

masterd: move the IP activation from Exec to Check · 340f4757

Iustin Pop authored 14 years ago


Currently, the master IP activation is done in the Exec function. Since
the original masterd process returns after forking, and Exec is run in
the (grand)child process, this means that after 'ganeti-masterd' has
returned there are still initialization tasks running.

Normally this is not a problem, but in cases where one does quick master
failovers, this creates a race condition which hits the QA scripts
especially hard.

To solve this, and make the startup process cleaner (the system is in
steady state after the command has returned, even though masterd startup
could still fail), we move the IP activation to Check(). This also
allows error messages about the IP activation to be seen on the console.

With this patch enabled, I can no longer reproduce the double-failover
errors, which were occurring before in 4 out of 5 cases.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

340f4757

Move the UsesRPC decorator from cli to rpc · e0e916fe

Iustin Pop authored 14 years ago


This is needed because not just the cli scripts need this decorator, but
the master daemon too (and it already duplicated the code once).

In cli.py we just leave a stub, so that we don't have to modify all the
scripts to import rpc.py.

We then change the master daemon code to reuse this decorator, instead
of duplicating it.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

e0e916fe
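
A minimal sketch of the decorator pattern described in e0e916fe, assuming the decorator only has to set up RPC before the wrapped function runs and tear it down afterwards. The names `init`, `shutdown`, `run_with_rpc` and the `events` list are hypothetical stand-ins, not the actual Ganeti API.

```python
import functools

events = []  # hypothetical trace, only here to make the call order visible


def init():
    """Hypothetical stand-in for the RPC layer's setup routine."""
    events.append("init")


def shutdown():
    """Hypothetical stand-in for the RPC layer's teardown routine."""
    events.append("shutdown")


def run_with_rpc(fn):
    """Run fn with RPC initialized, shutting it down even if fn raises."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        init()
        try:
            return fn(*args, **kwargs)
        finally:
            shutdown()
    return wrapper


# A stub left in another module (cli.py in the commit) could simply rebind
# the name, so existing callers keep working without a new import.
UsesRPC = run_with_rpc


@UsesRPC
def main(value):
    events.append("main")
    return value * 2
```

Keeping the teardown in a `finally` clause means a failing command cannot leave the RPC layer initialized, which matters once both the scripts and a long-running daemon share the decorator.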

watcher: smarter handling of instance records · f5116c87

Iustin Pop authored 14 years ago

This patch implements a few changes to the instance handling. First, old
instances which no longer exist on the cluster are removed from the
state file, to keep things clean.

Second, the instance restart counters are reset every 8 hours, since
some error cases might be transient (e.g. networking issues, or machine
temporarily down), and if the problem takes more than 5 restarts but is
not permanent, watcher will not restart the instance. The value of 8
hours is, I think, both conservative (so as not to hammer the cluster too
often with restarts) and fast enough to clear semi-transient problems.

And last, if an instance is not restarted due to exhausted retries, a
warning should be logged; otherwise it's hard to understand why watcher
doesn't want to restart an ERROR_down instance.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

f5116c87
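
The three behaviours listed in f5116c87 can be sketched as follows. This is an illustration under stated assumptions, not the watcher's code: the class `RestartState`, its method names and the in-memory dictionary standing in for the state file are all hypothetical; only the limit of 5 restarts and the 8 hour expiration come from the commit message.

```python
import logging

MAX_RESTARTS = 5             # retry limit mentioned in the commit message
RETRY_EXPIRATION = 8 * 3600  # counters are reset after 8 hours (in seconds)


class RestartState:
    """Hypothetical in-memory stand-in for the watcher state file."""

    def __init__(self):
        # instance name -> (restart count, timestamp of last attempt)
        self.records = {}

    def prune(self, known_instances):
        """Drop records of instances that no longer exist on the cluster."""
        for name in list(self.records):
            if name not in known_instances:
                del self.records[name]

    def expire(self, now):
        """Reset counters whose last attempt is older than the expiration."""
        for name, (_, last_attempt) in list(self.records.items()):
            if now - last_attempt >= RETRY_EXPIRATION:
                del self.records[name]

    def should_restart(self, name, now):
        """Record an attempt and report whether a restart is still allowed."""
        count, _ = self.records.get(name, (0, now))
        if count >= MAX_RESTARTS:
            # Without this message a skipped restart is invisible to the admin.
            logging.warning("Not restarting %s: retries exhausted", name)
            return False
        self.records[name] = (count + 1, now)
        return True
```

A watcher run would call `prune` and `expire` first, then ask `should_restart` for each instance found down.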

Jul 23, 2010

Update the RAPI node migrate for the 'live' change · 52194140

Iustin Pop authored 14 years ago


This patch adds handling of the new 'mode' parameter to the RAPI server,
while keeping compatibility with the old mode. Note that in the old mode
(when 'live' is being passed), the auto-mode doesn't work.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

52194140
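
A sketch of how a server could accept both the old 'live' flag and the new 'mode' parameter, as 52194140 describes. The function name, the constant values and the decision to reject requests carrying both parameters are assumptions made for illustration.

```python
MODE_LIVE = "live"          # assumed string constants for the two modes
MODE_NONLIVE = "non-live"


def migration_mode_from_request(params):
    """Map request parameters to a migration mode.

    Returns None when neither parameter is given, which stands for
    "let the cluster pick the mode automatically".
    """
    if "live" in params and "mode" in params:
        # Assumption: mixing the old and the new style is treated as an error.
        raise ValueError("Pass either 'live' or 'mode', not both")
    if "live" in params:
        # Old style: a boolean can only express two states, so the
        # automatic mode is not reachable from here.
        return MODE_LIVE if params["live"] else MODE_NONLIVE
    return params.get("mode")
```

This also shows why the auto-mode cannot work for old-style requests: once 'live' is present, one of the two explicit modes is always selected.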

Update the RAPI client for the migration mode · 1f334d96

Iustin Pop authored 14 years ago


See the discussion on the previous patch about this. Basically unless we
want to add a new 'feature' marking for the live migration parameter,
there is no simple way to handle this nicely in the client.

Given that the client was/is marked as experimental, this patch simply
replaces live with mode. This means that this client won't work with 2.1
clusters…

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

1f334d96

Fix burnin and live migration · f907fcf2

Iustin Pop authored 14 years ago


This is breakage from the original 'live' parameter changes.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

f907fcf2

Rename the OpMigrate* parameter 'live' to 'mode' · 8c35561f

Iustin Pop authored 14 years ago


This is needed as now the parameter is no longer boolean, but tri-state.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

8c35561f

Rename migration type to migration mode · 783a6c0b

Iustin Pop authored 14 years ago


This is in preparation for the rename of the opcode 'live' parameter to
'mode'.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

783a6c0b

utils: Fix incorrect docstring · 2ea65c7d

Manuel Franceschini authored 14 years ago


Signed-off-by: Manuel Franceschini <livewire@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

2ea65c7d

Jul 22, 2010

Merge branch 'devel-2.1' into master · 089e5e50

Iustin Pop authored 14 years ago


Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

089e5e50

Fix issue when changing the disk template to drbd · 6e04dc39

Iustin Pop authored 14 years ago


If we pass the current primary node, the conversion will fail horribly
with LVM creation errors. Instead, we catch and check for this
condition in CheckPrereq.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

6e04dc39

Jul 21, 2010

Remove a couple of empty design sections · a3427fe3

Guido Trotter authored 14 years ago

The 2.1 and 2.2 designs contain sections with no actual content, as they
are detailed for each single change. Removing the global empty ones.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Manuel Franceschini <livewire@google.com>

a3427fe3

Disable 'invalid name' pylint warning for tools/setup-ssh · c9a4a662

Manuel Franceschini authored 14 years ago


Signed-off-by: Manuel Franceschini <livewire@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

c9a4a662

Always set commonName in X509 certificates · 600535f0

Manuel Franceschini authored 14 years ago


Due to the current switch of the RPC client to PycURL, a bug with newer
versions of libcurl surfaced. When the 'Subject' or 'Issuer' of
'server.pem' were empty, SSL handshake failed.

This patch changes the certificate generation functions such that they
always use "ganeti.example.com" as commonName (CN) for 'Subject' and
'Issuer'.

Signed-off-by: Manuel Franceschini <livewire@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

600535f0

Jul 20, 2010

Adding tool to setup SSH on a remote host · 05cd934d

René Nussbaumer authored 14 years ago


This prepares the remote node to be joined into a cluster.

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

05cd934d

Adding new (optional) dependency to configure.ac · a40b1fc4

René Nussbaumer authored 14 years ago


Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

a40b1fc4

Adding constants for setup-ssh · 2089573e

René Nussbaumer authored 14 years ago


Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

2089573e

Change AddAuthorizedKey to also allow filehandles · 3727671e

René Nussbaumer authored 14 years ago


This is required to use this function over paramiko
sftp file handles.

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

3727671e
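
A sketch of a function that accepts either a file name or an already open file object, in the spirit of 3727671e. The name `add_authorized_key` and the duplicate check are illustrative assumptions; the sketch only relies on the handle supporting iteration, `seek` and `write`, and has not been tried against paramiko SFTP handles.

```python
import io


def add_authorized_key(file_obj, key):
    """Append key to an authorized_keys file unless it is already there.

    file_obj is either a path (str) or an open, seekable text file object.
    """
    key_fields = key.split()
    if isinstance(file_obj, str):
        handle = open(file_obj, "a+")
        owned = True    # we opened it, so we close it
    else:
        handle = file_obj
        owned = False   # the caller keeps ownership of the handle
    try:
        handle.seek(0)
        needs_newline = False
        for line in handle:
            if line.split() == key_fields:
                return  # key already present, nothing to do
            needs_newline = not line.endswith("\n")
        handle.seek(0, io.SEEK_END)
        if needs_newline:
            handle.write("\n")  # last line lacked a terminating newline
        handle.write(key.rstrip("\n") + "\n")
    finally:
        if owned:
            handle.close()
```

For example, `add_authorized_key(io.StringIO(), "ssh-rsa AAAA user@host")` exercises the file-object path without touching the file system.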

Jul 19, 2010

Update .gitignore for vcs-version · 2a73861a

Iustin Pop authored 14 years ago


Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

2a73861a

RAPI client: Encode empty body to JSON · 8306e0e4

Michael Hanselmann authored 14 years ago


If the body consists of an empty dict, it should also be encoded.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

8306e0e4
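
One plausible reading of 8306e0e4 is that a truthiness test was dropping empty dictionaries. The sketch below shows the distinction; `encode_body` is a hypothetical helper, not the client's actual method.

```python
import json


def encode_body(body):
    """Return the JSON-encoded request body, or None if there is no body."""
    # Comparing against None instead of testing truthiness keeps an empty
    # dict from being treated like a missing body.
    if body is not None:
        return json.dumps(body)
    return None
```

With this check `encode_body({})` yields the string `{}`, whereas a plain `if body:` would have sent no body at all.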

Introduce git reference/tag tracking for debugging · 84a12e40

Iustin Pop authored 14 years ago


This patch adds a new vcs-version file that is generated via git (and
can be adapted if VCS is changed) and then embedded as VCS_VERSION in
the constants module.

This means two things:
- local modifications without committing to git (or when using a tar.gz
  archive + mods) will not be reflected
- version is fixed at the time of the last make regen-vcs-version (dist time,
  or devel/upload which calls this)

Thus this is more geared at developers rather than end users.

The patch:

- adds rules for generating the vcs-version file
- adds a dist-hook for re-generating the file (if possible) and copying
  the updated version to the distdir
- modifies devel/upload to re-generate the file before upload

The output of --version will look like:
gnt-cluster (ganeti v2.2.0beta0-184-gebca7e6) 2.2.0~beta0

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

84a12e40
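
A sketch of the idea behind 84a12e40, assuming the version string comes from `git describe` (the sample output above has that shape, but the commit message does not name the command). Both function names are hypothetical.

```python
import subprocess


def read_vcs_version(repo_dir="."):
    """Return the output of 'git describe', or a placeholder on failure."""
    try:
        result = subprocess.run(["git", "describe"], cwd=repo_dir,
                                capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        # Not a git checkout (e.g. a tar.gz archive) or git is missing.
        return "(unknown)"
    return result.stdout.strip()


def format_version(binary, vcs_version, release_version):
    """Build a --version line in the layout shown in the commit message."""
    return "%s (ganeti %s) %s" % (binary, vcs_version, release_version)
```

Because the string is captured once at generation time, uncommitted local changes made afterwards are not reflected, which matches the caveats listed in the commit message.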

Jul 16, 2010

Fix epydoc warning "Lists must be indented." · 0ad22aab

Luca Bigliardi authored 14 years ago


Signed-off-by: Luca Bigliardi <shammash@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

0ad22aab

Convert RPC client to PycURL · 33231500

Michael Hanselmann authored 14 years ago


Instead of using our custom HTTP client, using PycURL's multi
interface allows us to get rid of the HTTP client threadpool.
The majority of the code is still in the ganeti.http.client
module.

A simple per-thread HTTP client pool gives cURL a chance to
cache and retain as much information as possible (e.g. SSL certs).
Unused HTTP clients (e.g. due to removed nodes) are deleted after
25 requests going through the pool.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

33231500
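
The pooling policy mentioned in 33231500 can be sketched without PycURL. The sketch assumes that "deleted after 25 requests" means a client is dropped once 25 requests have gone through the pool since that client was last used; `ClientPool`, `get_thread_pool` and the `factory` callable are hypothetical names.

```python
import threading


class ClientPool:
    """Cache of reusable clients, keyed by an arbitrary name (e.g. a node)."""

    MAX_IDLE_REQUESTS = 25  # figure taken from the commit message

    def __init__(self, factory):
        self._factory = factory  # callable creating a client for a key
        self._clients = {}       # key -> [client, request number of last use]
        self._requests = 0

    def get(self, key):
        """Return the client for key, creating it on first use."""
        self._requests += 1
        entry = self._clients.get(key)
        if entry is None:
            entry = self._clients[key] = [self._factory(key), self._requests]
        entry[1] = self._requests
        self._expire()
        return entry[0]

    def _expire(self):
        """Delete clients that sat unused for MAX_IDLE_REQUESTS requests."""
        for key, (_, last_used) in list(self._clients.items()):
            if self._requests - last_used >= self.MAX_IDLE_REQUESTS:
                del self._clients[key]

    def __contains__(self, key):
        return key in self._clients


_thread_state = threading.local()


def get_thread_pool(factory):
    """Return the calling thread's pool, creating it on first use."""
    pool = getattr(_thread_state, "pool", None)
    if pool is None:
        pool = _thread_state.pool = ClientPool(factory)
    return pool
```

Keeping one pool per thread avoids locking around the clients while still letting each thread reuse its connections and cached session data.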

Implement lock names for debugging purposes · 7f93570a

Iustin Pop authored 15 years ago


This patch adds lock names to SharedLocks and LockSets, that can be used
later for displaying the actual locks being held/used in places where we
only have the lock, and not the entire context of the locking operation.

Since I realized that the production code doesn't call LockSet with the
proper members= syntax, but directly as positional parameters, I've
converted this (and the arguments to GlobalLockManager) into positional
arguments.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

7f93570a

Merge branch 'devel-2.1' · e0c9743d

Guido Trotter authored 14 years ago


* devel-2.1:
  Bump up version to release 2.1.6
  Update NEWS file for 2.1.6

Conflicts:
	NEWS
	  - merge
	configure.ac
	  - keep 2.2 version

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

e0c9743d

Bump up version to release 2.1.6 · a1d8344b

Guido Trotter authored 14 years ago


Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

a1d8344b

Update NEWS file for 2.1.6 · ae828011

Guido Trotter authored 14 years ago


Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

ae828011

Fix pylint complaints introduced in commit · 067d927b

Michael Hanselmann authored 14 years ago


Due to a small mistake I missed three non-critical pylint complaints for
commit e58f87a9. They're fixed with this patch.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

067d927b

LXC: Add cpu_mask hypervisor parameter · e3ed5316

Balazs Lecz authored 14 years ago


Also implement syntax checking.

Signed-off-by: Balazs Lecz <leczb@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

e3ed5316

Add ParseCpuMask() utility function · 31155d60

Balazs Lecz authored 14 years ago


Also adds a generic ParseError exception.

Signed-off-by: Balazs Lecz <leczb@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

31155d60
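
The commit message for 31155d60 does not describe the mask syntax. The sketch below assumes a comma-separated list of CPU numbers and inclusive ranges such as `1-3,5`; both that syntax and the snake_case function name are assumptions made for illustration.

```python
class ParseError(Exception):
    """Generic error raised when a string cannot be parsed."""


def parse_cpu_mask(cpu_mask):
    """Turn a mask such as "1-3,5" into a list of CPU numbers."""
    if not cpu_mask:
        return []
    cpus = []
    for part in cpu_mask.split(","):
        bounds = part.split("-")
        if len(bounds) > 2:
            raise ParseError("Invalid CPU range '%s'" % part)
        try:
            lower = int(bounds[0])
            upper = int(bounds[-1])  # same as lower for a single number
        except ValueError:
            raise ParseError("Invalid CPU number in '%s'" % part)
        if lower > upper:
            raise ParseError("Range '%s' is in descending order" % part)
        cpus.extend(range(lower, upper + 1))
    return cpus
```

Raising a dedicated exception lets callers such as a hypervisor parameter check report a syntax error without catching unrelated `ValueError`s.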

Add a migration type global hypervisor parameter · e71b9ef4

Iustin Pop authored 14 years ago


Since live migration is more stable on some hypervisors than on others
(e.g. Xen-PVM versus Xen-HVM), we introduce a new parameter for the mode
we should use by default (if not overridden by the user in the opcode).

The meaning of the opcode 'live' field changes from boolean to either
None (use the hypervisor default), or one of the allowed migration
string constants. The live parameter of the TLMigrateInstance is still a
boolean, computed from the opcode field (which is no longer passed to
the TL).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

e71b9ef4
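
The resolution rule described in e71b9ef4 (None means "use the hypervisor default", otherwise one of the allowed strings, and the result is a boolean) can be sketched as below. The constant values and the function name are assumptions made for illustration.

```python
MIGRATION_LIVE = "live"        # assumed values of the migration constants
MIGRATION_NONLIVE = "non-live"
MIGRATION_MODES = frozenset([MIGRATION_LIVE, MIGRATION_NONLIVE])


def compute_live_flag(opcode_value, hypervisor_default):
    """Derive the boolean 'live' flag from the tri-state opcode field."""
    # None defers to the per-hypervisor default; anything else must be
    # one of the allowed mode strings.
    mode = hypervisor_default if opcode_value is None else opcode_value
    if mode not in MIGRATION_MODES:
        raise ValueError("Unknown migration mode %r" % (mode,))
    return mode == MIGRATION_LIVE
```

For instance, `compute_live_flag(None, MIGRATION_NONLIVE)` is `False`, while an explicit `MIGRATION_LIVE` in the opcode overrides that default.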

Jul 15, 2010

Add test for some aspects of job queue · e58f87a9

Michael Hanselmann authored 14 years ago


This new opcode and gnt-debug sub-command test some aspects of the
job queue, including the status of a job. The bug fixed in commit
2034c70d was identified using this test. A future patch will
run this test automatically from the QA scripts.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

e58f87a9

LUVerifyCluster: update _ValidateNode description · 9dd6889b

Luca Bigliardi authored 14 years ago


Change _ValidateNode description to reflect what the function actually does.

Signed-off-by: Luca Bigliardi <shammash@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

9dd6889b

KVM hypervisor: Use utils.ShellWriter for network script · 748e4b5a

Michael Hanselmann authored 14 years ago


This patch converts hv_kvm to use utils.ShellWriter for writing
the network script. It also adds a few unittests (the first
for any hypervisor modules).

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

748e4b5a

Move ShellWriter class to utils · 858905fb

Michael Hanselmann authored 14 years ago


Also add unittest.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

858905fb

Rename test for utils.IgnoreProcessNotFound · 9dc71d5a

Michael Hanselmann authored 14 years ago


Usually our tests are named “Test…”.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Manuel Franceschini <livewire@google.com>

9dc71d5a

jqueue: Factorize code waiting for job changes · 989a8bee

Michael Hanselmann authored 14 years ago


By splitting the _WaitForJobChangesHelper class into multiple smaller
classes, we gain in several places:

- Simpler code, less interaction between functions and variables
- Easy to unittest (close to 100% coverage)
- Waiting for job changes has no direct knowledge of queue anymore (it
  doesn't reference queue functions anymore, especially not private ones)
- Activate inotify only if there was no change at the beginning (and
  checking again right away to avoid race conditions)

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

989a8bee

Jul 13, 2010

Merge remote branch 'origin/devel-2.1' · 54dc4fdb

Michael Hanselmann authored 14 years ago


* origin/devel-2.1:
  RAPI client: Implement old instance creation request format
  rlib2: Use constants for disk and NIC parameters

Conflicts:
	test/ganeti.rapi.client_unittest.py: Trivial
	test/ganeti.rapi.rlib2_unittest.py: Trivial

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

54dc4fdb

RAPI client: Implement old instance creation request format · 48436b97

Michael Hanselmann authored 14 years ago


Commit 8a47b447 implemented instance creation in the RAPI client,
but it left out support for the old instance creation request format.
This patch now implements the old format as well as possible. This
will only be used when talking to clusters before Ganeti 2.1.3.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

48436b97

rlib2: Use constants for disk and NIC parameters · 7be048f0

Michael Hanselmann authored 14 years ago


These constants were added in commit bd061c35, but the parsing code
was not updated. This also fixes a bug where a NIC's MAC address
wasn't used.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

7be048f0