Commits · 95b487bb4cfb5c69600dbbe1de52acacd5b39568 · itminedu / snf-ganeti

Oct 22, 2009

confd: query the pnode of multiple instances at once · 95b487bb


Signed-off-by: Flavio Silvestrow <flaviops@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

95b487bb

Try to reduce wrong errors in InstanceShutdown · 3782acd7

Iustin Pop authored 15 years ago


In backend.InstanceShutdown(), there is a race condition between
checking that the instance exists and trying to shut it down, which
sometimes results in error messages like:

Tue Oct 20 20:08:30 2009 - WARNING: Could not shutdown instance: Failed
to force stop instance instance9: Failed to stop instance instance9:
exited with exit code 1, Error: Domain 'instance9' does not exist.

To fix this, we ignore any hypervisor StopInstance() errors if the
instance doesn't exist anymore, since our purpose (to make the instance
go away) is already accomplished.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

3782acd7
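
A minimal sketch of the approach described in 3782acd7. The hypervisor object, its ListInstances() method, and the HypervisorError class are illustrative stand-ins assumed for this example, not the actual Ganeti code; only the StopInstance() name comes from the commit message.

```python
class HypervisorError(Exception):
    """Illustrative stand-in for a hypervisor failure."""


def stop_instance_tolerant(hyper, name):
    """Stop an instance, ignoring errors if it is already gone.

    Returns True if the instance is no longer listed afterwards.
    """
    try:
        hyper.StopInstance(name)
    except HypervisorError:
        # The instance may have disappeared between the existence check
        # and the stop request; only re-raise if it is still listed.
        if name in hyper.ListInstances():
            raise
    return name not in hyper.ListInstances()


class FakeHypervisor(object):
    """Hypothetical hypervisor used only to exercise the sketch."""

    def __init__(self, running):
        self.running = set(running)

    def ListInstances(self):
        return sorted(self.running)

    def StopInstance(self, name):
        if name not in self.running:
            raise HypervisorError("Domain '%s' does not exist." % name)
        self.running.discard(name)
```

With this shape, a "does not exist" error is swallowed only when the instance is really gone, so a genuine stop failure still reaches the caller.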

Revert breakage introduced in · 7734de0a

Iustin Pop authored 15 years ago


Commit e4e9b806 introduced two problems
in backend.InstanceShutdown():

- first, it reduced the check interval significantly (especially for the
  first few checks); there are very few production VMs that shut down in
  one second, and while this does not break anything, it creates
  unnecessary load for the hypervisor
- second, a wrong test added to the while condition (“not tried_once”)
  means that we only sleep once for an instance, and after that we
  immediately kill it forcefully

Together, these two problems mean that any instance which is not lucky
enough to finish in roughly 1-1.5 seconds (the time it takes to sleep and
re-check the instance list) will have this happen:

2009-10-21 23:33:46,034:  pid=16634 INFO Called for inst9 w. False/False
2009-10-21 23:33:47,440:  pid=16634 ERROR Shutdown of 'inst9' unsuccessful, forcing
2009-10-21 23:33:47,440:  pid=16634 INFO Called for inst9 w. True/False

The “Called…” lines are log entries from the hypervisor shutdown
function. This of course means that at restart time:

[12775866.644682] EXT3-fs: INFO: recovery required on readonly filesystem.
[12775866.644689] EXT3-fs: write access will be enabled during recovery.
[12775868.533674] kjournald starting.  Commit interval 5 seconds
[12775868.533697] EXT3-fs: sda1: orphan cleanup on readonly fs
[12775868.551797] EXT3-fs: sda1: 12 orphan inodes deleted
[12775868.551803] EXT3-fs: recovery complete.
[12775868.586275] EXT3-fs: mounted filesystem with ordered data mode.

This patch reverts the broken test and changes the sleep to a fixed
duration of five seconds, since it makes no sense to check that often
for shutdown (and after ~20 seconds we reach a stable value of five
seconds anyway).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

7734de0a
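
The fixed-interval wait described in 7734de0a could look roughly like the sketch below. The function name, the timeout parameter, and the injectable sleep/clock arguments are assumptions made for illustration; only the five-second interval comes from the commit message.

```python
import time


def wait_for_shutdown(is_running, timeout, interval=5.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll until is_running() is False or timeout seconds have passed.

    Returns True if the instance stopped by itself, False if the caller
    should fall back to a forced stop.
    """
    deadline = clock() + timeout
    while is_running():
        if clock() >= deadline:
            return False
        # Fixed sleep: checking more often only adds hypervisor load.
        sleep(interval)
    return True
```

The loop keeps sleeping and re-checking until the deadline, instead of sleeping once and then forcing the stop.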

Xen: Ignore the retry argument in stop instance · 0cf11e68

Iustin Pop authored 15 years ago


Commit 4ad45119 changed the KVM hypervisor to send multiple shutdown
requests to the monitor, but it didn't change this for the Xen
hypervisor. We simply remove the return-on-retry model, since we do want
to send multiple shutdown signals for both Xen and KVM (even if the
behaviour is not perfect, they should behave the same).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

0cf11e68

Oct 21, 2009

Ensure RpcResult has “payload” attribute · 1645d22d

Michael Hanselmann authored 15 years ago


Also add assertions to avoid missing attributes in the future.
They won't be included in optimized bytecode.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

1645d22d
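
As a small illustration of the point about optimized bytecode: Python drops assert statements when code is compiled with optimization (as with python -O). The sketch below shows this with the built-in compile() and its optimize argument; the Result class is a hypothetical stand-in, not Ganeti's RpcResult.

```python
SOURCE = """
class Result(object):
    def __init__(self, payload=None):
        self.payload = payload
        # Guard against forgetting the attribute in future changes.
        assert hasattr(self, "payload")

checked = []
# list.append() returns None, so this assert passes and has a visible
# side effect whenever assertions are compiled in.
assert checked.append("assert ran") is None
"""


def run(optimize):
    """Execute SOURCE compiled at the given optimization level."""
    namespace = {}
    exec(compile(SOURCE, "<sketch>", "exec", optimize=optimize), namespace)
    return namespace["checked"]
```

Calling run(0) keeps the assertions, while run(1) compiles them away, so such checks cost nothing in optimized deployments but also protect nothing there.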

Oct 20, 2009

Fix typo in install.rst · aeaa2ea2

Guido Trotter authored 15 years ago


Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>

aeaa2ea2

install.rst: mention xen config for live migration · 8ab90d80

Guido Trotter authored 15 years ago


This addresses issue 75.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

8ab90d80

Bump version to 2.1.0~beta2 · 62066d05

Michael Hanselmann authored 15 years ago


I forgot to bump the configure.ac version before tagging the 2.1.0~beta1
release. Since we cannot remove old tags (see “On Re-tagging” in git-tag(1)),
we have to call this release 2.1.0~beta2.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>

62066d05

Introduce checks for /sys and /proc · 7c0aa8e9

Iustin Pop authored 15 years ago


This patch adds checks for /proc and /sys in cluster verify, since
Ganeti relies on these special filesystems to be mounted.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

7c0aa8e9
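
One possible way to express such a check is sketched below. The function name and the error strings are assumptions for illustration, and the real cluster verify code may test something other than mount points.

```python
import os


def check_special_filesystems(paths=("/proc", "/sys"),
                              ismount=os.path.ismount):
    """Return a list of error strings for paths that are not mount points."""
    return ["%s is not mounted" % path
            for path in paths if not ismount(path)]
```

An empty list means both filesystems are mounted; the ismount argument exists only so the check can be exercised without touching the real system.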

Oct 19, 2009

Fix serializer unittests · d357f531

Michael Hanselmann authored 15 years ago


Commit d22b2999 broke the serializer unittests with certain
versions of simplejson. This patch removes sort_keys again
and implements a slightly more efficient way of detecting
simplejson functionality. The serializer unittests no longer
use a partially broken mock, but rather a function to convert all
tuples to lists before comparing.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

d357f531
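
The tuple-to-list comparison mentioned in d357f531 can be sketched as follows. This uses the standard library json module rather than simplejson, and both function names are hypothetical, not taken from the Ganeti test suite.

```python
import json


def tuples_to_lists(value):
    """Recursively replace tuples with lists in nested data."""
    if isinstance(value, (list, tuple)):
        return [tuples_to_lists(item) for item in value]
    if isinstance(value, dict):
        return dict((key, tuples_to_lists(item))
                    for key, item in value.items())
    return value


def roundtrip_equal(data):
    """Check that data survives a JSON round trip, ignoring tuple/list type."""
    return json.loads(json.dumps(data)) == tuples_to_lists(data)
```

JSON has no tuple type, so a round trip always yields lists; normalizing the expected value first avoids the need for a mock.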

Oct 16, 2009

cfgupgrade: Implement upgrade to 2.1.0 · aeb0c953

Michael Hanselmann authored 15 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

aeb0c953

bootstrap: Factorize HMAC key generation · c008906b

Michael Hanselmann authored 15 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

c008906b

Make bootstrap._GenerateSelfSignedSslCert public · cd34faf2

Michael Hanselmann authored 15 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

cd34faf2

cfgupgrade: Remove Ganeti 1.2 support · 11c31f5c

Michael Hanselmann authored 15 years ago


This also fixes a few typos.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

11c31f5c

serializer: Sort keys in JSON · d22b2999

Michael Hanselmann authored 15 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

d22b2999

Oct 15, 2009

Bump version to 2.1.0~beta0 · b03ee906

Michael Hanselmann authored 15 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

b03ee906

mcpu: Use new timeout class for timeout · a6db1af2

Michael Hanselmann authored 15 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

a6db1af2

locking: Convert pipe condition to new timeout class · f4e673fb

Michael Hanselmann authored 15 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

f4e673fb

locking.LockSet: Move timeout calculation to separate class · 7e8841bd

Michael Hanselmann authored 15 years ago


This class can also be used by mcpu.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

7e8841bd

locking, mcpu: Ensure timeout is always >= 0.0 · b6b87034

Michael Hanselmann authored 15 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

b6b87034

Oct 13, 2009

locking.LockSet: Improve assertions · e4335b5b

Michael Hanselmann authored 15 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

e4335b5b

locking: Factorize LockSet.acquire · 76e2f08a

Michael Hanselmann authored 15 years ago


By moving the main code of LockSet.acquire to its own function
we reduce the code complexity a bit and clarify the exception
handling.

This also fixes a case where a lock acquire timeout wasn't
handled correctly, leading to obscure error messages.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

76e2f08a

mcpu: Make sure added locks are released on errors · 6f14fc27

Michael Hanselmann authored 15 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

6f14fc27

Test LockSet.acquire return value for timeout · 23683c26

Michael Hanselmann authored 15 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

23683c26

opcodes: Add missing shutdown_timeout to OpRemoveInstance · fc1baca9

Michael Hanselmann authored 15 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

fc1baca9

luxi: Pass socket path directly to exception, not in tuple · 63d96e4c

Michael Hanselmann authored 15 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

63d96e4c

gnt-* use the correct opcode slot to build opcodes · 4d98c565

Guido Trotter authored 15 years ago


gnt-* scripts were building wrong opcodes for commands which had the
shutdown_timeout slot (due to missing testing after renaming). This
patch fixes that.

It also changes the SHUTDOWN_TIMEOUT_OPT dest field name from "timeout"
to "shutdown_timeout". The old name would still work, but could be
confusing.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

4d98c565
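
To illustrate why the dest name matters, here is a sketch assuming optparse-style options. The default value and the parse() helper are hypothetical; the option and dest names are the ones mentioned in this log.

```python
from optparse import OptionParser, make_option

# Hypothetical default; the real value is not stated in the commit message.
DEFAULT_SHUTDOWN_TIMEOUT = 120

SHUTDOWN_TIMEOUT_OPT = make_option(
    "--shutdown-timeout", dest="shutdown_timeout", type="int",
    default=DEFAULT_SHUTDOWN_TIMEOUT,
    help="Maximum time to wait for instance shutdown")


def parse(argv):
    """Parse argv and return the options object."""
    parser = OptionParser(option_list=[SHUTDOWN_TIMEOUT_OPT])
    opts, _ = parser.parse_args(argv)
    return opts
```

The parsed value ends up in opts.shutdown_timeout, matching the opcode slot name, so code building the opcode reads the attribute under the same name it passes on.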

Update NEWS for instance shutdown timeout · f940cf61

Guido Trotter authored 15 years ago


Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

f940cf61

Update documentation for recreate-disks · cc291012

Iustin Pop authored 15 years ago


This also clarifies the UUIDs NEWS entry.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

cc291012

rapi: fix tag operations · 64246438

Iustin Pop authored 15 years ago


This patch fixes the tag PUT/DELETE operations, and additionally changes
the _Tags_* functions to take only positional and not keyword arguments
(the defaults do not make any sense at all, and they are always called
with all arguments).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

64246438

Update NEWS for Ganeti 2.1 · 920a91bf

Michael Hanselmann authored 15 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

920a91bf

Convert NEWS to ASCII · aa287e8c

Michael Hanselmann authored 15 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

aa287e8c

Update manpages for --shutdown-timeout · 1e2c9fd3

Guido Trotter authored 15 years ago


Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

1e2c9fd3

Add timeout options to other LUs · 17c3f802

Guido Trotter authored 15 years ago


All the LUs that shut down the instance need to be able to pass the
timeout parameter as well.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

17c3f802

cli: add SHUTDOWN_TIMEOUT_OPT · 7e5eaaa8

Guido Trotter authored 15 years ago


Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

7e5eaaa8

Oct 12, 2009

mcpu: Change lock attempt timeout calculation · e3200b18

Michael Hanselmann authored 15 years ago


With this patch all timeouts are pre-calculated. The interface of
the _LockTimeoutStrategy class is also changed a bit; NextAttempt
now returns a new instance.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

e3200b18
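
The interface change in e3200b18 (NextAttempt returning a new instance over pre-calculated timeouts) might be pictured like this. The class below is a simplified, hypothetical stand-in for _LockTimeoutStrategy, and the Timeout() method name is an assumption.

```python
class LockTimeoutStrategy(object):
    """Immutable view onto a pre-calculated sequence of timeouts."""

    def __init__(self, timeouts, index=0):
        self._timeouts = tuple(timeouts)
        self._index = index

    def NextAttempt(self):
        """Return a new strategy for the following attempt."""
        return LockTimeoutStrategy(self._timeouts, self._index + 1)

    def Timeout(self):
        """Return the current timeout, or None once the list is exhausted."""
        if self._index < len(self._timeouts):
            return self._timeouts[self._index]
        return None
```

Because each attempt gets its own object, a caller holding an older strategy still sees the timeout that belonged to its attempt.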

Code and docstring style fixes · 69b99987

Michael Hanselmann authored 15 years ago


Found using pylint and epydoc.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

69b99987

mcpu: Improve lock reporting with timeouts · 211b6132

Michael Hanselmann authored 15 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

211b6132

mcpu: Implement lock timeouts · 407339d0

Michael Hanselmann authored 15 years ago


The timeout is always between ~0.1 and ~10.0 seconds. A small
variation of ±5% is added to prevent different jobs from
fighting each other. After 10 attempts to acquire the locks with
a timeout, a blocking acquire is made.

Lock status reporting will be improved in a separate patch.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

407339d0
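
A rough sketch of the timeout scheme described in 407339d0. The geometric growth curve, the constant names, and the function name are assumptions; the commit message only states the ~0.1 to ~10.0 second range, the ±5% variation, and the switch to a blocking acquire after 10 attempts.

```python
import random

MIN_TIMEOUT = 0.1
MAX_TIMEOUT = 10.0
MAX_ATTEMPTS = 10
JITTER = 0.05


def attempt_timeout(attempt, rand=random.random):
    """Return the timeout for a 0-based attempt, or None to block."""
    if attempt >= MAX_ATTEMPTS:
        # After the timed attempts, fall back to a blocking acquire.
        return None
    # Assumed growth curve: geometric from MIN_TIMEOUT to MAX_TIMEOUT.
    ratio = (MAX_TIMEOUT / MIN_TIMEOUT) ** (1.0 / (MAX_ATTEMPTS - 1))
    base = MIN_TIMEOUT * ratio ** attempt
    # Vary by up to +/-5% so concurrent jobs do not retry in lockstep.
    jitter = 1.0 + JITTER * (2.0 * rand() - 1.0)
    return base * jitter
```

The rand argument is injectable only so the variation can be pinned down when checking the values.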

mcpu: Remove unused exclusive_BGL attribute · 6b95b76d

Michael Hanselmann authored 15 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

6b95b76d