Commits · def8e2f66344751ed328c30db1e485b9e97d955b · itminedu / snf-ganeti

Nov 03, 2009

Michael Hanselmann authored 15 years ago


Also replaces a hardcoded limit of 15 seconds with 1/4
of NET_RECONFIG_TIMEOUT.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

def8e2f6

backend: Convert to utils.Retry · 3c0cdc83

Michael Hanselmann authored 15 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Add generic retry loop function · de0ea66b

Michael Hanselmann authored 15 years ago


There are quite a few retry loops with timeouts in Ganeti's
code. Duplicating code is not good, so this patch introduces
a new function named “utils.Retry” to remedy this situation.
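A generic retry loop of the kind described above might look roughly like the following sketch. The names `retry`, `RetryAgain` and `RetryTimeout` and the parameter order are illustrative assumptions, not necessarily the actual `utils.Retry` signature. The clock and sleep functions are injectable so the loop can be tested without real waiting.

```python
import time


class RetryAgain(Exception):
    """Raised by the retried function to request another attempt."""


class RetryTimeout(Exception):
    """Raised when the function did not succeed within the timeout."""


def retry(fn, delay, timeout, args=(), wait_fn=time.sleep, time_fn=time.monotonic):
    """Call fn(*args) until it returns without raising RetryAgain.

    Waits `delay` seconds between attempts and raises RetryTimeout once
    `timeout` seconds have elapsed since the first attempt.
    """
    end_time = time_fn() + timeout
    while True:
        try:
            return fn(*args)
        except RetryAgain:
            pass
        remaining = end_time - time_fn()
        if remaining <= 0:
            raise RetryTimeout("Function did not succeed within %s seconds" % timeout)
        # Never sleep past the deadline.
        wait_fn(min(delay, remaining))
```

Signalling "try again" through an exception keeps the return value free for the function's real result.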

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Nov 02, 2009

Some improvements to gnt-node repair-storage · 7e9c6a78

Iustin Pop authored 15 years ago


Currently, gnt-node repair-storage has two issues:

- down instances abort the operation, even though they should be
  ignored (it's not technically possible to know their disk status
  without activating their disks)
- if the VG is so broken that disks cannot be activated via gnt-instance
  activate-disks or gnt-instance startup, it's not possible to repair
  the VG at all

The patch makes the opcode skip down instances and also introduces an
``--ignore-consistency`` flag for forcing the execution of the LU.
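The skip-or-force behaviour described above can be sketched as follows. The instance records (with hypothetical `name`, `running` and `disks_ok` keys) and the function name are assumptions for illustration; the real logical unit works on Ganeti's configuration objects and RPC results.

```python
class OpPrereqError(Exception):
    """Stand-in for Ganeti's prerequisite-check error."""


def check_instances_before_repair(instances, ignore_consistency=False):
    """Decide whether a storage repair may proceed.

    Each instance is a dict with hypothetical keys "name", "running" and
    "disks_ok". Returns a list of warnings; raises OpPrereqError when a
    running instance has faulty disks and ignore_consistency is False.
    """
    warnings = []
    for inst in instances:
        if not inst["running"]:
            # The disk status of a down instance is unknown without
            # activating its disks, so it no longer aborts the operation.
            continue
        if not inst["disks_ok"]:
            msg = "Instance '%s' has faulty disks" % inst["name"]
            if not ignore_consistency:
                raise OpPrereqError(msg)
            warnings.append(msg)
    return warnings
```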

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Convert the rest of the OpPrereqError users · debac808

Iustin Pop authored 15 years ago


This finishes the conversion of OpPrereqError creation to the two-argument
style. Any leftover one-argument uses don't break anything; they just
lose information about the errors.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Add ecode to rpc.py's RpcResult.Raise() · 045dd6d9

Iustin Pop authored 15 years ago


This patch adds a new ecode argument to RpcResult.Raise(). This allows
specifying the error code (for both OpExec and OpPrereq errors).

Note that this patch also gives the OpExecError exceptions raised from
_FindFaultInstanceDisks an error code classification.
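A minimal sketch of how an `ecode` argument can be threaded through a `Raise()` method. The class layout, the `node`/`fail_msg` attributes and the message format are assumptions for illustration, not the actual rpc.py code.

```python
class OpExecError(Exception):
    """Error during opcode execution."""


class OpPrereqError(Exception):
    """Error during opcode prerequisite checks."""


class RpcResult(object):
    def __init__(self, node, fail_msg=None, payload=None):
        self.node = node
        self.fail_msg = fail_msg
        self.payload = payload

    def Raise(self, msg, prereq=False, ecode=None):
        """Raise an exception if the RPC call failed.

        prereq selects OpPrereqError instead of OpExecError; ecode, when
        given, becomes the second exception argument.
        """
        if not self.fail_msg:
            return
        msg = "%s (node %s): %s" % (msg, self.node, self.fail_msg)
        exc_class = OpPrereqError if prereq else OpExecError
        if ecode is None:
            raise exc_class(msg)
        raise exc_class(msg, ecode)
```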

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Introduce two-argument style for OpPrereqError · 5c983ee5

Iustin Pop authored 15 years ago

This patch introduces a two-argument style for OpPrereqError. Only the
direct raise calls in cmdlib.py are converted; other users will follow.

cli.py is modified to handle both the two-argument style and the current
one-argument format. RAPI doesn't need modification, since it already
encodes errors using a list for the error arguments; RAPI users only need
to start checking the list length and the second argument.
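The two styles, and what a client has to do to accept both, can be sketched as follows. `ECODE_INVAL` and its value, the output wording of `format_prereq_error`, and the `[class name, args]` shape used by `encode_for_rapi` are all hypothetical choices for illustration.

```python
ECODE_INVAL = "wrong_input"  # assumed error-code value for invalid input


class OpPrereqError(Exception):
    """Prerequisite failure: OpPrereqError(msg) or OpPrereqError(msg, ecode)."""


def format_prereq_error(err):
    """Format an OpPrereqError raised in either style."""
    if len(err.args) == 2:
        msg, ecode = err.args
        return "Prerequisites not met (error type: %s): %s" % (ecode, msg)
    # Old one-argument style: no error classification is available.
    return "Prerequisites not met: %s" % (err.args[0],)


def encode_for_rapi(err):
    """Hypothetical list-based encoding: both styles fit without changes."""
    return [err.__class__.__name__, list(err.args)]
```

A consumer of the list encoding only needs to check whether the argument list has one or two entries.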

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Remove the OpRetryError exception · 159d4ec6

Iustin Pop authored 15 years ago


This is only used in two places, in an error path that has not been
valid since Ganeti 2.0. We remove the try..except, since we should not
get this exception anymore (and if we did, we would have to catch it in
all config.Update cases), and we remove the exception class completely.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Activate disks while exporting an instance · 3e53a60b

Michael Hanselmann authored 15 years ago


Exporting an instance that is not running or has no activated disks
will fail. This patch makes sure to activate the disks before
exporting an instance that is in the ADMIN_down state.
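The activate-then-export sequence could be structured like this. The instance dict (with a hypothetical `admin_up` key) and the three callables are stand-ins for Ganeti's disk activation and export helpers. The `try/finally` that deactivates the disks again afterwards is an assumption about desirable cleanup, not something stated in the commit.

```python
def export_instance(instance, activate_disks, deactivate_disks, do_export):
    """Export `instance`, activating its disks first if it is ADMIN_down."""
    activated = False
    if not instance["admin_up"]:
        # A stopped instance has no active disks, so the export would fail.
        activate_disks(instance)
        activated = True
    try:
        return do_export(instance)
    finally:
        if activated:
            # Restore the previous state: disks were only needed for the export.
            deactivate_disks(instance)
```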

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Epydoc fixes · 23057d29

Michael Hanselmann authored 15 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

backend: Don't overwrite function parameter with loop variable · ea79fc15

Michael Hanselmann authored 15 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Unify the query fields for the storage framework · 620a85fd

Iustin Pop authored 15 years ago


This patch unifies the query fields in the storage framework for all
types. Note that the information is still computed on-demand, so if e.g.
the used disk space is not requested for the ‘file’ type, it won't be
computed on nodes.

Summary of changes:
- improve the LVM storage type to support multiple LVM fields in the
  LIST_FIELDS declaration, as well as constant fields (not computed via
  LVM commands)
- rename utils.GetFilesystemFreeSpace to utils.GetFilesystemStats,
  which returns a tuple of (total, free)
- add used and free as valid fields for lvm-vg (used being computed as
  vg_size - vg_free)
- make allocatable accepted for all types (types which are always
  allocatable always return True)
- add a new list field ‘type’ that gives the currently selected type; not
  very useful today (except for understanding what the default output
  is), but in the future it might help if we want to list multiple types
- add type, size and allocatable to the default output field list
- update the man page with details on how, for file storage,
  size ≠ used + free for non-mountpoint cases
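Two of the computations above are easy to illustrate: a `(total, free)` filesystem query in the spirit of `utils.GetFilesystemStats`, and the derived `used` value for lvm-vg. This sketch uses Python's `os.statvfs` and reports MiB; the unit and the choice of `f_bavail` (space available to unprivileged users) over `f_bfree` are assumptions, not taken from the patch.

```python
import os


def get_filesystem_stats(path):
    """Return (total, free) in MiB for the filesystem containing `path`."""
    st = os.statvfs(path)
    total = (st.f_blocks * st.f_frsize) // (1024 * 1024)
    free = (st.f_bavail * st.f_frsize) // (1024 * 1024)
    return total, free


def vg_used(vg_size, vg_free):
    """Used space of a volume group, computed as vg_size - vg_free."""
    return vg_size - vg_free
```

`statvfs` describes the whole underlying filesystem, not a single directory, which is presumably why size need not equal used + free when the file storage directory is not a mount point.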

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Oct 30, 2009

Make cluster initialization more reliable · 8f215968

Michael Hanselmann authored 15 years ago


There was a race condition between starting the node daemon
and sending requests to write the ssconf files. With this
patch, the initialization waits up to ten seconds for the
node daemon to become responsive.
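A bounded wait of this kind can be sketched as a polling loop with a deadline. Here "responsive" is approximated by a successful TCP connection; the real patch presumably checks the node daemon through an RPC call, and the function name, host and port are assumptions.

```python
import socket
import time


def wait_for_daemon(host, port, timeout=10.0, interval=0.1):
    """Poll host:port until it accepts a TCP connection.

    Returns True as soon as a connection succeeds, or False once
    `timeout` seconds have passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (e.g. connection refused); try again.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Polling with a deadline removes the race while capping the extra startup delay at the timeout.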

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Oct 29, 2009

Don't show warnings on ADMIN_down instance failover · 1df79ce6

Michael Hanselmann authored 15 years ago


Before:
$ gnt-instance failover -f inst1
… checking disk consistency between source and target
… - WARNING: Can't find disk on node node21.example.com
… shutting down instance on source node

After:
$ gnt-instance failover -f inst1
… not checking disk consistency as instance is not running
… shutting down instance on source node

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

http.auth: Add new function to verify passwords · bf9bd8dd

Michael Hanselmann authored 15 years ago


This new function supports two schemes for passwords:
- Old-style cleartext passwords
- Hashed passwords according to RFC2617 (H(A1))

Schemes are differentiated by their prefix, a concept also
used in OpenLDAP. Cleartext passwords can no longer start
with an opening brace ("{") unless they're prefixed with
"{cleartext}" (case insensitive).
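A minimal sketch of prefix-based password checking. The `{HA1}` scheme name, the function signature and the realm handling are assumptions; the H(A1) computation itself follows RFC 2617, where H(A1) is the hex MD5 digest of `username:realm:password`.

```python
import hashlib
import re

# "{scheme}value"; scheme names are compared case-insensitively.
_SCHEME_RE = re.compile(r"^\{([^}]+)\}(.*)$", re.DOTALL)


def verify_password(username, password, realm, stored):
    """Check `password` against a stored value in one of two schemes.

    - old-style cleartext, or "{cleartext}" followed by the password
    - "{HA1}" followed by hex MD5("username:realm:password") (RFC 2617)
    """
    if not stored.startswith("{"):
        # No prefix: old-style cleartext password.
        return password == stored
    match = _SCHEME_RE.match(stored)
    if match is None:
        # Starts with "{" but has no valid scheme prefix: reject.
        return False
    scheme, value = match.group(1).lower(), match.group(2)
    if scheme == "cleartext":
        return password == value
    if scheme == "ha1":
        a1 = "%s:%s:%s" % (username, realm, password)
        return hashlib.md5(a1.encode("utf-8")).hexdigest() == value.lower()
    # Unknown scheme.
    return False
```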

Currently there's no documentation for rapi_users at all.
It will be added in a subsequent patch.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Oct 28, 2009

Fix another style issue · c6f1af07

Iustin Pop authored 15 years ago


For the Nth time, re-fix shadowing of outer-scope variable :)

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Fix an error handling case in TLReplaceDisks · 20eca47d

Iustin Pop authored 15 years ago


pylint is your friend, since the compiler doesn't exist.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Oct 27, 2009

Provide feedback from redistributing configuration · a4eae71f

Michael Hanselmann authored 15 years ago


This is particularly useful for “gnt-cluster redist-conf”, but
also in all other cases where the configuration files are
rewritten on other nodes.

$ gnt-cluster redist-conf
… Copy of file /var/lib/ganeti/config.data to node … failed: Error while
executing backend function: [Errno 1] Operation not permitted
… Error while uploading ssconf files to node …: Error while executing backend
function: [Errno 1] Operation not permitted

$ gnt-node modify --offline no --force node3.example.com
… - WARNING: Not enough master candidates (desired 10, new value will be 4)
… Copy of file /var/lib/ganeti/config.data to node node8.example.com failed:
Error while executing backend function: [Errno 1] Operation not permitted
Modified node node3.example.com
 - offline -> True
 - master_candidate -> auto-demotion due to offline

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Fix gnt-node evacuate w. iallocator · e9022531

Iustin Pop authored 15 years ago


Commit 2bb5c911 moved around and changed the _RunAllocator function in
the DiskReplace → TaskLet conversion, but in the process it changed the
relocate_from argument from a list of nodes to just the secondary node.
This breaks the protocol and current iallocator scripts.

This patch fixes that but also adds a local variable 'instance' since
it's not nice to write self.instance so many times.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Oct 26, 2009

InstanceIpToNodePrimaryIpQuery: use a query dict · 19351457

Guido Trotter authored 15 years ago

In 95b487bb we changed InstanceIpToNodePrimaryIpQuery to be able to query
multiple instances at once. We also need to be able to query IPs
belonging to a specific NIC link, so we:

1) Move the "query" argument to a dict containing different fields
2) Make the "query for a single ip" and "query for a list" options explicit
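The shape of such a query can be sketched as below. The dict keys (`ip`, `ips`, `link`) and the lookup callback are hypothetical; they only illustrate moving from a bare argument to a dict that can carry extra fields such as a NIC link.

```python
def answer_ip_query(query, lookup):
    """Map instance IP(s) to primary node IP(s).

    `query` is a dict with hypothetical keys:
      "ip"   - a single IP; the answer is a single value
      "ips"  - a list of IPs; the answer is a list, in the same order
      "link" - optional NIC link; None selects the default link
    `lookup(link, ip)` returns the primary node IP, or None if unknown.
    """
    link = query.get("link")
    if "ips" in query:
        return [lookup(link, ip) for ip in query["ips"]]
    return lookup(link, query["ip"])
```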

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

SimpleConfigReader: ips are partitioned by link · cd195419

Guido Trotter authored 15 years ago


We were already half-doing it, but this completes the process.

1) We no longer maintain a list of ips or an ip->instance map
2) We add a new link,ip->instance map (replacing the link->ips list we had)
3) We add the link parameter to GetInstanceByIp (making it
   GetInstanceByLinkIp)
4) We change the GetInstanceByIp caller to pass None as link
   (thus for now using only the default link)
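A minimal sketch of a link-partitioned map. The class and method names are hypothetical (loosely mirroring GetInstanceByLinkIp); the point is that the same IP may appear on different links, so a lookup needs both, with None standing for the default link.

```python
class LinkIpIndex(object):
    """Index of instance IPs partitioned by NIC link: link -> {ip: instance}."""

    def __init__(self, instances, default_link):
        # instances: iterable of (name, [(link, ip), ...]); a link of None
        # means the NIC uses the cluster's default link.
        self._default_link = default_link
        self._by_link = {}
        for name, nics in instances:
            for link, ip in nics:
                if link is None:
                    link = default_link
                self._by_link.setdefault(link, {})[ip] = name

    def get_instance_by_link_ip(self, ip, link=None):
        """Return the instance owning `ip` on `link` (None = default link)."""
        if link is None:
            link = self._default_link
        return self._by_link.get(link, {}).get(ip)

    def get_link_ips(self, link=None):
        """Return the sorted IPs known on `link` (None = default link)."""
        if link is None:
            link = self._default_link
        return sorted(self._by_link.get(link, {}))
```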

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

SimpleConfigReader: queries for default nicparams · 47a626b0

Guido Trotter authored 15 years ago


GetDefaultNicParams returns the default nic parameters.
GetDefaultNicLink returns the default nic link.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Import errors in confd __init__ · 6855f043

Guido Trotter authored 15 years ago


The errors module is used by some functions defined there.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>

Allow '@' in tag values · b5e5632e

Iustin Pop authored 15 years ago


This allows using an email address (as is) as part of a tag. The main
problem that could arise is when parsing tags from a shell script, but
(AFAIK) '@' is not a special character when used in values (happy to be
corrected if not true).

The patch also moves the re to be compiled at class init time, which
should use fewer resources; in my tests it is fine to use a compiled re
from multiple threads.
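A sketch of a tag validator whose pattern is compiled once at class definition time and which accepts '@'. The exact character set and the 128-character limit are assumptions for illustration, not necessarily Ganeti's values.

```python
import re


class TagValidator(object):
    # Compiled once, when the class is defined, and shared by all callers.
    # '@' is allowed so e.g. email addresses can be used as-is in tags.
    VALID_TAG_RE = re.compile(r"[\w.+*/:@-]+")
    MAX_TAG_LEN = 128

    @classmethod
    def is_valid(cls, tag):
        """Return True if `tag` is non-empty, short enough and uses allowed characters."""
        return (len(tag) <= cls.MAX_TAG_LEN and
                cls.VALID_TAG_RE.fullmatch(tag) is not None)
```

`fullmatch` is used instead of `match` with a trailing `$`, because `$` also matches just before a trailing newline.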

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Oct 23, 2009

cmdlib._AssembleInstanceDisks: Fix case where variable wouldn't be set · d52ea991

Michael Hanselmann authored 15 years ago


The “result” variable may not be set and/or come from the previous loop.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Oct 22, 2009

KVM netscript: add static routes, with no suffix · 8866ec86

Guido Trotter authored 15 years ago

The /32 suffix is useless, since the kernel already assumes a single host
if no suffix is specified. Moreover, we prefer these routes to be
"static" so that routing daemons, if present, won't mess with them.
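The KVM netscript itself is a shell script; the Python sketch below only builds the corresponding `ip route` argument list to show the shape of the command. The function name and the optional routing table argument are assumptions.

```python
def host_route_cmd(ip, dev, table=None):
    """Build an `ip route add` command for a single-host route to `ip` via `dev`.

    No "/32" suffix is appended: without a prefix length the kernel treats
    the address as a single host. "proto static" marks the route so that
    routing daemons leave it alone.
    """
    cmd = ["ip", "route", "add", ip]
    if table is not None:
        cmd += ["table", table]
    cmd += ["proto", "static", "dev", dev]
    return cmd
```

On a host with iproute2, the list could be passed to `subprocess.check_call`.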

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

KVMHypervisor: configure v6 parameters on nic · e014f1d0

Guido Trotter authored 15 years ago


In routing mode we are tweaking a few parameters on the interface. With
this patch we'll tweak both the v4 and v6 ones.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

KVMHypervisor: implement instance policy routing · 2c5afffb

Guido Trotter authored 15 years ago

Until now we relied on traffic from instances being policy routed via a
rule based on the instance network. With this change we can enforce it
on the instance interfaces. Since ip rules survive the interface
disappearing and reappearing, when creating the interface we first need
to remove any leftover rules and then apply the new one.
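The remove-then-add sequence can be sketched with an injectable command runner. The function name and runner interface are hypothetical, and the exact `ip rule` arguments (`dev`, `table`) are an assumption about how such rules would be expressed with iproute2.

```python
def apply_policy_routing(dev, table, run):
    """Make traffic entering from `dev` use routing table `table`.

    `run(argv)` executes a command and returns its exit status. ip rules
    outlive the interface they refer to, so stale rules for `dev` are
    deleted (one per call, until deletion fails) before adding the new one.
    """
    while run(["ip", "rule", "del", "dev", dev]) == 0:
        pass
    return run(["ip", "rule", "add", "dev", dev, "table", table])
```

Passing `subprocess.call` as `run` would execute the real commands.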

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Adding '--no-ssh-init' option to 'gnt-cluster init'. · b989b9d9

Ken Wehr authored 15 years ago


Allows the initialization of a cluster without the creation or distribution
of SSH key pairs. Includes changes for LeaveCluster and RPC.

Signed-off-by: Ken Wehr <ksw@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

confd: query the pnode of multiple instances at once · 95b487bb

Flavio Silvestrow authored 15 years ago


Signed-off-by: Flavio Silvestrow <flaviops@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Try to reduce wrong errors in InstanceShutdown · 3782acd7

Iustin Pop authored 15 years ago


In backend.InstanceShutdown(), there is a race condition between
checking that the instance exists and trying to shut it down, which
sometimes results in error messages like:

Tue Oct 20 20:08:30 2009 - WARNING: Could not shutdown instance: Failed
to force stop instance instance9: Failed to stop instance instance9:
exited with exit code 1, Error: Domain 'instance9' does not exist.

To fix this, we ignore any hypervisor StopInstance() errors if the
instance doesn't exist anymore, since our purpose (to make the instance
go away) is already accomplished.
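The fix can be sketched as "only report a stop failure if the instance is still there". The hypervisor interface (`list_instances`, `stop_instance`) and `HypervisorError` are hypothetical stand-ins.

```python
class HypervisorError(Exception):
    """Stand-in for a hypervisor-level failure."""


def instance_shutdown(name, hyper):
    """Stop instance `name`, ignoring errors if it is gone anyway."""
    if name not in hyper.list_instances():
        return  # already stopped
    try:
        hyper.stop_instance(name)
    except HypervisorError:
        # The instance may have disappeared between the check above and
        # the stop call; since the goal is for it to be gone, that is fine.
        if name in hyper.list_instances():
            raise
```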

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Revert breakage introduced in · 7734de0a

Iustin Pop authored 15 years ago


Commit e4e9b806 introduced two problems
in backend.InstanceShutdown():

- first, it reduced the check interval significantly (especially for the
  first few checks); very few production VMs shut down in one second,
  and while this doesn't break anything, it creates unnecessary load
  for the hypervisor
- second, a wrong test added to the while condition (“not tried_once”)
  means that we only sleep once for an instance, and after that we
  immediately kill it forcefully

Together, these two mean that any instance which is not lucky enough to
finish in roughly 1-1.5 seconds (the time it takes to sleep and verify
the instance list again) will have this happen:

2009-10-21 23:33:46,034:  pid=16634 INFO Called for inst9 w. False/False
2009-10-21 23:33:47,440:  pid=16634 ERROR Shutdown of 'inst9' unsuccessful, forcing
2009-10-21 23:33:47,440:  pid=16634 INFO Called for inst9 w. True/False

The “Called…” are logs from the hypervisor shutdown function. This means
of course that at restart time:

[12775866.644682] EXT3-fs: INFO: recovery required on readonly filesystem.
[12775866.644689] EXT3-fs: write access will be enabled during recovery.
[12775868.533674] kjournald starting.  Commit interval 5 seconds
[12775868.533697] EXT3-fs: sda1: orphan cleanup on readonly fs
[12775868.551797] EXT3-fs: sda1: 12 orphan inodes deleted
[12775868.551803] EXT3-fs: recovery complete.
[12775868.586275] EXT3-fs: mounted filesystem with ordered data mode.

This patch reverts the broken test and changes the sleep to a fixed
duration of five seconds, since it makes no sense to check that often
for shutdown (and after ~20 seconds the interval reached a stable value
of five seconds anyway).
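A fixed-interval shutdown loop along these lines could look like the sketch below. The hypervisor interface, the overall timeout and the `force` flag are hypothetical; only the five-second polling interval comes from the commit message. Clock and sleep are injectable for testing.

```python
import time


def shutdown_and_wait(name, hyper, timeout, interval=5.0,
                      sleep=time.sleep, now=time.monotonic):
    """Request a clean shutdown and poll every `interval` seconds.

    Returns True if the instance went away within `timeout` seconds;
    otherwise forces it off and returns False.
    """
    hyper.stop_instance(name, force=False)
    deadline = now() + timeout
    while name in hyper.list_instances():
        if now() >= deadline:
            hyper.stop_instance(name, force=True)
            return False
        # Fixed interval: polling more often only loads the hypervisor.
        sleep(interval)
    return True
```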

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Xen: Ignore the retry argument in stop instance · 0cf11e68

Iustin Pop authored 15 years ago


Commit 4ad45119 changed the KVM hypervisor to send multiple shutdown
requests to the monitor, but it didn't make the same change for the Xen
hypervisor. We simply remove the return-on-retry logic, since we do want
to send multiple shutdown signals for both Xen and KVM (even if the
behaviour is not perfect, they should behave the same).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Oct 21, 2009

Ensure RpcResult has “payload” attribute · 1645d22d

Michael Hanselmann authored 15 years ago


Also add assertions to avoid missing attributes in the future.
They won't be included in optimized bytecode.
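The pattern is to set every attribute on all code paths and assert it at the end of `__init__`. This simplified `RpcResult` is a hypothetical stand-in, not the real class; as the commit notes, `assert` statements are skipped when Python runs with optimization (`python -O`).

```python
class RpcResult(object):
    """Simplified RPC result: either a payload or a failure message."""

    def __init__(self, data=None, failed=False):
        if failed:
            self.fail_msg = str(data)
            self.payload = None
        else:
            self.fail_msg = None
            self.payload = data
        # Guard against future code paths that forget an attribute;
        # these checks are removed when running under "python -O".
        assert hasattr(self, "fail_msg"), "missing 'fail_msg'"
        assert hasattr(self, "payload"), "missing 'payload'"
```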

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Oct 20, 2009

Introduce checks for /sys and /proc · 7c0aa8e9

Iustin Pop authored 15 years ago


This patch adds checks for /proc and /sys in cluster verify, since
Ganeti relies on these special filesystems being mounted.
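A check of this kind could use `os.path.ismount`, as in the sketch below; the function name and the choice to return the missing paths are assumptions. In a cluster-verify context, each path returned would be reported as an error for that node.

```python
import os


def find_unmounted(paths=("/proc", "/sys")):
    """Return the entries of `paths` that are not mount points."""
    return [path for path in paths if not os.path.ismount(path)]
```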

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Oct 19, 2009

Fix serializer unittests · d357f531

Michael Hanselmann authored 15 years ago


Commit d22b2999 broke the serializer unittests with certain
versions of simplejson. This patch removes sort_keys again
and implements a slightly more efficient way of detecting
simplejson functionality. The serializer unittests no longer
use a partially broken mock, but rather a function to convert all
tuples to lists before comparing.
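JSON has no tuple type, so a serializer round trip turns tuples into lists. A comparison helper in the spirit of the one described could look like this; it uses the standard `json` module rather than simplejson, and the function names are hypothetical.

```python
import json


def tuples_to_lists(value):
    """Recursively convert tuples (and lists) to lists; dict values too."""
    if isinstance(value, (list, tuple)):
        return [tuples_to_lists(item) for item in value]
    if isinstance(value, dict):
        return dict((key, tuples_to_lists(item)) for key, item in value.items())
    return value


def roundtrip_matches(data):
    """True if `data` survives a JSON dump/load cycle, modulo tuple -> list."""
    return json.loads(json.dumps(data)) == tuples_to_lists(data)
```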

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Oct 16, 2009

bootstrap: Factorize HMAC key generation · c008906b

Michael Hanselmann authored 15 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Make bootstrap._GenerateSelfSignedSslCert public · cd34faf2

Michael Hanselmann authored 15 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

serializer: Sort keys in JSON · d22b2999

Michael Hanselmann authored 15 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Oct 15, 2009

mcpu: Use new timeout class for timeout · a6db1af2

Michael Hanselmann authored 15 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
