Commits · 4e5a68f8081f00e37c6250f3ba6246399e89c413 · itminedu / snf-ganeti

Jan 29, 2009
- RAPI: rlib1 removal · 4e5a68f8
  Oleksiy Mishchenko authored 16 years ago
```
The resources we still need have been moved to rlib2.

Reviewed-by: iustinp
```
- RAPI: Implement /2 resource · fc72a3a3
  Oleksiy Mishchenko authored 16 years ago
```
Reviewed-by: iustinp
```
- RAPI: Deprecate RAPI version 1 · dc824c9f
  Oleksiy Mishchenko authored 16 years ago
```
It is impossible to keep backward compatibility due to
significant changes in the Ganeti core.

Reviewed-by: iustinp
```
Jan 28, 2009

Fix gnt-cluster modify -H and offline nodes · 68c6f21c
Iustin Pop authored 16 years ago
```
Reviewed-by: ultrotter
```

Actually mark drives as read-only if so configured · d34b16d7

Iustin Pop authored 16 years ago

This patch correctly marks the drives as read-only for Xen, and raises
an exception for KVM since it doesn't support read-only drives.

Reviewed-by: ultrotter

Fix some issues related to job cancelling · df0fb067

Iustin Pop authored 16 years ago

This patch fixes two issues with the cancel mechanism:
  - cancelled jobs show as such, and not in error state (we mark them as
    OP_STATUS_CANCELED and not OP_STATUS_ERROR)
  - queued jobs which are cancelled don't raise errors in the master (we
    now handle OP_STATUS_CANCELED)

Reviewed-by: imsnah

Jan 27, 2009

Xen: use utils.WriteFile for the instance configs · 73cd67f4
Guido Trotter authored 16 years ago
```
Also raise HypervisorError rather than OpExecError.

Reviewed-by: iustinp
```
Xen: use utils.ReadFile to read the VNC password · 78f66a17
Guido Trotter authored 16 years ago
```
Also raise HypervisorError rather than OpExecError.

Reviewed-by: iustinp
```

Implement disk verify checks in config verify · 332d0e37

Iustin Pop authored 16 years ago

This patch adds a simple check that the 'mode' attribute of top-level disks is
correct. It does not recurse over children.

The framework could be extended with other checks in the future.
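
A minimal sketch of the kind of check described above, not the actual Ganeti code. The function name, the dict-based disk layout and the set of valid modes ("ro" and "rw") are assumptions made for illustration.

```python
# Assumed set of valid access modes; hypothetical constant for this sketch.
VALID_DISK_MODES = frozenset(["ro", "rw"])


def verify_disk_modes(instances):
    """Check the 'mode' of top-level disks only.

    instances maps an instance name to a list of disk dicts. Returns a
    list of human-readable error strings (empty when all is well).
    """
    errors = []
    for name in sorted(instances):
        for idx, disk in enumerate(instances[name]):
            mode = disk.get("mode")
            if mode not in VALID_DISK_MODES:
                errors.append("Instance '%s': disk %d has invalid mode '%s'"
                              % (name, idx, mode))
            # disk.get("children") is deliberately not visited here.
    return errors
```

Only top-level disks are inspected, which mirrors the "does not recurse over children" note; further checks could append to the same error list.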

Reviewed-by: imsnah

Fix the mode attribute of newly-created disks · 6ec66eae

Iustin Pop authored 16 years ago

Currently, only LUSetInstanceParams correctly sets up the mode
attribute, via a manual operation. We remove this and instead do the
correct setting in the generic _GenerateDiskTemplate function, so that
we set the mode correctly for all disk creations.

Reviewed-by: ultrotter

Rework the multi-instance gnt commands · 479636a3

Iustin Pop authored 16 years ago

This patch changes the multi-instance gnt-* commands (gnt-instance
start/stop, gnt-node evacuate/failover) such that the individual
operations are submitted in parallel, ideally improving the speed of the
execution.

The patch does this by abstracting the job set functionality into a new
class in cli.py, that takes care of the job submit, job poll and error
handling.
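
The submit-then-poll pattern described above could look roughly like the sketch below. It is not the class added to cli.py: `JobSet`, `FakeClient` and their methods are hypothetical names, and the client is a stand-in rather than the real luxi API.

```python
import itertools


class FakeClient:
    """Hypothetical stand-in for a job-queue client (not the luxi API)."""

    def __init__(self, failing=()):
        self._ids = itertools.count(1)
        self._jobs = {}
        self._failing = set(failing)

    def submit(self, name, ops):
        job_id = next(self._ids)
        self._jobs[job_id] = name
        return job_id

    def poll(self, job_id):
        name = self._jobs[job_id]
        if name in self._failing:
            raise RuntimeError("job for %s failed" % name)
        return "success"


class JobSet:
    """Queue jobs, submit them all up front, then poll each one."""

    def __init__(self, client):
        self.client = client
        self.queue = []

    def queue_job(self, name, ops):
        self.queue.append((name, ops))

    def run(self):
        # Submit everything first so the jobs can execute in parallel.
        submitted = [(name, self.client.submit(name, ops))
                     for name, ops in self.queue]
        results = []
        for name, job_id in submitted:
            try:
                results.append((name, True, self.client.poll(job_id)))
            except RuntimeError as err:
                # One failed job must not stop us collecting the others.
                results.append((name, False, str(err)))
        return results
```

A caller would queue one job per instance and then inspect the `(name, success, result)` tuples returned by `run()`.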

Reviewed-by: ultrotter

Fix single-job archiving (gnt-job archive) · 5278185a
Iustin Pop authored 16 years ago
```
This is simply a typo from the conversion to multi-job archiving.

Reviewed-by: imsnah
```

KVM and Xen: add the HV_ROOT_PATH parameter · 074ca009

Guido Trotter authored 16 years ago

This parameter allows a different path to be passed to the instance
kernel. The new parameter is mandatory, and defaults to the old
hardcoded value for both KVM and Xen.

Beta1 clusters will need to have this parameter added for their
instances to be able to boot.

Reviewed-by: iustinp

KVM: implement GetShellCommandForConsole · 637ce7f9

Guido Trotter authored 16 years ago

This is a class method, because it calls _InstanceSerial, which is
another class method. The patch changes it to classmethod for all the
hypervisor classes.

Reviewed-by: iustinp

KVM: classify _Instance{Monitor,Serial,KVMRuntime} · 0df4d98a

Guido Trotter authored 16 years ago

These methods need nothing from the instance: they just manipulate
strings and fetch some class-level variables, so they can be
classmethods.
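
As an illustration of why such helpers fit as classmethods, here is a small sketch. The class name, the directory constant and the path layout are hypothetical and do not reproduce the real KVM hypervisor code.

```python
class KVMSketch:
    # Hypothetical class-level constant; no per-object state is involved.
    _CTRL_DIR = "/var/run/example/kvm/ctrl"

    @classmethod
    def _InstanceMonitor(cls, instance_name):
        """Return the (assumed) monitor socket path for an instance."""
        return "%s/%s.monitor" % (cls._CTRL_DIR, instance_name)

    @classmethod
    def _InstanceSerial(cls, instance_name):
        """Return the (assumed) serial socket path for an instance."""
        return "%s/%s.serial" % (cls._CTRL_DIR, instance_name)


# No object needs to be created to use the helpers.
print(KVMSketch._InstanceSerial("instance1.example.com"))
```

Because the helpers only read `cls._CTRL_DIR`, other classmethods can call them through `cls` without an instantiated hypervisor object.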

Reviewed-by: iustinp

Jan 26, 2009

Release 2.0 beta 1 · e33a0080

Iustin Pop authored 16 years ago

Even though alpha started at 0, we release beta 1 first as we did for
1.2.

Reviewed-by: imsnah, ultrotter

Update the NEWS documents for beta1 · 10f31783

Iustin Pop authored 16 years ago

Also import the NEWS entries from the 1.2 branch which were added since
we created it.

Reviewed-by: ultrotter

Jan 23, 2009

Xen and KVM: correct a typo when checking args · 50cb2e2a

Guido Trotter authored 16 years ago

The word 'be' was missing from the error string for both Xen and KVM,
when the kernel or initrd path was not absolute.

Reviewed-by: imsnah

Sort the instance names in batcher · 7312b33d

Iustin Pop authored 16 years ago

In case we submit multiple instances via batcher, it's nicer to have
them sorted.

Reviewed-by: imsnah

Fix batcher for 2.0-style disks and nics · 9939547b

Iustin Pop authored 16 years ago

This patch fixes the gnt-instance batch-create command, and in doing so
also slightly changes two other functions:
  - we change utils.ParseUnit so that it accepts integer values also
    (both ParseUnit(5) and ParseUnit("5") return the same value)
  - a bridge 'None' in LUCreateInstance will be converted to the default
    bridge; currently only a missing bridge is accepted to mean the
    default one

The main change to batcher is the move to a variable number of disks
and NICs.

The patch also adds a batcher-instances.json example file copied from
the 1.2 branch and properly modified.
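
The ParseUnit change can be pictured with a simplified sketch. This is not the real utils.ParseUnit: the function name `parse_unit`, the supported suffixes and the absence of any rounding are assumptions chosen to keep the example short.

```python
import re

# Multipliers relative to mebibytes; an assumed, reduced set of suffixes.
_UNIT_FACTORS = {"": 1, "m": 1, "g": 1024, "t": 1024 * 1024}


def parse_unit(value):
    """Return a size in mebibytes from an int or a string like '5' or '2g'."""
    # Converting to str first lets integers and strings share one code path.
    text = str(value).strip().lower()
    match = re.match(r"^(\d+(?:\.\d+)?)\s*([mgt]?)$", text)
    if not match:
        raise ValueError("Invalid size: %r" % (value,))
    number, suffix = match.groups()
    return int(float(number) * _UNIT_FACTORS[suffix])
```

With this approach `parse_unit(5)` and `parse_unit("5")` give the same value, which is the behaviour the commit message describes for integer input.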

Reviewed-by: imsnah, killerfoxi

Make iallocator work with offline nodes · 1325da74

Iustin Pop authored 16 years ago

This patch changes the iallocator framework to work with offline nodes
and to export them properly to plugins. It does this by exporting only
the static configuration data for those nodes, and not attempting to
parse the runtime data.

The patch also fixes bugs in iallocator related to the RpcResult
conversion, changes the should_run to admin_up attribute name (as per
the internals change), and adds “-I” as a short option for
“--iallocator” in gnt-instance, gnt-backup and burnin.

Reviewed-by: ultrotter

Remove checking of DRBD metadata for validity · 3b559640

Iustin Pop authored 16 years ago

Currently the DRBD code checks that the metadata devices are valid
before creation, initial disk attachment and add children.

However, the process for checking validity requires a free DRBD minor,
and this conflicts with parallel checking.

There are at least three possible solutions:
  - serialize all checks, which means we reduce parallelism and need
    extra locks
  - don't pass a valid minor number, but one like “/dev/drbd256” (which
    is invalid); this works for the current version of DRBD, but since
    it's not guaranteed to remain so it doesn't look nice
  - don't do the checking at all, and rely on “drbdsetup ... disk ...”
    to fail by itself

The reason for checking metadata was that in 1.2, this was much cheaper
than trying to activate devices (and the subsequent iteration over the
minors). However, in 2.0, they have the same cost, so we can choose
option 3: just remove the explicit checking and rely on drbdsetup and
the kernel to fail.

Since DRBD8._InitMeta still requires a minor number, the two places
where this is run are handled as follows:
  - Create: we just use our own (unused currently) minor number
  - AddChildren: we keep using FindUnusedMinor, with the caveat that
    this function (used by replace-disks -n ...) cannot be yet
    parallelized

Reviewed-by: ultrotter

Rework the execution model in burnin · c723c163

Iustin Pop authored 16 years ago

This patch changes (significantly) the execution model in burnin:
  - for all runs, (almost) all instance mods in a single Burn* procedure
    are done as part of a job; so for example add disk, stop, remove
    disk, start are no longer done as separate jobs but as a single job
    consisting of four opcodes
  - for parallel runs, all Burn* procedures except the rename (which
    uses a single target name) run in parallel; before, only the
    creation was done in parallel
  - due to the single-job execution and also parallel execution, the
    logging messages no longer happen synchronously with the execution,
    so they are informational rather than an actual execution log

The end result is that burnin now properly tests multi-opcode jobs and
also tests all opcodes (except rename) for parallel execution.

Note: On a test cluster, parallelization reduces burnin time from 23m to
15m.

Reviewed-by: ultrotter

Relax the restrictions on temporary DRBD minors · 79b26a7a

Iustin Pop authored 16 years ago

Currently the restrictions are too harsh: there is a time interval
between when an instance gets a new disk and when that disk is added to
the configuration, during which the restriction is not met. We solve
this by allowing temporary DRBD minors to match existing minors (for the
same instance), such that parallel creations/minor allocations are OK.

The change is done by adding the temporary minors to the minor map only
after the instance minors are computed, and considering them duplicates
only if the instance name doesn't match.
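
The relaxed rule can be sketched as follows. The function name and the data layout (a per-node dict of minors plus a dict of temporary reservations) are assumptions for illustration, not the structures used in the config module.

```python
def merge_temporary_minors(minor_map, temporary):
    """Merge temporary reservations into an already computed minor map.

    minor_map: {node: {minor: instance_name}} built from instance disks.
    temporary: {(node, minor): instance_name} of reserved minors.
    Returns the list of real conflicts as
    (node, minor, reserving_instance, current_owner) tuples.
    """
    duplicates = []
    for (node, minor), name in sorted(temporary.items()):
        owner = minor_map.setdefault(node, {}).get(minor)
        if owner is not None and owner != name:
            # Only a different instance owning the minor is a conflict.
            duplicates.append((node, minor, name, owner))
        else:
            minor_map[node][minor] = name
    return duplicates
```

A reservation that matches a minor already recorded for the same instance is accepted silently, which covers the window between disk creation and the configuration update.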

Reviewed-by: ultrotter

Introduce more configuration consistency checks · 4a89c54a

Iustin Pop authored 16 years ago

This patch enhances the duplicate DRBD minors checks (currently just a
few) and adds automatic checks of configuration consistency at
configuration file writing time.

In order to do so and show meaningful error messages, the
_UnlockedComputeDRBDMap function is changed to not raise errors in case
of duplicates, but instead return both the minors map and the duplicate
list, and its callers now raise the error. This allows the VerifyConfig
function to return a complete list of duplicates.

The new checks required some small updates to the unittests for the
config module.
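
The "return duplicates instead of raising" idea might be sketched like this. The function names and the input layout are hypothetical; the real _UnlockedComputeDRBDMap works on configuration objects rather than plain dicts.

```python
def compute_drbd_map(instances):
    """Build the per-node minor map and collect every duplicate.

    instances: {instance_name: [(node, minor), ...]}.
    Returns (minor_map, duplicates) instead of raising on the first clash.
    """
    minor_map = {}    # node -> {minor: instance_name}
    duplicates = []   # (node, minor, new_owner, existing_owner)
    for name in sorted(instances):
        for node, minor in instances[name]:
            node_map = minor_map.setdefault(node, {})
            if minor in node_map:
                duplicates.append((node, minor, name, node_map[minor]))
            else:
                node_map[minor] = name
    return minor_map, duplicates


def get_map_or_raise(instances):
    """Caller that still wants the old fail-fast behaviour."""
    minor_map, duplicates = compute_drbd_map(instances)
    if duplicates:
        raise ValueError("Duplicate DRBD minors: %r" % (duplicates,))
    return minor_map
```

A verification routine can report the whole `duplicates` list, while callers that cannot proceed with a bad map raise the error themselves.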

Reviewed-by: ultrotter

Fill the 'call' attribute of offline rpc results · 84b45587

Iustin Pop authored 16 years ago

When creating ‘fake’ results for offline nodes, we currently don't pass
the call attribute. This complicates debugging, so even though this
should not matter in practice, it's better to fix it.

Reviewed-by: imsnah

A couple of small fixes to iallocator · 8901997e

Iustin Pop authored 16 years ago

This removes some constraints:
  - the limit of two disks, which no longer holds as the underlying
    functions can now compute the size for a variable number of disks
  - the error when the hypervisor was not passed
  - a typo

Reviewed-by: imsnah

Jan 22, 2009

luxi: close and reopen the socket on errors · 8d5b316c

Iustin Pop authored 16 years ago

This is less of an actual issue for regular gnt-* clients, but it's
easily reproducible with burnin and possible with RAPI (depending on how
the program uses luxi.Client(s)).

In case of burnin, if we interrupt the client (^C) while it polls the
job, it will abort and raise an error. After that, burnin issues a
remove instance job. At this point we send the submit job (remove)
call, but the first thing we read from the socket is the response to
the previous poll job request, since the master had already queued it.

To solve this, whenever we detect an error in Transport.Call(), we close
that transport and create a new one, to start anew. The alternative
would be to introduce a sequence to the protocol, but that would be a
design-level change and is not recommended at this stage.
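
A rough sketch of the close-and-recreate approach, under stated assumptions: `ReconnectingClient`, `FakeTransport` and `TransportError` are hypothetical names for this example and are not the luxi classes. Catching `BaseException` (so that ^C is covered too) is a choice made for the sketch.

```python
class TransportError(Exception):
    """Hypothetical error type raised by the fake transport."""


class FakeTransport:
    """Stand-in for a socket-based transport."""

    def __init__(self):
        self.closed = False

    def call(self, request):
        if request == "boom":
            raise TransportError("simulated failure")
        return "reply to %s" % request

    def close(self):
        self.closed = True


class ReconnectingClient:
    def __init__(self, transport_factory):
        self._factory = transport_factory
        self._transport = None

    def _close_transport(self):
        if self._transport is not None:
            try:
                self._transport.close()
            finally:
                self._transport = None

    def call(self, request):
        if self._transport is None:
            self._transport = self._factory()
        try:
            return self._transport.call(request)
        except BaseException:
            # Drop the connection so a late reply to this request cannot
            # be mistaken for the answer to the next one.
            self._close_transport()
            raise
```

After a failed call the next request goes over a fresh transport, so no stale response is left waiting on the old socket.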

Reviewed-by: imsnah

Jan 21, 2009

ShutdownInstance: log instance name, not object · ca77edbc

Guido Trotter authored 16 years ago

When an instance fails to shut down we currently log its whole object,
rather than just the instance name.

Reviewed-by: iustinp

KVM live migration: handle failure · c087266c

Guido Trotter authored 16 years ago

If the KVM live migration ends up in a 'failed' state it has been
aborted at the kvm level, and the machine is still running locally.
We also support the 'cancelled' state, even though there should be no
way of reaching it without manual intervention.

Reviewed-by: iustinp

KVM: replace a few IOError with EnvironmentError · 90c024f6
Guido Trotter authored 16 years ago
```
Reviewed-by: iustinp
```

KVM: instance migration · 30e42c4e

Guido Trotter authored 16 years ago

The tcp port used for migrating KVM instances is selectable at
./configure time. We use a single port as nodes are locked anyway during
a migration, so no two migrations can happen at the same time to the
same node.

Reviewed-by: iustinp

KVM: add the _InstancePidAlive function · 1f8b3a27

Guido Trotter authored 16 years ago

Throughout the kvm code we very often look for the instance pidfile
name, read it, and check if the process is alive. Abstract this into a
private function and use that one instead.

This patch also changes RebootInstance to check whether the instance is
alive before trying to reboot it.
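
The "find the pidfile, read it, check the process" sequence could be wrapped roughly as below. The helper names are hypothetical and the sketch assumes a POSIX system, where signal 0 only tests for existence; it is not the actual _InstancePidAlive implementation.

```python
import errno
import os


def read_pid_file(path):
    """Return the pid stored in path, or 0 if missing or malformed."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return 0


def is_process_alive(pid):
    """Check for a process without affecting it (POSIX only)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0: existence check, nothing is delivered
    except OSError as err:
        # EPERM means the process exists but belongs to another user.
        return err.errno == errno.EPERM
    return True


def instance_pid_alive(pidfile):
    """Return (pidfile, pid, alive) in one call."""
    pid = read_pid_file(pidfile)
    return pidfile, pid, is_process_alive(pid)
```

A reboot routine can then test the third element of the returned tuple before acting on the instance.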

Reviewed-by: iustinp

KVM: fix RebootInstance · f02881e0

Guido Trotter authored 16 years ago

RebootInstance was broken, because it simply called StartInstance with
the wrong parameters. With this patch we still stop the instance, but
use the saved kvm runtime to start it again.

Reviewed-by: iustinp

KVM: retry the instance shutdown command · 6567aff3

Guido Trotter authored 16 years ago

When we ask the instance to shut down, sometimes the command won't work,
especially if the instance isn't fully booted up. We'll wait for a bit,
and give it a few chances before giving up.
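
A minimal sketch of such a retry loop. The function name, the callables it takes and the default attempt count and delay are all assumptions; the values used by the real code are not stated in the commit message.

```python
import time


def retry_shutdown(send_shutdown, is_alive, attempts=5, delay=1.0,
                   sleep=time.sleep):
    """Send the shutdown command up to 'attempts' times.

    send_shutdown and is_alive are caller-supplied callables. Returns
    True once the instance is no longer alive, False if it survived
    every attempt.
    """
    for _ in range(attempts):
        if not is_alive():
            return True
        send_shutdown()
        sleep(delay)  # give the guest some time to react
    return not is_alive()
```

Passing `sleep` as a parameter keeps the loop easy to test without real waiting.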

Reviewed-by: iustinp

Xen: implement auxiliary migration functions · 4390ccff

Guido Trotter authored 16 years ago

These are used, for the Xen hypervisor, to copy the Xen config file to
the remote node. This breaks migration for instances which were migrated
with the old code and not restarted since, because their config file was
simply lost.

Reviewed-by: iustinp

Automatically release DRBD minors on success · 61cf6b5e

Iustin Pop authored 16 years ago

This patch converts the DRBD minors reservation protocol from explicit
release to automatic release on the success paths. On the error paths,
a manual release is still needed.

The patch doesn't bring much by itself, but is needed for a future patch
which enhances the automatic verification of configuration consistency.

Reviewed-by: ultrotter

Fix some more pylint errors · c979d253

Iustin Pop authored 16 years ago

Two are real errors (invalid names) and one is a style error (overriding
a name from the outer scope).

Reviewed-by: ultrotter

One more gitignore rule · dc458d00

Iustin Pop authored 16 years ago

This was forgotten in the recent “switch to explicit ignore rules”.

Reviewed-by: imsnah

Log the rpc call name in the RPC errors message · 1b8acf70

Iustin Pop authored 16 years ago

Currently the rpc module logs only the error description and target node
when an RPC call fails, for example:

  2009-01-21 00:50:01,456:  pid=1051/Thread-21 ERROR RPC error from node
    node1.example.com: Connection failed (111: Connection
    refused)

but this doesn't help to understand which call caused this (here it's an
offline node which should not be contacted at all).

This patch adds the logging of the call too, so cases like the above can
be debugged more easily.

Reviewed-by: imsnah, ultrotter
