Commits · 1b8acf70456331625cd284eafbec50f22c4f9229 · itminedu / snf-ganeti

Jan 21, 2009

Log the rpc call name in the RPC errors message · 1b8acf70

Iustin Pop authored 16 years ago

Currently the rpc module logs only the error description and the target
node when an RPC call fails, as such:

  2009-01-21 00:50:01,456:  pid=1051/Thread-21 ERROR RPC error from node
    node1.example.com: Connection failed (111: Connection
    refused)

but this doesn't help to understand which call caused the error (here it's
an offline node which should not be contacted at all).

This patch adds logging of the call name too, so cases like the above can
be debugged more easily.

Reviewed-by: imsnah, ultrotter

1b8acf70
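As a rough illustration of the idea in 1b8acf70 (not the actual Ganeti code), the sketch below builds an error message that names the call as well as the node. The helper name `format_rpc_error`, the message layout, and the call name `example_call` are assumptions made for the example.

```python
import io
import logging


def format_rpc_error(call_name, node, error):
    """Build an RPC error message that also names the failing call."""
    return "RPC error in %s from node %s: %s" % (call_name, node, error)


# Demo: capture the log output in memory instead of a real log file.
stream = io.StringIO()
logger = logging.getLogger("rpc_sketch")
logger.addHandler(logging.StreamHandler(stream))
logger.propagate = False

# "example_call" is a placeholder, not a real Ganeti RPC name.
logger.error(format_rpc_error("example_call", "node1.example.com",
                              "Connection failed (111: Connection refused)"))
logged = stream.getvalue()
```

With the call name in the message, a failure against an offline node can be traced back to the code path that should not have contacted it.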

Change the instance status attribute to boolean · 0d68c45d

Iustin Pop authored 16 years ago

For historical reasons, the “should run or not” attribute of an
instance was denoted by its “status” attribute having a string value of
either ‘up’ or ‘down’. Checking this in code was done by hardcoding
the strings.

This was long due for a redo, and this patch changes this attribute to
“admin_up” having a boolean value. The patch is in fact shorter than I
expected, and passes burnin.

The patch also fixes an error in BuildInstanceHookEnvByObject where the
instance.os was passed as the status value.

Reviewed-by: ultrotter

0d68c45d
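A minimal sketch of the conversion described in 0d68c45d, assuming a helper that maps the legacy string values onto the new boolean. The function name `status_to_admin_up` is hypothetical; only the 'up'/'down' strings and the `admin_up` name come from the commit message.

```python
def status_to_admin_up(status):
    """Map the legacy 'up'/'down' status string to a boolean admin_up."""
    if status == "up":
        return True
    if status == "down":
        return False
    raise ValueError("unknown instance status: %r" % (status,))


# Callers can then test the boolean directly instead of comparing strings.
admin_up = status_to_admin_up("up")
if admin_up:
    action = "instance should be running"
else:
    action = "instance should be stopped"
```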

Implement the new live migration backend functions · cd42d0ad

Guido Trotter authored 16 years ago

MigrationInfo, AcceptInstance and AbortMigration are implemented as
hypervisor specific functions, and by default they do nothing (as
they're not always necessary).

This patch also converts hv_base.MigrateInstance docstring to epydoc,
adds a missing @type to the GetInstanceInfo docstring, and removes an
unneeded empty line.

Reviewed-by: iustinp

cd42d0ad
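The "do nothing by default" pattern from cd42d0ad could look roughly like the sketch below. The method names come from the commit message, but the class names, argument lists and return values are assumptions and do not reflect the real Ganeti signatures.

```python
class BaseHypervisorSketch:
    """Base class: migration hooks are no-ops unless a hypervisor needs them."""

    def MigrationInfo(self, instance):
        # Nothing to pass to the target node by default.
        return None

    def AcceptInstance(self, instance, info, target):
        # No preparation needed on the target node by default.
        return None

    def AbortMigration(self, instance, info):
        # Nothing was started, so there is nothing to clean up.
        return None


class ListeningHypervisorSketch(BaseHypervisorSketch):
    """Hypothetical hypervisor that must prepare the target side."""

    def __init__(self):
        self.accepted = []

    def AcceptInstance(self, instance, info, target):
        self.accepted.append((instance, target))


base = BaseHypervisorSketch()
custom = ListeningHypervisorSketch()
custom.AcceptInstance("inst1", None, "node2")
```

Hypervisors that handle the whole migration from the source side simply inherit the no-op defaults.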

KVM: save and remove the KVM runtime · 38e250ba

Guido Trotter authored 16 years ago

At instance startup time we save the kvm runtime, and at stop time we
delete it. This patch also includes a function to load the kvm runtime,
which is not used yet.

Reviewed-by: iustinp

38e250ba

KVM: split KVM runtime generation and startup · ee5f20b0

Guido Trotter authored 16 years ago

Previously we generated the kvm command line and then just ran it.
With this patch we split the generation from the execution, allowing us
to save the command line and replay it at reboot.

We must take special care with instance nics:
  - We can't include them in the saved command line, as they point to
    temporary files
  - We can't just generate them at exec time, because we would apply
    those changes, but not all the other ones, to a running instance,
    thus making it inconsistent (for example, if an instance had its
    memory increased and one more nic added, in a soft reboot we would
    add the nic, but not the memory)
So we'll just save the instance nic data at the time the kvm runtime
data is generated, and transform it into actual parameters at execution
time.

Reviewed-by: iustinp

ee5f20b0
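The split described in ee5f20b0 can be sketched as follows: the saved runtime holds the base command line plus raw nic data, and the nic arguments (which point to temporary files) are only produced at execution time. All function names, the JSON storage format and the exact kvm arguments are assumptions for illustration, not the actual implementation.

```python
import json


def generate_runtime(memory_mb, nics):
    """Return (base command, nic data); no temporary paths are included."""
    kvm_cmd = ["kvm", "-m", str(memory_mb)]
    return kvm_cmd, [dict(nic) for nic in nics]


def save_runtime(runtime):
    """Serialize the runtime so it can be replayed at reboot."""
    return json.dumps(runtime)


def load_runtime(serialized):
    kvm_cmd, nics = json.loads(serialized)
    return kvm_cmd, nics


def build_exec_command(runtime, make_nic_script):
    """Turn saved nic data into real arguments at execution time."""
    kvm_cmd, nics = runtime
    cmd = list(kvm_cmd)
    for index, nic in enumerate(nics):
        cmd += ["-net", "nic,macaddr=%s" % nic["mac"]]
        # make_nic_script stands in for creating a temporary script file.
        cmd += ["-net", "tap,script=%s" % make_nic_script(index, nic)]
    return cmd


saved = save_runtime(generate_runtime(512, [{"mac": "aa:00:00:00:00:01"}]))
exec_cmd = build_exec_command(load_runtime(saved),
                              lambda index, nic: "/tmp/example-nic%d" % index)
```

Because memory and nic data are saved together, a soft reboot replays a consistent snapshot rather than mixing old and new settings.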

Add calls in the intra-node migration protocol · 6906a9d8

Guido Trotter authored 16 years ago

Currently the hypervisor is expected to do all the migration from the
source side. With this patch we also add the option of passing some
information to the target side, and starting some operation there.

As a bonus, a function to clean up any started operation is included.

Reviewed-by: iustinp

6906a9d8

Update the objects.Disk formatting method · 89f28b76

Iustin Pop authored 16 years ago

With the addition of minors, this needs to show them too.

Reviewed-by: ultrotter

89f28b76

Jan 20, 2009

KVM: add a _CONF_DIR · a1d79fc6

Guido Trotter authored 16 years ago

Currently we keep pid files and control files. In the conf dir we'll
also keep the data to start the instance anew, and the network
interface scripts. These will then be copied to a separate area (since
_CONF_DIR could be mounted 'noexec') and used to start the instance.

This patch also adds comments to state what the various directories are
used for.

Reviewed-by: iustinp

a1d79fc6

KVM: Remove sockets after shutdown · c4fbefc8

Guido Trotter authored 16 years ago

Abstract the monitor and serial socket naming into two functions, and
reuse them to clean up the files after shutdown.

Reviewed-by: iustinp

c4fbefc8
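A small sketch of the approach in c4fbefc8: two naming helpers, reused by a cleanup function. The helper names and the `.monitor`/`.serial` suffixes are assumptions made for this example.

```python
import os
import tempfile


def monitor_socket_path(ctrl_dir, instance_name):
    return os.path.join(ctrl_dir, "%s.monitor" % instance_name)


def serial_socket_path(ctrl_dir, instance_name):
    return os.path.join(ctrl_dir, "%s.serial" % instance_name)


def remove_sockets(ctrl_dir, instance_name):
    """Remove both socket files, ignoring ones that are already gone."""
    for path in (monitor_socket_path(ctrl_dir, instance_name),
                 serial_socket_path(ctrl_dir, instance_name)):
        try:
            os.remove(path)
        except FileNotFoundError:
            pass


# Demo with plain files standing in for the sockets.
with tempfile.TemporaryDirectory() as ctrl_dir:
    for path_fn in (monitor_socket_path, serial_socket_path):
        open(path_fn(ctrl_dir, "inst1"), "w").close()
    remove_sockets(ctrl_dir, "inst1")
    leftover = os.listdir(ctrl_dir)
```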

KVM: fix class docstring · c4469f75

Guido Trotter authored 16 years ago

Reviewed-by: iustinp

c4469f75

Xen: use epydoc in MigrateInstance docstring · fdf7f055

Guido Trotter authored 16 years ago

Reviewed-by: iustinp

fdf7f055

ShutdownInstance: report hypervisor error · 920aae98

Guido Trotter authored 16 years ago

When StopInstance raises a HypervisorError, report it in the logged
message to ease debugging.

Reviewed-by: iustinp

920aae98

ConfigObject docstring, close an open parenthesis · 55224070

Guido Trotter authored 16 years ago

Reviewed-by: iustinp

55224070

Fix a typo in luxi's docstring · 7577196d

Guido Trotter authored 16 years ago

Reviewed-by: iustinp

7577196d

Update the logging output of job processing · d21d09d6

Iustin Pop authored 16 years ago

(this is related to the master daemon log)

Currently it's not possible to follow (in the non-debug runs) the
logical execution thread of jobs. This is due to the fact that we don't
log the thread name (so we lose the association of log messages to jobs)
and we don't log the start/stop of job and opcode execution.

This patch adds a new parameter to utils.SetupLogging that enables
thread name logging, and promotes some log entries from debug to info.
With this applied, it's easier to understand which log messages relate
to which jobs/opcodes.

The patch also moves the "INFO client closed connection" entry to debug
level, since it's not a very informative log entry.

Reviewed-by: ultrotter

d21d09d6
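The thread-name option from d21d09d6 might be sketched with the standard `logging` module as below. This is not the real `utils.SetupLogging`; the function name `setup_logging`, its parameters and the exact format string are assumptions, loosely modelled on the `pid=.../Thread-...` layout seen in the log sample earlier on this page.

```python
import io
import logging
import threading


def setup_logging(stream, log_thread_name=False):
    """Configure a logger, optionally adding the thread name to each entry."""
    fmt = "%(asctime)s: pid=%(process)d"
    if log_thread_name:
        fmt += "/%(threadName)s"
    fmt += " %(levelname)s %(message)s"
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(fmt))
    logger = logging.getLogger("masterd_sketch")
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


stream = io.StringIO()
log = setup_logging(stream, log_thread_name=True)
# Log from a named worker thread, as a job executor would.
worker = threading.Thread(target=log.info, args=("starting job 42",),
                          name="JobWorker1")
worker.start()
worker.join()
output = stream.getvalue()
```

With the thread name in every entry, messages from concurrently running jobs can be told apart even at info level.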

.gitignore: Don't exclude whole /autotools/ dir, but only files · ae59efea

Michael Hanselmann authored 16 years ago

This way newly added files will not be excluded by default. Also fixes
a small whitespace error in utils.py.

Reviewed-by: iustinp

ae59efea

Convert RenameInstance to (status, data) · 96841384

Iustin Pop authored 16 years ago

This allows the rename failures to show the output of OS scripts.

Reviewed-by: ultrotter

96841384

Update gitignore rules · b903ba35

Iustin Pop authored 16 years ago

As per Michael's comment, gitignore should not ignore a couple of real
files from the autotools/ directory.

Reviewed-by: ultrotter

b903ba35

Fix adding of disks to an instance · 32388e6d

Iustin Pop authored 16 years ago

ConfigWriter.AllocateDRBDMinor requires the instance name, not the
instance object. LUSetInstanceParms wrongly passes the instance
object, which can cause breakage.

The patch also adds asserts to check for this mismatch in ConfigWriter.

Reviewed-by: ultrotter

32388e6d
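The kind of assert added in 32388e6d could look like the sketch below, which rejects an instance object where a name is expected. The class names, method name and the minor allocation strategy are hypothetical and much simpler than the real ConfigWriter.

```python
class ConfigWriterSketch:
    def __init__(self):
        self._minors = {}  # (node name, minor) -> instance name

    def allocate_drbd_minor(self, nodes, instance_name):
        """Reserve the first free minor on each node for the instance."""
        assert isinstance(instance_name, str), \
            "expected an instance name, got %r" % (type(instance_name),)
        result = []
        for node in nodes:
            minor = 0
            while (node, minor) in self._minors:
                minor += 1
            self._minors[(node, minor)] = instance_name
            result.append(minor)
        return result


class InstanceSketch:
    def __init__(self, name):
        self.name = name


cfg = ConfigWriterSketch()
inst = InstanceSketch("inst1.example.com")
minors = cfg.allocate_drbd_minor(["node1", "node2"], inst.name)  # correct

try:
    cfg.allocate_drbd_minor(["node1"], inst)  # the bug: object, not name
    caught = False
except AssertionError:
    caught = True
```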

Fix burnin problems when using http checks · 5dc626fd

Iustin Pop authored 16 years ago

The urllib2 module has very bad error handling. This patch switches to
urllib, which is simpler, and derives a custom class from FancyURLopener.
With this patch, burnin no longer keeps sockets in CLOSE_WAIT state.

Reviewed-by: ultrotter

5dc626fd

Make cluster-verify check the drbd minors space · 6d2e83d5

Iustin Pop authored 16 years ago

This patch adds support for verification of the drbd minors space in
cluster verify: minors which belong to running instances and should be
online but are not, and minors which do not belong to any instance but
are in use.

The patch requires exposing some methods from bdev.DRBD8 and
config.ConfigWriter which were until now private methods.

Reviewed-by: ultrotter

6d2e83d5
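The two checks listed in 6d2e83d5 amount to comparing the configured minors with the minors actually in use on a node. The sketch below assumes a simple data layout (a dict for the configuration, a set for the node state) and a hypothetical function name; the real cluster-verify code works on richer structures.

```python
def verify_minors(expected, used):
    """Compare configured DRBD minors with the ones in use on a node.

    expected: dict of minor -> (instance name, should_be_online)
    used: set of minors currently in use on the node
    """
    problems = []
    for minor, (instance, should_be_online) in sorted(expected.items()):
        if should_be_online and minor not in used:
            problems.append("drbd minor %d of instance %s is not active"
                            % (minor, instance))
    for minor in sorted(used - set(expected)):
        problems.append("unallocated drbd minor %d is in use" % minor)
    return problems


# inst1 is running but its minor is missing; minor 5 belongs to nobody.
problems = verify_minors({0: ("inst1", True), 1: ("inst2", False)}, {1, 5})
```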

Fix a couple of epydoc warnings · 2f907a8c

Iustin Pop authored 16 years ago

Reviewed-by: ultrotter

2f907a8c

DRBD: check for in-use minor during Create · 767d52d3

Iustin Pop authored 16 years ago

In order to prevent errors with old, in-use DRBD minors, we check and
abort at create time if our minor is already in use. For this we need to
also modify DRBD8Status to be able to parse cs:Unconfigured devices.

Reviewed-by: ultrotter

767d52d3

Add a TailFile function · f65f63ef

Iustin Pop authored 16 years ago

This patch adds a tail file function, to be used for parsing OS
installation failures and returning them in the job log.

Reviewed-by: ultrotter

f65f63ef
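A tail function in the spirit of f65f63ef might be sketched like this. The name `tail_file`, the default of 20 lines and the 4 KiB read window are assumptions, not the behaviour of the actual TailFile.

```python
import os
import tempfile


def tail_file(path, lines=20):
    """Return up to the last `lines` lines of a file.

    Only the final 4 KiB are read, so fewer lines may be returned for
    files with very long lines, and the first returned line can be
    partial when the window starts in the middle of a line.
    """
    with open(path, "rb") as fd:
        fd.seek(0, os.SEEK_END)
        fd.seek(max(0, fd.tell() - 4096))
        raw = fd.read()
    return raw.decode("utf-8", "replace").splitlines()[-lines:]


# Demo on a small temporary "log file".
fd, log_path = tempfile.mkstemp()
with os.fdopen(fd, "w") as log_file:
    for number in range(30):
        log_file.write("line %d\n" % number)
last_lines = tail_file(log_path, lines=3)
os.remove(log_path)
```

Reading only the end of the file keeps the cost bounded even when an OS installation script produces a large log.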

Unify some unittest functions · 51596eb2

Iustin Pop authored 16 years ago

This patch adds unified temporary file handling to the
testutils.GanetiTestCase class, which adds easy creation and automated
cleanup of temporary files.

The patch allows a simpler handling in a couple of test cases but
requires all child classes to call the parent setUp and tearDown
methods.

Reviewed-by: ultrotter

51596eb2
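The temporary file handling described in 51596eb2 could be sketched as a base test class like the one below. The class and method names are hypothetical and do not mirror testutils.GanetiTestCase; the point is that cleanup only works if child classes call the parent setUp and tearDown.

```python
import os
import tempfile
import unittest


class TempFileTestCase(unittest.TestCase):
    """Base class that tracks temporary files and removes them afterwards."""

    def setUp(self):
        super().setUp()
        self._temp_files = []

    def tearDown(self):
        while self._temp_files:
            try:
                os.remove(self._temp_files.pop())
            except FileNotFoundError:
                pass
        super().tearDown()

    def create_temp_file(self):
        fd, path = tempfile.mkstemp()
        os.close(fd)
        self._temp_files.append(path)
        return path


class ExampleTest(TempFileTestCase):
    def setUp(self):
        super().setUp()  # required, otherwise _temp_files is missing
        self.path = self.create_temp_file()

    def test_file_exists(self):
        self.assertTrue(os.path.exists(self.path))


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTest)
result = unittest.TestResult()
suite.run(result)
```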

Some small fixes in cmdlib · 1492cca7

Iustin Pop authored 16 years ago

Reviewed-by: ultrotter

1492cca7

Convert AddOSToInstance to (status, data) · 20e01edd

Iustin Pop authored 16 years ago

This allows instance install and reinstall to return (hopefully)
relevant log files from the OS create scripts.

Reviewed-by: ultrotter

20e01edd

Convert the start instance rpc to (status, data) · dd279568

Iustin Pop authored 16 years ago

This will record the cause of an instance startup failure in the job log
(and thus show it to the user).

Reviewed-by: ultrotter

dd279568
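The (status, data) convention used in dd279568 and the neighbouring commits can be illustrated with the sketch below: the backend returns a boolean plus a payload, and the caller turns a failure into a user-visible error. Function names, the exception classes and the message texts are assumptions for the example.

```python
class HypervisorErrorSketch(Exception):
    """Stand-in for a hypervisor failure."""


class OpExecErrorSketch(Exception):
    """Stand-in for the error reported to the user in the job log."""


def start_instance(name, hypervisor_start):
    """Backend side: return (status, data) instead of a bare boolean."""
    try:
        hypervisor_start(name)
    except HypervisorErrorSketch as err:
        return (False, "Hypervisor error: %s" % err)
    return (True, "Instance started successfully")


def run_start(name, hypervisor_start):
    """Caller side: surface the failure cause instead of dropping it."""
    status, data = start_instance(name, hypervisor_start)
    if not status:
        raise OpExecErrorSketch("Could not start instance %s: %s"
                                % (name, data))
    return data


def failing_start(name):
    raise HypervisorErrorSketch("not enough memory")


ok_result = start_instance("inst1", lambda name: None)
bad_result = start_instance("inst1", failing_start)
```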

Jan 19, 2009

Fix handling of failures in create instance disks · 7d81697f

Iustin Pop authored 16 years ago

Commit 2302 only modified _CreateBlockDevOnPrimary to the new style
result, but _CreateBlockDevOnSecondary was forgotten. After the merger
of the two functions, _CreateBlockDevOnSecondary was taken as template
so we checked against old-style values, thus completely breaking error
handling.

Reviewed-by: imsnah

7d81697f

Move the default MAC prefix to the constants file · c5e489f7

Iustin Pop authored 16 years ago

Instead of having the default live in the gnt-cluster script, we move it
to the constants file. The patch also fixes a typo on constants.py.

Reviewed-by: ultrotter

c5e489f7

Use instance.all_nodes instead of hand-building it · 6b12959c

Iustin Pop authored 16 years ago

This patch replaces a few obvious uses of [instance.primary_node] +
list(instance.secondary_nodes) (or similar usage) with the new
instance.all_nodes.

Reviewed-by: ultrotter

6b12959c

Fix non-drbd instance creation · 99c7b2a1

Iustin Pop authored 16 years ago

Commit 2294 introduced a new instance.all_nodes property, which
unfortunately is working incorrectly for non-drbd instances.

This patch fixes it by making sure the primary node is always added to
the set, even before recursing over (any potential) children.

Reviewed-by: imsnah

99c7b2a1
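The fix in 99c7b2a1 boils down to seeding the node set with the primary node before walking the disk tree. The sketch below assumes disks are plain dicts with optional `nodes` and `children` keys, which is a simplification and not how Ganeti's disk objects are actually laid out.

```python
def all_nodes(primary_node, disks):
    """Collect every node an instance touches, primary node included."""
    nodes = {primary_node}  # always present, even without DRBD disks

    def visit(disk):
        nodes.update(disk.get("nodes", ()))
        for child in disk.get("children", ()):
            visit(child)

    for disk in disks:
        visit(disk)
    return nodes


# A plain (non-drbd) disk names no nodes; a DRBD disk names two.
plain = all_nodes("node1", [{"children": []}])
drbd = all_nodes("node1", [{"nodes": ["node1", "node2"],
                            "children": [{}, {}]}])
```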

Small simplification in MapLVsByNode · 7c5abcae

Iustin Pop authored 16 years ago

We don't need to pre-create the node entries in lvmap, since they will
be created at recursion time.

Reviewed-by: ultrotter

7c5abcae

Split the block device creation in two parts · de12473a

Iustin Pop authored 16 years ago

Some callers of _CreateBlockDev need recursive behaviour, but not all.
The replace secondary first creates (manually) new LVs to ensure storage
is there, and then it creates the new DRBD. At this point, we need a
non-recursive call so that the LVs are not needlessly re-created.

This patch splits the single device creation into a separate function,
so that LUReplaceDisks can use it.

Reviewed-by: ultrotter

de12473a
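The split from de12473a can be pictured as a recursive wrapper around a single-device creation function, roughly as sketched below. The function names and the dict-based device tree are assumptions; creation is simulated by appending names to a list.

```python
def create_single_device(device, created):
    """Create only this device; its children must already exist."""
    created.append(device["name"])


def create_block_dev(device, created):
    """Create the children first (recursively), then the device itself."""
    for child in device.get("children", ()):
        create_block_dev(child, created)
    create_single_device(device, created)


drbd = {"name": "drbd0",
        "children": [{"name": "lv_data"}, {"name": "lv_meta"}]}

# Normal creation: everything, bottom-up.
recursive_log = []
create_block_dev(drbd, recursive_log)

# Replace-secondary case: the LVs already exist, so only the DRBD
# device on top of them is created.
single_log = []
create_single_device(drbd, single_log)
```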

Combine the two _CreateBlockDevOnXXX functions · 428958aa

Iustin Pop authored 16 years ago

Since only two boolean parameters differ between these two functions, we
combine them so as to have less code duplication. This will be needed in
the future, as we will need to split off the recursive part.

Reviewed-by: ultrotter

428958aa

Switch call_blockdev_create call to (status, data) · dab69e97

Iustin Pop authored 16 years ago

This allows errors to be visible at the user level instead of just node
daemon logs.

Reviewed-by: ultrotter

dab69e97

Small change in the instance disk creation path · 796cab27

Iustin Pop authored 16 years ago

For future propagation of error messages from backend to cmdlib and to
the job log, just having True/False return from the disk creation
function is not enough.

This patch converts these functions (_CreateDisks, _CreateBlockDevOnXXX)
to raise exceptions on errors; otherwise the return value is None.

Reviewed-by: ultrotter

796cab27

Block device creation cleanup · 6c626518

Iustin Pop authored 16 years ago

Currently, when creating LVM-based instances, we always get the
extremely confusing message "ERROR Can't find LV /dev/xenvg/...", which
is actually expected. This behaviour was introduced before we had
UUID-style LV names, since at that point it was not unexpected to have
such volumes lying around after a failed creation.

Today, it's much more of an error to see existing volumes, and it's
better to abort with a failure. Since the bdev.LogicalVolume.Create()
method will raise an error if the volume already exists, we can remove
this check in backend before creating the device.

The Create methods for DRBD and FileStorage currently don't raise
exceptions, as the behaviour is not very well defined here.

We also change some exception types raised in bdev so that all
exceptions raised by device creation are a subclass of GenericError.

Reviewed-by: ultrotter

6c626518

Use the same root for both _data and _meta LVs · e6c1ff2f

Iustin Pop authored 16 years ago

Currently we use a different UUID for the _data and _meta volumes of a
DRBD disk. This is confusing as it's hard to associate the two in the
output of “lvs” or “gnt-node volumes”.

The patch changes this so that they use the same prefix.

Reviewed-by: ultrotter

e6c1ff2f
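The naming change in e6c1ff2f can be sketched as generating one unique prefix per DRBD disk and deriving both LV names from it. The function name and the `.disk<N>` part of the prefix are assumptions; only the `_data` and `_meta` suffixes come from the commit message.

```python
import uuid


def generate_drbd_lv_names(disk_index):
    """Return (data LV, meta LV) names sharing a single unique prefix."""
    prefix = "%s.disk%d" % (uuid.uuid4(), disk_index)
    return prefix + "_data", prefix + "_meta"


data_lv, meta_lv = generate_drbd_lv_names(0)
```

Since both names start with the same identifier, sorting volume listings by name places the two volumes of a disk next to each other.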

Jan 16, 2009

Fix LUExportInstance · 998c712c

Iustin Pop authored 16 years ago

Due to deficiencies in our block device implementation, SetDiskID must
be called on disks before passing them to remote nodes. Since in
export/import we don't touch the disks themselves, this was not needed
before in this function.

However, since the introduction of instance symlinks, the correct ID is
needed here too, and with static minors it's a "must need". The missing
call shows up as failed instance starts after migration and/or failover.

Reviewed-by: ultrotter

998c712c