  1. Jun 22, 2011
  2. Apr 28, 2011
  3. Apr 13, 2011
  4. Apr 06, 2011
    • Increase the lock timeouts before we block-acquire · d385a174
      Iustin Pop authored
      
      This has been observed to cause problems on real clusters via the
      following mechanism:
      
      - a long job (e.g. a replace-disks) is keeping an exclusive lock on an
        instance
      - the watcher starts and submits its query instances opcode which
        wants shared locks for all instances
      - after about an hour, the watcher job falls back to blocking acquire,
        after having acquired all other locks
      - any instance opcode that wants an exclusive lock for an instance
        cannot start until the watcher has finished, even though there's no
        actual operation on that instance
      
      To alleviate this problem, we simply increase the maximum timeout
      before lock acquires fall back to either a blocking acquire or a
      priority increase. The timeout is computed such that we wait ~10 hours
      (instead of one) before this happens, which should be within the
      maximum lifetime of a reasonable opcode on a healthy cluster. The new
      timeout also means that priority increases happen every half hour.
      
      We also increase the maximum wait interval to 15 seconds; with the
      larger total timeout, the old interval would result in too many
      retries.
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
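      A minimal sketch of the mechanism described above, not Ganeti's actual
      locking code: timed acquires with growing per-attempt timeouts (capped
      at the new 15-second maximum wait interval) are tried until a total
      budget of roughly ten hours is exhausted, and only then does the caller
      fall back to a blocking acquire. The backoff factor and the helper name
      are assumptions made for illustration only (Python 3).
      
        import threading
        
        MAX_WAIT = 15.0            # cap on a single timed attempt
        TOTAL_BUDGET = 10 * 3600   # ~10 hours before the blocking acquire
        
        def acquire_with_backoff(lock, start_timeout=1.0):
            """Timed acquires with growing timeouts; block only as a last resort."""
            waited = 0.0
            timeout = start_timeout
            while waited < TOTAL_BUDGET:
                if lock.acquire(timeout=timeout):
                    return                      # acquired opportunistically
                waited += timeout
                # Grow the per-attempt timeout but never beyond MAX_WAIT; the
                # larger cap keeps the number of retries reasonable.
                timeout = min(timeout * 1.5, MAX_WAIT)
                # (Ganeti would also raise the job's priority periodically
                # while waiting; omitted in this sketch.)
            lock.acquire()                      # last resort: block indefinitely
        
        # Example: acquire_with_backoff(threading.Lock())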
  5. Mar 16, 2011
    • locking: Fix race condition in lock monitor · e4e35357
      Michael Hanselmann authored
      
      In some rare cases a lock can be re-created very soon after its
      deletion, while the old instance has not been destroyed yet. In such a
      case the code would detect a duplicate name and raise an exception.
      
      We have seen at least one case where this happened during the creation
      of many instances. It is not entirely clear how it came about, but it
      appears to have occurred while different jobs fought for locks with
      short timeouts (during instance creation, locks are added at this stage
      and removed shortly afterwards if not all locks can be acquired).
      
      The issue is fixed by removing the check for duplicate names. To still
      guarantee a stable sort order for the lock information as shown by
      “gnt-debug locks”, a registration number is recorded for each lock in
      the monitor.
      
      A unit test covering this situation is included.
      
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
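      A minimal sketch of the approach described above, not Ganeti's actual
      lock monitor: instead of rejecting a registration whose name is still
      present, every lock is stored under a monotonically increasing
      registration number, which also gives the output of “gnt-debug locks”
      a stable ordering. The class layout and the “name” attribute of the
      registered locks are assumptions for illustration (Python 3).
      
        import itertools
        import threading
        import weakref
        
        class LockMonitor:
            """Tracks locks by registration number instead of by unique name."""
        
            def __init__(self):
                self._counter = itertools.count()  # next registration number
                self._lock = threading.Lock()
                self._locks = {}                   # registration number -> weakref
        
            def RegisterLock(self, lock):
                # No duplicate-name check: a lock re-created shortly after
                # deletion, whose predecessor has not been destroyed yet, is
                # simply registered under a new number.
                with self._lock:
                    self._locks[next(self._counter)] = weakref.ref(lock)
        
            def QueryLocks(self):
                # The stable sort order comes from the registration number.
                with self._lock:
                    entries = sorted(self._locks.items())
                names = []
                for _num, ref in entries:
                    lock = ref()
                    if lock is not None:           # skip destroyed locks
                        names.append(lock.name)    # assumed attribute
                return names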
  6. Mar 15, 2011
  7. Mar 08, 2011
    • cfgupgrade: Fix critical bug overwriting RAPI users file · 87c80992
      Michael Hanselmann authored
      
      The cfgupgrade tool was designed to be idempotent, that is, it can be
      run several times and still produce the correct result. Ganeti 2.4
      moved the file containing the RAPI users to a separate directory
      (…/lib/ganeti/rapi/users). If a file exists at the old location
      (…/lib/ganeti/rapi_users), cfgupgrade automatically moves it to the new
      path and replaces it with a symlink.
      
      Unfortunately one of the checks for this was incorrect; when run
      multiple times, cfgupgrade replaced the users file at the new location
      with the symlink created during a previous run.
      
      In addition, the “--dry-run” parameter to cfgupgrade was not respected.
      Unit tests are updated for all these cases.
      
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
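      A minimal sketch of an idempotent version of the migration described
      above; this is not the real cfgupgrade code, and the function name is
      an assumption (the message only shows truncated paths, so they are
      passed in by the caller). The key points are that a symlink at the old
      location marks an already-completed migration and must not be moved
      again, and that --dry-run only reports the action.
      
        import os
        import shutil
        
        def migrate_rapi_users(old_path, new_path, dry_run=False):
            """Move the RAPI users file to its new location, idempotently."""
            # A symlink at the old path means a previous run already migrated
            # the file; moving it again would overwrite the real users file
            # with the symlink.
            if os.path.islink(old_path) or not os.path.exists(old_path):
                return
            if dry_run:
                print("Would move %s to %s and leave a symlink behind"
                      % (old_path, new_path))
                return
            os.makedirs(os.path.dirname(new_path), exist_ok=True)
            shutil.move(old_path, new_path)
            os.symlink(new_path, old_path)  # keep the old path working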
  8. Feb 23, 2011
  9. Feb 18, 2011
  10. Feb 17, 2011
    • NodeQuery: mark live fields as UNAVAIL for non-vm_capable nodes · effab4ca
      Iustin Pop authored
      
      Since the data is missing by design, UNAVAIL is appropriate here, while
      NODATA is not.
      
      The patch also adds a comment: if we extend the live fields list to
      contain other data in the future, we need to reevaluate this solution.
      
      This should fix issue 143. The listing now shows (node2 == offline,
      node3 == not vm_capable):
      
        Node     DTotal     DFree    MTotal     MNode     MFree Pinst Sinst
        node1    698.6G    630.5G     32.0G      1.0G     30.0G     8     7
        node2 (offline) (offline) (offline) (offline) (offline)     9     4
        node3 (unavail) (unavail) (unavail) (unavail) (unavail)     0     0
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
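      The distinction between the field statuses can be sketched as below;
      this is not Ganeti's query framework, only an illustration of why
      UNAVAIL (data missing by design) differs from NODATA (data that should
      exist but could not be collected) and from OFFLINE. All names in the
      sketch are assumptions (Python 3).
      
        from enum import Enum
        
        class FieldStatus(Enum):
            OK = "ok"
            OFFLINE = "(offline)"  # node is marked offline, nothing queried
            UNAVAIL = "(unavail)"  # value does not exist by design
            NODATA = "(nodata)"    # value should exist but was not collected
        
        def live_field(node, value):
            """Return a (status, value) pair for a live field such as MFree."""
            if node.get("offline"):
                return (FieldStatus.OFFLINE, None)
            if not node.get("vm_capable"):
                # Non-vm_capable nodes never report live data, so the field
                # is unavailable rather than lacking data.
                return (FieldStatus.UNAVAIL, None)
            return (FieldStatus.OK, value)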
  11. Feb 02, 2011
  12. Jan 31, 2011
  13. Jan 28, 2011
  14. Jan 27, 2011
  15. Jan 21, 2011
  16. Jan 20, 2011
  17. Jan 18, 2011
  18. Jan 14, 2011