Commits · d385a1744c144052eaade85c38dd7106d9abf371 · itminedu / snf-ganeti

Apr 06, 2011

Increase the lock timeouts before we block-acquire · d385a174

Iustin Pop authored 13 years ago


This has been observed to cause problems on real clusters via the
following mechanism:

- a long job (e.g. a replace-disks) is keeping an exclusive lock on an
  instance
- the watcher starts and submits its query instances opcode which
  wants shared locks for all instances
- after about an hour, the watcher job falls back to blocking acquire,
  after having acquired all other locks
- any instance opcode that wants an exclusive lock for an instance
  cannot start until the watcher has finished, even though there's no
  actual operation on that instance

In order to alleviate this problem, we simply increase the max timeout
until lock acquires are sent back to either blocking acquire or
priority increase. The timeout is computed such that we wait ~10 hours
(instead of one) for this to happen, which should be within the
maximum lifetime of a reasonable opcode on a healthy cluster. The
timeout also means that priority increases will happen every half hour.

We also increase the max wait interval to 15 seconds, otherwise we'd
have too many retries with the increased interval.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

d385a174
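
The retry scheme described in d385a174 can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name, the starting wait of one second and the growth factor are hypothetical and are not taken from Ganeti's code. Only the 15-second cap, the half-hour interval between priority increases and the ~10 hour total come from the commit message.

```python
def attempt_timeouts(budget, min_wait=1.0, max_wait=15.0, growth=1.5):
    """Return per-attempt timeouts whose sum first reaches `budget` seconds.

    min_wait and growth are assumed values chosen for illustration.
    """
    timeouts = [min_wait]
    total = min_wait
    while total < budget:
        # Grow the wait, but never above the cap or below the minimum.
        nxt = min(max(timeouts[-1] * growth, min_wait), max_wait)
        timeouts.append(nxt)
        total += nxt
    return timeouts


HALF_HOUR = 30 * 60
timeouts = attempt_timeouts(HALF_HOUR)

# With a priority increase every half hour, ~10 hours correspond to
# this many increases before a blocking acquire would be reached.
steps_until_blocking = (10 * 3600) // HALF_HOUR
```

Capping each wait at 15 seconds keeps the number of retries within one half-hour window bounded while still letting other jobs run between attempts.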

Mar 16, 2011

locking: Fix race condition in lock monitor · e4e35357

Michael Hanselmann authored 14 years ago


In some rare cases it can happen that a lock is re-created very soon
after deletion, while the old instance hasn't been destructed yet. In
such a case the code would detect a duplicate name and raise an
exception.

We have seen at least one case where this happened during the creation
of many instances. It is not exactly clear how it came to be, but it
appears to have occurred while different jobs fought for locks with
short timeouts (in the case of instance creation locks are added at this
stage and removed shortly after if not all locks can be acquired).

The issue is fixed by removing the check for duplicate names. To still
guarantee a stable sort order for the lock information as shown by
“gnt-debug locks”, a registration number is recorded for each lock in
the monitor.

A unittest is included to check for the situation.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

e4e35357
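
The idea in e4e35357 of recording a registration number instead of rejecting duplicate names might look like the sketch below. The class and method names are hypothetical and the real lock monitor is certainly more involved; the point is only that sorting by (name, registration number) stays stable even when two locks briefly share a name.

```python
import itertools


class LockMonitorSketch:
    """Hypothetical monitor that tolerates duplicate lock names."""

    def __init__(self):
        self._counter = itertools.count(0)
        self._locks = {}  # registration number -> lock name

    def register(self, name):
        # No duplicate-name check: a re-created lock simply gets a new number.
        regnum = next(self._counter)
        self._locks[regnum] = name
        return regnum

    def unregister(self, regnum):
        self._locks.pop(regnum, None)

    def query(self):
        # Sort by name first, then by registration order for a stable result.
        return sorted((name, regnum) for regnum, name in self._locks.items())
```

Registering the same name twice no longer raises; the older entry disappears once it is unregistered.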

Mar 15, 2011

utils: Export NiceSortKey function · 7d4da09e

Michael Hanselmann authored 14 years ago


The ability to split a string into a list of strings and integers can be
handy elsewhere and is necessary for sorting query results by names.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
(cherry picked from commit f47941f8)

7d4da09e
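
To illustrate what 7d4da09e describes, here is a minimal sketch of a sort key that splits a string into strings and integers. It is not the Ganeti implementation; the function name and the regular expression are assumptions made for the example.

```python
import re

_DIGITS_RE = re.compile(r"(\d+)")


def nice_sort_key(name):
    """Split a name into alternating text and integer parts."""
    parts = _DIGITS_RE.split(name)
    # With one capturing group, odd positions hold the digit runs.
    return [int(part) if index % 2 else part
            for index, part in enumerate(parts)]


names = ["node10", "node2", "node1"]
ordered = sorted(names, key=nice_sort_key)
```

Because text and integers always alternate in the same positions, two keys never compare a string against an integer.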

Feb 23, 2011

query_unittest: Fix argument to set() · bacae536

René Nussbaumer authored 14 years ago


Commit e431074f introduced a bug that went uncaught. This patch fixes it.
set() expects a list or other iterable to work on, so it split the
provided instance name into a set of characters. As a result exp_status
was never set, and this was not caught by the assert further below that
checks that every status was tested.

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

bacae536
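
The pitfall fixed in bacae536 is easy to reproduce in isolation. The instance name below is made up for the example.

```python
name = "inst1"  # hypothetical instance name

# A string is itself iterable, so set() iterates over its characters.
as_chars = set(name)

# Wrapping the name in a list yields a one-element set of names.
as_names = set([name])
```

A membership test such as `name in as_chars` is then false, which is how a check can silently stop matching.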

Feb 18, 2011

Change the list formatting to 'special' chars · f0b1bafe

Iustin Pop authored 14 years ago


And also enable verbose display via the, well, verbose option. Man
page and tests are updated, and the formatting is moved from 4 if
statements to a data structure.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

f0b1bafe
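
Moving the formatting from if statements to a data structure, as f0b1bafe describes, could look roughly like this. The constant names, the function and the terse marker characters are assumptions for illustration; only the verbose forms "(offline)" and "(unavail)" appear in this log.

```python
# Hypothetical status constants, named after the RS_* convention.
RS_NORMAL, RS_UNKNOWN, RS_NODATA, RS_UNAVAIL, RS_OFFLINE = range(5)

# status -> (verbose text, terse marker); terse markers are illustrative.
_STATUS_TEXT = {
    RS_UNKNOWN: ("(unknown)", "??"),
    RS_NODATA: ("(nodata)", "?"),
    RS_UNAVAIL: ("(unavail)", "-"),
    RS_OFFLINE: ("(offline)", "*"),
}


def format_field(status, value, verbose=False):
    """Format one result field using the lookup table."""
    if status == RS_NORMAL:
        return str(value)
    verbose_text, terse_text = _STATUS_TEXT[status]
    return verbose_text if verbose else terse_text
```

Adding a new status then means adding one table entry rather than another branch.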

Feb 17, 2011

NodeQuery: mark live fields as UNAVAIL for non-vm_capable nodes · effab4ca

Iustin Pop authored 14 years ago


Since we don't have the data per design, UNAVAIL is appropriate here,
while NODATA is not.

The patch also adds a comment: if we extend the live fields list to
contain other data in the future, we need to reevaluate this solution.

This should fix issue 143. The listing now shows (node2==offline,
node3==not vm_capable):

  Node     DTotal     DFree    MTotal     MNode     MFree Pinst Sinst
  node1    698.6G    630.5G     32.0G      1.0G     30.0G     8     7
  node2 (offline) (offline) (offline) (offline) (offline)     9     4
  node3 (unavail) (unavail) (unavail) (unavail) (unavail)     0     0

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

effab4ca

Feb 02, 2011

utils.SetupLogging: Return function to reopen log file · 9a6813ac

Michael Hanselmann authored 14 years ago


This function can be used from a SIGHUP handler to reopen log files.
Initial, simple unittests are included.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

9a6813ac
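
The pattern from 9a6813ac, a setup function that hands back a callable for reopening the log file, can be sketched with the standard library alone. The function names and the logger name are hypothetical, and this is not the signature of utils.SetupLogging. The sketch relies on logging.FileHandler reopening its file on the next record once its stream has been reset to None.

```python
import logging
import signal


def setup_logging(path, logger_name="example"):
    """Configure a file logger and return (logger, reopen function)."""
    handler = logging.FileHandler(path)
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)

    def reopen():
        # Close the current stream; FileHandler opens the path again
        # when the next record is emitted.
        handler.acquire()
        try:
            if handler.stream is not None:
                handler.stream.close()
                handler.stream = None
        finally:
            handler.release()

    return logger, reopen


def install_sighup_handler(reopen):
    """Call reopen() whenever the process receives SIGHUP (Unix only)."""
    signal.signal(signal.SIGHUP, lambda signum, frame: reopen())
```

A typical use is log rotation: an external tool renames the file and sends SIGHUP, after which new records go to a fresh file at the original path. Real code may prefer to only set a flag inside the signal handler and do the actual reopening later.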

Jan 31, 2011

Introduce re-openable log record handler · b6fa9a44

Michael Hanselmann authored 14 years ago


This patch adds a new log handler class based on the standard library's
BaseRotatingHandler. This new class allows the log file to be re-opened,
e.g. upon receiving a SIGHUP signal. The latter will be implemented in
forthcoming patches. The patch does not change the behaviour regarding
writing to /dev/console.

Quite a bit of code had to be changed to unittest the log handlers.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

b6fa9a44
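
A handler along the lines of b6fa9a44 can be built on the standard library's BaseRotatingHandler by treating "reopen requested" as the rollover condition. The class and method names below are hypothetical, and the /dev/console handling mentioned in the commit is left out.

```python
import logging
import logging.handlers


class ReopenableFileHandler(logging.handlers.BaseRotatingHandler):
    """File handler that reopens its file on request (sketch)."""

    def __init__(self, filename):
        super().__init__(filename, mode="a")
        self._reopen = False

    def request_reopen(self):
        # Only sets a flag, so it is cheap to call from a signal handler.
        self._reopen = True

    def shouldRollover(self, record):
        # Called by BaseRotatingHandler.emit() before writing each record.
        return self._reopen

    def doRollover(self):
        # Close the old stream and open the configured path again.
        if self.stream is not None:
            self.stream.close()
            self.stream = None
        self._reopen = False
        self.stream = self._open()
```

The reopen happens lazily, on the first record logged after `request_reopen()` was called.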

Jan 28, 2011

Add RAPI resource for instance console · b82d4c5e

Michael Hanselmann authored 14 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

b82d4c5e

Export console information as query field · 5d28cb6f

Michael Hanselmann authored 14 years ago


This makes it possible to get the console information via a LUXI query.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

5d28cb6f

Fix instance list for instances running multiple times · e431074f

René Nussbaumer authored 14 years ago


If for some reason (e.g. a failed migration) an instance is running
on multiple nodes, the output can become inconsistent. To detect that error
and make the output consistent between runs, we also make the call on the
secondary and check whether the instance is running there. If so, we report
the instance as ERROR_wrongnode.

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

e431074f

Jan 27, 2011

Fix unittest breakage on Python 2.4/2.5 · c6a65efb

Michael Hanselmann authored 14 years ago


Commit 70b0d2a2 broke unittests on Python 2.4 and 2.5. Turns out that
Python 2.6 and above allow classes to be passed as custom test runners,
whereas earlier versions don't.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

c6a65efb

Check for duplicate RAPI URIs and handlers · d50a2223

Michael Hanselmann authored 14 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

d50a2223

Ensure all resources are used by RAPI client · 70b0d2a2

Michael Hanselmann authored 14 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

70b0d2a2

RAPI client: De-/activating instance disks · b680c8be

Michael Hanselmann authored 14 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

b680c8be

RAPI client: Wrap /2/redistribute-config resource · 54d4c13b

Michael Hanselmann authored 14 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

54d4c13b

Add unittest for RAPI client's ModifyInstance · 08c11c40

Michael Hanselmann authored 14 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

08c11c40

Jan 21, 2011

Rename QRFS_* to RS_* · cfb084ae

René Nussbaumer authored 14 years ago


This patch renames QRFS_* to RS_* fields so they can be used in other
places (i.e. LUs) without confusion, as this was initially meant for
query operations.

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

cfb084ae

Jan 20, 2011

query: Add alias support in _PrepareFieldList · d63bd540

Iustin Pop authored 14 years ago


Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

d63bd540

Jan 18, 2011

query: Change internal result computation · e2d188cc

Iustin Pop authored 14 years ago


While looking at the query library, I realized that while we have five
field statuses, making this a 5-dimensional space, four of them are
shrunk to a single possible value (None). Hence it should be possible to
convert this into a single value space plus extra 4 special constants.

This patch implements this, making (IMHO) the return value of normal
functions much simpler: you simply return the desired value, instead of
(QRFS_NORMAL, value); for the special results, you simply return
_FS_UNAVAIL, instead of (QRFS_UNAVAIL, None). This I believe does
simplify the code.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

e2d188cc
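
The simplification described in e2d188cc, returning either a plain value or one of a few special constants instead of a (status, value) tuple, might be sketched like this. All names here are hypothetical stand-ins, including the example field function and the node dictionary layout.

```python
class _FieldStatus:
    """Marker type for the special, value-less results."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return "<FieldStatus %s>" % self.name


FS_UNKNOWN = _FieldStatus("unknown")
FS_NODATA = _FieldStatus("nodata")
FS_UNAVAIL = _FieldStatus("unavail")
FS_OFFLINE = _FieldStatus("offline")

STATUS_NORMAL = "normal"


def to_status_and_value(result):
    """Expand a field function's result into a (status, value) pair."""
    if isinstance(result, _FieldStatus):
        return (result.name, None)
    return (STATUS_NORMAL, result)


def free_memory(node):
    # Field functions return the value itself or a special constant.
    if not node.get("vm_capable", True):
        return FS_UNAVAIL
    return node["mfree"]
```

The tuple form is still produced, but in one place only, so individual field functions stay short.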

KVM: Perform network configuration in Ganeti · 5d9bfd87

Apollon Oikonomopoulos authored 14 years ago


This patch introduces network configuration for KVM in Ganeti.

There are three problems with having KVM perform network configuration via ifup
scripts:
  a) Ganeti never gets to know the tap interface that is associated with an
     instance's NIC
  b) Migration of routed instances will cause network problems because the
     incoming KVM side configures the network as soon as it is spawned and not
     as soon as the migration finishes. This means that all routing
     configuration will be present on both the primary and secondary nodes at
     the same time, possibly causing network disruption during the migration.
  c) We never get to know if the network configuration succeeded or not.

This patch moves network configuration from KVM to Ganeti, using KVM's ability
to receive already open tap devices as file descriptors.

_WriteNetScript is removed from hv_kvm.py, together with its unit tests.

Minor modifications are made to _ExecKVMRuntime to handle tap device
initialization. NIC <-> tap associations are stored under a new directory,
_ROOT_DIR/nic in a file-per-nic fashion.

The end-user semantics remain the same: The user can override the network
configuration by providing _KVM_NET_SCRIPT. If this is not present or not
executable, the default constants.KVM_IFUP script is run.

Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

5d9bfd87

Check consistency of the class names and OP_ID · ff0d18e6

Iustin Pop authored 14 years ago


As the class names should be now consistent with the OP_IDs, we add a
check for wrongly-defined OP_IDs.

However, the future removal of the hand-coded OP_IDs will render this
obsolete, so this check is introduced just to make sure that the
previous renaming patches did the right job, and it will then be
removed.

The consistency checks require renaming the test opcodes, which were
using arbitrary names, depending on test author. They are now all
standardized on OpTest (local scope).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

ff0d18e6
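
A consistency check like the one in ff0d18e6 could be written as below. The sketch assumes that an OP_ID is the upper-case, underscore-separated form of the class name; that mapping and the example classes are assumptions, not taken from this log.

```python
import re


def name_to_id(class_name):
    """Turn a CamelCase class name into an OP_ID-style string (assumed rule)."""
    # Insert "_" before every capital letter except the first one.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", class_name).upper()


def find_inconsistent(classes):
    """Return the names of classes whose OP_ID does not match their name."""
    return [cls.__name__ for cls in classes
            if cls.OP_ID != name_to_id(cls.__name__)]


# Hypothetical opcode classes for illustration only.
class OpTest:
    OP_ID = "OP_TEST"


class OpTestDelay:
    OP_ID = "OP_DELAY_TEST"  # deliberately inconsistent
```

Once the identifiers are derived from the class names automatically, a check of this kind has nothing left to verify, which matches the commit's note that it is temporary.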

Rename OpTestJobqueue and LUTestJobqueue · b469eb4d

Iustin Pop authored 14 years ago


Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

b469eb4d

Rename OpGetTags and LUGetTags · c6afb1ca

Iustin Pop authored 14 years ago


Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

c6afb1ca

Rename OpStartupInstance and LUStartupInstance · c873d91c

Iustin Pop authored 14 years ago


Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

c873d91c

Rename OpShutdownInstance and LUShutdownInstance · ee3e37a7

Iustin Pop authored 14 years ago


Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

ee3e37a7

Rename OpSetInstanceParams and LUSetInstanceParams · 9a3cc7ae

Iustin Pop authored 14 years ago


Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

9a3cc7ae

Rename OpRenameInstance and LURenameInstance · 5659e2e2

Iustin Pop authored 14 years ago


Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

5659e2e2

Rename OpReinstallInstance and LUReinstallInstance · 5073fd8f

Iustin Pop authored 14 years ago


Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

5073fd8f

Rename OpMigrateInstance and LUMigrateInstance · 75c866c2

Iustin Pop authored 14 years ago


Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

75c866c2

Rename OpCreateInstance and LUCreateInstance · e1530b10

Iustin Pop authored 14 years ago


Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

e1530b10

Rename OpRenameGroup and LURenameGroup · a8173e82

Iustin Pop authored 14 years ago


Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

a8173e82

Rename OpAssignGroupNodes and LUAssignGroupNodes · 934704ae

Iustin Pop authored 14 years ago


Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

934704ae

Rename OpVerifyCluster and LUVerifyCluster · a3d32770

Iustin Pop authored 14 years ago


Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

a3d32770

Rename OpExportInstance and LUExportInstance · 4ff922a2

Iustin Pop authored 14 years ago


Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

4ff922a2

Jan 14, 2011

Bump version for Ganeti 2.4.0~beta1 · a91f69c4

Michael Hanselmann authored 14 years ago


Update the version in all necessary places. Update NEWS with release
date.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

a91f69c4

Jan 13, 2011

List node parameters (if any) in gnt-node info · 8572f1fe

René Nussbaumer authored 14 years ago


Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

8572f1fe

Jan 12, 2011

Fix typos in RAPI docstrings, add unittest · b58a4d16

Michael Hanselmann authored 14 years ago


This patch fixes a number of typos and standardizes RAPI resource
docstrings. A unittest is added.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

b58a4d16

Jan 11, 2011

Add ability to retain specified fds open in RunCmd · 7b0bf9cd

Apollon Oikonomopoulos authored 14 years ago


Passing tap devices to KVM as file descriptors requires that the respective
file descriptors remain open during utils.RunCmd execution. To this end,
we add a “noclose_fds” keyword argument to utils.RunCmd, accepting a list of
file descriptors to keep open. The actual fd handling is implemented in
_RunCmdPipe and _RunCmdFile using subprocess.Popen's “preexec_fn”[1],
since subprocess.Popen provides no other way to selectively handle fds.

A small modification is also made to test/ganeti.utils_unittest.py to comply
with _RunCmdPipe's new API and a new unit test is added to test the selective
fd retention functionality.

[1] “If preexec_fn is set to a callable object, this object will be called in
     the child process just before the child is executed. (Unix only)”
    Subprocess documentation

Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

7b0bf9cd
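
For readers on current Python, the effect of “noclose_fds” in 7b0bf9cd can be demonstrated without “preexec_fn”: Python 3's subprocess.Popen accepts a `pass_fds` argument that keeps the listed descriptors open in the child. The sketch below uses that swapped-in mechanism, so it is not how the commit implements it, and the function name is hypothetical.

```python
import subprocess


def run_keeping_fds(cmd, noclose_fds):
    """Start cmd while keeping the given file descriptors open in the child.

    Uses Popen's pass_fds (Python 3.2+, POSIX only) instead of the
    preexec_fn approach described in the commit message.
    """
    return subprocess.Popen(cmd, pass_fds=tuple(noclose_fds))
```

A quick way to try it is to create a pipe with `os.pipe()`, pass the write end's number to a child on its command line, and read what the child wrote from the read end in the parent.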

Add tests for objects.Instance · 6a050007

Michael Hanselmann authored 14 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

6a050007