Commits · 9f604ab8127becefe3161887286c72b984278f56 · itminedu / snf-ganeti

Aug 05, 2011

Michael Hanselmann authored 13 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

9f604ab8

watcher: Write per-group instance status, merge into global one · 9bb69bb5

Michael Hanselmann authored 13 years ago

Each per-group watcher process writes its own instance status file. Once
that's done it tries to acquire an exclusive lock on the global file and
will proceed to read all status files, merging them based on each file's
mtime. If an instance is moved to another group, the newer status will
supersede that of an older file which hasn't yet been updated.
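
The merge rule described above can be sketched as follows. This is a minimal illustration, not the watcher's actual code: the function name `merge_instance_status` and the in-memory input format (a list of `(mtime, statuses)` pairs) are assumptions made for the example.

```python
def merge_instance_status(files):
    """Merge per-group status data; entries from newer files win.

    `files` is a list of (mtime, statuses) pairs, where `statuses` maps an
    instance name to its status string. Both are hypothetical stand-ins for
    the real per-group status files.
    """
    merged = {}
    # Process the oldest file first so that newer files overwrite its entries.
    for _mtime, statuses in sorted(files, key=lambda item: item[0]):
        merged.update(statuses)
    return merged
```

With this ordering, an instance that moved to another group takes its status from the most recently written file, even if the old group's file still lists it.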

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

9bb69bb5

cleaner: Remove watcher's instance status file after 21 days · 6890cf2e

Michael Hanselmann authored 13 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

6890cf2e

utils.ReadFile: Add pre-read callback · 0e5084ee

Michael Hanselmann authored 13 years ago


This will be used by the watcher to store the file's fstat(2). It must
be done from the filehandle.
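
A minimal sketch of the idea, assuming a hypothetical `read_file` helper rather than the real `utils.ReadFile` signature: the callback receives the open file object before any data is read, so `os.fstat` describes exactly the file being read, even if the path is replaced in the meantime.

```python
import os


def read_file(path, preread=None):
    """Read a file, calling `preread(fh)` on the open handle first.

    Hypothetical helper for illustration; the name and signature are
    assumptions, not the actual Ganeti API.
    """
    with open(path, "r") as fh:
        if preread is not None:
            preread(fh)
        return fh.read()


def read_with_stat(path):
    """Return (contents, stat result) taken from the same file handle."""
    result = {}

    def _store_stat(fh):
        # fstat(2) works on the descriptor, not on the path.
        result["st"] = os.fstat(fh.fileno())

    data = read_file(path, preread=_store_stat)
    return data, result["st"]
```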

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

0e5084ee

Merge branch 'stable-2.4' · 2f1fe558

René Nussbaumer authored 13 years ago


Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

2f1fe558

Bumping version to 2.4.3 · 2f994ece

René Nussbaumer authored 13 years ago


Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

2f994ece

Aug 04, 2011

Fixed a typo in utils/process.py · eee68d57

Agata Murawska authored 13 years ago


Signed-off-by: Agata Murawska <agatamurawska@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

eee68d57

Fix unittest failure after list_owned changes · e8906f7d

Iustin Pop authored 13 years ago


We just need an object that has a list_owned method.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

e8906f7d

Remove 15-second sleep from LUInstanceCreate · 12126847

Apollon Oikonomopoulos authored 13 years ago


Remove 15 second sleep when wait_for_sync is not set. LUInstanceCreate already
calls _WaitForSync with oneshot=True, which already performs an internal
wait-loop for disks to start syncing.

Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

12126847

Add a readability alias · af993a2c

Iustin Pop authored 13 years ago


lu.glm.list_owned becomes lu.owned_locks, which is clearer for the
reader.

Also rename three variables (which were before named owned_locks) to
make clearer what they track.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

af993a2c

Fix broken object references in docstrings · ce523de1

Michael Hanselmann authored 13 years ago


The module is called “objects”, not “object”.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

ce523de1

Add “gnt-instance change-group” command · bd2a5569

Michael Hanselmann authored 13 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

bd2a5569

Add opcode to change instance's group · 1aef3df8

Michael Hanselmann authored 13 years ago


This is quite similar to evacuating a group, but the locking
is different.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

1aef3df8

Factorize checking instance's node groups · eafa26af

Michael Hanselmann authored 13 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

eafa26af

Update the NEWS file for 2.4.3 · e20832af

René Nussbaumer authored 13 years ago


Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

e20832af

ganeti-cleaner: Remove old watcher state files · 7b642c49

Michael Hanselmann authored 13 years ago


Watcher state files can stay around if node groups are removed. With
this patch they're removed after 21 days.
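
The age-based cleanup can be sketched like this. It is an illustration only: the helper name, the use of the file's mtime as the age, and the explicit list of paths are assumptions, and the actual cleaner may implement the rule differently.

```python
import os
import time

MAX_AGE_DAYS = 21  # retention period named in the commit message


def remove_old_files(paths, now=None, max_age_days=MAX_AGE_DAYS):
    """Delete files whose mtime is older than `max_age_days`.

    Returns the list of removed paths. Hypothetical helper, not Ganeti code.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 24 * 60 * 60
    removed = []
    for path in paths:
        if os.path.getmtime(path) < cutoff:
            os.remove(path)
            removed.append(path)
    return removed
```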

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

7b642c49

Remove WATCHER_STATEFILE constant · 173dbf05

Michael Hanselmann authored 13 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

173dbf05

cfgupgrade: Remove old watcher state file · a292020f

Michael Hanselmann authored 13 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

a292020f

ganeti-watcher: Split for node groups · 16e0b9c9

Michael Hanselmann authored 13 years ago


This patch brings a huge change to ganeti-watcher to make it aware of
node groups. Each node group is processed in its own subprocess,
reducing the impact of long-running operations.

The global watcher state file, $datadir/ganeti/watcher.data, is replaced
with a state file per node group ($datadir/ganeti/watcher.${uuid}.data).

Previously a lock on the state file was used to ensure only one instance
of watcher was running at the same time. Some operations, e.g.
“gnt-cluster renew-crypto”, blocked the watcher by acquiring an
exclusive lock on the state file. Since the watcher processes now use
different files, this method is no longer usable. Locking multiple files
isn't atomic. Instead a dedicated lock file is used and every watcher
process acquires a shared lock on it. If a Ganeti command wants to block
the watcher it acquires the lock in exclusive mode.

Each per-nodegroup watcher process also acquires an exclusive lock on
its state file. This prevents multiple watchers from running for the
same nodegroup.
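
The shared/exclusive scheme on the dedicated lock file can be sketched with `fcntl.flock`. This is a simplified illustration under stated assumptions: the helper names are hypothetical, and the sketch does not claim to match the locking primitive or wrapper the watcher itself uses.

```python
import fcntl


def lock_shared(fh):
    """Take a shared lock, as each watcher process would on the lock file."""
    fcntl.flock(fh, fcntl.LOCK_SH)


def try_lock_exclusive(fh):
    """Try to take an exclusive lock without blocking.

    Returns True on success and False if another open file currently holds
    a shared or exclusive lock.
    """
    try:
        fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        return False
    return True
```

Any number of watcher processes can hold the shared lock at the same time, while a command that wants to block the watcher succeeds in taking the exclusive lock only once all of them have released it.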

The code is reorganized heavily to clear up dependencies between
functions and to get rid of the global “client” variable. The utility
class “Watcher” is removed in favour of stand-alone utility functions.

Since the parent watcher process won't wait for its children by
default, a new option (--wait-children) was added. It is used, for
example, by QA.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

16e0b9c9

Lock potential target nodes for group evacuation · de9c12f7

Michael Hanselmann authored 13 years ago


All potential target nodes should be locked while calculating
a group evacuation.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

de9c12f7

Small changes in group evacuation · 6e80da8b

Michael Hanselmann authored 13 years ago


- Use OpPrereqError in CheckPrereq
- Clarify command synopsis

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

6e80da8b

cmdlib: Factorize getting iallocator · a14065ac

Michael Hanselmann authored 13 years ago


The same logic will be used for changing an instance's group.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

a14065ac

Add design document for Ganeti 2.5 · d774ce92

Michael Hanselmann authored 13 years ago


Including the designs which were actually implemented.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

d774ce92

Aug 03, 2011

Pause DRBD sync for OS install if not wait_for_sync · 41e1e79e

Apollon Oikonomopoulos authored 13 years ago


When wait_for_sync is set to False in LUInstanceCreate, Ganeti lets DRBD sync
in the background while performing the rest of the installation steps,
including OS installation.

However, OS installation is a very disk-intensive task that interferes badly
with the background I/O caused by DRBD's initial sync. To this end, we pause
the background sync before OS installation and unpause it afterwards, which
yields a significant speed boost for OS installation. The following should be
noted:

a) The user has requested not to wait for sync, i.e. the instance will be
   non-redundant for an unspecified interval anyway and delaying this by a
   couple of minutes is not a big compromise.

b) This approach is also followed during disk wiping.
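
The pause/unpause pattern can be sketched as a context manager. The `pause` and `resume` callables are hypothetical placeholders for whatever actually controls the DRBD sync, and resuming in a `finally` clause is an assumption of this sketch, not something the commit message states.

```python
from contextlib import contextmanager


@contextmanager
def paused_sync(pause, resume):
    """Pause the background sync for the duration of the `with` block.

    `pause` and `resume` are caller-supplied callables (hypothetical here).
    """
    pause()
    try:
        yield
    finally:
        # Assumption: always resume, even if the wrapped step fails.
        resume()
```

A caller would wrap only the disk-intensive step, for example the OS installation, in `with paused_sync(...)`.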

Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
[iustin@google.com: simplify an if check]
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

41e1e79e

Fix documentation of gnt-instance failover · a6a3efe4

Iustin Pop authored 13 years ago


Explain that we only start the instance on the new node if it was
originally running.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

a6a3efe4

Small doc patch for gnt-node evacuate · 78623223

Iustin Pop authored 13 years ago


Briefly explain the relation between node evacuate and instance
commands.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

78623223

Fix small typo in docstring · d5fca545

Iustin Pop authored 13 years ago


Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

d5fca545

Fix typo in NEWS · b5ea70bf

Michael Hanselmann authored 13 years ago


“--dry-run” starts with two dashes.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

b5ea70bf

Change the backend.InstanceLogName signature · 6aa7a354

Iustin Pop authored 13 years ago


This now uses the component for the transfer (if available), otherwise
(e.g. in installs/renames) nothing.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

6aa7a354

Instance transfer: export component name to backend · 6613661a

Iustin Pop authored 13 years ago


This modifies the RPC layer to export the component name too to the
backend, so that it can be used in log files and messages.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

6613661a

Instance transfer: add argument for the 'component' · 5e26c4d9

Iustin Pop authored 13 years ago


Currently, data transfers are identified mainly by just the instance name,
but when we have instances with multiple disks this is not enough to
distinguish between the different transfers being done for the
instance.

Some parts of the code do have knowledge of the part being transferred
(i.e. DiskTransfer.name), but if I understood correctly not all, so I
decided to add a new argument to the respective disk import/disk
export classes.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

5e26c4d9

Optimise use of repeated/looping GetInstanceInfo · 71333cb9

Iustin Pop authored 13 years ago


Similar to the previous patch, this adds a helper function to
eliminate repeated calls into ConfigWriter.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

71333cb9

Optimise use of repeated/looping GetNodeInfo · f5eaa3c1

Iustin Pop authored 13 years ago


This adds a new ConfigWriter.GetMultiNodeInfo function and replaces
multiple/looping calls to GetNodeInfo with it.
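
The change in call pattern can be illustrated with a toy configuration class. Everything here is an assumption made for the example: `ToyConfig`, its method names, and the `(name, info)` return format are hypothetical and do not reproduce the real `ConfigWriter` API. The `lookups` counter stands in for whatever per-call overhead a lookup carries.

```python
class ToyConfig:
    """Toy stand-in for a configuration store (not the real ConfigWriter)."""

    def __init__(self, nodes):
        self._nodes = dict(nodes)
        self.lookups = 0  # number of calls into the config

    def get_node_info(self, name):
        """Single lookup: one call per node."""
        self.lookups += 1
        return self._nodes.get(name)

    def get_multi_node_info(self, names):
        """Batched lookup: one call for any number of nodes."""
        self.lookups += 1
        return [(name, self._nodes.get(name)) for name in names]
```

Looping over `get_node_info` for N nodes makes N calls into the config, while `get_multi_node_info` makes one.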

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

f5eaa3c1

Fix lint errors · a4338da2

Iustin Pop authored 13 years ago


It turns out that the only use of the operator module was for
itemgetter, so patch eb62069e should have removed that import too.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

a4338da2

gnt-node.rst: Fix a typo · 71ed8d22

René Nussbaumer authored 13 years ago


Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

71ed8d22

Add two more compat functions · eb62069e

Iustin Pop authored 13 years ago


operator.itemgetter(0) → fst
operator.itemgetter(1) → snd

snd is not used yet, but it makes sense to add both.
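
One way to define such helpers, shown only as an illustration (the commit message does not say how the compat module implements them, and the sample data is made up):

```python
import operator

# fst/snd return the first and second element of a pair.
fst = operator.itemgetter(0)
snd = operator.itemgetter(1)

pairs = [("node1", 4), ("node2", 8)]
names = [fst(p) for p in pairs]
sizes = [snd(p) for p in pairs]
```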

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

eb62069e

Add a flag to burnin to allow specifying VCPU count. · d0ffa390

Pedro Macedo authored 13 years ago


Signed-off-by: Pedro Macedo <pmacedo@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

d0ffa390

Aug 02, 2011

Fix types passed to IAllocator · 7c070961

Iustin Pop authored 13 years ago


In IAllocator mode reloc, the reloc_from parameter takes a list. Half of
the code already forced this parameter to a list; this patch adds the
other two cases where it is needed.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

7c070961

htools: change absolute to relative symlinks · 7fa52acd

Iustin Pop authored 13 years ago


Currently we use absolute symlinks, but this doesn't work when we
install remotely (because we first install to a local temp dir, then
rsync to remote machines). To fix this, we change to manually computed
relative paths, which is not ideal, but it works.
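
For illustration, this is how a relative link target can be derived from two absolute paths. The patch itself computes the paths manually in the build system; the Python helper and the example paths below are assumptions used only to show the idea.

```python
import os.path


def relative_link_target(target, link_path):
    """Return `target` expressed relative to the directory holding the link."""
    return os.path.relpath(target, start=os.path.dirname(link_path))
```

A link created with such a target stays valid when the whole tree is copied under a different prefix, because it never mentions the prefix.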

One possible alternative would be to use hard-links…

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

7fa52acd

jqueue: Add short delay before detecting job changes · dfc8824a

Michael Hanselmann authored 13 years ago


By sleeping for 100ms after receiving a notification for a changed job
file the job is given some additional time to change again. This
significantly reduces the number of LUXI calls for WaitForJobChanges
(depending on the job, in my tests with “gnt-cluster verify
--debug-simulate-errors” by about 80%), and improves performance (the
same job went from around 7 seconds to around 3.5 seconds).

This method is not perfect. The algorithm could be made more complex,
e.g. by increasing the delay on each change, etc., but for now this
simple change provides a good improvement.
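
The effect of the delay can be modelled with a small simulation. This is a sketch under stated assumptions, not the job queue's code: `count_checks` is a hypothetical function that counts how many times the job would be re-examined for a given series of change notifications, assuming every notification that arrives during the delay window is absorbed into the pending check.

```python
def count_checks(notification_times, delay=0.1):
    """Count job checks when each notification triggers a `delay` wait.

    `notification_times` are timestamps in seconds. Notifications arriving
    while a wait is already in progress do not trigger another check.
    """
    checks = 0
    busy_until = None
    for t in sorted(notification_times):
        if busy_until is not None and t <= busy_until:
            continue  # absorbed by the check that is already pending
        checks += 1
        busy_until = t + delay
    return checks
```

With `delay=0` every notification leads to its own check; with a 100 ms delay, bursts of changes collapse into a single check.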

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

dfc8824a