Commits · 58ea8d17b5b6deb4b93ce0f419c449c35a35a0d5 · itminedu / snf-ganeti

Jan 19, 2012

Fix wrong option names in QA and cluster-merge · 58ea8d17

Michael Hanselmann authored 13 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

58ea8d17

Bump version to 2.5.0~rc5, update NEWS · 7cbdc2a2

Michael Hanselmann authored 13 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

7cbdc2a2

Add UnescapeAndSplit unittest for multi-escapes · e7f7212b

Iustin Pop authored 13 years ago


This would have caught the bug in the first place. Argh,
hand-generated test cases!

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

e7f7212b

Fix a bug in command line option parsing code · ecabe27e

Nikos Skalkotos authored 13 years ago


Fix bug affecting command line options of "keyval" type. Although
escaping commands with \ is supported, it is is not applied to the
input recursively.

Signed-off-by: Nikos Skalkotos <skalkoto@grnet.gr>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

ecabe27e

Jan 18, 2012

cli: Disable abbreviation matching for options · 232aab3f

Michael Hanselmann authored 13 years ago


Python's “optparse” module does option name prefix matching by default.
Since this can lead to confusing behaviour, e.g. by specifying “--force”
for a command which only has a “--force-multi” option, this patch
disables the prefix matching.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

232aab3f

Jan 06, 2012

Merge branch 'stable-2.5' into devel-2.5 · c52d91e6

Guido Trotter authored 13 years ago


* stable-2.5:
  KVM: support version reported by 1.0

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Andrea Spadaccini <spadaccio@google.com>

c52d91e6

KVM: support version reported by 1.0 · 585c8187

Guido Trotter authored 13 years ago


This of course was working for all the rcs, but broke with 1.0 itself.

In addition:
  - split between running kvm --version and parsing its output
  - unittest parsing for various known --help outputs
  - updated NEWS file
  - happy 2012 wishes
  - the hope to finish this patch before it's time to say happy easter
    :)

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

585c8187

Dec 22, 2011

doc/admin: Clarify archived jobs · 89907375

Michael Hanselmann authored 13 years ago


Also mention that archived jobs can be viewed using “gnt-job info”.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

89907375

Merge branch 'stable-2.5' into devel-2.5 · 4f44e311

Michael Hanselmann authored 13 years ago


* stable-2.5:
  jqueue: Fix epylint errors introduced in 37d76f1e
  jqueue: Fix deadlock between job queue and dependency manager

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

4f44e311

jqueue: Factorize checking job processor's result · df5a5730

Michael Hanselmann authored 13 years ago


This allows for more unittesting.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

df5a5730

jqueue unittest: Rename simple fake-job class · 1b5150ab

Michael Hanselmann authored 13 years ago


Also add a parameter for priority, to be used in an upcoming
patch.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

1b5150ab

Dec 21, 2011

jqueue: Fix epylint errors introduced in 37d76f1e · 1316ebc2
Michael Hanselmann authored 13 years ago
```
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
```
1316ebc2

jqueue: Fix deadlock between job queue and dependency manager · 37d76f1e

Michael Hanselmann authored 13 years ago


When an opcode is about to be processed its dependencies are
evaluated using “_JobDependencyManager.CheckAndRegister”. Due
to its nature that function requires a lock on the manager's
internal structures. All of this happens while the job queue
lock is held in shared mode (required for the job processor).

When a job has been processed any pending dependencies are re-added
to the job workerpool. Before this patch that would require
the manager's lock and then, for adding the jobs, the job queue
lock. Since this is in reverse order it will lead to deadlocks.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

37d76f1e

Dec 19, 2011

locking: Add “__repr__” to SharedLock and PipeCondition · 20699809

Michael Hanselmann authored 13 years ago


These help when debugging.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Andrea Spadaccini <spadaccio@google.com>

20699809

Dec 08, 2011

daemon.GenericMain: Don't generate backtrace on conflicting daemons · 4958b41e

Michael Hanselmann authored 13 years ago


Instead, print a nicer error message. This should fix issue 200.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

4958b41e

utils.io.WritePidFile: Improve error reporting · b6522276

Michael Hanselmann authored 13 years ago


If the PID file is already locked by another process, try to read
the content and report it as part of the error message.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

b6522276

utils.ListVisibleFiles: Hide “lost+found” directories · 2dbc6857

Michael Hanselmann authored 13 years ago


If a “lost+found” directory is found at a filesystem's root path it is
ignored. In all other cases directory entries named “lost+found” are
treated normally. Unittests are included. Fixes issue 153.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

2dbc6857

Nov 30, 2011

Fix race condition in test for *FileID functions · fbd55434

Michael Hanselmann authored 13 years ago


In this test the “file ID” of a temporary file is compared against the
file ID gathered via an open file descriptor to the same file. For
reasons unknown to me utime(2) is called in-between to update the
inode's a- and mtime. Depending on the file system's timestamp
resolution this can lead to a different file ID.

Found by chance during QA and reproduced by adding a delay before the
call to utime(2).

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

fbd55434

Nov 24, 2011

Merge branch 'devel-2.4' into devel-2.5 · 53f75d02

Michael Hanselmann authored 13 years ago


* devel-2.4:
  ConfigWriter: Fix epydoc error

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

53f75d02

Merge branch 'stable-2.5' into devel-2.5 · 1df43aa3

Michael Hanselmann authored 13 years ago


* stable-2.5:
  ConfigWriter: Fix epydoc error

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Andrea Spadaccini <spadaccio@google.com>

1df43aa3

ConfigWriter: Fix epydoc error · 1d4930b9

Michael Hanselmann authored 13 years ago


The parameter is called “mods”, not “modes”.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Andrea Spadaccini <spadaccio@google.com>
(cherry picked from commit 1730d4a1)

1d4930b9

ConfigWriter: Fix epydoc error · 1730d4a1

Michael Hanselmann authored 13 years ago


The parameter is called “mods”, not “modes”.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Andrea Spadaccini <spadaccio@google.com>

1730d4a1

Merge branch 'devel-2.4' into devel-2.5 · a0d39c8f

Michael Hanselmann authored 13 years ago


* devel-2.4:
  LUGroupAssignNodes: Fix node membership corruption

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

a0d39c8f

Merge branch 'stable-2.5' into devel-2.5 · 2b9245ff

Michael Hanselmann authored 13 years ago


* stable-2.5:
  LUGroupAssignNodes: Fix node membership corruption
  Fix pylint warning on unreachable code
  LUNodeEvacuate: Disallow migrating all instances at once
  LUNodeEvacuate: Locking fixes
  Fix error when removing node

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

2b9245ff

LUGroupAssignNodes: Fix node membership corruption · 54c31fd3

Michael Hanselmann authored 13 years ago


Note: This bug only manifests itself in Ganeti 2.5, but since the
problematic code also exists in 2.4, I decided to fix it there.

If a node was assigned to a new group using “gnt-group assign-nodes” the
node object's group would be changed, but not the duplicate member list
in the group object. The latter is an optimization to require fewer
locks for other operations. The per-group member list is only kept in
memory and not written to disk.

Ganeti 2.5 starts to make use of the data kept in the per-group member
list and consequently fails when it is out of date. The following
commands can be used to reproduce the issue in 2.5 (in 2.4 the issue was
confirmed using additional logging):

  $ gnt-group add foo
  $ gnt-group assign-nodes foo $(gnt-node list --no-header -o name)
  $ gnt-cluster verify  # Fails with KeyError

This patch moves the code modifying node and group objects into
“config.ConfigWriter” to do the complete operation under the config
lock, and also to avoid making use of side-effects of modifying objects
without calling “ConfigWriter.Update”. A unittest is included.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
(cherry picked from commit 218f4c3d)

54c31fd3

LUGroupAssignNodes: Fix node membership corruption · 218f4c3d

Michael Hanselmann authored 13 years ago


Note: This bug only manifests itself in Ganeti 2.5, but since the
problematic code also exists in 2.4, I decided to fix it there.

If a node was assigned to a new group using “gnt-group assign-nodes” the
node object's group would be changed, but not the duplicate member list
in the group object. The latter is an optimization to require fewer
locks for other operations. The per-group member list is only kept in
memory and not written to disk.

Ganeti 2.5 starts to make use of the data kept in the per-group member
list and consequently fails when it is out of date. The following
commands can be used to reproduce the issue in 2.5 (in 2.4 the issue was
confirmed using additional logging):

  $ gnt-group add foo
  $ gnt-group assign-nodes foo $(gnt-node list --no-header -o name)
  $ gnt-cluster verify  # Fails with KeyError

This patch moves the code modifying node and group objects into
“config.ConfigWriter” to do the complete operation under the config
lock, and also to avoid making use of side-effects of modifying objects
without calling “ConfigWriter.Update”. A unittest is included.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

218f4c3d

Fix pylint warning on unreachable code · 9c4f4dd6

Michael Hanselmann authored 13 years ago


Commit c50452c3 added an exception when all instances should be
evacuated off a node, but did so in a way which made pylint complain
about unreachable code.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

9c4f4dd6

Nov 23, 2011

LUNodeEvacuate: Disallow migrating all instances at once · c50452c3

Michael Hanselmann authored 13 years ago


There is a design issue in the iallocator interface which prevents us
from doing this.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

c50452c3

Separate OpNodeEvacuate.mode from iallocator · cb92e7a1

Michael Hanselmann authored 13 years ago


Until now the iallocator constants for node evacuation
(IALLOCATOR_NEVAC_*) were also used for the opcode. However, it turned
out this was due to a misunderstanding and is incorrect. This patch adds
new constants (with the same values) and changes the affected places.
Fortunately the RAPI client already used good names, so no changes are
necessary.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Signed-off-by: Michael Hanselmann <hansmi@google.com>

cb92e7a1

LUNodeEvacuate: Locking fixes · 50722bfd

Michael Hanselmann authored 13 years ago


When evacuating a node, only an assertion without informative text was
used to check if the necessary node locks had been acquired. This was on
top of evaluating the list of nodes without having a node group lock, so
this was changed as well.

Also update some exception messages to include “retry the operation”.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

50722bfd

Fix error when removing node · d05326fc

Michael Hanselmann authored 13 years ago


ConfigWriter.GetAllInstancesInfo returns a dictionary, not a list.
Removing a node would fail with “too many values to unpack”.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

d05326fc

Nov 18, 2011

Merge branch 'stable-2.5' into devel-2.5 · 0c1441eb

Michael Hanselmann authored 13 years ago


* stable-2.5:
  htools: rework message display construction
  hbal: handle empty node groups
  Document OpNodeMigrate's result for RAPI
  Fail if node/group evacuation can't evacuate instances
  LUInstanceRename: Compare name with name
  LUClusterRepairDiskSizes: Acquire instance locks in exclusive mode
  Update NEWS for 2.5.0~rc4
  Bump version to 2.5.0~rc4
  jqueue: Allow zero jobs to be submitted at once
  hail: don't select the primary as new secondary
  hail: add an extra safety check in relocate
  Bump version to 2.5.0~rc3

Conflicts:
	configure.ac: Trivial

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

0c1441eb

Merge branch 'devel-2.4' into devel-2.5 · 05c2e624

Michael Hanselmann authored 13 years ago


* devel-2.4:
  Ensure unused ports return to the free port pool
  Re-wrap a paragraph to eliminate a sphinx warning

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

05c2e624

Nov 17, 2011

LUInstanceCreate: Release unused node locks · ac2c8bc0

Michael Hanselmann authored 13 years ago


After iallocator ran we can release any unused node locks. Since they
must be in exclusive mode this should improve parallelization during
instance creation.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

ac2c8bc0

Nov 16, 2011

htools: rework message display construction · bdd8c739

Iustin Pop authored 13 years ago


While diagnosing some (unrelated) memory usage in htools, I've
stumbled upon some very bad behaviour in checkData: mapAccum is
non-strict, and the tuple we use also, so that results in the list of
list of messages being very bad space-wise (hundreds of MB of memory
for a simulated cluster with thousands of nodes, all with errors).

The new, explicit reuse of the old message list has a linear memory
behaviour. The only downside is that messages are listed in the
reverse order (which I'll fix on master).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

bdd8c739

hbal: handle empty node groups · 2072221f

Iustin Pop authored 13 years ago


This patch changes an internal assert (which can only be triggered
when a node group is empty) into properly handling this case (and
returning empty node/instance lists).

While we could handle this in the backend (Cluster.splitNodeGroup)
this would actually mean than we change the behaviour for a cluster
with just two node groups, once of which is empty (where today we
don't require a node group argument).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

2072221f

Nov 15, 2011

Document OpNodeMigrate's result for RAPI · 65c9591c

Michael Hanselmann authored 13 years ago


- Commit b7a1c816 changed the LU to generate jobs
- Mention documented results in NEWS

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

65c9591c

Nov 14, 2011

Ensure unused ports return to the free port pool · f396ad8c

Vangelis Koukis authored 13 years ago


Ensure ports previously allocated by calling ConfigWriter's AllocatePort() are
returned to the pool of free ports when no longer needed:

 * Return the network_port of an instance when it is removed
 * Return the port used by a DRBD-based disk when it is removed

Signed-off-by: Vangelis Koukis <vkoukis@grnet.gr>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

f396ad8c

Re-wrap a paragraph to eliminate a sphinx warning · ca8f5622

Iustin Pop authored 13 years ago


This just makes sure that the paragraph doesn't contains lines that
start with :, which make Sphinx (1.0.7) complain.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

ca8f5622

Nov 10, 2011

Fix newer pylint's E0611 error in compat.py · 1dd64393

Guido Trotter authored 13 years ago


These are triggered by our "stay-compatible" approach.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

1dd64393