Commits · c22341e6b5edee95c12ff0d30ea35e34878a73cf · itminedu / snf-ganeti

Jan 26, 2012

Makefile.am: fix permissions for Python scripts on install · c22341e6


Some Python scripts in /usr/lib/ganeti/ were getting the wrong permissions
(their 'x' bit was cleared).  This patch fixes that behavior.

This patch renames the variable 'dist_tools_PYTHON' to 'python_scripts'.
Some Python scripts were listed in the 'dist_tools_PYTHON' variable, but as
said scripts have no .py extension in their names, Automake treated the scripts
as data files, and hence they got no 'x' bit.  Now the Python scripts are processed
by the rules created for the 'dist_tools_SCRIPTS' variable, and such rules
don't depend on file name extensions.

Signed-off-by: Bernardo Dal Seno <bdalseno@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
(cherry picked from commit cc120286)

c22341e6

devel/upload: Fix permissions for installed directories · 40476293

Bernardo Dal Seno authored 13 years ago


Permissions for the directories created during install depended on the
umask of the user running the script.  Now umask is reset inside the script
to remove such dependency.

Signed-off-by: Bernardo Dal Seno <bdalseno@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
(cherry picked from commit 0f796800)

40476293

Jan 25, 2012

Fix cluster verification issues on multi-group clusters · 2c2f257d

Michael Hanselmann authored 13 years ago


This patch attempts to fix a number of issues with “gnt-cluster verify”
in the presence of multiple node groups and DRBD8 instances split over nodes
in more than one group.

- Look up instances in a group only by their primary node (otherwise
  split instances would be considered when verifying any of their node's
  groups)
- When gathering additional nodes for LV checks, just compare instance's
  node's groups with the currently verified group instead of comparing
  against the primary node's group
- Exclude nodes in other groups when calculating N+1 errors and checking
  logical volumes

Not directly related, but a small error text is also clarified.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

2c2f257d
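
The group-membership rules listed in 2c2f257d can be sketched roughly as follows. This is not Ganeti's code: the data layout (instance name mapped to a primary node and a list of secondaries) and both function names are assumptions chosen to keep the example small.

```python
def instances_in_group(instances, node_to_group, group):
    """Instances belong to a group only through their primary node."""
    return sorted(
        name
        for name, (primary, _secondaries) in instances.items()
        if node_to_group[primary] == group
    )


def extra_lv_nodes(instances, node_to_group, group):
    """Nodes outside `group` that hold disks of the group's instances.

    Each node's own group is compared with the group being verified.
    """
    extra = set()
    for name in instances_in_group(instances, node_to_group, group):
        primary, secondaries = instances[name]
        for node in [primary] + list(secondaries):
            if node_to_group[node] != group:
                extra.add(node)
    return sorted(extra)
```

An instance whose primary is in one group and whose secondary is in another is then verified only with the primary's group, while the secondary is still picked up as an additional node for the logical volume checks.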

Jan 20, 2012

Migrate: don't check for free memory on cleanup · 6b826dfa

Guido Trotter authored 13 years ago

Cleanup just updates the config with the correct location of the
instance, or informs of its down status, but never starts it. As such
there's no point in checking for enough free memory. Actually this check
could prevent a perfectly safe cleanup operation if a node is busy.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

6b826dfa

Jan 09, 2012

Bump version to 2.5.0~rc5, update NEWS · 9f18e2cc

Michael Hanselmann authored 13 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

9f18e2cc

Merge branch 'devel-2.4' into stable-2.5 · a41fd46e

Michael Hanselmann authored 13 years ago


* devel-2.4:
  Add UnescapeAndSplit unittest for multi-escapes
  Fix a bug in command line option parsing code
  ConfigWriter: Fix epydoc error
  LUGroupAssignNodes: Fix node membership corruption
  Ensure unused ports return to the free port pool
  Re-wrap a paragraph to eliminate a sphinx warning

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

a41fd46e

Jan 06, 2012

KVM: support version reported by 1.0 · 585c8187

Guido Trotter authored 13 years ago


This of course was working for all the rcs, but broke with 1.0 itself.

In addition:
  - split between running kvm --version and parsing its output
  - unittest parsing for various known --help outputs
  - updated NEWS file
  - happy 2012 wishes
  - the hope to finish this patch before it's time to say happy easter
    :)

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

585c8187
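
Separating "run the binary" from "parse its output", as 585c8187 describes, makes the parser testable against canned strings. The sketch below covers only the parsing half. The regular expression, the function name and the sample strings are assumptions; the commit message does not say why 1.0 broke, so the two-component case is handled here only as a plausible illustration.

```python
import re

# Accept "X.Y" as well as "X.Y.Z"; the third component is optional.
_VERSION_RE = re.compile(r"version (\d+)\.(\d+)(?:\.(\d+))?")


def parse_kvm_version(text):
    """Extract a (major, minor, patch) tuple from version output text."""
    match = _VERSION_RE.search(text)
    if match is None:
        raise ValueError("cannot find a version number in %r" % text)
    major, minor, patch = match.groups()
    # A missing patch level is treated as 0.
    return (int(major), int(minor), int(patch or 0))
```

Because the function takes plain text, a unit test can feed it any number of known outputs without needing the hypervisor installed.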

Dec 21, 2011

jqueue: Fix epylint errors introduced in 37d76f1e · 1316ebc2

Michael Hanselmann authored 13 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

1316ebc2

jqueue: Fix deadlock between job queue and dependency manager · 37d76f1e

Michael Hanselmann authored 13 years ago


When an opcode is about to be processed its dependencies are
evaluated using “_JobDependencyManager.CheckAndRegister”. Due
to its nature that function requires a lock on the manager's
internal structures. All of this happens while the job queue
lock is held in shared mode (required for the job processor).

When a job has been processed any pending dependencies are re-added
to the job workerpool. Before this patch that would require
the manager's lock and then, for adding the jobs, the job queue
lock. Since this is in reverse order it will lead to deadlocks.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

37d76f1e
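
The lock-order inversion described in 37d76f1e can be avoided in several ways. One common approach, sketched below, is to collect the pending jobs under the manager's lock, release it, and only then take the queue lock. Whether the actual patch does exactly this is not stated in the commit message. All names are hypothetical, and a plain `threading.Lock` stands in for the shared-mode queue lock.

```python
import threading

queue_lock = threading.Lock()    # stand-in for the job queue lock
manager_lock = threading.Lock()  # stand-in for the dependency manager lock
waiting = {}                     # job id -> set of job ids waiting on it
workerpool = []                  # jobs handed back to the workers


def check_and_register(job_id, depends_on):
    """Lock order on this path: queue lock first, then manager lock."""
    with queue_lock:
        with manager_lock:
            waiting.setdefault(depends_on, set()).add(job_id)


def notify_finished(job_id):
    """Re-add jobs that waited on `job_id` without nesting the locks."""
    with manager_lock:
        ready = sorted(waiting.pop(job_id, ()))
    # The manager lock is released before the queue lock is taken, so the
    # reverse order (manager, then queue) is never held at the same time.
    with queue_lock:
        workerpool.extend(ready)
    return ready
```

Since no code path holds the manager lock while waiting for the queue lock, the circular wait needed for a deadlock cannot form.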

Nov 30, 2011

Add UnescapeAndSplit unittest for multi-escapes · b39d17b1

Iustin Pop authored 13 years ago


This would have caught the bug in the first place. Argh,
hand-generated test cases!

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

b39d17b1

Fix a bug in command line option parsing code · 997f690f

Nikos Skalkotos authored 13 years ago


Fix bug affecting command line options of "keyval" type. Although
escaping commands with \ is supported, it is not applied to the
input recursively.

Signed-off-by: Nikos Skalkotos <skalkoto@grnet.gr>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

997f690f
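
To illustrate why several escapes in one value need care (the case 997f690f fixes and b39d17b1 tests), here is a small character-by-character splitter. It is not Ganeti's UnescapeAndSplit and its exact semantics are an assumption: a backslash makes the next character literal, and an unescaped separator ends the current field.

```python
def unescape_and_split(text, sep=","):
    """Split `text` on `sep`, honouring backslash escapes (illustrative)."""
    parts = []
    current = []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            # Take the next character literally; a trailing backslash
            # with nothing after it is kept as-is.
            current.append(next(chars, "\\"))
        elif ch == sep:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts
```

Scanning the whole input in a single pass means every escaped separator is handled, not just the first one.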

Nov 24, 2011

ConfigWriter: Fix epydoc error · 1d4930b9

Michael Hanselmann authored 13 years ago


The parameter is called “mods”, not “modes”.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Andrea Spadaccini <spadaccio@google.com>
(cherry picked from commit 1730d4a1)

1d4930b9

ConfigWriter: Fix epydoc error · 1730d4a1

Michael Hanselmann authored 13 years ago


The parameter is called “mods”, not “modes”.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Andrea Spadaccini <spadaccio@google.com>

1730d4a1

LUGroupAssignNodes: Fix node membership corruption · 54c31fd3

Michael Hanselmann authored 13 years ago


Note: This bug only manifests itself in Ganeti 2.5, but since the
problematic code also exists in 2.4, I decided to fix it there.

If a node was assigned to a new group using “gnt-group assign-nodes” the
node object's group would be changed, but not the duplicate member list
in the group object. The latter is an optimization to require fewer
locks for other operations. The per-group member list is only kept in
memory and not written to disk.

Ganeti 2.5 starts to make use of the data kept in the per-group member
list and consequently fails when it is out of date. The following
commands can be used to reproduce the issue in 2.5 (in 2.4 the issue was
confirmed using additional logging):

  $ gnt-group add foo
  $ gnt-group assign-nodes foo $(gnt-node list --no-header -o name)
  $ gnt-cluster verify  # Fails with KeyError

This patch moves the code modifying node and group objects into
“config.ConfigWriter” to do the complete operation under the config
lock, and also to avoid making use of side-effects of modifying objects
without calling “ConfigWriter.Update”. A unittest is included.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
(cherry picked from commit 218f4c3d)

54c31fd3
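
The membership corruption fixed in 54c31fd3 comes from updating one copy of the data but not the other. A minimal sketch of the corrected shape, with both updates done under a single lock, might look like this. The class and attribute names are invented for the example and do not mirror “config.ConfigWriter”.

```python
import threading


class GroupConfig:
    """Toy config holding node->group and the per-group member lists."""

    def __init__(self):
        self._lock = threading.Lock()
        self.node_group = {}     # node name -> group name
        self.group_members = {}  # group name -> set of node names

    def add_node(self, node, group):
        with self._lock:
            self.node_group[node] = group
            self.group_members.setdefault(group, set()).add(node)

    def assign_nodes(self, nodes, group):
        """Move nodes to `group`, keeping both views consistent."""
        with self._lock:
            for node in nodes:
                old_group = self.node_group[node]
                self.group_members[old_group].discard(node)
                self.group_members.setdefault(group, set()).add(node)
                self.node_group[node] = group
```

Keeping the whole operation inside one method means callers cannot change a node's group without the member list being updated as well.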

LUGroupAssignNodes: Fix node membership corruption · 218f4c3d

Michael Hanselmann authored 13 years ago


Note: This bug only manifests itself in Ganeti 2.5, but since the
problematic code also exists in 2.4, I decided to fix it there.

If a node was assigned to a new group using “gnt-group assign-nodes” the
node object's group would be changed, but not the duplicate member list
in the group object. The latter is an optimization to require fewer
locks for other operations. The per-group member list is only kept in
memory and not written to disk.

Ganeti 2.5 starts to make use of the data kept in the per-group member
list and consequently fails when it is out of date. The following
commands can be used to reproduce the issue in 2.5 (in 2.4 the issue was
confirmed using additional logging):

  $ gnt-group add foo
  $ gnt-group assign-nodes foo $(gnt-node list --no-header -o name)
  $ gnt-cluster verify  # Fails with KeyError

This patch moves the code modifying node and group objects into
“config.ConfigWriter” to do the complete operation under the config
lock, and also to avoid making use of side-effects of modifying objects
without calling “ConfigWriter.Update”. A unittest is included.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

218f4c3d

Fix pylint warning on unreachable code · 9c4f4dd6

Michael Hanselmann authored 13 years ago


Commit c50452c3 added an exception when all instances should be
evacuated off a node, but did so in a way which made pylint complain
about unreachable code.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

9c4f4dd6

Nov 23, 2011

LUNodeEvacuate: Disallow migrating all instances at once · c50452c3

Michael Hanselmann authored 13 years ago


There is a design issue in the iallocator interface which prevents us
from doing this.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

c50452c3

LUNodeEvacuate: Locking fixes · 50722bfd

Michael Hanselmann authored 13 years ago


When evacuating a node, only an assertion without informative text was
used to check if the necessary node locks had been acquired. This was on
top of evaluating the list of nodes without having a node group lock, so
this was changed as well.

Also update some exception messages to include “retry the operation”.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

50722bfd

Fix error when removing node · d05326fc

Michael Hanselmann authored 13 years ago


ConfigWriter.GetAllInstancesInfo returns a dictionary, not a list.
Removing a node would fail with “too many values to unpack”.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

d05326fc
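
The “too many values to unpack” error from d05326fc is what Python raises when a dictionary is iterated as if it were a list of pairs: iteration yields only the keys, and unpacking a key string into two names fails. The sketch below uses made-up instance data and a hypothetical helper name.

```python
def instances_on_node(all_instances, node):
    """Return names of instances whose primary node is `node`.

    `all_instances` is a dict, so .items() is needed to get pairs.
    """
    return sorted(
        name
        for name, inst in all_instances.items()
        if inst["primary_node"] == node
    )


def broken_instances_on_node(all_instances, node):
    """Same loop without .items(); unpacking a key raises ValueError."""
    return sorted(
        name
        for name, inst in all_instances
        if inst["primary_node"] == node
    )
```

Calling the broken variant on a dict keyed by instance names raises ValueError as soon as the first key is unpacked.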

Nov 16, 2011

htools: rework message display construction · bdd8c739

Iustin Pop authored 13 years ago


While diagnosing some (unrelated) memory usage in htools, I've
stumbled upon some very bad behaviour in checkData: mapAccum is
non-strict, and so is the tuple we use, which results in the list of
lists of messages being very bad space-wise (hundreds of MB of memory
for a simulated cluster with thousands of nodes, all with errors).

The new, explicit reuse of the old message list has a linear memory
behaviour. The only downside is that messages are listed in the
reverse order (which I'll fix on master).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

bdd8c739

hbal: handle empty node groups · 2072221f

Iustin Pop authored 13 years ago


This patch changes an internal assert (which can only be triggered
when a node group is empty) into properly handling this case (and
returning empty node/instance lists).

While we could handle this in the backend (Cluster.splitNodeGroup)
this would actually mean that we change the behaviour for a cluster
with just two node groups, one of which is empty (where today we
don't require a node group argument).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

2072221f

Nov 15, 2011

Document OpNodeMigrate's result for RAPI · 65c9591c

Michael Hanselmann authored 13 years ago


- Commit b7a1c816 changed the LU to generate jobs
- Mention documented results in NEWS

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

65c9591c

Nov 14, 2011

Ensure unused ports return to the free port pool · f396ad8c

Vangelis Koukis authored 13 years ago


Ensure ports previously allocated by calling ConfigWriter's AllocatePort() are
returned to the pool of free ports when no longer needed:

 * Return the network_port of an instance when it is removed
 * Return the port used by a DRBD-based disk when it is removed

Signed-off-by: Vangelis Koukis <vkoukis@grnet.gr>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

f396ad8c
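
The allocate/return pairing that f396ad8c enforces can be pictured with a tiny free-port pool. This is a sketch only: the class, its methods and the port range are assumptions and do not reproduce ConfigWriter's AllocatePort().

```python
class PortPool:
    """Minimal pool of free TCP ports (illustrative, not Ganeti's)."""

    def __init__(self, first, last):
        self._free = set(range(first, last + 1))

    def allocate(self):
        if not self._free:
            raise RuntimeError("no free ports left")
        port = min(self._free)
        self._free.remove(port)
        return port

    def release(self, port):
        # Called when an instance or a DRBD-based disk is removed, so
        # that the port can be handed out again.
        self._free.add(port)
```

Without the `release` call on removal, every create/remove cycle would permanently shrink the pool until allocation fails.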

Re-wrap a paragraph to eliminate a sphinx warning · ca8f5622

Iustin Pop authored 13 years ago


This just makes sure that the paragraph doesn't contain lines that
start with :, which make Sphinx (1.0.7) complain.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

ca8f5622

Nov 08, 2011

Fail if node/group evacuation can't evacuate instances · d755483c

Michael Hanselmann authored 13 years ago


If an instance can't be evacuated, only a message would be printed. With
this change the operation always aborts. Newly added unittests check for
this behaviour.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

d755483c

Nov 04, 2011

LUInstanceRename: Compare name with name · fb2865ae

Michael Hanselmann authored 13 years ago


… instead of object with name.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

fb2865ae

LUClusterRepairDiskSizes: Acquire instance locks in exclusive mode · cdde200a

Michael Hanselmann authored 13 years ago


Instances are modified if their disk size doesn't match.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

cdde200a

Oct 27, 2011

Update NEWS for 2.5.0~rc4 · 95440548

Michael Hanselmann authored 13 years ago


I forgot this in the previous patch.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Andrea Spadaccini <spadaccio@google.com>

95440548

Bump version to 2.5.0~rc4 · 7b612b95

Michael Hanselmann authored 13 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

7b612b95

Merge branch 'stable-2.4' into stable-2.5 · dd228197

Michael Hanselmann authored 13 years ago


* stable-2.4:
  Update NEWS and increase to 2.4.5

Conflicts:
	configure.ac: Trivial

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

dd228197

jqueue: Allow zero jobs to be submitted at once · 719f8fba

Michael Hanselmann authored 13 years ago


If cmdlib.LUNodeMigrate was called for a node without primary instances
it would try to submit an empty list of jobs. This was never visible via
CLI as there we check the list of primary instances first.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

719f8fba

Update NEWS and increase to 2.4.5 · 7b790a6a

René Nussbaumer authored 13 years ago


Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

7b790a6a

Oct 26, 2011

hail: don't select the primary as new secondary · 07abe80a

Iustin Pop authored 13 years ago


This just adds the primary node of the instance as 'non-allocable'
during the choosing of the new secondary.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
(cherry picked from commit 7073b3a8)

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

07abe80a

hail: add an extra safety check in relocate · e0baa26f

Iustin Pop authored 13 years ago


If we select the primary as new secondary, better to fail than return
wrong data to Ganeti.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
(cherry picked from commit f25508be)

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

e0baa26f

Bump version to 2.5.0~rc3 · f39d39b6

Michael Hanselmann authored 13 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

f39d39b6

Oct 21, 2011

Merge branch 'devel-2.4' into stable-2.5 · 833391a0

René Nussbaumer authored 13 years ago


Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

833391a0

Oct 20, 2011

Fix queue archive creation with wrong permissions · 8e5a705d

René Nussbaumer authored 13 years ago


On a master failover some of the archive dirs might have wrong
permissions in the non-root model. This is because noded is still
running as root and the job queue is synced that way. This patch
fixes this behaviour by setting the permissions accordingly.

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

8e5a705d

Oct 19, 2011

Ensure permission on the job queue version file · 69f78cf7

René Nussbaumer authored 13 years ago


Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

69f78cf7

OpGroupVerifyDisks: Fix wrong result type declaration · 6973587f

Michael Hanselmann authored 13 years ago


If an instance actually had a missing disk, the type check would fail.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

6973587f

Oct 18, 2011

RAPI: Make node evacuation actually work · 0b58db81

Michael Hanselmann authored 13 years ago

Commit e1f23243 changed the LU and opcode for node evacuation to receive
a “mode” parameter (among other things). Commit de40437a changed the
RAPI code accordingly, but did so for an earlier version of the first
patch. Obviously this couldn't work, so here's the fix.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

0b58db81