- Dec 21, 2011
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
When an opcode is about to be processed, its dependencies are evaluated using “_JobDependencyManager.CheckAndRegister”. By its nature, that function requires a lock on the manager's internal structures. All of this happens while the job queue lock is held in shared mode (required for the job processor). When a job has been processed, any pending dependencies are re-added to the job worker pool. Before this patch that required the manager's lock first and then, for adding the jobs, the job queue lock. Since this is the reverse order, it can lead to deadlocks.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
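As a rough illustration of the ordering problem (a minimal sketch, not the Ganeti code; the lock and function names below are invented), acquiring the two locks in opposite orders from different code paths can deadlock, while finishing with one lock before taking the other cannot:

    import threading

    # Stand-ins for the job queue lock (held in shared mode by the processor)
    # and the dependency manager's internal lock.
    queue_lock = threading.Lock()
    manager_lock = threading.Lock()

    def check_and_register(job):
        # Called while the queue lock is already held: queue -> manager order.
        with manager_lock:
            print("registered dependency for", job)

    def notify_waiters_deadlock_prone(jobs):
        # Pre-fix pattern: manager -> queue, the reverse order, which can
        # deadlock against a concurrent check_and_register() call.
        with manager_lock:
            with queue_lock:
                print("re-added", jobs)

    def notify_waiters_safe(jobs):
        # Post-fix idea: finish with the manager's structures first, then
        # re-add the pending jobs without holding the manager lock.
        with manager_lock:
            pending = list(jobs)
        with queue_lock:
            print("re-added", pending)

    if __name__ == "__main__":
        with queue_lock:
            check_and_register("job/1")
        notify_waiters_safe(["job/2", "job/3"])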
- Nov 24, 2011
Michael Hanselmann authored
The parameter is called “mods”, not “modes”.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Andrea Spadaccini <spadaccio@google.com>
(cherry picked from commit 1730d4a1)
-
Michael Hanselmann authored
Note: This bug only manifests itself in Ganeti 2.5, but since the problematic code also exists in 2.4, I decided to fix it there. If a node was assigned to a new group using “gnt-group assign-nodes”, the node object's group would be changed, but not the duplicate member list in the group object. The latter is an optimization to require fewer locks for other operations. The per-group member list is only kept in memory and not written to disk. Ganeti 2.5 starts to make use of the data kept in the per-group member list and consequently fails when it is out of date. The following commands can be used to reproduce the issue in 2.5 (in 2.4 the issue was confirmed using additional logging):
  $ gnt-group add foo
  $ gnt-group assign-nodes foo $(gnt-node list --no-header -o name)
  $ gnt-cluster verify  # Fails with KeyError
This patch moves the code modifying node and group objects into “config.ConfigWriter” so the complete operation is done under the config lock, and so it no longer relies on the side effects of modifying objects without calling “ConfigWriter.Update”. A unittest is included.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
(cherry picked from commit 218f4c3d)
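A minimal sketch of the idea behind the fix, assuming a simplified in-memory configuration (class and attribute names here are illustrative, not the real ConfigWriter API): the node's group and both groups' cached member lists are updated in one step under a single lock, so they can never disagree:

    import threading

    class FakeConfig:
        """Simplified stand-in for a config writer with per-group member lists."""

        def __init__(self, node_to_group, group_members):
            self._lock = threading.Lock()
            self._node_to_group = node_to_group    # node name -> group name
            self._group_members = group_members    # group name -> set of node names

        def assign_node_to_group(self, node, new_group):
            # The whole update happens under one lock, so the node object and
            # the duplicated member lists cannot get out of sync.
            with self._lock:
                old_group = self._node_to_group[node]
                if old_group == new_group:
                    return
                self._node_to_group[node] = new_group
                self._group_members[old_group].discard(node)
                self._group_members[new_group].add(node)

    cfg = FakeConfig({"node1": "default"}, {"default": {"node1"}, "foo": set()})
    cfg.assign_node_to_group("node1", "foo")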
-
Michael Hanselmann authored
Commit c50452c3 added an exception when all instances should be evacuated off a node, but did so in a way which made pylint complain about unreachable code.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
- Nov 23, 2011
Michael Hanselmann authored
There is a design issue in the iallocator interface which prevents us from doing this.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
-
Michael Hanselmann authored
When evacuating a node, only an assertion without informative text was used to check if the necessary node locks had been acquired. This was on top of evaluating the list of nodes without holding a node group lock, so this was changed as well. Also update some exception messages to include “retry the operation”.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
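For illustration only (the function and variable names are made up, not Ganeti's), the difference between a bare assertion and one that says which node locks are missing:

    def check_node_locks(owned_locks, needed_nodes):
        # Bare form: a failure only reports "AssertionError", with no context.
        assert set(owned_locks) >= set(needed_nodes)

    def check_node_locks_verbose(owned_locks, needed_nodes):
        # With a message, the failure names the missing locks, which is the
        # kind of information the patch above adds.
        missing = set(needed_nodes) - set(owned_locks)
        assert not missing, ("Node locks not owned for: %s; retry the operation" %
                             ", ".join(sorted(missing)))

    check_node_locks_verbose(["node1", "node2"], ["node1"])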
-
Michael Hanselmann authored
ConfigWriter.GetAllInstancesInfo returns a dictionary, not a list. Removing a node would fail with “too many values to unpack”.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
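The class of error, reconstructed for illustration (this is not the cmdlib code itself): iterating over the returned dictionary yields only the keys, so unpacking two values per element fails:

    # GetAllInstancesInfo-style return value: a dict mapping name -> object,
    # not a list of (name, object) pairs.
    all_instances = {"inst1": object(), "inst2": object()}

    try:
        for name, inst in all_instances:     # buggy: iterates over key strings
            pass
    except ValueError as err:
        print(err)                           # "too many values to unpack"

    # Correct handling iterates over the items explicitly:
    for name, inst in all_instances.items():
        print(name, inst)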
- Nov 16, 2011
Iustin Pop authored
While diagnosing some (unrelated) memory usage in htools, I stumbled upon some very bad behaviour in checkData: mapAccum is non-strict, and so is the tuple we use, so the list of lists of messages behaves very badly space-wise (hundreds of MB of memory for a simulated cluster with thousands of nodes, all with errors). The new, explicit reuse of the old message list has linear memory behaviour. The only downside is that messages are listed in reverse order (which I'll fix on master).
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This patch changes an internal assert (which can only be triggered when a node group is empty) into proper handling of this case (returning empty node/instance lists). While we could handle this in the backend (Cluster.splitNodeGroup), that would actually mean changing the behaviour for a cluster with just two node groups, one of which is empty (where today we don't require a node group argument).
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
- Nov 15, 2011
Michael Hanselmann authored
- Commit b7a1c816 changed the LU to generate jobs
- Mention documented results in NEWS
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
- Nov 08, 2011
Michael Hanselmann authored
Previously, if an instance couldn't be evacuated, only a message was printed. With this change the operation always aborts. Newly added unittests check for this behaviour.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
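A minimal sketch of the behavioural change (the exception type and function below are invented for this example; the real code uses Ganeti's own error classes): failures now raise instead of merely being reported:

    class EvacuationError(Exception):
        """Illustrative error type standing in for Ganeti's own errors."""

    def evacuate(instances, can_move):
        failed = [name for name in instances if not can_move(name)]
        # Old behaviour (sketch): log a message for each failure and continue.
        # New behaviour: abort the whole operation.
        if failed:
            raise EvacuationError("Cannot evacuate instance(s): %s" %
                                  ", ".join(sorted(failed)))
        return list(instances)

    print(evacuate(["inst1"], can_move=lambda name: True))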
- Nov 04, 2011
Michael Hanselmann authored
… instead of object with name.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
-
Michael Hanselmann authored
Instances are modified if their disk size doesn't match.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
- Oct 27, 2011
Michael Hanselmann authored
I forgot this in the previous patch.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Andrea Spadaccini <spadaccio@google.com>
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
-
Michael Hanselmann authored
* stable-2.4:
  Update NEWS and increase to 2.4.5
Conflicts:
  configure.ac: Trivial
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
-
Michael Hanselmann authored
If cmdlib.LUNodeMigrate was called for a node without primary instances, it would try to submit an empty list of jobs. This was never visible via the CLI because there the list of primary instances is checked first.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
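A hedged sketch of the guard (names are invented; this is not the LUNodeMigrate code): build one job per primary instance and skip submission entirely when the list is empty:

    def submit_migration_jobs(submit_fn, primary_instances):
        jobs = [[("migrate-instance", name)] for name in primary_instances]
        if not jobs:
            # Submitting an empty list is what tripped up LUNodeMigrate.
            return []
        return submit_fn(jobs)

    fake_submit = lambda jobs: ["job/%d" % i for i, _ in enumerate(jobs)]
    print(submit_migration_jobs(fake_submit, []))         # -> []
    print(submit_migration_jobs(fake_submit, ["inst1"]))  # -> ['job/0']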
-
René Nussbaumer authored
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
- Oct 26, 2011
Iustin Pop authored
This just adds the primary node of the instance as 'non-allocable' during the choosing of the new secondary.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
(cherry picked from commit 7073b3a8)
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
If we select the primary as the new secondary, it is better to fail than to return wrong data to Ganeti.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
(cherry picked from commit f25508be)
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
- Oct 21, 2011
René Nussbaumer authored
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
- Oct 20, 2011
René Nussbaumer authored
On a master failover, some of the archive dirs might have wrong permissions in the non-root model. This is because noded still runs as root and the job queue is synced that way. This patch fixes this behaviour by setting the permissions accordingly.
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
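A rough sketch of such a fix-up, assuming a job queue archive directory and a daemon user/group ID (the path, IDs and modes below are illustrative, not Ganeti's actual values):

    import os

    def fix_archive_permissions(archive_dir, uid, gid,
                                dir_mode=0o750, file_mode=0o640):
        # Walk the archive and reset owner and permissions after a failover.
        for root, dirs, files in os.walk(archive_dir):
            for name in dirs:
                path = os.path.join(root, name)
                os.chown(path, uid, gid)
                os.chmod(path, dir_mode)
            for name in files:
                path = os.path.join(root, name)
                os.chown(path, uid, gid)
                os.chmod(path, file_mode)

    # Example (needs sufficient privileges and an existing directory):
    # fix_archive_permissions("/var/lib/ganeti/queue/archive", uid=105, gid=105)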
- Oct 19, 2011
René Nussbaumer authored
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
If an instance actually had a missing disk, the type check would fail.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
- Oct 18, 2011
Michael Hanselmann authored
Commit e1f23243 changed the LU and opcode for node evacuation to receive a “mode” parameter (among other things). Commit de40437a changed the RAPI code accordingly, but did so for an earlier version of the first patch. Obviously this couldn't work, so here's the fix.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
-
Michael Hanselmann authored
* devel-2.4:
  Update NEWS for unreleased 2.4.5
Conflicts:
  NEWS: Trivial
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
-
Michael Hanselmann authored
I need this for another 2.5 release.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
- Oct 17, 2011
Michael Hanselmann authored
Commit d1c172de inadvertently changed the “/2/instances/[instance_name]/replace-disks” resource to use body parameters. There were no QA tests, so the issue wasn't noticed. This patch re-introduces support for query parameters and adds a QA test.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Andrea Spadaccini <spadaccio@google.com>
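A minimal sketch of accepting both forms (parameter handling and names below are illustrative; this is not the actual RAPI handler): query parameters are honoured again, with the request body as a fallback:

    def get_replace_disks_params(query_args, body_args):
        # Start from the body, then let query parameters override it so the
        # older query-based clients keep working.
        params = dict(body_args or {})
        params.update(query_args or {})
        mode = params.get("mode", "replace_auto")
        disks = params.get("disks", "")
        return mode, [int(idx) for idx in disks.split(",") if idx]

    print(get_replace_disks_params({"mode": "replace_on_secondary",
                                    "disks": "0,1"}, {}))
    print(get_replace_disks_params({}, {"mode": "replace_auto"}))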
- Oct 12, 2011
Michael Hanselmann authored
* devel-2.4:
  rpc: Disable HTTP client pool and reduce memory consumption
  Fix assertion error on unclean master shutdown
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
We noticed that “ganeti-masterd” can use large amounts of memory, especially on large clusters. Measurements showed a single PycURL client using about 500 kB of heap memory (the actual usage depends on versions, build options and settings). The RPC client uses a per-thread HTTP client pool with one client per node. At this time there are 41 non-main threads (25 for the job queue and 16 for client requests). This means the HTTP client pools use a lot of memory (ca. 200 MB for 10 nodes, ca. 1 GB for 50 nodes). This patch disables the per-thread HTTP client pool. No cleanup of unused code is done. That will be done in the master branch only.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
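A back-of-the-envelope model of the numbers quoted above (the 500 kB per client, the 41 threads and the node counts come from the commit message; everything else is illustrative):

    CLIENT_HEAP_KB = 500   # measured heap usage of a single PycURL client

    def pool_cost_mb(num_threads, num_nodes):
        # Old scheme: every worker thread keeps one client per node alive.
        return num_threads * num_nodes * CLIENT_HEAP_KB / 1024.0

    def per_request_cost_mb(concurrent_requests):
        # After the change: a client only lives for the duration of a request.
        return concurrent_requests * CLIENT_HEAP_KB / 1024.0

    print("pooled, 10 nodes: ~%.0f MB" % pool_cost_mb(41, 10))   # ~200 MB
    print("pooled, 50 nodes: ~%.0f MB" % pool_cost_mb(41, 50))   # ~1 GB
    print("per request:      ~%.0f MB" % per_request_cost_mb(41))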
- Oct 07, 2011
Michael Hanselmann authored
According to the iallocator documentation the “node-evacuate” call needs to return a list of jobs, not a list of lists of jobs.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
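For illustration (the opcode dictionaries are made up; this is not the iallocator code), the difference amounts to removing one level of nesting so the result is a flat list of jobs, each job still being a list of opcodes:

    def flatten_job_sets(job_sets):
        # Collapse a list of lists of jobs into a plain list of jobs.
        return [job for job_set in job_sets for job in job_set]

    nested = [[[{"OP_ID": "OP_INSTANCE_MIGRATE"}]],
              [[{"OP_ID": "OP_INSTANCE_REPLACE_DISKS"}]]]
    print(flatten_job_sets(nested))
    # -> [[{'OP_ID': 'OP_INSTANCE_MIGRATE'}], [{'OP_ID': 'OP_INSTANCE_REPLACE_DISKS'}]]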
- Oct 04, 2011
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
If a cluster has any non-master-candidate nodes, those don't contain all files (e.g. config.data). With commit aef59ae7 (March 31st, 2011) the logic was changed and subsequently verifying a cluster with non-mc nodes would complain. This patch fixes this issue by changing the algorithm. It also adds an additional check for files which shouldn't exist on a machine. A newly added unittest is included.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
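A simplified sketch of such a per-node check (the file names and role handling are illustrative, not Ganeti's actual lists): it reports both files that are missing and files that should not be present on a non-master-candidate node:

    def verify_node_files(present_files, is_master_candidate,
                          mc_only_files, common_files):
        expected = set(common_files)
        if is_master_candidate:
            expected |= set(mc_only_files)
        present = set(present_files)
        missing = expected - present
        unexpected = set() if is_master_candidate else present & set(mc_only_files)
        return sorted(missing), sorted(unexpected)

    mc_only = {"config.data"}
    common = {"known_hosts"}
    print(verify_node_files({"known_hosts"}, False, mc_only, common))
    print(verify_node_files({"known_hosts", "config.data"}, False, mc_only, common))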
- Oct 03, 2011
Michael Hanselmann authored
This reverts commit 34aa8b7c. Writing error messages to stderr would also include backtraces, something we tried to avoid in the past.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Commit 64c7b383 changed the RPC call for verifying SSH connections. Unfortunately the corresponding case when adding nodes was missed.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
- Sep 30, 2011
Michael Hanselmann authored
When verifying a group the code would always check SSH to all nodes in the same group, as well as the first node for every other group. On big clusters this can cause issues since many nodes will try to connect to the first node of another group at the same time. This patch changes the algorithm to choose a different node every time. A unittest for the selection algorithm is included.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
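One possible spreading scheme, shown purely for illustration (not necessarily the algorithm the patch implements): derive a stable index from the verifying node's own name so that different nodes pick different targets in the other group:

    import hashlib

    def pick_remote_node(local_node, other_group_nodes):
        nodes = sorted(other_group_nodes)
        digest = hashlib.md5(local_node.encode("utf-8")).hexdigest()
        return nodes[int(digest, 16) % len(nodes)]

    group_b = ["node10", "node11", "node12", "node13"]
    for local in ["node1", "node2", "node3", "node4"]:
        print(local, "->", pick_remote_node(local, group_b))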
-
Iustin Pop authored
In the case where we submit many pending jobs (> 100) to masterd, the JobExecutor 'spams' the master daemon with status requests for all the jobs, even though in the end it will only choose a single job for polling. This is very sub-optimal, because when the master is busy processing small/fast jobs, this query forces reading all the jobs from the queue. Restricting the 'window' of jobs that we query from the entire set to a smaller subset makes a huge difference (masterd only, 0s delay jobs, all jobs on tmpfs thus no I/O involved):
- submitting/waiting for 500 jobs: before ~21 s, after ~5 s
- submitting/waiting for 1K jobs: before ~76 s, after ~8 s
This is with a batch of 25 jobs. With a batch of 50 jobs, it goes from 8 s to 12 s. I think that choosing the 'best' job for nice output only matters with a small number of jobs, and that for more than that people will not actually watch the jobs. So changing from 'perfect job' to 'best job in the first 25' should be OK. Note that most jobs won't execute as fast as 0 delay, but this is still a good improvement.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
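A small sketch of the windowing idea (the function names, statuses and ranking below are invented for the example): only the first batch of submitted jobs is queried, and the most interesting of those is polled:

    def choose_job_to_poll(job_ids, query_status, batch_size=25):
        window = job_ids[:batch_size]
        # Rank: running jobs first, then waiting, then queued, then anything else.
        order = {"running": 0, "waiting": 1, "queued": 2}
        ranked = [(order.get(query_status(jid), 3), jid) for jid in window]
        return min(ranked)[1]

    # Toy status function: pretend job 7 is already running, the rest are queued.
    status = lambda jid: "running" if jid == 7 else "queued"
    print(choose_job_to_poll(list(range(100)), status))   # -> 7; only 25 queried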
-