Commits · 6b6e335bad42dd6b62580b5c65f55ac72bbbf7b1 · itminedu / snf-ganeti

Oct 17, 2012

Group.hs: add 'allTags'; adjust loaders and test data for it · 6b6e335b

Dato Simó authored 12 years ago


This commit adds a Group.allTags field to store the tags of node groups,
and teaches each loader backend in HTools to populate it (additionally, the
IAllocator class in lib/cmdlib.py now includes tags for groups too). Test
data is updated to include an empty set of tags for node groups in all
affected test cases.

Signed-off-by: Dato Simó <dato@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


Instance.hs: rename 'tags' to 'exclTags', provide 'allTags' · 2f907bad

Dato Simó authored 12 years ago

The mergeData function in Loader.hs included a step to filter an instance's
tags to include only the exclusion tags (as specified via the commandline,
or cluster-level tags). Later on, code in Node.hs assumed Instance.tags to
contain only tags to be used for exclusion.

Because in the future we will need to access the full list of an instance's
tags (and not only exclusion tags), this commit deprecates the 'tags'
field, and introduces Instance.exclTags and Instance.allTags.

Instance.allTags is now populated from the different backends (Text, Luxi,
Rapi, etc.), and Instance.exclTags is only populated from Loader.mergeData,
as was done previously. This means that loading tags from e.g. Text or Simu
and assuming that they'll be used as exclusion tags without going through
Loader.hs will no longer work; but this was already the case with other
fields, and 'mergeData' or 'loadExternalData' continue to be the only entry
points to get a consistent view of the cluster. (Additionally, there were
no tests that made this assumption that I could find.)
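
A minimal Python sketch of the split described above (the real code is
Haskell in Loader.hs and Instance.hs). The class, the function name and the
prefix-based matching rule are assumptions made for illustration only:

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Instance:
    name: str
    all_tags: Tuple[str, ...] = ()   # every tag reported by the backend
    excl_tags: Tuple[str, ...] = ()  # filled in only by merge_data


def merge_data(instances: List[Instance],
               excl_prefixes: List[str]) -> List[Instance]:
    """Derive excl_tags from all_tags; all_tags is left untouched."""
    def is_exclusion(tag: str) -> bool:
        # Assumption: exclusion tags are recognised by prefix.
        return any(tag.startswith(prefix) for prefix in excl_prefixes)

    return [
        replace(inst,
                excl_tags=tuple(t for t in inst.all_tags if is_exclusion(t)))
        for inst in instances
    ]


loaded = [Instance("inst1", all_tags=("service:web", "owner:alice"))]
merged = merge_data(loaded, ["service:"])
```

An instance taken straight from a backend has an empty excl_tags here, which
mirrors the point that only the loader entry points give a consistent view.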

Signed-off-by: Dato Simó <dato@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


htools-excl.test: add test case for exclusion tags in hbal · 0397694e

Dato Simó authored 12 years ago


In preparation for future modifications in the exclusion tags field, add a
test that verifies that exclusion tags are being honored: in a test cluster
with two instances of the same exclusion group in each node, hbal should
shuffle instances around to improve the score.

Signed-off-by: Dato Simó <dato@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


Oct 16, 2012

ensure-dirs: Fix permissions on master socket · 48e3db76

Michael Hanselmann authored 12 years ago


A socket shouldn't have its executable bit set.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


Update security document for version 2.6 · 3397d13e

Michael Hanselmann authored 12 years ago


Quite a few things were out of date. Some formatting was also updated.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


Oct 15, 2012

Merge branch 'stable-2.6' into devel-2.6 · 4b945e1e

Michael Hanselmann authored 12 years ago


* stable-2.6:
  Update NEWS and bump version to 2.6.1

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>


Oct 12, 2012

Update NEWS and bump version to 2.6.1 · 27e15be0

Bernardo Dal Seno authored 12 years ago


This is a small bug-fix only release.

Signed-off-by: Bernardo Dal Seno <bdalseno@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>


Oct 11, 2012

Text.hs: update field lists in parseData comments · b0b8337a

Dato Simó authored 12 years ago


The comments in parseData had become out of date with the implementations
of load{Group,Node,Inst}. This commit updates the field list in comments to
match the implementations.

Signed-off-by: Dato Simó <dato@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


Merge branch 'stable-2.6' into devel-2.6 · c44356fd

Michael Hanselmann authored 12 years ago


* stable-2.6:
  verify-disks: Explicitely state nothing has to be done
  Add list of design documents implemented in version 2.6
  Better list of replace-disks arguments + typos fixed
  jqueue: Look at archived jobs when watching
  Show old primary/secondary node on disk replacement
  gnt-instance reinstall: Don't always exit with success
  LUClusterVerify: Ignore /proc/drbd if DRBD is disabled
  Fixed typos in devnotes.rst
  Always_failover doesn't require --allow-failover anymore
  bash_completion: Enable extglob while parsing file
  rpc: Remove duplicated logic, fix unittests
  Annotate disk params on instance_start
  cmdlib: Handle locking.ALL_SET correctly when copying locks

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


verify-disks: Explicitely state nothing has to be done · 9b99be28

Michael Hanselmann authored 12 years ago


Example output:
$ gnt-cluster verify-disks
Submitted jobs 4327
Waiting for job 4327 ...
No disks need to be activated.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


Oct 10, 2012

Add list of design documents implemented in version 2.6 · 40309ed7

Michael Hanselmann authored 12 years ago


Each version should have its dedicated list.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


Oct 05, 2012

Better list of replace-disks arguments + typos fixed · 50c1e351

Bernardo Dal Seno authored 12 years ago


The man page and the built-in help for gnt-instance replace-disks were
inconsistent. Also fixed some typos in man pages.

Signed-off-by: Bernardo Dal Seno <bdalseno@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>


jqueue: Look at archived jobs when watching · e4cf42d4

Michael Hanselmann authored 12 years ago


First: This enables the use of “gnt-job watch $id” for archived jobs.

Now, the reason for actually making this work is that during
sufficiently large group or node evacuations jobs are archived before
the client gets to poll for their output. This led to situations where
the jobs would finish successfully, but the client reported an error
because it couldn't see the job anymore.
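
The lookup order described above can be sketched roughly as follows. This is
not the jqueue code; the directory layout and the "job-<id>" file naming are
assumptions made for the example:

```python
import os
import tempfile


def find_job_file(queue_dir, archive_dir, job_id):
    """Return the path of a job file, checking the live queue first."""
    name = "job-%d" % job_id
    for directory in (queue_dir, archive_dir):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            return path
    return None


def demo():
    """Look up one archived job and one job that does not exist."""
    with tempfile.TemporaryDirectory() as root:
        queue_dir = os.path.join(root, "queue")
        archive_dir = os.path.join(root, "archive")
        os.makedirs(queue_dir)
        os.makedirs(archive_dir)
        # Simulate a job that was archived before the client polled for it.
        open(os.path.join(archive_dir, "job-4327"), "w").close()
        found = find_job_file(queue_dir, archive_dir, 4327)
        missing = find_job_file(queue_dir, archive_dir, 1)
        found_ok = found is not None and os.path.basename(found) == "job-4327"
        return (found_ok, missing)
```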

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>
(cherry picked from commit 04569469)


Oct 03, 2012

Show old primary/secondary node on disk replacement · f0f8d060

Michael Hanselmann authored 12 years ago


People unfamiliar with Ganeti's internals might be confused by the
different hostnames showing up later in the process.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>


gnt-instance reinstall: Don't always exit with success · 64be07b1

Michael Hanselmann authored 12 years ago


If one or more jobs failed the exit status should be set accordingly.
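
As an illustration only (the names below are hypothetical, not the actual
gnt-instance code), the intended behaviour amounts to:

```python
EXIT_SUCCESS = 0
EXIT_FAILURE = 1


def exit_code_for(job_results):
    """Map (job_id, succeeded) pairs to a process exit status."""
    failed = [job_id for job_id, succeeded in job_results if not succeeded]
    return EXIT_FAILURE if failed else EXIT_SUCCESS
```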

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>


LUClusterVerify: Ignore /proc/drbd if DRBD is disabled · 2ef3383e

Michael Hanselmann authored 12 years ago


This fixes issue 190. The problem was that the check for DRBD was
enabled if LVM storage is used and didn't depend at all on whether DRBD
is enabled.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
(cherry picked from commit 3d8ae327)


Oct 01, 2012

Fixed typos in devnotes.rst · 77865fb4

Gintautas Miliauskas authored 12 years ago


Signed-off-by: Gintautas Miliauskas <gintas@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>


Sep 27, 2012

Always_failover doesn't require --allow-failover anymore · 320a5dae

Bernardo Dal Seno authored 12 years ago


If an administrator sets always_failover, it means that there is no need
for another explicit approval to failover instead of migrating.
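
A rough sketch of the decision after this change. The function and the
"migratable" parameter are hypothetical, and the fallback branch reflects an
assumption about what --allow-failover is otherwise used for:

```python
def plan_move(always_failover, allow_failover, migratable=True):
    """Decide whether an instance move is a migration or a failover."""
    if always_failover:
        # The administrator's setting is approval enough; the
        # allow_failover flag is no longer consulted here.
        return "failover"
    if migratable:
        return "migrate"
    if allow_failover:
        return "failover"
    raise ValueError("migration not possible and failover not allowed")
```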

Signed-off-by: Bernardo Dal Seno <bdalseno@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
(cherry picked from commit b5f0b5cc)

Signed-off-by: Bernardo Dal Seno <bdalseno@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


bash_completion: Enable extglob while parsing file · d163abf9

Michael Hanselmann authored 12 years ago

In older versions of GNU Bash extended patterns, such as “@(…)”, are only
available with the “extglob” shell option. As pointed out in [1] and [2],
“extglob” must be enabled while parsing the code. Therefore the flag must be
enabled at the beginning of the script and be reset to its original value at
the end so as not to interfere with other code on shell initialization.

[1] http://unix.stackexchange.com/questions/45957
[2] http://mywiki.wooledge.org/glob



Reported by Sascha Lucas.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
(cherry picked from commit 893ad76d)


Sep 12, 2012

rpc: Remove duplicated logic, fix unittests · 0e2b7c58

Michael Hanselmann authored 12 years ago


Commit 5fce6a89 changed RpcRunner._InstDict to add the disk parameters
on all encoded instances. It didn't remove a special case in
“_InstDictOspDp”. Update and fix unittests as well.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>


Annotate disk params on instance_start · 5fce6a89

Constantinos Venetsanopoulos authored 12 years ago


We call _GatherAndLinkBlockDevs during the process, which in turn
calls _RecursiveFindBD. This needs disk parameters to work.

See also commit b8291e00.

This was reported by Ansgar and Damien.

Signed-off-by: Constantinos Venetsanopoulos <cven@grnet.gr>
Reviewed-by: Michael Hanselmann <hansmi@google.com>


cmdlib: Handle locking.ALL_SET correctly when copying locks · ef86bf28

Michael Hanselmann authored 12 years ago


When locks are copied “locking.ALL_SET” must be handled separately
(ALL_SET has the value None). Reported by Constantinos Venetsanopoulos
who saw failover for RBD-based instances not working.
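
Why the special case is needed can be shown with a small sketch. Only the
fact that ALL_SET is None comes from the message above; the helper name is
hypothetical:

```python
ALL_SET = None  # same value as locking.ALL_SET, per the commit message


def copy_locks(needed_locks):
    """Return an independent copy of a lock list, preserving ALL_SET."""
    if needed_locks is ALL_SET:
        # list(None) would raise TypeError, so ALL_SET is passed through.
        return ALL_SET
    return list(needed_locks)
```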

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>


Sep 07, 2012

Merge branch 'stable-2.6' into devel-2.6 · 99c7795a

Iustin Pop authored 12 years ago


* stable-2.6:
  Fix bug in non-mirrored instance allocation
  Fix gnt-debug iallocator

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Agata Murawska <agatamurawska@google.com>


Fix bug in non-mirrored instance allocation · 14b5d45f

Iustin Pop authored 12 years ago


The function `allocateOnSingle' has a bug in the calculation of the
cluster score used for deciding which of the many target nodes to use
in placing the instance: it uses the original node list for the score
calculation.

Since the original node list is the same for all target nodes,
`allocateOnSingle' returns the same score no matter the target node,
and hence the choice of node is arbitrary instead of being made on
the basis of the algorithm.

This has gone uncaught until reported because the unittests only test
1 allocation at a time on an empty cluster, and do not check the
consistency of the score. I'll send separate patches on the master
branch for adding more checks to prevent this in the future.
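
The class of bug can be sketched in Python (the real function is Haskell).
The score below is a stand-in, a population standard deviation of a single
"used" value per node, and not the actual htools cluster metric:

```python
import statistics


def cluster_score(nodes):
    """Stand-in score: lower means a more evenly loaded cluster."""
    return statistics.pstdev([n["used"] for n in nodes])


def allocate_on_single(nodes, inst_size, target):
    """Score the cluster as it would look with the instance on `target`."""
    updated = [
        dict(n, used=n["used"] + inst_size) if n["name"] == target else n
        for n in nodes
    ]
    # Scoring `nodes` here instead of `updated` is the bug described above:
    # every target would then get the same score.
    return cluster_score(updated)


nodes = [{"name": "a", "used": 10}, {"name": "b", "used": 50}]
best = min(("a", "b"), key=lambda t: allocate_on_single(nodes, 20, t))
```

With the updated list, placing the instance on the emptier node "a" scores
better than placing it on "b", so the choice is no longer arbitrary.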

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Agata Murawska <agatamurawska@google.com>


Sep 04, 2012

Fix gnt-debug iallocator · 09123222

René Nussbaumer authored 12 years ago


There was an issue with the recent ipolicy introduction which led to a
bug in gnt-debug iallocator. It was not providing the spindle_use field
and therefore would not let you create a valid iallocator request.

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


Merge branch 'stable-2.6' into devel-2.6 · fa0003dc

Iustin Pop authored 12 years ago


* stable-2.6:
  Fix warnings/errors with newer pylint
  Fix decorator uses which crash newer pylint

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>


Sep 03, 2012

Fix warnings/errors with newer pylint · 8ad0da1e

Iustin Pop authored 12 years ago


To help develop Ganeti on newer distributions, let's try to fix
pylint warnings/errors. I'm using pylint from current Debian wheezy:
pylint 0.25.1, astng 0.23.1, common 0.58.0, and we have three things that
need fixing.

First, a really wide "except", with the silencing in the wrong
place. I'm not sure why this doesn't have "except Exception", so let's
add it. However, pylint still complains about "Catching too general
exception", even though we do want to catch both system exceptions and
our own, so let's add a silence for W0703. It's true that we
shouldn't catch KeyboardInterrupt and friends, but that should be
cleaned up on the master branch.
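
The first point can be illustrated with a generic sketch. The wrapper is
hypothetical and not the code touched by this commit:

```python
def run_safely(fn):
    """Call fn and report failures as (False, message) instead of raising."""
    try:
        return (True, fn())
    except Exception as err:  # pylint: disable=W0703
        # "except Exception" does not catch KeyboardInterrupt or
        # SystemExit, which derive from BaseException.
        return (False, str(err))
```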

Second, pylint complains about "redefining name builtin tuple",
because we do some pattern matching in the except blocks in
netutils. This seems to be a false positive, but let's clean the code
around this.

And finally, type inference again goes bad, so let's silence E1103
with its "boolean doesn't have 'get' method".

After this, I can run "make lint", and by extension "make
commit-check" on Debian Wheezy, yay! We might be able to bump our
required pylint versions to something not ancient…

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>


Fix decorator uses which crash newer pylint · fc3f75dd

Iustin Pop authored 12 years ago


Pylint version:

  pylint 0.25.1,
  astng 0.23.1, common 0.58.0

crashes when passing the fully-qualified decorator name with:

  File "/usr/lib/pymodules/python2.7/pylint/checkers/base.py", line 161, in visit_function
    if not redefined_by_decorator(node):
  File "/usr/lib/pymodules/python2.7/pylint/checkers/base.py", line 116, in redefined_by_decorator
    decorator.expr.name == node.name):
AttributeError: 'Getattr' object has no attribute 'name'

I found out that simply using a shortened name will 'fix' this issue,
so let's do this to allow running newer pylint versions.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>


Instance autorepair design · 68640987

Guido Trotter authored 12 years ago


This design describes a tool that will perform automatic repairs on
instances when they are detected to be unhealthy (living on offline or
drained nodes, at the moment). These repairs can be scheduled
automatically or requested as a one-off by a tool or person.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


Aug 27, 2012

Merge branch 'stable-2.6' into devel-2.6 · b5df6331

Iustin Pop authored 12 years ago


* stable-2.6:
  Make stable-2.6 compatible with newer pep8
  Fix computation of disk sizes in _ComputeDiskSize
  Add verification of RPC results in _WipeDisks
  Add test for checking that all gnt-* subcommands run OK
  Fix double use of PRIORITY_OPT in gnt-node migrate
  Add new Makefile target to rebuild the whole dist

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>


Make stable-2.6 compatible with newer pep8 · 2fefc557

Iustin Pop authored 12 years ago


This is done so that all current branches can run with newer pep8;
note that instead of fixing the problems (like I did on master), I've
just silenced more. These should *not* be merged onto master!

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>


Aug 22, 2012

Fix computation of disk sizes in _ComputeDiskSize · 6a3166cb

Constantinos Venetsanopoulos authored 12 years ago


Currently, hail fails with FailDisk when trying to add an instance
of type: 'file', 'sharedfile' and 'rbd'.

This is due to a "0" or None value in the corresponding dict inside
_ComputeDiskSize, which results in a "0" or non-Int value of the
exported 'disk_space_total' parameter. This in turn makes hail fail,
when trying to process the value:

 - with "Unable to read Int" if value is None (file)
 - with FailDisk if value is 0 (sharedfile, rbd)

The latter happens because the 0 value doesn't match the instance's
IPolicy, since it is lower than the minimum disk size.

The second problem still exists when using adoption with 'plain'
and 'blockdev' template and will be addressed in another commit.
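
A simplified sketch of such a size table, with the three affected templates
reporting the sum of their disk sizes. The template names as plain strings,
the "size" key and the DRBD metadata overhead are assumptions, and the real
fix may differ in detail:

```python
DRBD_META_SIZE = 128  # assumed per-disk metadata overhead, in MiB


def compute_disk_size(disk_template, disks):
    """Return the total space an instance's disks need (None if diskless)."""
    total = sum(d["size"] for d in disks)
    sizes = {
        "diskless": None,
        "plain": total,
        "drbd": total + DRBD_META_SIZE * len(disks),
        # Described above as previously yielding None or 0.
        "file": total,
        "sharedfile": total,
        "rbd": total,
    }
    if disk_template not in sizes:
        raise ValueError("unknown disk template: %r" % disk_template)
    return sizes[disk_template]
```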

Signed-off-by: Constantinos Venetsanopoulos <cven@grnet.gr>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


Aug 15, 2012

Add verification of RPC results in _WipeDisks · f08e5132

Iustin Pop authored 12 years ago


Due to an oversight, the pause/resume sync RPC calls in _WipeDisks
lack the verification of the overall RPC status, and directly iterate
over the payload. The code actually doing the wipe does verify
the results correctly. This can result in jobs failing with a
hard-to-diagnose error:

OpExecError ['NoneType' object is not iterable]

instead of a proper "RPC failed" message.

This patch adds a hard check on the pause call, but for the resume
call it just logs a warning if the RPC failed; the rationale being
that if we can't contact the node for pausing the sync, it's likely
wiping will fail too, but after the wipe has been done, we can
continue.
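
The asymmetric handling can be sketched as follows. RpcResult and the
callables are hypothetical stand-ins, not the cmdlib or rpc API:

```python
import logging


class RpcResult:
    """Stand-in for an RPC result: either a payload or a failure message."""

    def __init__(self, payload=None, fail_msg=None):
        self.payload = payload
        self.fail_msg = fail_msg


def wipe_with_paused_sync(pause_sync, wipe, resume_sync):
    """Pause sync (hard check), wipe, then resume sync (warning only)."""
    result = pause_sync()
    if result.fail_msg:
        # If the node cannot be reached now, wiping would likely fail too.
        raise RuntimeError("RPC failed while pausing disk sync: %s"
                           % result.fail_msg)
    wipe()
    result = resume_sync()
    if result.fail_msg:
        # The wipe is already done, so only warn and carry on.
        logging.warning("RPC failed while resuming disk sync: %s",
                        result.fail_msg)
        return False
    return True
```

Checking fail_msg before touching the payload is what avoids iterating over
None when the whole RPC call failed.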

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>


Aug 10, 2012

Add test for checking that all gnt-* subcommands run OK · b2631ce4

Iustin Pop authored 12 years ago


This is a bit of shell-munging trickery, but it works for now. Making
it more generic can be done later.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>


Fix double use of PRIORITY_OPT in gnt-node migrate · 7db596df

Iustin Pop authored 12 years ago


This breaks the command, as optparse considers that an error.
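
The optparse behaviour can be reproduced in isolation. The option definition
below is illustrative and is not copied from the Ganeti sources:

```python
import optparse

# Illustrative stand-in for a shared option object such as PRIORITY_OPT.
PRIORITY_OPT = optparse.make_option("--priority", dest="priority",
                                    default="normal")


def build_parser(options):
    """Build a parser from a list of pre-made Option objects."""
    parser = optparse.OptionParser()
    for opt in options:
        parser.add_option(opt)
    return parser


def is_conflicting(options):
    """Return True if optparse rejects the option list as conflicting."""
    try:
        build_parser(options)
    except optparse.OptionConflictError:
        return True
    return False
```

Listing the same option twice makes is_conflicting return True, since the
second add_option call sees "--priority" as already registered.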

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>


Aug 09, 2012

Add new Makefile target to rebuild the whole dist · 7213dded

René Nussbaumer authored 12 years ago


Because of how the automake system works, distcheck doesn't rebuild
files that are already prebuilt. This led to a bug where a rebuild of the
documentation was failing because we had not noticed that the files were
missing from the archive.

Adding distrebuildcheck works around that issue by running a
maintainer-clean, which also removes prebuilt files.

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


Aug 08, 2012

rapi client: accept arbitrary shutdown arguments · b8481ebf

Guido Trotter authored 12 years ago


The "ignore_offline_nodes" parameter is unsupported. Rather than
explicitly adding it, just pass all keyword arguments in the body of
the query, and rapi on the other side will do the right thing.

Support for old arguments that were passed via the query is unchanged.
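
A sketch of the pass-through idea. The function below only builds a request
description and is not the rapi client method; treating dry_run as one of
the old query arguments is an assumption:

```python
def build_shutdown_request(instance, dry_run=False, **kwargs):
    """Return (method, path, query, body) for an instance shutdown call."""
    query = []
    if dry_run:
        # Assumed "old" argument that keeps travelling in the query string.
        query.append(("dry-run", 1))
    path = "/2/instances/%s/shutdown" % instance
    # Any other keyword argument goes into the body unchanged, so a new
    # server-side parameter needs no client change.
    return ("PUT", path, query, dict(kwargs))
```

For example, build_shutdown_request("inst1", ignore_offline_nodes=True)
places ignore_offline_nodes in the body without the client knowing about it.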

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


Handle offline nodes for "instance down" checks · 1b9690aa

Guido Trotter authored 12 years ago


When offlining an instance because its primary node is down, we must be
able to cope with the situation.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


Add missing rst files to Makefile.am · a47dc554

René Nussbaumer authored 12 years ago


Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


Jul 27, 2012

Release version 2.6.0 (final) · d60d189a

Iustin Pop authored 12 years ago


Phew, it wasn't easy, but…

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>
