Commits · 0e632cbdfe2e1e59879e5f2289a805deea470482 · itminedu / snf-ganeti

Nov 08, 2012

http: Add wrapper for mimetools.Message · 0e632cbd

Michael Hanselmann authored 12 years ago


A newly added piece of code will also have to parse headers, so having
this wrapper saves us from copying this part of code.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

0e632cbd

Nov 07, 2012

Add missing tests for commit · 7a70541e

Michael Hanselmann authored 12 years ago


Commit f0d22861 changed the logic of
gnt_instance._ConvertNicDiskModifications to also allow a parameter
named “modify”. Unfortunately the corresponding unittest was not
updated. An “if”/“else” condition is also merged.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

7a70541e

Nov 06, 2012

Fix gnt-instance console with xl · 1f5557ca

Guido Trotter authored 12 years ago


- Rename xm-console-wrapper to xen-console-wrapper
- Pass the xen command to use as a parameter

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

1f5557ca

Disable E1101 on ganeti/http/server.py:424 · 57a6042e

Guido Trotter authored 12 years ago


Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

57a6042e

Fix live migration under xl · 053c356a

Guido Trotter authored 12 years ago


Until now the only way to make live migration work in conjunction with
"xl" was to add ssh known_hosts keys for every node's secondary ip on
every other node.

With this command we remove the target key verification: this is not
worse than what we were doing before with "xm", and allows the migration
to happen under either toolstack, without extra manual work. Of course
the full security of ssh is not used by live migration, then.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

053c356a

Don't check for xend port when using xl · 3135de69

Guido Trotter authored 12 years ago


If the toolstack is set to "xl" we shouldn't ping xend for liveness
before attempting a live migration.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

3135de69

Nov 01, 2012

jqueue: Return jobs to queue when shutting down · 942e2262

Michael Hanselmann authored 12 years ago


When a job is still waiting for locks and the queue is shutting down,
they should be returned and not actually start processing. Until now
jobs which transitioned from “queued” to “waiting” were already
considered to be running as far as the shutdown code was concerned.

This fixes issue 296.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

942e2262

gnt-debug delay: Add "--submit" option · bb600388

Michael Hanselmann authored 12 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

bb600388

Make hostname checks uniform between instance rename and add · 233f4bc6

Iustin Pop authored 12 years ago


Currently, we have instance rename doing extra checks on the host
name, to prevent accidental wrong renames; however, instance create
doesn't do these checks (issue 291), which (if DNS is misconfigured)
can lead to hard to diagnose errors.

This patch abstracts the name checking from LUInstanceRename into a
separate function, which is then reused in both instance rename and
instance create.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>

233f4bc6

Improve logging of new job submissions · 4c91d2ad

Iustin Pop authored 12 years ago


This addresses issue 290: when receiving new jobs, logging is
incomplete, and we don't have the job ID(s) and/or summaries
logged. Only later, when the job is queried for or being processed, we
know more.

This is not good when troubleshooting, so let's improve the initial
logging.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>

4c91d2ad

Improve handling of lock exceptions · a9d40c65

Iustin Pop authored 12 years ago


There are two issues with lock exceptions right now:

- first, we don't log the original error; this is fine for now
  (locking.py always returns the same error here), but in general is
  brittle: if locking.py would start returning more information, we'd
  completely miss that

- second, an actual honest lock conflict is not an internal error;
  it's simply an optimistic lock failing, and as such we should not
  return internal error, but rather resource_not_unique

This addresses issue 287.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>

a9d40c65

Oct 30, 2012

Fix runtime memory increases · 0d324688

Iustin Pop authored 12 years ago


Commit 2c0af7da which added the runtime memory changes functionality
had a small typo (wrong name); I've rewritten this to only compute the
delta once, for simplicity.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

0d324688

Fix validation of vgname in OpClusterSetParams · 8e66b9bf

Iustin Pop authored 12 years ago


This variable can be empty, when we want to disable LVM, so we can't
use TMaybeString.

Fixes issue 285.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

8e66b9bf

Fix removal of storage directory on shared file storage · e5dfc531

Iustin Pop authored 12 years ago


This patch makes _RemoveDisks symmetric to _CreateDisks with respect
to file-based storage: _CreateDisks uses "in constants.DTS_FILEBASED",
whereas _RemoveDisks was not update and only uses "==
constants.DT_FILE". This results in stale directories left on the
filesystem.

Fixes issue 262.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

e5dfc531

Switch non-redundant check to disk template-based · 15361a18

Iustin Pop authored 12 years ago


Currently, the warning/notice about non-redundant instances in cluster
verify is based non empty secondaries list (how old is this?); the
proper way to check this nowadays is via DTS_MIRRORED.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

15361a18

Oct 29, 2012

Add option to force master-failover without voting · 6022a419

Iustin Pop authored 12 years ago


This fixes issue 282.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>

6022a419

Oct 26, 2012

Update instance modify message · d976957d

Iustin Pop authored 12 years ago


Currently the message does not say explicitly that instance-initiated
reboots are useless to trigger the use of new parameters, per the
thread on the user mailing list. Let's improve it a bit.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michele Tartara <mtartara@google.com>

d976957d

Oct 19, 2012

Fix disk adoption interaction with ipolicy checks · ba147ff8

Iustin Pop authored 12 years ago


In Ganeti 2.6, disk adoption is broken due to the ipolicy checks being
done before we read volume size from remote nodes. We fix this by
simply moving these checks to after the disk adoption code which
updates the disk size; it's not that nice that we fail a (almost)
config-level check after we've reserved the LVs, etc., but we need to
do so in order to validate the ipolicy correctly.

Tested:

- normal instance creation
- creation via adoption with good size (pass)
- creation via adoption with wrong LV size (fail as expected)
- QA in progress

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

ba147ff8

Oct 11, 2012

verify-disks: Explicitely state nothing has to be done · 9b99be28

Michael Hanselmann authored 12 years ago


Example output:
$ gnt-cluster verify-disks
Submitted jobs 4327
Waiting for job 4327 ...
No disks need to be activated.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

9b99be28

Oct 05, 2012

Better list of replace-disks arguments + typos fixed · 50c1e351

Bernardo Dal Seno authored 12 years ago


The man page and the bultin-in help for gnt-instance replace-disks were
inconsistent. Also fixed some typos in man pages.

Signed-off-by: Bernardo Dal Seno <bdalseno@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

50c1e351

jqueue: Look at archived jobs when watching · e4cf42d4

Michael Hanselmann authored 12 years ago


First: This enables the use of “gnt-job watch $id” for archived jobs.

Now, the reason for actually making this work is that during
sufficiently large group or node evacuations jobs are archived before
the client gets to poll for their output. This led to situations where
the jobs would finish successfully, but the client reported an error
because it couldn't see the job anymore.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>
(cherry picked from commit 04569469)

e4cf42d4

Oct 03, 2012

Show old primary/secondary node on disk replacement · f0f8d060

Michael Hanselmann authored 12 years ago


People unfamiliar with Ganeti's internals might be confused with the
different hostnames showing up later in the process.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>

f0f8d060

gnt-instance reinstall: Don't always exit with success · 64be07b1

Michael Hanselmann authored 12 years ago


If one or more jobs failed the exit status should be set accordingly.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>

64be07b1

LUClusterVerify: Ignore /proc/drbd if DRBD is disabled · 2ef3383e

Michael Hanselmann authored 12 years ago


This fixes issue 190. The problem was that the check for DRBD was
enabled if LVM storage is used and didn't depend at all on whether DRBD
is enabled.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
(cherry picked from commit 3d8ae327)

2ef3383e

Sep 27, 2012

Always_failover doesn't require --allow-failover anymore · 320a5dae

Bernardo Dal Seno authored 12 years ago


If an administrator sets always_failover, it means that there is no need
for another explicit approval to failover instead of migrating.

Signed-off-by: Bernardo Dal Seno <bdalseno@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
(cherry picked from commit b5f0b5cc)

Signed-off-by: Bernardo Dal Seno <bdalseno@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

320a5dae

Sep 12, 2012

rpc: Remove duplicated logic, fix unittests · 0e2b7c58

Michael Hanselmann authored 12 years ago


Commit 5fce6a89 changed RpcRunner._InstDict to add the disk parameters
on all encoded instances. It didn't remove a special case in
“_InstDictOspDp”. Update and fix unittests as well.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

0e2b7c58

Annotate disk params on instance_start · 5fce6a89

Constantinos Venetsanopoulos authored 12 years ago


We call _GatherAndLinkBlockDevs during the process, which in turn
calls _RecursiveFindBD. This needs disk parameters to work.

See also commit b8291e00.

This was reported by Ansgar and Damien.

Signed-off-by: Constantinos Venetsanopoulos <cven@grnet.gr>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

5fce6a89

cmdlib: Handle locking.ALL_SET correctly when copying locks · ef86bf28

Michael Hanselmann authored 12 years ago


When locks are copied “locking.ALL_SET” must be handled separately
(ALL_SET has the value None). Reported by Constantinos Venetsanopoulos
who saw failover for RDB-based instances not working.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

ef86bf28

Sep 04, 2012

Fix gnt-debug iallocator · 09123222

René Nussbaumer authored 12 years ago


There was an issue with the recent ipolicy introduction which lead to a
bug in gnt-debug iallocator. It was not providing the spindle_use field
and therefore it wont let you create a valid iallocator request.

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

09123222

Sep 03, 2012

Fix warnings/errors with newer pylint · 8ad0da1e

Iustin Pop authored 12 years ago


To help developing Ganeti on newer distributions, let's try to fix
pylint warnings/errors. I'm using pylint from current Debian wheezy:
pylint 0.25.1, astng 0.23.1, common 0.58.0, and we have 3 things that
needs fixing.

First, a really wide "except", with the silencing in the wrong
place. I'm not sure why this doesn't have "except Exception", so let's
add it. However, pylint still complains about "Catching too general
exception", even though we do want to catch both system and our
exception, so let's add a silence for W0703. It's true that we
shouldn't catch KeyboardInterrupt and friends, but that should be
cleaned up on the master branch.

Second, pylint complains about "redefining name builtin tuple",
because we do some pattern matching in the except blocks in
netutils. This seems to be a false positive, but let's clean the code
around this.

And finally, type inference again goes bad, so let's silence E1103
with its "boolean doesn't have 'get' method".

After this, I can run "make lint", and by extension "make
commit-check" on Debian Wheezy, yay! We might be able to bump our
required pylint versions to something not ancient…

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

8ad0da1e

Fix decorator uses which crash newer pylint · fc3f75dd

Iustin Pop authored 12 years ago


Pylint version:

  pylint 0.25.1,
  astng 0.23.1, common 0.58.0

crashes when passing the fully-qualified decorator name with:

  File "/usr/lib/pymodules/python2.7/pylint/checkers/base.py", line 161, in visit_function
    if not redefined_by_decorator(node):
  File "/usr/lib/pymodules/python2.7/pylint/checkers/base.py", line 116, in redefined_by_decorator
    decorator.expr.name == node.name):
AttributeError: 'Getattr' object has no attribute 'name'

I found out that simply using a shortened name will 'fix' this issue,
so let's do this to allow running newer pylint versions.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

fc3f75dd

Aug 22, 2012

Fix computation of disk sizes in _ComputeDiskSize · 6a3166cb

Constantinos Venetsanopoulos authored 12 years ago


Currently, hail fails with FailDisk when trying to add an instance
of type: 'file', 'sharedfile' and 'rbd'.

This is due to a "0" or None value in the corresponding dict inside
_ComputeDiskSize, which results in a "O" or non Int value of the
exported 'disk_space_total' parameter. This in turn makes hail fail,
when trying to process the value:

 - with "Unable to read Int" if value is None (file)
 - with FailDisk if value is 0 (sharedfile, rbd)

The latter happens because the 0 value doesn't match the instance's
IPolicy, since it is lower than the minimum disk size.

The second problem still exists when using adoption with 'plain'
and 'blockdev' template and will be addressed in another commit.

Signed-off-by: Constantinos Venetsanopoulos <cven@grnet.gr>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

6a3166cb

Aug 15, 2012

Add verification of RPC results in _WipeDisks · f08e5132

Iustin Pop authored 12 years ago


Due to an oversight, the pause/resume sync RPC calls in _WipeDisks
lack the verification of the overall RPC status, and directly iterate
over the payload. The code actually doing the wipe does verify
correctly the results. This can result in jobs failing with a hard to
diagnose:

OpExecError ['NoneType' object is not iterable]

instead of proper "RPC failed" message.

This patch adds a hard check on the pause call, but for the resume
call it just logs a warning if the RPC failed; the rationale being
that if we can't contact the node for pausing the sync, it's likely
wiping will fail too, but after the wipe has been done, we can
continue.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

f08e5132

Aug 10, 2012

Fix double use of PRIORITY_OPT in gnt-node migrate · 7db596df

Iustin Pop authored 12 years ago


This breaks the command, as optparse considers that an error.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

7db596df

Jul 27, 2012

Fix 'explicitely' common typo · 2ed0e208

Iustin Pop authored 12 years ago


It seems that 'explicitely' is wrong, and that the right form is
'explicitly'. This is just fixing the typo plus adjusting affected
paragraphs.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

2ed0e208

Jul 26, 2012

Fix issue in LUClusterVerifyGroup with multi-group clusters · 350506c6

Iustin Pop authored 12 years ago


In case LUClusterVerifyGroup is run on a group which doesn't contain
the master node, the following could happen:

- master node is selected due to the explicit check
- if the order of nodes in the 'absent_nodes' list is such that the
  master node is the first in it, then we'll select (again) the master
  node
- passing duplicate nodes to RPC calls will break due to RPC
  internals; this should be fixed separately, but in the meantime we
  just refrain from passing such duplicates

This patch should not change the semantics of the code, since it
wasn't guaranteed even before that we find a vm_capable node.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>

350506c6

Fix node group modification of node parameters · 4bf27dab

Iustin Pop authored 12 years ago


Commit 904b3bfe tried to fix the deletion of custom ndparams from
group, but instead broke both modification and deletion: because we
run ForceDictType on self.op.ndparams instead of the updated
new_ndparams, we can neither delete nor set properly spindle_count
(since it won't be coerced to int).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

4bf27dab

Jul 24, 2012

Fix boot=on flag for CDROMs · 24be50e0

Iustin Pop authored 12 years ago


This generalises commit 4304964a to cdroms too, since they have
somewhat the same logic. We just abstract the needs_boot_flag into a
separate variable, and then reuse it in the cdrom section.

Note that the logic of what 'if=' type to pass to KVM was very
convoluted, and (I think) incorrect; I went and cleaned it to be more
consistent.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

24be50e0

KVM: only pass boot flag once · 2b846304

Iustin Pop authored 12 years ago


This addresses issue 230: passing two methods of booting to KVM can,
depending on the KVM version, confuse it.

Note that commit 4304964a introduced a partial fix for this (but only
for disks, and keyed on KVM versions). However, it didn't fix cdrom
booting, which still fails with the same error, so let's fix it more
generically; we still leave the per-disk check since that is about
-boot c versus -drive …,boot=on rather than two boot methods.

Patch is based on the one submitted by Vladimir Mencl, many thanks!

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

2b846304

Jul 19, 2012

Fix setting ipolicy on node groups · 8b057218

René Nussbaumer authored 12 years ago


On node groups we don't have the std field. However, the InstancePolicy
object always verifies that the std value is within a given range. As we
fill it up with defaults if not set (as it happens to be on node groups)
and the min value is higher than the default std value (taken from
constants.py) we fail.

We overcome this situation by simply let the function know if we want to
verify the std value at all. If we don't want to verify std, we just set
it to a compliant value (min_v) and continue.

We also slightly adapt the error message provided, as we don't have std
values on groups.

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

8b057218