- Oct 30, 2012
-
-
Iustin Pop authored
Commit 2c0af7da, which added the runtime memory change functionality, had a small typo (wrong name); I've rewritten this to compute the delta only once, for simplicity. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
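For illustration, a minimal standalone sketch of the "compute the delta once" refactor (hypothetical names, not the actual Ganeti code):

```python
def check_memory_growth(current_mem, target_mem, free_mem):
    # Compute the delta a single time and reuse it, instead of
    # re-deriving it (and risking a typo'd name) at each use site.
    delta = target_mem - current_mem
    if delta > 0 and delta > free_mem:
        raise ValueError("Not enough free memory to grow by %d MiB" % delta)
    return delta
```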
-
Iustin Pop authored
This patch makes _RemoveDisks symmetric to _CreateDisks with respect to file-based storage: _CreateDisks uses "in constants.DTS_FILEBASED", whereas _RemoveDisks was never updated and still uses "== constants.DT_FILE". This results in stale directories being left on the filesystem. Fixes issue 262. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
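A self-contained sketch of the membership test (stand-in constants, not Ganeti's real definitions):

```python
DT_FILE = "file"
DT_SHARED_FILE = "sharedfile"
DTS_FILEBASED = frozenset([DT_FILE, DT_SHARED_FILE])

def uses_file_storage(disk_template):
    # Old, asymmetric check in _RemoveDisks: disk_template == DT_FILE,
    # which skipped directory cleanup for shared-file instances.
    # New check, symmetric with _CreateDisks:
    return disk_template in DTS_FILEBASED
```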
-
Iustin Pop authored
Currently, the warning/notice about non-redundant instances in cluster verify is based on a non-empty secondaries list (how old is this?); the proper way to check this nowadays is via DTS_MIRRORED. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
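The shape of the check, sketched with stand-in constants (Ganeti's real DTS_MIRRORED covers the DRBD and externally mirrored templates):

```python
DT_DRBD8 = "drbd"
DT_RBD = "rbd"
DTS_MIRRORED = frozenset([DT_DRBD8, DT_RBD])  # stand-in; the real set is larger

def instance_is_redundant(disk_template):
    # Old heuristic: a non-empty secondaries list, which only fires for
    # DRBD-style templates and misses externally mirrored storage.
    return disk_template in DTS_MIRRORED
```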
-
- Oct 19, 2012
-
-
Iustin Pop authored
In Ganeti 2.6, disk adoption is broken due to the ipolicy checks being done before we read the volume size from the remote nodes. We fix this by simply moving these checks to after the disk adoption code, which updates the disk size; it's not that nice that we fail an (almost) config-level check after we've reserved the LVs, etc., but we need to do so in order to validate the ipolicy correctly. Tested:
- normal instance creation
- creation via adoption with good size (pass)
- creation via adoption with wrong LV size (fails as expected)
- QA in progress
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Oct 03, 2012
-
-
Michael Hanselmann authored
People unfamiliar with Ganeti's internals might be confused by the different hostnames showing up later in the process. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Bernardo Dal Seno <bdalseno@google.com>
-
Michael Hanselmann authored
This fixes issue 190. The problem was that the check for DRBD was enabled whenever LVM storage was used, and didn't depend at all on whether DRBD was enabled. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com> (cherry picked from commit 3d8ae327)
-
- Sep 27, 2012
-
-
Bernardo Dal Seno authored
If an administrator sets always_failover, it means that there is no need for another explicit approval to fail over instead of migrating. Signed-off-by:
Bernardo Dal Seno <bdalseno@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com> (cherry picked from commit b5f0b5cc) Signed-off-by:
Bernardo Dal Seno <bdalseno@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Sep 12, 2012
-
-
Michael Hanselmann authored
When locks are copied, “locking.ALL_SET” must be handled separately (ALL_SET has the value None). Reported by Constantinos Venetsanopoulos, who saw failover for RBD-based instances not working. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
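The gist of the fix, sketched (since the sentinel really is None, a naive copy via list() raises TypeError):

```python
ALL_SET = None  # sentinel meaning "all locks", as in Ganeti's locking module

def copy_lock_list(names):
    # Handle the sentinel separately instead of iterating over it:
    # list(None) raises "TypeError: 'NoneType' object is not iterable".
    if names is ALL_SET:
        return ALL_SET
    return list(names)
```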
-
- Sep 04, 2012
-
-
René Nussbaumer authored
There was an issue with the recent ipolicy introduction which led to a bug in gnt-debug iallocator: it did not provide the spindle_use field and therefore wouldn't let you create a valid iallocator request. Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
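Illustrative excerpt of an iallocator request dict; the point is only that the new ipolicy field has to be present (the values here are made up):

```python
request = {
    "name": "instance1.example.com",
    "disk_template": "plain",
    "memory": 1024,
    "spindle_use": 1,  # the field gnt-debug iallocator failed to provide
    # ... remaining request fields elided ...
}
```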
-
- Sep 03, 2012
-
-
Iustin Pop authored
To help with developing Ganeti on newer distributions, let's try to fix pylint warnings/errors. I'm using pylint from current Debian wheezy (pylint 0.25.1, astng 0.23.1, common 0.58.0), and we have 3 things that need fixing. First, a really wide "except", with the silencing in the wrong place. I'm not sure why this doesn't have "except Exception", so let's add it. However, pylint still complains about "Catching too general exception", even though we do want to catch both system exceptions and our own, so let's add a silence for W0703. It's true that we shouldn't catch KeyboardInterrupt and friends, but that should be cleaned up on the master branch. Second, pylint complains about "redefining name builtin tuple", because we do some pattern matching in the except blocks in netutils. This seems to be a false positive, but let's clean up the code around this. And finally, type inference again goes bad, so let's silence E1103 with its "boolean doesn't have 'get' method". After this, I can run "make lint", and by extension "make commit-check", on Debian Wheezy, yay! We might be able to bump our required pylint version to something not ancient… Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
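For reference, the first fix's pattern looks roughly like this (a self-contained example, not the actual Ganeti code):

```python
import logging

def main():
    raise RuntimeError("boom")

try:
    main()
except Exception:  # pylint: disable=W0703
    # Deliberately broad: we want both system exceptions and Ganeti's own
    # here; the W0703 silence has to sit on the except line itself.
    logging.exception("Unhandled error")
```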
-
- Aug 22, 2012
-
-
Constantinos Venetsanopoulos authored
Currently, hail fails with FailDisk when trying to add an instance of type 'file', 'sharedfile' or 'rbd'. This is due to a "0" or None value in the corresponding dict inside _ComputeDiskSize, which results in a "0" or non-Int value of the exported 'disk_space_total' parameter. This in turn makes hail fail when trying to process the value:
- with "Unable to read Int" if the value is None (file)
- with FailDisk if the value is 0 (sharedfile, rbd)
The latter happens because the 0 value doesn't match the instance's IPolicy, since it is lower than the minimum disk size. The second problem still exists when using adoption with the 'plain' and 'blockdev' templates and will be addressed in another commit. Signed-off-by:
Constantinos Venetsanopoulos <cven@grnet.gr> Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
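A stand-in for the failing lookup (illustrative values, not the real _ComputeDiskSize table), showing how the bad entries propagate into 'disk_space_total':

```python
_DISK_SIZE_BY_TEMPLATE = {
    "plain": 1024,
    "drbd": 1024 + 128,  # say, data plus metadata overhead
    "file": None,        # -> hail: "Unable to read Int"
    "sharedfile": 0,     # -> hail: FailDisk (0 is below the ipolicy minimum)
    "rbd": 0,            # -> hail: FailDisk
}

def compute_disk_space_total(disk_template):
    return _DISK_SIZE_BY_TEMPLATE[disk_template]
```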
-
- Aug 15, 2012
-
-
Iustin Pop authored
Due to an oversight, the pause/resume sync RPC calls in _WipeDisks lack verification of the overall RPC status and directly iterate over the payload (the code actually doing the wipe does verify the results correctly). This can result in jobs failing with a hard-to-diagnose: OpExecError ['NoneType' object is not iterable] instead of a proper "RPC failed" message. This patch adds a hard check on the pause call, but for the resume call it just logs a warning if the RPC failed; the rationale being that if we can't contact the node to pause the sync, wiping will likely fail too, but after the wipe has been done, we can continue. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
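A sketch of the asymmetric handling (function and RPC names here are illustrative; Ganeti's RPC results do expose Raise() and fail_msg):

```python
import logging

def wipe_instance_disks(rpc_runner, node, disks):
    # Hard check: if we can't even pause the sync, wiping would fail too.
    result = rpc_runner.call_pause_sync(node, disks, True)
    result.Raise("Failed to pause disk sync on node %s" % node)
    try:
        pass  # ... wipe each disk here, verifying each per-disk result ...
    finally:
        # Soft check: the wipe has already been done, so only warn.
        result = rpc_runner.call_pause_sync(node, disks, False)
        if result.fail_msg:
            logging.warning("Failed to resume disk sync on node %s: %s",
                            node, result.fail_msg)
```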
-
- Jul 26, 2012
-
-
Iustin Pop authored
In case LUClusterVerifyGroup is run on a group which doesn't contain the master node, the following could happen:
- the master node is selected due to the explicit check
- if the order of nodes in the 'absent_nodes' list is such that the master node is the first in it, then we'll select (again) the master node
- passing duplicate nodes to RPC calls will break due to RPC internals; this should be fixed separately, but in the meantime we just refrain from passing such duplicates
This patch should not change the semantics of the code, since it wasn't guaranteed even before that we find a vm_capable node. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Bernardo Dal Seno <bdalseno@google.com>
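A sketch of the duplicate avoidance (hypothetical helper; the point is just to skip the already-selected master):

```python
def pick_additional_node(master_name, absent_node_names):
    # Never return the master again: duplicate node names in a single
    # RPC call break due to RPC internals.
    for name in absent_node_names:
        if name != master_name:
            return name
    return None  # a vm_capable node was never guaranteed anyway
```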
-
Iustin Pop authored
Commit 904b3bfe tried to fix the deletion of custom ndparams from groups, but instead broke both modification and deletion: because we run ForceDictType on self.op.ndparams instead of the updated new_ndparams, we can neither delete nor properly set spindle_count (since it won't be coerced to int). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
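The corrected order of operations, as a self-contained sketch (using None as a stand-in "delete this key" marker and inlining the int coercion that ForceDictType performs):

```python
def updated_ndparams(group_ndparams, op_ndparams):
    new_ndparams = dict(group_ndparams)
    for key, value in op_ndparams.items():
        if value is None:  # deletion request
            new_ndparams.pop(key, None)
        else:
            new_ndparams[key] = value
    # Type-check the *merged* dict; the bug ran this on op_ndparams,
    # so "spindle_count" was never coerced to int.
    if "spindle_count" in new_ndparams:
        new_ndparams["spindle_count"] = int(new_ndparams["spindle_count"])
    return new_ndparams
```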
-
- Jul 19, 2012
-
-
René Nussbaumer authored
On node groups we don't have the std field. However, the InstancePolicy object always verifies that the std value is within a given range. As we fill it up with defaults if not set (as happens on node groups), and the min value is higher than the default std value (taken from constants.py), we fail. We overcome this situation by simply letting the function know whether we want to verify the std value at all. If we don't want to verify std, we just set it to a compliant value (min_v) and continue. We also slightly adapt the error message provided, as we don't have std values on groups. Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
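The shape of the change, sketched (parameter names assumed):

```python
def check_parameter_bounds(name, min_v, max_v, std_v, check_std=True):
    if not check_std:
        # Groups carry no std value; force it to something compliant
        # so the range check below can't fail on it.
        std_v = min_v
    if not (min_v <= std_v <= max_v):
        raise ValueError("Invalid std value for %s: %s not in range [%s, %s]"
                         % (name, std_v, min_v, max_v))
```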
-
- Jul 13, 2012
-
-
René Nussbaumer authored
Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Agata Murawska <agatamurawska@google.com>
-
- Jul 11, 2012
-
-
Agata Murawska authored
When we delete DRBD disks from some instance, we do not want to get errors due to nodes other than that instance's primary being offline. Signed-off-by:
Agata Murawska <agatamurawska@google.com> Reviewed-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
- Jul 07, 2012
-
-
Iustin Pop authored
Currently, this is not allowed, so one can't run a replace-disks; this breaks any non-invasive method of recovering the redundancy of the instance if its disks are already stopped (but it still works if the disks on the primary are active). So let's fix this inconsistency. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
- Jul 05, 2012
-
-
Iustin Pop authored
Currently, _RedistributeAncillaryFiles computes two lists: the list of online nodes (for redistribution of all files), and the list of vm_capable nodes (for hypervisor-specific files). However, the vm_capable list includes offline nodes too, leading to warning messages:
WARNING: Copy of file /etc/xen/xend-config.sxp to node node13.example.com failed: Node is marked offline
We fix this by trivially intersecting the vm_capable list with the online one. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Bernardo Dal Seno <bdalseno@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
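The intersection itself is a one-liner; a sketch:

```python
def vm_capable_online(online_nodes, vm_capable_nodes):
    # Hypervisor-specific files go only to nodes that are both
    # vm_capable *and* online, avoiding "Node is marked offline" warnings.
    online = frozenset(online_nodes)
    return [node for node in vm_capable_nodes if node in online]
```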
-
René Nussbaumer authored
This error does not show up until we exceed the pool of master candidates and have nodes which are not master candidates. The background is that we check for the master-ip-setup script on master candidates and expect it not to be on the other nodes. However, we distribute a default master-ip-setup script, which breaks this assumption. Furthermore, there's no reason why the file should exist only on the master candidates. Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Jun 27, 2012
-
-
René Nussbaumer authored
Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Agata Murawska <agatamurawska@google.com>
-
René Nussbaumer authored
This annotates the disks for blockdev_remove where it is appropriate. It leaves out 2 cases where we can't reliably annotate disk parameters due to lack of knowledge of what we should annotate. Those cases affect only LVs used for DRBD, so it doesn't affect the bug reported by Constantinos. Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Agata Murawska <agatamurawska@google.com>
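A simplified, dict-based sketch of what "annotating" means here (Ganeti uses proper disk objects; the names are illustrative):

```python
def annotate_disk(disk, template_defaults):
    # Merge the disk-template defaults under the disk's own parameters so
    # the node backend (e.g. RBD) has everything it needs at remove time.
    params = dict(template_defaults)
    params.update(disk.get("params", {}))
    annotated = dict(disk)
    annotated["params"] = params
    return annotated
```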
-
René Nussbaumer authored
This is also related to the bug reported by Constantinos. As we have only one getmirrorstatus_multi call in the whole of cmdlib, we just annotate the disks while we are building the disk list. Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Agata Murawska <agatamurawska@google.com>
-
René Nussbaumer authored
Not annotating them works for DRBD but not for RBD as reported by Constantinos. Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Agata Murawska <agatamurawska@google.com>
-
- Jun 20, 2012
-
-
Iustin Pop authored
_PrepareNicModification returns an invalid type, which triggers an assert and results in a mysterious error: Failure: command execution error: without any explanation. We fix this by removing the return value from _PrepareNicModification, and instead returning the expected type (since it differs between create and modification) from the (existing) wrappers for this function. We don't need to return the actual changes from this function, as _ApplyNicMods is the function that computes/returns the formatted changes. Signed-off-by:
Iustin Pop <iustin@google.com> Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Jun 15, 2012
-
-
René Nussbaumer authored
This prevents setting, for example, drbd options on the plain disk template. Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Jun 14, 2012
-
-
Iustin Pop authored
As reported on the devel mailing list by Christos Stavrakakis, creation of plain instances is broken when the --no-wait-for-sync flag is passed, because in that case WaitForSync is not called, hence SetDiskID is not called at all, resulting in a None physical_id being passed to the backend. We fix this by explicitly calling SetDiskID, which covers the pause/resume and os_add RPC calls. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
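Roughly, the shape of the fix (a sketch; SetDiskID is the 2.6-era config method that fills in physical_id):

```python
def prepare_disks_for_rpc(cfg, instance, wait_for_sync):
    # Set physical IDs explicitly; previously this only happened as a
    # side effect of WaitForSync, which --no-wait-for-sync skips.
    for disk in instance.disks:
        cfg.SetDiskID(disk, instance.primary_node)
    if wait_for_sync:
        pass  # ... WaitForSync would follow here ...
```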
-
- Jun 08, 2012
-
-
Iustin Pop authored
This has been reported internally 3-4 times already, and the current version (from 8b437a6e) is still not good enough, it seems. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Jun 01, 2012
-
-
Iustin Pop authored
Commit 2e04d454 introduced the new offline instance state, but being a big monolithic patch it sneaked in something that doesn't make sense. The checks for extra instances (either wrongly up or just unknown) are done purely on a name basis, not on objects, so the types there are wrong. Furthermore, they have no relation to the admin state of the instance, so we just drop the entire if block. We keep the increment of the offline instance count, but move it to a different loop over instances. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- May 22, 2012
-
-
René Nussbaumer authored
Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Iustin Pop authored
This is adapted from the design doc. Also fixes a typo in cmdlib.py. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- May 15, 2012
-
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This function did the opposite: it was computing which old instances violated the specs but no longer do so now, whereas new - old is the expected behaviour. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
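The intended semantics in one line, sketched as a set difference:

```python
def compute_new_violations(old_violations, new_violations):
    # Report only instances that violate the specs now but didn't before.
    return sorted(set(new_violations) - set(old_violations))
```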
-
Iustin Pop authored
Currently, we only get:
instance3: ['disk-size value 512 is not in range [1024, 1048576]'
which doesn't explain which disk we are talking about. This patch extends the verification functions to take an additional parameter that qualifies the disk:
instance3: ['disk-size/0 value 512 is not in range [1024, 1048576]'
A future patch will improve the formatting of the list. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
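A sketch of the qualified message (names assumed):

```python
def check_disk_size(idx, size, min_size, max_size):
    if not min_size <= size <= max_size:
        # "disk-size/<idx>" pinpoints the offending disk.
        return ["disk-size/%d value %d is not in range [%d, %d]"
                % (idx, size, min_size, max_size)]
    return []
```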
-
- May 14, 2012
-
-
Michael Hanselmann authored
If an instance is imported with a different name, network settings may have to be changed. Since import scripts may not already do the right thing, we decided to run the rename script. The same technique is already used for inter-cluster instance moves. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
René Nussbaumer authored
This was a pretty non-obvious bug: a cluster looks sane after gnt-cluster init, but on a daemon restart the disk parameters had the defaults filled in. The same applies to gnt-group add. This is due to the nature of UpgradeConfig() from NodeGroups, which just populated them with defaults if something was set on it. Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
René Nussbaumer authored
Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- May 11, 2012
-
-
René Nussbaumer authored
Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
René Nussbaumer authored
Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- May 10, 2012
-
-
René Nussbaumer authored
Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-