Commits · 563725736204c55de6bcc0e4e149d89a45854aa2 · itminedu / snf-ganeti

May 20, 2011

Update hooks.rst for cluster verify changes · 56372573

Guido Trotter authored 13 years ago


Also update NEWS on this change.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

56372573

Fix a couple of style mistakes · e0508c86

Guido Trotter authored 13 years ago


Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

e0508c86

Cluster verify: accept a --node-group option · 40167d65

Adeodato Simo authored 13 years ago


This will trigger a ClusterVerifyGroup operation only on the specified
group, skipping other groups as well as cluster-wide verifications.

Signed-off-by: Adeodato Simo <dato@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

40167d65

Cluster verify: check for nodes/instances with no group · adfa3b26

Adeodato Simo authored 13 years ago


Previously, all nodes and instances would *always* be visited/verified. By
driving the verification by node group now, we will miss nodes and
instances that can't be reached from existing node groups, should that rare
and bogus circumstance ever occur.

We safeguard against that by checking for unreachable nodes and instances
explicitly. (These will not be further verified.)

Signed-off-by: Adeodato Simo <dato@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

adfa3b26
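The safeguard described in adfa3b26 can be sketched roughly as follows. This is a minimal illustration, not the actual Ganeti code: `find_dangling` and its plain-dict arguments are hypothetical stand-ins for the cluster configuration, and treating an instance with an unknown primary node as dangling is an assumption.

```python
def find_dangling(node_groups, node_to_group, inst_to_primary):
    """Return (nodes, instances) that no existing node group can reach.

    All arguments are hypothetical stand-ins for the cluster config:
      node_groups     -- iterable of known group identifiers
      node_to_group   -- dict: node name -> group identifier
      inst_to_primary -- dict: instance name -> primary node name
    """
    known = set(node_groups)
    # A node is dangling when its group is not one of the known groups.
    dangling_nodes = sorted(n for n, g in node_to_group.items() if g not in known)
    bad = set(dangling_nodes)
    # Assumption: an instance is dangling when its primary node is dangling
    # or is not a known node at all.
    dangling_insts = sorted(
        inst for inst, pnode in inst_to_primary.items()
        if pnode in bad or pnode not in node_to_group
    )
    return dangling_nodes, dangling_insts
```

Anything returned here would only be reported, since (as the message says) such nodes and instances are not verified further.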

Cluster verify: fix LV checks for split instances · fe870648

Adeodato Simo authored 13 years ago


When sharding by group, if a mirrored instance is split (primary and
secondary) between two groups, its volumes will not be properly checked:
the group of the primary will warn about a missing volume in the secondary,
and the group of the secondary about an unknown volume (in the secondary as
well).

To solve the "missing volumes" bit, we will detect this case and perform an
extra RPC verify call to these split secondaries (querying only for
NV_LVLIST), and introduce the results in the node images appropriately. We
do this detection early in ExpandNames/CheckPrereq, so as to properly lock
the extra nodes.

As for the "unknown volumes" warning in the secondary, we update the volume
mapping with split instances before checking for orphaned volumes.

Finally, we mark nodes as "ghost" only if they really don't exist in the
cluster configuration, which avoids spurious "instance lives in ghost node"
warnings.

Signed-off-by: Adeodato Simo <dato@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

fe870648
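The detection step from fe870648 might look like the sketch below. It only computes which out-of-group secondaries would need the extra NV_LVLIST-only call (and locking). The function name and the dict-based instance records are assumptions made for illustration.

```python
def split_secondaries(group_nodes, instances):
    """Return secondary nodes outside the group for in-group primaries.

    group_nodes -- names of the nodes in the group being verified
    instances   -- hypothetical records: {"primary": str, "secondaries": [str]}
    """
    group_nodes = set(group_nodes)
    extra = set()
    for inst in instances:
        # Only instances whose primary is verified by this group matter.
        if inst["primary"] in group_nodes:
            extra.update(s for s in inst["secondaries"] if s not in group_nodes)
    return sorted(extra)
```

For a group made of `n1` and `n2`, an instance with primary `n2` and secondary `n3` yields `n3` as an extra node to query.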

Cluster verify: make NV_NODELIST smaller · 2dad1652

Adeodato Simo authored 13 years ago


To cope with increasing cluster sizes, we now make nodes try to contact all
other nodes in their group, and one node from every other group.

Signed-off-by: Adeodato Simo <dato@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

2dad1652
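The reduced contact list from 2dad1652 can be illustrated with a small sketch. How the real code picks the representative of each foreign group is not stated in the message; taking the first node in sorted order is an assumption, and `nodes_to_contact` is a hypothetical name.

```python
def nodes_to_contact(node, groups):
    """Return the nodes that `node` should try to reach.

    groups -- dict: group name -> list of node names (hypothetical layout)
    """
    targets = []
    for name in sorted(groups):
        members = sorted(groups[name])
        if node in members:
            # Own group: contact every other member.
            targets.extend(m for m in members if m != node)
        elif members:
            # Foreign group: one representative (assumed: first by name).
            targets.append(members[0])
    return targets
```

With G groups of N nodes each, a node contacts roughly N + G peers instead of N * G, which is the point of the change.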

Cluster verify: verify hypervisor parameters only once · d23a2a9d

Adeodato Simo authored 13 years ago


The list of all hypervisor parameters has to be computed in
LUClusterVerifyGroup, since it needs to be passed to nodes as
NV_HVPARAMS. However, it is better to verify said parameters only once,
from LUClusterVerifyConfig.

For this, we refactor the code that constructs the list of parameters to a
module-level _GetAllHypervisorParameters() function that both LUs can use.

Signed-off-by: Adeodato Simo <dato@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

d23a2a9d

Split LUClusterVerify into LUClusterVerify{Config,Group} · bf93ae69

Adeodato Simo authored 13 years ago

With this change, LUClusterVerifyConfig becomes a "light" LU that only
verifies the global config and other, master-only settings, and the bulk of
node/instance verification is done by LUClusterVerifyGroup, which only acts
on nodes and instances of a given group.

To ensure that `gnt-cluster verify` continues to operate on the whole
cluster, the client creates an OpClusterVerifyGroup job per node group; for
convenience, the list of node groups is returned by LUClusterVerifyConfig.

Signed-off-by: Adeodato Simo <dato@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

bf93ae69
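The client-side flow described in bf93ae69 could be sketched like this. The callable `verify_config` and the dict-shaped jobs are illustrative assumptions; the real client submits opcode objects rather than dicts.

```python
def build_verify_jobs(verify_config):
    """Run the config check, then plan one group job per node group.

    verify_config -- hypothetical callable returning (ok, [group names]),
                     standing in for the LUClusterVerifyConfig result.
    """
    ok, groups = verify_config()
    # One OpClusterVerifyGroup job per group keeps whole-cluster coverage.
    jobs = [{"op": "OpClusterVerifyGroup", "group_name": g} for g in groups]
    return ok, jobs
```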

Cluster verify: factor out error codes and functions · a5c30dc2

Adeodato Simo authored 13 years ago


We move all error code definitions, plus the _Error and _ErrorIf helpers,
to a private _VerifyErrors mix-in class that can later be shared by the two
new cluster verify LUs.

(_Error and _ErrorIf code was moved around verbatim, except to disable
"_VerifyError class does not have 'op' or '_feedback_fn' members" errors
from pylint.)

Signed-off-by: Adeodato Simo <dato@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

a5c30dc2
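A mix-in of the kind a5c30dc2 describes might be structured as below. All names, error codes and the message format here are illustrative assumptions; only the general shape (shared codes plus error helpers that rely on attributes of the host class) comes from the commit message.

```python
class VerifyErrorsMixin:
    """Shared error codes and helpers; the host class supplies feedback_fn."""

    # Illustrative codes only; the real definitions are not shown in this log.
    ERR_CONFIG = "EXAMPLECONFIG"
    ERR_NODE = "EXAMPLENODE"

    def _error(self, code, item, msg):
        # Relies on attributes defined by the class this is mixed into,
        # which is why a linter may complain about missing members.
        self.bad = True
        self.feedback_fn("ERROR: %s: %s: %s" % (code, item, msg))

    def _error_if(self, cond, code, item, msg):
        if cond:
            self._error(code, item, msg)
        return bool(cond)


class _VerifyBase:
    def __init__(self, feedback_fn):
        self.feedback_fn = feedback_fn
        self.bad = False


class VerifyConfigSketch(VerifyErrorsMixin, _VerifyBase):
    def run(self, config):
        self._error_if(not config.get("master"), self.ERR_CONFIG,
                       "cluster", "no master node set")
        return not self.bad


class VerifyGroupSketch(VerifyErrorsMixin, _VerifyBase):
    def run(self, nodes):
        for name, online in sorted(nodes.items()):
            self._error_if(not online, self.ERR_NODE, name, "node is offline")
        return not self.bad
```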

Cluster verify: make "instance runs in wrong node" node-driven · 14970c32

Adeodato Simo authored 13 years ago


Previously, the "instance should not be running in this node" error was
computed by verifying, for each instance, whether any node other than its
primary was running it. But this is not a well-suited approach if we were
to shard cluster verification (because, for each instance, we won't have
information whether it's running *outside* the current set of nodes).

By reversing the logic of the check, and asking instead, for each node,
"is it running any instance for which it's not primary", we catch all
occurrences of the problem even if running sharded.

Because of this, we can also detect orphan instances at the same time
(instances that are not known in the cluster config). We warn about them
here too, and drop the later _VerifyOrphanInstances check.

Signed-off-by: Adeodato Simo <dato@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

14970c32
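The reversed, node-driven check from 14970c32 can be sketched as follows. The function and argument names are hypothetical; the logic follows the message: for each node, look at what it runs and compare against the full instance configuration.

```python
def check_node_instances(node, running, primary_of):
    """Classify the instances a node reports as running.

    node       -- name of the node being verified
    running    -- instance names the node reports as running
    primary_of -- dict: instance name -> primary node, for ALL configured
                  instances (not only those in the current group)
    Returns (wrong_node, orphans).
    """
    wrong_node, orphans = [], []
    for inst in sorted(running):
        if inst not in primary_of:
            # Not in the cluster config at all: orphan instance.
            orphans.append(inst)
        elif primary_of[inst] != node:
            # Known instance running on a node that is not its primary.
            wrong_node.append(inst)
    return wrong_node, orphans
```

Because the check needs only one node's report plus the global configuration, it gives the same answer whether verification runs over the whole cluster or one group at a time.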

Verify an absent vm_capable node for files · 4e272d8c

Guido Trotter authored 13 years ago


If we're not verifying all nodes, adding a node outside the current
group for file checksums helps us make sure checksums are the same
across the whole cluster.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

4e272d8c

Cluster verify: master must be present for _VerifyFiles · 2f10179b

Adeodato Simo authored 13 years ago


This commit prepares the call to _VerifyFiles for the case when the master
node is not one of the nodes that's being verified (which will be the case
for all node groups but one). We fix it by always passing master info and
checksums to _VerifyFiles, which ensures there's a cluster-wide consistency
check.

Signed-off-by: Adeodato Simo <dato@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

2f10179b
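The idea behind 2f10179b, always comparing against the master's checksums, might be sketched like this. It is a deliberately simplified illustration: the real _VerifyFiles handles more cases, and the function name and data layout here are assumptions.

```python
def verify_files(master, checksums):
    """Compare every node's file checksums against the master's.

    master    -- name of the master node
    checksums -- dict: node name -> {filename: digest}; the master's entry
                 is always passed in, even if it is outside the group
    Returns a sorted list of (node, filename) mismatches.
    """
    reference = checksums[master]
    problems = []
    for node in sorted(checksums):
        if node == master:
            continue
        for fname in sorted(reference):
            # A missing file counts as a mismatch in this sketch.
            if checksums[node].get(fname) != reference[fname]:
                problems.append((node, fname))
    return problems
```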

Cluster verify: don't assume we're verifying all nodes/instances · cf692cd0

Adeodato Simo authored 13 years ago


This commit fixes a few initial simple cases in which it was assumed that
we're always working over the whole cluster. With this change, we
differentiate between "nodes/instances to verify" and "checks that need
cluster-wide information".

In particular:

  - retrieve hypervisor parameters always from all instances
  - always specify full node list in NV_NODELIST
  - retrieve OOB path from all nodes
  - verify DRBD devices against the full set of instances (this ensures
    minors get properly verified even if an instance is split between groups)
  - look up node groups against the set of all nodes (to avoid tracebacks
    in case instances are split between groups)
  - determine whether running instances are unknown by checking against the
    full list of instances

Behavior in all cases stays the same if still running over the whole
cluster.

Signed-off-by: Adeodato Simo <dato@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

cf692cd0

Cluster verify: gather node/instance list in CheckPrereq · c711d09e

Adeodato Simo authored 13 years ago


This commit introduces no behavior changes, and is only a minor refactoring
that aids with a cleaner division of future LUClusterVerify work. The
change consists of:

  - replacing the {node,instance}{list,info} structures previously created
    in Exec() with member variables created in CheckPrereq; and

  - mechanically converting all references to the old variables to the new
    member variables.

Creating both self.all_{node,inst}_info and self.my_{node,inst}_info, both
with the same contents at the moment, is not capricious. We've now made
Exec use the my_* variables pervasively; in future commits, we'll break the
assumption that all nodes and instances are listed there, and it'll become
clear when the all_* variables have to be substituted instead.

Signed-off-by: Adeodato Simo <dato@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

c711d09e
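The distinction between the all_* and my_* variables introduced in c711d09e can be illustrated with a sketch. The filtering by group anticipates the later commits the message refers to; the function name and dict layout are assumptions.

```python
def gather_scope(all_node_info, all_inst_info, group=None):
    """Return (my_node_info, my_inst_info) for the given group.

    all_node_info -- dict: node name -> {"group": str}
    all_inst_info -- dict: instance name -> {"primary": str}
    With group=None the "my" views equal the "all" views, which matches
    the state right after this refactoring.
    """
    my_node_info = {
        name: info for name, info in all_node_info.items()
        if group is None or info["group"] == group
    }
    # An instance belongs to the scope of its primary node.
    my_inst_info = {
        name: info for name, info in all_inst_info.items()
        if info["primary"] in my_node_info
    }
    return my_node_info, my_inst_info
```

Checks that iterate over what is being verified would use the "my" views, while checks needing cluster-wide information keep using the "all" views.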

Merge remote branch 'origin/devel-2.4' · 6aac5aef

Iustin Pop authored 13 years ago


* origin/devel-2.4:
  Fix errors in hooks documentation
  Clarify a bit the noded man page
  Note --no-remember in NEWS
  Switch QA over to using instance stop --no-remember
  Implement no_remember at RAPI level
  Implement no_remember at CLI level
  Introduce instance start/stop no_remember attribute
  Bump version for the 2.4.2 release
  Fix a bug in LUInstanceMove
  Abstract ignore_consistency opcode parameter
  Preload the string-escape code in noded
  Fix error in iallocator documentation reg. disk mode
  Try to prevent instance memory changes N+1 failures
  Update NEWS file for the 2.4.2 release

Conflicts:
        NEWS                (trivial)
        doc/iallocator.rst  (kept our version)
        lib/cli.py          (trivial)
        lib/opcodes.py      (removed duplicated work, both branches
                             introduced the same new variable
                              PIgnoreConsistency :)
        lib/rapi/client.py  (trivial)
        lib/rapi/rlib2.py   (almost trivial)
        qa/ganeti-qa.py     (below trivial)

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

6aac5aef

May 19, 2011

cli: Replace hardcoded disk templates with constants · 235407ba

Michael Hanselmann authored 13 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

235407ba

mcpu: Add missing docstring to _ProcessResult · eb279644

Michael Hanselmann authored 13 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

eb279644

config: Add function to get instances in node group · c71b049c

Michael Hanselmann authored 13 years ago


This will be used for evacuating instances in a node group.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

c71b049c

iallocator: Stricter check for multi-evac result · a01225a6

Michael Hanselmann authored 13 years ago


Check new secondary nodes' group like it's already done for
multi-relocation requests.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

a01225a6

cmdlib: Use ganeti.ht for checking iallocator result · 3d45d304

Michael Hanselmann authored 13 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

3d45d304

May 17, 2011

Fix errors in hooks documentation · 8ac5c5d7

Michael Hanselmann authored 13 years ago


In many cases the opcode ID was incorrect. A unittest for this will
be added in the master branch.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

8ac5c5d7

ht: Add strict check for dictionaries · a464ce71

Michael Hanselmann authored 13 years ago


This allows checking specific dictionary items, unlike TDict
or TDictOf.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

a464ce71
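A strict dictionary check in the spirit of a464ce71 might work as sketched below. This is not the ganeti.ht implementation; `strict_dict` and its parameters are assumptions chosen to show what "checking specific dictionary items" can mean.

```python
def strict_dict(require_all, exclusive, items):
    """Build a checker for dicts with per-key value checks.

    require_all -- every key listed in `items` must be present
    exclusive   -- no keys other than those in `items` are allowed
    items       -- dict: key -> callable returning True for valid values
    """
    def check(value):
        if not isinstance(value, dict):
            return False
        if require_all and any(key not in value for key in items):
            return False
        if exclusive and any(key not in items for key in value):
            return False
        # Validate each known key with its own checker.
        return all(items[k](v) for k, v in value.items() if k in items)
    return check
```

For example, `strict_dict(True, True, {"name": is_str, "size": is_int})` (with hypothetical `is_str` and `is_int` predicates) accepts `{"name": "disk0", "size": 1024}` but rejects a dict with a missing, extra or wrongly typed item.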

cmdlib: Remove punctuation from error messages · 4f898534

Michael Hanselmann authored 13 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

4f898534

Various grammar fixes and updates · dfba45b1

Stephen Shirley authored 13 years ago


Signed-off-by: Stephen Shirley <diamond@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

dfba45b1

May 16, 2011

gnt-debug: New iallocator mode · 42c161cf

Michael Hanselmann authored 13 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

42c161cf

Add new iallocator mode to LUTestAllocator · bee581e2

Michael Hanselmann authored 13 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

bee581e2

cmdlib.IAllocator: Add multi-relocate support · 55011921

Michael Hanselmann authored 13 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

55011921

Add constants for multi-relocation iallocator mode · 23cfbaab

Michael Hanselmann authored 13 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

23cfbaab

Clarify a bit the noded man page · b578501b

Iustin Pop authored 13 years ago


"This can be overriden" can be read as either the port we listen on or
the address we bind to. Replace with "The port" for great clarity!

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

b578501b

Note --no-remember in NEWS · 6e1156ff

Iustin Pop authored 13 years ago


Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

6e1156ff

Switch QA over to using instance stop --no-remember · b998270c

Iustin Pop authored 13 years ago


Instead of hardcoded Xen commands. This will make it work for all
hypervisors, instead of duplicating hypervisor functionality in QA
itself.

The timeout has been removed as gnt-instance stop itself will make
sure the instance is down before returning. We just double-check that
it is indeed down.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

b998270c

Implement no_remember at RAPI level · 2ba39b8f

Iustin Pop authored 13 years ago


Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

2ba39b8f

Implement no_remember at CLI level · 885a0fc4

Iustin Pop authored 13 years ago


Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

885a0fc4

Introduce instance start/stop no_remember attribute · 9b64e486

Iustin Pop authored 13 years ago


This will allow stopping or starting an instance without changing the
remembered state. While this seems counter-intuitive at first (it will
create cluster verify errors), it can help in a few corner cases:

- shutting down an entire cluster for maintenance but without having
  to remember state
- doing testing of Ganeti itself

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

9b64e486
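The effect of the no_remember attribute from 9b64e486 can be modelled with a toy sketch. The two dicts and both function names are hypothetical; they only illustrate why skipping the state update later shows up as a verify error.

```python
def stop_instance(name, runtime_state, config_state, no_remember=False):
    """Stop an instance, optionally without recording the new state.

    runtime_state -- dict: instance -> actual state ("up"/"down")
    config_state  -- dict: instance -> remembered (configured) state
    """
    runtime_state[name] = "down"
    if not no_remember:
        # Normal behaviour: remember that the instance should be down.
        config_state[name] = "down"


def verify_states(runtime_state, config_state):
    """Return instances whose actual state differs from the remembered one."""
    return sorted(
        name for name in config_state
        if runtime_state.get(name) != config_state[name]
    )
```

Stopping with `no_remember=True` leaves the remembered state at "up", so the mismatch is reported until the instance is started again.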

May 13, 2011

cmdlib.IAllocator: Fewer temporary variables · 73cdf9a3

Michael Hanselmann authored 13 years ago


Reduce the number of temporary variables and generate dictionaries in
one go.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

73cdf9a3

TLMigrateInstance: do not migrate to self · dcfb969a

Apollon Oikonomopoulos authored 13 years ago

Check that the instance is not being migrated to its current primary node
during CheckPrereq. Otherwise the migration is aborted because the instance
is already running, and is then cleaned up, which kills the running instance.

Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

dcfb969a
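The prerequisite check from dcfb969a amounts to something like the sketch below. The exception class and function are hypothetical stand-ins, not Ganeti's own error types.

```python
class PrereqError(Exception):
    """Hypothetical stand-in for a prerequisite-check failure."""


def check_migration_target(instance_name, primary_node, target_node):
    """Refuse a migration whose target is the current primary node."""
    if target_node == primary_node:
        raise PrereqError(
            "Cannot migrate instance %s to its primary node %s"
            % (instance_name, primary_node)
        )
```

Failing this early, before any migration step runs, is what prevents the abort-and-cleanup path from touching the running instance.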

SharedLock: Implement downgrade from exclusive to shared mode · 3dbe3ddf

Michael Hanselmann authored 13 years ago


If a job needs to modify a resource and then wait for a result, it must
acquire the resource lock in exclusive mode. In some cases it would be
possible to only have a shared lock for waiting. Until now it was not
possible to change a lock's mode once it'd been acquired. Releasing and
re-acquiring might have been possible, but would require many more
checks and can introduce new issues.

With this patch a new method, named “downgrade”, is added to Ganeti's
own SharedLock class. It can only be called when the lock is held in
exclusive mode and changes it to shared. If there are any pending shared
acquires at the same priority, they're moved to the front of the queue
and notified (jumping ahead of exclusive acquires).

In a lockset the internal lock will be downgraded if, and only if, all
individual locks owned by the current thread are either released or
acquired in shared mode.

Unittests are provided.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

3dbe3ddf
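The downgrade operation from 3dbe3ddf can be illustrated with a much simpler lock than Ganeti's SharedLock. The sketch below has no priorities or queue ordering; the class name and structure are assumptions, and only the core rule (downgrade is valid only while held exclusively, and wakes waiting shared acquirers) follows the message.

```python
import threading


class DowngradableLock:
    """Minimal shared/exclusive lock with an exclusive-to-shared downgrade."""

    def __init__(self):
        self._cond = threading.Condition()
        self._exclusive = None  # thread holding the lock exclusively, if any
        self._shared = set()    # threads holding the lock in shared mode

    def acquire(self, shared=False):
        me = threading.current_thread()
        with self._cond:
            if shared:
                self._cond.wait_for(lambda: self._exclusive is None)
                self._shared.add(me)
            else:
                self._cond.wait_for(
                    lambda: self._exclusive is None and not self._shared)
                self._exclusive = me

    def downgrade(self):
        me = threading.current_thread()
        with self._cond:
            if self._exclusive is not me:
                raise RuntimeError("lock must be held in exclusive mode")
            # Switch modes atomically: no other exclusive owner can slip in.
            self._exclusive = None
            self._shared.add(me)
            self._cond.notify_all()  # let pending shared acquirers proceed

    def release(self):
        me = threading.current_thread()
        with self._cond:
            if self._exclusive is me:
                self._exclusive = None
            elif me in self._shared:
                self._shared.remove(me)
            else:
                raise RuntimeError("lock not held by this thread")
            self._cond.notify_all()
```

Compared with releasing and re-acquiring, the downgrade never leaves a window in which another thread could take the lock exclusively.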

May 12, 2011

gnt-debug: Use constants for iallocator direction · 9133387e

Michael Hanselmann authored 13 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

9133387e

Use disk mode constants in iallocator documentation · 63fb7526

Michael Hanselmann authored 13 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

63fb7526

gnt-debug, opcodes: Use constants for iallocator · fdbe29ee

Michael Hanselmann authored 13 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

fdbe29ee