- May 20, 2011
-
-
Guido Trotter authored
Also update NEWS on this change.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Guido Trotter authored
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Adeodato Simo authored
This will trigger a ClusterVerifyGroup operation only on the specified group, skipping other groups as well as cluster-wide verifications.

Signed-off-by: Adeodato Simo <dato@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Adeodato Simo authored
Previously, all nodes and instances would *always* be visited/verified. By driving the verification by node group now, we will miss nodes and instances that can't be reached from existing node groups, should that rare and bogus circumstance ever occur.

We safeguard against that by checking for unreachable nodes and instances explicitly. (These will not be further verified.)

Signed-off-by: Adeodato Simo <dato@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
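
A minimal sketch of the safeguard described above, not the actual Ganeti code. The function name and the plain-dict data shapes are assumptions made for illustration only.

```python
def find_unreachable(all_nodes, all_instances, group_uuids):
    """Return (nodes, instances) that no existing node group would cover.

    Hypothetical data shapes:
      all_nodes:     dict mapping node name -> group UUID
      all_instances: dict mapping instance name -> primary node name
      group_uuids:   set of group UUIDs present in the configuration
    """
    # Nodes whose group is unknown are never visited by a per-group verify.
    dangling_nodes = sorted(
        name for name, group in all_nodes.items() if group not in group_uuids
    )
    dangling = set(dangling_nodes)
    # Instances are unreachable if their primary node is dangling or unknown.
    dangling_instances = sorted(
        inst for inst, pnode in all_instances.items()
        if pnode in dangling or pnode not in all_nodes
    )
    return dangling_nodes, dangling_instances
```

Anything returned here would only be reported, matching the note that such nodes and instances are not verified further.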
-
Adeodato Simo authored
When sharding by group, if a mirrored instance is split (primary and secondary) between two groups, its volumes will not be properly checked: the group of the primary will warn about a missing volume in the secondary, and the group of the secondary about an unknown volume (in the secondary as well).

To solve the "missing volumes" bit, we detect this case and perform an extra RPC verify call to these split secondaries (querying only for NV_LVLIST), and introduce the results in the node images appropriately. We do this detection early, in ExpandNames/CheckPrereq, so as to properly lock the extra nodes.

As for the "unknown volumes" warning in the secondary, we update the volume mapping with split instances before checking for orphaned volumes.

Finally, we mark nodes as "ghost" only if they really don't exist in the cluster configuration, which avoids spurious "instance lives in ghost node" warnings.

Signed-off-by: Adeodato Simo <dato@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
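
The detection step can be sketched roughly as follows. This is an illustration under assumed data shapes (plain dicts and tuples), with a hypothetical function name. It assumes the extra nodes are collected while verifying the primary's group.

```python
def find_split_secondaries(instances, node_group, my_group):
    """Map each out-of-group secondary node to the split instances using it.

    Hypothetical data shapes:
      instances:  dict instance name -> (primary node, [secondary nodes])
      node_group: dict node name -> group name
      my_group:   the group currently being verified
    """
    extra = {}
    for name, (primary, secondaries) in sorted(instances.items()):
        if node_group[primary] != my_group:
            continue  # instance is verified with its primary's group
        for sec in secondaries:
            if node_group[sec] != my_group:
                # This secondary needs the extra, volume-list-only query.
                extra.setdefault(sec, []).append(name)
    return extra
```

The keys of the result are the additional nodes that would need locking and the extra volume-list query.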
-
Adeodato Simo authored
To cope with increasing cluster sizes, we now make nodes try to contact all other nodes in their group, and one node from every other group.

Signed-off-by: Adeodato Simo <dato@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
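
A rough sketch of how such a contact list could be built. The function name and the dict input are hypothetical, and picking the foreign node at random is an assumption: the message above does not say how the one node per other group is chosen.

```python
import random


def nodes_to_contact(node, node_group, rng=random):
    """Return the nodes that `node` should try to reach.

    node_group: dict node name -> group name (hypothetical shape).
    rng: anything with a choice() method; random selection is an assumption.
    """
    by_group = {}
    for name, group in node_group.items():
        by_group.setdefault(group, []).append(name)

    own = node_group[node]
    # Every other node in the same group...
    targets = [n for n in sorted(by_group[own]) if n != node]
    # ...plus exactly one node from each of the other groups.
    for group in sorted(by_group):
        if group != own:
            targets.append(rng.choice(sorted(by_group[group])))
    return targets
```

Each node then contacts the rest of its own group plus one node per other group, rather than every node in the cluster.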
-
Adeodato Simo authored
The list of all hypervisor parameters has to be computed in LUClusterVerifyGroup, since it needs to be passed to nodes as NV_HVPARAMS. However, it is better to verify said parameters only once, from LUClusterVerifyConfig.

For this, we refactor the code that constructs the list of parameters into a module-level _GetAllHypervisorParameters() function that both LUs can use.

Signed-off-by: Adeodato Simo <dato@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Adeodato Simo authored
With this change, LUClusterVerifyConfig becomes a "light" LU that only verifies the global config and other, master-only settings, and the bulk of node/instance verification is done by LUClusterVerifyGroup, which only acts on nodes and instances of a given group.

To ensure that `gnt-cluster verify` continues to operate on the whole cluster, the client creates an OpClusterVerifyGroup job per node group; for convenience, the list of node groups is returned by LUClusterVerifyConfig.

Signed-off-by: Adeodato Simo <dato@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
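
The client-side flow might look roughly like this. Both callables are hypothetical stand-ins for the config verification and for submitting one group-verify job; neither is a real Ganeti client API.

```python
def submit_cluster_verify(verify_config, submit_group_job):
    """Run the config-level verify, then fan out one job per node group.

    verify_config():        hypothetical; returns the list of group names
    submit_group_job(name): hypothetical; submits a per-group verify job
                            and returns its job ID
    """
    group_names = verify_config()
    # One group-verify job per node group keeps whole-cluster coverage.
    return {name: submit_group_job(name) for name in group_names}
```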
-
Adeodato Simo authored
We move all error code definitions, plus the _Error and _ErrorIf helpers, to a private _VerifyErrors mix-in class that can later be shared by the two new cluster verify LUs.

(_Error and _ErrorIf code was moved around verbatim, except to disable "_VerifyError class does not have 'op' or '_feedback_fn' members" errors from pylint.)

Signed-off-by: Adeodato Simo <dato@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Adeodato Simo authored
Previously, the "instance should not be running in this node" error was computed by verifying, for each instance, whether any node other than its primary was running it. But this approach is not well suited to sharded cluster verification, because for each instance we won't know whether it's running *outside* the current set of nodes.

By reversing the logic of the check and asking instead, for each node, "is it running any instance for which it's not primary", we catch all occurrences of the problem even if running sharded.

Because of this, we can also detect orphan instances at the same time (instances that are not known in the cluster config). We warn about them here too, and drop the later _VerifyOrphanInstances check.

Signed-off-by: Adeodato Simo <dato@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
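
A minimal sketch of the reversed, per-node check. The function name, the dict inputs and the error tags are illustrative assumptions, not the code from this commit.

```python
def verify_running_instances(node_running, primary_of):
    """Check, per node, which running instances do not belong there.

    Hypothetical data shapes:
      node_running: dict node name -> instance names reported as running
                    (only the nodes currently being verified)
      primary_of:   dict instance name -> primary node, for ALL instances
                    in the cluster configuration
    """
    errors = []
    for node in sorted(node_running):
        for inst in sorted(node_running[node]):
            if inst not in primary_of:
                # Not in the configuration at all: an orphan instance.
                errors.append((node, inst, "orphan"))
            elif primary_of[inst] != node:
                # Known instance, but this node is not its primary.
                errors.append((node, inst, "wrong-node"))
    return errors
```

Because the loop only needs the instance lists of the nodes at hand, plus the cluster-wide instance-to-primary mapping, it gives the same answers when only one group's nodes are passed in.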
-
Guido Trotter authored
If we're not verifying all nodes, adding a node outside the current group for file checksums helps us make sure checksums are the same across the whole cluster.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Adeodato Simo authored
This commit prepares the call to _VerifyFiles for the case when the master node is not one of the nodes being verified (which will be the case for all node groups but one). We fix it by always passing master info and checksums to _VerifyFiles, which ensures there's a cluster-wide consistency check.

Signed-off-by: Adeodato Simo <dato@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Adeodato Simo authored
This commit fixes a few initial simple cases in which it was assumed that we're always working over the whole cluster. With this change, we differentiate between "nodes/instances to verify" and "checks that need cluster-wide information". In particular:

  - retrieve hypervisor parameters always from all instances
  - always specify full node list in NV_NODELIST
  - retrieve OOB path from all nodes
  - verify DRBD devices against the full set of instances (this ensures minors get properly verified even if an instance is split between groups)
  - look up node groups against the set of all nodes (to avoid tracebacks in case instances are split between groups)
  - determine whether running instances are unknown by checking against the full list of instances

Behavior in all cases stays the same if still running over the whole cluster.

Signed-off-by: Adeodato Simo <dato@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Adeodato Simo authored
This commit introduces no behavior changes, and is only a minor refactoring that aids with a cleaner division of future LUClusterVerify work. The change consists of:

  - substituting the {node,instance}{list,info} structures previously created in Exec() with member variables created in CheckPrereq; and
  - mechanically converting all references to the old variables to the new member variables.

Creating both self.all_{node,inst}_info and self.my_{node,inst}_info, both with the same contents at the moment, is not capricious. We've now made Exec use the my_* variables pervasively; in future commits, we'll break the assumption that all nodes and instances are listed there, and it'll become clear when the all_* variables have to be substituted instead.

Signed-off-by: Adeodato Simo <dato@google.com>
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Iustin Pop authored
* origin/devel-2.4:
  Fix errors in hooks documentation
  Clarify a bit the noded man page
  Note --no-remember in NEWS
  Switch QA over to using instance stop --no-remember
  Implement no_remember at RAPI level
  Implement no_remember at CLI level
  Introduce instance start/stop no_remember attribute
  Bump version for the 2.4.2 release
  Fix a bug in LUInstanceMove
  Abstract ignore_consistency opcode parameter
  Preload the string-escape code in noded
  Fix error in iallocator documentation reg. disk mode
  Try to prevent instance memory changes N+1 failures
  Update NEWS file for the 2.4.2 release

Conflicts:
  NEWS (trivial)
  doc/iallocator.rst (kept our version)
  lib/cli.py (trivial)
  lib/opcodes.py (removed duplicated work, both branches introduced the same new variable PIgnoreConsistency :)
  lib/rapi/client.py (trivial)
  lib/rapi/rlib2.py (almost trivial)
  qa/ganeti-qa.py (below trivial)

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
-
- May 19, 2011
-
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
This will be used for evacuating instances in a node group.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Check the new secondary nodes' group, as is already done for multi-relocation requests.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- May 17, 2011
-
-
Michael Hanselmann authored
In many cases the opcode ID was incorrect. A unittest for this will be added in the master branch.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
This allows checking specific dictionary items, unlike TDict or TDictOf.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
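
To illustrate the idea of per-item checks, here is a small sketch. The helper name is hypothetical and this is not the checker added by this commit. Requiring exactly the listed keys is an assumption made to keep the example short.

```python
def make_item_checker(item_checks):
    """Build a checker that validates specific dictionary items.

    item_checks: dict mapping each expected key to a one-argument
    predicate for that key's value (hypothetical interface).
    """
    def check(value):
        if not isinstance(value, dict):
            return False
        # Assumption: exactly the listed keys must be present.
        if set(value) != set(item_checks):
            return False
        return all(fn(value[key]) for key, fn in item_checks.items())
    return check
```

A checker built this way applies a different test to each named key, whereas a generic dictionary check treats all keys and values alike.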
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Stephen Shirley authored
Signed-off-by: Stephen Shirley <diamond@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- May 16, 2011
-
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Iustin Pop authored
"This can be overriden" can be read as either the port we listen on or the address we bind to. Replace with "The port" for great clarity! Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Iustin Pop authored
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
-
Iustin Pop authored
Instead of hardcoded Xen commands. This will make it work for all hypervisors, instead of duplicating hypervisor functionality in QA itself.

The timeout has been removed as gnt-instance stop itself will make sure the instance is down before returning. We just double-check that it is indeed down.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
-
Iustin Pop authored
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
-
Iustin Pop authored
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
-
Iustin Pop authored
This will allow stopping or starting an instance without changing the remembered state. While this seems counter-intuitive at first (it will create cluster verify errors), it can help in a few corner cases:

  - shutting down an entire cluster for maintenance but without having to remember state
  - doing testing of Ganeti itself

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
-
- May 13, 2011
-
-
Michael Hanselmann authored
Reduce the number of temporary variables and generate dictionaries in one go.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Apollon Oikonomopoulos authored
Check that the instance is not being migrated to its current primary node during CheckPrereq. Otherwise migration is aborted because the instance is already running, and is then cleaned up, which causes the running instance to be killed.

Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
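
The prerequisite check amounts to something like the following sketch. The function name and the return-a-message convention are assumptions for illustration; the real check lives in CheckPrereq and presumably reports the failure its own way.

```python
def migration_target_error(primary_node, target_node):
    """Return an error message if the migration target is invalid, else None.

    Hypothetical helper: rejects migrating an instance onto the node
    that is already its primary, before any migration step is attempted.
    """
    if target_node == primary_node:
        return ("cannot migrate instance to its current primary node %s"
                % primary_node)
    return None
```

Failing this early means the abort-and-cleanup path that kills the running instance is never reached.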
-
Michael Hanselmann authored
If a job needs to modify a resource and then wait for a result, it must acquire the resource lock in exclusive mode. In some cases it would be possible to only have a shared lock for waiting. Until now it was not possible to change a lock's mode once it'd been acquired. Releasing and re-acquiring might have been possible, but would require many more checks and can introduce new issues.

With this patch a new method, named “downgrade”, is added to Ganeti's own SharedLock class. It can only be called when the lock is held in exclusive mode and changes it to shared. If there are any pending shared acquires at the same priority, they're moved to the front of the queue and notified (jumping ahead of exclusive acquires).

In a lockset the internal lock will be downgraded if, and only if, all individual locks owned by the current thread are either released or acquired in shared mode.

Unittests are provided.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
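
To make the downgrade semantics concrete, here is a deliberately simplified shared/exclusive lock. It is not Ganeti's SharedLock: the class name is hypothetical, and there are no priorities and no acquire queue, so the reordering of pending shared acquires described above is not modelled.

```python
import threading


class SimpleSharedLock:
    """Toy shared/exclusive lock with an exclusive-to-shared downgrade."""

    def __init__(self):
        self._cond = threading.Condition()
        self._exclusive = None   # thread holding the lock exclusively
        self._shared = set()     # threads holding the lock in shared mode

    def acquire(self, shared=False):
        me = threading.current_thread()
        with self._cond:
            if shared:
                # Shared acquires only wait for an exclusive holder.
                while self._exclusive is not None:
                    self._cond.wait()
                self._shared.add(me)
            else:
                # Exclusive acquires wait until nobody holds the lock.
                while self._exclusive is not None or self._shared:
                    self._cond.wait()
                self._exclusive = me

    def downgrade(self):
        """Turn an exclusively held lock into a shared one, atomically."""
        me = threading.current_thread()
        with self._cond:
            if self._exclusive is not me:
                raise RuntimeError("lock must be held in exclusive mode")
            self._exclusive = None
            self._shared.add(me)
            # Wake waiters so pending shared acquires can proceed.
            self._cond.notify_all()

    def release(self):
        me = threading.current_thread()
        with self._cond:
            if self._exclusive is me:
                self._exclusive = None
            else:
                self._shared.remove(me)
            self._cond.notify_all()
```

The point of doing this in one step is that the lock is never free in between, so no other exclusive acquirer can slip in, which releasing and re-acquiring could not guarantee.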
-
- May 12, 2011
-
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-