- 20 May, 2011 7 commits
-
-
Adeodato Simo authored
With this change, LUClusterVerifyConfig becomes a "light" LU that only verifies the global config and other, master-only settings, and the bulk of node/instance verification is done by LUClusterVerifyGroup, which only acts on nodes and instances of a given group. To ensure that `gnt-cluster verify` continues to operate on the whole cluster, the client creates an OpClusterVerifyGroup job per node group; for convenience, the list of node groups is returned by LUClusterVerifyConfig. Signed-off-by:
Adeodato Simo <dato@google.com> Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Adeodato Simo authored
We move all error code definitions, plus the _Error and _ErrorIf helpers, to a private _VerifyErrors mix-in class that can later be shared by the two new cluster verify LUs. (_Error and _ErrorIf code was moved around verbatim, except to disable "_VerifyError class does not have 'op' or '_feedback_fn' members" errors from pylint.) Signed-off-by:
Adeodato Simo <dato@google.com> Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
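A minimal sketch of the mix-in pattern this commit describes, not the actual Ganeti code. The names `VerifyErrorsSketch`, `FakeOp`, `DemoLU`, the `simulate_errors` attribute and the error code strings are hypothetical, and the helper signatures are simplified. It illustrates why pylint complains: the mix-in reads `self.op` and `self._feedback_fn`, which exist only on the class it is mixed into.

```python
class VerifyErrorsSketch(object):
    """Hypothetical mix-in; the host class must provide
    self.op, self._feedback_fn and self.bad."""

    ETYPE_ERROR = "ERROR"
    ETYPE_WARNING = "WARNING"

    def _Error(self, ecode, item, msg, *args, **kwargs):
        # Format one message and hand it to the host's feedback function.
        ltype = kwargs.get("code", self.ETYPE_ERROR)
        if args:
            msg = msg % args
        self._feedback_fn("  - %s: %s %s: %s" % (ltype, ecode, item, msg))

    def _ErrorIf(self, cond, *args, **kwargs):
        # Report only when the condition holds (or errors are simulated).
        cond = bool(cond) or self.op.simulate_errors
        if cond:
            self._Error(*args, **kwargs)
        # Only real errors, not warnings, mark the verification as failed.
        if kwargs.get("code", self.ETYPE_ERROR) == self.ETYPE_ERROR:
            self.bad = self.bad or cond


class FakeOp(object):
    """Stand-in for an opcode object (assumption)."""
    simulate_errors = False


class DemoLU(VerifyErrorsSketch):
    """Stand-in for an LU that supplies the members the mix-in needs."""

    def __init__(self):
        self.op = FakeOp()
        self.bad = False
        self.messages = []
        self._feedback_fn = self.messages.append


lu = DemoLU()
lu._ErrorIf(False, "EEXAMPLE1", "node1", "not reported")
lu._ErrorIf(True, "EEXAMPLE2", "node2", "version mismatch: %s", "2.4")
```

Any class that mixes this in and sets the three members in its constructor can reuse the helpers unchanged, which is the sharing the commit prepares for.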
-
Adeodato Simo authored
Previously, the "instance should not be running in this node" error was computed by verifying, for each instance, whether any node other than its primary was running it. But this is not a well-suited approach if we were to shard cluster verification (because, for each instance, we won't have information whether it's running *outside* the current set of nodes). By reversing the logic of the check, and asking instead, for each node, "is it running any instance for which it's not primary", we catch all occurrences of the problem even if running sharded. Because of this, we can also detect orphan instances at the same time (instances that are not known in the cluster config). We warn about them here too, and drop the later _VerifyOrphanInstances check. Signed-off-by:
Adeodato Simo <dato@google.com> Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
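The reversed check can be sketched as follows. This is an illustration under stated assumptions, not the code from the commit: the function name, its parameters and the sample data are hypothetical. For one node, it classifies every running instance as either misplaced (known, but this node is not its primary) or orphaned (unknown in the config).

```python
def find_misplaced_instances(node_name, running_on_node, primary_of):
    """Classify the instances one node reports as running.

    running_on_node: instance names the node says it is running
    primary_of: dict mapping every configured instance name to its
                primary node (must cover the whole cluster, not just
                the nodes currently being verified)
    Returns (wrong_node, orphans) as two sorted lists.
    """
    wrong_node = []
    orphans = []
    for inst in sorted(running_on_node):
        if inst not in primary_of:
            # Running, but not known in the cluster config at all.
            orphans.append(inst)
        elif primary_of[inst] != node_name:
            # Known instance running on a node that is not its primary.
            wrong_node.append(inst)
    return wrong_node, orphans


primary_of = {"web1": "nodeA", "db1": "nodeB"}
wrong, orphans = find_misplaced_instances(
    "nodeA", ["web1", "db1", "ghost"], primary_of)
```

Because the loop only needs the node's own report plus the cluster-wide config, it gives the same answer whether one group or the whole cluster is being verified.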
-
Guido Trotter authored
If we're not verifying all nodes, adding a node outside the current group for file checksums helps us make sure checksums are the same across the whole cluster. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Adeodato Simo authored
This commit prepares the call to _VerifyFiles for the case when the master node is not one of the nodes that's being verified (which will be the case for all node groups but one). We fix it by always passing master info and checksums to _VerifyFiles, which ensures there's a cluster-wide consistency check. Signed-off-by:
Adeodato Simo <dato@google.com> Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Adeodato Simo authored
This commit fixes a few initial simple cases in which it was assumed that we're always working over the whole cluster. With this change, we differentiate between "nodes/instances to verify" and "checks that need cluster-wide information". In particular:
- retrieve hypervisor parameters always from all instances
- always specify full node list in NV_NODELIST
- retrieve OOB path from all nodes
- verify DRBD devices against the full set of instances (this ensures minors get properly verified even if an instance is split between groups)
- look up node groups against the set of all nodes (to avoid tracebacks in case instances are split between groups)
- determine whether running instances are unknown by checking against the full list of instances
Behavior in all cases stays the same if still running over the whole cluster. Signed-off-by:
Adeodato Simo <dato@google.com> Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Adeodato Simo authored
This commit introduces no behavior changes, and is only a minor refactoring that aids with a cleaner division of future LUClusterVerify work. The change consists in:
- substitute the {node,instance}{list,info} structures previously created in Exec() by member variables created in CheckPrereq; and
- mechanically convert all references to the old variables to the new member variables.
Creating both self.all_{node,inst}_info and self.my_{node,inst}_info, both with the same contents at the moment, is not capricious. We've now made Exec use the my_* variables pervasively; in future commits, we'll break the assumption that all nodes and instances are listed there, and it'll become clear when the all_* variables have to be substituted instead. Signed-off-by:
Adeodato Simo <dato@google.com> Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- 19 May, 2011 5 commits
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
This will be used for evacuating instances in a node group. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Check new secondary nodes' group like it's already done for multi-relocation requests. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- 17 May, 2011 2 commits
-
-
Michael Hanselmann authored
This allows checking specific dictionary items, unlike TDict or TDictOf. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
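A per-key dictionary check of the kind this commit mentions might look like the sketch below. The function name `strict_dict` and its parameters `require_all`, `exclusive` and `items` are assumptions made for illustration; the commit's actual check name is not shown in this log.

```python
def strict_dict(require_all, exclusive, items):
    """Build a checker for dictionaries with per-key value checks.

    require_all: every key listed in items must be present
    exclusive:   no keys other than those in items are allowed
    items:       dict mapping key -> predicate applied to that key's value
    """
    def check(value):
        if not isinstance(value, dict):
            return False
        if require_all and not all(key in value for key in items):
            return False
        if exclusive and not all(key in items for key in value):
            return False
        # Each known key is validated by its own predicate.
        return all(items[key](val)
                   for key, val in value.items() if key in items)
    return check


def is_str(value):
    return isinstance(value, str)


def is_int(value):
    return isinstance(value, int)


check_disk = strict_dict(True, True, {"name": is_str, "size": is_int})
```

The difference from a uniform check over all keys or all values is that `"name"` and `"size"` are each held to their own rule.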
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- 16 May, 2011 7 commits
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Iustin Pop authored
This will allow stopping or starting an instance without changing the remembered state. While this seems counter-intuitive at first (it will create cluster verify errors), it can help in a few corner cases:
- shutting down an entire cluster for maintenance but without having to remember state
- doing testing of Ganeti itself
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
- 13 May, 2011 3 commits
-
-
Michael Hanselmann authored
Reduce the number of temporary variables and generate dictionaries in one go. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Apollon Oikonomopoulos authored
Check during CheckPrereq that the instance is not being migrated to its current primary node. Otherwise the migration is aborted because the instance is already running, and the subsequent cleanup kills the running instance. Signed-off-by:
Apollon Oikonomopoulos <apollon@noc.grnet.gr> Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
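The prerequisite check described here amounts to a guard like the following. This is a hedged sketch: `PrereqError` and `check_migration_target` are hypothetical names, and the real check lives in the LU's CheckPrereq.

```python
class PrereqError(Exception):
    """Hypothetical exception for a failed prerequisite check."""


def check_migration_target(instance_name, primary_node, target_node):
    """Refuse to migrate an instance onto the node it already runs on."""
    if target_node == primary_node:
        raise PrereqError(
            "Cannot migrate instance %s to its primary node (%s)"
            % (instance_name, primary_node))


# A different target passes; the primary node is rejected up front.
check_migration_target("web1", "nodeA", "nodeB")
try:
    check_migration_target("web1", "nodeA", "nodeA")
    rejected = False
except PrereqError:
    rejected = True
```

Failing before any migration step starts is what keeps the running instance safe.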
-
Michael Hanselmann authored
If a job needs to modify a resource and then wait for a result, it must acquire the resource lock in exclusive mode. In some cases it would be possible to only have a shared lock for waiting. Until now it was not possible to change a lock's mode once it'd been acquired. Releasing and re-acquiring might have been possible, but would require many more checks and can introduce new issues. With this patch a new method, named “downgrade”, is added to Ganeti's own SharedLock class. It can only be called when the lock is held in exclusive mode and changes it to shared. If there are any pending shared acquires on the same priority, they're moved to the front of the queue and notified (jumping ahead of exclusive acquires). In a lockset the internal lock will be downgraded if, and only if, all individual locks owned by the current thread are either released or acquired in shared mode. Unittests are provided. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
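The downgrade operation can be illustrated with a much simplified lock. This sketch is not Ganeti's SharedLock: `SimpleSharedLock` and `try_from_other_thread` are hypothetical, there is no priority queue, and waiting acquirers are simply woken and re-check their condition. It only shows the core rule that an exclusive holder can switch to shared mode without ever releasing the lock.

```python
import threading


class SimpleSharedLock(object):
    """Simplified shared/exclusive lock with downgrade (sketch only)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._exclusive_owner = None
        self._shared_owners = set()

    def acquire(self, shared=False, timeout=None):
        me = threading.current_thread()
        with self._cond:
            if shared:
                # Shared mode only conflicts with an exclusive holder.
                ok = self._cond.wait_for(
                    lambda: self._exclusive_owner is None, timeout)
                if ok:
                    self._shared_owners.add(me)
            else:
                # Exclusive mode conflicts with every other holder.
                ok = self._cond.wait_for(
                    lambda: (self._exclusive_owner is None
                             and not self._shared_owners), timeout)
                if ok:
                    self._exclusive_owner = me
            return ok

    def downgrade(self):
        me = threading.current_thread()
        with self._cond:
            if self._exclusive_owner is not me:
                raise RuntimeError("lock is not held in exclusive mode")
            # Switch modes atomically; the lock is never free in between.
            self._exclusive_owner = None
            self._shared_owners.add(me)
            self._cond.notify_all()

    def release(self):
        me = threading.current_thread()
        with self._cond:
            if self._exclusive_owner is me:
                self._exclusive_owner = None
            else:
                self._shared_owners.discard(me)
            self._cond.notify_all()


def try_from_other_thread(lock, shared):
    """Try a non-blocking acquire from a second thread; report success."""
    result = []

    def worker():
        got = lock.acquire(shared=shared, timeout=0)
        result.append(got)
        if got:
            lock.release()

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return result[0]
```

Compared with releasing and re-acquiring, the downgrade leaves no window in which another thread could take the lock exclusively and modify the resource.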
-
- 12 May, 2011 6 commits
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Iustin Pop authored
The opcode parameter ignore_consistency was used in the LU, but not actually declared in the OpCode. The patch adds it in the opcode and the command line client. ObQuote — Please, please, can I have static typing? Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Iustin Pop authored
Two opcodes already use it and we need it for a third, so it is time to add a constant for it. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Iustin Pop authored
This encoding, part of the standard Python installation, is used by the pickle module (in turn used by subprocess when handling failures in program execution). Preloading it means that Python will cache it in memory so that even if the disk, or just the module, goes away, we're not going to fail in reporting errors. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
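The preloading technique can be sketched as below. The encoding this commit actually preloads is not named in this log, so `"unicode_escape"` is only a stand-in, and `preload_encoding` is a hypothetical helper name.

```python
import codecs
import sys


def preload_encoding(name):
    """Look up a codec once so Python imports and caches its module.

    Later uses of the encoding are then served from memory instead of
    triggering an import from disk.
    """
    codecs.lookup(name)


# Stand-in encoding name (assumption); call this early at startup.
preload_encoding("unicode_escape")
loaded = "encodings.unicode_escape" in sys.modules
```

Doing this at daemon startup moves the disk access to a point where the disk is known to be available.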
-
- 11 May, 2011 2 commits
-
-
Iustin Pop authored
There are multiple bugs in the code checking for N+1 failures on instance memory changes, and fixing them needs significant changes. In the meantime we can at least:
- change the warning message into an error (--force will skip checks)
- only make checks when we increase the memory
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Guido Trotter authored
These sneaked in from 2.4 during the merge, but this attribute is actually gone in the master branch. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- 10 May, 2011 8 commits
-
-
Marco Casavecchia authored
Hi all, this patch adds 3 new KVM parameters and a new option.
New parameters:
- floppy_image_path = "" -> Specify the floppy image to load as floppy disk.
- cdrom2_image_path = "" -> Specify a second cdrom image to load on the system (note: this is not intended to be used as a boot device. To boot the system from cdrom you must use the "cdrom_image_path" parameter as always).
- cdrom_disk_type = "" -> It can be one of the KVM-supported types, such as "ide", "scsi", "paravirtual", etc. I introduced this optional parameter to make it possible to specify a different virtual device for cdroms. It is useful if you want to install a Windows system.
New option for the "boot_device" parameter:
- "floppy": with this value you should be able to boot a KVM instance from a floppy image.
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Iustin Pop <iustin@google.com> (cherry picked from commit cc130cc7) Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Michael Hanselmann authored
With this patch, the worker thread name is updated to include a short summary of the opcode (basically its OP_ID). The base name of job queue threads is shortened from “JobQueue” to “Jq”. Logs and the lock monitor will show a job verifying the cluster as e.g. “Jq2/Job1742/C_VERIFY”. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
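The naming scheme can be sketched as follows. The helper names are hypothetical, and the rule for shortening the opcode ID (drop `OP_`, abbreviate `CLUSTER_` to `C_`) is an assumption inferred from the single example in the commit message, not taken from the code.

```python
import threading

# Assumed abbreviation table; only the CLUSTER_ entry is implied by the
# example "C_VERIFY" in the commit message.
_SUMMARY_PREFIXES = (("CLUSTER_", "C_"),)


def op_summary(op_id):
    """Shorten an opcode ID such as OP_CLUSTER_VERIFY to C_VERIFY."""
    name = op_id[len("OP_"):] if op_id.startswith("OP_") else op_id
    for prefix, short in _SUMMARY_PREFIXES:
        if name.startswith(prefix):
            return short + name[len(prefix):]
    return name


def worker_thread_name(base, job_id, op_id):
    """Build a name like Jq2/Job1742/C_VERIFY."""
    return "%s/Job%s/%s" % (base, job_id, op_summary(op_id))


def rename_current_thread(base, job_id, op_id):
    """Set the calling thread's name so logs show what it is running."""
    name = worker_thread_name(base, job_id, op_id)
    threading.current_thread().name = name
    return name


example = worker_thread_name("Jq2", 1742, "OP_CLUSTER_VERIFY")
```

A worker would call something like `rename_current_thread` when it picks up an opcode and restore the base name afterwards.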
-
Iustin Pop authored
This will hopefully detect potential LVM (or any other storage, when they implement it) issues before committing changes just on some nodes. Unfortunately due to the dry_run opcode handling, we can't integrate this into the usual handling (as we need to activate the disks before doing any tests, which belongs in Exec not in CheckPrereq). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
This is always called with False from backend for now. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Michael Hanselmann authored
Also reorder the methods to match all other LUs. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Also change the way “share_locks” is filled. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-