- Nov 01, 2012
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
When a job is still waiting for locks and the queue is shutting down, they should be returned and not actually start processing. Until now jobs which transitioned from “queued” to “waiting” were already considered to be running as far as the shutdown code was concerned. This fixes issue 296. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Iustin Pop authored
Currently, we have instance rename doing extra checks on the host name, to prevent accidental wrong renames; however, instance create doesn't do these checks (issue 291), which (if DNS is misconfigured) can lead to hard to diagnose errors. This patch abstracts the name checking from LUInstanceRename into a separate function, which is then reused in both instance rename and instance create. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Bernardo Dal Seno <bdalseno@google.com>
-
Iustin Pop authored
This addresses issue 290: when receiving new jobs, logging is incomplete, and we don't have the job ID(s) and/or summaries logged. Only later, when the job is queried for or being processed, we know more. This is not good when troubleshooting, so let's improve the initial logging. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Bernardo Dal Seno <bdalseno@google.com>
-
Iustin Pop authored
There are two issues with lock exceptions right now: - first, we don't log the original error; this is fine for now (locking.py always returns the same error here), but in general is brittle: if locking.py would start returning more information, we'd completely miss that - second, an actual honest lock conflict is not an internal error; it's simply an optimistic lock failing, and as such we should not return internal error, but rather resource_not_unique This addresses issue 287. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Bernardo Dal Seno <bdalseno@google.com>
-
- Oct 30, 2012
-
-
Iustin Pop authored
This is the bit of documentation missing for issue 170. Doing development on a machine which already has Ganeti installed kind of works, but only when the installed and the developed version are very similar, and even then it can be problematic. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Commit 2c0af7da which added the runtime memory changes functionality had a small typo (wrong name); I've rewritten this to only compute the delta once, for simplicity. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
This variable can be empty, when we want to disable LVM, so we can't use TMaybeString. Fixes issue 285. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
This patch makes _RemoveDisks symmetric to _CreateDisks with respect to file-based storage: _CreateDisks uses "in constants.DTS_FILEBASED", whereas _RemoveDisks was not update and only uses "== constants.DT_FILE". This results in stale directories left on the filesystem. Fixes issue 262. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
Currently, the warning/notice about non-redundant instances in cluster verify is based non empty secondaries list (how old is this?); the proper way to check this nowadays is via DTS_MIRRORED. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- Oct 29, 2012
-
-
Iustin Pop authored
Sorry, this should have gone in the previous commit, I forgot about it. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Bernardo Dal Seno <bdalseno@google.com>
-
Iustin Pop authored
This fixes issue 282. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Bernardo Dal Seno <bdalseno@google.com>
-
- Oct 26, 2012
-
-
Iustin Pop authored
Currently the message does not say explicitly that instance-initiated reboots are useless to trigger the use of new parameters, per the thread on the user mailing list. Let's improve it a bit. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michele Tartara <mtartara@google.com>
-
- Oct 25, 2012
-
-
Simon Deziel authored
Setting a high prefix discourages the bridge from adopting the tap's MAC. Xen is not affected by this since the MAC is forced to "fe:ff:ff:ff:ff:ff". This addresses issue #217. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Iustin Pop <iustin@google.com> (cherry picked from commit be3f52f3) (this is critical enough that we want it on stable-2.6) Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- Oct 19, 2012
-
-
Iustin Pop authored
In Ganeti 2.6, disk adoption is broken due to the ipolicy checks being done before we read volume size from remote nodes. We fix this by simply moving these checks to after the disk adoption code which updates the disk size; it's not that nice that we fail a (almost) config-level check after we've reserved the LVs, etc., but we need to do so in order to validate the ipolicy correctly. Tested: - normal instance creation - creation via adoption with good size (pass) - creation via adoption with wrong LV size (fail as expected) - QA in progress Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Oct 17, 2012
-
-
Bernardo Dal Seno authored
Better formatting of text, past tense used when appropriate. Signed-off-by:
Bernardo Dal Seno <bdalseno@google.com> Reviewed-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Oct 12, 2012
-
-
Bernardo Dal Seno authored
This is a small bug-fix only release. Signed-off-by:
Bernardo Dal Seno <bdalseno@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- Oct 11, 2012
-
-
Michael Hanselmann authored
Example output: $ gnt-cluster verify-disks Submitted jobs 4327 Waiting for job 4327 ... No disks need to be activated. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Oct 10, 2012
-
-
Michael Hanselmann authored
Each version should have its dedicated list. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Oct 05, 2012
-
-
Bernardo Dal Seno authored
The man page and the bultin-in help for gnt-instance replace-disks were inconsistent. Also fixed some typos in man pages. Signed-off-by:
Bernardo Dal Seno <bdalseno@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Michael Hanselmann authored
First: This enables the use of “gnt-job watch $id” for archived jobs. Now, the reason for actually making this work is that during sufficiently large group or node evacuations jobs are archived before the client gets to poll for their output. This led to situations where the jobs would finish successfully, but the client reported an error because it couldn't see the job anymore. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Bernardo Dal Seno <bdalseno@google.com> (cherry picked from commit 04569469)
-
- Oct 03, 2012
-
-
Michael Hanselmann authored
People unfamiliar with Ganeti's internals might be confused with the different hostnames showing up later in the process. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Bernardo Dal Seno <bdalseno@google.com>
-
Michael Hanselmann authored
If one or more jobs failed the exit status should be set accordingly. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Bernardo Dal Seno <bdalseno@google.com>
-
Michael Hanselmann authored
This fixes issue 190. The problem was that the check for DRBD was enabled if LVM storage is used and didn't depend at all on whether DRBD is enabled. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com> (cherry picked from commit 3d8ae327)
-
- Oct 01, 2012
-
-
Gintautas Miliauskas authored
Signed-off-by:
Gintautas Miliauskas <gintas@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Sep 27, 2012
-
-
Bernardo Dal Seno authored
If an administrator sets always_failover, it means that there is no need for another explicit approval to failover instead of migrating. Signed-off-by:
Bernardo Dal Seno <bdalseno@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com> (cherry picked from commit b5f0b5cc) Signed-off-by:
Bernardo Dal Seno <bdalseno@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
In older versions of GNU Bash extended patterns, such as “@(…)”, are only available with the “extglob” shell option. As pointed out in [1] and [2], “extglob” must be enabled while parsing the code. Therefore the flag must be enabled at the beginning of the script and be reset to its original value at the end as to not interfere with other code on shell initialization. [1] http://unix.stackexchange.com/questions/45957 [2] http://mywiki.wooledge.org/glob Reported by Sascha Lucas. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Bernardo Dal Seno <bdalseno@google.com> Reviewed-by:
Iustin Pop <iustin@google.com> (cherry picked from commit 893ad76d)
-
- Sep 12, 2012
-
-
Michael Hanselmann authored
Commit 5fce6a89 changed RpcRunner._InstDict to add the disk parameters on all encoded instances. It didn't remove a special case in “_InstDictOspDp”. Update and fix unittests as well. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Constantinos Venetsanopoulos authored
We call _GatherAndLinkBlockDevs during the process, which in turn calls _RecursiveFindBD. This needs disk parameters to work. See also commit b8291e00. This was reported by Ansgar and Damien. Signed-off-by:
Constantinos Venetsanopoulos <cven@grnet.gr> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Michael Hanselmann authored
When locks are copied “locking.ALL_SET” must be handled separately (ALL_SET has the value None). Reported by Constantinos Venetsanopoulos who saw failover for RDB-based instances not working. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- Sep 07, 2012
-
-
Iustin Pop authored
The function `allocateOnSingle' has a bug in the calculation of the cluster score used for deciding which of the many target nodes to use in placing the instance: it uses the original node list for the score calculation. Due to this, since the original node list is the same for all target nodes, it means that basically `allocateOnSingle' returns the same score, no matter the target node, and hence the choosing of the node is arbitrary, instead of being done on the basis of the algorithm. This has gone uncaught until reported because the unittests only test 1 allocation at a time on an empty cluster, and do not check the consistency of the score. I'll send separate patches on the master branch for adding more checks to prevent this in the future. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Agata Murawska <agatamurawska@google.com>
-
- Sep 04, 2012
-
-
René Nussbaumer authored
There was an issue with the recent ipolicy introduction which lead to a bug in gnt-debug iallocator. It was not providing the spindle_use field and therefore it wont let you create a valid iallocator request. Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Sep 03, 2012
-
-
Iustin Pop authored
To help developing Ganeti on newer distributions, let's try to fix pylint warnings/errors. I'm using pylint from current Debian wheezy: pylint 0.25.1, astng 0.23.1, common 0.58.0, and we have 3 things that needs fixing. First, a really wide "except", with the silencing in the wrong place. I'm not sure why this doesn't have "except Exception", so let's add it. However, pylint still complains about "Catching too general exception", even though we do want to catch both system and our exception, so let's add a silence for W0703. It's true that we shouldn't catch KeyboardInterrupt and friends, but that should be cleaned up on the master branch. Second, pylint complains about "redefining name builtin tuple", because we do some pattern matching in the except blocks in netutils. This seems to be a false positive, but let's clean the code around this. And finally, type inference again goes bad, so let's silence E1103 with its "boolean doesn't have 'get' method". After this, I can run "make lint", and by extension "make commit-check" on Debian Wheezy, yay! We might be able to bump our required pylint versions to something not ancient… Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Pylint version: pylint 0.25.1, astng 0.23.1, common 0.58.0 crashes when passing the fully-qualified decorator name with: File "/usr/lib/pymodules/python2.7/pylint/checkers/base.py", line 161, in visit_function if not redefined_by_decorator(node): File "/usr/lib/pymodules/python2.7/pylint/checkers/base.py", line 116, in redefined_by_decorator decorator.expr.name == node.name): AttributeError: 'Getattr' object has no attribute 'name' I found out that simply using a shortened name will 'fix' this issue, so let's do this to allow running newer pylint versions. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Aug 27, 2012
-
-
Iustin Pop authored
This is done so that all current branches can run with newer pep8; note that instead of fixing the problems (like I did on master), I've just silenced more. These should *not* be merged onto master! Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
- Aug 22, 2012
-
-
Constantinos Venetsanopoulos authored
Currently, hail fails with FailDisk when trying to add an instance of type: 'file', 'sharedfile' and 'rbd'. This is due to a "0" or None value in the corresponding dict inside _ComputeDiskSize, which results in a "O" or non Int value of the exported 'disk_space_total' parameter. This in turn makes hail fail, when trying to process the value: - with "Unable to read Int" if value is None (file) - with FailDisk if value is 0 (sharedfile, rbd) The latter happens because the 0 value doesn't match the instance's IPolicy, since it is lower than the minimum disk size. The second problem still exists when using adoption with 'plain' and 'blockdev' template and will be addressed in another commit. Signed-off-by:
Constantinos Venetsanopoulos <cven@grnet.gr> Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Aug 15, 2012
-
-
Iustin Pop authored
Due to an oversight, the pause/resume sync RPC calls in _WipeDisks lack the verification of the overall RPC status, and directly iterate over the payload. The code actually doing the wipe does verify correctly the results. This can result in jobs failing with a hard to diagnose: OpExecError ['NoneType' object is not iterable] instead of proper "RPC failed" message. This patch adds a hard check on the pause call, but for the resume call it just logs a warning if the RPC failed; the rationale being that if we can't contact the node for pausing the sync, it's likely wiping will fail too, but after the wipe has been done, we can continue. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
- Aug 10, 2012
-
-
Iustin Pop authored
This is a bit of a shell munging trickery, but works for now. Making it more generic can be done later. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Iustin Pop authored
This breaks the command, as optparse considers that an error. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-