- Jan 11, 2013
-
-
Michael Hanselmann authored
Reported in issue 341. In some setups the instances live in a different netblock from the cluster. Therefore the configuration-global “rename” name shouldn't be used for them. Instead another instance name is used. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- Jan 09, 2013
-
-
Michael Hanselmann authored
This will be used in QA to format network interface parameters. This is a cherry-pick of master commit eac9b7b8 Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Guido Trotter authored
This was fixed in stable-2.6, commit 053c356a Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Requested in issue 337. The parameter “bridge” was not documented and is therefore silently replaced with “master-netdev”. A note is added to “qa-sample.json” describing how comments work. This is a cherry-pick of master commit 3601d488 Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Commit ec0652ad (June 2009) removed the option. This is a cherry-pick of master commit 78453739 Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Dec 20, 2012
-
-
Iustin Pop authored
I'm already setting this to a release date of tomorrow, since QA on the 2.6 branch has been clean. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Dec 19, 2012
-
-
Iustin Pop authored
Accidentally stumbled upon this while testing unrelated code on a machine with ~3K active jobs: the bash completion unittest was hanging. Upon investigation, it turns out that bash's ${var//pattern/repl} is probably quadratic in the size of the input (or worse, even):

  $ touch job-{1..500}
  $ time ( a=$(echo job-*); echo ${a//job-/}| wc -c; )
  1892
  real 0m0.597s
  user 0m0.590s
  $ touch job-{1..1000}
  $ time ( a=$(echo job-*); echo ${a//job-/}| wc -c; )
  3893
  real 0m4.654s
  user 0m4.580s

We can easily fix this if we change to array-based substitution (once per element):

  $ time ( a=($(echo job-*)); echo ${a[*]/job-/} |wc -c; )
  3893
  real 0m0.028s
  user 0m0.010s
  $ touch job-{1..10000}
  $ time ( a=($(echo job-*)); echo ${a[*]/job-/} |wc -c; )
  48894
  real 0m0.233s
  user 0m0.220s

This means that exactly when the master node is busy processing many jobs, we could accidentally start consuming lots of CPU in the bash completion, which is not good.

Note: the code might have problems with filenames containing spaces (I didn't reset the IFS, etc.), but the original code had the same issue, I think. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Dec 14, 2012
-
-
Iustin Pop authored
During a normal configuration update, done via write to temp file and rename, this is what confd logs (slightly edited for clarity):

  2012-12-14 01:05:53: ganeti-confd INFO Loaded new config, serial 21866
  2012-12-14 01:06:18: ganeti-confd INFO File lost, trying to re-establish notifier
  2012-12-14 01:06:18: ganeti-confd INFO Loaded new config, serial 21867
  2012-12-14 01:07:09: ganeti-confd INFO File lost, trying to re-establish notifier
  2012-12-14 01:07:09: ganeti-confd INFO Loaded new config, serial 21868

Since this always happens, we should demote the "File lost" messages to debug level, to keep the logs cleaner. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Bernardo Dal Seno <bdalseno@google.com>
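The "File lost" message follows from how the update is written: renaming a temporary file over the configuration path replaces the inode, so a watcher attached to the old file loses it every time. A minimal Python sketch of that effect, assuming a POSIX filesystem; the file name and both helpers are hypothetical and are not Ganeti code:

```python
import os
import tempfile


def atomic_write(path, data):
    """Write data to a temp file in the same directory, then rename it over path."""
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path))
    with os.fdopen(fd, "w") as handle:
        handle.write(data)
    os.rename(tmp_path, path)


def inode_changes_on_update():
    """Return True if an update via temp file and rename replaces the inode."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "config.data")  # hypothetical file name
        atomic_write(path, "serial 1\n")
        before = os.stat(path).st_ino
        atomic_write(path, "serial 2\n")
        after = os.stat(path).st_ino
        return before != after
```

Because the old file still exists when the temporary file is created, the two inodes are necessarily different, which is why the notifier has to be re-established on every update.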
-
Iustin Pop authored
Note: I'll add tests for this on the master branch, but not here. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Bernardo Dal Seno authored
The only semantic change is the fix of the spelling of the option --ipolicy-disk-templates. Signed-off-by:
Bernardo Dal Seno <bdalseno@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- Dec 13, 2012
-
-
Iustin Pop authored
This fixes Issue 257. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Helga Velroyen <helgav@google.com>
-
Iustin Pop authored
Commit 1ce03fb1 (“Add ht-based result checks to opcodes”) introduced infrastructure for checking opcode results, and subsequent commits extended the list of opcodes which declare a result. However, this was not tested for dry-run mode operation. Furthermore, there's no authoritative list of which opcodes/LUs support dry_run mode at all; currently, this is restricted by the list of CLI options. So for now we disable the result verification if the opcode has been executed in dry_run mode. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
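The behaviour described above can be sketched roughly as follows. This is an illustration only: the function name, its arguments, and the use of ValueError are assumptions, not the actual Ganeti processor code.

```python
def check_result(result, result_check, dry_run):
    """Validate an opcode result, skipping validation for dry-run executions.

    result_check is a predicate describing the declared result type, or None
    if the opcode declares no result (hypothetical convention).
    """
    if dry_run or result_check is None:
        # Dry-run results are not guaranteed to match the declared type.
        return result
    if not result_check(result):
        raise ValueError("Opcode result fails validation: %r" % (result,))
    return result
```

With a check such as `lambda r: isinstance(r, list)`, a dry-run result of `None` passes through untouched, while the same result from a real execution is rejected.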
-
- Dec 12, 2012
-
-
Michael Hanselmann authored
This is in preparation for a 2.6.2 release. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
When all OS parameters should be unset (“gnt-os modify -H -xen-pvm foo”), a TypeError was raised. This fixes issue 311. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Nov 22, 2012
-
-
Guido Trotter authored
In cmdlib we must only use the hypervisor class, and never instantiate it. As such we have to call GetHypervisorClass instead, to avoid getting an instance of it. This fixes Issue 316, because __init__ is not called from masterd anymore, and thus can't fail on EnsureDirs. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
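The difference between asking for the class and asking for an instance can be sketched like this. Everything here (the registry, `FakeHypervisor`, both lookup functions) is a hypothetical stand-in; only the idea that a class-level lookup avoids running `__init__` comes from the commit message.

```python
class FakeHypervisor(object):
    """Stand-in hypervisor whose constructor has side effects."""

    init_calls = 0

    def __init__(self):
        # In the scenario above, this is where directory setup would run
        # (and could fail when called from the wrong daemon).
        FakeHypervisor.init_calls += 1

    @classmethod
    def check_parameter_syntax(cls, hvparams):
        """Class-level check that needs no instance."""
        return all(isinstance(key, str) for key in hvparams)


_HYPERVISORS = {"fake": FakeHypervisor}  # hypothetical registry


def get_hypervisor_class(name):
    """Return the class itself; the constructor is not called."""
    return _HYPERVISORS[name]


def get_hypervisor(name):
    """Return a new instance; the constructor runs."""
    return get_hypervisor_class(name)()
```

Code that only needs class-level helpers can call `get_hypervisor_class("fake").check_parameter_syntax(...)` without ever triggering the constructor.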
-
- Nov 19, 2012
-
-
Iustin Pop authored
The 'command' attribute of the OpOobCommand opcode is defined with a default value of None, but its validation requires a member of constants.OOB_COMMANDS, which doesn't accept None. This results in the following error when submitting an opcode without the command:

  error type: wrong_input, error details: Parameter 'OP_OOB_COMMAND.command' fails validation

I suspect this was simply a mistake, since the commit that introduced it (65e183af, “opcodes: Add opcode parameter definitions”) did lots of bulk updates. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
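The mismatch between the default and the validator can be reproduced in a few lines. This is a sketch under assumptions: the command set below holds illustrative values, and `fill_and_validate` is a hypothetical helper, not the real opcode machinery.

```python
# Illustrative values only; the real set lives in constants.OOB_COMMANDS.
OOB_COMMANDS = frozenset(["power-on", "power-off", "health"])


def fill_and_validate(params, name, default, check):
    """Apply the default for a missing parameter, then run its check."""
    value = params.get(name, default)
    if not check(value):
        raise ValueError("Parameter '%s' fails validation" % name)
    return value


def is_oob_command(value):
    """Membership check that, by construction, can never accept None."""
    return value in OOB_COMMANDS
```

Calling `fill_and_validate({}, "command", None, is_oob_command)` raises, because the default itself is not an accepted value; a default that the check rejects only postpones the error.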
-
- Nov 15, 2012
-
-
Michael Hanselmann authored
s/exists/exist/ Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Michele Tartara <mtartara@google.com>
-
- Nov 14, 2012
-
-
Guido Trotter authored
The text of the manpage explains that an index can be prepended to "remove", but the short help doesn't mention it. Adding it helps make the syntax clear. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Nov 12, 2012
-
-
Iustin Pop authored
Commit 6a1434d7 (“Make migration RPC non-blocking”) changed the API for reporting migration status, but has a small cosmetic bug: if the migration status is failure, but the RPC itself to get the status didn't fail, it shows the following error message:

  Could not migrate instance instance2: None

since it always uses result.fail_msg, irrespective of which part of the if condition failed. This patch simply updates the msg if not already set, leading to:

  Could not migrate instance instance2: hypervisor returned failure

Proper error display can be done once the migration status objects can return failure information as well, besides status. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Helga Velroyen <helgav@google.com>
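The fallback described above amounts to the following logic. The function and its arguments are hypothetical; only the two message strings are taken from the commit message.

```python
def migration_failure_message(instance_name, fail_msg, migration_failed):
    """Return an error string, or None when neither the RPC nor the migration failed."""
    if not (fail_msg or migration_failed):
        return None
    # The RPC may have succeeded while the migration itself failed; in that
    # case fail_msg is empty and needs a fallback text.
    if not fail_msg:
        fail_msg = "hypervisor returned failure"
    return "Could not migrate instance %s: %s" % (instance_name, fail_msg)
```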
-
Iustin Pop authored
Commit 6a1434d7 (“Make migration RPC non-blocking”) changed from raising HypervisorErrors to returning MigrationStatus objects. However, these objects don't have an "info" attribute, so they can't pass a reason back (which is in itself a bug); but the KVM hypervisor code attempts to do so, and fails at runtime with:

  Failed to get migration status: 'MigrationStatus' object has no attribute 'info'

instead of the intended:

  Migration failed, aborting: too many broken 'info migrate' answers

For now (on stable-2.6), let's just remove the "info" reason, and later we can add it back properly once we have a way to correctly represent migration status failures in the LU. This fixes issue 297. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
Currently, the code uses createFile, which has the effect of always truncating the file. This is bad, as the content of the PID file is wiped even when we wouldn't be able to lock it! We switch to openFd (createFile is just a wrapper over that), and we use an explicit set of flags; defaultFileFlags is already safe (trunc=False), but I prefer to set it explicitly with our desired flags. Note that this bug doesn't manifest in normal usage, as daemon-util won't try to start the daemon if already running. But if anyone or anything does call ganeti-confd explicitly, the pid file will be emptied and the daemon will keep trying to be restarted forever… Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
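The same open-without-truncate, lock, then write ordering can be sketched in Python. This is not the confd code (which uses openFd with explicit flags); the helper name is hypothetical, and the use of flock() as the locking primitive is an assumption made for the example.

```python
import fcntl
import os


def lock_pid_file(path):
    """Open path without truncating, lock it, and only then write our PID.

    Returns the open file descriptor, which must stay open to hold the lock.
    Raises OSError if another holder already has the lock; in that case the
    existing file content is left untouched.
    """
    # No O_TRUNC here: the content must survive a failed locking attempt.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        raise
    # We own the lock now, so overwriting the old content is safe.
    os.ftruncate(fd, 0)
    os.write(fd, ("%d\n" % os.getpid()).encode("ascii"))
    return fd
```

A second call on the same path while the first descriptor is still open fails at the locking step, and the PID written by the first caller remains in the file.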
-
- Nov 08, 2012
-
-
Michael Hanselmann authored
pylint complained, I fixed it, and unfortunately pushed too early. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Once again this will be used by a forthcoming RAPI test. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
A newly added test for RAPI will also verify the returned headers. A test in ganeti.rapi.client_unittest.py is split into smaller stand-alone tests. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
A newly added piece of code will also have to parse headers, so having this wrapper saves us from copying this part of code. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Nov 07, 2012
-
-
Michael Hanselmann authored
Commit f0d22861 changed the logic of gnt_instance._ConvertNicDiskModifications to also allow a parameter named “modify”. Unfortunately the corresponding unittest was not updated. An “if”/“else” condition is also merged. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Iustin Pop authored
This small patch fixes compatibility with a few newer Haskell libraries:

- base 4.6, included with ghc 7.6, removed the deprecated 'catch' function from Prelude, so our "import Prelude hiding (catch)" is now an error; we work around this by using the fully-qualified Control.Exception.catch name
- containers 0.5 changed the signature of 'deleteFindMax'; we work around this by using separate 'findMax' and 'deleteMax'
- QuickCheck 2.5 removed the 'maxDiscards' test parameter, replacing it with a much better 'maxDiscardsRatio'; however, until we can depend on that, we work around this by just removing it and not printing it anymore (we no longer control maxDiscards, instead leaving it at its default; for our default test size, this is no change, as the default value is already 500, which is our default as well)

Tested on Squeeze (+extra libs), Wheezy and experimental, which covers all supported GHC versions. Also, merging this in master will be a pain, but unless we want to stop supporting 2.6… Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- Nov 06, 2012
-
-
Guido Trotter authored
- Rename xm-console-wrapper to xen-console-wrapper
- Pass the xen command to use as a parameter

Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Guido Trotter authored
Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Guido Trotter authored
Until now the only way to make live migration work in conjunction with "xl" was to add ssh known_hosts keys for every node's secondary ip on every other node. With this command we remove the target key verification: this is not worse than what we were doing before with "xm", and allows the migration to happen under either toolstack, without extra manual work. Of course the full security of ssh is not used by live migration, then. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Guido Trotter authored
If the toolstack is set to "xl" we shouldn't ping xend for liveness before attempting a live migration. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Nov 01, 2012
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
When a job is still waiting for locks and the queue is shutting down, it should be returned and not actually start processing. Until now, jobs which transitioned from “queued” to “waiting” were already considered to be running as far as the shutdown code was concerned. This fixes issue 296. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
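One way to picture the intended state handling is the small transition function below. The status names come from the commit message; the function itself, its arguments, and the choice to give shutdown priority over a lock that has just been acquired are assumptions made for illustration.

```python
QUEUED = "queued"
WAITING = "waiting"
RUNNING = "running"


def next_status(status, got_locks, shutting_down):
    """Decide what happens to a job given lock and shutdown state."""
    if status == WAITING:
        if shutting_down:
            # Still waiting for locks: hand the job back instead of starting it.
            return QUEUED
        if got_locks:
            return RUNNING
    return status


def counts_as_running(status):
    """Only jobs that are actually processing block the shutdown."""
    return status == RUNNING
```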
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Iustin Pop authored
Currently, we have instance rename doing extra checks on the host name, to prevent accidental wrong renames; however, instance create doesn't do these checks (issue 291), which (if DNS is misconfigured) can lead to hard to diagnose errors. This patch abstracts the name checking from LUInstanceRename into a separate function, which is then reused in both instance rename and instance create. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Bernardo Dal Seno <bdalseno@google.com>
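The refactoring pattern, one shared check called from both code paths, might look like the sketch below. The matching rule (the requested name must equal the leading dot-separated components of the resolved name) is an assumption for the example, as are all function names; the commit message only says that the check was factored out of LUInstanceRename.

```python
def check_hostname_sane(requested, resolved):
    """Raise ValueError unless the resolved name matches the requested one.

    requested may be a short name or a fully-qualified name; resolved is the
    name returned by the resolver. Comparison is per dot-separated component.
    """
    wanted = requested.lower().split(".")
    actual = resolved.lower().split(".")
    if actual[:len(wanted)] != wanted:
        raise ValueError("Resolved hostname '%s' does not look the same as"
                         " given hostname '%s'" % (resolved, requested))
    return resolved


def rename_instance(new_name, resolve):
    """Rename path: resolve the name, then apply the shared check."""
    return check_hostname_sane(new_name, resolve(new_name))


def create_instance(name, resolve):
    """Create path: reuses exactly the same check as rename."""
    return check_hostname_sane(name, resolve(name))
```

Here `resolve` is any callable mapping a name to what DNS returns for it, so a misconfigured resolver that maps `inst1` to `other.example.com` is caught in both operations.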
-
Iustin Pop authored
This addresses issue 290: when receiving new jobs, logging is incomplete, and we don't have the job ID(s) and/or summaries logged. Only later, when the job is queried for or being processed, we know more. This is not good when troubleshooting, so let's improve the initial logging. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Bernardo Dal Seno <bdalseno@google.com>
-
Iustin Pop authored
There are two issues with lock exceptions right now:

- first, we don't log the original error; this is fine for now (locking.py always returns the same error here), but in general is brittle: if locking.py started returning more information, we'd completely miss that
- second, an actual honest lock conflict is not an internal error; it's simply an optimistic lock failing, and as such we should not return internal error, but rather resource_not_unique

This addresses issue 287. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Bernardo Dal Seno <bdalseno@google.com>
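Both points can be illustrated together: log the original exception, then re-raise it with a "not unique" error code instead of an internal one. The exception classes, the constant, and the helper below are hypothetical stand-ins; only the error code name resource_not_unique comes from the commit message.

```python
import logging

ECODE_NOTUNIQUE = "resource_not_unique"


class LockError(Exception):
    """Stand-in for the error raised by the locking layer."""


class PrereqError(Exception):
    """Stand-in for a user-visible error carrying an error code."""

    def __init__(self, message, code):
        Exception.__init__(self, message)
        self.code = code


def add_locks(acquire):
    """Run the acquire callable, translating lock conflicts for the caller."""
    try:
        return acquire()
    except LockError as err:
        # Keep the original error in the log in case it carries details.
        logging.warning("Lock acquisition failed: %s", err)
        raise PrereqError("Couldn't add locks (%s), probably because of a"
                          " conflicting job" % err, ECODE_NOTUNIQUE)
```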
-
- Oct 30, 2012
-
-
Iustin Pop authored
This is the bit of documentation missing for issue 170. Doing development on a machine which already has Ganeti installed kind of works, but only when the installed and the developed version are very similar, and even then it can be problematic. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Commit 2c0af7da which added the runtime memory changes functionality had a small typo (wrong name); I've rewritten this to only compute the delta once, for simplicity. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
This variable can be empty, when we want to disable LVM, so we can't use TMaybeString. Fixes issue 285. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
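The type problem can be shown with two small predicates. They are hypothetical re-implementations written for this example, assuming that a "maybe string" check means "None or a non-empty string"; they are not the checks from Ganeti's ht module.

```python
def is_non_empty_string(value):
    """Accept only strings with at least one character."""
    return isinstance(value, str) and value != ""


def is_maybe_string(value):
    """Assumed semantics: None or a non-empty string."""
    return value is None or is_non_empty_string(value)


def is_maybe_any_string(value):
    """Looser check that also admits the empty string."""
    return value is None or isinstance(value, str)
```

Under these assumptions an empty value, which is how LVM is disabled according to the commit message, fails `is_maybe_string("")` but passes `is_maybe_any_string("")`.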
-