- Jul 24, 2009
-
-
Guido Trotter authored
This constant is unused, except in qa. Removing it since it's always True. This patch also removes the unused qa_rapi.PrintRemoteAPIWarning function, and removes a comment about temporary constants "until we have cluster parameters". Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Jul 23, 2009
-
-
Guido Trotter authored
Various modules set it to True when called in debugging mode, but the utils module supports no such global. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Guido Trotter authored
Substitute exit(1) with exit(constants.EXIT_FAILURE). Also fix a wrongly indented line. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Guido Trotter authored
Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Jul 22, 2009
-
-
Guido Trotter authored
On machines without the ssl file, noded exits with '5'. Changing this to constants.EXIT_NOTCLUSTER. Also, utils.GetNodeDaemonPort hasn't raised errors.ConfigurationError for a while, so removing that try/except block. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Guido Trotter authored
This hook can be used to update /etc/ethers with instance's mac addresses. A dhcp server on the nodes can then serve to the instances their correct address. (This has been tested with dnsmasq's dhcp implementation) Signed-off-by:
Guido Trotter <ultrotter@google.com>
-
- Jul 21, 2009
-
-
Iustin Pop authored
Many burnin steps initialize the batch queue at the beginning and commit it at the end of their operation. This patch moves this code to a decorator, in order to reduce redundant code. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Olivier Tharan <olive@google.com>
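The decorator approach described above can be sketched roughly as follows. This is only an illustration: the names `batched`, `Burner`, `clear_queue` and `commit_queue` are hypothetical and are not taken from the burnin source.

```python
import functools


def batched(fn):
    """Hypothetical decorator: reset the queue, run the step, commit."""
    @functools.wraps(fn)
    def wrapper(self, *args, **kwargs):
        self.clear_queue()            # initialize the batch queue
        result = fn(self, *args, **kwargs)
        self.commit_queue()           # commit once the step has queued its work
        return result
    return wrapper


class Burner:
    """Toy stand-in for the burnin driver (assumed structure)."""

    def __init__(self):
        self.queued = []
        self.committed = []

    def clear_queue(self):
        self.queued = []

    def commit_queue(self):
        self.committed.extend(self.queued)
        self.queued = []

    @batched
    def stop_start(self, instances):
        # The step only queues work; the decorator handles init and commit.
        for name in instances:
            self.queued.append(("shutdown", name))
            self.queued.append(("startup", name))
```

Each step then contains only its own logic, and the init/commit pair lives in one place.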
-
Iustin Pop authored
Many burnin steps do a manual check of instance aliveness, via duplicate code. This patch moves this code to a decorator. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Some burnin steps are idempotent: e.g. reinstalling an instance (from burnin's p.o.v.) can be done multiple times without any side-effects that would affect later burnin steps. As such, failing the whole burnin process due to a reinstall failure is undesirable. This patch modifies burnin by marking each opcode (in case of individual execution) and job set retryable or not. Retryable actions will be retried up to a number of times, after which we give up and return failure. One side-effect is that in case of full-failure in retryable job sets we lose the original exception (but we do log its string format), so we have a little bit less information in this case. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
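A minimal sketch of the retry behaviour described above, assuming a hypothetical helper `run_with_retries` and a `max_tries` limit (neither name comes from the patch). It also shows the stated side-effect: after the last failed attempt only the string form of the original exception survives.

```python
def run_with_retries(fn, retryable, max_tries=3, log=print):
    """Call fn(); retryable actions are attempted up to max_tries times."""
    if not retryable:
        # Non-retryable: a failure propagates with its original exception.
        return fn()

    last_msg = None
    for attempt in range(1, max_tries + 1):
        try:
            return fn()
        except Exception as err:
            # Only the string form is kept, as the commit message notes.
            last_msg = str(err)
            log("attempt %d/%d failed: %s" % (attempt, max_tries, last_msg))

    raise RuntimeError("giving up after %d attempts: %s" % (max_tries, last_msg))
```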
-
- Jul 20, 2009
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Jul 19, 2009
-
-
Iustin Pop authored
A long-standing bug in burnin makes errors during the removal phase (e.g. because an import has failed, or because the initial creation has failed) hide the original error. This patch suppresses removal errors if we are already in ‘has_err’ mode, and otherwise it displays them normally. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
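The error-masking fix can be illustrated with a small sketch. The function `run_burnin` and its callbacks are hypothetical; only the `has_err` flag is named in the commit message, and logging the suppressed error is an assumption of this example.

```python
def run_burnin(run_steps, remove_instances, log=print):
    """Run the steps, then always attempt removal without masking errors."""
    has_err = True
    try:
        run_steps()
        has_err = False
    finally:
        try:
            remove_instances()
        except Exception as err:
            if not has_err:
                # Steps succeeded, so the removal error is the real failure.
                raise
            # Already failing: keep the original error visible instead.
            log("ignoring removal error: %s" % err)
```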
-
Iustin Pop authored
The list of upload files is built currently at every UploadFile() call. This patch moves it to a separate variable which is initialized only once. This won't make much difference but I regard it as cleanup. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
Commit 55efe6da "Convert instance reinstall to multi instance model" actually broke instance reinstall for single-instance cases. This one-liner fixes it. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com> (cherry picked from commit b6e243ab)
-
Iustin Pop authored
It seems epydoc needs fully-qualified references, and doesn't deal with relative ones (not even in the current module) if there are any ambiguities. There are other epydoc warnings, in the rapi docstrings, but those are left as-is as they're removed in 2.1. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
Currently, unclean master daemon shutdown overwrites all of a job's opcode status and result with error/None. This is incorrect, since any already finished opcode(s) should have their status and result preserved, and only not-yet-processed opcodes should be marked as ‘error’. Cancelling jobs between opcodes does the same (but this is not allowed currently by the code, so it's not as important as unclean shutdown). This patch adds a new _QueuedJob function that only overwrites the status and result of non-finalized opcodes, which is then used in job queue init and in the cancel job functions. The patch also adds some comments and a new set of constants in constants.py highlighting the finalized vs. non-finalized opcode statuses. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
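As a rough sketch of the finalized vs. non-finalized distinction: the constant names, status strings and the `mark_unfinished_ops` helper below are assumptions for illustration, not copied from constants.py or jqueue.py.

```python
# Hypothetical status values and grouping.
OP_STATUS_QUEUED = "queued"
OP_STATUS_RUNNING = "running"
OP_STATUS_SUCCESS = "success"
OP_STATUS_ERROR = "error"
OP_STATUS_CANCELED = "canceled"

OPS_FINALIZED = frozenset([OP_STATUS_SUCCESS, OP_STATUS_ERROR,
                           OP_STATUS_CANCELED])


def mark_unfinished_ops(ops, status, result):
    """Overwrite status/result only for opcodes that have not finished."""
    for op in ops:
        if op["status"] in OPS_FINALIZED:
            continue  # preserve what already-finished opcodes reported
        op["status"] = status
        op["result"] = result
```

With this, an unclean shutdown between opcodes leaves the results of completed opcodes intact and only flags the remaining ones.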
-
Iustin Pop authored
Currently gnt-debug submits jobs individually, but in 2.1 JobExecutor uses the optimized SubmitManyJobs luxi call and as such should be used whenever multiple jobs need to be submitted. This patch converts gnt-debug submit-job to use it and also removes an extra empty line in the JobExecutor class. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
This patch converts ‘gnt-instance reinstall’ from single-instance to multi-instance model; since this is dangerous, it's required to pass “--force --force-multiple” to skip the confirmation. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com> (cherry picked from commit 55efe6da)
-
Iustin Pop authored
This small patch changed the batch create functionality to use the job executor instead of single-job submits. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com> (cherry picked from commit d4dd4b74)
-
Iustin Pop authored
This patch changes the generic "multiple job executor" to use the many jobs submit model, which automatically makes all its users use the new model. This makes, for example, startup/shutdown of a full cluster much more logical (all the submitted job IDs are visible fast, and then waiting for them proceeds normally). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com> (cherry picked from commit 23b4b983)
-
Iustin Pop authored
As a workaround for the job submit timeouts that we have, this patch adds a new luxi call for multi-job submit; the advantage is that all the jobs are added to the queue, and only afterwards can the workers start processing them. This is definitely faster than per-job submit, where the submission of new jobs competes with the workers processing jobs. On a pure no-op OpDelay opcode (not on master, not on nodes), we have:
  - 100 jobs:
    - individual: submit time ~21s, processing time ~21s
    - multiple: submit time 7-9s, processing time ~22s
  - 250 jobs:
    - individual: submit time ~56s, processing time ~57s (run 2: ~54s, ~55s)
    - multiple: submit time ~20s, processing time ~51s (run 2: ~17s, ~52s)
which shows that we indeed gain on the client side, and maybe even on the total processing time for a high number of jobs. For just 10 or so I expect the difference to be just noise. This will probably require increasing the timeout a little when submitting too many jobs - 250 jobs at ~20 seconds is close to the current rw timeout of 60s. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com> (cherry picked from commit 2971c913)
-
Iustin Pop authored
If a job with more than one opcode is being processed, and the master daemon crashes between two opcodes, we have the first N opcodes marked successful, and the rest marked as queued. This means that the overall job status is queued, and thus on master daemon restart it will be resent for completion. However, the RunTask() function in jqueue.py doesn't deal with partially-completed jobs. This patch makes it simply skip such opcodes. An alternative option would be to not mark partially-completed jobs as QUEUED but instead RUNNING, which would result in aborting of the job at restart time. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
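The skipping behaviour might look roughly like this. It is a simplified sketch: `run_job`, the dict-based opcodes and the status strings are hypothetical and do not mirror the real RunTask() code.

```python
def run_job(ops, execute):
    """Process a job's opcodes in order, skipping ones already completed."""
    for op in ops:
        if op["status"] == "success":
            # Finished before the crash; do not run it a second time.
            continue
        op["status"] = "running"
        op["result"] = execute(op["input"])
        op["status"] = "success"
```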
-
Iustin Pop authored
In case the job fails, we try to set the job's run_op_idx to -1. However, this is a wrong variable, which wasn't detected until the __slots__ addition. The correct variable is run_op_index. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
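Why the typo only surfaced after the `__slots__` addition can be shown with a small example. The class below is a hypothetical stand-in, not the real _QueuedJob; only the two attribute names come from the commit message.

```python
class QueuedJob:
    """Toy job object that restricts its attributes via __slots__."""
    __slots__ = ["id", "ops", "run_op_index"]

    def __init__(self, job_id, ops):
        self.id = job_id
        self.ops = ops
        self.run_op_index = -1


def assign_misspelled(job):
    """Return the error raised when assigning to the misspelled name."""
    try:
        job.run_op_idx = -1   # typo: not listed in __slots__
    except AttributeError as err:
        return err
    return None
```

Without `__slots__`, the same assignment would silently create a new, unused attribute on the instance.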
-
- Jul 17, 2009
-
-
Iustin Pop authored
Adding slots to _QueuedOpCode decreases memory usage (of these objects) by roughly four times. It is a lesser change for _QueuedJobs. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Iustin Pop authored
This patch converts the opcode loading to a pre-built map (at import time) instead of iteration over the globals dict at each call. Microbenchmarks show that this should be around three times faster, and burnin still passes. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
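The idea of a map built once at import time, instead of scanning the globals dict on every call, can be sketched as follows. The class names, the `OP_ID` values and `load_opcode` are illustrative assumptions rather than the actual opcodes.py code.

```python
class OpCode:
    """Base class; concrete opcodes set OP_ID."""
    OP_ID = None


class OpDelay(OpCode):
    OP_ID = "OP_TEST_DELAY"


class OpReinstall(OpCode):
    OP_ID = "OP_INSTANCE_REINSTALL"


def _build_op_map():
    """Scan module globals once and index opcode classes by OP_ID."""
    return {
        obj.OP_ID: obj
        for obj in list(globals().values())
        if isinstance(obj, type) and issubclass(obj, OpCode)
        and obj.OP_ID is not None
    }


# Built a single time, when the module is imported.
OP_MAPPING = _build_op_map()


def load_opcode(data):
    """Look up the opcode class with one dict access per call."""
    op_id = data["OP_ID"]
    try:
        cls = OP_MAPPING[op_id]
    except KeyError:
        raise ValueError("Unknown opcode ID %r" % op_id)
    return cls()
```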
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Olivier Tharan <olive@google.com>
-
Guido Trotter authored
* master:
  Update NEWS and version for 2.0.2 release
  Improve the description of node flags in man page
  Change default stripe count to 1
  Use full-stripe size in LVM growth
  RAPI: implement instance reinstall
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- Jul 16, 2009
-
-
Raiford Storey authored
[iustin@google.com: slightly reworded the explanation for offline and changed the commit message] Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Guido Trotter authored
This parameter is now mandatory for the cluster config to work. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Guido Trotter authored
  - Check that the enabled hypervisors list is valid
  - Check that the master node is a valid node
Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Guido Trotter authored
Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Iustin Pop authored
In order not to change the default during a stable series, we modify configure.ac to default to one stripe, in effect keeping the status quo (well, minus the LVM Attach() changes). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
LVM has issues when growing striped volumes, so it's best to specify the growth in exact multiples of the full stripe size (as precise as possible). For this we need to make a couple of changes:
  - in LVM Attach(), we additionally query the VG extent size and the LV stripe count; since this makes lvs return a (possibly) multi-line output, we now split it into lines and only take the last one
  - in LVM Grow(), we round up the increase to a multiple of the full stripe size
The patch also sets the correct target size in DRBD growth. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Olivier Tharan <olive@google.com>
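The rounding in Grow() amounts to simple arithmetic, sketched below. The function name and the MiB units are assumptions; the full stripe size is taken here as extent size times stripe count.

```python
def round_up_growth(amount_mib, pe_size_mib, stripes):
    """Round a growth request up to a multiple of the full stripe size."""
    full_stripe = pe_size_mib * stripes   # one extent on every stripe
    rest = amount_mib % full_stripe
    if rest:
        amount_mib += full_stripe - rest
    return amount_mib
```

For example, with 4 MiB extents and 3 stripes the full stripe size is 12 MiB, so a request for 1000 MiB becomes 1008 MiB.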
-
- Jul 14, 2009
-
-
Guido Trotter authored
It is currently unused, having been replaced by a simpler bootstrap.InitConfig function, which does the same job. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Guido Trotter authored
This function is not used. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Guido Trotter authored
Currently if a disk is added later the base_index is not considered, and all the disks are called disk0. This patch fixes it. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
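The naming problem can be illustrated with a short sketch. The helper `disk_names` and the exact `disk%d` format are assumptions made for this example; only `base_index` and the `disk0` symptom come from the commit message.

```python
def disk_names(base_index, count):
    """Name new disks starting from the number of disks already present."""
    return ["disk%d" % (base_index + i) for i in range(count)]
```

Ignoring `base_index` (always starting from 0) would name a disk added to a two-disk instance `disk0` again, whereas `disk_names(2, 1)` yields `disk2`.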
-
Guido Trotter authored
SimpleStore is a lot less heavyweight than SimpleConfigReader, and to just get the master name we can use that. This is the only usage of SimpleConfigReader currently, but we're not going to delete the class, as new usages will come in for ganeti-confd (in 2.1). Using it there, though, will make the class even heavier to load, so it makes sense for this simple usage to be converted. Signed-off-by:
Guido Trotter <ultrotter@google.com>
-
- Jul 13, 2009
-
-
Michael Hanselmann authored
This was broken by my pylint fixes patch. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-