- Jul 19, 2009
-
-
Iustin Pop authored
The list of upload files is currently built at every UploadFile() call. This patch moves it to a separate variable which is initialized only once. This won't make much difference, but I regard it as cleanup. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
It seems epydoc needs fully-qualified references, and doesn't deal with relative ones (not even in the current module) if there are any ambiguities. There are other epydoc warnings, in the rapi docstrings, but those are left as-is as they're removed in 2.1. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
Currently, an unclean master daemon shutdown overwrites the status and result of all of a job's opcodes with error/None. This is incorrect: any already finished opcodes should have their status and result preserved, and only not-yet-processed opcodes should be marked as ‘error’. Cancelling jobs between opcodes does the same (but this is not currently allowed by the code, so it's not as important as unclean shutdown). This patch adds a new _QueuedJob function that only overwrites the status and result of non-finalized opcodes, which is then used in job queue init and in the cancel job functions. The patch also adds some comments and a new set of constants in constants.py highlighting the finalized vs. non-finalized opcode statuses. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
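The behaviour described in this commit can be sketched roughly as follows. This is a minimal illustration, not the actual jqueue.py code: the status strings, the `QueuedOp` class and the `mark_unfinished_ops` name are assumptions made for the example.

```python
# Hypothetical status names; the real constants live in constants.py.
OP_STATUS_QUEUED = "queued"
OP_STATUS_RUNNING = "running"
OP_STATUS_SUCCESS = "success"
OP_STATUS_ERROR = "error"

# Statuses that must never be overwritten once reached (assumed subset).
OPS_FINALIZED = frozenset([OP_STATUS_SUCCESS, OP_STATUS_ERROR])


class QueuedOp(object):
    """Stand-in for a queued opcode with a status and a result."""

    def __init__(self, status, result=None):
        self.status = status
        self.result = result


def mark_unfinished_ops(ops, status, result):
    """Overwrite status/result only for opcodes that are not finalized."""
    for op in ops:
        if op.status in OPS_FINALIZED:
            continue  # keep what the finished opcode already recorded
        op.status = status
        op.result = result
```

With this shape, an opcode that already succeeded keeps its result after an unclean shutdown, while running and queued opcodes are marked as failed.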
-
Iustin Pop authored
Currently gnt-debug submits jobs individually, but in 2.1 JobExecutor uses the optimized SubmitManyJobs luxi call and as such should be used whenever multiple jobs need to be submitted. This patch converts gnt-debug submit-job to use it and also removes an extra empty line in the JobExecutor class. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
This patch changes the generic "multiple job executor" to use the many jobs submit model, which automatically makes all its users use the new model. This makes, for example, startup/shutdown of a full cluster much more logical (all the submitted job IDs are visible fast, and then waiting for them proceeds normally). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com> (cherry picked from commit 23b4b983)
-
Iustin Pop authored
As a workaround for the job submit timeouts that we have, this patch adds a new luxi call for multi-job submit; the advantage is that all the jobs are added to the queue and only afterwards can the workers start processing them. This is definitely faster than per-job submit, where the submission of new jobs competes with the workers processing jobs. On a pure no-op OpDelay opcode (not on master, not on nodes), we have:
- 100 jobs:
  - individual: submit time ~21s, processing time ~21s
  - multiple: submit time 7-9s, processing time ~22s
- 250 jobs:
  - individual: submit time ~56s, processing time ~57s (run 2: ~54s, ~55s)
  - multiple: submit time ~20s, processing time ~51s (run 2: ~17s, ~52s)
which shows that we indeed gain on the client side, and maybe even on the total processing time for a high number of jobs. For just 10 or so I expect the difference to be just noise. This will probably require increasing the timeout a little when submitting too many jobs - 250 jobs at ~20 seconds is close to the current rw timeout of 60s. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com> (cherry picked from commit 2971c913)
-
Iustin Pop authored
If a job with more than one opcode is being processed, and the master daemon crashes between two opcodes, we have the first N opcodes marked successful, and the rest marked as queued. This means that the overall job status is queued, and thus on master daemon restart it will be resent for completion. However, the RunTask() function in jqueue.py doesn't deal with partially-completed jobs. This patch makes it simply skip the already-completed opcodes. An alternative option would be to not mark partially-completed jobs as QUEUED but instead RUNNING, which would result in aborting of the job at restart time. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
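A rough sketch of the "skip what is already done" approach, under the assumption that each opcode is represented as a dict with `status`, `input` and `result` keys (these names, and `run_job` itself, are illustrative and not taken from jqueue.py):

```python
def run_job(ops, execute):
    """Run a job's opcodes in order, skipping those that already succeeded.

    `ops` is a list of dicts (hypothetical layout) and `execute` is any
    callable that processes one opcode input and returns its result.
    """
    for op in ops:
        if op["status"] == "success":
            # Completed before the master daemon went down: leave it alone.
            continue
        op["status"] = "running"
        op["result"] = execute(op["input"])
        op["status"] = "success"
```

On restart, a job whose first opcodes are marked successful resumes at the first queued opcode instead of re-running everything.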
-
Iustin Pop authored
In case the job fails, we try to set the job's run_op_idx to -1. However, this is the wrong attribute name, which wasn't detected until the __slots__ addition. The correct name is run_op_index. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
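The reason the typo only surfaced after the __slots__ addition is that a class with __slots__ rejects assignments to undeclared attributes. A small stand-alone illustration (the `QueuedJob` class here is a simplified stand-in, not the real _QueuedJob):

```python
class QueuedJob(object):
    """Simplified stand-in with a single declared attribute."""

    __slots__ = ["run_op_index"]

    def __init__(self):
        self.run_op_index = 0


job = QueuedJob()
try:
    # Misspelled attribute: without __slots__ this would silently create
    # a new, unused attribute instead of failing.
    job.run_op_idx = -1
    typo_detected = False
except AttributeError:
    typo_detected = True

# The correctly spelled attribute can be assigned as usual.
job.run_op_index = -1
```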
-
- Jul 17, 2009
-
-
Iustin Pop authored
Adding slots to _QueuedOpCode decreases memory usage (of these objects) by roughly four times. It is a lesser change for _QueuedJobs. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This patch converts the opcode loading to a pre-built map (at import time) instead of iteration over the globals dict at each call. Microbenchmarks show that this should be around three times faster, and burnin still passes. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
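The idea of a map built once at import time can be sketched as below. The class names other than OpDelay, the OP_ID values and the use of `__subclasses__()` to enumerate classes are assumptions for the example; the real module may collect its opcode classes differently.

```python
class OpCode(object):
    """Base class; subclasses set OP_ID (values here are illustrative)."""
    OP_ID = None


class OpDelay(OpCode):
    OP_ID = "OP_TEST_DELAY"


class OpNoop(OpCode):  # hypothetical opcode, only for the example
    OP_ID = "OP_NOOP"


def _build_op_map():
    """Collect all opcode classes into an OP_ID -> class dictionary."""
    return dict((cls.OP_ID, cls) for cls in OpCode.__subclasses__())


# Built once, when the module is imported.
OP_MAPPING = _build_op_map()


def load_opcode(data):
    """Instantiate the opcode class named in data["OP_ID"]."""
    op_id = data["OP_ID"]
    try:
        cls = OP_MAPPING[op_id]
    except KeyError:
        raise ValueError("Unknown opcode %r" % (op_id,))
    return cls()
```

Each load then costs a single dictionary lookup rather than a scan over every global name.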
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Olivier Tharan <olive@google.com>
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Jul 16, 2009
-
-
Guido Trotter authored
- Check that the enabled hypervisors list is valid
- Check that the master node is a valid node
Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Guido Trotter authored
Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Guido Trotter authored
Currently we have both a default_hypervisor and an enabled_hypervisors list. The former is only settable at cluster init time, while the latter can be changed with cluster modify. This becomes cumbersome in a few ways: at cluster init time for example if we pass in a list of enabled hypervisors which doesn't include the "default" xen-pvm one, we're also forced to pass a default hypervisor, or an error will be reported. It is also currently possible to disable the default hypervisor in cluster-modify (with unknown results). In order to avoid this we get rid of this field altogether, and define the "first" enabled hypervisor as the default one. This allows ease of changing which one is the default, and at the same time maintains coherency. At configuration upgrade we make sure that the old default is first in the list, so that 2.0 cluster defaults are preserved. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
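A minimal sketch of the "first enabled hypervisor is the default" rule and of the upgrade step that preserves a 2.0 default. The function names are hypothetical, and adding the old default when it is missing from the list is an assumption of this example, not something the commit message states.

```python
def upgrade_enabled_hypervisors(enabled, old_default):
    """Return the enabled list with the old default moved to the front.

    Assumption: if the old default is not in the list, it is added.
    """
    rest = [hv for hv in enabled if hv != old_default]
    return [old_default] + rest


def default_hypervisor(enabled):
    """The default is simply the first enabled hypervisor."""
    return enabled[0]
```

For example, a 2.0 cluster with enabled hypervisors `["kvm", "xen-pvm"]` and default `"xen-pvm"` would end up with `["xen-pvm", "kvm"]`, so the default is unchanged after the upgrade.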
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Guido Trotter authored
InitConfig currently creates the cluster config_data, then puts it into a dict, passes it to SimpleConfigWriter to load it from a dict (which just reuses the dict value) and then saves it. The SimpleConfigWriter is then returned, but ignored. With this patch we just write out the config_data at InitConfig time, and thus can remove SimpleConfigWriter altogether. The now unused SimpleConfigReader.FromDict is also gone. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Guido Trotter authored
InitConfig returns a SimpleConfigWriter to InitCluster, which then passes it on to ssh.WriteKnownHostsFile, which extracts a couple of values from it. One line later the full ConfigWriter is initialized. By initializing it one line earlier we can pass the full writer to ssh.WriteKnownHostsFile, and thus we no longer need the SimpleConfigWriter returned by InitConfig. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Guido Trotter authored
I got overexcited and forgot we have to remain compatible with python 2.4. With this patch we move from sha256 to sha1 for hmac authenticated serialized messages, and we handle both newer and older python, by importing the right module for each. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Iustin Pop authored
LVM has issues when growing striped volumes, so it's best to specify the growth in exact multiples of the full stripe size (as precise as possible). For this we need to make a couple of changes:
- in LVM Attach(), we additionally query the VG extent size and the LV stripe count; since this makes lvs return a (possibly) multi-line output, we now split it into lines and only take the last one
- in LVM Grow(), we round up the increase to a multiple of the full stripe size
The patch also sets the correct target size in DRBD growth. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Olivier Tharan <olive@google.com>
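The rounding step in Grow() amounts to simple arithmetic, sketched here under the assumption that the full stripe size is the extent size multiplied by the stripe count and that all sizes are in the same unit (MiB in this example). The function and parameter names are illustrative.

```python
def round_up_to_full_stripe(amount, extent_size, stripes):
    """Round a growth amount up to a multiple of the full stripe size.

    All sizes are assumed to be in MiB; full stripe = extent_size * stripes.
    """
    full_stripe = extent_size * stripes
    remainder = amount % full_stripe
    if remainder:
        amount += full_stripe - remainder
    return amount
```

With 4 MiB extents and 3 stripes the full stripe is 12 MiB, so a requested growth of 1000 MiB becomes 1008 MiB, while 24 MiB is left unchanged.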
-
- Jul 14, 2009
-
-
Guido Trotter authored
It's been replaced by a simpler bootstrap.InitConfig function, which does the same job, and is currently unused. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Guido Trotter authored
This function is not used. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Guido Trotter authored
Currently if a disk is added later the base_index is not considered, and all the disks are called disk0. This patch fixes it. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
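The fix can be illustrated with a small sketch. The `disk_names` helper and the exact name format are assumptions based on the "disk0" wording above, not the real template-generation code.

```python
def disk_names(count, base_index=0):
    """Name `count` new disks, continuing from `base_index`.

    Ignoring base_index (always starting at 0) reproduces the bug where
    every disk added later ends up being called disk0.
    """
    return ["disk%d" % (base_index + idx) for idx in range(count)]
```

An instance that already has two disks would pass `base_index=2`, so the newly added disk is named `disk2`.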
-
Guido Trotter authored
This patch adds HMAC-authenticated JSON messages to the serializer. The new interface works on any JSON-encodable data type, and can sign it with a private key and an optional salt. The same private key must be used upon message loading to verify the message. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
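A sign-and-verify round trip of this kind might look like the sketch below. The function names, the envelope layout (`msg`, `salt`, `hmac`) and the way the salt is mixed in are assumptions of this example, not the serializer's actual format; SHA-1 is used to match the Jul 16 follow-up commit in this log.

```python
import hashlib
import hmac
import json


def _digest(key, salt, payload):
    """HMAC-SHA1 over salt + payload, returned as a hex string."""
    message = (salt + payload).encode("utf-8")
    return hmac.new(key.encode("utf-8"), message, hashlib.sha1).hexdigest()


def dump_signed_json(data, key, salt=""):
    """Serialize `data` and wrap it in a signed envelope (assumed layout)."""
    payload = json.dumps(data, sort_keys=True)
    envelope = {"msg": payload, "salt": salt, "hmac": _digest(key, salt, payload)}
    return json.dumps(envelope)


def load_signed_json(text, key):
    """Verify the envelope with the same key and return the original data."""
    envelope = json.loads(text)
    expected = _digest(key, envelope["salt"], envelope["msg"])
    if not hmac.compare_digest(expected, envelope["hmac"]):
        raise ValueError("HMAC verification failed")
    return json.loads(envelope["msg"])
```

Loading with a different key, or after the payload has been altered, raises an error instead of returning data.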
-
- Jul 13, 2009
-
-
Michael Hanselmann authored
This resource can be used to retrieve and set the role of a node. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
This was broken by my pylint fixes patch. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Iustin Pop authored
This patch adds instance reinstall to RAPI, with two optional parameters:
- ‘os’, in order to change the OS on reinstall
- ‘nostartup’, in order to leave the instance down after reinstall
The call will first shut down the instance, then reinstall it, and unless ‘nostartup’ has been passed and is equal to 1, the instance will be started automatically. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- Jul 08, 2009
-
-
Guido Trotter authored
When the parameter is set to True and start_daemons is also True, ganeti-masterd will be started with the new --no-voting --yes-do-it options. This new option is set to True only on masterfailover, when no_voting is used. This changes the behavior from 2.0, where we didn't start the master daemon at all when this option was used. The manpage is also updated to remove the 2.0 only change. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Guido Trotter authored
This allows failing over in certain corner cases, such as a 2 node cluster with one node down. The man page is also updated to document this dangerous option and how to recover from this situation. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Jul 07, 2009
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
If a user used ^Z to stop the program, poll() in socket.recv would return EAGAIN due to SIGSTOP. This patch changes luxi.Transport.Recv to ignore EAGAIN. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
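Ignoring EAGAIN on receive can be sketched as a retry loop. This is an illustration only: the helper name, the `recv` callable parameter and the retry cap are assumptions, and the real luxi.Transport.Recv is structured differently.

```python
import errno


def recv_ignoring_eagain(recv, bufsize, max_retries=100):
    """Call recv(bufsize), retrying whenever it fails with EAGAIN.

    `recv` is any callable with socket.recv semantics; the retry cap is an
    assumption added here so the sketch cannot loop forever.
    """
    for _ in range(max_retries):
        try:
            return recv(bufsize)
        except OSError as err:
            if err.errno != errno.EAGAIN:
                raise  # any other error is still reported
    raise OSError(errno.EAGAIN, "no data after %d retries" % max_retries)
```

In practice the callable would be a connected socket's `recv` method, for example `recv_ignoring_eagain(sock.recv, 4096)`.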
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Jul 01, 2009
-
-
Iustin Pop authored
With the change to striped LVs, the actual size of a meta device (which is small) can be more than we expected (for non-striped LVs). This patch increases the accepted size from 160MB to 1GB, and updates the comment with the rationale behind this change. Note that we do want even meta devices striped, since striping can speed up metadata updates. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Olivier Tharan <olive@google.com>
-
- Jun 30, 2009
-
-
Iustin Pop authored
Currently, when draining nodes we reset their master candidate flag, but we don't instruct them to demote themselves. This leads to “ERROR: file '/var/lib/ganeti/config.data' should not exist on non master candidates (and the file is outdated)”. This patch simply adds a call to node_demote_from_mc in this case. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This patch fixes a few node readd issues. Currently, the node readd consists of two opcodes:
- OpSetNodeParams, which resets the offline/drained flags
- OpAddNode (with readd=True), which reconfigures the node
The problem is that between these two, the configuration is inconsistent for certain cluster configurations. Thus, this patch removes the first opcode and modifies LUAddNode to deal with this case too. The patch also modifies the computation of the intended master_candidate status, and actually sets the readded node to master candidate if needed. Previously, we didn't modify the existing node at all. Finally, the patch modifies the bottom of the Exec() function for this LU to:
- trigger a node update, which in turn redistributes the ssconf files to all nodes (and thus the new node too)
- if the new node is not a master candidate, call the node_demote_from_mc RPC so that old master files are cleared
My testing shows this behaves correctly for various cases. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
If the config file is missing when the DemoteFromMC() function is called, it will raise a ProgrammerError. Instead of changing the utils.CreateBackup() function, which is called from multiple places, for now we only change the DemoteFromMC() function to not call it if the file does not exist (we rely on the master to prevent race conditions here). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Olivier Tharan <olive@google.com>
-
Iustin Pop authored
This patch modifies ConfigWriter.GetMasterCandidateStats to allow it to ignore some nodes in the calculation, so that we can use it to predict cluster state without some nodes (which we know we will modify, and thus we should not rely on their state). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Olivier Tharan <olive@google.com>
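The effect of ignoring some nodes can be sketched as follows. The node representation (dicts with `master_candidate`, `offline` and `drained` flags), the returned pair and the function name are assumptions for the example rather than the real ConfigWriter interface.

```python
def master_candidate_stats(nodes, candidate_pool_size, exceptions=None):
    """Return (current candidates, maximum possible candidates).

    `nodes` maps node names to flag dicts (hypothetical layout); any node
    named in `exceptions` is left out of the calculation entirely.
    """
    exceptions = frozenset(exceptions or ())
    mc_now = mc_max = 0
    for name, node in nodes.items():
        if name in exceptions:
            continue
        if not (node["offline"] or node["drained"]):
            mc_max += 1  # this node could be a master candidate
        if node["master_candidate"]:
            mc_now += 1
    return mc_now, min(mc_max, candidate_pool_size)
```

Passing the name of a node that is about to be modified gives the statistics the cluster would have without relying on that node's current flags.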
-
Iustin Pop authored
Currently the message for extraneous files on non master candidates is confusing, to say the least. This patch hopefully makes it clearer. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Olivier Tharan <olive@google.com>
-