diff --git a/Makefile.am b/Makefile.am
index 6f93423bd74bd376e923c821d6d65295aad25a17..1a3b41bc53e2dd547b046ac53d03766f5abe7a4a 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -286,6 +286,7 @@ docrst = \
 	doc/design-lu-generated-jobs.rst \
 	doc/design-multi-reloc.rst \
 	doc/design-network.rst \
+	doc/design-chained-jobs.rst \
 	doc/cluster-merge.rst \
 	doc/design-shared-storage.rst \
 	doc/devnotes.rst \
diff --git a/doc/design-chained-jobs.rst b/doc/design-chained-jobs.rst
new file mode 100644
index 0000000000000000000000000000000000000000..8cf870272cb3a39ac5088a4997c043a031f6ca0f
--- /dev/null
+++ b/doc/design-chained-jobs.rst
@@ -0,0 +1,157 @@
+============
+Chained jobs
+============
+
+.. contents:: :depth: 4
+
+This is a design document about the innards of Ganeti's job processing.
+Readers are advised to study previous design documents on the topic:
+
+- :ref:`Original job queue <jqueue-original-design>`
+- :ref:`Job priorities <jqueue-job-priority-design>`
+- :doc:`LU-generated jobs <design-lu-generated-jobs>`
+
+
+Current state and shortcomings
+==============================
+
+Ever since the introduction of the job queue with Ganeti 2.0 there have
+been situations where we wanted to run several jobs in a specific order.
+Due to the job queue's current design, such a guarantee cannot be
+given. Jobs are run according to their priority, their ability to
+acquire all necessary locks and other factors.
+
+One way to work around this limitation is to do some kind of job
+grouping in the client code. Once all jobs of a group have finished, the
+next group is submitted and waited for. There are different kinds of
+clients for Ganeti, some of which don't share code (e.g. Python clients
+vs. htools). This design proposes a solution which would be implemented
+as part of the job queue in the master daemon.
+
+
+Proposed changes
+================
+
+With the implementation of :ref:`job priorities
+<jqueue-job-priority-design>` the processing code was re-architected
+and became a lot more versatile. It now returns jobs to the queue in
+case the locks for an opcode can't be acquired, allowing other
+jobs/opcodes to be run in the meantime.
+
+The proposal is to add a new, optional property to opcodes to define
+dependencies on other jobs. Job X could define opcodes with a dependency
+on the success of job Y and would only be run once job Y is finished. If
+there's a dependency on success and job Y failed, job X would fail as
+well. Since such dependencies would use job IDs, the jobs still need to
+be submitted in the right order.
+
+.. pyassert::
+
+  # Update the description below if the finalized job statuses change
+  constants.JOBS_FINALIZED == frozenset([
+    constants.JOB_STATUS_CANCELED,
+    constants.JOB_STATUS_SUCCESS,
+    constants.JOB_STATUS_ERROR,
+    ])
+
+The new attribute's value would be a list of two-element tuples. Each
+tuple contains a job ID and a list of requested statuses for the job
+depended upon. Only final statuses are accepted
+(:pyeval:`utils.CommaJoin(constants.JOBS_FINALIZED)`). An empty list is
+equivalent to specifying all final statuses (except
+:pyeval:`constants.JOB_STATUS_CANCELED`, which is treated specially).
+An opcode runs only once all its dependency requirements have been
+fulfilled.
+
+Any job referring to a cancelled job is also cancelled unless it
+explicitly lists :pyeval:`constants.JOB_STATUS_CANCELED` as a requested
+status.
+
+In case a referenced job cannot be found in the normal queue or the
+archive, referring jobs fail as the status of the referenced job can't
+be determined.
+
+With this change, clients can submit all wanted jobs in the right order
+and proceed to wait for changes on all these jobs (see
+``cli.JobExecutor``). The master daemon will take care of executing them
+in the right order, while still presenting the client with a simple
+interface.
+
+Clients using the ``SubmitManyJobs`` interface can use relative job IDs
+(negative integers) to refer to jobs in the same submission.
+
+.. highlight:: javascript
+
+Example data structures::
+
+  # First job
+  {
+    "job_id": "6151",
+    "ops": [
+      { "OP_ID": "OP_INSTANCE_REPLACE_DISKS", ..., },
+      { "OP_ID": "OP_INSTANCE_FAILOVER", ..., },
+      ],
+  }
+
+  # Second job, runs in parallel with first job
+  {
+    "job_id": "7687",
+    "ops": [
+      { "OP_ID": "OP_INSTANCE_MIGRATE", ..., },
+      ],
+  }
+
+  # Third job, depending on success of previous jobs
+  {
+    "job_id": "9218",
+    "ops": [
+      { "OP_ID": "OP_NODE_SET_PARAMS",
+        "depend": [
+          [6151, ["success"]],
+          [7687, ["success"]],
+          ],
+        "offline": True, },
+      ],
+  }
+
+
+Other discussed solutions
+=========================
+
+Job-level attribute
+-------------------
+
+At first glance it might seem better to put dependencies on
+previous jobs at a job level. However, it turns out that having the
+option of defining only a single opcode in a job as having such a
+dependency can be useful as well. The code complexity in the job queue
+is comparable, if not lower.
+
+Since opcodes are guaranteed to run in order, clients can just define
+the dependency on the first opcode.
+
+Another reason for the choice of an opcode-level attribute is that the
+current LUXI interface for submitting jobs is a bit restricted and would
+need to be changed to allow the addition of job-level attributes,
+potentially requiring changes in all LUXI clients and/or breaking
+backwards compatibility.
+
+
+Client-side logic
+-----------------
+
+There's at least one implementation of a batched job executor woven
+into the ``burnin`` tool's code. While certainly possible, a client-side
+solution should be avoided due to the different clients already in use.
+For one, the :doc:`remote API <rapi>` client shouldn't import
+non-standard modules. htools are written in Haskell and can't use Python
+modules. A batched job executor contains quite a lot of logic. Even if
+cleanly abstracted in a (Python) library, sharing code between different
+clients is difficult if not impossible.
+
+
+.. vim: set textwidth=72 :
+.. Local Variables:
+.. mode: rst
+.. fill-column: 72
+.. End:
diff --git a/doc/design-draft.rst b/doc/design-draft.rst
index 63759db3f7999c8c4be4202c68f07c473dac23f4..390684b58576d047c489775e7c50e2f53e262489 100644
--- a/doc/design-draft.rst
+++ b/doc/design-draft.rst
@@ -11,6 +11,7 @@ Design document drafts
    design-lu-generated-jobs.rst
    design-multi-reloc.rst
    design-cpu-pinning.rst
+   design-chained-jobs.rst
 
 .. vim: set textwidth=72 :
 .. Local Variables:
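The following small Python sketch illustrates the dependency semantics proposed in ``doc/design-chained-jobs.rst``. It covers four rules: only final statuses may be requested, an empty list accepts every final status except canceled, cancellation propagates unless canceled is explicitly requested, and a job that can't be found makes the referring job fail. The function names ``resolve_relative`` and ``check_dependencies``, the outcome strings and the status string values are hypothetical and chosen only for illustration. None of them are taken from Ganeti's job queue. The handling of relative job IDs (``-1`` meaning the job submitted immediately before in the same ``SubmitManyJobs`` call) is likewise an assumption.

```python
"""Minimal sketch of the opcode dependency check from design-chained-jobs.

All names, outcome strings and status values below are illustrative
assumptions; this is not Ganeti's job queue implementation.
"""

from typing import Callable, List, Optional, Sequence, Tuple

# Assumed status strings, mirroring the "success" value used in the design
JOB_STATUS_CANCELED = "canceled"
JOB_STATUS_SUCCESS = "success"
JOB_STATUS_ERROR = "error"
JOBS_FINALIZED = frozenset([JOB_STATUS_CANCELED, JOB_STATUS_SUCCESS,
                            JOB_STATUS_ERROR])

# Hypothetical outcomes of checking one opcode's dependencies
WAIT, CONTINUE, CANCEL, ERROR = "wait", "continue", "cancel", "error"

# (job ID, list of requested final statuses)
Dependency = Tuple[int, List[str]]


def resolve_relative(deps: Sequence[Dependency], own_index: int,
                     submitted_ids: Sequence[int]) -> List[Dependency]:
    """Replaces negative (relative) job IDs with absolute ones.

    Assumption: -1 refers to the job submitted just before this one in
    the same SubmitManyJobs call, -2 to the one before that, and so on.
    """
    resolved = []
    for job_id, wanted in deps:
        if job_id < 0:
            index = own_index + job_id
            if index < 0:
                raise ValueError("relative job ID %d is out of range" % job_id)
            job_id = submitted_ids[index]
        resolved.append((job_id, wanted))
    return resolved


def check_dependencies(deps: Sequence[Dependency],
                       lookup: Callable[[int], Optional[str]]) -> str:
    """Decides whether an opcode may run.

    lookup(job_id) returns the job's current status, or None when the job
    is found neither in the queue nor in the archive.
    """
    for job_id, wanted in deps:
        if not set(wanted) <= JOBS_FINALIZED:
            raise ValueError("only final statuses can be requested")
        status = lookup(job_id)
        if status is None:
            return ERROR      # status can't be determined, so fail
        if status not in JOBS_FINALIZED:
            return WAIT       # not finished yet; caller re-checks later
        # An empty list accepts every final status except "canceled"
        accepted = (set(wanted) if wanted
                    else JOBS_FINALIZED - {JOB_STATUS_CANCELED})
        if status not in accepted:
            # Cancellation propagates; any other mismatch is a failure
            return CANCEL if status == JOB_STATUS_CANCELED else ERROR
    return CONTINUE


if __name__ == "__main__":
    # Third job from the design's example, while job 7687 is still running
    statuses = {6151: JOB_STATUS_SUCCESS, 7687: "running"}
    deps = [(6151, [JOB_STATUS_SUCCESS]), (7687, [JOB_STATUS_SUCCESS])]
    print(check_dependencies(deps, statuses.get))  # wait
    statuses[7687] = JOB_STATUS_SUCCESS
    print(check_dependencies(deps, statuses.get))  # continue
```

In this sketch the check stops at the first unfinished dependency and returns ``wait``. The caller would then re-evaluate the opcode later, much as the job queue already puts a job back when an opcode's locks can't be acquired. The design leaves open whether an implementation should instead fail early when a later dependency has already failed.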