Commit f6424741 authored by Iustin Pop

job queue: fix interrupted job processing



If a job with more than one opcode is being processed and the master
daemon crashes between two opcodes, the first N opcodes are marked
successful and the rest are marked as queued. This means that the overall
job status is queued, and thus on master daemon restart the job will be
resent for completion.
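
As an illustration of why an interrupted job still counts as queued, here
is a minimal sketch of deriving a job status from per-opcode statuses. The
status values and the calc_job_status() helper are hypothetical stand-ins
for this example, not the actual jqueue.py code, and the precedence of
error over running is an assumption.

```python
# Hypothetical status values; the real ones live in Ganeti's constants module.
QUEUED, RUNNING, SUCCESS, ERROR = "queued", "running", "success", "error"


def calc_job_status(op_statuses):
    """Derive an overall job status from per-opcode statuses (simplified)."""
    if all(s == SUCCESS for s in op_statuses):
        return SUCCESS
    if ERROR in op_statuses:
        return ERROR
    if RUNNING in op_statuses:
        return RUNNING
    # Only successful and queued opcodes remain: the job still counts as queued.
    return QUEUED


# Master daemon crashed after the first two of four opcodes finished.
interrupted = [SUCCESS, SUCCESS, QUEUED, QUEUED]
print(calc_job_status(interrupted))
```

With this rule, a job interrupted between opcodes is indistinguishable from
a job that has not started, unless the individual opcode statuses are
inspected.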

However, the RunTask() function in jqueue.py doesn't deal with
partially-completed jobs. This patch makes it simply skip the opcodes
that have already completed successfully.
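
The skip-on-resume behaviour can be sketched in isolation as follows. The
Op class, the run_job() function and the execute callback are hypothetical
names invented for this example; only the pattern of checking the opcode
status and continuing mirrors the patch.

```python
import logging

# Hypothetical status values, for illustration only.
OP_STATUS_QUEUED = "queued"
OP_STATUS_SUCCESS = "success"


class Op:
    """Hypothetical stand-in for a queued opcode with a status field."""

    def __init__(self, name, status=OP_STATUS_QUEUED):
        self.name = name
        self.status = status


def run_job(ops, execute):
    """Run every opcode that has not already succeeded.

    Returns the names of the opcodes that were actually executed.
    """
    executed = []
    count = len(ops)
    for idx, op in enumerate(ops):
        if op.status == OP_STATUS_SUCCESS:
            # Completed before the restart: do not run it a second time.
            logging.info("Op %s/%s: %s already processed, skipping",
                         idx + 1, count, op.name)
            continue
        execute(op)
        op.status = OP_STATUS_SUCCESS
        executed.append(op.name)
    return executed


# Opcode "a" finished before the crash, so only "b" and "c" are executed.
ops = [Op("a", OP_STATUS_SUCCESS), Op("b"), Op("c")]
print(run_job(ops, execute=lambda op: None))
```

Skipping relies on the status of each opcode having been written to
persistent storage before the next opcode starts.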

An alternative option would be to mark partially-completed jobs not as
QUEUED but as RUNNING, which would result in the job being aborted at
restart time.
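
For comparison, the alternative can be sketched as a restart-time decision
per job. The plan_restart() helper and the status values are hypothetical,
and the rule that a job found in the running state is aborted rather than
resumed is an assumption taken from the paragraph above.

```python
# Hypothetical job status values, for illustration only.
JOB_STATUS_QUEUED = "queued"
JOB_STATUS_RUNNING = "running"


def plan_restart(jobs):
    """Split unfinished jobs into those to resubmit and those to abort.

    `jobs` maps a job id to the status recorded before the crash.
    """
    resubmit, abort = [], []
    for job_id, status in sorted(jobs.items()):
        if status == JOB_STATUS_QUEUED:
            resubmit.append(job_id)
        elif status == JOB_STATUS_RUNNING:
            # Assumption: a job recorded as running is aborted, not resumed.
            abort.append(job_id)
    return resubmit, abort


print(plan_restart({1: JOB_STATUS_QUEUED, 2: JOB_STATUS_RUNNING}))
```

Under this scheme a partially-completed job would land in the abort list
instead of being resumed.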

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
parent ed21712b
@@ -362,6 +362,15 @@ class _JobQueueWorker(workerpool.BaseWorker):
         count = len(job.ops)
         for idx, op in enumerate(job.ops):
           op_summary = op.input.Summary()
+          if op.status == constants.OP_STATUS_SUCCESS:
+            # this is a job that was partially completed before master
+            # daemon shutdown, so it can be expected that some opcodes
+            # are already completed successfully (if any did error
+            # out, then the whole job should have been aborted and not
+            # resubmitted for processing)
+            logging.info("Op %s/%s: opcode %s already processed, skipping",
+                         idx + 1, count, op_summary)
+            continue
           try:
             logging.info("Op %s/%s: Starting opcode %s", idx + 1, count,
                          op_summary)