Implement job 'waiting' status
Background: when we have multiple jobs in the queue (more than just a few), many of the jobs (up to the number of threads) will be in state 'running', although many of them could be actually blocked, waiting for some locks. This is not good, as one cannot easily see what is happening. The patch extends the opcode/job possible statuses with another one, waiting, which shows that the LU is in the acquire locks phase. The mechanism for doing so is simple, we initialize (in the job queue) the opcode with OP_STATUS_WAITLOCK, and when the processor is ready to give control to the LU's Exec, it will call a notifier back into the _JobQueueWorker that sets the opcode status to OP_STATUS_RUNNING (with the proper queue locking). Because this mechanism does not save the job, all opcodes on disk will be in status WAITLOCK and not RUNNING anymore, so we also change the load sequence to consider WAITLOCK as RUNNING. With the patch applied, creating in parallel (via burnin) five instances on a five node cluster shows that only two are executing, while three are waiting for locks. Reviewed-by: imsnah
Loading
Please register or sign in to comment