    Optimise multi-job submit · 009e73d0
    Iustin Pop authored
    
    
    Currently, on multi-job submits we simply iterate over the
    single-job-submit function. This means we grab a new serial, write and
    replicate (and wait for the remote nodes to ack) the serial file, and
    only then create the job file; this is repeated N times, once for each
    job.
    
    Since job identifiers are ‘cheap’, it is simpler to grab a block of
    new IDs at the start, write and replicate the serial file a single
    time, and then proceed with the jobs as before. This is a cheap change
    that reduces I/O and slightly reduces the CPU consumption of the
    master daemon: submit time seems to be cut in half for big batches of
    jobs, and masterd CPU time drops by somewhere between 15% and 50%
    (I can't get consistent numbers).
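
As a rough illustration of the idea described above (not the actual jqueue.py code), the sketch below contrasts per-job serial allocation with reserving a block of IDs in one write. The names `SerialFile`, `new_serials`, `submit_one_by_one` and `submit_batch` are hypothetical, and replication to remote nodes is only simulated by a local file write and a counter.

```python
import os
import tempfile


class SerialFile:
    """Hypothetical stand-in for the replicated serial file."""

    def __init__(self, path):
        self.path = path
        self.last = 0    # last serial handed out
        self.writes = 0  # how many times the file was rewritten

    def _write(self, serial):
        # Assumption: in the real daemon this write is also replicated to
        # the other nodes and waits for their acks; here it is local only.
        with open(self.path, "w") as fd:
            fd.write("%d\n" % serial)
        self.last = serial
        self.writes += 1

    def new_serials(self, count):
        """Reserve `count` consecutive IDs with a single file write."""
        if count < 1:
            raise ValueError("count must be at least 1")
        first = self.last + 1
        self._write(self.last + count)
        return list(range(first, first + count))


def submit_one_by_one(serial_file, jobs):
    # Old behaviour: one serial-file write (and replication) per job.
    return [(serial_file.new_serials(1)[0], job) for job in jobs]


def submit_batch(serial_file, jobs):
    # New behaviour: reserve all IDs up front, then pair them with jobs.
    if not jobs:
        return []
    ids = serial_file.new_serials(len(jobs))
    return list(zip(ids, jobs))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        serial = SerialFile(os.path.join(tmpdir, "serial"))
        submitted = submit_batch(serial, ["job-a", "job-b", "job-c"])
        print(submitted, "serial file writes:", serial.writes)
```

In this sketch a batch of N jobs costs one serial-file write instead of N, which is where the I/O saving comes from; creating the individual job files is unchanged.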
    
    Note that this doesn't change anything for single-job submits, and
    most probably nothing for submits of fewer than 5 jobs either.
    
    Signed-off-by: Iustin Pop <iustin@google.com>
    Reviewed-by: Michael Hanselmann <hansmi@google.com>
jqueue.py 40.4 KB