From 11705e3de9ee4ddec3a40dda7dc6ed40f769339e Mon Sep 17 00:00:00 2001
From: Iustin Pop <iustin@google.com>
Date: Fri, 30 Sep 2011 16:35:29 +0200
Subject: [PATCH] Optimise cli.JobExecutor with many pending jobs

When many pending jobs (> 100) are submitted to masterd, the
JobExecutor 'spams' the master daemon with status requests for all of
the jobs, even though in the end it will only choose a single job for
polling.

This is very sub-optimal, because when the master is busy processing
small/fast jobs, this query forces reading all the jobs from
disk. Restricting the 'window' of jobs that we query from the entire
set to a smaller subset makes a huge difference (test setup: masterd
only, 0s delay jobs, all jobs on tmpfs and thus no disk I/O involved):

- submitting/waiting for 500 jobs:
  - before: ~21 s
  - after:   ~5 s
- submitting/waiting for 1K jobs:
  - before: ~76 s
  - after:   ~8 s

This is with a batch of 25 jobs. With a batch of 50 jobs, the 1K-job
case goes from ~8 s to ~12 s. I think that choosing the 'best' job for
nice output only matters with a small number of jobs, and that for more
than that people will not actually watch the jobs. So changing from
'perfect job' to 'best job in the first 25' should be OK.

Note that most jobs won't execute as fast as 0 delay, but this is
still a good improvement.
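
For illustration only, here is a minimal sketch of the windowing idea,
not the actual lib/cli.py code. FakeClient, choose_job and the status
strings are hypothetical stand-ins, the (index, name, job_id) tuple
layout is assumed from the i[2] lookup in the patch, and the selection
rule (take the first job in the window that is no longer queued, else
fall back to the first job) is a simplification:

```python
CHOOSE_BATCH = 25  # mirrors _CHOOSE_BATCH from the patch


class FakeClient:
    """Hypothetical stand-in for the master daemon client.

    It only counts how many job records each query touches.
    """

    def __init__(self, statuses):
        self.statuses = statuses  # maps job_id -> status string
        self.rows_read = 0

    def QueryJobs(self, job_ids, fields):
        # 'fields' is ignored here; one row per job, holding its status.
        self.rows_read += len(job_ids)
        return [[self.statuses[job_id]] for job_id in job_ids]


def choose_job(client, jobs, batch=None):
    """Pick a job to poll, querying only the first 'batch' jobs.

    jobs is a list of (index, name, job_id) tuples (assumed layout).
    With batch=None the whole list is queried, as before the patch.
    """
    window = jobs if batch is None else jobs[:batch]
    result = client.QueryJobs([job[2] for job in window], ["status"])
    for job_data, row in zip(window, result):
        if row[0] != "queued":  # simplified "good candidate" test
            return job_data
    return jobs[0]  # nothing better found in the window
```

With 1000 queued jobs, one call with batch=None touches 1000 job
records while one call with batch=CHOOSE_BATCH touches 25. The cost is
that a running job outside the window is not noticed until the jobs in
front of it have been consumed.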

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
---
 lib/cli.py | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/lib/cli.py b/lib/cli.py
index d7e497e58..38d2b6490 100644
--- a/lib/cli.py
+++ b/lib/cli.py
@@ -257,6 +257,9 @@ _PRIONAME_TO_VALUE = dict(_PRIORITY_NAMES)
  QR_UNKNOWN,
  QR_INCOMPLETE) = range(3)
 
+#: Maximum batch size for ChooseJob
+_CHOOSE_BATCH = 25
+
 
 class _Argument:
   def __init__(self, min=0, max=None): # pylint: disable=W0622
@@ -3055,7 +3058,8 @@ class JobExecutor(object):
     """
     assert self.jobs, "_ChooseJob called with empty job list"
 
-    result = self.cl.QueryJobs([i[2] for i in self.jobs], ["status"])
+    result = self.cl.QueryJobs([i[2] for i in self.jobs[:_CHOOSE_BATCH]],
+                               ["status"])
     assert result
 
     for job_data, status in zip(self.jobs, result):
-- 
GitLab