1. 12 Oct, 2011 6 commits
  2. 11 Oct, 2011 3 commits
  3. 07 Oct, 2011 4 commits
  4. 06 Oct, 2011 1 commit
  5. 05 Oct, 2011 8 commits
  6. 04 Oct, 2011 6 commits
  7. 03 Oct, 2011 10 commits
  8. 30 Sep, 2011 2 commits
    • LUClusterVerifyGroup: Spread SSH checks over more nodes · 64c7b383
      Michael Hanselmann authored
      
      
      When verifying a group, the code would always check SSH to all nodes in
      the same group, as well as to the first node of every other group. On big
      clusters this can cause issues, since many nodes will try to connect to
      the first node of another group at the same time. This patch changes the
      algorithm to choose a different node every time.
      
      A unittest for the selection algorithm is included.

      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
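One way to read "choose a different node every time" is a round-robin over each foreign group's nodes. The sketch below illustrates that idea only; it is not the code from the patch, and the function name `select_ssh_check_nodes` and its arguments are hypothetical.

```python
import itertools


def select_ssh_check_nodes(group_nodes, other_groups):
    """Map each node of the verified group to one node per foreign group.

    group_nodes: node names in the group being verified (assumed input).
    other_groups: dict of foreign group name -> list of node names
                  (assumed input).
    """
    # One endless round-robin iterator per non-empty foreign group, so
    # consecutive local nodes are pointed at different foreign nodes
    # instead of all hitting the first one.
    cyclers = [
        itertools.cycle(sorted(nodes))
        for _, nodes in sorted(other_groups.items())
        if nodes
    ]
    return {
        node: [next(cycler) for cycler in cyclers]
        for node in sorted(group_nodes)
    }
```

With three local nodes and a foreign group of two nodes, the first and third local node check the same foreign node and the second checks the other one, so the load is split rather than concentrated.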
    • Optimise cli.JobExecutor with many pending jobs · 11705e3d
      Iustin Pop authored
      
      
      When we submit many pending jobs (> 100) to masterd, the
      JobExecutor 'spams' the master daemon with requests for the
      status of all the jobs, even though in the end it will only choose a
      single job for polling.
      
      This is very sub-optimal, because when the master is busy processing
      small/fast jobs, this query forces reading all the jobs from
      disk. Restricting the 'window' of jobs that we query from the entire
      set to a smaller subset makes a huge difference (masterd only, 0s
      delay jobs, all jobs on tmpfs, thus no I/O involved):
      
      - submitting/waiting for 500 jobs:
        - before: ~21 s
        - after:   ~5 s
      - submitting/waiting for 1K jobs:
        - before: ~76 s
        - after:   ~8 s
      
      These numbers are with a batch of 25 jobs. With a batch of 50 jobs,
      the 1K-job case goes from ~8 s to ~12 s. I think that choosing the
      'best' job for nice output only matters with a small number of jobs,
      and that beyond that people will not actually watch the jobs. So
      changing from 'perfect job' to 'best job in the first 25' should be OK.
      
      Note that most jobs won't execute as fast as 0 delay, but this is
      still a good improvement.

      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
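The 'best job in the first 25' idea can be sketched roughly as follows. This is an illustration under assumptions, not the actual `cli.JobExecutor` code: `choose_job`, the `query_status` callback, and the status strings are hypothetical stand-ins.

```python
def choose_job(pending_ids, query_status, window=25):
    """Pick the next job to poll, looking only at the first `window` jobs.

    pending_ids: job IDs in submission order (assumed input).
    query_status: callable taking a list of IDs and returning their
                  statuses in the same order (hypothetical stand-in for
                  the status query sent to the master daemon).
    """
    if not pending_ids:
        raise ValueError("no pending jobs")

    # Query only a bounded window instead of every pending job, so the
    # cost of one selection no longer grows with the queue length.
    candidates = pending_ids[:window]
    statuses = query_status(candidates)

    # Prefer a job that is already running, then one that is waiting;
    # the status names here are assumptions for the sketch.
    for wanted in ("running", "waiting"):
        for job_id, status in zip(candidates, statuses):
            if status == wanted:
                return job_id

    # Nothing interesting in the window: fall back to the oldest job.
    return candidates[0]
```

The trade-off is the one described in the message: a running job outside the window is not picked until earlier jobs finish, which only affects how 'nice' the output looks, not correctness.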