1. 12 Oct, 2011 4 commits
  2. 11 Oct, 2011 3 commits
  3. 07 Oct, 2011 4 commits
  4. 06 Oct, 2011 1 commit
  5. 05 Oct, 2011 8 commits
  6. 04 Oct, 2011 6 commits
  7. 03 Oct, 2011 10 commits
  8. 30 Sep, 2011 4 commits
    • LUClusterVerifyGroup: Spread SSH checks over more nodes · 64c7b383
      Michael Hanselmann authored
      
      
      When verifying a group the code would always check SSH to all nodes in
      the same group, as well as the first node for every other group. On big
      clusters this can cause issues since many nodes will try to connect to
      the first node of another group at the same time. This patch changes the
      algorithm to choose a different node every time.
      
      A unittest for the selection algorithm is included.
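The selection idea described above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the function name `select_ssh_check_nodes`, its arguments, and the round-robin strategy via `itertools.cycle` are all assumptions made for the example.

```python
import itertools


def select_ssh_check_nodes(group_nodes, other_groups):
    """Map each node of the verified group to the foreign nodes it checks.

    group_nodes: names of the nodes in the group being verified.
    other_groups: dict mapping group name -> list of node names.
    (Hypothetical helper; names and signature are illustrative only.)
    """
    # One endless round-robin iterator per non-empty foreign group,
    # sorted so the result is deterministic.
    cyclers = [itertools.cycle(sorted(nodes))
               for _, nodes in sorted(other_groups.items())
               if nodes]
    # Every local node takes the *next* node from each foreign group,
    # so connections are spread out instead of all hitting the first node.
    return {name: sorted(next(c) for c in cyclers)
            for name in sorted(group_nodes)}


targets = select_ssh_check_nodes(
    ["a1", "a2", "a3"],
    {"groupB": ["b1", "b2"], "groupC": ["c1"]},
)
```

With this sketch, `a1` and `a2` are assigned different nodes of `groupB`, and the assignment wraps around once the foreign group runs out of nodes.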
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
    • Optimise cli.JobExecutor with many pending jobs · 11705e3d
      Iustin Pop authored
      
      
      When we submit many pending jobs (> 100) to masterd, the JobExecutor
      'spams' the master daemon with status requests for all the jobs, even
      though in the end it will only choose a single job for polling.
      
      This is very sub-optimal, because when the master is busy processing
      small/fast jobs, this query forces it to read all the jobs.
      Restricting the 'window' of jobs that we query from the entire
      set to a smaller subset makes a huge difference (masterd only, 0s
      delay jobs, all jobs on tmpfs, thus no I/O involved):
      
      - submitting/waiting for 500 jobs:
        - before: ~21 s
        - after:   ~5 s
      - submitting/waiting for 1K jobs:
        - before: ~76 s
        - after:   ~8 s
      
      This is with a batch of 25 jobs. With a batch of 50 jobs, the 1K-job
      time goes from ~8 s to ~12 s. I think that choosing the 'best' job for
      nice output only matters with a small number of jobs, and that with
      more jobs people will not actually watch them. So changing from
      'perfect job' to 'best job in the first 25' should be OK.
      
      Note that most jobs won't execute as fast as 0 delay, but this is
      still a good improvement.
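The windowing idea can be illustrated with a short sketch. It assumes a caller-supplied `query_status` function that returns one status string per job ID; that function, the name `choose_job`, and the preference order (running, then waiting, then the first job) are hypothetical and are not taken from cli.JobExecutor itself.

```python
import itertools

WINDOW_SIZE = 25  # batch size quoted in the commit message


def choose_job(pending, query_status, window=WINDOW_SIZE):
    """Pick the job to poll, looking only at the first `window` pending jobs.

    pending: iterable of job IDs in submission order.
    query_status: hypothetical callable, list of IDs -> list of statuses.
    """
    candidates = list(itertools.islice(pending, window))
    if not candidates:
        raise ValueError("no pending jobs")
    # A single query covering at most `window` jobs, not the entire set.
    statuses = query_status(candidates)
    # Assumed preference order: a running job, then a waiting one.
    for wanted in ("running", "waiting"):
        for job_id, status in zip(candidates, statuses):
            if status == wanted:
                return job_id
    # Otherwise fall back to the oldest pending job.
    return candidates[0]


queried = []


def fake_query(job_ids):
    """Stand-in for the master daemon; records which jobs were asked about."""
    queried.extend(job_ids)
    return ["running" if j == 7 else "queued" for j in job_ids]


chosen = choose_job(range(500), fake_query)
```

Here only the first 25 of the 500 job IDs are queried, and the running job inside that window is chosen. A "better" job outside the window would be missed, which is the trade-off the message accepts.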
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
    • Merge branch 'devel-2.5' · 3398bff1
      Andrea Spadaccini authored
      
      
      * devel-2.5:
        Use --yes to deactivate master ip in cluster merge
        Use deactivate-master-ip in cluster-merge
        Add gnt-cluster commands to toggle the master IP
        Split starting and stopping master IP and daemons
        listrunner: Don't pass arguments if there are none
        ssh: Quote strings in error message
        utils.log: Write error messages to stderr
        Add signal handling doc to hbal man page
        Migration: warn the user about hv version mismatch
        Fix handling of cluster verify hooks
        Redistribute the RAPI certificate
        QA: Add tests for instance start/stop via RAPI
        RAPI: Fix wrong check on instance shutdown
        baserlib: Accept empty body in FillOpcode
      
      Conflicts:
      	lib/backend.py
         - no real conflicts
      	lib/constants.py
         - preserve both changes
      	lib/rapi/rlib2.py
         - keep master
      	lib/rpc.py
         - no real conflicts
      	tools/cluster-merge
         - keep devel-2.5
      Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
    • Merge branch 'stable-2.5' into devel-2.5 · cea3abbd
      Andrea Spadaccini authored
      
      
      * stable-2.5:
        listrunner: Don't pass arguments if there are none
        ssh: Quote strings in error message
        utils.log: Write error messages to stderr
        Add signal handling doc to hbal man page
        Fix handling of cluster verify hooks
        Redistribute the RAPI certificate
        QA: Add tests for instance start/stop via RAPI
        RAPI: Fix wrong check on instance shutdown
        baserlib: Accept empty body in FillOpcode
      Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>