  1. 27 Oct, 2011 1 commit
  2. 20 Oct, 2011 1 commit
  3. 19 Oct, 2011 2 commits
  4. 18 Oct, 2011 1 commit
  5. 17 Oct, 2011 1 commit
  6. 12 Oct, 2011 1 commit
    • rpc: Disable HTTP client pool and reduce memory consumption · 05927995
      Michael Hanselmann authored
      We noticed that “ganeti-masterd” can use large amounts of memory,
      especially on large clusters. Measurements showed a single PycURL client
      using about 500 kB of heap memory (the actual usage depends on versions,
      build options and settings).
      The RPC client uses a per-thread HTTP client pool with one client per
      node. At this time there are 41 non-main threads (25 for the job queue
      and 16 for client requests). This means the HTTP client pools use a lot
      of memory (ca. 200 MB for 10 nodes, ca. 1 GB for 50 nodes).
      This patch disables the per-thread HTTP client pool. No cleanup of
      unused code is done. That will be done in the master branch only.
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
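The memory estimate in this commit message can be reproduced with simple arithmetic. The sketch below assumes one pooled client per node in each of the 41 non-main threads, at roughly 500 kB per client as measured; the function and constant names are illustrative and not taken from the Ganeti code.

```python
# Figures quoted in the commit message; names are hypothetical.
CLIENT_HEAP_KB = 500      # measured heap usage of one PycURL client (approximate)
NON_MAIN_THREADS = 25 + 16  # job queue threads + client request threads


def pool_memory_mb(nodes, threads=NON_MAIN_THREADS, client_kb=CLIENT_HEAP_KB):
    """Estimate total pool memory in MB: one client per node per thread."""
    return threads * nodes * client_kb / 1000.0


if __name__ == "__main__":
    for nodes in (10, 50):
        print("%d nodes: ~%.0f MB" % (nodes, pool_memory_mb(nodes)))
```

For 10 nodes this gives about 205 MB and for 50 nodes about 1025 MB, which matches the "ca. 200 MB" and "ca. 1 GB" figures above.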
  7. 04 Oct, 2011 1 commit
  8. 03 Oct, 2011 2 commits
  9. 30 Sep, 2011 4 commits
    • LUClusterVerifyGroup: Spread SSH checks over more nodes · 64c7b383
      Michael Hanselmann authored
      When verifying a group the code would always check SSH to all nodes in
      the same group, as well as the first node for every other group. On big
      clusters this can cause issues since many nodes will try to connect to
      the first node of another group at the same time. This patch changes the
      algorithm to choose a different node every time.
      A unittest for the selection algorithm is included.
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
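One way to spread the cross-group SSH checks is to rotate through the nodes of each other group, so that consecutive verifying nodes are assigned different targets. The sketch below is an assumption about how such a selection could look, not the algorithm from the patch; `select_ssh_check_nodes` and its arguments are hypothetical names.

```python
import itertools


def select_ssh_check_nodes(local_nodes, other_groups):
    """Assign each local node one target node from every other group.

    local_nodes: names of the nodes in the group being verified.
    other_groups: dict mapping group name to a list of node names.
    Targets are handed out round-robin, so no single remote node is
    contacted by all local nodes at once (hypothetical sketch).
    """
    cyclers = dict((group, itertools.cycle(sorted(nodes)))
                   for group, nodes in other_groups.items() if nodes)
    result = {}
    for node in sorted(local_nodes):
        result[node] = [next(cyclers[group]) for group in sorted(cyclers)]
    return result


if __name__ == "__main__":
    print(select_ssh_check_nodes(["n1", "n2", "n3"],
                                 {"g2": ["a", "b"], "g3": ["x"]}))
```

With two nodes in group `g2`, the three local nodes are assigned `a`, `b` and `a` in turn instead of all connecting to the first node.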
    • Optimise cli.JobExecutor with many pending jobs · 11705e3d
      Iustin Pop authored
      When we submit many pending jobs (> 100) to the masterd, the
      JobExecutor 'spams' the master daemon with requests for the
      status of all the jobs, even though in the end it will only choose a
      single job for polling.
      This is very inefficient, because when the master is busy processing
      small/fast jobs, this query forces reading all the jobs from
      disk. Restricting the 'window' of jobs that we query from the entire
      set to a smaller subset makes a huge difference (masterd only, 0s
      delay jobs, all jobs to tmpfs thus no I/O involved):
      - submitting/waiting for 500 jobs:
        - before: ~21 s
        - after:   ~5 s
      - submitting/waiting for 1K jobs:
        - before: ~76 s
        - after:   ~8 s
      This is with a batch of 25 jobs. With a batch of 50 jobs, the 1K-job
      time goes from ~8 s to ~12 s. I think that choosing the 'best' job for nice output only
      matters with a small number of jobs, and that for more than that
      people will not actually watch the jobs. So changing from 'perfect
      job' to 'best job in the first 25' should be OK.
      Note that most jobs won't execute as fast as 0 delay, but this is
      still a good improvement.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
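The windowing idea described in this commit can be sketched as follows: only the first 25 pending jobs are queried for status, and the job to poll is chosen from that subset. This is a minimal illustration, not the actual `cli.JobExecutor` code; `pick_job_to_poll`, the `query_status` callable, and the status strings are assumptions.

```python
def pick_job_to_poll(pending_job_ids, query_status, window=25):
    """Choose one job to poll, querying only the first `window` jobs.

    query_status is a hypothetical callable that takes a list of job IDs
    and returns their statuses in the same order.
    """
    candidates = pending_job_ids[:window]
    if not candidates:
        return None
    statuses = query_status(candidates)
    # Prefer a job that is already running, then one that is waiting;
    # the status names here are illustrative.
    for wanted in ("running", "waiting"):
        for job_id, status in zip(candidates, statuses):
            if status == wanted:
                return job_id
    # Otherwise fall back to the oldest pending job in the window.
    return candidates[0]
```

With 500 pending jobs, a call such as `pick_job_to_poll(job_ids, query_status)` asks the master daemon about 25 jobs per round instead of all 500.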
    • utils.log: Write error messages to stderr · 34aa8b7c
      Michael Hanselmann authored
      When “gnt-cluster copyfile” failed it would only print “Copy of file …
      to node … failed”. A detailed message is written using logging.error.
      Writing error messages to stderr can be helpful in figuring out what
      went wrong (the messages also go to the log file, but not everyone might
      know about it).
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
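The behaviour described here, where error messages go to stderr in addition to the log file, can be illustrated with the standard `logging` module. This is a minimal sketch under the assumption of one file handler plus one stderr handler restricted to errors; `setup_logging` and its parameters are hypothetical and do not reflect the actual `utils.log` interface.

```python
import logging
import sys


def setup_logging(logfile, name="copyfile-sketch", stream=None):
    """Log everything to `logfile` and errors additionally to stderr."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # keep the sketch independent of the root logger

    # The log file receives all messages.
    file_handler = logging.FileHandler(logfile)
    file_handler.setLevel(logging.DEBUG)
    logger.addHandler(file_handler)

    # stderr (or a substitute stream) receives ERROR and above only.
    stderr_handler = logging.StreamHandler(
        stream if stream is not None else sys.stderr)
    stderr_handler.setLevel(logging.ERROR)
    logger.addHandler(stderr_handler)
    return logger
```

A caller would then use `logger.error(...)` for the detailed failure message, which the user sees on the terminal without having to know where the log file lives.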
  10. 28 Sep, 2011 2 commits
  11. 22 Sep, 2011 2 commits
  12. 06 Sep, 2011 1 commit
  13. 30 Aug, 2011 2 commits
  14. 26 Aug, 2011 2 commits
  15. 25 Aug, 2011 1 commit
  16. 23 Aug, 2011 2 commits
  17. 22 Aug, 2011 1 commit
  18. 19 Aug, 2011 10 commits
  19. 17 Aug, 2011 1 commit
  20. 15 Aug, 2011 2 commits