1. 12 Aug, 2009 1 commit
  2. 10 Aug, 2009 1 commit
  3. 06 Aug, 2009 1 commit
  4. 05 Aug, 2009 2 commits
  5. 04 Aug, 2009 1 commit
  6. 29 Jul, 2009 1 commit
  7. 26 Jul, 2009 1 commit
  8. 25 Jul, 2009 1 commit
    • Collapse daemon's main function · 04ccf5e9
      Guido Trotter authored
      
      
      With three ganeti daemons, and one or two more coming, the daemons' main
      functions were becoming mostly cut&pasted code. This collapses most of
      it into a daemon.GenericMain function. Some more code could be shared
      between the two http-based daemons, but since the new daemons won't be
      http-based we won't do that right now.
      
      As a bonus, this adds the ability to override the network port on the
      command line for all network-based daemons.
      Signed-off-by: Guido Trotter <ultrotter@google.com>
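      For illustration, a minimal sketch of what such a shared entry point
      might look like; the option names, detach logic and per-daemon hook
      signatures here are assumptions, not Ganeti's actual API:

          # Hypothetical sketch only; check_fn/exec_fn and the option set are
          # placeholders for the per-daemon hooks.
          import optparse
          import os
          import sys

          def GenericMain(daemon_name, default_port, check_fn, exec_fn):
              """Shared daemon entry point: parse options, validate, detach, run."""
              parser = optparse.OptionParser(prog=daemon_name)
              parser.add_option("-d", "--debug", action="store_true", default=False,
                                help="run in the foreground with debug logging")
              parser.add_option("-p", "--port", type="int", default=default_port,
                                help="network port to listen on")
              options, args = parser.parse_args()

              check_fn(options, args)   # per-daemon sanity checks

              if not options.debug:
                  # Double-fork to detach from the controlling terminal.
                  if os.fork() > 0:
                      sys.exit(0)
                  os.setsid()
                  if os.fork() > 0:
                      sys.exit(0)

              exec_fn(options, args)    # per-daemon main loop (http server, ...)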
  9. 24 Jul, 2009 4 commits
  10. 23 Jul, 2009 3 commits
  11. 22 Jul, 2009 1 commit
  12. 19 Jul, 2009 1 commit
    • Add a luxi call for multi-job submit · 56d8ff91
      Iustin Pop authored
      
      
      As a workaround for the job submit timeouts that we have, this patch
      adds a new luxi call for multi-job submit; the advantage is that all the
      jobs are added to the queue first, and only afterwards can the workers
      start processing them.
      
      This is definitely faster than per-job submit, where the submission of
      new jobs competes with the workers processing jobs.
      
      On a pure no-op OpDelay opcode (not on master, not on nodes), we have:
        - 100 jobs:
          - individual: submit time ~21s, processing time ~21s
          - multiple:   submit time 7-9s, processing time ~22s
        - 250 jobs:
          - individual: submit time ~56s, processing time ~57s
                        run 2:      ~54s                  ~55s
          - multiple:   submit time ~20s, processing time ~51s
                        run 2:      ~17s                  ~52s
      
      which shows that we indeed gain on the client side, and maybe even in
      the total processing time for a high number of jobs. For just 10 or so
      jobs I expect the difference to be just noise.
      
      This will probably require increasing the timeout a little when
      submitting too many jobs - 250 jobs at ~20 seconds is close to the
      current rw timeout of 60s.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
      (cherry picked from commit 2971c913)
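      As a rough sketch of the client-side difference: the SubmitJob and
      SubmitManyJobs names mirror the commit message, but the exact luxi
      client API shown here is an assumption:

          # Hypothetical sketch: batch submission queues everything in one
          # luxi call before any worker starts, instead of racing the workers
          # on every individual submit.
          def submit_individually(client, job_defs):
              return [client.SubmitJob(ops) for ops in job_defs]

          def submit_many(client, job_defs):
              return client.SubmitManyJobs(job_defs)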
  13. 14 Jul, 2009 1 commit
    • ganeti-masterd: avoid SimpleConfigReader · b2890442
      Guido Trotter authored
      
      
      SimpleStore is a lot less heavyweight than SimpleConfigReader, and to
      just get the master name we can use that. This is currently the only
      usage of SimpleConfigReader, but we're not going to delete the class, as
      new usages will come in for ganeti-confd (in 2.1). Using it there,
      though, will make the class even heavier to load, so it makes sense to
      convert this simple usage now.
      Signed-off-by: Guido Trotter <ultrotter@google.com>
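      For illustration, a minimal sketch of the kind of lookup this switches
      to; the ssconf directory and file name here are assumptions, not
      necessarily the ones SimpleStore actually uses:

          # Hypothetical sketch: read the master name from a small ssconf
          # file instead of parsing the whole cluster configuration.
          def get_master_node(ss_dir="/var/lib/ganeti"):
              with open("%s/ssconf_master_node" % ss_dir) as f:
                  return f.read().strip()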
  14. 08 Jul, 2009 2 commits
  15. 07 Jul, 2009 1 commit
  16. 29 Jun, 2009 1 commit
  17. 15 Jun, 2009 12 commits
  18. 09 Jun, 2009 1 commit
    • rpc: Add a simple failure reporting framework · 2cc6781a
      Iustin Pop authored
      
      
      This patch adds a simple failure reporting tool, similar to bdev's
      _ThrowError. In backend, we move towards the new-style RPC results (of
      type (status, payload)) and thus functions which use this style can very
      easily log and return the error message using this new function.
      
      The exception is declared here and not in errors.py since it's local to
      the node-daemon/backend combination.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
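      For illustration, a small sketch of such a failure helper and how a
      caller could turn it into a (status, payload) result; the RPCFail and
      _Fail names are assumptions modelled on the description above:

          # Hypothetical sketch of the failure-reporting helper.
          import logging

          class RPCFail(Exception):
              """Raised by backend functions to report an RPC-level failure."""

          def _Fail(msg, *args):
              """Log the error and raise it as an RPC failure."""
              if args:
                  msg = msg % args
              logging.error(msg)
              raise RPCFail(msg)

          def run_backend_fn(fn, *args):
              """Convert a backend call into a new-style (status, payload) result."""
              try:
                  return (True, fn(*args))
              except RPCFail as err:
                  return (False, str(err))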
  19. 27 May, 2009 1 commit
    • Add a node powercycle command · f5118ade
      Iustin Pop authored
      
      
      This (somewhat big) patch adds support for remotely rebooting the nodes
      via whatever support the hypervisor has for such a concept.
      
      For KVM/fake (and containers in the future) this just uses sysrq plus a
      ‘reboot’ call if the sysrq method fails. For Xen, it first tries the
      above, and then a Xen-hypervisor reboot (we try sysrq first since that
      just requires opening a file handle, whereas a xen reboot means
      launching an external utility).
      
      The user interface is:
      
          # gnt-node powercycle node5
          Are you sure you want to hard powercycle node node5?
          y/[n]/?: y
          Reboot scheduled in 5 seconds
      
      The node hopefully reboots after sending the reply. In case the clock is
      broken, “time.sleep(5)” might take ages (but then I suspect SSL
      negotiation wouldn't work either).
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
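      For illustration, a rough sketch of the sysrq path described above; the
      file paths are the standard Linux sysrq interface, while the delay and
      the fallback to a plain reboot call are simplifications:

          # Hypothetical sketch of a sysrq-based powercycle.
          import os
          import time

          def linux_powercycle(delay=5):
              time.sleep(delay)   # give the RPC reply a chance to go out
              try:
                  with open("/proc/sys/kernel/sysrq", "w") as f:
                      f.write("1")          # make sure sysrq is enabled
                  with open("/proc/sysrq-trigger", "w") as f:
                      f.write("b")          # "b": reboot immediately
              except EnvironmentError:
                  os.system("reboot")       # fall back to the reboot utility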
  20. 25 May, 2009 2 commits
    • watcher: automatically restart noded/rapi · c4f0219c
      Iustin Pop authored
      
      
      This patch makes the watcher automatically restart the node and rapi
      daemons if they are not running (as determined from their PID files).
      
      This is not an exhaustive test; a better one would be a TCP connect to
      the port, and an even better one a simple protocol ping (e.g. a GET on /
      for rapi and an rpc_call_alive for noded), but since we don't know how
      the daemons have been started we can't implement that today. rapi would
      need to write its SSL/port settings to a file, and noded something
      similar, so that we know how to connect.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
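      For illustration, a minimal sketch of a PID-file liveness check of the
      kind described above; the helper names and the restart command are
      assumptions, not the watcher's actual code:

          # Hypothetical sketch of a PID-file based "is it running" check.
          import errno
          import os

          def is_daemon_running(pidfile):
              try:
                  with open(pidfile) as f:
                      pid = int(f.read().strip())
              except (EnvironmentError, ValueError):
                  return False
              try:
                  os.kill(pid, 0)           # signal 0: existence check only
              except OSError as err:
                  return err.errno == errno.EPERM
              return True

          def ensure_daemon(name, pidfile):
              if not is_daemon_running(pidfile):
                  # Placeholder restart command; the real mechanism depends on
                  # how the daemons were started.
                  os.system("/etc/init.d/%s restart" % name)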
    • watcher: handle full and drained queue cases · 24edc6d4
      Iustin Pop authored
      
      
      Currently the watcher is broken when the queue is full, and thus does
      not fulfil its job as a queue cleaner. It also doesn't handle the
      queue-drained status nicely.
      
      This patch makes a few changes:
        - archive jobs first, and only then submit new ones; this fixes the
          case where the queue is already full and there are jobs suited for
          archiving (but not the case where the jobs are all too young to be
          archived)
        - handle the job-queue-full and drained cases nicely: instead of
          tracebacks, log such cases
        - reverse the initial value and special cases for update_file; we now
          whitelist instead of blacklist cases, since we have many more
          blacklist cases than whitelist ones, and we set the flag to True
          only after the run is successful
      
      The last change, especially, is a significant one: now errors during
      the watcher run will not update the status file, and thus they won't
      again be lost in the logs.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
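      For illustration, a rough sketch of the reworked flow; the exception
      classes and hook functions here are placeholders modelled on the
      description above, not the watcher's actual code:

          # Hypothetical sketch: archive first, whitelist the status-file
          # update, and log (rather than traceback on) full/drained queues.
          import logging

          class JobQueueFull(Exception):      # placeholder for the real luxi error
              pass

          class JobQueueDrained(Exception):   # placeholder for the real luxi error
              pass

          def run_watcher(archive_fn, verify_fn, write_status_fn):
              """archive_fn/verify_fn/write_status_fn are placeholder hooks."""
              update_file = False              # only set after a successful run
              try:
                  archive_fn()                 # archive old jobs first, freeing queue slots
                  verify_fn()                  # instance checks; may submit new jobs
                  update_file = True
              except JobQueueFull:
                  logging.error("Job queue full, will retry on the next run")
              except JobQueueDrained:
                  logging.error("Job queue drained, not submitting jobs")
              if update_file:
                  write_status_fn()            # whitelist: only update on success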
  21. 21 May, 2009 1 commit
    • Add a luxi call for multi-job submit · 2971c913
      Iustin Pop authored
      
      
      As a workaround for the job submit timeouts that we have, this patch
      adds a new luxi call for multi-job submit; the advantage is that all the
      jobs are added to the queue first, and only afterwards can the workers
      start processing them.
      
      This is definitely faster than per-job submit, where the submission of
      new jobs competes with the workers processing jobs.
      
      On a pure no-op OpDelay opcode (not on master, not on nodes), we have:
        - 100 jobs:
          - individual: submit time ~21s, processing time ~21s
          - multiple:   submit time 7-9s, processing time ~22s
        - 250 jobs:
          - individual: submit time ~56s, processing time ~57s
                        run 2:      ~54s                  ~55s
          - multiple:   submit time ~20s, processing time ~51s
                        run 2:      ~17s                  ~52s
      
      which shows that we indeed gain on the client side, and maybe even in
      the total processing time for a high number of jobs. For just 10 or so
      jobs I expect the difference to be just noise.
      
      This will probably require increasing the timeout a little when
      submitting too many jobs - 250 jobs at ~20 seconds is close to the
      current rw timeout of 60s.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>