1. 06 Nov, 2009 1 commit
  2. 02 Nov, 2009 1 commit
  3. 29 Sep, 2009 1 commit
  4. 25 Sep, 2009 1 commit
  5. 17 Sep, 2009 1 commit
  6. 15 Sep, 2009 2 commits
  7. 31 Aug, 2009 1 commit
  8. 27 Aug, 2009 1 commit
  9. 26 Aug, 2009 3 commits
  10. 25 Aug, 2009 1 commit
  11. 20 Aug, 2009 1 commit
  12. 10 Aug, 2009 1 commit
  13. 29 Jul, 2009 1 commit
  14. 25 Jul, 2009 1 commit
    • Collapse daemon's main function · 04ccf5e9
      Guido Trotter authored
      
      
      With three Ganeti daemons, and one or two more coming, the daemons' main
      functions had become mostly cut-and-pasted code. This patch collapses
      most of it into a daemon.GenericMain function. Some more code could be
      shared between the two HTTP-based daemons, but since the new daemons
      won't be HTTP-based, we won't do that right now.
      
      As a bonus, the ability to override the network port on the command
      line is added for all network-based daemons.
      Signed-off-by: Guido Trotter <ultrotter@google.com>
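
      A rough sketch of the shape such a generic main can take (the hook
      functions and option names below are assumptions, not the actual
      daemon.GenericMain signature):

        # Illustrative sketch only; the real daemon.GenericMain differs.
        import optparse
        import sys

        def GenericMain(daemon_name, default_port, check_fn, exec_fn):
            """Shared main function for a network daemon."""
            parser = optparse.OptionParser(prog=daemon_name)
            # The bonus functionality from the patch: the listening port can
            # be overridden on the command line instead of being fixed per
            # daemon.
            parser.add_option("-p", "--port", type="int", default=default_port,
                              help="network port to listen on")
            parser.add_option("-d", "--debug", action="store_true",
                              default=False, help="enable debug logging")
            options, args = parser.parse_args()

            # Daemon-specific sanity checks (environment, config, ...).
            check_fn(options, args)

            # Daemonization and logging setup would happen here in the real
            # code, then the daemon-specific serving loop is entered.
            exec_fn(options, args)

        if __name__ == "__main__":
            # Each daemon then only provides its name, default port and hooks.
            GenericMain("example-daemon", 1811,
                        lambda opts, args: None,
                        lambda opts, args: sys.exit(0))
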
  15. 24 Jul, 2009 2 commits
  16. 23 Jul, 2009 1 commit
  17. 19 Jul, 2009 1 commit
    • Add a luxi call for multi-job submit · 56d8ff91
      Iustin Pop authored
      
      
      As a workaround for the job submit timeouts that we have, this patch
      adds a new luxi call for multi-job submit; the advantage is that all the
      jobs are added to the queue first, and only afterwards can the workers
      start processing them.
      
      This is definitely faster than per-job submit, where the submission of
      new jobs competes with the workers processing jobs.
      
      On a pure no-op OpDelay opcode (not on master, not on nodes), we have:
        - 100 jobs:
          - individual: submit time ~21s, processing time ~21s
          - multiple:   submit time 7-9s, processing time ~22s
        - 250 jobs:
          - individual: submit time ~56s, processing time ~57s
                        run 2:      ~54s                  ~55s
          - multiple:   submit time ~20s, processing time ~51s
                        run 2:      ~17s                  ~52s
      
      which shows that we indeed gain on the client side, and maybe even on
      the total processing time for a high number of jobs. For just 10 or so I
      expect the difference to be just noise.
      
      This will probably require increasing the timeout a little when
      submitting too many jobs - 250 jobs at ~20 seconds is close to the
      current rw timeout of 60s.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
      (cherry picked from commit 2971c913)
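
      The gain comes from batching on the wire: one luxi request carrying all
      the job definitions instead of one request per job. A minimal sketch of
      the two client-side patterns (the method names are assumptions about the
      luxi client API):

        # Sketch only; "SubmitJob"/"SubmitManyJobs" stand in for whatever the
        # luxi client actually exposes, and "cl" is a connected client.

        def submit_individually(cl, jobs):
            # One round-trip per job; each submission competes with the
            # workers that are already processing earlier jobs.
            return [cl.SubmitJob(ops) for ops in jobs]

        def submit_in_one_call(cl, jobs):
            # A single round-trip; all jobs land in the queue before the
            # workers start picking them up, which is what the timings
            # above measure.
            return cl.SubmitManyJobs(jobs)
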
  18. 14 Jul, 2009 1 commit
    • ganeti-masterd: avoid SimpleConfigReader · b2890442
      Guido Trotter authored
      
      
      SimpleStore is much more lightweight than SimpleConfigReader, and to get
      just the master name we can use it instead. This is the only usage of
      SimpleConfigReader currently, but we're not going to delete the class,
      as new usages will come in for ganeti-confd (in 2.1). Using it there,
      though, will make the class even heavier to load, so it makes sense to
      convert this simple usage now.
      Signed-off-by: Guido Trotter <ultrotter@google.com>
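
      The difference is essentially reading one small ssconf value versus
      parsing the whole cluster configuration. A minimal sketch of the intent
      (file layout and names are assumptions based on the commit message):

        # Sketch, not the actual ganeti-masterd code; the config layout and
        # the ssconf file name below are assumptions.
        import json

        def get_master_name_heavy(config_file):
            # SimpleConfigReader-style: load and parse the full cluster
            # configuration just to read a single field.
            with open(config_file) as fd:
                return json.load(fd)["cluster"]["master_node"]

        def get_master_name_light(ssconf_dir):
            # SimpleStore-style: read one small, pre-generated ssconf file.
            with open("%s/ssconf_master_node" % ssconf_dir) as fd:
                return fd.read().strip()
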
  19. 08 Jul, 2009 2 commits
  20. 07 Jul, 2009 1 commit
  21. 15 Jun, 2009 1 commit
  22. 21 May, 2009 1 commit
    • Add a luxi call for multi-job submit · 2971c913
      Iustin Pop authored
      
      
      As a workaround for the job submit timeouts that we have, this patch
      adds a new luxi call for multi-job submit; the advantage is that all the
      jobs are added to the queue first, and only afterwards can the workers
      start processing them.
      
      This is definitely faster than per-job submit, where the submission of
      new jobs competes with the workers processing jobs.
      
      On a pure no-op OpDelay opcode (not on master, not on nodes), we have:
        - 100 jobs:
          - individual: submit time ~21s, processing time ~21s
          - multiple:   submit time 7-9s, processing time ~22s
        - 250 jobs:
          - individual: submit time ~56s, processing time ~57s
                        run 2:      ~54s                  ~55s
          - multiple:   submit time ~20s, processing time ~51s
                        run 2:      ~17s                  ~52s
      
      which shows that we indeed gain on the client side, and maybe even on
      the total processing time for a high number of jobs. For just 10 or so I
      expect the difference to be just noise.
      
      This will probably require increasing the timeout a little when
      submitting too many jobs - 250 jobs at ~20 seconds is close to the
      current rw timeout of 60s.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
  23. 04 May, 2009 1 commit
  24. 06 Apr, 2009 2 commits
    • Disable synchronous (locking) queries · 77921a95
      Iustin Pop authored
      This patch raises an error in the master daemon in case the user
      requests a locking query; accordingly, all clients were modified to send
      only lockless queries. This is a short-term fix; the proper fix is to
      modify the clients to submit a job when the user requests a locking
      query.
      
      The other approach would be to ignore the flag passed by the client;
      this would be worse, as clients wouldn't even get an error.
      
      This has several possible impacts:
        - some commands might not have been converted, and will thus fail;
          this can be remedied easily
        - the consistency of commands is lost; e.g. node failover will not
          lock the node *while we get the node info*, so we could miss some
          data; this again relates to the atomic operations that are missing
          in the current query-and-act model of the gnt-* scripts
      
      Reviewed-by: imsnah, ultrotter
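
      In effect, the master-side query handler now behaves roughly like this
      (a sketch with illustrative names):

        # Sketch of the behaviour described above; names are illustrative.
        class QueryError(Exception):
            pass

        def HandleQueryRequest(run_query_fn, args, use_locking):
            if use_locking:
                # Failing loudly is better than silently ignoring the flag:
                # the client at least learns that locking queries are
                # unsupported.
                raise QueryError("Locking queries are not supported")
            return run_query_fn(*args)
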
    • Add some more debugging info to masterd · e566ddbd
      Iustin Pop authored
      This patch will log data about queries, which are today completely
      invisible (at the default log level) in the master log file.
      
      Reviewed-by: imsnah
  25. 27 Feb, 2009 1 commit
    • Create runtime dir in bootstrap · 9dae41ad
      Guido Trotter authored
      Some hypervisors (KVM) need RUN_GANETI_DIR to exist even at cluster init
      time. This patch creates it in InitCluster just before hv parameter
      checking. Since the code to build a list of directories is already
      repeated twice in the code, and this would be the third time, we
      abstract it into a utils.EnsureDirs function and call that from
      ganeti-noded, ganeti-masterd and bootstrap.
      
      Reviewed-by: iustinp
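
      A hedged sketch of what such a helper can look like (the real
      utils.EnsureDirs may differ in signature and error handling):

        # Sketch only; signature and error handling of the real
        # utils.EnsureDirs are assumptions.
        import errno
        import os

        def EnsureDirs(dirs):
            """Make sure every (path, mode) pair in 'dirs' is a directory."""
            for path, mode in dirs:
                try:
                    os.mkdir(path, mode)
                except OSError as err:
                    if err.errno != errno.EEXIST:
                        raise
                # Enforce the mode even if the directory already existed.
                os.chmod(path, mode)

        # ganeti-noded, ganeti-masterd and bootstrap.InitCluster would then
        # share one call such as (directory names are illustrative):
        #   EnsureDirs([("/var/run/ganeti", 0o775),
        #               ("/var/run/ganeti/socket", 0o700)])
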
  26. 12 Feb, 2009 1 commit
    • master daemon: allow skipping the voting process · 5de4474d
      Iustin Pop authored
      This patch introduces a 'force' mode for the master daemon startup in
      which the voting process is skipped, but the user has to manually
      confirm the startup (before forking, of course).
      
      Reviewed-by: imsnah
  27. 04 Feb, 2009 2 commits
    • Add one new luxi query: cluster info · 66baeccc
      Iustin Pop authored
      This is the last query that RAPI executes via opcodes, and it is purely
      static (config values only). As such, we can safely convert it to a
      query instead of a job.
      
      Reviewed-by: imsnah
    • Implement lockless query operations · ec79568d
      Iustin Pop authored
      This patch adds the framework for, and enables, lockless
      OpQueryInstances. This means that instances will be shown in ERROR_up
      or ERROR_down state, even though this is not an error (but just an
      in-progress job).
      
      The framework is implemented as follows:
        - the OpQueryInstances, OpQueryNodes and OpQueryExports opcodes take
          an additional “use_locking” flag which will denote whether to lock
          or not; this patch only implements this for LUQueryInstances
        - the luxi query functions take an additional argument use_locking
          which is passed to the master daemon, and then passed to the above
          opcodes
        - cli.py exports a new SYNC_OPT command line option which sets this
          flag to true
        - except for gnt-instance list, which uses this option, and for
          name-only queries (e.g. QueryNodes(fields=["names"])), all other
          callers set this flag to True
        - RAPI also sets the flag to True
      
      The patch was tested with a continuous (0.2s sleep in-between)
      gnt-instance list during a burnin, and no problems were observed.
      
      Reviewed-by: ultrotter
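
      From a caller's perspective, the change is one extra boolean travelling
      from cli/RAPI through luxi into the opcode. A sketch of that flow
      (names are assumptions):

        # Sketch of the flag threading described above; method and opcode
        # names are assumptions.

        def QueryInstances(client, fields, names=None, use_locking=False):
            # The luxi client forwards the flag to the master daemon, which
            # copies it into the OpQueryInstances opcode before execution.
            return client.CallMethod("QueryInstances",
                                     (names or [], fields, use_locking))

        # At this point gnt-instance list is lockless unless --sync
        # (SYNC_OPT) is given, name-only queries are lockless, and most
        # other callers, including RAPI, still pass use_locking=True.
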
  28. 21 Jan, 2009 1 commit
    • Fix some more pylint errors · c979d253
      Iustin Pop authored
      Two are real errors (invalid names) and one is a style error (overriding
      a name from an outer scope).
      
      Reviewed-by: ultrotter
  29. 20 Jan, 2009 1 commit
    • Update the logging output of job processing · d21d09d6
      Iustin Pop authored
      (this is related to the master daemon log)
      
      Currently it's not possible to follow (in the non-debug runs) the
      logical execution thread of jobs. This is due to the fact that we don't
      log the thread name (so we lose the association of log messages to jobs)
      and we don't log the start/stop of job and opcode execution.
      
      This patch adds a new parameter to utils.SetupLogging that enables
      thread name logging, and promotes some log entries from debug to info.
      With this applied, it's easier to understand which log messages relate
      to which jobs/opcodes.
      
      The patch also moves the "INFO client closed connection" entry to debug
      level, since it's not a very informative log entry.
      
      Reviewed-by: ultrotter
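
      With the standard Python logging module, the thread name is just one
      more field in the format string. A minimal sketch of the kind of setup
      described (the real utils.SetupLogging takes more parameters, and this
      format string is an assumption):

        # Minimal sketch; only the thread-name aspect is shown.
        import logging
        import threading

        def SetupLogging(program, debug=False, multithreaded=False):
            fmt = "%(asctime)s: " + program + " "
            if multithreaded:
                fmt += "pid=%(process)d/%(threadName)s "
            fmt += "%(levelname)s %(message)s"
            logging.basicConfig(format=fmt,
                                level=logging.DEBUG if debug else logging.INFO)

        if __name__ == "__main__":
            SetupLogging("ganeti-masterd", multithreaded=True)
            # A worker thread named after its job shows up in every record,
            # so log lines can be attributed to a job without debug logging.
            t = threading.Thread(target=logging.info,
                                 args=("starting opcode",),
                                 name="JobQueue/Job42")
            t.start()
            t.join()
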
  30. 09 Jan, 2009 1 commit
    • Rework the daemonization sequence · 7d88772a
      Iustin Pop authored
      The current fork+close fds sequence has deficiencies which are hard to
      work around:
        - logging can start before we fork (e.g. if we need to emit
          messages related to master checking), and thus use FDs which we
          can't track nicely
        - the queue locks the queue file, and again this fd needs to be kept
          open, which is hard to do from the main loop (and this error is
          currently hidden by the fact that we don't log it)
      
      Given the above, it's much simpler, if we are going to fork later, to
      close file descriptors right at the beginning of the program, and to
      have Daemonize only close/reopen the stdin/out/err fds.
      
      In addition, we also close() the handlers we remove in SetupLogging so
      that the cleanup is more thorough.
      
      Reviewed-by: imsnah
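
      A hedged sketch of the reordered sequence: close everything above
      stderr as the very first thing, and let Daemonize later touch only the
      standard descriptors (constants and helper names are illustrative):

        # Illustrative sketch of the sequence described above, not the real
        # ganeti code; fd-limit handling in particular is simplified.
        import os
        import resource

        def CloseFDs(noclose_fds=()):
            """Close every fd above stderr, right at program start."""
            maxfd = resource.getrlimit(resource.RLIMIT_NOFILE)[1]
            if maxfd == resource.RLIM_INFINITY:
                maxfd = 1024
            for fd in range(3, maxfd):
                if fd not in noclose_fds:
                    try:
                        os.close(fd)
                    except OSError:
                        pass

        def Daemonize(logfile):
            """Fork into the background, redirecting only stdin/out/err."""
            if os.fork() != 0:
                os._exit(0)              # parent exits
            os.setsid()
            if os.fork() != 0:
                os._exit(0)              # first child exits
            # Only the standard descriptors are touched here; everything
            # else was already closed by CloseFDs() before logging or the
            # job queue opened files that must stay open.
            devnull = os.open(os.devnull, os.O_RDWR)
            logfd = os.open(logfile,
                            os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
            os.dup2(devnull, 0)
            os.dup2(logfd, 1)
            os.dup2(logfd, 2)
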
  31. 06 Jan, 2009 1 commit
  32. 18 Dec, 2008 1 commit
    • Prevent RPC timeout on auto-archiving jobs · f8ad5591
      Michael Hanselmann authored
      With a large job queue, auto-archiving jobs can take a very long time,
      causing timeouts on the luxi RPC layer. With this change, auto-
      archive returns after half of the RPC timeout has passed. The user
      will see how many jobs are left unchecked.
      
      Reviewed-by: ultrotter
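
      Conceptually, the archiving loop just stops once its time budget (half
      of the luxi RPC timeout) is used up and reports the rest; a sketch with
      assumed names and timeout value:

        # Sketch of the time-bounded loop described above; the queue methods
        # and the 60s timeout are assumptions, not the real jqueue code.
        import time

        LUXI_RPC_TIMEOUT = 60.0

        def AutoArchiveJobs(queue, age):
            """Archive finished jobs older than 'age' within half the timeout."""
            deadline = time.time() + LUXI_RPC_TIMEOUT / 2.0
            pending = list(queue.GetFinishedJobs(age))
            archived = 0
            while pending and time.time() < deadline:
                queue.ArchiveJob(pending.pop(0))
                archived += 1
            # The remainder is reported to the user and picked up by the
            # next auto-archive run.
            return archived, len(pending)
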
  33. 11 Dec, 2008 1 commit
    • Fix epydoc format warnings · c41eea6e
      Iustin Pop authored
      This patch should fix all outstanding epydoc parsing errors; as such, we
      switch epydoc into verbose mode so that any new errors will be visible.
      
      Reviewed-by: imsnah