Skip to content
Snippets Groups Projects
  1. Aug 31, 2009
  2. Aug 27, 2009
  3. Aug 26, 2009
  4. Aug 25, 2009
  5. Aug 20, 2009
  6. Aug 10, 2009
  7. Jul 29, 2009
  8. Jul 25, 2009
    • Guido Trotter's avatar
      Collapse daemon's main function · 04ccf5e9
      Guido Trotter authored
      
      With three ganeti daemons, and one or two more coming, the daemon's main
      function started becoming too much cut&pasted code. Collapsing most of
      it in a daemon.GenericMain function. Some more code could be collapsed
      between the two http-based daemons, but since the new daemons won't be
      http-based we won't do it right now.
      
      As a bonus a functionality for overriding the network port on the
      command line for all network based nodes is added.
      
      Signed-off-by: default avatarGuido Trotter <ultrotter@google.com>
      04ccf5e9
  9. Jul 24, 2009
  10. Jul 23, 2009
  11. Jul 19, 2009
    • Iustin Pop's avatar
      Add a luxi call for multi-job submit · 56d8ff91
      Iustin Pop authored
      
      As a workaround for the job submit timeouts that we have, this patch
      adds a new luxi call for multi-job submit; the advantage is that all the
      jobs are added in the queue and only after the workers can start
      processing them.
      
      This is definitely faster than per-job submit, where the submission of
      new jobs competes with the workers processing jobs.
      
      On a pure no-op OpDelay opcode (not on master, not on nodes), we have:
        - 100 jobs:
          - individual: submit time ~21s, processing time ~21s
          - multiple:   submit time 7-9s, processing time ~22s
        - 250 jobs:
          - individual: submit time ~56s, processing time ~57s
                        run 2:      ~54s                  ~55s
          - multiple:   submit time ~20s, processing time ~51s
                        run 2:      ~17s                  ~52s
      
      which shows that we indeed gain on the client side, and maybe even on
      the total processing time for a high number of jobs. For just 10 or so I
      expect the difference to be just noise.
      
      This will probably require increasing the timeout a little when
      submitting too many jobs - 250 jobs at ~20 seconds is close to the
      current rw timeout of 60s.
      
      Signed-off-by: default avatarIustin Pop <iustin@google.com>
      Reviewed-by: default avatarGuido Trotter <ultrotter@google.com>
      (cherry picked from commit 2971c913)
      56d8ff91
  12. Jul 14, 2009
    • Guido Trotter's avatar
      ganeti-masterd: avoid SimpleConfigReader · b2890442
      Guido Trotter authored
      
      SimpleStore is a lot less heavyweight than SimpleConfigReader, and to
      just get the master name we can use that. This is the only usage of
      SimpleConfigReader currently, but we're not going to delete the class,
      as new usages will come in for ganeti-confd (in 2.1). Using it there,
      though, will make the class even more heavy to load, so it makes sense
      for this simple usage to be converted.
      
      Signed-off-by: default avatarGuido Trotter <ultrotter@google.com>
      b2890442
  13. Jul 08, 2009
  14. Jul 07, 2009
  15. Jun 15, 2009
  16. May 21, 2009
    • Iustin Pop's avatar
      Add a luxi call for multi-job submit · 2971c913
      Iustin Pop authored
      
      As a workaround for the job submit timeouts that we have, this patch
      adds a new luxi call for multi-job submit; the advantage is that all the
      jobs are added in the queue and only after the workers can start
      processing them.
      
      This is definitely faster than per-job submit, where the submission of
      new jobs competes with the workers processing jobs.
      
      On a pure no-op OpDelay opcode (not on master, not on nodes), we have:
        - 100 jobs:
          - individual: submit time ~21s, processing time ~21s
          - multiple:   submit time 7-9s, processing time ~22s
        - 250 jobs:
          - individual: submit time ~56s, processing time ~57s
                        run 2:      ~54s                  ~55s
          - multiple:   submit time ~20s, processing time ~51s
                        run 2:      ~17s                  ~52s
      
      which shows that we indeed gain on the client side, and maybe even on
      the total processing time for a high number of jobs. For just 10 or so I
      expect the difference to be just noise.
      
      This will probably require increasing the timeout a little when
      submitting too many jobs - 250 jobs at ~20 seconds is close to the
      current rw timeout of 60s.
      
      Signed-off-by: default avatarIustin Pop <iustin@google.com>
      Reviewed-by: default avatarGuido Trotter <ultrotter@google.com>
      2971c913
  17. May 04, 2009
  18. Apr 06, 2009
    • Iustin Pop's avatar
      Disable synchronous (locking) queries · 77921a95
      Iustin Pop authored
      This patch raises an error in the master daemon in case the user
      requests a locking query; accordingly, all clients were modified to send
      only lockless queries. This is short-term fix, for proper fix the
      clients should be modified to submit a job when the user request a
      locking query.
      
      The other approach would be to ignore the flag passed by the client;
      this would be worse as client's wouldn't get at least an error.
      
      The possible impact of this is multiple:
        - some commands could have been not converted, and thus fail; this
          can be remedied easily
        - the consistency of commands is lost; e.g. node failover will not
          lock the node *while we get the node info*, so we could miss some
          data; this is again in the thread of atomic operations which are
          missing in the current model of query-and-act from gnt-* scripts
      
      Reviewed-by: imsnah, ultrotter
      77921a95
    • Iustin Pop's avatar
      Add some more debugging info to masterd · e566ddbd
      Iustin Pop authored
      This patch will log data about queries, which are today completely
      invisible (at the default log level) in the master log file.
      
      Reviewed-by: imsnah
      e566ddbd
  19. Feb 27, 2009
    • Guido Trotter's avatar
      Create runtime dir in bootstrap · 9dae41ad
      Guido Trotter authored
      Some hypervisors (KVM) need RUN_GANETI_DIR to exist even at cluster init
      time. This patch creates it in InitCluster just before hv parameter
      checking. Since the code to make list of directories is already repeated
      twice in the code, and this would be the third time, we abstract it into
      an utils.EnsureDirs function and we call that one from ganti-noded,
      ganeti-masterd and bootstrap.
      
      Reviewed-by: iustinp
      9dae41ad
  20. Feb 12, 2009
    • Iustin Pop's avatar
      master daemon: allow skipping the voting process · 5de4474d
      Iustin Pop authored
      This patch introduces a 'force' mode for the master daemon startup where
      the voting process is not done, but the user has to confirm manually the
      startup (before forking, of course).
      
      Reviewed-by: imsnah
      5de4474d
  21. Feb 04, 2009
    • Iustin Pop's avatar
      Add one new luxi query: cluster info · 66baeccc
      Iustin Pop authored
      This is the last query that RAPI executes via opcodes and is purely
      static (config values only). As such, we can convert it safely to a
      query instead of job.
      
      Reviewed-by: imsnah
      66baeccc
    • Iustin Pop's avatar
      Implement lockless query operations · ec79568d
      Iustin Pop authored
      This patch adds the framework for, and enables lockless OpQueryInstances. This
      means that instances will be shown in ERROR_up or ERROR_down state, even though
      this is not an error (but just an in-progress job).
      
      The framework is implemented as follows:
        - the OpQueryInstances, OpQueryNodes and OpQueryExports opcodes take
          an additional “use_locking” flag which will denote whether to lock
          or not; this patch only implements this for LUQueryInstances
        - the luxi query functions take an additional argument use_locking
          which is passed to the master daemon, and then passed to the above
          opcodes
        - cli.py export a new SYNC_OPT command line options which implement
          setting this flag to true
        - except for gnt-instance list, which uses this option, and for
          name-only queries (e.g. QueryNodes(fields=["names"])), all other
          callers are setting this flag to True
        - RAPI also sets the flag to True
      
      The patch was tested with a continuous (0.2s sleep in-between)
      gnt-instance list during a burnin, and no problems were observed.
      
      Reviewed-by: ultrotter
      ec79568d
  22. Jan 21, 2009
    • Iustin Pop's avatar
      Fix some more pylint errors · c979d253
      Iustin Pop authored
      Two are real errors (invalid names) and one is style error (overriding
      name from outer scope).
      
      Reviewed-by: ultrotter
      c979d253
  23. Jan 20, 2009
    • Iustin Pop's avatar
      Update the logging output of job processing · d21d09d6
      Iustin Pop authored
      (this is related to the master daemon log)
      
      Currently it's not possible to follow (in the non-debug runs) the
      logical execution thread of jobs. This is due to the fact that we don't
      log the thread name (so we lose the association of log messages to jobs)
      and we don't log the start/stop of job and opcode execution.
      
      This patch adds a new parameter to utils.SetupLogging that enables
      thread name logging, and promotes some log entries from debug to info.
      With this applied, it's easier to understand which log messages relate
      to which jobs/opcodes.
      
      The patch also moves the "INFO client closed connection" entry to debug
      level, since it's not a very informative log entry.
      
      Reviewed-by: ultrotter
      d21d09d6
  24. Jan 09, 2009
    • Iustin Pop's avatar
      Rework the daemonization sequence · 7d88772a
      Iustin Pop authored
      The current fork+close fds sequence has deficiencies which are hard to
      work around:
        - logging can start logging before we fork (e.g. if we need to emit
          messages related to master checking), and thus use FDs which we
          can't track nicely
        - the queue locks the queue file, and again this fd needs to be kept
          open which is hard from the main loop (and this error is currently
          hidden by the fact that we don't log it)
      
      Given the above, it's much simpler, in case we will fork later, to close
      file descriptors right at the beginning of the program, and in Daemonize
      only close/reopen the stdin/out/err fds.
      
      In addition, we also close() the handlers we remove in SetupLogging so
      that the cleanup is more thorough.
      
      Reviewed-by: imsnah
      7d88772a
  25. Jan 06, 2009
  26. Dec 18, 2008
    • Michael Hanselmann's avatar
      Prevent RPC timeout on auto-archiving jobs · f8ad5591
      Michael Hanselmann authored
      With a large job queue, auto-archiving jobs can take a very long time,
      causing timeouts on the luxi RPC layer. With this change, auto-
      archive returns after half of the RPC timeout has passed. The user
      will see how many jobs are left unchecked.
      
      Reviewed-by: ultrotter
      f8ad5591
  27. Dec 11, 2008
    • Iustin Pop's avatar
      Fix epydoc format warnings · c41eea6e
      Iustin Pop authored
      This patch should fix all outstanding epydoc parsing errors; as such, we
      switch epydoc into verbose mode so that any new errors will be visible.
      
      Reviewed-by: imsnah
      c41eea6e
  28. Dec 02, 2008
    • Iustin Pop's avatar
      Fix master failover · bbe19c17
      Iustin Pop authored
      The ssconf files were not updated by the master failover. We need to
      push them, and since we already have RPC initialized, we can use the
      standard ConfigWriter to do so - this will take care of both the config
      file and the ssconf files.
      
      Reviewed-by: imsnah
      bbe19c17
  29. Nov 26, 2008
  30. Nov 25, 2008
    • Guido Trotter's avatar
      Move the MASTER_SOCKET to SOCKET_DIR · 227647ac
      Guido Trotter authored
      Before it was in the abstract linux namespace, where unfortunately we
      couldn't easily check from python the credentials of the connecting
      clients. Now we also have to remove the file on exit and when starting.
      
      Reviewed-by: imsnah
      227647ac
    • Guido Trotter's avatar
      ganeti-masterd: create SOCKET_DIR · d823660a
      Guido Trotter authored
      If SOCKET_DIR doesn't exist we create it in the master daemon, before
      trying to put a socket inside it.
      
      Reviewed-by: imsnah
      d823660a
  31. Nov 21, 2008
    • Michael Hanselmann's avatar
      ganeti-masterd: Remove PID file at the end · 15486fa7
      Michael Hanselmann authored
      Removing the PID file should be the last thing done. This patch makes
      sure it's also removed when master.server_cleanup() throws an exception.
      
      Also initialize logging only after writing the PID file.
      
      Reviewed-by: iustinp
      15486fa7
    • Michael Hanselmann's avatar
      Reuse HTTP client pool for RPC · 4331f6cd
      Michael Hanselmann authored
      ganeti-masterd: Add initialization and shutdown of RPC pool. It needs
      to be shutdown before forking.
      
      ganeti.cli: Add decorator function to initialize and shutdown RPC pool.
      
      ganeti.rpc: Add functions to initialize and shutdown RPC pool. Throw
      exception when used without proper initialization.
      
      gnt-cluster, gnt-node: Use decorator function to initialize and shutdown
      RPC pool.
      
      Reviewed-by: iustinp
      4331f6cd
  32. Oct 20, 2008
    • Iustin Pop's avatar
      Convert the job queue rpcs to address-based · 99aabbed
      Iustin Pop authored
      The two main multi-node job queue RPC calls (jobqueue_update,
      jobqueue_rename) are converted to address-based calls, in order to speed
      up queue changes. For this, we need to change the _nodes attribute on
      the jobqueue to be a dict {name: ip}, instead of a set.
      
      Reviewed-by: imsnah
      99aabbed
Loading