  1. Feb 22, 2010
  2. Jan 22, 2010
  3. Jan 05, 2010
    • Introduce a Luxi call for GetTags · 7699c3af
      Iustin Pop authored
      
      This changes tag retrieval in the cli scripts from submitting jobs
      to plain queries, which (since the tags query is a cheap one) should
      be much faster.
      
      The tags queries are already done without locks (in the generic query
      paths for instances/nodes/cluster), so this shouldn't break tags query
      via gnt-* list-tags.
      
      On a small cluster, the runtime of gnt-cluster/gnt-instance list-tags
      more than halves; on a big cluster (with many MCs, i.e. master
      candidates) I expect it to be more than 5 times faster. The main gain
      is not the speed of the tags get itself, but eliminating a whole job
      when a simple query is enough.
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: René Nussbaumer <rn@google.com>
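
      Roughly, the client-side difference looks like the sketch below. The
      client class and its method names are illustrative stand-ins, not the
      real ganeti luxi API:

        class FakeLuxiClient:
            """Stand-in for a luxi-style client; names are assumptions."""
            def SubmitJob(self, ops):
                # models the old path: enqueue a job and hand back an id to poll
                return 1234
            def WaitForJob(self, job_id):
                # the result only becomes available once a worker has run the job
                return [["tag1", "tag2"]]
            def QueryTags(self, kind, name):
                # models the new path: a single direct query, no job involved
                return ["tag1", "tag2"]

        cl = FakeLuxiClient()

        # Before: a full job round-trip just to read tags.
        job_id = cl.SubmitJob([("OP_TAGS_GET", "cluster", "")])
        old_tags = cl.WaitForJob(job_id)[0]

        # After: one cheap, lockless query.
        new_tags = cl.QueryTags("cluster", "")
        assert set(old_tags) == set(new_tags)
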
  4. Jan 04, 2010
  5. Oct 13, 2009
  6. Sep 25, 2009
  7. Aug 27, 2009
  8. Aug 26, 2009
  9. Jul 19, 2009
    • Add a luxi call for multi-job submit · 56d8ff91
      Iustin Pop authored
      
      As a workaround for the job submit timeouts that we have, this patch
      adds a new luxi call for multi-job submit; the advantage is that all
      the jobs are added to the queue first, and only afterwards can the
      workers start processing them.
      
      This is definitely faster than per-job submit, where the submission of
      new jobs competes with the workers processing jobs.
      
      On a pure no-op OpDelay opcode (not on master, not on nodes), we have:
        - 100 jobs:
          - individual: submit time ~21s, processing time ~21s
          - multiple:   submit time 7-9s, processing time ~22s
        - 250 jobs:
          - individual: submit time ~56s, processing time ~57s
                        (run 2: ~54s and ~55s respectively)
          - multiple:   submit time ~20s, processing time ~51s
                        (run 2: ~17s and ~52s respectively)
      
      which shows that we indeed gain on the client side, and maybe even on
      the total processing time for a high number of jobs. For just 10 or so
      jobs I expect the difference to be mere noise.
      
      This will probably require increasing the timeout a little when
      submitting too many jobs - 250 jobs at ~20 seconds is close to the
      current rw timeout of 60s.
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
      (cherry picked from commit 2971c913)
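
      The client-side shape of the change, again with an illustrative
      stand-in client rather than the real luxi module (call names are
      assumptions):

        class FakeLuxiClient:
            """Stand-in client contrasting per-job and batched submission."""
            def __init__(self):
                self.queue = []
            def SubmitJob(self, ops):
                # in the real protocol this is one RPC per job, and each call
                # competes with workers already processing earlier jobs
                self.queue.append(ops)
                return len(self.queue)
            def SubmitManyJobs(self, jobs):
                # in the real protocol this is a single RPC; workers only
                # start once the whole batch has been queued
                return [self.SubmitJob(ops) for ops in jobs]

        cl = FakeLuxiClient()
        jobs = [[("OP_TEST_DELAY", {"duration": 0.0})] for _ in range(250)]

        old_ids = [cl.SubmitJob(ops) for ops in jobs]   # 250 round-trips
        new_ids = cl.SubmitManyJobs(jobs)               # one round-trip
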
  10. Jul 07, 2009
  11. May 21, 2009
    • Add a luxi call for multi-job submit · 2971c913
      Iustin Pop authored
      
      As a workaround for the job submit timeouts that we have, this patch
      adds a new luxi call for multi-job submit; the advantage is that all
      the jobs are added to the queue first, and only afterwards can the
      workers start processing them.
      
      This is definitely faster than per-job submit, where the submission of
      new jobs competes with the workers processing jobs.
      
      On a pure no-op OpDelay opcode (not on master, not on nodes), we have:
        - 100 jobs:
          - individual: submit time ~21s, processing time ~21s
          - multiple:   submit time 7-9s, processing time ~22s
        - 250 jobs:
          - individual: submit time ~56s, processing time ~57s
                        (run 2: ~54s and ~55s respectively)
          - multiple:   submit time ~20s, processing time ~51s
                        (run 2: ~17s and ~52s respectively)
      
      which shows that we indeed gain on the client side, and maybe even on
      the total processing time for a high number of jobs. For just 10 or so
      jobs I expect the difference to be mere noise.
      
      This will probably require increasing the timeout a little when
      submitting too many jobs - 250 jobs at ~20 seconds is close to the
      current rw timeout of 60s.
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
  12. Feb 04, 2009
    • Add one new luxi query: cluster info · 66baeccc
      Iustin Pop authored
      This is the last query that RAPI executes via opcodes, and it is
      purely static (config values only). As such, we can safely convert it
      to a query instead of a job.
      
      Reviewed-by: imsnah
    • Implement lockless query operations · ec79568d
      Iustin Pop authored
      This patch adds the framework for, and enables, lockless
      OpQueryInstances. This means that instances may be shown in ERROR_up
      or ERROR_down state even though this is not an error, but just an
      in-progress job.
      
      The framework is implemented as follows:
        - the OpQueryInstances, OpQueryNodes and OpQueryExports opcodes take
          an additional “use_locking” flag which will denote whether to lock
          or not; this patch only implements this for LUQueryInstances
        - the luxi query functions take an additional argument use_locking
          which is passed to the master daemon, and then passed to the above
          opcodes
        - cli.py exports a new SYNC_OPT command line option which implements
          setting this flag to True
        - except for gnt-instance list, which uses this option, and for
          name-only queries (e.g. QueryNodes(fields=["names"])), all other
          callers set this flag to True (see the sketch after this entry)
        - RAPI also sets the flag to True
      
      The patch was tested with a continuous (0.2s sleep in-between)
      gnt-instance list during a burnin, and no problems were observed.
      
      Reviewed-by: ultrotter
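
      A minimal sketch of the flag's intent (function and field names below
      are illustrative, not the real opcode/luxi API): display-only callers
      can tolerate slightly stale data and skip locking, while callers that
      act on the result keep use_locking=True.

        def query_instances(fields, names=None, use_locking=True):
            """Stand-in for a QueryInstances-style luxi call."""
            # Without locks the answer may show an instance as
            # ERROR_up/ERROR_down while a job is still moving it;
            # that is informational, not fatal.
            return [{"name": n, "requested": fields, "locked": use_locking}
                    for n in (names or ["inst1.example.com"])]

        # gnt-instance list: purely informational, so lockless by default
        # (the SYNC_OPT option would flip this back to a locked query).
        listing = query_instances(["name", "status"], use_locking=False)

        # A caller that acts on the answer keeps the locked, consistent view.
        consistent = query_instances(["name", "pnode"],
                                     names=["inst1.example.com"],
                                     use_locking=True)
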
  13. Jan 22, 2009
    • luxi: close and reopen the socket on errors · 8d5b316c
      Iustin Pop authored
      This is less of an actual issue for regular gnt-* clients, but it's
      easily reproducible with burnin and possible with RAPI (depending on how
      the program uses luxi.Client(s)).
      
      In case of burnin, if we interrupt the client (^C) while it polls the
      job, it will abort and raise an error. After that, burnin issues a
      remove instance job, and at this point we send the submit job
      (remove) call, but the first thing we read from the socket is the
      response to the previous poll request, since that response was
      already queued by the master.
      
      To solve this, whenever we detect an error in Transport.Call(), we
      close that transport and create a new one, starting from a clean
      connection. The alternative would be to introduce a sequence number
      in the protocol, but that would be a design-level change and is not
      recommended at this stage.
      
      Reviewed-by: imsnah
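
      A minimal model of the fix (not the real Transport class): any error
      inside Call() discards the connection, so a stray response left over
      from an interrupted call can never be read as the answer to the next
      request.

        class Transport:
            def __init__(self, connection_factory):
                self._factory = connection_factory
                self._conn = connection_factory()

            def Call(self, request):
                try:
                    self._conn.send(request)
                    return self._conn.recv()
                except Exception:
                    # On any failure, drop the socket and run the next call
                    # on a fresh connection instead of a possibly desynced one.
                    try:
                        self._conn.close()
                    finally:
                        self._conn = self._factory()
                    raise

        class _FakeConn:
            """Tiny fake connection so the sketch runs standalone."""
            def send(self, data): pass
            def recv(self): return b'{"success": true, "result": null}'
            def close(self): pass

        t = Transport(_FakeConn)
        reply = t.Call(b'{"method": "QueryClusterInfo", "args": []}')
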
  14. Jan 20, 2009
  15. Dec 18, 2008
    • Prevent RPC timeout on auto-archiving jobs · f8ad5591
      Michael Hanselmann authored
      With a large job queue, auto-archiving jobs can take a very long time,
      causing timeouts on the luxi RPC layer. With this change,
      auto-archive returns after half of the RPC timeout has passed. The
      user will see how many jobs are left unchecked.
      
      Reviewed-by: ultrotter
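
      Roughly, the time-bounded pass looks like the sketch below (the 60s
      timeout value is taken from the other commits in this log; the helper
      names are made up):

        import time

        RPC_TIMEOUT = 60.0   # assumed luxi read/write timeout

        def auto_archive(job_ids, archive_one, budget=RPC_TIMEOUT / 2):
            """Archive jobs until the time budget runs out.

            Returns (archived, unchecked) so the caller can tell the user
            how many jobs were left unexamined.
            """
            deadline = time.time() + budget
            archived = 0
            for idx, job_id in enumerate(job_ids):
                if time.time() > deadline:
                    return archived, len(job_ids) - idx
                if archive_one(job_id):
                    archived += 1
            return archived, 0

        done, left = auto_archive(range(100000), lambda _job_id: True)
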
  16. Oct 16, 2008
    • Add an interface for the drain flag changes/query · 3ccafd0e
      Iustin Pop authored
      This adds the set/reset in the jqueue and luxi modules, a way to
      query it in OpQueryConfigValues, and the command line interface for
      it:
      $ gnt-cluster queue info
      The drain flag is unset
      $ gnt-cluster queue drain
      $ gnt-cluster queue info
      The drain flag is set
      $ gnt-cluster queue undrain
      $ gnt-cluster queue info
      The drain flag is unset
      
      The setting is done via luxi and not via an opcode because opcodes
      can't be executed while the queue is drained; the query, on the other
      hand, is not done via luxi, since in the future the flag might become
      a cluster property as opposed to a node one.
      
      Reviewed-by: imsnah
  17. Oct 15, 2008
  18. Oct 06, 2008
    • Implement job auto-archiving · 07cd723a
      Iustin Pop authored
      This patch adds a new luxi call that implements auto-archiving of jobs
      older than a certain age (or -1 for all completed jobs), and the gnt-job
      command that makes use of this (with 'all' for -1).
      
      Reviewed-by: imsnah
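
      The age rule itself is simple; a sketch (illustrative only, not the
      real jqueue code), with the age given in seconds and -1 meaning
      "every finished job":

        import time

        FINISHED_STATUSES = {"success", "error", "canceled"}

        def should_archive(status, end_timestamp, age, now=None):
            """Return True if a finished job is old enough (or age == -1)."""
            if status not in FINISHED_STATUSES:
                return False          # never archive queued/running jobs
            if age == -1:
                return True           # the 'all' case from the command line
            now = time.time() if now is None else now
            return (now - end_timestamp) > age

        # e.g. "archive jobs older than six hours" maps to age=21600,
        # while 'all' maps to age=-1
        assert should_archive("success", end_timestamp=0, age=-1)
        assert not should_archive("running", end_timestamp=0, age=-1)
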
  19. Oct 01, 2008
    • Add new query to get cluster config values · ae5849b5
      Michael Hanselmann authored
      This can be used to retrieve certain cluster config values from
      within clients.
      
      OpDumpClusterConfig was not used anywhere, hence I'm just reusing
      it. The way ConfigWriter.DumpConfig returned the configuration
      was not thread-safe, anyway (no deepcopy).
      
      Reviewed-by: iustinp
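
      A small sketch of the thread-safety point (illustrative class, not the
      real ConfigWriter): hand callers copies of the requested values, never
      references into the live configuration that another thread may be
      mutating.

        import copy
        import threading

        class ConfigStore:
            def __init__(self, values):
                self._lock = threading.Lock()
                self._values = values

            def QueryConfigValues(self, fields):
                with self._lock:
                    # deepcopy so callers never share mutable state with
                    # the live configuration
                    return [copy.deepcopy(self._values.get(f)) for f in fields]

        cfg = ConfigStore({"cluster_name": "cluster1", "master_node": "node1"})
        name, master = cfg.QueryConfigValues(["cluster_name", "master_node"])
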
  20. Aug 29, 2008
    • Make WaitForJobChanges deal with long jobs · 5c735209
      Iustin Pop authored
      This patch alters the WaitForJobChanges luxi-RPC call to have a
      configurable timeout, so that the call behaves nicely with long jobs
      that have no update.
      
      We do this by adding a timeout parameter to the RPC call and
      returning a special constant when the timeout is reached without an
      update. The luxi client will repeatedly call WaitForJobChanges until
      it gets a real change. The timeout is hardcoded as half the RWTO
      value.
      
      The patch also removes an unused variable (new_state) from the
      WaitForJobChanges method.
      
      Reviewed-by: imsnah,ultrotter
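
      The client-side loop then has roughly this shape (the sentinel value
      and the timeout constant below are assumptions made for the sketch):

        TIMEOUT_MARKER = "timeout"   # assumed sentinel for "nothing changed"
        WFJC_TIMEOUT = 30.0          # e.g. half of a 60s read/write timeout

        def wait_for_job_change(call, job_id, previous_state):
            """Keep re-issuing the RPC until a real change arrives."""
            while True:
                result = call(job_id, previous_state, WFJC_TIMEOUT)
                if result != TIMEOUT_MARKER:
                    return result    # an actual status/log update
                # else: no news within the timeout; ask again instead of failing

        # Usage with a fake call that reports a change on the second attempt:
        answers = iter([TIMEOUT_MARKER, ("running", [])])
        status, log_entries = wait_for_job_change(lambda *a: next(answers), 42, None)
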
  21. Aug 28, 2008
  22. Aug 27, 2008
    • Make sure that client programs get all messages · 6c5a7090
      Michael Hanselmann authored
      This is a large patch, but I can't figure out how to split it without
      breaking stuff. The old way of getting messages by always getting the
      last one didn't bring all messages to the client if they were added
      too fast, thereby making commands like “gnt-cluster verify” less than
      useful. These changes introduce a serial number per log entry to
      keep track of which messages a client has already received. They
      also remove the per-opcode log lock to make reading log entries
      thread safe.
      
      Reviewed-by: ultrotter
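
      A toy model of the serial idea (not the real jqueue structures): each
      log entry carries an increasing serial, and the client asks for
      everything after the last serial it has seen, so bursts of messages
      can no longer be skipped.

        job_log = [(1, "node1: connection ok"),
                   (2, "node2: connection ok"),
                   (3, "cluster verify: 0 errors")]

        def messages_after(entries, last_serial):
            """Return all (serial, message) pairs newer than last_serial."""
            return [(serial, msg) for serial, msg in entries
                    if serial > last_serial]

        seen = 0
        while True:
            fresh = messages_after(job_log, seen)
            if not fresh:
                break
            for serial, msg in fresh:
                print(msg)
                seen = serial
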
  23. Aug 11, 2008
  24. Aug 08, 2008
  25. Aug 06, 2008
  26. Jul 30, 2008
    • Fix pylint-detected issues · 38206f3c
      Iustin Pop authored
      This is mostly:
        - whitespace fixes (space at EOL in some files, not all; broken
          indentation, etc.)
        - variable names shadowing others (one is a real bug in there)
        - too-long lines
        - cleanup of most unused imports (not all)
      
      Reviewed-by: ultrotter
  27. Jul 09, 2008
  28. Jul 08, 2008
  29. Jun 21, 2008
    • Implement handling of luxi errors in cli.py · 03a8dbdc
      Iustin Pop authored
      Currently the generic handling of ganeti errors in cli.py (GenericMain
      and FormatError) only handles the core ganeti errors, and not the client
      protocol errors (which live in a separate hierarchy).
      
      This patch adds handling of luxi errors too, and also adds another luxi
      error for the case when the master is not running. This gives us a nice:
      
        gnta1:~# gnt-node list
        Cannot communicate with the master daemon.
        Is it running and listening on '/var/run/ganeti-master.sock'?
      
      error message instead of a traceback.
      
      Reviewed-by: amishchenko
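
      The error-formatting side is essentially a type check plus a friendly
      message; a sketch with a stand-in exception class (the real luxi error
      hierarchy differs):

        class NoMasterError(Exception):
            """Stand-in for a 'master daemon not reachable' luxi error."""

        def format_error(err):
            if isinstance(err, NoMasterError):
                return ("Cannot communicate with the master daemon.\n"
                        "Is it running and listening on '%s'?" % err.args[0])
            return str(err)   # fall back to generic handling

        print(format_error(NoMasterError("/var/run/ganeti-master.sock")))
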
  30. Apr 10, 2008
    • Change client protocol to raise exception on failures · b77acb3e
      Iustin Pop authored
      Currently the luxi.client.SubmitJob and Query methods return the unserialized
      result without processing it at all. This patch changes this by adding a
      'RequestException' error that is raised if the query itself or the
      submission of the job failed, and (if not) returning only the 'result'
      field from the message.
      
      The patch also processes the result of a query when we queried for
      jobs, as the 'op_list' field in the result contains serialized
      opcodes and we need them de-serialized.
      
      Reviewed-by: ultrotter
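
      A minimal model of the new response handling (the envelope field names
      are assumptions): raise if the call failed, otherwise hand back only
      the payload.

        import json

        class RequestException(Exception):
            """Raised when the master reports a failed query or submission."""

        def unwrap(raw_response):
            msg = json.loads(raw_response)
            if not msg.get("success"):
                raise RequestException(msg.get("result"))
            return msg["result"]   # callers only ever see the payload

        tags = unwrap('{"success": true, "result": ["prod", "web"]}')
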
  31. Apr 07, 2008
    • Move some checks from cli.py to luxi.py · a14a17fc
      Iustin Pop authored
      The idea of cli.py and luxi.py is that all protocol checks should be in
      luxi, and cli.py should just offer some helpful shortcuts for the
      command line scripts.
      
      This patch removes the result checks from cli and adds some other checks
      to luxi. It no longer checks success/failure, since it's not yet
      clear how that should be handled - probably via exceptions.
      
      Reviewed-by: ultrotter
  32. Apr 01, 2008
    • Add submit function to lib/cli.py · ceab32dd
      Iustin Pop authored
      This patch adds functions to lib/cli.py that submit jobs or queries
      over the unix socket interface. They will be used by the scripts
      instead of the SubmitOpCode function.
      
      Reviewed-by: ultrotter
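
      Such a helper is little more than "open the socket client, wrap the
      opcodes into a job, return the job id"; a sketch with a stand-in
      client (not the real lib/luxi.py):

        def SubmitJobHelper(opcodes, client_factory):
            """Illustrative cli-level helper: send opcodes as one job."""
            cl = client_factory()          # e.g. connects to the master socket
            return cl.SubmitJob(opcodes)   # caller then polls the job id

        class _FakeClient:
            def SubmitJob(self, ops):
                return 7                   # pretend job id

        job_id = SubmitJobHelper([{"OP_ID": "OP_NODE_QUERY"}], _FakeClient)
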