1. 12 Oct, 2011 2 commits
    • Standardise LUXI call argument types · a629ecb9
      Iustin Pop authored
      
      
      Currently, we have 4 types of arguments in LUXI calls:
      
      - most common, a list of values
      - a single argument that is sent as a list of one element
      - a single argument that is sent by itself
      - a dictionary (only Query and QueryFields)
      
      This inconsistency makes it harder not only to auto-generate the
      HTools LUXI interface, but also to check the arguments in general
      and, if we ever want to, to auto-generate the Python LUXI client.
      
      Compare this with the node daemon, which consistently uses a list
      for its arguments and, even with far more changes over time, has had
      no issues extending the interface.
      
      In case we want to extend a call, there are two options:
      
      - preferred: add a new call, keep the old one unchanged
      - possible: add further parameters to the current argument list
      
      The patch against HTools will follow; it is sent separately since
      the Python changes are clear enough on their own.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
      a629ecb9
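
      Below is a minimal illustrative sketch (hypothetical helper, not the
      actual Ganeti code) of what the standardisation means on the wire:
      every call carries its arguments as a list, even when there is only
      one argument or none at all.

        import json

        def EncodeCall(method, args):
          # LUXI-style request; 'args' must always be a list
          assert isinstance(args, (list, tuple)), "arguments must be a list"
          return json.dumps({"method": method, "args": list(args)})

        EncodeCall("QueryClusterInfo", [])       # no arguments
        EncodeCall("CancelJob", [1234])          # single argument, still a list
        EncodeCall("Query", ["instance", ["name"], None])  # several arguments
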
    • Rename filter and filter_ to qfilter · 2e5c33db
      Iustin Pop authored
      
      
      We currently use 'filter' as the OpCode, QueryRequest and RAPI field
      name for representing a query filter. However, since 'filter' is a
      Python built-in function, we have to use 'filter_' throughout the
      code to avoid shadowing the built-in.
      
      This patch does a global sed over the code. Because the RAPI
      interface already exposed this field, we add compatibility code for
      now which handles both forms.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
      2e5c33db
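
      A hedged sketch (hypothetical helper, not the actual Ganeti code) of
      the kind of RAPI compatibility shim described in the commit above:
      accept the new 'qfilter' field while still honouring the old
      'filter' field during the transition.

        def _GetQueryFilter(body):
          # Prefer the new field name if the client sent it
          if "qfilter" in body:
            return body["qfilter"]
          # Backwards compatibility with clients still using 'filter'
          return body.get("filter")
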
  2. 30 Aug, 2011 1 commit
  3. 02 May, 2011 1 commit
  4. 15 Mar, 2011 1 commit
  5. 07 Jan, 2011 1 commit
  6. 06 Jan, 2011 1 commit
  7. 20 Dec, 2010 1 commit
  8. 13 Dec, 2010 1 commit
  9. 01 Dec, 2010 1 commit
  10. 01 Nov, 2010 1 commit
  11. 28 Oct, 2010 2 commits
  12. 24 Aug, 2010 1 commit
    • Add simple lock monitor · 19b9ba9a
      Michael Hanselmann authored
      
      
      This patch adds an initial implementation of a lock monitor,
      accessible to the user through “gnt-debug locks”. It currently shows
      all resource locks: BGL, nodes and instances. Config and job queue
      locks could be shown too, but wouldn't be of much help. The current
      owner(s) and mode are also shown.
      
      Showing pending acquires will require further changes on the SharedLock
      internals and is not yet implemented.
      
      Example output:
      $ gnt-debug locks -o name,mode,owner
      Name            Mode      Owner
      BGL/BGL         shared    JobQueue19/Job147
      instances/inst1 exclusive JobQueue19/Job147
      instances/inst2 -         -
      instances/inst3 -         -
      instances/inst4 -         -
      nodes/node1     exclusive JobQueue19/Job147
      nodes/node2     exclusive JobQueue19/Job147
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
      19b9ba9a
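
      A simplified, hypothetical sketch of the per-lock information such a
      monitor exposes, matching the columns in the example output above
      (the real data comes from the locking internals):

        def GetLockInfo(locks):
          result = []
          for lock in locks:
            owners = lock.GetOwners()      # e.g. ["JobQueue19/Job147"] or []
            mode = lock.GetMode()          # "exclusive", "shared" or None
            result.append((lock.name, mode or "-", ",".join(owners) or "-"))
          return result
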
  13. 28 Jul, 2010 1 commit
  14. 18 May, 2010 1 commit
    • Abstract the LUXI eom into a constant · 25942a6c
      Guido Trotter authored
      
      
      Currently the EOM terminator is hardcoded on the server side, and is
      customizable in the Transport object (with the default being the
      same value as the server's), but not in the luxi client.
      
      With this patch we move the value to constants, and remove the
      "fake" customizability, which would just break client/server
      communication. If we ever need a luxi transport with a different
      terminator, it is easy enough to add it back.
      Signed-off-by: Guido Trotter <ultrotter@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
      25942a6c
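
      A minimal sketch, with assumed names and an assumed marker value, of
      what "move the terminator to constants" means: both ends import the
      same constant instead of each defining (or overriding) its own.

        # constants.py (assumed location and value)
        LUXI_EOM = "\x03"

        # transport side, simplified
        class Transport:
          def __init__(self, sock):
            self.sock = sock

          def Send(self, msg):
            # every message ends with the shared terminator
            self.sock.sendall((msg + LUXI_EOM).encode("utf-8"))
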
  15. 11 May, 2010 1 commit
  16. 22 Feb, 2010 2 commits
  17. 22 Jan, 2010 2 commits
  18. 05 Jan, 2010 1 commit
    • Introduce a Luxi call for GetTags · 7699c3af
      Iustin Pop authored
      
      
      This changes tag retrieval in the cli scripts from submitting jobs
      to issuing queries, which (since the tags query is a cheap one)
      should be much faster.
      
      The tags queries are already done without locks (in the generic query
      paths for instances/nodes/cluster), so this shouldn't break tags query
      via gnt-* list-tags.
      
      On a small cluster, the runtime of gnt-cluster/gnt-instance
      list-tags more than halves; on a big cluster (with many MCs) I
      expect it to be more than 5 times faster. The speed of getting the
      tags is not the main gain; it is eliminating a job when a simple
      query is enough.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: René Nussbaumer <rn@google.com>
      7699c3af
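
      A hedged sketch (hypothetical helper and method names) contrasting
      the two paths: the old code built an opcode and waited on a job,
      while the new code issues a single luxi query that the master
      answers directly.

        def GetTagsViaJob(client, kind, name):
          # old path: one job per tags lookup, competing with other jobs
          job_id = client.SubmitJob([BuildGetTagsOpCode(kind, name)])
          return WaitForJobResult(client, job_id)

        def GetTagsViaQuery(client, kind, name):
          # new path: cheap query call, no job queue involvement
          return client.QueryTags(kind, name)
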
  19. 04 Jan, 2010 1 commit
  20. 13 Oct, 2009 1 commit
  21. 25 Sep, 2009 1 commit
  22. 27 Aug, 2009 1 commit
  23. 26 Aug, 2009 1 commit
  24. 19 Jul, 2009 1 commit
    • Add a luxi call for multi-job submit · 56d8ff91
      Iustin Pop authored
      
      
      As a workaround for the job submit timeouts that we have, this patch
      adds a new luxi call for multi-job submit; the advantage is that all
      the jobs are added to the queue first, and only afterwards can the
      workers start processing them.
      
      This is definitely faster than per-job submit, where the submission of
      new jobs competes with the workers processing jobs.
      
      On a pure no-op OpDelay opcode (not on master, not on nodes), we have:
        - 100 jobs:
          - individual: submit time ~21s, processing time ~21s
          - multiple:   submit time 7-9s, processing time ~22s
        - 250 jobs:
          - individual: submit time ~56s, processing time ~57s
                        run 2:      ~54s                  ~55s
          - multiple:   submit time ~20s, processing time ~51s
                        run 2:      ~17s                  ~52s
      
      which shows that we indeed gain on the client side, and maybe even on
      the total processing time for a high number of jobs. For just 10 or so I
      expect the difference to be just noise.
      
      This will probably require increasing the timeout a little when
      submitting many jobs: 250 jobs at ~20 seconds is close to the
      current read/write timeout of 60s.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
      (cherry picked from commit 2971c913)
      56d8ff91
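
      A hedged usage sketch (method names and argument shapes are
      illustrative) of the difference between the per-job loop and the new
      multi-job call, where one round trip enqueues everything before any
      worker starts:

        def SubmitOneByOne(client, all_ops):
          # old approach: N calls, each competing with running workers
          return [client.SubmitJob(ops) for ops in all_ops]

        def SubmitBatch(client, all_ops):
          # new approach: a single multi-job call with a list of jobs,
          # each job itself being a list of opcodes
          return client.SubmitManyJobs(all_ops)
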
  25. 07 Jul, 2009 3 commits
  26. 21 May, 2009 1 commit
    • Add a luxi call for multi-job submit · 2971c913
      Iustin Pop authored
      
      
      As a workaround for the job submit timeouts that we have, this patch
      adds a new luxi call for multi-job submit; the advantage is that all
      the jobs are added to the queue first, and only afterwards can the
      workers start processing them.
      
      This is definitely faster than per-job submit, where the submission of
      new jobs competes with the workers processing jobs.
      
      On a pure no-op OpDelay opcode (not on master, not on nodes), we have:
        - 100 jobs:
          - individual: submit time ~21s, processing time ~21s
          - multiple:   submit time 7-9s, processing time ~22s
        - 250 jobs:
          - individual: submit time ~56s, processing time ~57s
                        run 2:      ~54s                  ~55s
          - multiple:   submit time ~20s, processing time ~51s
                        run 2:      ~17s                  ~52s
      
      which shows that we indeed gain on the client side, and maybe even on
      the total processing time for a high number of jobs. For just 10 or so I
      expect the difference to be just noise.
      
      This will probably require increasing the timeout a little when
      submitting many jobs: 250 jobs at ~20 seconds is close to the
      current read/write timeout of 60s.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
      2971c913
  27. 04 Feb, 2009 2 commits
    • Add one new luxi query: cluster info · 66baeccc
      Iustin Pop authored
      This is the last query that RAPI executes via opcodes, and it is
      purely static (config values only). As such, we can safely convert
      it to a query instead of a job.
      
      Reviewed-by: imsnah
      66baeccc
    • Implement lockless query operations · ec79568d
      Iustin Pop authored
      This patch adds the framework for, and enables, lockless
      OpQueryInstances. This means that instances may be shown in ERROR_up
      or ERROR_down state, even though this is not an error (but just an
      in-progress job).
      
      The framework is implemented as follows:
        - the OpQueryInstances, OpQueryNodes and OpQueryExports opcodes take
          an additional “use_locking” flag which will denote whether to lock
          or not; this patch only implements this for LUQueryInstances
        - the luxi query functions take an additional argument use_locking
          which is passed to the master daemon, and then passed to the above
          opcodes
        - cli.py exports a new SYNC_OPT command line option which
          implements setting this flag to true
        - except for gnt-instance list, which uses this option, and for
          name-only queries (e.g. QueryNodes(fields=["names"])), all other
          callers set this flag to True
        - RAPI also sets the flag to True
      
      The patch was tested with a continuous (0.2s sleep in-between)
      gnt-instance list during a burnin, and no problems were observed.
      
      Reviewed-by: ultrotter
      ec79568d
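
      A hedged sketch (field list and method signature are illustrative)
      of what the use_locking flag means for a caller: the same query,
      with or without acquiring the instance locks.

        def ListInstances(client, sync=False):
          # use_locking=False may return slightly stale data but never
          # blocks behind running jobs; passing True corresponds to the
          # new SYNC_OPT behaviour
          return client.QueryInstances([], ["name", "status"],
                                       use_locking=sync)
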
  28. 22 Jan, 2009 1 commit
    • luxi: close and reopen the socket on errors · 8d5b316c
      Iustin Pop authored
      This is less of an actual issue for regular gnt-* clients, but it's
      easily reproducible with burnin and possible with RAPI (depending on how
      the program uses luxi.Client(s)).
      
      In the case of burnin, if we interrupt the client (^C) while it
      polls the job, it will abort and raise an error. After that, burnin
      issues a remove instance job, and at this point we send the submit
      job (remove) call, but the first thing we read from the socket will
      be the response to the previous poll job request, since that was
      already queued by the master.
      
      To solve this, whenever we detect an error in Transport.Call(), we
      close that transport and create a new one, to start afresh. The
      alternative would be to introduce a sequence number in the protocol,
      but that would be a design-level change and is not recommended at
      this stage.
      
      Reviewed-by: imsnah
      8d5b316c
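
      A simplified sketch (attribute and helper names are assumed) of the
      recovery strategy described in the commit above: on any error during
      a call, drop the connection so the next call starts on a fresh
      socket with no stale response left in the buffer.

        def CallMethod(client, method, args):
          if client.transport is None:
            client.transport = Transport(client.address)
          try:
            return client.transport.Call(method, args)
          except Exception:
            # the socket may still hold the reply to an earlier,
            # interrupted request; discard it instead of reading a
            # mismatched response later
            client.transport.Close()
            client.transport = None
            raise
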
  29. 20 Jan, 2009 1 commit
  30. 18 Dec, 2008 1 commit
    • Prevent RPC timeout on auto-archiving jobs · f8ad5591
      Michael Hanselmann authored
      With a large job queue, auto-archiving jobs can take a very long
      time, causing timeouts on the luxi RPC layer. With this change,
      auto-archive returns after half of the RPC timeout has passed. The
      user will see how many jobs are left unchecked.
      
      Reviewed-by: ultrotter
      f8ad5591
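
      A hedged sketch of the time-bounded loop described above (constants
      and helper names are assumed): stop once half of the luxi timeout is
      spent and report how many jobs were not examined.

        import time

        RPC_TIMEOUT = 60.0  # assumed client read/write timeout in seconds

        def AutoArchiveJobs(queue, age):
          deadline = time.time() + RPC_TIMEOUT / 2.0
          archived = 0
          pending = list(queue.GetJobIds())
          while pending and time.time() < deadline:
            job_id = pending.pop(0)
            if queue.MaybeArchiveJob(job_id, age):
              archived += 1
          # jobs still in 'pending' were left unchecked
          return archived, len(pending)
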
  31. 16 Oct, 2008 1 commit
    • Add an interface for the drain flag changes/query · 3ccafd0e
      Iustin Pop authored
      This adds the set/reset in the jqueue and luxi modules, a way to
      query it in OpQueryConfigValues, and the command line interface for
      it:
      $ gnt-cluster queue info
      The drain flag is unset
      $ gnt-cluster queue drain
      $ gnt-cluster queue info
      The drain flag is set
      $ gnt-cluster queue undrain
      $ gnt-cluster queue info
      The drain flag is unset
      
      The setting is done via luxi and not via an opcode because opcodes
      can't be executed when the queue is drained; we don't query via luxi
      though, since in the future the flag might become a cluster property
      as opposed to a node one.
      
      Reviewed-by: imsnah
      3ccafd0e
  32. 15 Oct, 2008 1 commit
  33. 06 Oct, 2008 1 commit
    • Implement job auto-archiving · 07cd723a
      Iustin Pop authored
      This patch adds a new luxi call that implements auto-archiving of jobs
      older than a certain age (or -1 for all completed jobs), and the gnt-job
      command that makes use of this (with 'all' for -1).
      
      Reviewed-by: imsnah
      07cd723a
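
      A hedged sketch of the age rule described above (helper names are
      illustrative): -1 means "archive every completed job", otherwise
      only jobs that finished more than 'age' seconds ago are archived.

        import time

        def ShouldArchive(job_end_time, age, now=None):
          now = now if now is not None else time.time()
          if age == -1:
            return True                      # 'all' on the command line
          return (now - job_end_time) > age  # older than the requested age
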