  1. Jul 19, 2009
    • Merge commit 'origin/next' into branch-2.1 · 25f9901f
      Iustin Pop authored
      Conflicts:
      	lib/cli.py: trivial extra empty line
    • Fix gnt-instance reinstall · b8f31860
      Iustin Pop authored
      
      Commit 55efe6da "Convert instance
      reinstall to multi instance model" actually broke instance reinstall for
      single-instance cases. This one-liner fixes it.
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
      (cherry picked from commit b6e243ab)
    • Fix a couple of epydoc warnings · 6af6270a
      Iustin Pop authored
      
      It seems epydoc needs fully-qualified references, and doesn't deal with
      relative ones (not even in the current module) if there are any
      ambiguities.
      
      There are other epydoc warnings, in the rapi docstrings, but those are
      left as-is as they're removed in 2.1.
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
    • job queue: fix loss of finalized opcode result · 34327f51
      Iustin Pop authored
      
      Currently, unclean master daemon shutdown overwrites the status and
      result of all of a job's opcodes with error/None. This is incorrect,
      since any already finished opcode(s) should have their status and
      result preserved, and only not-yet-processed opcodes should be marked
      as ‘error’. Cancelling jobs between opcodes does the same (this is not
      currently allowed by the code, so it matters less than unclean
      shutdown).
      
      This patch adds a new _QueuedJob method that only overwrites the
      status and result of non-finalized opcodes, which is then used in job
      queue init and in the cancel job functions. The patch also adds some
      comments and a new set of constants in constants.py highlighting the
      finalized vs. non-finalized opcode statuses.
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
    • Switch gnt-debug submit-job to JobExecutor · b59252fe
      Iustin Pop authored
      
      Currently gnt-debug submits jobs individually, but in 2.1 JobExecutor
      uses the optimized SubmitManyJobs luxi call and as such should be used
      whenever multiple jobs need to be submitted.
      
      This patch converts gnt-debug submit-job to use it and also removes an
      extra empty line in the JobExecutor class.
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
    • Convert instance reinstall to multi instance model · 3d2ca95d
      Iustin Pop authored
      
      This patch converts ‘gnt-instance reinstall’ from the single-instance
      to the multi-instance model; since this is dangerous, passing
      “--force --force-multiple” is required to skip the confirmation.
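      The confirmation rule can be sketched as a small predicate. The
      function name and the single-instance behavior (plain --force skips
      the prompt) are assumptions for illustration, not taken from the
      gnt-instance code itself.

```python
def needs_confirmation(instance_names, force, force_multiple):
    """Return True if the user must confirm a reinstall (hypothetical rule)."""
    if len(instance_names) > 1:
        # Several instances: both --force and --force-multiple are needed
        # to skip the prompt.
        return not (force and force_multiple)
    # Single instance: assume --force alone skips the prompt.
    return not force


print(needs_confirmation(["inst1", "inst2"], force=True, force_multiple=False))
# → True
print(needs_confirmation(["inst1", "inst2"], force=True, force_multiple=True))
# → False
```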
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
      (cherry picked from commit 55efe6da)
    • gnt-instance batch-create: use the job executor · dd7dcca7
      Iustin Pop authored
      
      This small patch changes the batch create functionality to use the job
      executor instead of single-job submits.
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
      (cherry picked from commit d4dd4b74)
    • Modify cli.JobExecutor to use SubmitManyJobs · f2921752
      Iustin Pop authored
      
      This patch changes the generic "multiple job executor" to use the
      multi-job submit model, which automatically switches all its users to
      the new model.

      For example, this makes startup/shutdown of a full cluster much more
      logical: all the submitted job IDs are shown quickly, and then waiting
      for them proceeds normally.
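      A simplified sketch of a batching executor, assuming a hypothetical
      client whose submit_many_jobs() method stands in for the SubmitManyJobs
      luxi call; the class and method names are illustrative, not the actual
      cli.JobExecutor API. Jobs are queued locally and then submitted in a
      single round-trip, so all job IDs come back at once.

```python
class FakeLuxiClient:
    """Hypothetical client; submit_many_jobs() stands in for SubmitManyJobs."""

    def __init__(self):
        self.calls = 0      # number of round-trips to the master daemon
        self._next_id = 1

    def submit_many_jobs(self, jobs):
        self.calls += 1
        ids = list(range(self._next_id, self._next_id + len(jobs)))
        self._next_id += len(jobs)
        return ids


class BatchJobExecutor:
    """Queue jobs locally, then submit them all in one call."""

    def __init__(self, client):
        self.client = client
        self.queue = []     # (name, [opcodes]) pairs awaiting submission

    def queue_job(self, name, *ops):
        self.queue.append((name, list(ops)))

    def submit_pending(self):
        ids = self.client.submit_many_jobs([ops for _, ops in self.queue])
        submitted = list(zip(ids, [name for name, _ in self.queue]))
        self.queue = []
        return submitted


client = FakeLuxiClient()
executor = BatchJobExecutor(client)
for name in ["inst1", "inst2", "inst3"]:
    executor.queue_job("startup " + name, "startup-opcode")
result = executor.submit_pending()
print(result)        # → [(1, 'startup inst1'), (2, 'startup inst2'), (3, 'startup inst3')]
print(client.calls)  # → 1
```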
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
      (cherry picked from commit 23b4b983)
    • Add a luxi call for multi-job submit · 56d8ff91
      Iustin Pop authored
      
      As a workaround for the job submit timeouts that we have, this patch
      adds a new luxi call for multi-job submit; the advantage is that all
      the jobs are added to the queue first, and only afterwards can the
      workers start processing them.
      
      This is definitely faster than per-job submit, where the submission of
      new jobs competes with the workers processing jobs.
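      A toy model of the server side, assuming a condition-variable job
      queue (this is not Ganeti's jqueue.py): the whole batch is appended
      under a single lock acquisition and workers are woken only afterwards,
      so submission does not interleave with job processing.

```python
import threading
from collections import deque


class ToyJobQueue:
    """Toy job queue: batch enqueue under one lock, then wake workers."""

    def __init__(self):
        self._cond = threading.Condition()
        self._pending = deque()
        self._last_id = 0

    def submit_many(self, jobs):
        with self._cond:
            ids = []
            for job in jobs:
                self._last_id += 1
                self._pending.append((self._last_id, job))
                ids.append(self._last_id)
            # Wake workers once, after the whole batch is in the queue.
            self._cond.notify_all()
        return ids

    def get_next(self, timeout):
        """Return the next (id, job) pair, or None if none arrives in time."""
        with self._cond:
            while not self._pending:
                if not self._cond.wait(timeout):
                    return None
            return self._pending.popleft()


queue = ToyJobQueue()
processed = []


def worker():
    # Process jobs until the queue stays empty for half a second.
    while True:
        item = queue.get_next(timeout=0.5)
        if item is None:
            break
        processed.append(item)


thread = threading.Thread(target=worker)
thread.start()
job_ids = queue.submit_many(["op-a", "op-b", "op-c"])
thread.join()
print(job_ids)    # → [1, 2, 3]
print(processed)  # → [(1, 'op-a'), (2, 'op-b'), (3, 'op-c')]
```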
      
      On a pure no-op OpDelay opcode (not on master, not on nodes), we have:
        - 100 jobs:
          - individual: submit time ~21s, processing time ~21s
          - multiple:   submit time 7-9s, processing time ~22s
        - 250 jobs:
          - individual: submit time ~56s, processing time ~57s
                        run 2:      ~54s                  ~55s
          - multiple:   submit time ~20s, processing time ~51s
                        run 2:      ~17s                  ~52s
      
      which shows that we indeed gain on the client side, and maybe even on
      the total processing time for a high number of jobs. For just 10 or so
      jobs, I expect the difference to be just noise.
      
      This will probably require increasing the timeout a little when
      submitting many jobs; 250 jobs at ~20 seconds is close to the current
      rw (read/write) timeout of 60s.
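      As a rough worked estimate, assuming submit time grows linearly with
      the number of jobs: ~20s for 250 jobs is about 0.08s per job. The
      helper below is hypothetical; it keeps the 60s timeout as a floor and
      scales it up, with a safety factor, for larger batches.

```python
RW_TIMEOUT = 60.0            # current luxi read/write timeout, in seconds
PER_JOB_COST = 20.0 / 250    # ~0.08 s/job, from the 250-job measurement


def submit_timeout(num_jobs, safety=2.0):
    """Hypothetical helper: timeout for submitting num_jobs in one call.

    Assumes submit time grows linearly with the number of jobs and never
    goes below the current fixed timeout.
    """
    estimated = num_jobs * PER_JOB_COST
    return max(RW_TIMEOUT, estimated * safety)


for n in (100, 250, 500, 1000):
    print(n, round(submit_timeout(n), 1))
# → 100 60.0
#   250 60.0
#   500 80.0
#   1000 160.0
```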
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
      (cherry picked from commit 2971c913)
    • job queue: fix interrupted job processing · f6424741
      Iustin Pop authored
      
      If a job with more than one opcode is being processed and the master
      daemon crashes between two opcodes, the first N opcodes are marked
      successful and the rest are marked as queued. This means that the
      overall job status is queued, and thus on master daemon restart the
      job will be resent for completion.
      
      However, the RunTask() function in jqueue.py doesn't handle
      partially-completed jobs. This patch makes it simply skip the opcodes
      that have already completed successfully.
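      A minimal sketch of the skip logic, assuming opcodes are plain dicts
      with "status" and "result" keys and a caller-supplied execute function
      (both hypothetical; this is not RunTask's actual signature):

```python
OP_STATUS_QUEUED = "queued"
OP_STATUS_SUCCESS = "success"


def run_job(ops, execute):
    """Run a job's opcodes in order, skipping those that already succeeded.

    ops: list of dicts with "status" and "result" keys (hypothetical layout).
    execute: callable(index, op) that runs one opcode and returns its result.
    """
    for idx, op in enumerate(ops):
        if op["status"] == OP_STATUS_SUCCESS:
            # Finished before the master daemon crashed; do not rerun it.
            continue
        op["result"] = execute(idx, op)
        op["status"] = OP_STATUS_SUCCESS
    return [op["result"] for op in ops]


# A job resent after a crash: opcode 0 completed, opcodes 1-2 still queued.
ops = [{"status": OP_STATUS_SUCCESS, "result": "r0"},
       {"status": OP_STATUS_QUEUED, "result": None},
       {"status": OP_STATUS_QUEUED, "result": None}]
executed = []


def execute(idx, op):
    executed.append(idx)
    return "r%d" % idx


results = run_job(ops, execute)
print(results)   # → ['r0', 'r1', 'r2']
print(executed)  # → [1, 2]
```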
      
      An alternative would be to mark partially-completed jobs not as
      QUEUED but as RUNNING, which would result in the job being aborted at
      restart time.
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
    • Fix an error path in job queue worker's RunTask · ed21712b
      Iustin Pop authored
      
      If the job fails, we try to set the job's run_op_idx to -1. However,
      this is the wrong attribute name, a mistake that went undetected until
      __slots__ was added. The correct attribute is run_op_index.
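      Why __slots__ exposed the bug can be shown in a few lines; the classes
      below are simplified, hypothetical versions of the job object. Without
      __slots__, assigning to a misspelled attribute silently creates a new
      one; with __slots__, the same assignment raises AttributeError.

```python
class LooseJob:
    """Job object without __slots__ (hypothetical, simplified)."""

    def __init__(self):
        self.run_op_index = 0


class SlottedJob:
    """Same object with __slots__: only the listed attributes may be set."""
    __slots__ = ["run_op_index"]

    def __init__(self):
        self.run_op_index = 0


loose = LooseJob()
loose.run_op_idx = -1       # typo: silently creates a new attribute
print(loose.run_op_index)   # → 0 (the real attribute was never reset)

slotted = SlottedJob()
try:
    slotted.run_op_idx = -1  # same typo, now rejected
    caught_typo = False
except AttributeError:
    caught_typo = True
print(caught_typo)          # → True
```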
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
  2. Jul 17, 2009
  3. Jul 16, 2009
  4. Jul 14, 2009
  5. Jul 13, 2009