1. 18 Aug, 2010 1 commit
  2. 30 Jul, 2010 1 commit
    • Fix a few job archival issues · aa9f8167
      Iustin Pop authored
      This patch fixes two issues with job archival. First,
      LoadJobFromDisk can return None when the job does not exist, and we
      shouldn't add None to the job list; we can't anyway, as this raises an
      exception:
        node1# gnt-job archive foo
        Unhandled protocol error while talking to the master daemon:
        Caught exception: cannot create weak reference to 'NoneType' object
      After fixing this, job archival of missing jobs will just continue
      silently, so we modify gnt-job archive to log jobs which were not
      archived and to return exit code 1 for any missing jobs.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
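      The behaviour described in this commit can be sketched roughly as
      follows. This is an illustration only, not the Ganeti code: the
      dict-based store and the function names load_job_from_disk,
      archive_jobs and main are hypothetical. The sketch is written for
      Python 3.

```python
import sys


def load_job_from_disk(store, job_id):
    """Hypothetical loader: return the job, or None if it does not exist."""
    return store.get(job_id)


def archive_jobs(store, job_ids):
    """Archive the jobs that exist; return the IDs that were not found."""
    missing = []
    for job_id in job_ids:
        job = load_job_from_disk(store, job_id)
        if job is None:
            # Never add None to the job list; remember the ID instead.
            missing.append(job_id)
            continue
        job["archived"] = True
    return missing


def main(store, job_ids):
    """Return a shell-style exit code: 1 if any job was missing, else 0."""
    missing = archive_jobs(store, job_ids)
    for job_id in missing:
        print("Job %s was not archived" % job_id, file=sys.stderr)
    return 1 if missing else 0
```

      With this shape, archiving a non-existent job such as "foo" is
      reported and turns the exit code into 1, while the existing jobs
      are still archived.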
  3. 29 Jul, 2010 2 commits
    • Change handling of non-Ganeti errors in jqueue · 599ee321
      Iustin Pop authored
      Currently, if a job execution raises a Ganeti-specific error (i.e.
      subclass of GenericError), then we encode it as (error class, [error
      args]). This matches the RAPI documentation.
      However, if we get a non-Ganeti error, then we encode it as simply
      str(err), a single string. This means that the opresult field does not
      conform to the RAPI docs, and thus it's hard to reliably parse the
      job results.
      This patch changes the encoding of a failed job (via failure) to always
      be an OpExecError, so that we always encode it properly. For the command
      line interface, the behaviour is the same, as any non-Ganeti errors get
      re-encoded as OpExecError anyway. For the RAPI clients, it only means
      that we always present the same type for results. The actual error value
      is the same, since the err.args is either way str(original_error);
      compare the original (doesn't contain the ValueError):
        "opresult": [
          "invalid literal for int(): aa"
        ],
      with the new:
        "opresult": [
          [
            "OpExecError",
            [
              "invalid literal for int(): aa"
            ]
          ]
        ],
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
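      A minimal sketch of the encoding rule described above, assuming
      stand-in classes named after GenericError and OpExecError. The
      helper encode_exception is hypothetical and is not the jqueue
      function. The sketch is written for Python 3.

```python
class GenericError(Exception):
    """Stand-in for the base class of Ganeti-specific errors."""


class OpExecError(GenericError):
    """Stand-in for the error used to report a failed opcode."""


def encode_exception(err):
    """Encode any exception as (error class name, [error args])."""
    if not isinstance(err, GenericError):
        # Wrap foreign errors so the result always has the same shape;
        # the only argument is str(original_error).
        err = OpExecError(str(err))
    return (err.__class__.__name__, list(err.args))
```

      A client can then always unpack the result as a (class, args)
      pair, whether the failure came from Ganeti code or, for example,
      from a ValueError raised by int("aa").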
    • workerpool: Change signature of AddTask function to not use *args · b2e8a4d9
      Michael Hanselmann authored
      By changing it to a normal parameter, which must be a sequence, we can
      start using keyword parameters.
      Before this patch all arguments to “AddTask(self, *args)” were passed as
      arguments to the worker's “RunTask” method. Priorities, which should be
      optional and will be implemented in a future patch, must be passed as a keyword
      parameter. This means “*args” can no longer be used as one can't combine *args
      and keyword parameters in a clean way:
      >>> def f(name=None, *args):
      ...   print "%r, %r" % (args, name)
      >>> f("p1", "p2", "p3", name="thename")
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      TypeError: f() got multiple values for keyword argument 'name'
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
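      The new calling convention might look roughly like the toy pool
      below. It is a sketch under assumptions: the class, the method
      names add_task and run_all, and the keyword name "priority" are
      all hypothetical, since the commit only says that priorities will
      arrive in a later patch. The sketch is written for Python 3.

```python
class WorkerPool:
    """Toy pool that only records tasks; not the real workerpool module."""

    def __init__(self):
        self.tasks = []

    def add_task(self, args, priority=None):
        """Queue one task; 'args' must be a sequence of RunTask arguments."""
        if not isinstance(args, (list, tuple)):
            raise TypeError("args must be a sequence")
        # 'priority' is an assumed keyword; it is stored but not used here.
        self.tasks.append((priority, tuple(args)))

    def run_all(self, run_task):
        """Call run_task(*args) for each queued task, in insertion order."""
        results = [run_task(*args) for (_, args) in self.tasks]
        self.tasks = []
        return results
```

      Because the task arguments travel as a single sequence, optional
      keyword parameters can be added to add_task without clashing with
      them.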
  4. 16 Jul, 2010 1 commit
    • Implement lock names for debugging purposes · 7f93570a
      Iustin Pop authored
      This patch adds lock names to SharedLocks and LockSets, that can be used
      later for displaying the actual locks being held/used in places where we
      only have the lock, and not the entire context of the locking operation.
      Since I realized that the production code doesn't call LockSet with the
      proper members= syntax, but directly as positional parameters, I've
      converted this (and the arguments to GlobalLockManager) into positional
      parameters.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
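      As an illustration of why named locks help with debugging, here is
      a small sketch. The classes NamedLock and NamedLockSet and the
      "set/member" naming scheme are assumptions made for this example,
      not the SharedLock and LockSet implementations. The sketch is
      written for Python 3.

```python
import threading


class NamedLock:
    """A plain lock that carries a name for debugging output."""

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()

    def acquire(self):
        return self._lock.acquire()

    def release(self):
        self._lock.release()

    def __repr__(self):
        return "<NamedLock name=%r>" % self.name


class NamedLockSet:
    """A set of named locks; members and name are positional parameters."""

    def __init__(self, members, name):
        self.name = name
        # Assumed naming scheme: "<set name>/<member name>".
        self._locks = dict((m, NamedLock("%s/%s" % (name, m)))
                           for m in members)

    def get(self, member):
        return self._locks[member]

    def lock_names(self):
        return sorted(lock.name for lock in self._locks.values())
```

      Code that holds only a lock object can then still report which
      lock it is, for example by printing repr(lock).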
  5. 15 Jul, 2010 1 commit
    • jqueue: Factorize code waiting for job changes · 989a8bee
      Michael Hanselmann authored
      By splitting the _WaitForJobChangesHelper class into multiple smaller
      classes, we gain in several places:
      - Simpler code, less interaction between functions and variables
      - Easy to unittest (close to 100% coverage)
      - Waiting for job changes has no direct knowledge of queue anymore (it
        doesn't reference queue functions anymore, especially not private ones)
      - Activate inotify only if there was no change at the beginning (and
        checking again right away to avoid race conditions)
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
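      The last point in the list (check first, activate the watcher only
      if nothing changed, then check again right away) can be sketched
      as below. The function wait_for_change and its callback parameters
      are hypothetical, and the file watcher is abstracted behind
      callbacks rather than using inotify directly. The sketch is
      written for Python 3.

```python
def wait_for_change(check_fn, start_watch_fn, wait_fn, max_rounds):
    """Return the first non-None result of check_fn, or None on timeout.

    check_fn returns the change, or None if nothing changed yet.
    start_watch_fn activates the watcher (for example inotify).
    wait_fn blocks until the watcher reports an event or times out.
    """
    result = check_fn()
    if result is not None:
        # Something changed already: the watcher is never activated.
        return result
    start_watch_fn()
    for _ in range(max_rounds):
        # Check again right away: a change may have happened between the
        # first check and the activation of the watcher.
        result = check_fn()
        if result is not None:
            return result
        wait_fn()
    return None
```

      Because the waiting logic only sees callbacks, it needs no
      knowledge of the queue and can be unit tested with simple stubs.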
  6. 12 Jul, 2010 1 commit
  7. 09 Jul, 2010 1 commit
  8. 06 Jul, 2010 1 commit
    • Fix opcode transition from WAITLOCK to RUNNING · 271daef8
      Iustin Pop authored
      With the recent changes in the job queue, an old bug surfaced: we never
      serialized the status change when in NotifyStart, thus a crash of the
      master would have left the job queue oblivious to the fact that the job
      was actually running.
      In the previous implementation, queries against the job status were
      using the in-memory object, so they 'saw' and reported correctly the
      running status. But the new implementation just looks at the on-disk
      version, and thus didn't see this transition.
      The patch also moves NotifyStart to a decorator-based version (like the
      other functions), which generates a lot of churn in the diff, sorry.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
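      The bug and its fix can be illustrated with a small sketch in which
      a dict stands in for the on-disk queue. The Job class, the
      functions notify_start and status_on_disk, and the status strings
      are assumptions made for this example; the decorator mentioned in
      the commit is not modelled. The sketch is written for Python 3.

```python
import json

# Assumed status values, for illustration only.
OP_STATUS_WAITLOCK = "waiting"
OP_STATUS_RUNNING = "running"


class Job:
    """Minimal in-memory job with a serializable status."""

    def __init__(self, job_id):
        self.id = job_id
        self.status = OP_STATUS_WAITLOCK

    def serialize(self):
        return json.dumps({"id": self.id, "status": self.status})


def notify_start(job, disk):
    """Mark the job as running and write the change to 'disk' at once."""
    job.status = OP_STATUS_RUNNING
    # Without this write, readers of the on-disk copy (and a master
    # restarting after a crash) would still see the job as waiting.
    disk[job.id] = job.serialize()


def status_on_disk(disk, job_id):
    """Query the status the way the new implementation does: from disk."""
    return json.loads(disk[job_id])["status"]
```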
  9. 28 Jun, 2010 6 commits
  10. 23 Jun, 2010 6 commits
  11. 17 Jun, 2010 4 commits
  12. 15 Jun, 2010 2 commits
  13. 11 Jun, 2010 6 commits
  14. 01 Jun, 2010 1 commit
    • Add a new opcode timestamp field · b9b5abcb
      Iustin Pop authored
      Since the current start_timestamp opcode attribute refers to the initial
      start time, before locks are acquired, it's not useful to determine the
      actual execution order of two opcodes/jobs competing for the same lock.
      This patch adds a new field, exec_timestamp, that is updated when the
      opcode moves from OP_STATUS_WAITLOCK to OP_STATUS_RUNNING, thus allowing
      a clear view of the execution history. The new field is visible in the
      job output via the 'opexec' field.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
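      To show why a separate exec_timestamp is useful, the sketch below
      records both timestamps and orders two opcodes by the time they
      actually started running. The class OpCodeRecord, its methods, the
      helper execution_order and the status strings are hypothetical,
      and plain floats are used for timestamps. The sketch is written
      for Python 3.

```python
import time

# Assumed status values, for illustration only.
OP_STATUS_QUEUED = "queued"
OP_STATUS_WAITLOCK = "waiting"
OP_STATUS_RUNNING = "running"


class OpCodeRecord:
    """Tracks when an opcode started and when it really began executing."""

    def __init__(self):
        self.status = OP_STATUS_QUEUED
        self.start_timestamp = None
        self.exec_timestamp = None

    def mark_waitlock(self, now=None):
        """Initial start: the opcode begins waiting for its locks."""
        self.status = OP_STATUS_WAITLOCK
        self.start_timestamp = time.time() if now is None else now

    def mark_running(self, now=None):
        """Locks acquired: record the actual execution start."""
        self.status = OP_STATUS_RUNNING
        self.exec_timestamp = time.time() if now is None else now


def execution_order(ops):
    """Sort opcode names by when they actually started executing."""
    return sorted(ops, key=lambda name: ops[name].exec_timestamp)
```

      If opcode "a" starts waiting first but "b" wins the lock, then
      start_timestamp alone would suggest that "a" ran first, while
      exec_timestamp gives the real order.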
  15. 08 Mar, 2010 2 commits
  16. 13 Jan, 2010 4 commits