1. 07 Sep, 2010 1 commit
  2. 24 Aug, 2010 1 commit
  3. 19 Aug, 2010 1 commit
    • jqueue: Remove lock status field · 9bdab621
      Michael Hanselmann authored
      
      
      With the job queue changes for Ganeti 2.2, watched and queried jobs are
      loaded directly from disk, rendering the in-memory “lock_status” field
      useless. Writing it to disk would be possible, but has a huge cost at
      runtime (when tested, processing 1'000 opcodes involved 4'000 additional
      writes to job files, even with replication turned off).
      
      Using an additional in-memory dictionary to just manage this field turned
      out to be a complicated task due to the necessary locking.
      
      The plan is to introduce a more generic lock debugging mechanism in the
      near future. Hence the decision is to remove this field now instead of
      spending a lot of time to make it work again.

      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
  4. 18 Aug, 2010 2 commits
  5. 17 Aug, 2010 2 commits
  6. 30 Jul, 2010 1 commit
    • Fix a few job archival issues · aa9f8167
      Iustin Pop authored
      
      
      This patch fixes two issues with job archival. First,
      LoadJobFromDisk can return None when the job does not exist, and we
      shouldn't add None to the job list; we can't anyway, as this raises an
      exception:
      
        node1# gnt-job archive foo
        Unhandled protocol error while talking to the master daemon:
        Caught exception: cannot create weak reference to 'NoneType' object
      
      After fixing this, job archival of missing jobs will just continue
      silently, so we modify gnt-job archive to log jobs which were not
      archived and to return exit code 1 for any missing jobs.

      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
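The failure mode and the fix described in this commit can be sketched in a few lines. This is a minimal illustration, not the Ganeti code: `Job`, `load_job_from_disk`, `collect_archivable` and `archive_exit_code` are hypothetical names, and the job store is a plain dictionary. It only assumes what the message states: a loader that returns None for a missing job, a weak reference taken on each loaded job, and exit code 1 when any job is missing.

```python
import weakref


class Job:
    """Hypothetical stand-in for a queued job object."""

    def __init__(self, job_id):
        self.id = job_id


def load_job_from_disk(job_id, store):
    """Return the job for job_id, or None if no such job exists."""
    return store.get(job_id)


def collect_archivable(job_ids, store):
    """Split job_ids into weak references to loaded jobs and missing ids.

    None is never passed to weakref.ref(), which would raise TypeError.
    """
    found, missing = [], []
    for job_id in job_ids:
        job = load_job_from_disk(job_id, store)
        if job is None:
            missing.append(job_id)
            continue
        found.append(weakref.ref(job))
    return found, missing


def archive_exit_code(missing):
    """Exit code 1 if any requested job was missing, else 0."""
    return 1 if missing else 0
```

Reporting the missing ids separately, instead of skipping them silently, is what lets the command line tool log them and return a non-zero exit code.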
  7. 29 Jul, 2010 2 commits
    • Change handling of non-Ganeti errors in jqueue · 599ee321
      Iustin Pop authored
      
      
      Currently, if a job execution raises a Ganeti-specific error (i.e.
      subclass of GenericError), then we encode it as (error class, [error
      args]). This matches the RAPI documentation.
      
      However, if we get a non-Ganeti error, then we encode it as simply
      str(err), a single string. This means that the opresult field does not
      conform to the RAPI docs, and thus it's hard to parse the job results
      reliably.

      This patch changes the encoding of a failed job (via failure) to always
      be an OpExecError, so that we always encode it properly. For the command
      line interface, the behaviour is the same, as any non-Ganeti errors get
      re-encoded as OpExecError anyway. For the RAPI clients, it only means
      that we always present the same type for results. The actual error value
      is the same, since err.args is str(original_error) either way;
      compare the original (which doesn't contain the ValueError):
      
        "opresult": [
          "invalid literal for int(): aa"
        ],
      
      with:
      
        "opresult": [
          [
            "OpExecError",
            [
              "invalid literal for int(): aa"
            ]
          ]
        ],

      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
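A minimal sketch of the encoding rule from this commit follows. The classes and the `encode_exception` / `encode_failure` helpers are stand-ins written for illustration, not the Ganeti implementation. The sketch assumes, following the first paragraph of the message, that Ganeti-specific errors keep their own class and that only other exceptions are wrapped in OpExecError.

```python
class GenericError(Exception):
    """Stand-in for the base class of Ganeti-specific errors."""


class OpExecError(GenericError):
    """Stand-in for the error used to report failed opcode execution."""


def encode_exception(err):
    """Encode an error as (class name, [args])."""
    return (err.__class__.__name__, list(err.args))


def encode_failure(err):
    """Encode a job failure, wrapping non-Ganeti errors in OpExecError."""
    if not isinstance(err, GenericError):
        # The original error type is lost; only its message is kept.
        err = OpExecError(str(err))
    return encode_exception(err)
```

With this shape a client can always unpack the result as a pair of class name and argument list, regardless of what was raised during execution.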
    • workerpool: Change signature of AddTask function to not use *args · b2e8a4d9
      Michael Hanselmann authored
      
      
      By changing it to a normal parameter, which must be a sequence, we can
      start using keyword parameters.
      
      Before this patch all arguments to “AddTask(self, *args)” were passed as
      arguments to the worker's “RunTask” method. Priorities, which should be
      optional and will be implemented in a future patch, must be passed as a keyword
      parameter. This means “*args” can no longer be used as one can't combine *args
      and keyword parameters in a clean way:
      
      >>> def f(name=None, *args):
      ...   print "%r, %r" % (args, name)
      ...
      >>> f("p1", "p2", "p3", name="thename")
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      TypeError: f() got multiple values for keyword argument 'name'

      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
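The signature change can be illustrated with a small, single-threaded sketch. `WorkerPool`, `add_task` and `run_all` are hypothetical names and this is not the Ganeti worker pool. The `priority` keyword is accepted and stored but not acted on, since the message says priorities arrive in a later patch.

```python
class WorkerPool:
    """Hypothetical, single-threaded stand-in for a worker pool."""

    def __init__(self, run_task):
        self._run_task = run_task
        self._tasks = []

    def add_task(self, args, priority=None):
        """Queue a task; args must be a sequence of positional arguments."""
        if not isinstance(args, (list, tuple)):
            raise TypeError("args must be a list or tuple")
        # The priority is recorded but ignored in this sketch.
        self._tasks.append((priority, tuple(args)))

    def run_all(self):
        """Run queued tasks in insertion order and return their results."""
        results = [self._run_task(*args) for (_, args) in self._tasks]
        self._tasks = []
        return results
```

Because the task arguments travel as one sequence, keyword parameters such as `priority` can never collide with them, which is the problem the interpreter session in the message demonstrates.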
  8. 16 Jul, 2010 1 commit
    • Implement lock names for debugging purposes · 7f93570a
      Iustin Pop authored
      
      
      This patch adds lock names to SharedLocks and LockSets, which can be
      used later to display the actual locks being held/used in places where
      we only have the lock, and not the entire context of the locking
      operation.
      
      Since I realized that the production code doesn't call LockSet with the
      proper members= syntax, but directly as positional parameters, I've
      converted this (and the arguments to GlobalLockManager) into positional
      arguments.

      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
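As a rough illustration of why a name on the lock object helps, the sketch below attaches names to a lock and to the members of a lock set. `NamedLock` and `NamedLockSet` are hypothetical classes built on `threading.Lock`, not Ganeti's SharedLock or LockSet, and the "set/member" naming scheme is an assumption made for this example.

```python
import threading


class NamedLock:
    """Hypothetical lock that carries a name for debugging output."""

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()

    def acquire(self):
        return self._lock.acquire()

    def release(self):
        self._lock.release()

    def __repr__(self):
        return "<NamedLock %s>" % self.name


class NamedLockSet:
    """Hypothetical set of named locks, one per member."""

    def __init__(self, members, name):
        self.name = name
        # Assumed naming scheme: "<set name>/<member name>".
        self._locks = {m: NamedLock("%s/%s" % (name, m)) for m in members}

    def get(self, member):
        return self._locks[member]

    def names(self):
        """Return the sorted names of all member locks."""
        return sorted(lock.name for lock in self._locks.values())
```

Code that only holds a reference to the lock can then report which lock it is, without access to the caller's context.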
  9. 15 Jul, 2010 1 commit
    • jqueue: Factorize code waiting for job changes · 989a8bee
      Michael Hanselmann authored
      
      
      By splitting the _WaitForJobChangesHelper class into multiple smaller
      classes, we gain in several places:
      
      - Simpler code, less interaction between functions and variables
      - Easy to unit-test (close to 100% coverage)
      - Waiting for job changes has no direct knowledge of the queue anymore
        (it doesn't reference queue functions anymore, especially not private
        ones)
      - Activate inotify only if there was no change at the beginning (and
        check again right away to avoid race conditions)

      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
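The last point in the list, arming the watcher only when nothing has changed yet and then checking again, can be sketched without inotify at all. In the example below `wait_for_change` and its callback parameters are hypothetical; `arm_watcher` stands in for activating inotify and `wait` for blocking until an event or a timeout.

```python
def wait_for_change(read_state, previous, arm_watcher, wait):
    """Return the new state if it differs from previous, else None.

    read_state: callable returning the current state
    arm_watcher: callable that starts watching for changes
    wait: callable that blocks; returns True on an event, False on timeout
    """
    current = read_state()
    if current != previous:
        # Already changed: no need to set up a watcher at all.
        return current

    arm_watcher()

    # Re-check right away: a change may have landed between the first
    # read and the moment the watcher became active.
    current = read_state()
    if current != previous:
        return current

    if wait():
        current = read_state()
        if current != previous:
            return current
    return None
```

Since the function only talks to callables, it needs no knowledge of the queue and can be tested with plain fakes.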
  10. 12 Jul, 2010 1 commit
  11. 09 Jul, 2010 1 commit
  12. 06 Jul, 2010 1 commit
    • Fix opcode transition from WAITLOCK to RUNNING · 271daef8
      Iustin Pop authored
      
      
      With the recent changes in the job queue, an old bug surfaced: we never
      serialized the status change when in NotifyStart, thus a crash of the
      master would have left the job queue oblivious to the fact that the job
      was actually running.
      
      In the previous implementation, queries against the job status were
      using the in-memory object, so they 'saw' and reported correctly the
      running status. But the new implementation just looks at the on-disk
      version, and thus didn't see this transition.
      
      The patch also moves NotifyStart to a decorator-based version (like the
      other functions), which generates a lot of churn in the diff, sorry.

      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
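The bug and its fix come down to writing the job file whenever the status changes, so that readers of the on-disk copy see the transition. The sketch below is an illustration under stated assumptions: the JSON layout, the status strings, and the `synchronized`, `write_job`, `notify_start` and `query_status` helpers are all invented for this example and are not the Ganeti code.

```python
import functools
import json
import os
import tempfile
import threading

OP_STATUS_WAITLOCK = "waiting"  # assumed value, for illustration only
OP_STATUS_RUNNING = "running"   # assumed value, for illustration only

_queue_lock = threading.Lock()


def synchronized(func):
    """Decorator that runs func while holding the queue lock."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with _queue_lock:
            return func(*args, **kwargs)
    return wrapper


def write_job(path, job):
    """Write the job to a temporary file, then atomically replace path."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(job, handle)
    os.replace(tmp, path)


@synchronized
def notify_start(path, job, index):
    """Move an opcode from WAITLOCK to RUNNING and persist the change."""
    op = job["ops"][index]
    if op["status"] != OP_STATUS_WAITLOCK:
        raise ValueError("opcode is not waiting for locks")
    op["status"] = OP_STATUS_RUNNING
    write_job(path, job)  # without this, on-disk readers miss the change


def query_status(path, index):
    """Read the opcode status from disk, as the new queries do."""
    with open(path) as handle:
        return json.load(handle)["ops"][index]["status"]
```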
  13. 28 Jun, 2010 6 commits
  14. 23 Jun, 2010 6 commits
  15. 17 Jun, 2010 4 commits
  16. 15 Jun, 2010 2 commits
  17. 11 Jun, 2010 6 commits
  18. 01 Jun, 2010 1 commit
    • Add a new opcode timestamp field · b9b5abcb
      Iustin Pop authored
      
      
      Since the current start_timestamp opcode attribute refers to the initial
      start time, before locks are acquired, it's not useful for determining
      the actual execution order of two opcodes/jobs competing for the same
      lock.
      
      This patch adds a new field, exec_timestamp, that is updated when the
      opcode moves from OP_STATUS_WAITLOCK to OP_STATUS_RUNNING, thus allowing
      a clear view of the execution history. The new field is visible in the
      job output via the 'opexec' field.

      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
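The difference between the two timestamps shows up as soon as two opcodes compete for one lock. The sketch below uses a hypothetical `QueuedOpCode` class with assumed status strings. Only the attribute names `start_timestamp` and `exec_timestamp` come from the commit message; the methods and the `execution_order` helper are invented for illustration.

```python
import time

OP_STATUS_QUEUED = "queued"      # assumed value, for illustration only
OP_STATUS_WAITLOCK = "waiting"   # assumed value, for illustration only
OP_STATUS_RUNNING = "running"    # assumed value, for illustration only


class QueuedOpCode:
    """Hypothetical opcode record with both timestamps."""

    def __init__(self, name):
        self.name = name
        self.status = OP_STATUS_QUEUED
        self.start_timestamp = None
        self.exec_timestamp = None

    def mark_waitlock(self, now=None):
        """Record the initial start time, before locks are acquired."""
        self.status = OP_STATUS_WAITLOCK
        self.start_timestamp = time.time() if now is None else now

    def mark_running(self, now=None):
        """Record when execution actually begins, after lock acquisition."""
        if self.status != OP_STATUS_WAITLOCK:
            raise ValueError("opcode is not waiting for locks")
        self.status = OP_STATUS_RUNNING
        self.exec_timestamp = time.time() if now is None else now


def execution_order(ops):
    """Return opcode names ordered by when they actually started running."""
    started = [op for op in ops if op.exec_timestamp is not None]
    return [op.name for op in sorted(started, key=lambda op: op.exec_timestamp)]
```

If opcode "a" starts waiting before "b" but "b" obtains the lock first, sorting by `start_timestamp` would list "a" first, while sorting by `exec_timestamp` reflects the real order.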