1. 27 Aug, 2008 4 commits
    • jqueue: Replace normal cache dict with weakref dict · 5685c1a5
      Michael Hanselmann authored
      A job should exist only once in memory. After the cache is cleaned,
      there can still be references to a job somewhere else. If there
      are multiple instances, one can get updated while a function is
      waiting for changes on another instance. By using
      weakref.WeakValueDictionary, which automatically removes instances as
      soon as there are no strong references to them anymore, we can solve
      this problem.
      
      Reviewed-by: iustinp
    • jqueue: Keep timestamp of opcode start and end · 70552c46
      Michael Hanselmann authored
      Reviewed-by: ultrotter
    • jqueue: Reset run_op_idx after job is done · 65548ed5
      Michael Hanselmann authored
      It can be confusing otherwise.
      
      Reviewed-by: ultrotter
    • Make sure that client programs get all messages · 6c5a7090
      Michael Hanselmann authored
      This is a large patch, but I can't figure out how to split it without
      breaking stuff. The old way of getting messages, by always fetching
      the last one, didn't deliver all messages to the client if they were
      added too fast, making commands like “gnt-cluster verify” less than
      useful. These changes introduce a serial number per log entry to keep
      track of which messages a client has already received. They also
      remove the per-opcode log lock to make reading log entries thread
      safe.
      
      Reviewed-by: ultrotter
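A minimal Python 3 sketch of the serial-number scheme, assuming one counter per job and hypothetical names (`JobLog`, `Add`, `Since`, `Poll`). Locking is deliberately left out: the sketch assumes the caller already holds whatever lock protects the job. Each client remembers the highest serial it has seen and asks only for newer entries, so fast bursts of messages are no longer collapsed into "the last one".

```python
class JobLog(object):
    """Hypothetical per-job log; each entry gets the next serial number.

    Assumes the caller holds the lock protecting the job.
    """

    def __init__(self):
        self._serial = 0
        self._entries = []  # list of (serial, message) tuples

    def Add(self, message):
        self._serial += 1
        self._entries.append((self._serial, message))
        return self._serial

    def Since(self, prev_serial):
        """Return all entries newer than the last serial a client saw."""
        return [e for e in self._entries if e[0] > prev_serial]


def Poll(log, prev_serial):
    """Client side: fetch new entries and remember the highest serial."""
    new = log.Since(prev_serial)
    if new:
        prev_serial = new[-1][0]
    return new, prev_serial
```

A client starts with `prev_serial = 0` and feeds the returned serial back into the next `Poll` call; entries added between two polls are all returned, in order.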
  2. 11 Aug, 2008 2 commits
  3. 08 Aug, 2008 3 commits
  4. 06 Aug, 2008 3 commits
  5. 05 Aug, 2008 1 commit
  6. 04 Aug, 2008 1 commit
  7. 31 Jul, 2008 2 commits
  8. 30 Jul, 2008 2 commits
    • Fix pylint-detected issues · 38206f3c
      Iustin Pop authored
      This is mostly:
        - whitespace fixes (space at EOL in some files, not all; broken
          indentation, etc.)
        - variable names overriding others (one is a real bug in there)
        - too-long lines
        - cleanup of most unused imports (not all)
      
      Reviewed-by: ultrotter
    • Rewrite job queue · 85f03e0d
      Michael Hanselmann authored
      We found several issues in the old job queue implementation. It had race
      conditions, deadlocks and other deficiencies.
      
      Short summary:
      - _QueuedOpCode and _QueuedJob are now more or less data structures with a few
        utility functions. __Setup is gone.
      - DiskJobStorage and JobQueue classes merged into one to reduce code complexity.
      - One lock in JobQueue for almost everything. There's also a lock per opcode
        for log messages.
      
      Reviewed-by: iustinp
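A hypothetical Python 3 sketch of the "one lock in JobQueue for almost everything" design. The `_RequireLock` decorator and the method bodies are illustrative assumptions, not the rewritten code itself. The `_NewSerialUnlocked` name is taken from a later commit in this log; here the `Unlocked` suffix is assumed to mean "the caller must already hold the lock".

```python
import functools
import threading


def _RequireLock(fn):
    """Hypothetical decorator: run a JobQueue method while holding its lock."""
    @functools.wraps(fn)
    def wrapper(self, *args, **kwargs):
        with self._lock:
            return fn(self, *args, **kwargs)
    return wrapper


class JobQueue(object):
    """Sketch: a single lock guards (almost) all queue state."""

    def __init__(self):
        self._lock = threading.Lock()
        self._jobs = {}
        self._last_serial = 0

    def _NewSerialUnlocked(self):
        # "Unlocked": the caller must already hold self._lock.
        self._last_serial += 1
        return str(self._last_serial)

    @_RequireLock
    def SubmitJob(self, ops):
        job_id = self._NewSerialUnlocked()
        self._jobs[job_id] = list(ops)
        return job_id

    @_RequireLock
    def QueryJob(self, job_id):
        return self._jobs.get(job_id)
```

A single coarse lock is easy to reason about and rules out lock-ordering deadlocks between queue components. Note that `threading.Lock` is not reentrant, so in this sketch a locked method must call `*Unlocked` helpers rather than another locked method.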
  9. 29 Jul, 2008 1 commit
  10. 28 Jul, 2008 2 commits
  11. 25 Jul, 2008 1 commit
  12. 24 Jul, 2008 2 commits
  13. 23 Jul, 2008 6 commits
    • Move code formatting job ID into a base class · ce594241
      Michael Hanselmann authored
      A later patch will add a memory-based job storage class, hence this
      code is going into a separate class. It also changes the number format
      to always use at least 10 digits (zero-padded), so that up to
      9'999'999'999 job IDs sort correctly as plain strings, without a
      custom sort function.
      
      Reviewed-by: iustinp
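A small Python 3 illustration of why the 10-digit format matters. `FormatJobID` is a hypothetical helper name; the `"%010d"` format is one way to get "at least 10 digits" and is assumed here, not quoted from the patch. Zero-padded IDs compare as strings exactly as they do as numbers, while unpadded ones do not. The limit also shows up: from 10'000'000'000 on, the ID grows to 11 digits and plain string sorting breaks again.

```python
def FormatJobID(serial):
    """Hypothetical helper: zero-pad a job serial to at least 10 digits."""
    return "%010d" % serial


# Padded IDs sort numerically even with a plain string sort.
padded = sorted(FormatJobID(n) for n in (2, 10, 1))
assert padded == ["0000000001", "0000000002", "0000000010"]

# Unpadded IDs do not: "10" sorts before "2".
assert sorted(["2", "10", "1"]) == ["1", "10", "2"]
```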
    • Rename JobStorage to DiskJobStorage · 21cc1fbd
      Michael Hanselmann authored
      Reviewed-by: iustinp
    • Fix logging with string job IDs · 205d71fd
      Michael Hanselmann authored
      The job ID is now a string, hence logging must use %s instead of %d.
      
      Reviewed-by: iustinp
    • Make job ID a string · 3be9a705
      Michael Hanselmann authored
      The docstring says that _NewSerialUnlocked returns “a string
      representing the job identifier”. Until now it returned an
      integer; this patch makes it return a string.
      
      Reviewed-by: iustinp
    • Distribute the queue serial file after each update · c3f0a12f
      Iustin Pop authored
      This patch adds distribution of the queue serial file after each write
      to it (but before a new job is created and written with that ID, and
      before a response is returned, so we should be safe from crashes in
      between).
      
      Currently it only logs a message if a node cannot be contacted; it
      should abort if errors are seen on more than 50% of the nodes.
      
      Reviewed-by: imsnah
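The commit above only logs failures and states the abort as future work. The following Python 3 sketch shows one hypothetical way that "more than 50% errors" check could look. `DistributeSerialFile`, `upload_fn` and `max_failure_ratio` are illustrative names, not Ganeti APIs, and `upload_fn(node)` is assumed to return `True` on success.

```python
import logging


def DistributeSerialFile(nodes, upload_fn, max_failure_ratio=0.5):
    """Sketch: copy the serial file to every node, abort on too many errors.

    upload_fn(node) is a hypothetical callable returning True on success.
    Returns the list of nodes that failed (if not too many).
    """
    failed = []
    for node in nodes:
        if not upload_fn(node):
            logging.error("Could not distribute serial file to node %s", node)
            failed.append(node)
    # Abort only when strictly more than the allowed share of nodes failed.
    if nodes and len(failed) > max_failure_ratio * len(nodes):
        raise RuntimeError("Serial file distribution failed on %d of %d nodes"
                           % (len(failed), len(nodes)))
    return failed
```

With three nodes, one failure is tolerated and logged, while two failures exceed half of the nodes and raise.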
    • Make the job storage init reuse a serial file · c4beba1c
      Iustin Pop authored
      This will be needed for master failover. If we don't have a valid queue
      directory, we need to reinitialize it, but we should keep the existing
      serial number.
      
      Therefore, we abstract the reading of the serial file, and if we find
      a valid serial, we do not reset it.
      
      Reviewed-by: imsnah
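A minimal Python 3 sketch of "reuse a valid serial, otherwise reset it", assuming the serial file holds a single decimal number. `ReadSerial` and `InitQueueSerial` are hypothetical names, and treating a missing, unparsable or negative value as invalid is an assumption of this sketch.

```python
import os
import tempfile


def ReadSerial(path):
    """Return the serial stored in path, or None if missing or invalid."""
    try:
        with open(path) as fd:
            return int(fd.read().strip())
    except (IOError, ValueError):
        return None


def InitQueueSerial(path):
    """Sketch: keep a valid existing serial, otherwise start from 0."""
    serial = ReadSerial(path)
    if serial is None or serial < 0:
        serial = 0
        with open(path, "w") as fd:
            fd.write("%d\n" % serial)
    return serial


with tempfile.TemporaryDirectory() as tmpdir:
    serial_path = os.path.join(tmpdir, "serial")
    assert InitQueueSerial(serial_path) == 0   # no file yet: start fresh
    with open(serial_path, "w") as fd:
        fd.write("42\n")
    assert InitQueueSerial(serial_path) == 42  # valid serial is kept
```

Keeping the serial across re-initialisation is what prevents a new master from handing out job IDs that were already used before failover.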
  14. 22 Jul, 2008 1 commit
    • Make argument to CleanCacheUnlocked mandatory · 57f8615f
      Michael Hanselmann authored
      Not passing the argument means it has the value None, and a
      membership test on None doesn't work:
        >>> "123" in None
        Traceback (most recent call last):
          File "<stdin>", line 1, in ?
        TypeError: iterable argument required
      
      Hence I rename it from "exceptions", which may be confusing, to
      "exclude", and make it mandatory. If one wants to clean all cache
      entries, an empty list can be passed.
      
      Reviewed-by: iustinp
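A hypothetical Python 3 sketch of a cache cleaner with a mandatory `exclude` argument. The class and the eviction rule (drop every cached job whose ID is not in `exclude`) are assumptions made for illustration. Because there is no default, forgetting the argument fails loudly with a `TypeError` at the call site instead of failing later on `job_id in None`.

```python
class JobStorage(object):
    """Sketch: a cache cleaner whose exclusion list is mandatory."""

    def __init__(self):
        self._memcache = {}

    def CleanCacheUnlocked(self, exclude):
        """Drop all cached jobs except those whose IDs are in exclude.

        Pass an empty list to clean everything. There is deliberately no
        default: with exclude=None, "job_id in exclude" would raise.
        """
        for job_id in list(self._memcache):
            if job_id not in exclude:
                del self._memcache[job_id]
```

Iterating over `list(self._memcache)` takes a snapshot of the keys, so deleting entries inside the loop is safe.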
  15. 17 Jul, 2008 1 commit
  16. 15 Jul, 2008 1 commit
  17. 14 Jul, 2008 4 commits
    • First version of user feedback fixes · f1048938
      Iustin Pop authored
      This patch contains a raw version of the feedback_fn fixes.
      
      The new mechanism works as follows:
        - instead of a per-Processor feedback_fn, there's one for each
          ExecOpCode, so that feedback for different opcodes goes via possibly
          different functions
        - each _QueuedOpCode gets a message buffer, a method for adding
          feedback and a method for retrieving (parts of) the feedback
        - the _QueuedJob object gets a new attribute that is equal to the
          index of the currently executing opcode
        - job queries get an extra parameter called 'ticker' that will return
          the latest message of the currently executing opcode
        - the cli.py job completion poll will show the new status if different
          from the old one
      
      Of course, messages that arrive in quick succession will be lost, as
      currently only the latest one is available. Also, changes between
      opcodes are not represented at all.
      
      Reviewed-by: imsnah
    • Cache some jobs in memory · ac0930b9
      Iustin Pop authored
      This patch adds a caching mechanism to the JobStorage. Note that it
      does not make the memory cache authoritative.
      
      The algorithm is:
        - all jobs loaded from disk are entered in the cache
        - all new jobs are entered in the cache
        - at each job save (in UpdateJobUnlocked), jobs which are not
          executing or queued are removed from the cache
      
      The end effect is that running jobs will always be in the cache (which
      will fix the opcode log changes) and finished jobs will be kept for a
      while in the cache after being loaded.
      
      Reviewed-by: imsnah
    • Fix JobStorage._GetJobIDsUnlocked · 8a70e415
      Iustin Pop authored
      The job ID returned must be an integer (and the regex enforces that
      it is numeric), but we didn't convert it explicitly.
      
      Reviewed-by: imsnah
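A small Python 3 sketch of the bug class this commit fixes: a regex can guarantee that a captured group contains only digits, but the match is still a string until it is converted. The `job-<digits>` file-name pattern and the `GetJobIDs` helper are assumptions made for illustration.

```python
import re

# Hypothetical file-name pattern: "job-<digits>".
_RE_JOB_FILE = re.compile(r"^job-(\d+)$")


def GetJobIDs(filenames):
    """Return sorted integer job IDs parsed from queue file names."""
    ids = []
    for name in filenames:
        m = _RE_JOB_FILE.match(name)
        if m:
            # The regex only guarantees digits; int() makes it an integer.
            ids.append(int(m.group(1)))
    ids.sort()
    return ids
```

Without the `int()` call, `"10"` would sort before `"2"` and comparisons with integer IDs would silently fail.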
    • Change JobStorage to work with ids not filenames · 911a495b
      Iustin Pop authored
      Currently some of the functions in JobStorage work with filenames (which
      is an implementation detail and should only be used when dealing with
      the storage) and not with job IDs. We need to change this in order to
      implement a job cache.
      
      Reviewed-by: ultrotter
  18. 11 Jul, 2008 2 commits
    • Add experimental persistency to job queue · f1da30e6
      Michael Hanselmann authored
      It's not perfect and it's not finished, but it's a start.
      
      - Serial number is read only once, but written on each update
      - Jobs are kept only on disk (caching will be implemented later)
      
      Reviewed-by: iustinp
    • Make "gnt-job list" work again · af30b2fd
      Michael Hanselmann authored
      "gnt-job list" was broken after my recent changes in the RPC
      between clients and the master. This patch makes it work again.
      
      Reviewed-by: iustinp
  19. 10 Jul, 2008 1 commit
    • Switch _QueuedOpCode to have their own lock · 307149a8
      Iustin Pop authored
      Right now, the queued opcode doesn't have a lock, and instead relies on
      the parent QueuedJob's lock.
      
      This is not good for logging feedback, so it's better to have a lock for
      each _QueuedOpCode.
      
      Reviewed-by: ultrotter