1. 27 Oct, 2011 1 commit
  2. 06 Sep, 2011 1 commit
  3. 30 Aug, 2011 2 commits
  4. 19 Aug, 2011 1 commit
  5. 02 Aug, 2011 1 commit
    • Michael Hanselmann's avatar
      jqueue: Add short delay before detecting job changes · dfc8824a
      Michael Hanselmann authored
      
      
      By sleeping for 100ms after receiving a notification for a changed job
      file the job is given some additional time to change again. This
      significantly reduces the number of LUXI calls for WaitForJobChanges
      (depending on the job, in my tests with “gnt-cluster verify
      --debug-simulate-errors” by about 80%), and improves performance (the
      same job went from around 7 seconds to around 3.5 seconds).
      
      This method is not perfect. The algorithm could be made more complex,
      e.g. by increasing the delay on each change, etc., but for now this
      simple change provides a good improvement.
      Signed-off-by: default avatarMichael Hanselmann <hansmi@google.com>
      Reviewed-by: default avatarIustin Pop <iustin@google.com>
      dfc8824a
  6. 21 Jul, 2011 5 commits
  7. 20 Jul, 2011 2 commits
    • Michael Hanselmann's avatar
      jqueue: Add “writable” flag to memory objects · c0f6d0d8
      Michael Hanselmann authored
      
      
      Basically only one instance of the job, the one being processed,
      should be serialized to disk and replicated to other nodes. With
      this flag assertions can be added in various places.
      Signed-off-by: default avatarMichael Hanselmann <hansmi@google.com>
      Reviewed-by: default avatarIustin Pop <iustin@google.com>
      c0f6d0d8
    • Michael Hanselmann's avatar
      Implement chained jobs · b95479a5
      Michael Hanselmann authored
      
      
      An overview is available in the design document for this change,
      doc/design-chained-jobs.rst.
      
      When a job enters the job processor, the current opcode's dependencies
      are evaluated. If a referenced job has not yet reached the desired
      status, the current job is registered as a dependant. The job processor
      will continue to work on other pending tasks. When a job finishes it
      notifies any pending dependants by re-adding them to the workerpool.
      
      A per-job processor lock is necessary for rare cases where the same job
      can be re-added twice.
      
      There is no way to view waiting jobs at the moment, but I plan to
      export this information to “gnt-debug locks”.
      
      A so-called dependency manager takes care of managing waiting jobs and
      keeping track of their status.
      
      Unittests are included.
      Signed-off-by: default avatarMichael Hanselmann <hansmi@google.com>
      Reviewed-by: default avatarIustin Pop <iustin@google.com>
      b95479a5
  8. 15 Jul, 2011 1 commit
  9. 11 Jul, 2011 1 commit
    • Michael Hanselmann's avatar
      Fix off-by-one bug in job serial generation · 3c88bf36
      Michael Hanselmann authored
      Commit 009e73d0
      
       (September 2009) changed the job queue to generate
      multiple job serials at once. Ever since it would return one more than
      requested.
      
      The “serial” file in the job queue directory is defined to contain the
      “last job ID used” (design-2.0). With the change above, the serial file
      would always contain the next serial number. The first value returned by
      the generating function was the one contained in the file, so during the
      switch in 2009 one job may have been overwritten.
      
      This patch changes the code to always return the exact number of
      serials, to keep the last used serial on disk and adds an assertion.
      Signed-off-by: default avatarMichael Hanselmann <hansmi@google.com>
      Reviewed-by: default avatarIustin Pop <iustin@google.com>
      3c88bf36
  10. 10 Jun, 2011 1 commit
  11. 31 May, 2011 1 commit
    • Michael Hanselmann's avatar
      jqueue: Fix potential race condition when cancelling queued jobs · 66bd7445
      Michael Hanselmann authored
      
      
      When a job was cancelled, its status would be changed and the file
      written again. Since this was a final status, the job file could be
      moved anytime for archival. If the job was still in the queue, however,
      it would be processed (not fully, just updating the “end_timestamp”
      attribute) and written again. This was bad as it could leave the same
      job in two different files.
      
      With this patch the processor is changed to return early for finished
      jobs. Cancelling a queued job will finalize it right away. Unittests are
      updated.
      Signed-off-by: default avatarMichael Hanselmann <hansmi@google.com>
      Reviewed-by: default avatarGuido Trotter <ultrotter@google.com>
      66bd7445
  12. 10 May, 2011 1 commit
  13. 25 Mar, 2011 1 commit
  14. 23 Mar, 2011 1 commit
  15. 28 Feb, 2011 1 commit
  16. 29 Dec, 2010 1 commit
  17. 15 Dec, 2010 1 commit
    • Michael Hanselmann's avatar
      jqueue: Keep jobs in “waitlock” while returning to queue · 5fd6b694
      Michael Hanselmann authored
      
      
      Iustin Pop reported that a job's file is updated many times while it
      waits for locks held by other thread(s). After an investigation it was
      concluded that the reason was a design decision for job priorities to
      return jobs to the “queued” status if they couldn't acquire all locks.
      Changing a jobs' status or priority requires an update to permanent
      storage.
      
      In a high-level view this is what happens:
      1. Mark as waitlock
      2. Write to disk as permanent storage (jobs left in this state by a
         crashing master daemon are resumed on restart)
      3. Wait for lock (assume lock is held by another thread)
      4. Mark as queued
      5. Write to disk again
      6. Return to workerpool
      
      Another option originally discussed was to leave the job in the
      “waitlock” status. Ignoring priority changes, this is what would happen:
      1. If not in waitlock
      1.1. Assert state == queued
      1.2. Mark as waitlock
      1.3. Set start_timestamp
      1.4. Write to disk as permanent storage
      3. Wait for locks (assume lock is held by another thread)
      4. Leave in waitlock
      5. Return to workerpool
      
      Now let's assume the lock is released by the other thread:
      […]
      3. Wait for locks and get them
      4. Assert state == waitlock
      5. Set state to running
      6. Set exec_timestamp
      7. Write to disk
      
      As this change reduces the number of writes from two per lock acquire
      attempt to two per opcode and one per priority increase (as happens
      after 24 acquire attempts (see mcpu._CalculateLockAttemptTimeouts) until
      the highest priority is reached), here's the patch to implement it.
      Unittests are updated.
      Signed-off-by: default avatarMichael Hanselmann <hansmi@google.com>
      Reviewed-by: default avatarIustin Pop <iustin@google.com>
      5fd6b694
  18. 12 Oct, 2010 3 commits
  19. 07 Oct, 2010 1 commit
  20. 24 Sep, 2010 3 commits
  21. 23 Sep, 2010 3 commits
  22. 20 Sep, 2010 3 commits
  23. 16 Sep, 2010 1 commit
  24. 13 Sep, 2010 3 commits