  1. Mar 30, 2012
  2. Dec 22, 2011
  3. Dec 21, 2011
  4. Nov 21, 2011
  5. Nov 17, 2011
  6. Oct 27, 2011
  7. Oct 26, 2011
    • Convert job queue's RPC to generated code · fb1ffbca
      Michael Hanselmann authored
      
      With these changes, job queue RPC calls will finally show up in the lock
      monitor. See below for an example. A job queue-specific class is used to
      restrict the use of a static list for name resolution to the job queue.
      Further improvements can be made to not re-create the whole RPC client
      for every call (e.g. by using a more dynamic resolver), but for now this
      works.
      
      rpc/node8.example.com/jobqueue_update Jq8/Job9/TEST_DELAY
      
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
  8. Sep 06, 2011
  9. Aug 30, 2011
  10. Aug 19, 2011
  11. Aug 02, 2011
    • jqueue: Add short delay before detecting job changes · dfc8824a
      Michael Hanselmann authored
      
      Sleeping for 100ms after receiving a notification for a changed job
      file gives the job some additional time to change again. This
      significantly reduces the number of LUXI calls for WaitForJobChanges
      (depending on the job, in my tests with “gnt-cluster verify
      --debug-simulate-errors” by about 80%), and improves performance (the
      same job went from around 7 seconds to around 3.5 seconds).
      
      This method is not perfect. The algorithm could be made more complex,
      e.g. by increasing the delay on each change, etc., but for now this
      simple change provides a good improvement.
      
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
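
The delay described above can be illustrated with a minimal sketch. This is not the Ganeti implementation: `wait_for_settled_change`, `wait_for_notification` and `read_job` are hypothetical names, and only the 100 ms pause comes from the commit message.

```python
import time


def wait_for_settled_change(wait_for_notification, read_job,
                            delay=0.1, sleep=time.sleep):
    """Block until a change notification arrives, then pause before reading.

    The pause lets several rapid writes to the job file collapse into a
    single read, so the caller sees fewer, larger changes.
    """
    wait_for_notification()  # hypothetical: returns once the job file changed
    sleep(delay)             # 100 ms by default, as in the commit message
    return read_job()        # hypothetical: loads the current job state
```

The `sleep` parameter exists only so the pause can be replaced in tests; a fixed delay trades a little latency for fewer wake-ups.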
  12. Jul 21, 2011
  13. Jul 20, 2011
    • jqueue: Add “writable” flag to memory objects · c0f6d0d8
      Michael Hanselmann authored
      
      Basically only one instance of the job, the one being processed,
      should be serialized to disk and replicated to other nodes. With
      this flag assertions can be added in various places.
      
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
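
A rough sketch of how such a flag can guard serialization follows. The class and function names are hypothetical and not taken from the Ganeti source; the point is only that a read-only copy of a job trips an assertion before anything is written.

```python
class QueuedJob:
    """Hypothetical in-memory job object carrying a writable flag."""

    def __init__(self, job_id, writable):
        self.id = job_id
        self.writable = writable
        self.status = "queued"


def update_job_file(job, write_file):
    """Serialize a job, refusing copies that were loaded read-only."""
    assert job.writable, "Can't update read-only job %s" % job.id
    # write_file is a stand-in for writing to disk and replicating to nodes
    write_file(job.id, {"id": job.id, "status": job.status})
```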
    • Implement chained jobs · b95479a5
      Michael Hanselmann authored
      
      An overview is available in the design document for this change,
      doc/design-chained-jobs.rst.
      
      When a job enters the job processor, the current opcode's dependencies
      are evaluated. If a referenced job has not yet reached the desired
      status, the current job is registered as a dependant. The job processor
      will continue to work on other pending tasks. When a job finishes it
      notifies any pending dependants by re-adding them to the workerpool.
      
      A per-job processor lock is necessary for rare cases where the same job
      can be re-added twice.
      
      There is no way to view waiting jobs at the moment, but I plan to
      export this information to “gnt-debug locks”.
      
      A so-called dependency manager takes care of managing waiting jobs and
      keeping track of their status.
      
      Unittests are included.
      
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
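
The register-then-notify flow described in this commit could look roughly like the sketch below. It is a simplified illustration under assumed names (`DependencyManager`, `check_and_register`, `notify_finished`); locking and the case where a dependency finishes with an unwanted status are left out.

```python
import collections


class DependencyManager:
    """Tracks jobs waiting for other jobs (illustrative only)."""

    def __init__(self, get_status, enqueue):
        self._get_status = get_status  # callable: job ID -> status string
        self._enqueue = enqueue        # callable: re-adds a job to the pool
        self._waiters = collections.defaultdict(set)

    def check_and_register(self, job_id, dep_job_id, wanted_statuses):
        """Return True if the dependency is satisfied, else register job_id."""
        if self._get_status(dep_job_id) in wanted_statuses:
            return True
        self._waiters[dep_job_id].add(job_id)
        return False

    def notify_finished(self, dep_job_id):
        """Re-add every job that was waiting on dep_job_id."""
        for job_id in sorted(self._waiters.pop(dep_job_id, ())):
            self._enqueue(job_id)
```

Storing waiters in a set means registering the same dependant twice has no extra effect, although the real code also relies on a per-job processor lock for the re-add case mentioned above.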
  14. Jul 15, 2011
  15. Jul 11, 2011
    • Fix off-by-one bug in job serial generation · 3c88bf36
      Michael Hanselmann authored
      
      Commit 009e73d0 (September 2009) changed the job queue to generate
      multiple job serials at once. Ever since, it has returned one more
      serial than requested.
      
      The “serial” file in the job queue directory is defined to contain the
      “last job ID used” (design-2.0). With the change above, the serial file
      would always contain the next serial number. The first value returned by
      the generating function was the one contained in the file, so during the
      switch in 2009 one job may have been overwritten.
      
      This patch changes the code to always return the exact number of
      serials requested and to keep the last used serial on disk. It also
      adds an assertion.
      
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
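
The intended arithmetic can be shown with a small sketch. `new_serials` is a hypothetical helper, not the function from the Ganeti source; it assumes the serial file holds the last job ID used, as the commit message states.

```python
def new_serials(last_used, count):
    """Return (new_ids, new_last_used) for `count` fresh job IDs.

    `last_used` is the value stored in the serial file, i.e. the last
    job ID that was already handed out.
    """
    assert count > 0
    new_last = last_used + count
    ids = list(range(last_used + 1, new_last + 1))
    # Exactly `count` serials, and the last one is what goes back to disk
    assert len(ids) == count
    assert ids[-1] == new_last
    return ids, new_last
```

Starting the range at `last_used + 1` avoids reusing the ID stored in the file, and ending it at `new_last` keeps the file pointing at the last ID used rather than the next one.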
  16. Jun 10, 2011
  17. May 31, 2011
    • jqueue: Fix potential race condition when cancelling queued jobs · 66bd7445
      Michael Hanselmann authored
      
      When a job was cancelled, its status would be changed and the file
      written again. Since this was a final status, the job file could be
      moved anytime for archival. If the job was still in the queue, however,
      it would be processed (not fully, just updating the “end_timestamp”
      attribute) and written again. This was bad as it could leave the same
      job in two different files.
      
      With this patch the processor is changed to return early for finished
      jobs. Cancelling a queued job will finalize it right away. Unittests are
      updated.
      
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
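
The two behaviours this commit describes, finalizing on cancel and returning early in the processor, are sketched below. All names and status strings are assumptions made for illustration, and `save()` merely counts writes instead of touching a job file.

```python
FINALIZED = frozenset(["canceled", "success", "error"])  # assumed status names


class Job:
    def __init__(self):
        self.status = "queued"
        self.end_timestamp = None
        self.writes = 0

    def save(self):
        self.writes += 1  # stands in for writing the job file


def cancel_queued(job, now):
    """Finalize a queued job right away: status and end time in one write."""
    job.status = "canceled"
    job.end_timestamp = now
    job.save()


def process(job, now):
    """Return early for finished jobs so their files are never rewritten."""
    if job.status in FINALIZED:
        return False
    job.status = "success"  # placeholder for actually running the job
    job.end_timestamp = now
    job.save()
    return True
```

Because a cancelled job is never written again, archival can move its file at any time without a second copy appearing.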
  18. May 10, 2011
  19. Mar 25, 2011
  20. Mar 23, 2011
  21. Feb 28, 2011
  22. Dec 29, 2010
  23. Dec 15, 2010
    • jqueue: Keep jobs in “waitlock” while returning to queue · 5fd6b694
      Michael Hanselmann authored
      
      Iustin Pop reported that a job's file is updated many times while it
      waits for locks held by other thread(s). After an investigation it was
      concluded that the reason was a design decision for job priorities to
      return jobs to the “queued” status if they couldn't acquire all locks.
      Changing a job's status or priority requires an update to permanent
      storage.
      
      In a high-level view this is what happens:
      1. Mark as waitlock
      2. Write to disk as permanent storage (jobs left in this state by a
         crashing master daemon are resumed on restart)
      3. Wait for lock (assume lock is held by another thread)
      4. Mark as queued
      5. Write to disk again
      6. Return to workerpool
      
      Another option originally discussed was to leave the job in the
      “waitlock” status. Ignoring priority changes, this is what would happen:
      1. If not in waitlock
      1.1. Assert state == queued
      1.2. Mark as waitlock
      1.3. Set start_timestamp
      1.4. Write to disk as permanent storage
      3. Wait for locks (assume lock is held by another thread)
      4. Leave in waitlock
      5. Return to workerpool
      
      Now let's assume the lock is released by the other thread:
      […]
      3. Wait for locks and get them
      4. Assert state == waitlock
      5. Set state to running
      6. Set exec_timestamp
      7. Write to disk
      
      This change reduces the number of writes from two per lock acquire
      attempt to two per opcode plus one per priority increase. A priority
      increase happens after 24 acquire attempts (see
      mcpu._CalculateLockAttemptTimeouts) until the highest priority is
      reached. This patch implements the second option. Unittests are updated.
      
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
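
The second flow above (stay in “waitlock” until the locks are acquired) can be sketched as follows. This is an illustration under assumed names, ignoring priority changes; `save()` counts writes to stand in for permanent storage.

```python
class Job:
    def __init__(self):
        self.status = "queued"
        self.start_timestamp = None
        self.exec_timestamp = None
        self.writes = 0

    def save(self):
        self.writes += 1  # stands in for one write to permanent storage


def try_run(job, acquire_locks, now):
    """One pass through the processor; returns True once the job is running."""
    if job.status != "waitlock":
        assert job.status == "queued"
        job.status = "waitlock"
        job.start_timestamp = now
        job.save()
    if not acquire_locks():
        return False  # stay in waitlock, no write, back to the workerpool
    assert job.status == "waitlock"
    job.status = "running"
    job.exec_timestamp = now
    job.save()
    return True
```

However many acquire attempts fail, the job is written once when it enters “waitlock” and once when it starts running.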
  24. Oct 12, 2010
  25. Oct 07, 2010
  26. Sep 24, 2010