1. 22 Dec, 2011 2 commits
  2. 21 Dec, 2011 1 commit
    • jqueue: Fix deadlock between job queue and dependency manager · 37d76f1e
      Michael Hanselmann authored
      
      
      When an opcode is about to be processed its dependencies are
      evaluated using “_JobDependencyManager.CheckAndRegister”. Due
      to its nature that function requires a lock on the manager's
      internal structures. All of this happens while the job queue
      lock is held in shared mode (required for the job processor).
      
      When a job has been processed, any jobs waiting on it are re-added
      to the job workerpool. Before this patch, that required acquiring
      the manager's lock first and then, for adding the jobs, the job queue
      lock. Since this is the reverse of the order described above, it can
      lead to deadlocks.

      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
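The lock-order problem described in this commit message can be illustrated with a small sketch. It is not taken from the patch: the class names, method names and the use of a plain `threading.Lock` (instead of a lock held in shared mode) are assumptions made for illustration. It shows one common way to keep a single lock order, namely collecting the waiting jobs under the manager's lock, releasing it, and only then taking the queue lock to re-add them.

```python
import threading


class DependencyManager:
    """Hypothetical stand-in for the dependency manager."""

    def __init__(self, enqueue_fn):
        self._lock = threading.Lock()
        self._waiters = {}  # finished-job id -> set of waiting job ids
        self._enqueue_fn = enqueue_fn

    def check_and_register(self, job_id, dep_job_id):
        # The caller already holds the queue lock, so the order here is
        # queue lock -> manager lock.
        with self._lock:
            self._waiters.setdefault(dep_job_id, set()).add(job_id)

    def notify_waiters(self, finished_job_id):
        # Collect the waiting jobs while holding only the manager lock.
        with self._lock:
            jobs = self._waiters.pop(finished_job_id, set())
        # The manager lock is released before enqueue_fn takes the queue
        # lock, so the two locks are never acquired in reverse order.
        if jobs:
            self._enqueue_fn(sorted(jobs))


class JobQueue:
    """Hypothetical queue; a plain Lock replaces the shared-mode lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.pending = []
        self.depmgr = DependencyManager(self._enqueue)

    def _enqueue(self, job_ids):
        with self.lock:
            self.pending.extend(job_ids)
```

Registering two waiters while holding `queue.lock` and then calling `queue.depmgr.notify_waiters(...)` without it re-adds both jobs; at no point does a thread hold the manager lock while waiting for the queue lock.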
  3. 21 Jul, 2011 3 commits
  4. 20 Jul, 2011 2 commits
    • jqueue: Add “writable” flag to memory objects · c0f6d0d8
      Michael Hanselmann authored
      
      
      Basically only one instance of the job, the one being processed,
      should be serialized to disk and replicated to other nodes. With
      this flag assertions can be added in various places.

      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
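As a rough illustration of how such a flag enables assertions, the sketch below guards a write path with it. `QueuedJob`, `update_job_file` and `write_fn` are hypothetical names chosen for this example, not the identifiers used in the patch.

```python
class QueuedJob:
    """Hypothetical in-memory job object carrying a "writable" flag."""

    def __init__(self, job_id, writable):
        self.id = job_id
        self.writable = writable


def update_job_file(job, write_fn):
    """Serialize a job; only the writable instance may reach this point."""
    assert job.writable, "Can't update a read-only job object"
    write_fn(job.id)
```

A read-only copy, for example one loaded only to answer a query, would trip the assertion if it ever reached the write path.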
    • Implement chained jobs · b95479a5
      Michael Hanselmann authored
      
      
      An overview is available in the design document for this change,
      doc/design-chained-jobs.rst.
      
      When a job enters the job processor, the current opcode's dependencies
      are evaluated. If a referenced job has not yet reached the desired
      status, the current job is registered as a dependant. The job processor
      will continue to work on other pending tasks. When a job finishes it
      notifies any pending dependants by re-adding them to the workerpool.
      
      A per-job processor lock is necessary for rare cases where the same job
      can be re-added twice.
      
      There is no way to view waiting jobs at the moment, but I plan to
      export this information to “gnt-debug locks”.
      
      A so-called dependency manager takes care of managing waiting jobs and
      keeping track of their status.
      
      Unittests are included.

      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
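The register-and-notify flow described in this commit message could look roughly like the following. This is a minimal sketch under assumptions: the class name, the return values, the set of final statuses and both callbacks are invented for illustration, and locking is left out entirely.

```python
# Assumed set of final job statuses, for illustration only.
FINAL_STATUSES = frozenset(["success", "error", "canceled"])


class DepManager:
    """Hypothetical dependency manager tracking waiting jobs."""

    CONTINUE = "continue"          # dependency satisfied, keep processing
    WAIT = "wait"                  # registered as a dependant
    WRONG_STATUS = "wrongstatus"   # dependency finished in another status

    def __init__(self, get_status_fn, enqueue_fn):
        self._get_status = get_status_fn
        self._enqueue = enqueue_fn
        self._waiters = {}  # job id -> set of job ids waiting on it

    def check_and_register(self, job_id, dep_job_id, wanted):
        status = self._get_status(dep_job_id)
        if status in wanted:
            return self.CONTINUE
        if status in FINAL_STATUSES:
            # The referenced job can no longer reach a wanted status.
            return self.WRONG_STATUS
        self._waiters.setdefault(dep_job_id, set()).add(job_id)
        return self.WAIT

    def notify_waiters(self, job_id):
        # Called when a job finishes: hand its dependants back to the pool.
        waiting = self._waiters.pop(job_id, set())
        if waiting:
            self._enqueue(sorted(waiting))
```

A job that receives `WAIT` simply leaves the processor; it is evaluated again once `notify_waiters` re-adds it and the dependency check is repeated.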
  5. 03 Jun, 2011 1 commit
  6. 31 May, 2011 1 commit
    • jqueue: Fix potential race condition when cancelling queued jobs · 66bd7445
      Michael Hanselmann authored
      
      
      When a job was cancelled, its status would be changed and the file
      written again. Since this was a final status, the job file could be
      moved anytime for archival. If the job was still in the queue, however,
      it would be processed (not fully, just updating the “end_timestamp”
      attribute) and written again. This was bad as it could leave the same
      job in two different files.
      
      With this patch the processor is changed to return early for finished
      jobs. Cancelling a queued job will finalize it right away. Unittests are
      updated.

      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
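The two behaviours introduced by this commit (finalizing a queued job on cancel, and returning early for finished jobs) can be sketched as follows. The `Job` class, the status strings and `write_fn` are assumptions for illustration; the real processor handles many more cases.

```python
import time

# Assumed set of final job statuses, for illustration only.
FINAL_STATUSES = frozenset(["canceled", "success", "error"])


class Job:
    """Hypothetical minimal job object."""

    def __init__(self):
        self.status = "queued"
        self.end_timestamp = None


def cancel_job(job, write_fn):
    """Cancel a queued job and finalize it right away."""
    if job.status != "queued":
        return False
    job.status = "canceled"
    job.end_timestamp = time.time()
    write_fn(job)  # the only write for a cancelled, still-queued job
    return True


def process_job(job, write_fn):
    """Process a job, returning early if it is already finished."""
    if job.status in FINAL_STATUSES:
        # Do not touch or rewrite the job; its file may already be archived.
        return True
    job.status = "running"
    write_fn(job)
    job.status = "success"
    job.end_timestamp = time.time()
    write_fn(job)
    return True
```

Because the processor no longer modifies a finished job, a cancelled job that is still sitting in the queue is written exactly once.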
  7. 25 Mar, 2011 1 commit
  8. 18 Jan, 2011 1 commit
  9. 29 Dec, 2010 1 commit
  10. 15 Dec, 2010 2 commits
    • jqueue: Keep jobs in “waitlock” while returning to queue · 5fd6b694
      Michael Hanselmann authored
      
      
      Iustin Pop reported that a job's file is updated many times while it
      waits for locks held by other thread(s). After an investigation it was
      concluded that the reason was a design decision for job priorities to
      return jobs to the “queued” status if they couldn't acquire all locks.
      Changing a job's status or priority requires an update to permanent
      storage.
      
      In a high-level view this is what happens:
      1. Mark as waitlock
      2. Write to disk as permanent storage (jobs left in this state by a
         crashing master daemon are resumed on restart)
      3. Wait for lock (assume lock is held by another thread)
      4. Mark as queued
      5. Write to disk again
      6. Return to workerpool
      
      Another option originally discussed was to leave the job in the
      “waitlock” status. Ignoring priority changes, this is what would happen:
      1. If not in waitlock
      1.1. Assert state == queued
      1.2. Mark as waitlock
      1.3. Set start_timestamp
      1.4. Write to disk as permanent storage
      3. Wait for locks (assume lock is held by another thread)
      4. Leave in waitlock
      5. Return to workerpool
      
      Now let's assume the lock is released by the other thread:
      […]
      3. Wait for locks and get them
      4. Assert state == waitlock
      5. Set state to running
      6. Set exec_timestamp
      7. Write to disk
      
      This change reduces the number of writes from two per lock acquire
      attempt to two per opcode, plus one per priority increase (which
      happens after 24 acquire attempts, see
      mcpu._CalculateLockAttemptTimeouts, until the highest priority is
      reached). This patch implements it. Unittests are updated.

      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
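The write counts claimed in this commit message can be checked with a small simulation of the two step lists. The function below is a sketch written for this purpose, not code from the patch; it ignores priority changes, as the message does, and `keep_waitlock` selects between the old and the new behaviour.

```python
def count_writes(failed_attempts, keep_waitlock):
    """Count job file writes for one opcode.

    failed_attempts: lock acquire attempts that fail before one succeeds.
    keep_waitlock: True for the new behaviour (stay in "waitlock"),
                   False for the old one (return to "queued").
    """
    status = "queued"
    writes = 0
    for attempt in range(failed_attempts + 1):
        got_locks = attempt == failed_attempts
        if status != "waitlock":
            assert status == "queued"
            status = "waitlock"
            writes += 1  # write to disk as permanent storage
        if got_locks:
            assert status == "waitlock"
            status = "running"
            writes += 1  # write the "running" state
        elif not keep_waitlock:
            status = "queued"
            writes += 1  # old behaviour: write again before re-queueing
    return writes
```

With three failed attempts, the old behaviour yields eight writes (two per attempt), while the new behaviour stays at two writes for the opcode regardless of how often the locks are unavailable.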
    • Improve jqueue unittests · ebb2a2a3
      Michael Hanselmann authored
      
      
      - Verify job file updates
      - Ensure queue lock is released while executing opcode

      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
  11. 12 Oct, 2010 1 commit
  12. 24 Sep, 2010 2 commits
  13. 23 Sep, 2010 1 commit
  14. 22 Sep, 2010 1 commit
  15. 20 Sep, 2010 1 commit
    • jqueue: Change model from per-job to per-opcode processing · be760ba8
      Michael Hanselmann authored
      
      
      In order to support priorities, the processing of jobs needs to be
      changed. Instead of processing jobs as a whole, the code is changed to
      process one opcode at a time and then return to the queue. See the
      Ganeti 2.3 design document for details.
      
      This patch does not yet use priorities for acquiring locks.
      
      The enclosed unittests increase the test coverage of jqueue.py from
      about 34% to 58%. Please note that they also test some parts not added
      by this patch, but testing them became only possible with some
      infrastructure added by this patch. For the first time, many
      implications and assumptions for the job queue are codified in these
      unittests.

      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
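To show why per-opcode processing is a prerequisite for priorities, here is a minimal sketch of a queue in which a job runs one opcode and then returns to the queue. The function name, the tuple layout and the convention that a lower number means higher priority are assumptions for illustration; the actual design is in the Ganeti 2.3 design document referenced in the commit message.

```python
import heapq


def run_queue(jobs):
    """Run jobs one opcode at a time.

    jobs: list of (priority, name, opcodes) with non-empty opcode lists.
    Returns the (name, opcode) pairs in execution order.
    """
    # The running counter breaks ties, so equal-priority jobs take turns
    # and the heap never has to compare names or opcode lists.
    heap = [(prio, idx, name, list(ops))
            for idx, (prio, name, ops) in enumerate(jobs)]
    heapq.heapify(heap)
    counter = len(heap)
    order = []
    while heap:
        prio, _, name, ops = heapq.heappop(heap)
        order.append((name, ops.pop(0)))  # process exactly one opcode
        if ops:
            # Return the job to the queue behind its equal-priority peers.
            heapq.heappush(heap, (prio, counter, name, ops))
            counter += 1
    return order
```

Because a job gives up the worker after every opcode, a higher-priority job submitted in the meantime gets picked up at the next opcode boundary instead of waiting for the whole job to finish.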
  16. 16 Sep, 2010 1 commit
  17. 13 Sep, 2010 1 commit
  18. 07 Sep, 2010 1 commit
  19. 15 Jul, 2010 1 commit
    • jqueue: Factorize code waiting for job changes · 989a8bee
      Michael Hanselmann authored
      
      
      By splitting the _WaitForJobChangesHelper class into multiple smaller
      classes, we gain in several places:
      
      - Simpler code, less interaction between functions and variables
      - Easy to unittest (close to 100% coverage)
      - Waiting for job changes has no direct knowledge of the queue anymore
        (it doesn't reference queue functions anymore, especially not private
        ones)
      - Activate inotify only if there was no change at the beginning (and
        check again right away to avoid race conditions)

      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
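The last point of the list, checking first, enabling the watcher only when nothing changed, and then checking again right away, follows a general pattern that can be sketched as below. The function and its three callbacks are hypothetical; in particular, no real inotify API is used here, and `setup_watch_fn` merely stands in for activating it.

```python
import time


def wait_for_change(check_fn, setup_watch_fn, wait_fn, timeout):
    """Wait until check_fn() returns something other than None.

    check_fn:       returns the change, or None if nothing changed.
    setup_watch_fn: activates the (comparatively expensive) watcher.
    wait_fn:        blocks for at most the given number of seconds.
    """
    # Cheap path: a change may already be visible.
    result = check_fn()
    if result is not None:
        return result

    # Only now activate the watcher ...
    setup_watch_fn()

    # ... and check again right away: a change between the first check
    # and the watcher becoming active would otherwise go unnoticed.
    result = check_fn()
    if result is not None:
        return result

    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return None
        wait_fn(remaining)
        result = check_fn()
        if result is not None:
            return result
```

Since the function only receives callables, it needs no knowledge of the queue and can be unit-tested with simple stubs.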