1. 21 Jul, 2011 2 commits
  2. 20 Jul, 2011 2 commits
    • jqueue: Add “writable” flag to memory objects · c0f6d0d8
      Michael Hanselmann authored
      
      
      Only one instance of a job, the one being processed, should be
      serialized to disk and replicated to other nodes. With this flag,
      assertions enforcing that can be added in various places.
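      A minimal sketch of the idea in Python, using a hypothetical `MemoryJob`
      class and `update_job_on_disk` helper rather than the real jqueue code:
      copies loaded only for queries carry `writable=False`, and serializing
      one of them trips an assertion.

```python
class MemoryJob:
    """Hypothetical in-memory job object carrying a "writable" flag.

    Only the instance being processed is created with writable=True;
    copies loaded for querying or watching are read-only.
    """

    def __init__(self, job_id, ops, writable):
        self.id = job_id
        self.ops = list(ops)
        self.writable = writable

    def serialize(self):
        # Only the writable instance may be written to disk and replicated
        assert self.writable, "Can't serialize read-only job %s" % self.id
        return {"id": self.id, "ops": self.ops}


def update_job_on_disk(job, storage):
    """Hypothetical helper; a dict stands in for the queue directory."""
    storage[job.id] = job.serialize()
```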
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
    • Implement chained jobs · b95479a5
      Michael Hanselmann authored
      
      
      An overview is available in the design document for this change,
      doc/design-chained-jobs.rst.
      
      When a job enters the job processor, the current opcode's dependencies
      are evaluated. If a referenced job has not yet reached the desired
      status, the current job is registered as a dependant, and the job
      processor continues to work on other pending tasks. When a job
      finishes, it notifies any pending dependants by re-adding them to the
      workerpool.
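      The wait-and-notify cycle can be sketched as follows. `DependencyManager`,
      `check_and_register`, `notify_waiters` and the status strings are
      hypothetical names for illustration; treating a dependency that finished
      with an unwanted status as an error is an assumption of this sketch, not
      something stated in this commit.

```python
import collections

# Hypothetical final statuses; the real queue defines its own constants
FINALIZED = ("success", "error", "canceled")


class DependencyManager:
    """Minimal sketch: tracks jobs waiting for other jobs to finish."""

    def __init__(self, get_status, enqueue):
        self._get_status = get_status  # job_id -> current status string
        self._enqueue = enqueue        # callback re-adding a job to the pool
        # dependency job id -> set of job ids waiting for it
        self._waiters = collections.defaultdict(set)

    def check_and_register(self, job_id, dep_id, wanted):
        """Return True if job_id may continue, else register it as waiting."""
        status = self._get_status(dep_id)
        if status in wanted:
            return True
        if status in FINALIZED:
            # Assumption: dependency ended with a status the job didn't want
            raise RuntimeError("job %s: dependency %s ended as %s"
                               % (job_id, dep_id, status))
        self._waiters[dep_id].add(job_id)
        return False

    def notify_waiters(self, dep_id):
        """Called when dep_id finishes: re-add its dependants to the pool."""
        for waiter in sorted(self._waiters.pop(dep_id, ())):
            self._enqueue(waiter)
```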
      
      A per-job processor lock is necessary for rare cases where the same job
      can be re-added twice.
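      One way to realize such a lock, assuming a plain `threading.Lock` and
      hypothetical `Job`/`process_job` names: if the same job ends up in the
      workerpool twice, the second worker blocks until the first one is done,
      so the two never modify the job at the same time.

```python
import threading


class Job:
    """Hypothetical job object with only the fields this sketch needs."""

    def __init__(self, job_id):
        self.id = job_id
        # Per-job lock serializing all processors working on this job
        self.processor_lock = threading.Lock()
        self.steps_run = 0


def process_job(job, run_one_step):
    """Process a job; a second worker for the same job waits here."""
    with job.processor_lock:
        run_one_step(job)
        job.steps_run += 1
```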
      
      There is no way to view waiting jobs at the moment, but I plan to
      export this information to “gnt-debug locks”.
      
      A so-called dependency manager takes care of managing waiting jobs and
      keeping track of their status.
      
      Unittests are included.
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
  3. 15 Jul, 2011 1 commit
  4. 11 Jul, 2011 1 commit
    • Fix off-by-one bug in job serial generation · 3c88bf36
      Michael Hanselmann authored
      Commit 009e73d0 (September 2009) changed the job queue to generate
      multiple job serials at once. Ever since, it has returned one more
      serial than requested.
      
      The “serial” file in the job queue directory is defined to contain the
      “last job ID used” (design-2.0). After the change above, the serial
      file instead always contained the next serial number. Because the first
      value returned by the generating function was the one stored in the
      file, one job may have been overwritten during the switch in 2009.
      
      This patch changes the code to always return exactly the requested
      number of serials and to keep the last used serial on disk, and adds an
      assertion.
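      A sketch of the corrected allocation, using a hypothetical
      `allocate_serials` function rather than the actual jqueue code:
      numbering starts one past the stored “last job ID used”, exactly
      `count` serials are returned, and the value to write back is the last
      one handed out.

```python
def allocate_serials(last_used, count):
    """Return (new_serials, value_to_store) for `count` new job IDs.

    `last_used` is the content of the "serial" file, i.e. the last job ID
    already handed out; afterwards the file keeps the last *used* serial.
    """
    assert count > 0
    # Start after the stored value so it is never handed out a second time
    serials = list(range(last_used + 1, last_used + 1 + count))
    # Exactly the requested number of serials must be produced
    assert len(serials) == count
    return serials, serials[-1]
```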
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
  5. 10 Jun, 2011 1 commit
  6. 31 May, 2011 1 commit
    • jqueue: Fix potential race condition when cancelling queued jobs · 66bd7445
      Michael Hanselmann authored
      
      
      When a job was cancelled, its status would be changed and the file
      written again. Since this was a final status, the job file could be
      moved for archival at any time. If the job was still in the queue,
      however, it would still be processed (not fully; only its
      “end_timestamp” attribute was updated) and written again. This could
      leave the same job in two different files.
      
      With this patch, the processor returns early for finished jobs, and
      cancelling a queued job finalizes it right away. Unittests are updated.
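      A minimal sketch of both changes, with hypothetical `Job`,
      `cancel_queued` and `process` names and a write counter standing in for
      the job file: the processor refuses to touch a finalized job, so a
      cancelled job is written exactly once.

```python
import time

# Hypothetical final statuses for this sketch
FINALIZED = frozenset(["success", "error", "canceled"])


class Job:
    def __init__(self, job_id, status="queued"):
        self.id = job_id
        self.status = status
        self.end_timestamp = None
        self.writes = 0  # counts simulated writes of the job file

    def write(self):
        self.writes += 1


def cancel_queued(job):
    """Cancel a job that hasn't started yet and finalize it immediately."""
    assert job.status == "queued"
    job.status = "canceled"
    job.end_timestamp = time.time()
    job.write()  # single final write; the file may be archived from now on


def process(job):
    """Hypothetical processor entry point."""
    if job.status in FINALIZED:
        # Return early: never rewrite a job whose file may already be archived
        return False
    job.status = "running"
    job.write()
    return True
```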
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
  7. 10 May, 2011 1 commit
  8. 25 Mar, 2011 1 commit
  9. 23 Mar, 2011 1 commit
  10. 28 Feb, 2011 1 commit
  11. 29 Dec, 2010 1 commit
  12. 15 Dec, 2010 1 commit
    • jqueue: Keep jobs in “waitlock” while returning to queue · 5fd6b694
      Michael Hanselmann authored
      
      
      Iustin Pop reported that a job's file is updated many times while it
      waits for locks held by other threads. An investigation showed that
      the cause was a design decision, made for job priorities, to return
      jobs to the “queued” status if they couldn't acquire all locks.
      Changing a job's status or priority requires an update to permanent
      storage.
      
      At a high level, this is what happens:
      1. Mark as waitlock
      2. Write to disk as permanent storage (jobs left in this state by a
         crashing master daemon are resumed on restart)
      3. Wait for lock (assume lock is held by another thread)
      4. Mark as queued
      5. Write to disk again
      6. Return to workerpool
      
      Another option originally discussed was to leave the job in the
      “waitlock” status. Ignoring priority changes, this is what would happen:
      1. If not in waitlock
      1.1. Assert state == queued
      1.2. Mark as waitlock
      1.3. Set start_timestamp
      1.4. Write to disk as permanent storage
      3. Wait for locks (assume lock is held by another thread)
      4. Leave in waitlock
      5. Return to workerpool
      
      Now let's assume the lock is released by the other thread:
      […]
      3. Wait for locks and get them
      4. Assert state == waitlock
      5. Set state to running
      6. Set exec_timestamp
      7. Write to disk
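      To make the difference in disk writes concrete, here is a small
      simulation of one opcode under both designs. The `simulate` function
      and its parameters are hypothetical; each element of `lock_results` is
      one acquire attempt (False meaning the lock is held by another thread),
      and, as in the second list, priority changes are ignored.

```python
def simulate(lock_results, keep_waitlock):
    """Return (final_status, disk_writes) for a single opcode.

    keep_waitlock=False models the old flow (back to "queued" after each
    failed attempt); keep_waitlock=True models leaving the job in waitlock.
    """
    status = "queued"
    writes = 0
    for acquired in lock_results:
        if status != "waitlock":
            assert status == "queued"
            status = "waitlock"  # start_timestamp would be set here
            writes += 1          # write to disk as permanent storage
        if acquired:
            assert status == "waitlock"
            status = "running"   # exec_timestamp would be set here
            writes += 1
            return status, writes
        if not keep_waitlock:
            status = "queued"    # old flow: mark as queued ...
            writes += 1          # ... and write to disk again
        # In both flows the job now returns to the workerpool
    return status, writes
```

      With two failed attempts the old flow writes the job file six times and
      the waitlock variant twice; in general the old flow needs
      2·failed + 2 writes for the opcode, while the new one stays at two.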
      
      This change reduces the number of writes from two per lock acquire
      attempt to two per opcode plus one per priority increase (a priority
      increase happens after 24 acquire attempts, see
      mcpu._CalculateLockAttemptTimeouts, until the highest priority is
      reached). This patch implements it. Unittests are updated.
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
  13. 12 Oct, 2010 3 commits
  14. 07 Oct, 2010 1 commit
  15. 24 Sep, 2010 3 commits
  16. 23 Sep, 2010 3 commits
  17. 20 Sep, 2010 3 commits
  18. 16 Sep, 2010 1 commit
  19. 13 Sep, 2010 4 commits
  20. 10 Sep, 2010 3 commits
  21. 07 Sep, 2010 2 commits
  22. 24 Aug, 2010 1 commit
  23. 19 Aug, 2010 1 commit
    • jqueue: Remove lock status field · 9bdab621
      Michael Hanselmann authored
      
      
      With the job queue changes for Ganeti 2.2, watched and queried jobs are
      loaded directly from disk, rendering the in-memory “lock_status” field
      useless. Writing it to disk would be possible, but would have a huge
      runtime cost (when tested, processing 1,000 opcodes involved 4,000
      additional writes to job files, even with replication turned off).

      Managing just this field in an additional in-memory dictionary turned
      out to be complicated because of the locking it requires.

      The plan is to introduce a more generic lock debugging mechanism in the
      near future. Hence the field is removed now instead of spending a lot
      of time making it work again.
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
  24. 18 Aug, 2010 1 commit