  1. Jul 20, 2011
    • jqueue: Add “writable” flag to memory objects · c0f6d0d8
      Michael Hanselmann authored
      
      Basically only one instance of the job, the one being processed,
      should be serialized to disk and replicated to other nodes. With
      this flag assertions can be added in various places.
      
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
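
      As an illustration, a minimal sketch (with made-up names, not Ganeti's
      actual jqueue code) of how such a flag can back assertions around
      serialization:

      class QueuedJob(object):
        """Much-reduced stand-in for an in-memory job object.

        Only the instance being processed is created with writable=True; any
        other copy, e.g. one loaded just to answer a query, stays read-only.
        """
        def __init__(self, job_id, writable=False):
          self.id = job_id
          self.ops = []
          self.writable = writable

        def Serialize(self):
          # The assertion catches code paths that try to write a read-only
          # copy back to disk or replicate it to other nodes
          assert self.writable, "Can't serialize read-only job instance"
          return {"id": self.id, "ops": self.ops}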
    • Implement chained jobs · b95479a5
      Michael Hanselmann authored
      
      An overview is available in the design document for this change,
      doc/design-chained-jobs.rst.
      
      When a job enters the job processor, the current opcode's dependencies
      are evaluated. If a referenced job has not yet reached the desired
      status, the current job is registered as a dependant. The job processor
      will continue to work on other pending tasks. When a job finishes it
      notifies any pending dependants by re-adding them to the workerpool.
      
      A per-job processor lock is necessary for rare cases where the same job
      can be re-added twice.
      
      There is no way to view waiting jobs at the moment, but I plan to
      export this information to “gnt-debug locks”.
      
      A so-called dependency manager takes care of managing waiting jobs and
      keeping track of their status.
      
      Unittests are included.
      
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
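
      A rough sketch of the mechanism described above, with hypothetical names
      and signatures (the real dependency manager lives in Ganeti's jqueue and
      is considerably more involved):

      class JobDependencyManager(object):
        """Tracks jobs waiting for other jobs and re-queues them on completion."""

        def __init__(self, workerpool):
          self._workerpool = workerpool
          self._waiters = {}   # dependency job id -> set of waiting jobs

        def CheckAndRegister(self, job, dep_job_id, wanted_status, dep_status):
          # Called by the job processor while evaluating an opcode's dependencies
          if dep_status in wanted_status:
            return "continue"                    # dependency already satisfied
          self._waiters.setdefault(dep_job_id, set()).add(job)
          return "wait"                          # processor picks up other work

        def NotifyWaiters(self, dep_job_id):
          # Called when a job finishes; pending dependants go back to the pool
          for job in self._waiters.pop(dep_job_id, set()):
            self._workerpool.AddTask(job)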
  2. May 31, 2011
    • jqueue: Fix potential race condition when cancelling queued jobs · 66bd7445
      Michael Hanselmann authored
      
      When a job was cancelled, its status would be changed and the file
      written again. Since this was a final status, the job file could be
      moved anytime for archival. If the job was still in the queue, however,
      it would be processed (not fully, just updating the “end_timestamp”
      attribute) and written again. This was bad as it could leave the same
      job in two different files.
      
      With this patch the processor is changed to return early for finished
      jobs. Cancelling a queued job will finalize it right away. Unittests are
      updated.
      
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
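
      The fix boils down to an early return for already-finalized jobs; a
      hedged sketch (status values and function names are illustrative):

      # Final job statuses; a job in one of these states must not be written
      # again, since its file may already have been moved to the archive
      JOB_STATUS_FINALIZED = frozenset(["success", "error", "canceled"])

      def ProcessJob(job):
        if job.CalcStatus() in JOB_STATUS_FINALIZED:
          # E.g. the job was cancelled while still queued and finalized right
          # away; touching it now could leave the job in two different files
          return
        # ... normal opcode processing and rewriting of the job file follows ...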
  3. May 30, 2011
    • gnt-node migrate: Use LU-generated jobs · b7a1c816
      Michael Hanselmann authored
      
      Until now LUNodeMigrate used multiple tasklets to evacuate all primary
      instances on a node. In some cases it would acquire all node locks,
      which isn't good on big clusters. With upcoming improvements to the LUs
      for instance failover and migration, switching to separate jobs looks
      like a better option. This patch changes LUNodeMigrate to use
      LU-generated jobs.
      
      While working on this patch, I identified a race condition in
      LUNodeMigrate.ExpandNames. A node's instances were retrieved without a
      lock and no verification was done.
      
      For RAPI, a new feature string is added and can be used to detect
      clusters which support more parameters for node migration. The client
      is updated.
      
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
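
      Schematically, the LU now emits one small job per primary instance
      instead of doing all the work itself; a sketch under assumed names (the
      opcode is shown as a plain dict rather than a real Ganeti opcode object):

      def MakeNodeMigrationJobs(get_primary_instances, node_name):
        """Build one single-opcode migration job per primary instance."""
        jobs = []
        for instance_name in get_primary_instances(node_name):
          # Each job is scheduled independently, so the lock footprint per job
          # is limited to the nodes involved in that one migration
          jobs.append([{"OP_ID": "OP_INSTANCE_MIGRATE",
                        "instance_name": instance_name}])
        return jobs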
  4. May 13, 2011
    • SharedLock: Implement downgrade from exclusive to shared mode · 3dbe3ddf
      Michael Hanselmann authored
      
      If a job needs to modify a resource and then wait for a result, it must
      acquire the resource lock in exclusive mode. In some cases it would be
      possible to only have a shared lock for waiting. Until now it was not
      possible to change a lock's mode once it'd been acquired. Releasing and
      re-acquiring might have been possible, but would require many more
      checks and can introduce new issues.
      
      With this patch a new method, named “downgrade”, is added to Ganeti's
      own SharedLock class. It can only be called when the lock is held in
      exclusive mode and changes it to shared. If there are any pending shared
      acquires on the same priority, they're moved to the front of the queue
      and notified (jumping ahead of exclusive acquires).
      
      In a lockset the internal lock will be downgraded if, and only if, all
      individual locks owned by the current thread are either released or
      acquired in shared mode.
      
      Unittests are provided.
      
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
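
      A very reduced sketch of the downgrade semantics (this is not Ganeti's
      locking.SharedLock, which additionally handles acquire priorities and
      moves pending shared acquires to the front of the queue):

      import threading

      class MiniSharedLock(object):
        def __init__(self):
          self._cond = threading.Condition()
          self._exclusive = False
          self._shared = 0

        def acquire(self, shared=False):
          with self._cond:
            if shared:
              while self._exclusive:
                self._cond.wait()
              self._shared += 1
            else:
              while self._exclusive or self._shared:
                self._cond.wait()
              self._exclusive = True

        def downgrade(self):
          # Only valid while holding the lock in exclusive mode; ownership is
          # kept, the mode becomes shared and waiting shared acquirers wake up
          with self._cond:
            assert self._exclusive, "downgrade() requires exclusive mode"
            self._exclusive = False
            self._shared = 1
            self._cond.notify_all()

        def release(self):
          with self._cond:
            if self._exclusive:
              self._exclusive = False
            else:
              self._shared -= 1
            self._cond.notify_all()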
  5. Apr 06, 2011
    • Increase the lock timeouts before we block-acquire · d385a174
      Iustin Pop authored
      
      This has been observed to cause problems on real clusters via the
      following mechanism:
      
      - a long job (e.g. a replace-disks) is keeping an exclusive lock on an
        instance
      - the watcher starts and submits its query instances opcode which
        wants shared locks for all instances
      - after about an hour, the watcher job falls back to blocking acquire,
        after having acquired all other locks
      - any instance opcode that wants an exclusive lock for an instance
        cannot start until the watcher has finished, even though there's no
        actual operation on that instance
      
      To alleviate this problem, we simply increase the maximum timeout after
      which lock acquires fall back to either a blocking acquire or a priority
      increase. The timeout is computed such that we wait roughly ten hours
      (instead of one) before this happens, which should be within the
      maximum lifetime of a reasonable opcode on a healthy cluster. The
      timeout also means that priority increases will happen every half hour.
      
      We also increase the maximum wait interval to 15 seconds; otherwise the
      longer overall timeout would result in too many retries.
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
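
      A back-of-the-envelope sketch of the arithmetic (the backoff parameters
      below are illustrative, not the exact constants used by Ganeti's
      lock-attempt timeout strategy):

      def attempts_until_blocking(total=10 * 3600.0, start=1.0, factor=1.5,
                                  cap=15.0):
        """Count capped-backoff acquire attempts needed to reach 'total' seconds."""
        waited, attempts, delay = 0.0, 0, start
        while waited < total:
          waited += delay
          attempts += 1
          delay = min(delay * factor, cap)   # exponential backoff, capped at 15 s
        return attempts

      # With a 15-second cap, reaching ~10 hours takes on the order of 2,400
      # opportunistic attempts, with priority increases roughly every half hour
      print(attempts_until_blocking())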
    • utils: Add function generating regex for DNS name globbing · bbfed756
      Michael Hanselmann authored
      
      The intent of this function is to provide a globbing operator for query
      filters. One should be able to say, for example, something to the
      effect of “gnt-instance shutdown '*.site'”.
      
      Also rename a variable in MatchNameComponent.
      
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
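
      A hedged sketch of the glob-to-regex translation (not Ganeti's actual
      helper; '*' is taken to match any run of characters and '?' exactly one):

      import re

      def GlobPatternToRegex(pattern):
        parts = []
        for ch in pattern:
          if ch == "*":
            parts.append(".*")
          elif ch == "?":
            parts.append(".")
          else:
            parts.append(re.escape(ch))
        return re.compile("^%s$" % "".join(parts), re.IGNORECASE)

      # Example: the pattern from “gnt-instance shutdown '*.site'”
      rx = GlobPatternToRegex("*.site")
      assert rx.match("web1.site") and not rx.match("web1.example")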