  1. Apr 06, 2011
    • Increase the lock timeouts before we block-acquire · d385a174
      Iustin Pop authored
      
      The previous one-hour timeout has been observed to cause problems on
      real clusters via the following mechanism:
      
      - a long job (e.g. a replace-disks) is keeping an exclusive lock on an
        instance
      - the watcher starts and submits its query-instances opcode, which
        wants shared locks on all instances
      - after about an hour, the watcher job falls back to a blocking
        acquire, having already acquired all the other instance locks
      - any opcode that wants an exclusive lock on one of those other
        instances cannot start until the watcher has finished, even though
        no actual operation is running on that instance
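
A minimal sketch of the timed-then-blocking pattern described above. This is a simplified model, not the project's actual locking code: `acquire_all` is a hypothetical helper, plain `threading.Lock` objects stand in for shared/exclusive instance locks, and each timeout is applied per lock rather than per locking level. What it illustrates is the asymmetry behind the bug: a timed attempt releases everything it took when it fails, whereas the blocking fallback keeps the earlier locks held while it waits on the busy one.

```python
import threading
from typing import Sequence


def acquire_all(locks: Sequence, timeouts: Sequence[float]) -> int:
    """Acquire every lock in `locks`, in order (hypothetical helper).

    Each timed attempt waits at most `timeout` seconds per lock; on failure
    it releases whatever it took, so other jobs can run between attempts.
    Once `timeouts` is exhausted, fall back to a blocking acquire, which
    keeps already-held locks while waiting. Returns the number of timed
    attempts that failed.
    """
    for attempt, timeout in enumerate(timeouts):
        held = []
        for lock in locks:
            if not lock.acquire(timeout=timeout):
                break
            held.append(lock)
        else:
            # Every lock was acquired within this attempt.
            return attempt
        # Timed out: drop partial acquisitions before the next attempt.
        for lock in reversed(held):
            lock.release()
    # Blocking fallback: earlier locks stay held while waiting on later ones,
    # which is what stalls unrelated jobs needing those earlier locks.
    for lock in locks:
        lock.acquire()
    return len(timeouts)
```

With one lock held elsewhere, the timed attempts leave the other locks free between tries; only after the fallback does the caller sit on the first lock while waiting for the busy one.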
      
      To alleviate this problem, we increase the total time spent on timed
      lock-acquire attempts before falling back to either a blocking
      acquire or a priority increase. The timeout is computed so that we
      wait ~10 hours (instead of one) before the blocking acquire; a
      reasonable opcode on a healthy cluster should finish well within
      that time. With this timeout, priority increases also happen roughly
      every half hour.
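
The timeout arithmetic can be sketched under stated assumptions: timed attempts are grouped into rounds of roughly half an hour, each round ends in a priority increase, and per-attempt timeouts grow geometrically up to the 15-second cap. Only the cap and the half-hour round length come from the text above; the minimum wait, the growth factor, the helper names, and the number of rounds (20, picked only so that 20 half-hour rounds add up to the ~10 hours quoted above) are hypothetical.

```python
from typing import List

# Hypothetical tuning values: only the 15-second cap and the half-hour
# round length come from the commit message; the rest are assumptions.
MIN_WAIT = 1.0          # seconds, assumed
MAX_WAIT = 15.0         # seconds, per the commit message
GROWTH = 1.05           # assumed per-attempt growth factor
ROUND_BUDGET = 1800.0   # ~half an hour of timed attempts per round


def lock_attempt_timeouts(budget: float = ROUND_BUDGET,
                          min_wait: float = MIN_WAIT,
                          max_wait: float = MAX_WAIT,
                          growth: float = GROWTH) -> List[float]:
    """Per-attempt timeouts for one round: grow geometrically, clamp to
    [min_wait, max_wait], and stop once their sum reaches `budget`."""
    result = [min_wait]
    total = min_wait
    while total < budget:
        timeout = min(max(result[-1] * growth, min_wait), max_wait)
        result.append(timeout)
        total += timeout
    return result


def escalation_times(timeouts: List[float], rounds: int) -> List[float]:
    """Worst-case elapsed seconds at the end of each round, assuming every
    timed attempt in every round times out."""
    per_round = sum(timeouts)
    return [per_round * (k + 1) for k in range(rounds)]
```

Under these assumptions, `escalation_times(lock_attempt_timeouts(), 20)[-1] / 3600` gives the worst-case number of hours before the blocking fallback, about 10.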
      
      We also increase the maximum wait per attempt to 15 seconds;
      otherwise the much longer total timeout would mean too many retries.
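
Why the per-attempt cap matters follows from simple arithmetic: since no single timed attempt waits longer than the cap, spending ~10 hours in timed attempts needs at least `ceil(total / cap)` attempts. `min_timed_attempts` is an illustrative name, and any cap other than 15 seconds used for comparison is hypothetical, since the previous value is not stated here.

```python
import math

TEN_HOURS = 10 * 3600.0  # total timed-wait window from the commit message


def min_timed_attempts(total_wait: float, max_wait: float) -> int:
    """Lower bound on the number of timed lock attempts needed to spend
    `total_wait` seconds waiting when no attempt may exceed `max_wait`."""
    return math.ceil(total_wait / max_wait)
```

For instance, `min_timed_attempts(TEN_HOURS, 15.0)` is 2400, while a hypothetical 5-second cap would need 7200 attempts.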
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
  2. Feb 28, 2011
  3. Jan 10, 2011
  4. Dec 15, 2010
  5. Dec 13, 2010
  6. Dec 08, 2010
  7. Dec 07, 2010
  8. Dec 01, 2010
  9. Nov 29, 2010
  10. Nov 16, 2010
  11. Oct 12, 2010
  12. Sep 24, 2010
  13. Sep 23, 2010
  14. Sep 13, 2010
  15. Jul 15, 2010
  16. Jul 12, 2010
  17. Jun 23, 2010
  18. May 18, 2010
  19. Feb 22, 2010
  20. Jan 25, 2010
  21. Jan 13, 2010
  22. Jan 04, 2010
  23. Dec 28, 2009
  24. Nov 06, 2009
  25. Nov 02, 2009
  26. Oct 15, 2009
  27. Oct 13, 2009
  28. Oct 12, 2009
  29. Sep 17, 2009
  30. Sep 15, 2009