1. 10 May, 2011 2 commits
  2. 02 May, 2011 1 commit
    • Cluster verify: check for missing bridges · 20d317d4
      Iustin Pop authored
      
      
      Currently, cluster verify doesn't check bridge information; the
      only checks are done at instance create and failover/migrate
      time. This means a cluster that seems healthy can still fail
      instance creation jobs.
      
      This patch implements a simple verification that all nodes have all
      the required bridges: the default one plus any instance bridge. The
      check covers the entire cluster, so it doesn't work well for
      multi-group clusters.
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
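
The check described in this commit can be illustrated with a minimal sketch. It is not the code from the patch: the function names and the shape of the input (a mapping from node name to the bridges present on that node) are assumptions made for illustration only.

```python
def required_bridges(default_bridge, instance_bridges):
    """Return the set of bridges every node is expected to have.

    Hypothetical helper: the default bridge plus any bridge used by
    an instance, as described in the commit message.
    """
    return {default_bridge} | set(instance_bridges)


def find_missing_bridges(node_bridges, required):
    """Map each node to the sorted list of required bridges it lacks.

    node_bridges: dict of node name -> iterable of bridge names present
    (an assumed input shape). Nodes with nothing missing are omitted.
    """
    missing = {}
    for node, present in node_bridges.items():
        absent = sorted(required - set(present))
        if absent:
            missing[node] = absent
    return missing
```

With `{"node1": ["br0", "br1"], "node2": ["br0"]}` and instances using `br1`, only `node2` would be reported, which is the kind of problem that would otherwise surface at instance creation time.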
  3. 28 Apr, 2011 1 commit
  4. 06 Apr, 2011 1 commit
    • Increase the lock timeouts before we block-acquire · d385a174
      Iustin Pop authored
      
      
      The current, shorter timeouts have been observed to cause problems on
      real clusters via the
      following mechanism:
      
      - a long job (e.g. a replace-disks) is keeping an exclusive lock on an
        instance
      - the watcher starts and submits its query instances opcode which
        wants shared locks for all instances
      - after about an hour, the watcher job falls back to blocking acquire,
        after having acquired all other locks
      - any instance opcode that wants an exclusive lock for an instance
        cannot start until the watcher has finished, even though there's no
        actual operation on that instance
      
      In order to alleviate this problem, we simply increase the maximum
      timeout before lock acquires fall back to either blocking acquire or
      priority increase. The timeout is computed such that we wait ~10 hours
      (instead of one) for this to happen, which should be within the
      maximum lifetime of a reasonable opcode on a healthy cluster. The
      timeout also means that priority increases will happen every half hour.
      
      We also increase the max wait interval to 15 seconds, otherwise we'd
      have too many retries with the increased interval.
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
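
To make the trade-off between total timeout and per-attempt wait more concrete, here is a rough sketch of a retry schedule. It is not the formula used in the patch: the function name, the 1-second starting wait and the 5% growth factor are assumptions. Only the ~10 hour total and the 15 second maximum wait come from the commit message.

```python
def build_timeouts(total=10 * 3600.0, min_wait=1.0, max_wait=15.0,
                   growth=1.05):
    """Build a list of per-attempt lock timeouts (in seconds).

    Each attempt waits a little longer than the previous one, capped
    at max_wait, until the waits add up to at least `total`. After
    the last entry a caller would fall back to a blocking acquire.
    min_wait and growth are illustrative values, not Ganeti's.
    """
    timeouts = [min_wait]
    elapsed = min_wait
    while elapsed < total:
        # Grow the previous wait, but keep it within [min_wait, max_wait].
        next_wait = min(max(timeouts[-1] * growth, min_wait), max_wait)
        timeouts.append(next_wait)
        elapsed += next_wait
    return timeouts
```

A lower cap on the per-attempt wait means more retries are needed to cover the same total time, which is the reason the commit gives for raising the maximum wait along with the overall timeout.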
  5. 24 Feb, 2011 1 commit
  6. 18 Feb, 2011 1 commit
  7. 03 Feb, 2011 1 commit
    • Bump up intra-cluster import connect timeout · 81635b5a
      Iustin Pop authored
      
      
      Currently, the export timeout is 10 times 20 seconds, but the import
      is only 30 seconds. I'm raising this to 60 seconds with two goals in
      mind:
      
      - when debugging manually, this allows for easier synchronisation of
        the processes
      - 60 seconds equals 3 full 20-second intervals, which I think is
        better than just one and a half
      
      This change shouldn't make a big difference either way (at most, it
      will possibly delay the job in case of failures by half a minute).
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
  8. 28 Jan, 2011 1 commit
  9. 27 Jan, 2011 1 commit
    • cluster verify: add hvparams verification · 58a59652
      Iustin Pop authored
      
      
      Currently, the validity of the hypervisor parameters is only checked
      at init/modification time, and not in the cluster verify. This is bad,
      as it can lead to inconsistent state that is only detected when the
      next modification (which can be unrelated) is made, leading to
      unexpected error messages.
      
      This patch adds both syntax verification (in masterd) and validity
      verification on remote nodes. The downside of the patch is that on
      clusters with many instances which have custom parameters, it will be
      slow. A possible improvement would be to detect duplicate, identical
      sets of parameters and collapse these into a single verification, but
      that is left as a TODO (in case it becomes problematic).
      
      An additional change is in utils.ForceDict, where the message said
      'key'; since this function is always used with parameter dicts, I
      changed it to "Unknown parameter".
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
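
The improvement left as a TODO in this commit (collapsing identical parameter sets into a single verification) could look roughly like the sketch below. This is an illustration of the idea only, not code from Ganeti: the function name and the input shape (pairs of owner name and parameter dict, with hashable values) are assumptions.

```python
def group_identical_params(param_sets):
    """Group owners (e.g. instance names) by identical parameter dicts.

    param_sets: iterable of (owner, params) pairs, where params is a
    dict with hashable values (an assumed input shape).
    Returns a list of (params, owners) pairs, so that each distinct
    parameter set needs to be verified only once.
    """
    groups = {}
    for owner, params in param_sets:
        # A sorted tuple of items is a hashable, order-independent key.
        key = tuple(sorted(params.items()))
        groups.setdefault(key, []).append(owner)
    return [(dict(key), owners) for key, owners in groups.items()]
```

With three instances of which two share the same custom parameters, this yields two groups, so only two verifications would be needed instead of three.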
  10. 21 Jan, 2011 1 commit
  11. 20 Jan, 2011 3 commits
  12. 18 Jan, 2011 1 commit
  13. 14 Jan, 2011 1 commit
  14. 07 Jan, 2011 1 commit
  15. 06 Jan, 2011 2 commits
  16. 05 Jan, 2011 2 commits
  17. 21 Dec, 2010 1 commit
    • Allow customisation of the disk index separator · 3536c792
      Iustin Pop authored
      
      
      As per issue 124, some Xen versions (or packaging) don't deal nicely
      with the colon being part of a disk name. Therefore we add a
      configure-time option for customising this.
      
      Note: setting the separator to interesting values like / is not
      handled by the code. This being a configure-time option (e.g. to be
      set by distribution packagers), we assume the person building the code
      knows what they are doing.
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
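
As a rough illustration of what a configurable separator means for disk names, consider the sketch below. The constant and function names are hypothetical, and the guard against "/" is an addition made for this example: the commit message states that such values are not handled by the actual code.

```python
# Hypothetical stand-in for the value chosen at configure time.
DISK_SEPARATOR = ":"


def disk_name(base, index, separator=DISK_SEPARATOR):
    """Join a base name and a disk index using the given separator.

    The check below is illustrative only; per the commit message the
    real code does not validate the separator.
    """
    if not separator or "/" in separator:
        raise ValueError("unsupported disk separator: %r" % (separator,))
    return "%s%s%d" % (base, separator, index)
```

For example, `disk_name("instance1", 0)` gives `instance1:0`, while a build configured with `-` as the separator would produce `instance1-0`, avoiding the colon that some Xen versions handle badly.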
  18. 14 Dec, 2010 2 commits
  19. 13 Dec, 2010 2 commits
  20. 08 Dec, 2010 4 commits
  21. 07 Dec, 2010 1 commit
  22. 01 Dec, 2010 2 commits
  23. 29 Nov, 2010 2 commits
  24. 27 Nov, 2010 1 commit
    • Adding blockdev_prefix to hypervisor options · 525011bc
      Maciej Bliziński authored
      
      
      Allows installing Red Hat based systems, for example Oracle Linux.
      Tested with OEL.
      
      The hypervisor by default offers a device named 'sda'. If the SCSI
      module is already loaded, the disk device can't be created due to a
      naming conflict, and the disk is not available. A workaround is to
      modify the initrd by removing the SCSI driver from it. This helps, but
      doesn't allow installing the OS.
      
      Red Hat's installer, anaconda, runs parted, which tries to execute a
      check against /dev/sda and fails. This makes anaconda think that the
      disk is faulty and not available. The best way to work around this is
      to declare 'xvda' as the Xen disk device. Red Hat's version of the
      parted package contains a patch which makes parted skip the SCSI test
      if the device name starts with 'xvd'.
      
      This patch allows passing -H xen-pvm:blockdev_prefix="xvd" and
      successfully running the Red Hat installer.
      
      Signed-off-by: Maciej Bliziński <blizinski@google.com>
      [iustin@google.com: added the new parameter to XenHvm PARAMS]
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
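
The effect of the prefix on the device names offered to the guest can be sketched as follows. This is a simplified illustration, not the code from the patch: the function name is hypothetical, and the "sd" default is inferred from the 'sda' device mentioned in the commit message.

```python
import string


def device_names(prefix, count):
    """Return `count` block device names such as 'xvda', 'xvdb', ...

    prefix: the blockdev_prefix value, e.g. "sd" or "xvd".
    Only single-letter suffixes are generated in this sketch.
    """
    if not 0 <= count <= len(string.ascii_lowercase):
        raise ValueError("count must be between 0 and 26")
    return [prefix + letter for letter in string.ascii_lowercase[:count]]
```

With the prefix set to "xvd", the first disk becomes `xvda` instead of `sda`, which is the name the patched parted recognises and for which it skips the SCSI test.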
  25. 25 Nov, 2010 1 commit
  26. 22 Nov, 2010 1 commit
  27. 19 Nov, 2010 1 commit
  28. 10 Nov, 2010 1 commit