1. 03 Sep, 2012 1 commit
    • Iustin Pop's avatar
      Fix warnings/errors with newer pylint · 8ad0da1e
      Iustin Pop authored
      
      
      To help developing Ganeti on newer distributions, let's try to fix
      pylint warnings/errors. I'm using pylint from current Debian wheezy:
      pylint 0.25.1, astng 0.23.1, common 0.58.0, and we have 3 things that
      needs fixing.
      
      First, a really wide "except", with the silencing in the wrong
      place. I'm not sure why this doesn't have "except Exception", so let's
      add it. However, pylint still complains about "Catching too general
      exception", even though we do want to catch both system and our
      exception, so let's add a silence for W0703. It's true that we
      shouldn't catch KeyboardInterrupt and friends, but that should be
      cleaned up on the master branch.
      
      Second, pylint complains about "redefining name builtin tuple",
      because we do some pattern matching in the except blocks in
      netutils. This seems to be a false positive, but let's clean the code
      around this.
      
      And finally, type inference again goes bad, so let's silence E1103
      with its "boolean doesn't have 'get' method".
      
      After this, I can run "make lint", and by extension "make
      commit-check" on Debian Wheezy, yay! We might be able to bump our
      required pylint versions to something not ancient…
      Signed-off-by: default avatarIustin Pop <iustin@google.com>
      Reviewed-by: default avatarMichael Hanselmann <hansmi@google.com>
      8ad0da1e
  2. 23 Aug, 2012 1 commit
    • Iustin Pop's avatar
      Bump pep8 version to 1.2 · 5ae4945a
      Iustin Pop authored
      
      
      Debian Wheezy will ship with this version, and it has many improved checks compared to 0.6, so let's:
      
      - bump version in the docs
      - silence some new checks that are wrong due to our indent=2 instead of 4
      - fix lots of errors in the code where the indentation was wrong by 1
        or 2 spaces
      - fix a few cases of == True, False, None and replace with 'is'
      - re-indent some cases where the code is OK, but pep8 complains
      Signed-off-by: default avatarIustin Pop <iustin@google.com>
      Reviewed-by: default avatarRené Nussbaumer <rn@google.com>
      5ae4945a
  3. 22 Aug, 2012 1 commit
    • Constantinos Venetsanopoulos's avatar
      Fix computation of disk sizes in _ComputeDiskSize · 6a3166cb
      Constantinos Venetsanopoulos authored
      
      
      Currently, hail fails with FailDisk when trying to add an instance
      of type: 'file', 'sharedfile' and 'rbd'.
      
      This is due to a "0" or None value in the corresponding dict inside
      _ComputeDiskSize, which results in a "O" or non Int value of the
      exported 'disk_space_total' parameter. This in turn makes hail fail,
      when trying to process the value:
      
       - with "Unable to read Int" if value is None (file)
       - with FailDisk if value is 0 (sharedfile, rbd)
      
      The latter happens because the 0 value doesn't match the instance's
      IPolicy, since it is lower than the minimum disk size.
      
      The second problem still exists when using adoption with 'plain'
      and 'blockdev' template and will be addressed in another commit.
      Signed-off-by: default avatarConstantinos Venetsanopoulos <cven@grnet.gr>
      Signed-off-by: default avatarIustin Pop <iustin@google.com>
      Reviewed-by: default avatarIustin Pop <iustin@google.com>
      6a3166cb
  4. 15 Aug, 2012 1 commit
    • Iustin Pop's avatar
      Add verification of RPC results in _WipeDisks · f08e5132
      Iustin Pop authored
      
      
      Due to an oversight, the pause/resume sync RPC calls in _WipeDisks
      lack the verification of the overall RPC status, and directly iterate
      over the payload. The code actually doing the wipe does verify
      correctly the results. This can result in jobs failing with a hard to
      diagnose:
      
      OpExecError ['NoneType' object is not iterable]
      
      instead of proper "RPC failed" message.
      
      This patch adds a hard check on the pause call, but for the resume
      call it just logs a warning if the RPC failed; the rationale being
      that if we can't contact the node for pausing the sync, it's likely
      wiping will fail too, but after the wipe has been done, we can
      continue.
      Signed-off-by: default avatarIustin Pop <iustin@google.com>
      Reviewed-by: default avatarRené Nussbaumer <rn@google.com>
      f08e5132
  5. 08 Aug, 2012 1 commit
  6. 02 Aug, 2012 1 commit
    • Iustin Pop's avatar
      Fix uses of OpPrereqError without code info · 2cfbc784
      Iustin Pop authored
      
      
      A while back, we did cleanup the code and ensured (manually) that use
      of OpPrereqError includes an errors.ECODE_* field as second
      argument. Since we cannot automate the check for this, it turns out
      that more and more such usage has crept over the years, including in
      the master code (the use on the CLI side is not as important).
      
      Note that this also uncovered a few errors in ovf.py where the errors
      messages were wrongly constructed.
      
      Still looking for a way to automate this check…
      Signed-off-by: default avatarIustin Pop <iustin@google.com>
      Reviewed-by: default avatarGuido Trotter <ultrotter@google.com>
      2cfbc784
  7. 26 Jul, 2012 2 commits
    • Iustin Pop's avatar
      Fix issue in LUClusterVerifyGroup with multi-group clusters · 350506c6
      Iustin Pop authored
      
      
      In case LUClusterVerifyGroup is run on a group which doesn't contain
      the master node, the following could happen:
      
      - master node is selected due to the explicit check
      - if the order of nodes in the 'absent_nodes' list is such that the
        master node is the first in it, then we'll select (again) the master
        node
      - passing duplicate nodes to RPC calls will break due to RPC
        internals; this should be fixed separately, but in the meantime we
        just refrain from passing such duplicates
      
      This patch should not change the semantics of the code, since it
      wasn't guaranteed even before that we find a vm_capable node.
      Signed-off-by: default avatarIustin Pop <iustin@google.com>
      Reviewed-by: default avatarBernardo Dal Seno <bdalseno@google.com>
      350506c6
    • Iustin Pop's avatar
      Fix node group modification of node parameters · 4bf27dab
      Iustin Pop authored
      Commit 904b3bfe
      
       tried to fix the deletion of custom ndparams from
      group, but instead broke both modification and deletion: because we
      run ForceDictType on self.op.ndparams instead of the updated
      new_ndparams, we can neither delete nor set properly spindle_count
      (since it won't be coerced to int).
      Signed-off-by: default avatarIustin Pop <iustin@google.com>
      Reviewed-by: default avatarGuido Trotter <ultrotter@google.com>
      4bf27dab
  8. 25 Jul, 2012 3 commits
  9. 19 Jul, 2012 1 commit
    • René Nussbaumer's avatar
      Fix setting ipolicy on node groups · 8b057218
      René Nussbaumer authored
      
      
      On node groups we don't have the std field. However, the InstancePolicy
      object always verifies that the std value is within a given range. As we
      fill it up with defaults if not set (as it happens to be on node groups)
      and the min value is higher than the default std value (taken from
      constants.py) we fail.
      
      We overcome this situation by simply let the function know if we want to
      verify the std value at all. If we don't want to verify std, we just set
      it to a compliant value (min_v) and continue.
      
      We also slightly adapt the error message provided, as we don't have std
      values on groups.
      Signed-off-by: default avatarRené Nussbaumer <rn@google.com>
      Reviewed-by: default avatarIustin Pop <iustin@google.com>
      8b057218
  10. 13 Jul, 2012 1 commit
  11. 11 Jul, 2012 1 commit
  12. 07 Jul, 2012 1 commit
  13. 05 Jul, 2012 4 commits
    • Iustin Pop's avatar
      9Add wait_for_sync flag to OpInstanceActivateDisks · b69437c5
      Iustin Pop authored
      
      
      This can be used to ensure that after activate-disks has returned, the
      instance's storage is consistent; currently there's no programmatic
      way to do this.
      Signed-off-by: default avatarIustin Pop <iustin@google.com>
      Reviewed-by: default avatarRené Nussbaumer <rn@google.com>
      b69437c5
    • Iustin Pop's avatar
      Fix DRBD resize code · cad0723b
      Iustin Pop authored
      
      
      There are two bugs in the current resize code, affecting mostly DRBD.
      
      First, due to bugs in old DRBD versions (pre 8.0.14), the code currently
      calls `drbdsetup resize' on both the primary or secondary. However,
      this is actually wrong per current DRBD (from drbdsetup(8)):
      
           resize
             This causes DRBD to reexamine the size of the device's backing
             storage device. To actually do online growing you need to
             extend the backing storages on both devices and call the resize
             command on one of your nodes.
      
      So calling it just on the primary node should be enough. However, we
      can't simply remove the calls to the secondary nodes, since that would
      break the growth of the underlying storage (LVM) on the
      secondary. Which leads to the second existing bug: we call resize on
      each node, even before finish the growth of the underlying
      storage. This can leads to all kind of issues if DRDB is not well
      behaved.
      
      So to fix both these bugs, we have to extend the current RPC call with
      another parameter, which denotes whether to extend the actual backing
      storage or just the "logical" one (DRBD being the only one; MD would
      be another, if implemented). This allows us to do the growth in two
      steps, first the backing store on all nodes, then the logical storage
      on just the primary node.
      Signed-off-by: default avatarIustin Pop <iustin@google.com>
      Reviewed-by: default avatarRené Nussbaumer <rn@google.com>
      cad0723b
    • Iustin Pop's avatar
      Fix redistribution of files w.r.t. offline nodes · cc706abc
      Iustin Pop authored
      
      
      Currently, _RedistributeAncillaryFiles computes two lists: the list of
      online nodes (for all files redistribution), and the list of
      vm_capable nodes, for hypervisor-specific files. However, the
      vm_capable list includes offline nodes too, leading to warning
      messages:
      
        WARNING: Copy of file /etc/xen/xend-config.sxp to node node13.example.com failed: Node is marked offline
      
      We fix this by trivially intersecting the vm_capable list with the
      online one.
      Signed-off-by: default avatarIustin Pop <iustin@google.com>
      Reviewed-by: default avatarBernardo Dal Seno <bdalseno@google.com>
      Reviewed-by: default avatarRené Nussbaumer <rn@google.com>
      cc706abc
    • René Nussbaumer's avatar
      Fix cluster verify error on master-ip-setup script · 770461fe
      René Nussbaumer authored
      
      
      This error does not show up until we exceed the pool of master
      candidates and have nodes which are not master candidates.
      
      The background is that we check for master-ip-setup script on master
      candidates and expect them not to be on the other nodes. However, we
      distribute a default master-ip-script which break this assumption.
      Furthermore, there's no reason why the file should just exists on the
      master candidates.
      Signed-off-by: default avatarRené Nussbaumer <rn@google.com>
      Reviewed-by: default avatarIustin Pop <iustin@google.com>
      770461fe
  14. 27 Jun, 2012 4 commits
  15. 20 Jun, 2012 1 commit
    • Iustin Pop's avatar
      Fix bug in instance net changes · 80b898f9
      Iustin Pop authored
      
      
      _PrepareNicModification returns the invalid type, which triggers an
      assert resulting in a mysterious error:
      
      Failure: command execution error:
      
      Without any explanation. We fix this by removing the return value from
      _PrepareNicModification, and instead returning the expected type
      (since it differs per create/modification) from the (existing)
      wrappers for this function. We don't need to return actual changes
      from this function as _ApplyNicMods is the function that
      computes/returns the formatted changes.
      Signed-off-by: default avatarIustin Pop <iustin@google.com>
      Signed-off-by: default avatarRené Nussbaumer <rn@google.com>
      Reviewed-by: default avatarMichael Hanselmann <hansmi@google.com>
      80b898f9
  16. 19 Jun, 2012 1 commit
    • Guido Trotter's avatar
      Allow single-homed <-> multi-homed transitions · 79829d23
      Guido Trotter authored
      
      
      To change the cluster from single homed to multi homed or vice versa one
      must target the master node first, and pass the --force option. All
      other nodes then will work as long as they are reachable by the master.
      
      Note that this will also prevent a node to be set to single-homed if the
      master is multi-homed, which wasn't disallowed before, and warn if a
      single-homed <-> multi-homed transition happens.
      
      Also note that it's still theoretically possible to flip a cluster
      inadvertently by changing the master node this way, and then doing a
      master failover before fixing the other nodes.
      Signed-off-by: default avatarGuido Trotter <ultrotter@google.com>
      Reviewed-by: default avatarIustin Pop <iustin@google.com>
      79829d23
  17. 15 Jun, 2012 1 commit
  18. 14 Jun, 2012 1 commit
  19. 08 Jun, 2012 1 commit
  20. 01 Jun, 2012 1 commit
    • Iustin Pop's avatar
      Fix a type issue and bad logic in cluster verification · e375fb61
      Iustin Pop authored
      Commit 2e04d454
      
       introduced the new offline state for the instance
      state, but being a big monolithic patch it sneaked in something that
      doesn't make sense.
      
      The checks for extra instances (either wrongly up or just unknown) are
      done purely on a name-basis, not on objects, so the types there are
      wrong. Furthermore, they have no relation to the admin state of the
      instance, so we just drop the entire if block. We keep the increment
      of the offline instance count, but move it to a different loop over
      instances.
      Signed-off-by: default avatarIustin Pop <iustin@google.com>
      Reviewed-by: default avatarMichael Hanselmann <hansmi@google.com>
      e375fb61
  21. 22 May, 2012 2 commits
  22. 15 May, 2012 3 commits
  23. 14 May, 2012 3 commits
  24. 11 May, 2012 2 commits
  25. 10 May, 2012 1 commit