1. 17 Oct, 2012 3 commits
      Group.hs: add 'allTags'; adjust loaders and test data for it · 6b6e335b
      Dato Simó authored
      This commit adds a Group.allTags field to store the tags of node groups,
      and teaches each loader backend in HTools to populate it (additionally, the
      IAllocator class in lib/cmdlib.py now includes tags for groups too). Test
      data is updated to include an empty set of tags for node groups in all
      affected test cases.
      Signed-off-by: Dato Simó <dato@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
      Instance.hs: rename 'tags' to 'exclTags', provide 'allTags' · 2f907bad
      Dato Simó authored
      The mergeData function in Loader.hs included a step to filter an instance's
      tags to include only the exclusion tags (as specified via the commandline,
      or cluster-level tags). Later on, code in Node.hs assumed Instance.tags to
      contain only tags to be used for exclusion.
      Because in the future we will need to access the full list of an instance's
      tags (and not only exclusion tags), this commit deprecates the 'tags'
      field, and introduces Instance.exclTags and Instance.allTags.
      Instance.allTags is now populated from the different backends (Text, Luxi,
      Rapi, etc.), and Instance.exclTags is only populated from Loader.mergeData,
      as was done previously. This means that loading tags from e.g. Text or Simu
      and assuming that they'll be used as exclusion tags without going through
      Loader.hs will no longer work; but this was already the case with other
      fields, and 'mergeData' or 'loadExternalData' continue to be the only entry
      points to get a consistent view of the cluster. (Additionally, there were
      no tests that made this assumption that I could find.)
      Signed-off-by: Dato Simó <dato@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
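The split between the two fields described in this commit can be sketched roughly as follows. This is an illustration in Python rather than the Haskell of HTools, and every name in it (`Instance`, `merge_data`, `exclusion_prefixes`) is a hypothetical stand-in. The prefix-matching rule is also an assumption; the commit message only says that exclusion tags are specified via the command line or cluster-level tags.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Instance:
    """Hypothetical stand-in for an HTools instance record."""
    name: str
    # Filled by every loader backend (Text, Luxi, Rapi, ...).
    all_tags: List[str] = field(default_factory=list)
    # Filled only by merge_data, never by the backends.
    excl_tags: List[str] = field(default_factory=list)


def merge_data(instances, exclusion_prefixes):
    """Derive excl_tags from all_tags; all_tags is left untouched.

    Assumption: a tag is an exclusion tag when it starts with one of
    the given prefixes.
    """
    for inst in instances:
        inst.excl_tags = [t for t in inst.all_tags
                          if any(t.startswith(p) for p in exclusion_prefixes)]
    return instances


loaded = [Instance("web1", all_tags=["service:http", "owner:alice"])]
merged = merge_data(loaded, ["service:"])
```

An instance that has only been through a backend loader has an empty `excl_tags`, which mirrors the caveat in the commit message about bypassing Loader.hs.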
      htools-excl.test: add test case for exclusion tags in hbal · 0397694e
      Dato Simó authored
      In preparation for future modifications in the exclusion tags field, add a
      test that verifies that exclusion tags are being honored: in a test cluster
      with two instances of the same exclusion group in each node, hbal should
      shuffle instances around to improve the score.
      Signed-off-by: Dato Simó <dato@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
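To make the test scenario above concrete, here is a small Python sketch that counts exclusion-tag conflicts per node before and after shuffling instances. The conflict metric (one conflict for each extra instance sharing a tag on the same node) and the layout data are assumptions for illustration; they are not taken from htools-excl.test or from hbal's actual scoring.

```python
from collections import Counter


def conflict_count(node_instances):
    """Count exclusion-tag conflicts on one node.

    node_instances: list of exclusion-tag lists, one per instance.
    Each extra instance sharing a tag with another one on the same node
    counts as one conflict (an assumption made for this sketch).
    """
    counts = Counter(tag for tags in node_instances for tag in set(tags))
    return sum(n - 1 for n in counts.values() if n > 1)


# Two instances of the same exclusion group on each node.
layout_before = {"node1": [["db"], ["db"]], "node2": [["web"], ["web"]]}
# The same instances after swapping one instance between the nodes.
layout_after = {"node1": [["db"], ["web"]], "node2": [["web"], ["db"]]}

conflicts_before = sum(conflict_count(v) for v in layout_before.values())
conflicts_after = sum(conflict_count(v) for v in layout_after.values())
```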
      Merge branch 'stable-2.6' into devel-2.6 · 99c7795a
      Iustin Pop authored
      * stable-2.6:
        Fix bug in non-mirrored instance allocation
        Fix gnt-debug iallocator
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Agata Murawska <agatamurawska@google.com>
      Fix bug in non-mirrored instance allocation · 14b5d45f
      Iustin Pop authored
      The function `allocateOnSingle' has a bug in the calculation of the
      cluster score used for deciding which of the many target nodes to use
      in placing the instance: it uses the original node list for the score.
      Since the original node list is the same for all target nodes,
      `allocateOnSingle' returns the same score no matter the target node,
      and hence the choice of node is arbitrary instead of being made on
      the basis of the algorithm.
      This has gone uncaught until reported because the unittests only test
      1 allocation at a time on an empty cluster, and do not check the
      consistency of the score. I'll send separate patches on the master
      branch for adding more checks to prevent this in the future.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Agata Murawska <agatamurawska@google.com>
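The bug pattern described in this commit can be reproduced in a few lines. The sketch below is in Python, not the Haskell of the real code, and uses a toy score (the spread between the most and least loaded node) as a stand-in for the actual cluster score. The function names are hypothetical.

```python
def cluster_score(loads):
    """Toy score: spread between most and least loaded node (lower is better)."""
    return max(loads) - min(loads)


def allocate_buggy(loads, target, size):
    """Place an instance on target, but score the ORIGINAL node list."""
    new_loads = list(loads)
    new_loads[target] += size
    return new_loads, cluster_score(loads)  # bug: ignores new_loads


def allocate_fixed(loads, target, size):
    """Place an instance on target and score the UPDATED node list."""
    new_loads = list(loads)
    new_loads[target] += size
    return new_loads, cluster_score(new_loads)


loads = [4, 2, 1]
targets = range(len(loads))
buggy_scores = [allocate_buggy(loads, t, 1)[1] for t in targets]
fixed_scores = [allocate_fixed(loads, t, 1)[1] for t in targets]
best_target = min(targets, key=lambda t: fixed_scores[t])
```

With the buggy variant every target gets the same score, so any choice among them is arbitrary. With the fixed variant the scores differ and the least loaded node wins.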
      Fix warnings/errors with newer pylint · 8ad0da1e
      Iustin Pop authored
      To help developing Ganeti on newer distributions, let's try to fix
      pylint warnings/errors. I'm using pylint from current Debian wheezy:
      pylint 0.25.1, astng 0.23.1, common 0.58.0, and we have 3 things that
      need fixing.
      First, a really wide "except", with the silencing in the wrong
      place. I'm not sure why this doesn't have "except Exception", so let's
      add it. However, pylint still complains about "Catching too general
      exception", even though we do want to catch both system exceptions and
      our own, so let's add a silence for W0703. It's true that we
      shouldn't catch KeyboardInterrupt and friends, but that should be
      cleaned up on the master branch.
      Second, pylint complains about "redefining name builtin tuple",
      because we do some pattern matching in the except blocks in
      netutils. This seems to be a false positive, but let's clean the code
      around this.
      And finally, type inference again goes bad, so let's silence E1103
      with its "boolean doesn't have 'get' method".
      After this, I can run "make lint", and by extension "make
      commit-check" on Debian Wheezy, yay! We might be able to bump our
      required pylint versions to something not ancient…
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
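The first fix in this commit can be illustrated with a minimal sketch: replacing a bare `except:` with `except Exception` and placing the W0703 silence on the `except` line itself. The helper `run_safely` is hypothetical and not part of Ganeti; the sketch uses the `except ... as` syntax, which may differ from the syntax used in the original code.

```python
import logging


def run_safely(fn, *args):
    """Call fn, logging and swallowing ordinary errors.

    "except Exception" lets KeyboardInterrupt and SystemExit propagate,
    unlike a bare "except:".
    """
    try:
        return fn(*args)
    except Exception as err:  # pylint: disable=W0703
        logging.warning("Call to %s failed: %s",
                        getattr(fn, "__name__", fn), err)
        return None


result_ok = run_safely(int, "42")
result_bad = run_safely(int, "not a number")
```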
      Fix decorator uses which crash newer pylint · fc3f75dd
      Iustin Pop authored
      Pylint version:
        pylint 0.25.1,
        astng 0.23.1, common 0.58.0
      crashes when passing the fully-qualified decorator name with:
        File "/usr/lib/pymodules/python2.7/pylint/checkers/base.py", line 161, in visit_function
          if not redefined_by_decorator(node):
        File "/usr/lib/pymodules/python2.7/pylint/checkers/base.py", line 116, in redefined_by_decorator
          decorator.expr.name == node.name):
      AttributeError: 'Getattr' object has no attribute 'name'
      I found out that simply using a shortened name will 'fix' this issue,
      so let's do this to allow running newer pylint versions.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
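Judging from the traceback above, the crash occurs when the decorator expression is a nested attribute access, so that `decorator.expr` is itself an attribute node without a `name`. The Python sketch below contrasts the two spellings. The `pkg.utils.logged` layout is hypothetical and emulated with namespaces so the example is self-contained; it is not the code touched by the commit.

```python
import functools
from types import SimpleNamespace


def logged(fn):
    """Trivial decorator that records the names of called functions."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        logged.calls.append(fn.__name__)
        return fn(*args, **kwargs)
    return wrapper


logged.calls = []

# Hypothetical package layout, emulated with namespaces: pkg.utils.logged
pkg = SimpleNamespace(utils=SimpleNamespace(logged=logged))


@pkg.utils.logged   # fully-qualified name: nested attribute access
def before():
    return "before"


utils = pkg.utils   # in real code this would be "from pkg import utils"


@utils.logged       # shortened name: one attribute access on a plain name
def after():
    return "after"
```

Both spellings behave identically at run time; the difference only matters to the affected pylint version.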
      Instance autorepair design · 68640987
      Guido Trotter authored
      This design describes a tool that will perform automatic repairs on
      instances when they are detected to be unhealthy (living on offline or
      drained nodes, at the moment). These repairs can be scheduled
      automatically or requested as a one-off by a tool or person.
      Signed-off-by: Guido Trotter <ultrotter@google.com>
      Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
      Fix computation of disk sizes in _ComputeDiskSize · 6a3166cb
      Constantinos Venetsanopoulos authored
      Currently, hail fails with FailDisk when trying to add an instance
      of type 'file', 'sharedfile' or 'rbd'.
      This is due to a "0" or None value in the corresponding dict inside
      _ComputeDiskSize, which results in a "0" or non-Int value of the
      exported 'disk_space_total' parameter. This in turn makes hail fail
      when trying to process the value:
       - with "Unable to read Int" if value is None (file)
       - with FailDisk if value is 0 (sharedfile, rbd)
      The latter happens because the 0 value doesn't match the instance's
      IPolicy, since it is lower than the minimum disk size.
      The second problem still exists when using adoption with the 'plain'
      and 'blockdev' templates and will be addressed in another commit.
      Signed-off-by: Constantinos Venetsanopoulos <cven@grnet.gr>
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
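The failure mode described in this commit comes from a per-template lookup table that yields None or 0 for some templates. The sketch below shows one plausible shape for such a table in which file-based and rbd templates report the summed disk sizes instead. It is a simplified assumption: the function name, the `"size"` key, and the absence of any per-template overhead are illustrative and not copied from lib/cmdlib.py.

```python
def compute_disk_size(disk_template, disks):
    """Return the total disk space (in MiB) an instance needs.

    Hypothetical simplification: each disk is a dict with a "size" key,
    and no per-template metadata overhead is added.
    """
    total = sum(d["size"] for d in disks)
    req_size = {
        "plain": total,
        # Reporting None or 0 for these templates is the failure mode
        # described above; this sketch reports the summed sizes instead.
        "file": total,
        "sharedfile": total,
        "rbd": total,
    }
    if disk_template not in req_size:
        raise ValueError("Unknown disk template %r" % disk_template)
    return req_size[disk_template]


file_total = compute_disk_size("file", [{"size": 1024}, {"size": 512}])
```

A positive integer result avoids both symptoms listed in the commit message: it parses as an Int, and it can satisfy a minimum disk size in the instance policy.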
      Add verification of RPC results in _WipeDisks · f08e5132
      Iustin Pop authored
      Due to an oversight, the pause/resume sync RPC calls in _WipeDisks
      lack the verification of the overall RPC status, and directly iterate
      over the payload. The code actually doing the wipe does verify
      the results correctly. This can result in jobs failing with a
      hard-to-diagnose error:
      OpExecError ['NoneType' object is not iterable]
      instead of a proper "RPC failed" message.
      This patch adds a hard check on the pause call, but for the resume
      call it just logs a warning if the RPC failed; the rationale being
      that if we can't contact the node for pausing the sync, it's likely
      wiping will fail too, but after the wipe has been done, we can
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: René Nussbaumer <rn@google.com>
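The asymmetric handling described in this commit (hard failure for the pause call, warning only for the resume call) might look roughly like the sketch below. `RpcResult`, `OpExecError`, `check_pause` and `check_resume` are simplified stand-ins written for this example; they are not Ganeti's actual RPC classes or helper names.

```python
import logging


class RpcResult:
    """Hypothetical stand-in for a per-node RPC result."""

    def __init__(self, payload=None, fail_msg=None):
        self.payload = payload
        self.fail_msg = fail_msg


class OpExecError(Exception):
    """Stand-in for the job-level error type."""


def check_pause(result, node):
    """Hard check: abort before wiping if pausing the sync failed."""
    if result.fail_msg:
        raise OpExecError("RPC to %s failed while pausing sync: %s"
                          % (node, result.fail_msg))
    return list(result.payload)


def check_resume(result, node):
    """Soft check: the wipe is already done, so only warn on failure."""
    if result.fail_msg:
        logging.warning("RPC to %s failed while resuming sync: %s",
                        node, result.fail_msg)
        return []
    return list(result.payload)
```

Checking `fail_msg` before touching `payload` is what prevents iterating over a None payload, which is the source of the "'NoneType' object is not iterable" error quoted above.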
