1. 14 Oct, 2011 2 commits
  2. 07 Oct, 2011 1 commit
  3. 03 Oct, 2011 1 commit
  4. 29 Sep, 2011 5 commits
    • Small simplification in tryAlloc · 1bf6d813
      Iustin Pop authored
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Agata Murawska <agatamurawska@google.com>
    • Change how node pairs are generated/used · b0631f10
      Iustin Pop authored
      Currently, the node pairs used for allocation are a simple
      [(primary, secondary)] list of tuples, since that is how they were
      consumed before the previous patch. Since that patch, however, we
      use them separately per primary node, and have to unpack the list
      right after generation.

      It therefore makes sense to generate the list directly in the
      grouped form, and to remove the splitting step from tryAlloc. This
      should be no slower than the previous patch, and possibly even
      faster.
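
      A minimal sketch of the two forms (plain Ints stand in for the
      real node-index type, and every distinct pair is assumed valid;
      neither function is the actual htools code):

        -- Flat form, as consumed before the previous patch:
        allPairs :: [Int] -> [(Int, Int)]
        allPairs nodes = [(p, s) | p <- nodes, s <- nodes, p /= s]

        -- Grouped form, generated directly so tryAlloc no longer
        -- has to split the flat list per primary node:
        groupedPairs :: [Int] -> [(Int, [Int])]
        groupedPairs nodes = [(p, [s | s <- nodes, s /= p]) | p <- nodes]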
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Agata Murawska <agatamurawska@google.com>
    • Parallelise instance allocation/capacity computation · f828f4aa
      Iustin Pop authored
      This patch finally enables parallelisation in instance placement.
      
      My original attempt at enabling this didn't work well, and it took
      a while (and liberal use of ThreadScope) to understand why. The
      attempt was simply to `parMap rwhnf` over allocateOnPair, but this
      performs poorly: for a 100-node cluster it creates roughly 100*100
      sparks, which is way too much, since each individual spark does
      too little work. Furthermore, the combining of the allocateOnPair
      results was done single-threaded, losing even more parallelism. So
      we had O(n²) sparks to run in parallel, each spark of size O(1),
      and we combined a list of O(n²) length single-threadedly.
      
      The new algorithm is a two-stage process: we group the list of
      valid pairs per primary node, relying on the fact that the
      secondary nodes are usually somewhat balanced (this is definitely
      true for 'blank' cluster computations). We then run in parallel
      over all primary nodes, doing both the individual allocateOnPair
      calls *and* the concatAllocs summarisation. This leaves only the
      combining of the per-primary results for the main execution
      thread. The new numbers are: O(n) sparks, each of size O(n), and
      we combine single-threadedly a list of O(n) length.
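
      A minimal sketch of the two-stage scheme, using the parallel
      package's Control.Parallel.Strategies (allocateOnPair and
      concatAllocs here are simplified stand-ins for the real htools
      functions, with assumed types; the input lists are assumed
      non-empty):

        import Control.Parallel.Strategies (parMap, rdeepseq)

        type Alloc = (Int, Int, Double)  -- (primary, secondary, score)

        allocateOnPair :: Int -> Int -> Alloc
        allocateOnPair p s = (p, s, fromIntegral (p + s))  -- dummy score

        -- Keep the better (lower-scored) of two allocations.
        concatAllocs :: Alloc -> Alloc -> Alloc
        concatAllocs a@(_, _, x) b@(_, _, y) = if x <= y then a else b

        -- Stage 1: one spark per primary node, each evaluating *and*
        -- summarising that primary's pairs (O(n) sparks of size O(n)).
        -- Stage 2: the main thread folds the O(n) per-primary results.
        bestAlloc :: [(Int, [Int])] -> Alloc
        bestAlloc grouped =
          foldr1 concatAllocs
            (parMap rdeepseq
                    (\(p, secs) ->
                       foldr1 concatAllocs (map (allocateOnPair p) secs))
                    grouped)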
      
      This translates directly into a reasonable speedup (relative numbers
      for allocation of 3 instances on a 120-node cluster):
      
      - original code (non-threaded): 1.00 (baseline)
      - first attempt (2 threads):    0.81 (20% slowdown)
      - new code (non-threaded):      1.00 (no slowdown)
      - new code (threaded/1 thread): 1.00
      - new code (2 threads):         1.65 (65% faster)
      
      We don't get a 2x speedup because the GC time increases.
      Fortunately the code should scale well to more cores, so on
      many-core machines we should get a nice overall speedup. On a
      different machine with 4 cores, we get 3.29x.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Agata Murawska <agatamurawska@google.com>
    • Abstract comparison of AllocElements · d7339c99
      Iustin Pop authored
      This comparison is moved outside of concatAllocs, as it will be
      needed elsewhere in the future.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Agata Murawska <agatamurawska@google.com>
    • Change type of Cluster.AllocSolution · 129734d3
      Iustin Pop authored
      Originally, this data type was used both by instance allocation
      (one result) and by instance relocation (many results, one per
      instance). As such, the field 'asSolutions' was a list, and the
      various code paths checked whether the length of the list matched
      the current mode. This is very ugly, as we can't guarantee this
      matching via the type system; hence the FIXME in the code.

      However, commit 6804faa0 removed the instance evacuation code, so
      we now always use just one allocation solution. Hence we can
      change the field to a simple Maybe type, and get rid of many
      'otherwise barf out' conditions.
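
      A sketch of the type change (AllocElement and the field names
      other than 'asSolutions' are illustrative assumptions, not the
      exact htools definitions):

        data AllocElement = AllocElement  -- stand-in element type

        -- Before: a list whose length had to match the current mode.
        data AllocSolutionOld = AllocSolutionOld
          { asSolutions :: [AllocElement] }

        -- After: at most one solution, enforced by the type system.
        data AllocSolution = AllocSolution
          { asSolution :: Maybe AllocElement }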
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Agata Murawska <agatamurawska@google.com>
  5. 14 Sep, 2011 2 commits
  6. 22 Jul, 2011 3 commits
    • htools: rework the algorithm for ChangeAll mode · 97da6b71
      Iustin Pop authored
      I think I've identified the problem with the current ChangeAll
      mode. The current algorithm works as follows:
      
      - identify a new primary by choosing the node which gives the best
        score as new secondary
      - fail over to it
      - identify a new secondary by choosing the node which gives the
        best score as new secondary
      
      This means that the future primary is 'fixed' after the first
      iteration, leading to possibly suboptimal results. This patch
      changes the algorithm to do what, in hindsight, seems the obvious
      thing (see the sketch after this list):
      - generate all (primary, secondary) pairs
      - identify the pair that, after the above sequence (r:np, f,
        r:ns), gives the best group score
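
      A minimal sketch of the pair-enumeration idea (evalSequence
      stands in for applying the replace-primary, failover and
      replace-secondary moves and scoring the result; all names and
      types here are assumptions, not the htools API):

        import Data.List (minimumBy)
        import Data.Ord (comparing)

        -- Hypothetical: applying the move sequence to a candidate
        -- (primary, secondary) pair yields a group score (lower wins).
        evalSequence :: Int -> Int -> Double
        evalSequence np ns = fromIntegral (np * 7 + ns)  -- dummy score

        -- Instead of fixing the new primary after one step, enumerate
        -- all candidate pairs and keep the best final group score.
        bestPair :: [Int] -> (Int, Int)
        bestPair nodes =
          minimumBy (comparing (uncurry evalSequence))
                    [(np, ns) | np <- nodes, ns <- nodes, np /= ns]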
      
      This fixes some of the corner cases I've seen in relocation, but
      not all; the remaining cases are related to multi-instance
      relocation and, while they can't be fixed in the current
      framework, the rebalancing they need is much smaller than with
      the current algorithm.
      
      The patch also fixes an issue with the docstring of another function.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
    • htools: replace two hardcoded uses of pri+sec nodes · 77ecfa82
      Iustin Pop authored
      These two cases replace explicit uses of the primary and secondary
      nodes with Instance.allNodes, which makes the code more flexible
      should the internal layout of the instance change.

      I've verified that the output of involvedNodes is not required to
      be 4 elements long, and the function docstring has been updated
      accordingly.
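
      A hedged sketch of the idea (the record fields pNode/sNode are
      assumptions about the instance layout):

        -- Simplified Instance record; pNode/sNode are node indices.
        data Instance = Instance { pNode :: Int, sNode :: Int }

        -- One place that knows the node layout, so callers no longer
        -- hardcode [pNode inst, sNode inst] themselves.
        allNodes :: Instance -> [Int]
        allNodes inst = [pNode inst, sNode inst]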
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
    • htools: add target_node member to migrate opcode · bbe9758d
      Iustin Pop authored
      … and to the failover opcode too. Not many changes otherwise,
      except for serialisation and unit tests.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
  7. 21 Jul, 2011 2 commits
  8. 19 Jul, 2011 4 commits
  9. 15 Jul, 2011 11 commits
  10. 14 Jul, 2011 5 commits
  11. 13 Jul, 2011 2 commits
  12. 01 Jul, 2011 1 commit
    • One Haskell and integer sizes fix · 7034694d
      Iustin Pop authored
      Haskell has two main integer types:

      - Int, which is a native type, guaranteed to have at least the
        [-2²⁹, 2²⁹-1] range; on 64-bit platforms it has a much larger
        range
      - Integer, which is a software type (implemented using libgmp),
        and thus unbounded
      
      For performance reasons, the node/instance properties use Int for
      their attributes (and Double for some, but that's another story).
      This is all fine and doesn't cause problems. However, the CStats
      type, which holds the overall cluster resources, starts to fail
      when we analyse clusters with more than around 400 nodes and big
      memory/disk sizes on 32-bit platforms.
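
      A back-of-the-envelope illustration (the node count and disk size
      are assumptions, not figures from this commit):

        -- Total disk in MiB for 400 nodes with 10 TiB of disk each.
        totalDiskMiB :: Integer
        totalDiskMiB = 400 * 10 * 1024 * 1024  -- 4,194,304,000 MiB

        -- GHC's Int is word-sized, so on a 32-bit platform it tops out
        -- at 2^31 - 1 = 2,147,483,647; this total would overflow an
        -- Int there, while Integer is unbounded and stays correct.
        main :: IO ()
        main = print (totalDiskMiB > 2 ^ (31 :: Int) - 1)  -- True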
      
      The simple fix would be to restrict cluster sizes, but that's not
      nice. I've benchmarked the change, and switching to Integer shows
      no visible slowdown on 64-bit platforms (as far as I can read on
      the internets, GHC knows to optimise Integer and only uses the
      software type when the values are large enough), and it also
      fixes the 32-bit problem. So this patch changes the CStats fields
      to Integer, except for the instance count (which I don't expect
      to overflow 2²⁹ anytime soon).
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
  13. 28 Jun, 2011 1 commit