From c3c7a0c123ac9a95e4292b1beee92129320fa14f Mon Sep 17 00:00:00 2001
From: Iustin Pop <iustin@google.com>
Date: Wed, 21 Jul 2010 17:47:25 +0200
Subject: [PATCH] Change the meaning of the N+1 fail metric

Currently, this metric counts the nodes failing the N+1 check. While
this helps (in some cases) to evacuate such nodes, it's not a good
metric, since it rarely changes during a step (only when the last
instance moves away). Therefore we replace it with the count of
instances living on such nodes, which is much better because:
- moving an instance away while the node is still failing N+1 is still
  reflected in the score as an improvement
- moving the last instance causing an N+1 failure results in a large
  decrease of this score, thus giving the right bonus for clearing
  this status
---
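Note: the difference between the two metrics can be illustrated with a
minimal, self-contained sketch. The Node record below is a hypothetical
toy stand-in, not the real Ganeti.HTools.Node type: its failN1 flag is a
plain field set by hand (the real one is derived from node memory), and
the example values are made up for illustration.

```haskell
-- Toy stand-in for a cluster node (hypothetical, for illustration only).
data Node = Node
  { failN1 :: Bool   -- whether the node fails the N+1 check
  , pList  :: [Int]  -- indices of primary instances on the node
  , sList  :: [Int]  -- indices of secondary instances on the node
  }

-- Old metric: number of nodes failing the N+1 check.
oldN1Score :: [Node] -> Double
oldN1Score = fromIntegral . length . filter failN1

-- New metric: number of instances (primary and secondary) living on
-- nodes that fail the N+1 check.
newN1Score :: [Node] -> Double
newN1Score = fromIntegral . sum . map instCount . filter failN1
  where instCount n = length (sList n) + length (pList n)

-- Three assumed cluster states along an evacuation of the first node.
stepBefore, stepPartial, stepCleared :: [Node]
-- First node fails N+1 and holds three instances.
stepBefore  = [Node True  [1, 2] [3], Node False [4]    []]
-- Instance 2 moved away; the first node is still failing.
stepPartial = [Node True  [1]    [3], Node False [4, 2] []]
-- Instance 3 moved away as well; the first node no longer fails.
stepCleared = [Node False [1]    [],  Node False [4, 2] [3]]
```

On these three states the old metric evaluates to 1, 1, 0 and the new
one to 3, 2, 0: the intermediate move is invisible to the old metric
but lowers the new one, and clearing the failure drops it to zero.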
 Ganeti/HTools/Cluster.hs | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/Ganeti/HTools/Cluster.hs b/Ganeti/HTools/Cluster.hs
index ab573645f..1e0df9ec6 100644
--- a/Ganeti/HTools/Cluster.hs
+++ b/Ganeti/HTools/Cluster.hs
@@ -233,9 +233,10 @@ compDetailedCV nl =
         mem_cv = varianceCoeff mem_l
         -- metric: disk covariance
         dsk_cv = varianceCoeff dsk_l
-        n1_l = length $ filter Node.failN1 nodes
-        -- metric: count of failN1 nodes
-        n1_score = fromIntegral n1_l::Double
+        -- metric: count of instances living on N1 failing nodes
+        n1_score = fromIntegral . sum . map (\n -> length (Node.sList n) +
+                                                   length (Node.pList n)) .
+                   filter Node.failN1 $ nodes :: Double
         res_l = map Node.pRem nodes
         -- metric: reserved memory covariance
         res_cv = varianceCoeff res_l
-- 
GitLab