From 2a2e2610154fa481729782dc3beed80a5b048087 Mon Sep 17 00:00:00 2001
From: Iustin Pop <iustin@google.com>
Date: Thu, 5 Jul 2012 14:44:05 +0200
Subject: [PATCH] hbal: return exit status 0 in case of early exit
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This derives from an internal bug, but the issue applies equally to
internal and external usage of hbal.

Currently, hbal returns exit code 1 when requested to exit early, even
if all jobs were successful. This is counter-intuitive for two
reasons:

- hbal did what it was asked to do (exit early), so it shouldn't
  return an error
- there were no job failures, so there's nothing to "clean up" or
  investigate on the Ganeti cluster, so again it shouldn't return an
  error

Therefore the new behaviour is as follows:

- for cases where all jobs were successful, even if terminated early
  via SIGINT or via --max-length, the exit code is 0
- for cases where jobs have failed or there were other errors in
  running hbal, the exit code is 1
- for cases where hbal is asked to terminate immediately (SIGTERM),
  the exit code is 2, denoting "unknown whether the Ganeti cluster is
  consistent or not"
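As an illustration only, the rules above amount to a simple mapping
from how a run ended to an exit status. This is a minimal sketch
assuming a hypothetical RunOutcome type; hbal does not define such a
type, and in the actual code the success flag is the Bool returned by
execWrapper, as the diff below shows.

```haskell
import System.Exit (ExitCode (..))

-- | Hypothetical summary of how an hbal run ended (not a real Ganeti type).
data RunOutcome
  = AllJobsSucceeded      -- ^ all jobs succeeded, possibly after an early
                          --   exit via SIGINT or --max-length
  | JobsFailedOrError     -- ^ a job failed, or hbal itself hit an error
  | ImmediateTermination  -- ^ SIGTERM: cluster consistency is unknown
  deriving (Show, Eq, Enum, Bounded)

-- | Exit status for each outcome, following the rules listed above.
outcomeExitCode :: RunOutcome -> ExitCode
outcomeExitCode AllJobsSucceeded     = ExitSuccess
outcomeExitCode JobsFailedOrError    = ExitFailure 1
outcomeExitCode ImmediateTermination = ExitFailure 2

-- | Print the mapping for every outcome.
main :: IO ()
main = mapM_ (\o -> putStrLn (show o ++ " -> " ++ show (outcomeExitCode o)))
             [minBound .. maxBound]
```

Keeping SIGTERM separate from job failures lets callers tell "a job
failed" (code 1) apart from "state unknown" (code 2).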

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
---
 htools/Ganeti/HTools/Program/Hbal.hs |  2 +-
 man/hbal.rst                         | 12 +++++++++---
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/htools/Ganeti/HTools/Program/Hbal.hs b/htools/Ganeti/HTools/Program/Hbal.hs
index c6cb62998..d9b47f03a 100644
--- a/htools/Ganeti/HTools/Program/Hbal.hs
+++ b/htools/Ganeti/HTools/Program/Hbal.hs
@@ -180,7 +180,7 @@ execWrapper master nl il cref alljss = do
     then do
       hPrintf stderr "Exiting early due to user request, %d\
                      \ jobset(s) remaining." (length alljss)::IO ()
-      return False
+      return True
     else execJobSet master nl il cref alljss
 
 -- | Execute an entire jobset.
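The Hbal.hs hunk above only touches the early-exit branch of
execWrapper: when the user has asked to stop, it now reports success
(True) instead of failure, so the final result reflects only whether
the jobsets that actually ran succeeded. The following is a simplified,
self-contained model of that control flow, not the real implementation;
the JobSet type, the runJobSet helper (standing in for execJobSet) and
the use of an IORef Int as the interrupt counter are assumptions made
for this sketch.

```haskell
import Data.IORef (IORef, newIORef, readIORef)

-- | Hypothetical stand-in for a jobset: a label and whether it succeeds.
type JobSet = (String, Bool)

-- | Simplified model of execWrapper after this patch. The IORef counts
-- user interrupt requests (an assumption of this sketch).
execWrapper :: IORef Int -> [JobSet] -> IO Bool
execWrapper _ [] = return True
execWrapper cref alljss = do
  cancel <- readIORef cref
  if cancel > 0
    then do
      -- The real code prints this message to stderr via hPrintf.
      putStrLn ("Exiting early due to user request, "
                ++ show (length alljss) ++ " jobset(s) remaining.")
      return True  -- was False before this patch
    else runJobSet cref alljss

-- | Hypothetical stand-in for execJobSet: run the first jobset, stop with
-- False on failure, otherwise hand the rest back to execWrapper.
runJobSet :: IORef Int -> [JobSet] -> IO Bool
runJobSet _ [] = return True
runJobSet cref ((name, ok) : rest) = do
  putStrLn ("Running jobset " ++ name)
  if ok then execWrapper cref rest else return False

main :: IO ()
main = do
  -- No interrupt requested: every jobset runs and succeeds.
  r1 <- newIORef 0 >>= \c -> execWrapper c [("a", True), ("b", True)]
  -- Interrupt requested before any jobset starts: early exit, still True.
  r2 <- newIORef 1 >>= \c -> execWrapper c [("a", True), ("b", True)]
  -- The first jobset fails: the run reports False.
  r3 <- newIORef 0 >>= \c -> execWrapper c [("a", False), ("b", True)]
  print (r1, r2, r3)
```

The last line printed is (True,True,False): only an actual job failure
turns the result into False, which is then reported as exit code 1.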
diff --git a/man/hbal.rst b/man/hbal.rst
index 2cda02f85..0c50d995f 100644
--- a/man/hbal.rst
+++ b/man/hbal.rst
@@ -415,9 +415,15 @@ EXIT STATUS
 -----------
 
 The exit status of the command will be zero, unless for some reason the
-algorithm fatally failed (e.g. wrong node or instance data), or (in case
-of job execution) either one of the jobs has failed or the balancing was
-interrupted early.
+algorithm failed (e.g. wrong node or instance data), the command line
+options were invalid, or (in case of job execution) one of the jobs failed.
+
+Once job execution via Luxi has started (``-X``), if the balancing was
+interrupted early (via *SIGINT*, or via ``--max-length``) but all jobs
+executed successfully, then the exit status is zero. A non-zero exit
+code means that the cluster state should be investigated: either a job
+failed, or its status could not be determined, which can also point to
+a problem on the Ganeti side.
 
 BUGS
 ----
-- 