From 2a2e2610154fa481729782dc3beed80a5b048087 Mon Sep 17 00:00:00 2001
From: Iustin Pop <iustin@google.com>
Date: Thu, 5 Jul 2012 14:44:05 +0200
Subject: [PATCH] hbal: return exit status 0 in case of early exit
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This derives from an internal bug, but the story is consistent across
both internal and external usage of hbal.

Basically right now, hbal returns exit code 1 if requested to exit
early, even if all jobs are successful. This is counter-intuitive due
to two reasons:

- hbal did what it was requested (exit early), so it shouldn't return
  error
- there were no job failures, so there's nothing to "cleanup" or
  investigate on the Ganeti cluster, so again it shouldn't return error

Therefore the new behaviour is as follows:

- for cases where all jobs were successful, even if terminated early
  via SIGINT or via --limit, we exit with code 0
- for cases where jobs have failed or there were other errors in
  running hbal, the exit code is 1
- for cases where hbal is requested an immediate termination (SIGTERM),
  exit code is 2, denoting "unknown whether the Ganeti cluster is
  consistent or not"

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
---
 htools/Ganeti/HTools/Program/Hbal.hs |  2 +-
 man/hbal.rst                         | 12 +++++++++---
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/htools/Ganeti/HTools/Program/Hbal.hs b/htools/Ganeti/HTools/Program/Hbal.hs
index c6cb62998..d9b47f03a 100644
--- a/htools/Ganeti/HTools/Program/Hbal.hs
+++ b/htools/Ganeti/HTools/Program/Hbal.hs
@@ -180,7 +180,7 @@ execWrapper master nl il cref alljss = do
     then do
       hPrintf stderr "Exiting early due to user request, %d\
                      \ jobset(s) remaining." (length alljss)::IO ()
-      return False
+      return True
     else execJobSet master nl il cref alljss
 
 -- | Execute an entire jobset.
diff --git a/man/hbal.rst b/man/hbal.rst
index 2cda02f85..0c50d995f 100644
--- a/man/hbal.rst
+++ b/man/hbal.rst
@@ -415,9 +415,15 @@ EXIT STATUS
 -----------
 
 The exit status of the command will be zero, unless for some reason the
-algorithm fatally failed (e.g. wrong node or instance data), or (in case
-of job execution) either one of the jobs has failed or the balancing was
-interrupted early.
+algorithm failed (e.g. wrong node or instance data), invalid command
+line options, or (in case of job execution) one of the jobs has failed.
+
+Once job execution via Luxi has started (``-X``), if the balancing was
+interrupted early (via *SIGINT*, or via ``--max-length``) but all jobs
+executed successfully, then the exit status is zero; a non-zero exit
+code means that the cluster state should be investigated, since a job
+failed or we couldn't compute its status and this can also point to a
+problem on the Ganeti side.
 
 BUGS
 ----
-- 
GitLab
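
Note for readers of this patch: the exit-code policy in the commit
message can be summarised as a small mapping. The following is only an
illustrative sketch, not code from hbal; the RunOutcome type and the
outcomeToExitCode function are hypothetical names introduced here, and
only the three codes (0, 1, 2) come from the commit message.

```haskell
import System.Exit (ExitCode(..))

-- | Hypothetical summary of how an hbal run ended.
-- This type is an assumption for illustration; it is not part of Ganeti.
data RunOutcome
  = AllJobsSucceeded  -- all jobs OK, including an early exit via SIGINT or the jobset limit
  | JobOrToolFailure  -- a job failed, or hbal hit some other error
  | ImmediateStop     -- SIGTERM: unknown whether the cluster is consistent
  deriving (Eq, Show)

-- | Map an outcome to the exit code described in the commit message.
outcomeToExitCode :: RunOutcome -> ExitCode
outcomeToExitCode AllJobsSucceeded = ExitSuccess
outcomeToExitCode JobOrToolFailure = ExitFailure 1
outcomeToExitCode ImmediateStop    = ExitFailure 2

main :: IO ()
main =
  -- Print the mapping for each outcome.
  mapM_ (\o -> putStrLn (show o ++ " -> " ++ show (outcomeToExitCode o)))
        [AllJobsSucceeded, JobOrToolFailure, ImmediateStop]
```

Under this reading, a caller only needs to investigate the cluster when
the exit code is non-zero; an early but clean stop is reported as
success.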