Commit 2a2e2610 authored by Iustin Pop

hbal: return exit status 0 in case of early exit


This derives from an internal bug report, but the issue applies equally
to internal and external usage of hbal.

Currently, hbal returns exit code 1 if requested to exit early, even if
all jobs were successful. This is counter-intuitive for two reasons:

- hbal did what it was asked to do (exit early), so it shouldn't return
  an error
- there were no job failures, so there's nothing to clean up or
  investigate on the Ganeti cluster, so again it shouldn't return an
  error

Therefore the new behaviour is as follows:

- for cases where all jobs were successful, even if terminated early
  via SIGINT or via --max-length, we exit with code 0
- for cases where jobs have failed or there were other errors in
  running hbal, the exit code is 1
- for cases where hbal is asked to terminate immediately (SIGTERM),
  the exit code is 2, denoting "unknown whether the Ganeti cluster is
  consistent or not"
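The exit-status policy above can be sketched as a small Haskell mapping. This is a minimal illustration under stated assumptions, not hbal's actual code: RunOutcome and outcomeExitCode are hypothetical names introduced here, and only ExitCode and exitWith come from the standard System.Exit module.

```haskell
import System.Exit (ExitCode (..), exitWith)

-- | Hypothetical summary of how an hbal run with job execution ended.
-- These constructors are illustrative only; they are not hbal's types.
data RunOutcome
  = Completed      -- ^ all jobs ran and succeeded
  | StoppedEarly   -- ^ stopped early (SIGINT or length limit), all jobs succeeded
  | JobsFailed     -- ^ a job failed, or another error occurred
  | Terminated     -- ^ immediate termination requested via SIGTERM
  deriving (Show, Eq, Enum, Bounded)

-- | Map an outcome to the exit status described in the commit message.
outcomeExitCode :: RunOutcome -> ExitCode
outcomeExitCode Completed    = ExitSuccess
outcomeExitCode StoppedEarly = ExitSuccess    -- early exit is not an error
outcomeExitCode JobsFailed   = ExitFailure 1  -- cluster should be investigated
outcomeExitCode Terminated   = ExitFailure 2  -- cluster consistency unknown

main :: IO ()
main = do
  -- Print the mapping for every outcome.
  mapM_ (\o -> putStrLn (show o ++ " -> " ++ show (outcomeExitCode o)))
        [minBound .. maxBound]
  -- Exit the way a successful early stop would: with status 0.
  exitWith (outcomeExitCode StoppedEarly)
```

Treating an early stop exactly like a full successful run is the point of the change: a caller only needs to distinguish 0 (nothing to do), 1 (investigate) and 2 (state unknown).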

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
parent cad0723b
@@ -180,7 +180,7 @@ execWrapper master nl il cref alljss = do
     then do
       hPrintf stderr "Exiting early due to user request, %d\
                      \ jobset(s) remaining." (length alljss)::IO ()
-      return False
+      return True
     else execJobSet master nl il cref alljss
 
 -- | Execute an entire jobset.
@@ -415,9 +415,15 @@ EXIT STATUS
 -----------
 
 The exit status of the command will be zero, unless for some reason the
-algorithm fatally failed (e.g. wrong node or instance data), or (in case
-of job execution) either one of the jobs has failed or the balancing was
-interrupted early.
+algorithm failed (e.g. wrong node or instance data), invalid command
+line options, or (in case of job execution) one of the jobs has failed.
+
+Once job execution via Luxi has started (``-X``), if the balancing was
+interrupted early (via *SIGINT*, or via ``--max-length``) but all jobs
+executed successfully, then the exit status is zero; a non-zero exit
+code means that the cluster state should be investigated, since a job
+failed or we couldn't compute its status and this can also point to a
+problem on the Ganeti side.
 
 BUGS
 ----
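The code change in the first hunk can be modelled in isolation. The following is a simplified, hypothetical sketch rather than hbal's implementation: JobSet is reduced to a list of per-job success flags, runJobSets stands in for the execWrapper/execJobSet pair, and the cancel counter is assumed to be incremented by a SIGINT handler elsewhere in hbal, which this diff does not show. In the sketch, the cancel check only runs once every earlier jobset has succeeded, which is why returning True on early exit is safe.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import System.IO (hPutStrLn, stderr)

-- | Hypothetical stand-in for a jobset: the results of its jobs,
-- where True means the job succeeded. Not hbal's real JobSet type.
type JobSet = [Bool]

-- | Sketch of the patched early-exit logic. Before each jobset, the
-- shared cancel counter is read; a positive value means the user asked
-- to stop. All jobsets executed so far succeeded, so report True.
runJobSets :: IORef Int -> [JobSet] -> IO Bool
runJobSets _ [] = return True
runJobSets cref alljss@(js : rest) = do
  cancel <- readIORef cref
  if cancel > 0
    then do
      hPutStrLn stderr $ "Exiting early due to user request, "
        ++ show (length alljss) ++ " jobset(s) remaining."
      return True                 -- early exit after successful jobs
    else if and js
      then runJobSets cref rest   -- every job succeeded: continue
      else return False           -- a job failed: report failure

main :: IO ()
main = do
  cref <- newIORef 0
  ok1 <- runJobSets cref [[True, True], [True]]
  print ok1                       -- prints True: no interruption, no failures
  ok2 <- runJobSets cref [[True], [False]]
  print ok2                       -- prints False: the second jobset fails
  writeIORef cref 1               -- simulate a SIGINT having been received
  ok3 <- runJobSets cref [[True], [True]]
  print ok3                       -- prints True: early exit is still success
```

Only the third call writes the "Exiting early" message to stderr. Under the old behaviour, that branch returned False, so the same run would have been reported as a failure.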