Commits · f93427cdde4b201478cff147b197bb12f8221362 · itminedu / snf-ganeti

Jan 04, 2010

daemons: handle arguments correctly and uniformly · f93427cd

Iustin Pop authored 15 years ago


Of all the daemons, only rapi aborted when given an argument. None of our
daemons use any arguments, but the others accepted them blindly. This is a
very bad experience for the user.

This patch adds checking and exiting in all daemons, in a uniform way.
One other option would have been to add a flag to GenericMain
(noargs=True).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>
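
A minimal sketch of the kind of check described above, not the patch itself.
The function name, the -d/--debug option, the usage text and the exit code 1
are all assumptions made for illustration.

```python
import optparse
import sys


def parse_daemon_args(argv, prog="example-daemon"):
    """Parse options and refuse any positional arguments.

    `prog` and the -d/--debug option are hypothetical, not taken from
    the patch.
    """
    parser = optparse.OptionParser(prog=prog, usage="%prog [-d]")
    parser.add_option("-d", "--debug", action="store_true", default=False,
                      help="enable debug output")
    options, args = parser.parse_args(argv)
    if args:
        # Fail loudly instead of silently ignoring unexpected arguments.
        sys.stderr.write("Usage: %s [-d]\n" % prog)
        sys.exit(1)  # assumed failure exit code
    return options
```

Calling the same helper from every daemon's entry point gives the uniform
behaviour; the noargs=True flag mentioned above would move this check into
the shared main function instead.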

Remove more unused variables · f4ad2ef0

Iustin Pop authored 15 years ago


This removes unused variables in the rest of the code (outside lib/).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>

Add targeted pylint disables · 7260cfbe

Iustin Pop authored 15 years ago


This patch should have only:

- pylint disables
- docstring changes
- whitespace changes

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>

Fix use of the logging functions · 07b8a2b5

Iustin Pop authored 15 years ago


The logging functions expand the arguments themselves, so it's safer
to let them do it than to format the strings manually.

Also re-wraps one comment.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>
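
To illustrate the difference with the standard logging module (the logger
name, the handler class and the message are made up for this example):

```python
import logging


class ListHandler(logging.Handler):
    """Collects formatted messages so the result can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        # getMessage() is where logging applies record.args to record.msg.
        self.messages.append(record.getMessage())


logger = logging.getLogger("watcher-example")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)

name = "instance1"

# Manual formatting: the string is built before logging is even called.
logger.info("Restarting %s" % name)

# Preferred: pass the arguments and let logging expand them.
logger.info("Restarting %s", name)

# Below the configured level: the message is dropped without being formatted.
logger.debug("Details for %s", name)
```

Both info calls produce the same text; the second form defers the string
expansion to the logging module, which skips it entirely for suppressed
levels.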

Nov 25, 2009

Remove quotes from CommaJoin and convert to it · 1f864b60

Iustin Pop authored 15 years ago


This patch removes the quotes from CommaJoin and converts most of the
callers (that I could find) to it. Since CommaJoin does str(i) for i in
param, we can remove the explicit str() conversions from the callers,
slightly simplifying a few calls.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
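
A sketch of a helper with the behaviour described above (no quotes around
the items, str() applied to each one). This is an illustration written from
the description, not a copy of the Ganeti source.

```python
def CommaJoin(names):
    """Join the items with ", ", converting each one with str()."""
    return ", ".join(str(val) for val in names)


ids = [1, 2, 3]

# Before: the caller converts every item itself.
manual = ", ".join([str(i) for i in ids])

# After: the conversion happens inside the helper.
joined = CommaJoin(ids)
```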

Nov 05, 2009

Add new “daemon-util” script to start/stop Ganeti daemons · f154a7a3

Michael Hanselmann authored 15 years ago


Until now, Ganeti started and stopped its own daemons using custom functions.
To start a daemon, it was simply executed; to stop it again, it was sent the
appropriate signal. Init scripts had to pay attention to the PID file and
other things.

With this patch, a new script is added (“daemon-util”, installed in
$prefix/lib/ganeti/), centralizing the starting and stopping of daemons. The
provided example init script is adjusted to use this new script. Ganeti's code
no longer calls its own init script.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Sep 18, 2009

Make ganeti-watcher use the standard debug option · 6d4e8ec0

Iustin Pop authored 15 years ago


Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Aug 26, 2009

ganeti-watcher: Don't run if paused · 3753b2cb

Michael Hanselmann authored 15 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Jul 24, 2009

Remove <DAEMON>_PID constants · 83052f9e

Guido Trotter authored 15 years ago


The <DAEMON>_PID constants were created to reference a daemon's pid file,
but they actually contain the daemon's name, because the various functions
that work with pidfiles derive the filename from the daemon name
themselves. This patch removes the constants and uses the actual daemon
name constants in their place.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
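
A sketch of why a separate _PID constant is redundant once the pidfile
helpers derive the path from the daemon name. The directory, the constant
and the function name below are assumptions for illustration only.

```python
import os

RUN_DIR = "/var/run/ganeti"   # assumed run directory
NODED = "ganeti-noded"        # daemon name constant (illustrative)


def daemon_pid_file_name(name):
    """Derive the pidfile path from a daemon name (hypothetical helper)."""
    return os.path.join(RUN_DIR, "%s.pid" % name)


# The daemon name constant is all a caller needs to pass around.
noded_pid_file = daemon_pid_file_name(NODED)
```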

May 25, 2009

watcher: automatically restart noded/rapi · c4f0219c

Iustin Pop authored 15 years ago


This patch makes the watcher automatically restart the node and rapi
daemons, if they are not running (as per the PID file).

This is not an exhaustive test; a better one would be TCP connect to the
port, and an even better one a simple protocol ping (e.g. get / for rapi
and a rpc_call_alive for noded), but since we don't know how they've
been started we can't implement it today. rapi would need to write the
SSL/port to a file, and noded something similar, so that we know how to
connect.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
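
The PID-file liveness test described above can be sketched roughly as
follows on a POSIX system. All function names are hypothetical, and how a
daemon is actually started is left to the start_fn callback.

```python
import os


def read_pid(pid_file):
    """Return the PID stored in pid_file, or None if missing or invalid."""
    try:
        with open(pid_file) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


def is_running(pid_file):
    """Check whether the process named in pid_file exists."""
    pid = read_pid(pid_file)
    if pid is None or pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def ensure_running(pid_file, start_fn):
    """Call start_fn if the daemon is not running; return True if started."""
    if is_running(pid_file):
        return False
    start_fn()
    return True
```

As the message notes, this only proves that some process with that PID
exists; it says nothing about whether the daemon answers on its port.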

watcher: handle full and drained queue cases · 24edc6d4

Iustin Pop authored 15 years ago


Currently the watcher is broken when the queue is full, and thus does not
fulfill its job as a queue cleaner. It also doesn't handle the queue
drained status nicely.

This patch does a few changes:
  - first archive jobs, and only afterwards submit jobs; this fixes the case
    where the queue is already full and there are jobs suited for
    archiving (but not the case where the jobs are all too young to be
    archived)
  - handle the job queue full and drained cases nicely: instead of
    tracebacks, such cases are logged cleanly
  - reverse the initial value and special cases for update_file; we now
    whitelist instead of blacklist cases, since there are many more cases
    to blacklist than to whitelist, and we set the flag to True only
    after the run is successful

The last change, especially, is a significant one: now errors during the
watcher run will not update the status file, and thus they won't be lost
again in the logs.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
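
The control flow described in this message might look roughly like the
sketch below. The exception classes, the callbacks and the log messages are
stand-ins invented for the example; only the update_file flag name comes
from the message.

```python
class JobQueueFull(Exception):
    """Hypothetical error: the job queue accepts no more jobs."""


class JobQueueDrained(Exception):
    """Hypothetical error: the job queue is drained."""


def run_watcher(archive_jobs, submit_jobs, log):
    """Run one watcher cycle; return True if the status file may be updated."""
    update_file = False  # whitelist: only a fully successful run sets this
    try:
        archive_jobs()   # archive first, freeing queue slots
        submit_jobs()    # only then submit new jobs
        update_file = True
    except JobQueueFull:
        log("Job queue is full, cannot submit jobs")
    except JobQueueDrained:
        log("Job queue is drained, cannot submit jobs")
    return update_file
```

Starting from False means any error path, including ones nobody thought
of, leaves the status file untouched.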

May 20, 2009

watcher: write the instance status to a file · 78f44650

Iustin Pop authored 15 years ago


This patch modifies the watcher to keep a file on disk with the instance
status; this can be used from outside of ganeti to react to instances
being down (when the watcher cannot restart them).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
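
One way such a status file could be written so that outside readers never
see a half-written file. The "name status" line format, the function name
and the status values in the usage note are assumptions, not taken from the
patch.

```python
import os
import tempfile


def write_instance_status(path, statuses):
    """Atomically write (name, status) pairs, one "name status" per line."""
    lines = ["%s %s\n" % (name, status) for name, status in sorted(statuses)]
    directory = os.path.dirname(path) or "."
    # Write to a temporary file in the same directory, then rename over
    # the target, so readers see either the old or the new content.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as handle:
            handle.writelines(lines)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

An external monitoring script can then read the file line by line, for
example after a call such as
write_instance_status(path, [("web1", "running"), ("web2", "ERROR_down")]).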

May 19, 2009

watcher: try to restart the master if down · 7dfb83c2

Iustin Pop authored 15 years ago


Bugs in either our code or in associated libraries can bring the master daemon
down, and this (due to the 2.0 architecture) stops all work on the cluster.

Since the watcher already does periodic checks on the cluster, we modify
it to try to start the master automatically in case of failures to
connect. This will be tried only once per cycle.

Also, in this case, we modify the code so that the watcher status file
is not updated; its timestamp will thus reflect the time of the last
successful connection to the master.

Side note: the except errors.ConfigurationError part could be cleaned
up, since in 2.0 we don't usually get that directly, and if we do it's
an error and we shouldn't touch the file anyway; but that is not an rc5
change.

Signed-off-by: Iustin Pop <iustin@google.com>
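
A rough sketch of the "try to start the master once per cycle" logic and of
the status file handling described above. The exception class and all
callbacks are hypothetical placeholders for the real client and daemon
start code.

```python
class MasterUnreachable(Exception):
    """Hypothetical error raised when the master daemon cannot be reached."""


def connect_with_restart(connect, start_master, log):
    """Connect to the master, attempting at most one restart."""
    try:
        return connect()
    except MasterUnreachable:
        log("Cannot connect to the master, trying to start it once")
    start_master()
    return connect()  # a second failure propagates to the caller


def watcher_cycle(connect, start_master, touch_status_file, log):
    """Run one cycle; touch the status file only after a good connection."""
    try:
        client = connect_with_restart(connect, start_master, log)
    except MasterUnreachable:
        log("Master still unreachable, leaving the status file untouched")
        return None
    touch_status_file()
    return client
```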

Apr 06, 2009

Fix the output of watcher on non-master nodes · 2c404217

Iustin Pop authored 15 years ago

Currently the watcher spews error messages on non-master nodes. This
cleans it up.

Reviewed-by: imsnah

Change the watcher to use jobs instead of queries · 6dfcc47b

Iustin Pop authored 15 years ago

As per the mailing list discussion, this patch changes the watcher to
use a single job (two opcodes) for getting the cluster state (node list
and instance list); it will then compute the needed actions based on
this data.

The patch also archives this job and the verify-disks job.

Reviewed-by: imsnah

Mar 09, 2009

watcher: fix startup sequence locking the master · cc962d58