Commits · 10799c597512abae1aa5a14f2fdbed31078b7962 · itminedu / snf-ganeti

Oct 16, 2008

Add an interface for the drain flag changes/query · 3ccafd0e

Iustin Pop authored 16 years ago

This adds the set/reset in the jqueue and luxi modules, and a way to
query it in OpQueryConfigValues, and also the command line interface for
it:
$ gnt-cluster queue info
The drain flag is unset
$ gnt-cluster queue drain
$ gnt-cluster queue info
The drain flag is set
$ gnt-cluster queue undrain
$ gnt-cluster queue info
The drain flag is unset

The setting is made via luxi and not an opcode because opcodes can't be
executed when drained, but we don't query via luxi since in the future
the flag might become a cluster property as opposed to a node one.

Reviewed-by: imsnah

3ccafd0e
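
A rough sketch of the behaviour described in this commit, not the actual jqueue code: a toy queue whose drain flag is set and cleared directly (standing in for the luxi calls) and which refuses new jobs while drained. The class, method and exception names are hypothetical.

```python
class JobQueueDrainedError(Exception):
    """Raised when a job is submitted to a drained queue (hypothetical name)."""


class DrainableQueue:
    """Toy job queue with a drain flag; not Ganeti's jqueue implementation."""

    def __init__(self):
        self._drained = False
        self._jobs = []

    def set_drain_flag(self, drained):
        # Setting the flag does not go through job submission,
        # so it still works while the queue is drained.
        self._drained = bool(drained)

    def is_drained(self):
        return self._drained

    def submit(self, job):
        if self._drained:
            raise JobQueueDrainedError("queue is drained, not accepting jobs")
        self._jobs.append(job)
        return len(self._jobs)  # simple 1-based job id


queue = DrainableQueue()
first_id = queue.submit("job-a")
queue.set_drain_flag(True)
try:
    queue.submit("job-b")
    rejected = False
except JobQueueDrainedError:
    rejected = True
queue.set_drain_flag(False)
second_id = queue.submit("job-c")
```

If undraining had to be submitted as a job, it would be rejected like any other job, which is the reason the commit gives for using luxi.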

Oct 15, 2008

Implement transport of ganeti errors across luxi · 6797ec29

Iustin Pop authored 16 years ago

This patch adds a generic method to identify the ganeti error given its
class name, and implements this across the luxi protocol.

Reviewed-by: imsnah

6797ec29
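
One way to picture identifying an error by its class name is sketched below. This is an assumption-laden illustration, not the luxi wire format: the JSON layout and the helper names `get_error_class`, `encode_error` and `decode_error` are made up for the example.

```python
import json


class GenericError(Exception):
    """Base class of the toy error hierarchy used in this sketch."""


class LockError(GenericError):
    pass


class OpPrereqError(GenericError):
    pass


def get_error_class(name):
    """Return the GenericError subclass with the given name, or None."""
    stack = [GenericError]
    while stack:
        cls = stack.pop()
        if cls.__name__ == name:
            return cls
        stack.extend(cls.__subclasses__())
    return None


def encode_error(err):
    # Only the class name and the arguments travel over the wire.
    return json.dumps({"class": type(err).__name__, "args": list(err.args)})


def decode_error(payload):
    data = json.loads(payload)
    # Unknown names fall back to the base class instead of failing.
    cls = get_error_class(data["class"]) or GenericError
    return cls(*data["args"])


wire = encode_error(OpPrereqError("instance not found"))
restored = decode_error(wire)
```

The receiving side can then re-raise `restored` so that client code catches the same exception type the server raised.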

Oct 10, 2008

Convert rpc module to RpcRunner · 72737a7f

Iustin Pop authored 16 years ago

This big patch changes the call model used in internode-rpc from
standalone function calls in the rpc module to calls via an RpcRunner
class that holds all the methods. This can be used in the future to
enable smarter processing in the RPC layer itself (for example, setting
the DiskID only once in each rpc call rather than from cmdlib code).

There are a few RPC calls that are made outside of the LU code, and
these calls are left as staticmethods, so they can be used without a
class instance (which requires a ConfigWriter instance).

Reviewed-by: imsnah

72737a7f
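
The shape of this change can be sketched as follows. This is a toy stand-in, not the real RpcRunner: the method names, the dictionary used as configuration and the echo-style `_call` helper are assumptions for illustration.

```python
class RpcRunner:
    """Toy runner class that holds the RPC methods (illustrative only)."""

    def __init__(self, cfg):
        self._cfg = cfg  # configuration needed by the instance methods

    def _call(self, node, procedure, args):
        # A real runner would perform a network call; this just echoes.
        return {"node": node, "procedure": procedure, "args": args}

    def call_instance_info(self, node, instance):
        # Central place for per-call processing that needs the configuration.
        address = self._cfg["addresses"][node]
        return self._call(address, "instance_info", [instance])

    @staticmethod
    def call_version(node):
        # Usable without an instance, hence without a configuration object.
        return {"node": node, "procedure": "version", "args": []}


runner = RpcRunner({"addresses": {"node1": "192.0.2.10"}})
info = runner.call_instance_info("node1", "inst1")
version = RpcRunner.call_version("node1")
```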

Oct 07, 2008

Implement job 'waiting' status · e92376d7

Iustin Pop authored 16 years ago

Background: when we have multiple jobs in the queue (more than just a
few), many of the jobs (up to the number of threads) will be in state
'running', although many of them could be actually blocked, waiting for
some locks. This is not good, as one cannot easily see what is
happening.

The patch extends the opcode/job possible statuses with another one,
waiting, which shows that the LU is in the acquire locks phase. The
mechanism for doing so is simple: we initialize (in the job queue) the
opcode with OP_STATUS_WAITLOCK, and when the processor is ready to give
control to the LU's Exec, it will call a notifier back into the
_JobQueueWorker that sets the opcode status to OP_STATUS_RUNNING (with
the proper queue locking). Because this mechanism does not save the job,
all opcodes on disk will be in status WAITLOCK and not RUNNING anymore,
so we also change the load sequence to consider WAITLOCK as RUNNING.

With the patch applied, creating in parallel (via burnin) five instances
on a five node cluster shows that only two are executing, while three
are waiting for locks.

Reviewed-by: imsnah

e92376d7
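
The status flow described in this commit can be sketched roughly like this. The constant names come from the commit message, but their string values, the `Opcode` class and the helper functions are assumptions made for the example.

```python
import threading

OP_STATUS_WAITLOCK = "waiting"   # value assumed for illustration
OP_STATUS_RUNNING = "running"    # value assumed for illustration


class Opcode:
    def __init__(self):
        self.status = None


def load_status(stored):
    """On load, treat WAITLOCK as RUNNING, since RUNNING is never saved."""
    return OP_STATUS_RUNNING if stored == OP_STATUS_WAITLOCK else stored


def run_opcode(op, queue_lock, acquire_locks, execute):
    with queue_lock:
        op.status = OP_STATUS_WAITLOCK  # set before any lock is requested

    def notify_running():
        # Callback invoked just before control is given to Exec.
        with queue_lock:
            op.status = OP_STATUS_RUNNING

    acquire_locks()      # may block for a long time
    notify_running()
    return execute()


seen = []
op = Opcode()
result = run_opcode(
    op,
    threading.Lock(),
    acquire_locks=lambda: seen.append(op.status),
    execute=lambda: seen.append(op.status) or "done",
)
```

Here `seen` records the status visible during lock acquisition and during execution.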

Oct 06, 2008

Implement job auto-archiving · 07cd723a

Iustin Pop authored 16 years ago

This patch adds a new luxi call that implements auto-archiving of jobs
older than a certain age (or -1 for all completed jobs), and the gnt-job
command that makes use of this (with 'all' for -1).

Reviewed-by: imsnah

07cd723a
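
A minimal sketch of the selection rule, assuming jobs are represented as `(job_id, finished_timestamp)` pairs with `None` for unfinished jobs. The function names and the data layout are hypothetical and are not the luxi call itself.

```python
def parse_age(text):
    """Map the command-line value to an age in seconds; 'all' means -1."""
    return -1 if text == "all" else int(text)


def select_jobs_to_archive(jobs, age, now):
    """Return ids of completed jobs older than `age` seconds (-1: all)."""
    selected = []
    for job_id, finished in jobs:
        if finished is None:
            continue  # job has not completed, never archive it
        if age == -1 or now - finished > age:
            selected.append(job_id)
    return selected


jobs = [(1, 100), (2, 900), (3, None)]
older_than_500 = select_jobs_to_archive(jobs, parse_age("500"), now=1000)
everything = select_jobs_to_archive(jobs, parse_age("all"), now=1000)
```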

Oct 01, 2008

Convert ganeti-master · a42872ff

Michael Hanselmann authored 16 years ago

Use simpleconfig instead of ssconf.

Reviewed-by: iustinp

a42872ff

Add new query to get cluster config values · ae5849b5

Michael Hanselmann authored 16 years ago

This can be used to retrieve certain cluster config values from
within clients.

OpDumpClusterConfig was not used anywhere, hence I'm just reusing
it. The way ConfigWriter.DumpConfig returned the configuration
was not thread-safe, anyway (no deepcopy).

Reviewed-by: iustinp

ae5849b5

Sep 09, 2008

Implement master startup safety check · 36205981

Iustin Pop authored 16 years ago

This is an initial version of the master startup checks. It's a very
rudimentary change, however in normal usage (an old master was started,
the rest of the cluster is functioning normally) it will succeed in
preventing wrong startups.

Reviewed-by: imsnah

36205981

Aug 29, 2008

Make WaitForJobChanges deal with long jobs · 5c735209

Iustin Pop authored 16 years ago

This patch alters the WaitForJobChanges luxi-RPC call to have a
configurable timeout, so that the call behaves nicely with long jobs
that have no update.

We do this by adding a timeout parameter in the RPC call, and returning
a special constant when the timeout is reached without an update. The
luxi client will repeatedly call WaitForJobChanges until it gets a real
change. The timeout is hardcoded as half the RWTO value.

The patch also removes an unused variable (new_state) from the
WaitForJobChanges method.

Reviewed-by: imsnah,ultrotter

5c735209
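
The timeout-and-retry pattern can be sketched with a plain in-process queue standing in for the RPC layer. The sentinel name `JOB_NOTCHANGED`, its value and both function names are assumptions for this example.

```python
import queue
import threading

JOB_NOTCHANGED = "nochange"  # hypothetical "timeout, no update" constant


def wait_for_job_changes(updates, timeout):
    """Server side: return an update, or the constant when the timeout hits."""
    try:
        return updates.get(timeout=timeout)
    except queue.Empty:
        return JOB_NOTCHANGED


def client_wait(updates, timeout):
    """Client side: keep calling until a real change arrives."""
    polls = 0
    while True:
        polls += 1
        result = wait_for_job_changes(updates, timeout)
        if result != JOB_NOTCHANGED:
            return result, polls


updates = queue.Queue()
# Simulate a long job that only reports a change after 0.2 seconds.
timer = threading.Timer(0.2, updates.put, args=({"status": "success"},))
timer.start()
change, polls = client_wait(updates, timeout=0.05)
timer.join()
```

Because each individual call returns before the transport's own timeout would fire, a job that stays silent for a long time no longer breaks the connection.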

Aug 27, 2008

Make sure that client programs get all messages · 6c5a7090

Michael Hanselmann authored 16 years ago

This is a large patch, but I can't figure out how to split it without
breaking stuff. The old way of getting messages by always getting the
last one didn't bring all messages to the client if they were added
too fast, thereby making commands like “gnt-cluster verify” less than
useful. These changes now introduce some sort of serial number per
log entry to keep track of which messages a client has already received.
They also remove the log lock per opcode to make reading log entries
thread safe.

Reviewed-by: ultrotter

6c5a7090
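
A minimal sketch of the serial-number idea, assuming an in-memory log and a client that remembers the last serial it saw. The `JobLog` class and its methods are hypothetical.

```python
class JobLog:
    """Toy log where every entry gets an increasing serial number."""

    def __init__(self):
        self._entries = []
        self._serial = 0

    def add(self, message):
        self._serial += 1
        self._entries.append((self._serial, message))

    def entries_after(self, last_serial):
        # Return everything the client has not seen yet, not just the last one.
        return [entry for entry in self._entries if entry[0] > last_serial]


log = JobLog()
last_seen = 0
received = []
# Messages arrive in bursts between two client polls.
for burst in (["a", "b", "c"], [], ["d"]):
    for message in burst:
        log.add(message)
    for serial, message in log.entries_after(last_seen):
        received.append(message)
        last_seen = serial
```

Fetching only the latest entry on each poll would have delivered "c" and "d" and silently dropped "a" and "b".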

Aug 18, 2008

Use Linux-specific way to name master socket · 9894ece7

Michael Hanselmann authored 16 years ago

By using this Linux-specific way we don't have to care about removing the
socket file when quitting or starting (after an unclean shutdown). For a
more detailed description, see the comment in the patch.

Reviewed-by: schreiberal

9894ece7
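
The commit message does not name the mechanism. One Linux-specific technique that matches the description is the abstract socket namespace, where the name starts with a null byte and no file is created. The sketch below assumes that technique; the socket name and the helper function are made up.

```python
import os
import socket
import sys


def bind_abstract(name):
    """Bind a Unix socket in the abstract namespace (Linux only)."""
    if not sys.platform.startswith("linux"):
        return None  # abstract names are not available elsewhere
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind("\0" + name)  # leading null byte selects the abstract namespace
    sock.listen(1)
    return sock


name = "example-master-%d" % os.getpid()  # hypothetical socket name
server = bind_abstract(name)
if server is not None:
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.connect("\0" + name)
    conn, _ = server.accept()
    client.sendall(b"ping")
    received = conn.recv(4)
    for sock in (conn, client, server):
        sock.close()
else:
    received = None
```

An abstract name disappears when the last socket bound to it is closed, so there is nothing left on disk to clean up after an unclean shutdown.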

Aug 11, 2008

Add RPC call to wait for job changes · dfe57c22

Michael Hanselmann authored 16 years ago

This way clients can react faster to status or message changes and
don't have to poll anymore.

Reviewed-by: ultrotter

dfe57c22

Aug 08, 2008

Add query function for exports · 32f93223

Michael Hanselmann authored 16 years ago

Reviewed-by: iustinp

32f93223

Aug 06, 2008

Notify job queue about added/removed nodes · c36176cc

Michael Hanselmann authored 16 years ago

The job queue maintains its own node list and must be notified
when nodes are added/removed.

Reviewed-by: iustinp

c36176cc

Implement {Add,Readd,Remove}Node in GanetiContext · d8470559

Michael Hanselmann authored 16 years ago

By doing this we've a central place which coordinates what needs to be
done when adding or removing nodes. Another patch will add calls into
the job queue.

Two log messages move to config.py.

When removing a node, node_leave_cluster is now called after it has
been removed from the configuration and job manager. That way we're
sure not to access the node again after files have been removed.

Reviewed-by: iustinp

d8470559
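
The ordering described in this commit, combined with the job queue notification added by c36176cc, can be sketched as follows. `ToyContext` and its attributes are hypothetical and only illustrate the idea of one coordinating place.

```python
class ToyContext:
    """Toy coordinator for node changes; not the real GanetiContext."""

    def __init__(self):
        self.config_nodes = set()  # stands in for the configuration
        self.queue_nodes = set()   # stands in for the job queue's node list

    def add_node(self, name):
        self.config_nodes.add(name)
        self.queue_nodes.add(name)

    def remove_node(self, name, leave_cluster):
        # Drop the node from the configuration and the job queue first ...
        self.config_nodes.discard(name)
        self.queue_nodes.discard(name)
        # ... and only then tell the node to leave the cluster.
        leave_cluster(name)


ctx = ToyContext()
ctx.add_node("node1")
ctx.add_node("node2")

observed = []


def fake_leave(name):
    # Record whether the node was still known when the call happened.
    observed.append((name, name in ctx.config_nodes, name in ctx.queue_nodes))


ctx.remove_node("node2", fake_leave)
```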

jqueue: Don't pass the list of nodes to SubmitJob anymore · 4c848b18

Michael Hanselmann authored 16 years ago

The job queue now maintains its own list and is updated when
nodes are added or removed from the cluster.

Reviewed-by: iustinp

4c848b18

masterd: Move job queue into context object · 9113300d

Michael Hanselmann authored 16 years ago

The job queue must be called from cmdlib when adding or removing
nodes to the cluster. Moving it to the context object makes
this possible.

Reviewed-by: iustinp

9113300d

Implement query for nodes · 02f7fe54

Michael Hanselmann authored 16 years ago

Reviewed-by: iustinp

02f7fe54

Implement query for instances · ee6c7b94

Michael Hanselmann authored 16 years ago

Queries don't create jobs and are more efficient. Log messages
are not yet stored anywhere.

Reviewed-by: iustinp

ee6c7b94

Jul 30, 2008

Unify SetupDaemon/SetupLogging · 59f187eb

Iustin Pop authored 16 years ago

The 'old-style' info, error, debug logs do not make much sense. This
patch unifies the SetupLogging and SetupDaemon functions. As a result,
all the commands log to a 'commands.log' file.

The patch also changes the log setup to keep going if there's an error
in setting up the file logging but we're logging to stderr.

Also, burnin now logs to its own file (burnin.log).

Reviewed-by: ultrotter

59f187eb

Rework master startup/shutdown/failover · b1b6ea87

Iustin Pop authored 16 years ago

This (big) patch reworks the master startup/shutdown and fixes the
master failover.

What does the patch do?

For master start/stop:
  - remove the old ganeti-master script and its associated man page
  - moves the ip start/stop directly into the backend.(Start|Stop)Master
  - adds start/stop of the master/rapi daemon into these functions,
    selectively based on the start/stop arguments
  - makes the master call via rpc StartMaster(start_daemons=False) to
    the local node so that the master IP is started
  - and finally changes the example init.d script to directly start and
    stop all three daemons, since they do the right thing (depending on
    master/not master role)

For master failover:
  - moves the code from LUMasterFailover into bootstrap.MasterFailover,
    since we need to start/stop the master during this operation and
    thus it can't be executed from the master
  - removes the LUMasterFailover and its associated opcode

Notes: ubuntu's /etc/lsb-base-logging.sh is dumb, so the messages 'not
master' are not seen during startup on non-master nodes.

Reviewed-by: ultrotter

b1b6ea87

Implement checking for the master role in rapi · 5675cd1f

Iustin Pop authored 16 years ago

This patch moves the CheckMaster function from ganeti-masterd to ssconf
(most logical place, it cannot go in utils since we would have recursive
imports between ssconf and utils) and changes ganeti-rapi to also call
this function.

This is needed so that starting ganeti-rapi on a non-master node does
the right thing.

Reviewed-by: ultrotter

5675cd1f

Jul 29, 2008

Use constants for the pid file stems · 99e88451

Iustin Pop authored 16 years ago

Reviewed-by: imsnah

99e88451

Jul 24, 2008

Fix RPC parameters for {Cancel,Archive}Job · 3a2c7775

Michael Hanselmann authored 16 years ago

They aren't tuples on the client side.

Reviewed-by: iustinp

3a2c7775

Jul 23, 2008

ganeti-masterd: write and remove pidfile · 8feda3ad

Guido Trotter authored 16 years ago

Reviewed-by: iustinp

8feda3ad

Distribute the queue serial file after each update · c3f0a12f

Iustin Pop authored 16 years ago

This patch adds distribution of the queue serial file after each write
to it (but before a new job is created and written with that ID, and
before a response is returned, so we should be safe from crashes in
between).

Currently it only logs if a node cannot be contacted; it should abort
if more than 50% errors are seen.

Reviewed-by: imsnah

c3f0a12f
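
The commit itself only logs failures; the abort rule is mentioned as future work. A sketch of what such a rule could look like is given below, where the function name, the `send` callback and the use of `OSError` for unreachable nodes are all assumptions.

```python
def distribute_serial(nodes, send, max_failure_ratio=0.5):
    """Send the serial file to every node; abort if too many nodes fail."""
    failed = []
    for node in nodes:
        try:
            send(node)
        except OSError:
            failed.append(node)  # the commit as written would only log this
    if nodes and len(failed) / len(nodes) > max_failure_ratio:
        raise RuntimeError("serial file distribution failed on %s" % failed)
    return failed


def make_sender(unreachable):
    """Build a fake transport that fails for the given node names."""
    def send(node):
        if node in unreachable:
            raise OSError("cannot contact %s" % node)
    return send


nodes = ["node1", "node2", "node3"]
minor = distribute_serial(nodes, make_sender({"node3"}))
try:
    distribute_serial(nodes, make_sender({"node2", "node3"}))
    aborted = False
except RuntimeError:
    aborted = True
```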

Jul 21, 2008

Use new signal handler class in master daemon · 610bc9ee

Michael Hanselmann authored 16 years ago

Reviewed-by: ultrotter

610bc9ee

Jul 14, 2008

Fix previous patch using workerpool in masterd · 36088c4c

Michael Hanselmann authored 16 years ago

The function to stop a worker pool is TerminateWorkers(), not Shutdown().

Reviewed-by: iustinp

36088c4c

Use workerpool in master daemon · 23e50d39

Michael Hanselmann authored 16 years ago

Reusing threads instead of starting one for each request is more efficient.

Reviewed-by: iustinp

23e50d39
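
The idea of reusing threads can be illustrated with a small pool built on the standard library. This is not Ganeti's workerpool module; `TinyWorkerPool` and its methods are made up for the sketch.

```python
import queue
import threading


class TinyWorkerPool:
    """Fixed set of threads that pull tasks from a shared queue."""

    def __init__(self, num_workers):
        self._tasks = queue.Queue()
        self._workers = [
            threading.Thread(target=self._run) for _ in range(num_workers)
        ]
        for worker in self._workers:
            worker.start()

    def _run(self):
        while True:
            task = self._tasks.get()
            if task is None:  # shutdown marker
                return
            func, args = task
            func(*args)

    def add_task(self, func, *args):
        self._tasks.put((func, args))

    def terminate_workers(self):
        # One marker per worker, then wait for all of them to exit.
        for _ in self._workers:
            self._tasks.put(None)
        for worker in self._workers:
            worker.join()


handled = []


def handle_request(request_id):
    handled.append((threading.current_thread().name, request_id))


pool = TinyWorkerPool(2)
for request_id in range(10):
    pool.add_task(handle_request, request_id)
pool.terminate_workers()
```

Ten requests are served by at most two threads, instead of ten threads being created and destroyed.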

Jul 10, 2008

Remove more old job queue code · 0ed468d3

Michael Hanselmann authored 16 years ago

Apparently I forgot to remove this code when removing the rest.

Reviewed-by: iustinp

0ed468d3

Jul 09, 2008

Fix double-logging in daemons · ff5fac04

Iustin Pop authored 16 years ago

Currently, in debug mode, both the logfile handler and the stderr
handler will log debug messages. Since the stderr is redirected to the
same logfile (to catch non-logged errors), it means log entries are
doubled.

The patch adds an extra parameter to the logger.SetupDaemon() function
that allows disabling of the stderr logging. The master and node daemon
will use this to enable stderr logging only when running in foreground.

Reviewed-by: imsnah

ff5fac04
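
A sketch of the extra parameter using the standard `logging` module. The function name, its signature and the logger name are assumptions, and the real logger.SetupDaemon() may differ.

```python
import logging
import os
import sys
import tempfile


def setup_daemon_logging(logger, logfile, debug=False, stderr_logging=False):
    """Attach a file handler and, only if requested, a stderr handler."""
    logger.setLevel(logging.DEBUG if debug else logging.INFO)
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    handlers = [logging.FileHandler(logfile)]
    if stderr_logging:
        # Only useful in foreground mode; when stderr is redirected to the
        # log file this handler would duplicate every entry.
        handlers.append(logging.StreamHandler(sys.stderr))
    for handler in handlers:
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return handlers


logfile = os.path.join(tempfile.mkdtemp(), "daemon.log")
demo_logger = logging.getLogger("demo.background")
background = setup_daemon_logging(demo_logger, logfile, debug=True)
demo_logger.debug("hello")
with open(logfile) as handle:
    lines = handle.read().splitlines()
```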

Remove the old locking functions · d4fa5c23

Iustin Pop authored 16 years ago

This removes (hopefully) all traces of the old locking functions and
uses.

Reviewed-by: imsnah

d4fa5c23

Remove old job queue code · 2467e0d3

Michael Hanselmann authored 16 years ago

Reviewed-by: iustinp

2467e0d3

Change masterd/client RPC protocol · 0bbe448c

Michael Hanselmann authored 16 years ago

- Introduce abstraction class on client side
- Use constants for method names
- Adapt legacy function SubmitOpCode to use it

Reviewed-by: iustinp

0bbe448c

Make luxi RPC more flexible · 3d8548c4

Michael Hanselmann authored 16 years ago

- Use constants for dict entries
- Handle exceptions on server side
- Rename client function to CallMethod to match server side naming

Reviewed-by: iustinp

3d8548c4

Instantiate new job queue in master daemon · 50a3fbb2

Michael Hanselmann authored 16 years ago

Reviewed-by: iustinp

50a3fbb2

Jul 03, 2008

Add custom logging setup for daemons · 3b316acb

Iustin Pop authored 16 years ago

It's better for daemons if:
  - they log only to one log file
  - the log level is included
  - for debug runs, the filename/line number is included

This patch moves the custom formatter from the watcher to the logging
module and generalizes it; then it changes the master daemon to use this
function instead of the generic logging (which might be deprecated
anyway in the future).

Reviewed-by: imsnah

3b316acb
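
The formatting rules listed in this commit could look roughly like this with the standard `logging` module. The exact format string is an assumption, not the one used in the patch.

```python
import logging


def make_formatter(debug=False):
    """Always include the level; add module and line number for debug runs."""
    fmt = "%(asctime)s %(levelname)s"
    if debug:
        fmt += " %(module)s:%(lineno)d"
    fmt += " %(message)s"
    return logging.Formatter(fmt)


# Build a record by hand so the example does not depend on any handler.
record = logging.LogRecord(
    "demo", logging.INFO, "/tmp/example.py", 42, "started", None, None
)
normal_line = make_formatter().format(record)
debug_line = make_formatter(debug=True).format(record)
```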

Jul 02, 2008

ganeti-masterd: Remove unused locking code · cc2bea8b

Michael Hanselmann authored 16 years ago

Reviewed-by: iustinp, ultrotter

cc2bea8b

ganeti-masterd: Use logging module · 96cb3986

Michael Hanselmann authored 16 years ago

Reviewed-by: ultrotter, iustinp

96cb3986

Jul 01, 2008

Context: s/GLM/glm/ · 984f7c32

Guido Trotter authored 16 years ago

Make the GanetiLockManager instance of GanetiContext lowercase

Reviewed-by: imsnah

984f7c32