- Oct 16, 2008
-
-
Iustin Pop authored
This adds the set/reset in the jqueue and luxi modules, a way to query it in OpQueryConfigValues, and the command line interface for it:

  $ gnt-cluster queue info
  The drain flag is unset
  $ gnt-cluster queue drain
  $ gnt-cluster queue info
  The drain flag is set
  $ gnt-cluster queue undrain
  $ gnt-cluster queue info
  The drain flag is unset

The setting is made via luxi and not via an opcode because opcodes can't be executed when the queue is drained, but we don't query via luxi since in the future it might become a cluster property as opposed to a node one.

Reviewed-by: imsnah
-
- Oct 15, 2008
-
-
Iustin Pop authored
This patch adds a generic method to identify the ganeti error given its class name, and implements this across the luxi protocol. Reviewed-by: imsnah
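A minimal sketch of what identifying an error by its class name and carrying it over a wire protocol could look like. The class names, the registry and the helper functions below are hypothetical and are not taken from the patch; the fallback to the root class for unknown names is also an assumption.

```python
class GenericError(Exception):
    """Root of a hypothetical error hierarchy."""


class OpPrereqError(GenericError):
    """Hypothetical error: prerequisites for an operation were not met."""


class JobQueueDrainError(GenericError):
    """Hypothetical error: the job queue does not accept new jobs."""


# Registry keyed by class name, so a name received from the peer can be
# mapped back to a class without evaluating arbitrary strings.
_ERROR_CLASSES = {
    cls.__name__: cls
    for cls in (GenericError, OpPrereqError, JobQueueDrainError)
}


def get_error_class(name):
    """Return the error class registered under `name`, or None."""
    return _ERROR_CLASSES.get(name)


def encode_error(err):
    """Turn an exception into a serializable (class name, args) pair."""
    return (type(err).__name__, list(err.args))


def decode_error(payload):
    """Rebuild an exception from an encoded pair.

    Unknown class names fall back to GenericError (an assumption).
    """
    name, args = payload
    cls = get_error_class(name) or GenericError
    return cls(*args)
```

With helpers like these, the receiving side can re-raise the same class the sending side raised, instead of a generic protocol error.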
-
- Oct 10, 2008
-
-
Iustin Pop authored
This big patch changes the call model used in internode-rpc from standalone function calls in the rpc module to methods of an RpcRunner class. This can be used in the future to enable smarter processing in the RPC layer itself (a quick example: not setting the DiskID from cmdlib code, but only once in each rpc call). There are a few RPC calls that are made outside of the LU code, and these calls are left as staticmethods, so they can be used without a class instance (which requires a ConfigWriter instance). Reviewed-by: imsnah
-
- Oct 07, 2008
-
-
Iustin Pop authored
Background: when we have multiple jobs in the queue (more than just a few), many of the jobs (up to the number of threads) will be in state 'running', although many of them could actually be blocked, waiting for some locks. This is not good, as one cannot easily see what is happening. The patch extends the possible opcode/job statuses with another one, waiting, which shows that the LU is in the acquire locks phase. The mechanism for doing so is simple: we initialize (in the job queue) the opcode with OP_STATUS_WAITLOCK, and when the processor is ready to give control to the LU's Exec, it will call a notifier back into the _JobQueueWorker that sets the opcode status to OP_STATUS_RUNNING (with the proper queue locking). Because this mechanism does not save the job, all opcodes on disk will be in status WAITLOCK and not RUNNING anymore, so we also change the load sequence to consider WAITLOCK as RUNNING. With the patch applied, creating in parallel (via burnin) five instances on a five node cluster shows that only two are executing, while three are waiting for locks. Reviewed-by: imsnah
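The status hand-over described above can be sketched roughly as follows. Only the names OP_STATUS_WAITLOCK and OP_STATUS_RUNNING come from the message; their string values, the other constants, the classes and the callback signature are assumptions made for illustration.

```python
import threading

# Constant names follow the commit message; the values are assumptions.
OP_STATUS_QUEUED = "queued"
OP_STATUS_WAITLOCK = "waiting"
OP_STATUS_RUNNING = "running"
OP_STATUS_SUCCESS = "success"


class QueuedOpCode:
    """Hypothetical stand-in for an opcode tracked by the job queue."""

    def __init__(self, name):
        self.name = name
        self.status = OP_STATUS_QUEUED


class Worker:
    """Hypothetical stand-in for the job queue worker."""

    def __init__(self):
        self._queue_lock = threading.Lock()

    def _set_status(self, op, status):
        # Status changes happen under the queue lock.
        with self._queue_lock:
            op.status = status

    def run(self, op, processor):
        # The opcode starts in WAITLOCK; the processor calls the notifier
        # once locks are acquired and the LU is about to execute.
        self._set_status(op, OP_STATUS_WAITLOCK)
        processor(op, lambda: self._set_status(op, OP_STATUS_RUNNING))
        self._set_status(op, OP_STATUS_SUCCESS)


def normalize_loaded_status(status):
    """When loading from disk, treat WAITLOCK as RUNNING."""
    if status == OP_STATUS_WAITLOCK:
        return OP_STATUS_RUNNING
    return status
```

The notifier is passed as a plain callable so the processor does not need to know about the queue or its locking.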
-
- Oct 06, 2008
-
-
Iustin Pop authored
This patch adds a new luxi call that implements auto-archiving of jobs older than a certain age (or -1 for all completed jobs), and the gnt-job command that makes use of this (with 'all' for -1). Reviewed-by: imsnah
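As an illustration of the age rule, here is a small sketch of selecting jobs for archival. The function names, the tuple layout of a job and the use of seconds as the unit are assumptions; only the "-1 means all completed jobs" and "'all' maps to -1" conventions come from the message.

```python
def parse_age(text):
    """Map the command line argument to an age; 'all' becomes -1."""
    return -1 if text == "all" else int(text)


def select_jobs_to_archive(jobs, age, now):
    """Return the IDs of completed jobs that are older than `age`.

    `jobs` is an iterable of (job_id, finished, end_timestamp) tuples,
    a hypothetical layout. An age of -1 selects every completed job.
    """
    selected = []
    for job_id, finished, end_timestamp in jobs:
        if not finished:
            # Jobs that are still queued or running are never archived.
            continue
        if age == -1 or end_timestamp + age < now:
            selected.append(job_id)
    return selected
```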
-
- Oct 01, 2008
-
-
Michael Hanselmann authored
Use simpleconfig instead of ssconf. Reviewed-by: iustinp
-
Michael Hanselmann authored
This can be used to retrieve certain cluster config values from within clients. OpDumpClusterConfig was not used anywhere, hence I'm just reusing it. The way ConfigWriter.DumpConfig returned the configuration was not thread-safe, anyway (no deepcopy). Reviewed-by: iustinp
-
- Sep 09, 2008
-
-
Iustin Pop authored
This is an initial version of the master startup checks. It's a very rudimentary change; however, in normal usage (an old master was started, the rest of the cluster is functioning normally) it will succeed in preventing wrong startups. Reviewed-by: imsnah
-
- Aug 29, 2008
-
-
Iustin Pop authored
This patch alters the WaitForJobChanges luxi-RPC call to have a configurable timeout, so that the call behaves nicely with long jobs that have no update. We do this by adding a timeout parameter in the RPC call, and returning a special constant when the timeout is reached without an update. The luxi client will repeatedly call the WaitForJobChanges until it gets a real change. The timeout is hardcoded as half the RWTO value. The patch also removes an unused variable (new_state) from the WaitForJobChanges method. Reviewed-by: imsnah,ultrotter
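The client-side loop described above might look roughly like this. The constant names and values (`JOB_NOTCHANGED`, `RWTO`) and the `call_wait` callable are placeholders, not the identifiers used in the patch.

```python
# Hypothetical marker returned by the server when the timeout expires
# without any change to the job.
JOB_NOTCHANGED = "nochange"

# Hypothetical read/write timeout of the client socket, in seconds.
RWTO = 60

# The per-call timeout is half the socket timeout, so the server always
# answers before the client side gives up on the connection.
WAIT_TIMEOUT = RWTO // 2


def wait_for_job_change(call_wait, job_id, timeout=WAIT_TIMEOUT):
    """Call the RPC repeatedly until it reports a real change.

    `call_wait(job_id, timeout)` stands in for the luxi call.
    """
    while True:
        result = call_wait(job_id, timeout)
        if result != JOB_NOTCHANGED:
            return result
```

Keeping the retry in the client means a long job without updates produces a series of short, successful calls instead of one call that outlives the socket timeout.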
-
- Aug 27, 2008
-
-
Michael Hanselmann authored
This is a large patch, but I can't figure out how to split it without breaking stuff. The old way of getting messages by always getting the last one didn't bring all messages to the client if they were added too fast, thereby making commands like “gnt-cluster verify” less than useful. These changes introduce a serial number per log entry to keep track of which messages a client has already received. They also remove the log lock per opcode to make reading log entries thread safe. Reviewed-by: ultrotter
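A small sketch of the serial-number idea: every log entry gets an increasing serial, and the client asks only for entries newer than the last serial it saw. The class and method names are made up for illustration and do not reflect the actual implementation.

```python
class OpLog:
    """Hypothetical per-opcode log with a serial number per entry."""

    def __init__(self):
        self._entries = []
        self._serial = 0

    def append(self, message):
        """Store a message under the next serial number."""
        self._serial += 1
        self._entries.append((self._serial, message))

    def newer_than(self, last_serial):
        """Return all (serial, message) entries after `last_serial`."""
        return [entry for entry in self._entries if entry[0] > last_serial]
```

A client that remembers the highest serial it has received gets every message exactly once, no matter how quickly entries were added between two polls.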
-
- Aug 18, 2008
-
-
Michael Hanselmann authored
By using this Linux-specific way we don't have to care about removing the socket file when quitting or starting (after an unclean shutdown). For a more detailed description, see the comment in the patch. Reviewed-by: schreiberal
-
- Aug 11, 2008
-
-
Michael Hanselmann authored
This way clients can react faster to status or message changes and don't have to poll anymore. Reviewed-by: ultrotter
-
- Aug 08, 2008
-
-
Michael Hanselmann authored
Reviewed-by: iustinp
-
- Aug 06, 2008
-
-
Michael Hanselmann authored
The job queue maintains its own node list and must be notified when nodes are added/removed. Reviewed-by: iustinp
-
Michael Hanselmann authored
By doing this we've a central place which coordinates what needs to be done when adding or removing nodes. Another patch will add calls into the job queue. Two log messages move to config.py. When removing a node, node_leave_cluster is now called after it has been removed from the configuration and job manager. That way we're sure not to access the node again after files have been removed. Reviewed-by: iustinp
-
Michael Hanselmann authored
The job queue now maintains its own list and is updated when nodes are added or removed from the cluster. Reviewed-by: iustinp
-
Michael Hanselmann authored
The job queue must be called from cmdlib when adding or removing nodes to the cluster. Moving it to the context objects makes this possible. Reviewed-by: iustinp
-
Michael Hanselmann authored
Reviewed-by: iustinp
-
Michael Hanselmann authored
Queries don't create jobs and are more efficient. Log messages are not yet stored anywhere. Reviewed-by: iustinp
-
- Jul 30, 2008
-
-
Iustin Pop authored
The 'old-style' info, error, debug logs do not make much sense. This patch unifies the SetupLogging and SetupDaemon functions. As a result, all the commands log to a 'commands.log' file. The patch also changes the log setup to keep going if there's an error in setting up the file logging but we're logging to stderr. Also, burnin now logs to its own file (burnin.log). Reviewed-by: ultrotter
-
Iustin Pop authored
This (big) patch reworks the master startup/shutdown and fixes the master failover. What does the patch do?

For master start/stop:
  - removes the old ganeti-master script and its associated man page
  - moves the ip start/stop directly into the backend.(Start|Stop)Master
  - adds start/stop of the master/rapi daemon into these functions, selectively based on the start/stop arguments
  - makes the master call via rpc StartMaster(start_daemons=False) to the local node so that the master IP is started
  - and finally changes the example init.d script to directly start and stop all three daemons, since they do the right thing (depending on master/not master role)

For master failover:
  - moves the code from LUMasterFailover into bootstrap.MasterFailover, since we need to start/stop the master during this operation and thus it can't be executed from the master
  - removes the LUMasterFailover and its associated opcode

Notes: ubuntu's /etc/lsb-base-logging.sh is dumb, so the messages 'not master' are not seen during startup on non-master nodes.

Reviewed-by: ultrotter
-
Iustin Pop authored
This patch moves the CheckMaster function from ganeti-masterd to ssconf (most logical place, it cannot go in utils since we would have recursive imports between ssconf and utils) and changes ganeti-rapi to also call this function. This is needed so that starting ganeti-rapi on a non-master node does the right thing. Reviewed-by: ultrotter
-
- Jul 29, 2008
-
-
Iustin Pop authored
Reviewed-by: imsnah
-
- Jul 24, 2008
-
-
Michael Hanselmann authored
They aren't tuples on the client side. Reviewed-by: iustinp
-
- Jul 23, 2008
-
-
Guido Trotter authored
Reviewed-by: iustinp
-
Iustin Pop authored
This patch adds distribution of the queue serial file after each write to it (but before a new job is created and written with that ID, and before a response is returned, so we should be safe from crashes in between). Currently it only logs if a node cannot be contacted; it should abort if more than 50% errors are seen. Reviewed-by: imsnah
-
- Jul 21, 2008
-
-
Michael Hanselmann authored
Reviewed-by: ultrotter
-
- Jul 14, 2008
-
-
Michael Hanselmann authored
The function to stop a worker pool is TerminateWorkers(), not Shutdown(). Reviewed-by: iustinp
-
Michael Hanselmann authored
Reusing threads instead of starting one for each request is more efficient. Reviewed-by: iustinp
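To illustrate thread reuse, here is a minimal worker pool sketch built on the standard library. It is not the pool used in the code base; the class and method names are illustrative, and only the idea of a fixed set of threads that is stopped explicitly comes from the surrounding messages.

```python
import queue
import threading


class WorkerPool:
    """Fixed set of threads that take requests from a shared queue."""

    _STOP = object()  # sentinel telling one worker to exit

    def __init__(self, num_workers, handler):
        self._tasks = queue.Queue()
        self._handler = handler
        self._workers = [
            threading.Thread(target=self._run, daemon=True)
            for _ in range(num_workers)
        ]
        for worker in self._workers:
            worker.start()

    def _run(self):
        # Each thread handles many tasks during its lifetime.
        while True:
            task = self._tasks.get()
            if task is self._STOP:
                break
            self._handler(task)

    def add_task(self, task):
        """Queue a request for the next free worker."""
        self._tasks.put(task)

    def terminate_workers(self):
        """Let queued tasks finish, then stop and join all workers."""
        for _ in self._workers:
            self._tasks.put(self._STOP)
        for worker in self._workers:
            worker.join()
```

Because the queue is first-in first-out, the stop sentinels are only picked up after every previously queued task has been taken by a worker.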
-
- Jul 10, 2008
-
-
Michael Hanselmann authored
Apparently I forgot to remove this code when removing the rest. Reviewed-by: iustinp
-
- Jul 09, 2008
-
-
Iustin Pop authored
Currently, in debug mode, both the logfile handler and the stderr handler will log debug messages. Since the stderr is redirected to the same logfile (to catch non-logged errors), it means log entries are doubled. The patch adds an extra parameter to the logger.SetupDaemon() function that allows disabling of the stderr logging. The master and node daemon will use this to enable stderr logging only when running in foreground. Reviewed-by: imsnah
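A rough sketch of the idea using Python's standard logging module. This is not the signature of logger.SetupDaemon(); the function name, the parameter names, the named logger and the format string are assumptions.

```python
import logging
import sys


def setup_daemon_logging(logfile, debug=False, stderr_logging=True,
                         name="daemon-sketch"):
    """Configure a logger with a file handler and an optional stderr one.

    A daemon whose stderr is redirected to the same log file would pass
    stderr_logging=False to avoid duplicated entries.
    """
    logger = logging.getLogger(name)
    # Drop handlers from an earlier call so repeated setup is harmless.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()
    logger.setLevel(logging.DEBUG if debug else logging.INFO)
    logger.propagate = False

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")

    file_handler = logging.FileHandler(logfile)
    file_handler.setFormatter(formatter)
    logger.addHandler(file_handler)

    if stderr_logging:
        stderr_handler = logging.StreamHandler(sys.stderr)
        stderr_handler.setFormatter(formatter)
        logger.addHandler(stderr_handler)

    return logger
```

A daemon running in the foreground would keep the default stderr_logging=True, while a daemonized process would switch it off.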
-
Iustin Pop authored
This removes (hopefully) all traces of the old locking functions and uses. Reviewed-by: imsnah
-
Michael Hanselmann authored
Reviewed-by: iustinp
-
Michael Hanselmann authored
  - Introduce abstraction class on client side
  - Use constants for method names
  - Adapt legacy function SubmitOpCode to use it

Reviewed-by: iustinp
-
Michael Hanselmann authored
  - Use constants for dict entries
  - Handle exceptions on server side
  - Rename client function to CallMethod to match server side naming

Reviewed-by: iustinp
-
Michael Hanselmann authored
Reviewed-by: iustinp
-
- Jul 03, 2008
-
-
Iustin Pop authored
It's better for daemons if:
  - they log only to one log file
  - the log level is included
  - for debug runs, the filename/line number is included

This patch moves the custom formatter from the watcher to the logging module and generalizes it; then it changes the master daemon to use this function instead of the generic logging (which might be deprecated anyway in the future).

Reviewed-by: imsnah
-
- Jul 02, 2008
-
-
Michael Hanselmann authored
Reviewed-by: iustinp, ultrotter
-
Michael Hanselmann authored
Reviewed-by: ultrotter, iustinp
-
- Jul 01, 2008
-
-
Guido Trotter authored
Make the GanetiLockManager instance of GanetiContext lowercase Reviewed-by: imsnah
-