- Oct 01, 2008
-
-
Iustin Pop authored
The watcher didn't handle down nodes; fix this by ignoring (in the secondary node reboot checks) any node that doesn't return a boot id. Reviewed-by: imsnah
-
Iustin Pop authored
The watcher was using conflicting attributes of the instance:
  - it queried admin_state/oper_state, which are booleans
  - but it compared those to the status (which is a text field)

The code was changed to query the aggregated 'status' field, as that will also return an indication of node problems, and we can use this single field for all decisions. We still ask for the admin_state field, as that is needed for the activate disks check (in secondary node restart).

The patch also touches the watcher in some other parts:
  - log exceptions more nicely
  - convert a method to @staticmethod
  - remove unused imports

Reviewed-by: imsnah
-
Iustin Pop authored
The watcher has one last use of ganeti commands as opposed to sending requests via luxi. The patch changes this to use the cli functions.

The patch also has two other changes:
  - fix the docstring for OpVerifyDisks (found while converting this)
  - enable stderr logging on the watcher when “-d” is passed

Reviewed-by: imsnah
-
- Sep 09, 2008
-
-
Michael Hanselmann authored
Reviewed-by: iustinp
-
Iustin Pop authored
This is an initial version of the master startup checks. It's a very rudimentary change; however, in normal usage (an old master was started, the rest of the cluster is functioning normally) it will succeed in preventing wrong startups. Reviewed-by: imsnah
-
Iustin Pop authored
We create a multi-node call so that querying all nodes for agreement will be fast. Reviewed-by: imsnah
-
Michael Hanselmann authored
This helps to prevent complete deadlocks. Reviewed-by: iustinp
-
- Sep 05, 2008
-
-
Michael Hanselmann authored
Only one process should modify the queue at the same time. Reviewed-by: iustinp
-
- Aug 29, 2008
-
-
Iustin Pop authored
This patch alters the WaitForJobChanges luxi-RPC call to have a configurable timeout, so that the call behaves nicely with long jobs that have no update. We do this by adding a timeout parameter to the RPC call, and returning a special constant when the timeout is reached without an update. The luxi client will repeatedly call WaitForJobChanges until it gets a real change. The timeout is hardcoded as half the RWTO value. The patch also removes an unused variable (new_state) from the WaitForJobChanges method. Reviewed-by: imsnah,ultrotter
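The timeout-and-retry pattern described above can be sketched roughly as follows. This is only an illustration, not the actual luxi code: the sentinel name NO_CHANGES, both function names, and the use of a queue.Queue as the source of updates are assumptions made for the example.

```python
import queue

# Hypothetical sentinel; the commit message does not name the real constant.
NO_CHANGES = "nochange"


def wait_for_job_changes(updates, timeout):
    """Server side (sketch): return the next update, or the sentinel on timeout."""
    try:
        return updates.get(timeout=timeout)
    except queue.Empty:
        return NO_CHANGES


def wait_until_changed(updates, timeout):
    """Client side (sketch): repeat the call until a real change is returned."""
    attempts = 0
    while True:
        attempts += 1
        result = wait_for_job_changes(updates, timeout)
        if result != NO_CHANGES:
            return result, attempts
```

Because each individual call returns within the timeout, no single request stays open longer than the transport allows, while the caller still sees one blocking wait.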
-
- Aug 27, 2008
-
-
Michael Hanselmann authored
This is a large patch, but I can't figure out how to split it without breaking stuff. The old way of getting messages by always getting the last one didn't bring all messages to the client if they were added too fast, thereby making commands like “gnt-cluster verify” less than useful. These changes introduce a serial number per log entry to keep track of which messages a client has already received. They also remove the log lock per opcode to make reading log entries thread safe. Reviewed-by: ultrotter
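A minimal sketch of the per-entry serial number idea, assuming a simple in-memory log; the class and method names (OpLog, append, entries_after) are hypothetical and not taken from the patch:

```python
class OpLog:
    """Keeps log entries with increasing serial numbers (illustrative only)."""

    def __init__(self):
        self._entries = []  # list of (serial, message) tuples
        self._serial = 0

    def append(self, message):
        """Store a message under the next serial number and return that number."""
        self._serial += 1
        self._entries.append((self._serial, message))
        return self._serial

    def entries_after(self, last_serial):
        """Return every entry the client has not seen yet."""
        return [entry for entry in self._entries if entry[0] > last_serial]
```

A client remembers the highest serial it has received and passes it on the next request, so messages added in quick succession are all delivered instead of only the last one.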
-
- Aug 18, 2008
-
-
Michael Hanselmann authored
By using this Linux-specific way we don't have to care about removing the socket file when quitting or starting (after an unclean shutdown). For a more detailed description, see the comment in the patch. Reviewed-by: schreiberal
-
- Aug 11, 2008
-
-
Michael Hanselmann authored
This way clients can react faster to status or message changes and don't have to poll anymore. Reviewed-by: ultrotter
-
- Aug 08, 2008
-
-
Michael Hanselmann authored
Reviewed-by: iustinp
-
Michael Hanselmann authored
This will be used to archive jobs. Reviewed-by: iustinp
-
Michael Hanselmann authored
The lock will also be needed by another function. Reviewed-by: iustinp
-
Michael Hanselmann authored
Reviewed-by: iustinp
-
Michael Hanselmann authored
Reviewed-by: iustinp
-
Michael Hanselmann authored
jobqueue_update: Uploads a job queue file's content to a node. The most common operation is to upload something that we already have in a string. Unlike in the upload_file function, the file is not read again when distributing changes, but content has to be passed as a string.

jobqueue_purge: Removes all queue related files from a node.

Reviewed-by: iustinp
-
- Aug 07, 2008
-
-
Michael Hanselmann authored
Reviewed-by: iustinp
-
- Aug 06, 2008
-
-
Michael Hanselmann authored
The job queue maintains its own node list and must be notified when nodes are added/removed. Reviewed-by: iustinp
-
Michael Hanselmann authored
By doing this we have a central place that coordinates what needs to be done when adding or removing nodes. Another patch will add calls into the job queue. Two log messages move to config.py. When removing a node, node_leave_cluster is now called after it has been removed from the configuration and job manager. That way we're sure not to access the node again after files have been removed. Reviewed-by: iustinp
-
Michael Hanselmann authored
The job queue now maintains its own list and is updated when nodes are added or removed from the cluster. Reviewed-by: iustinp
-
Michael Hanselmann authored
The job queue must be called from cmdlib when adding or removing nodes to the cluster. Moving it to the context objects makes this possible. Reviewed-by: iustinp
-
Michael Hanselmann authored
Reviewed-by: iustinp
-
Michael Hanselmann authored
Queries don't create jobs and are more efficient. Log messages are not yet stored anywhere. Reviewed-by: iustinp
-
- Jul 31, 2008
-
-
Oleksiy Mishchenko authored
Add instance tag handling and improved error logging. Also adapt instance listing for RAPI2. Reviewed-by: iustinp
-
- Jul 30, 2008
-
-
Iustin Pop authored
The 'old-style' info, error, debug logs do not make much sense. This patch unifies the SetupLogging and SetupDaemon functions. As a result, all the commands log to a 'commands.log' file. The patch also changes the log setup to keep going if there's an error in setting up the file logging but we're logging to stderr. Also, burnin now logs to its own file (burnin.log). Reviewed-by: ultrotter
-
Iustin Pop authored
This (big) patch reworks the master startup/shutdown and fixes the master failover. What does the patch do?

For master start/stop:
  - removes the old ganeti-master script and its associated man page
  - moves the IP start/stop directly into backend.(Start|Stop)Master
  - adds start/stop of the master/rapi daemon into these functions, selectively based on the start/stop arguments
  - makes the master call StartMaster(start_daemons=False) via RPC on the local node so that the master IP is started
  - and finally changes the example init.d script to directly start and stop all three daemons, since they do the right thing (depending on master/not master role)

For master failover:
  - moves the code from LUMasterFailover into bootstrap.MasterFailover, since we need to start/stop the master during this operation and thus it can't be executed from the master
  - removes LUMasterFailover and its associated opcode

Notes: Ubuntu's /etc/lsb-base-logging.sh is dumb, so the 'not master' messages are not seen during startup on non-master nodes.

Reviewed-by: ultrotter
-
Iustin Pop authored
This patch moves the CheckMaster function from ganeti-masterd to ssconf (most logical place, it cannot go in utils since we would have recursive imports between ssconf and utils) and changes ganeti-rapi to also call this function. This is needed so that starting ganeti-rapi on a non-master node does the right thing. Reviewed-by: ultrotter
-
Iustin Pop authored
This patch adds a new, unused for now, parameter to the start and stop master operations in backend. The idea behind it is that we need to be able to control whether the IP (de)activation is coupled with daemon startup/shutdown. The callers are also modified to pass this parameter (even if unused for now). Reviewed-by: ultrotter
-
- Jul 29, 2008
-
-
Iustin Pop authored
Reviewed-by: imsnah
-
Iustin Pop authored
This is needed for controlling it cleanly with start-stop-daemon. Reviewed-by: ultrotter
-
- Jul 28, 2008
-
-
Michael Hanselmann authored
Reviewed-by: iustinp
-
Michael Hanselmann authored
All other daemons have their main code in themselves and not in a module. This patch does the same to ganeti-rapi by moving the code from lib/rapi/RESTHTTPServer.py to daemons/ganeti-rapi. Reviewed-by: iustinp
-
- Jul 24, 2008
-
-
Michael Hanselmann authored
They aren't tuples on the client side. Reviewed-by: iustinp
-
- Jul 23, 2008
-
-
Guido Trotter authored
Reviewed-by: iustinp
-
Guido Trotter authored
Reviewed-by: iustinp
-
Iustin Pop authored
This patch adds distribution of the queue serial file after each write to it (but before a new job is created and written with that ID, and before a response is returned, so we should be safe from crashes in between). Currently it only logs if a node cannot be contacted; it should abort if more than 50% errors are seen. Reviewed-by: imsnah
-
- Jul 21, 2008
-
-
Michael Hanselmann authored
This also fixes a TODO added by ultrotter by killing the parent process when QuitGanetiException is raised. Reviewed-by: ultrotter
-
Michael Hanselmann authored
Reviewed-by: ultrotter
-