- 30 Jul, 2008 4 commits
-
-
Iustin Pop authored
This (big) patch reworks the master startup/shutdown and fixes the master failover.
What does the patch do?
For master start/stop:
  - removes the old ganeti-master script and its associated man page
  - moves the IP start/stop directly into backend.(Start|Stop)Master
  - adds start/stop of the master/rapi daemon into these functions, selectively based on the start/stop arguments
  - makes the master call StartMaster(start_daemons=False) via RPC on the local node so that the master IP is started
  - and finally changes the example init.d script to directly start and stop all three daemons, since they do the right thing (depending on master/not master role)
For master failover:
  - moves the code from LUMasterFailover into bootstrap.MasterFailover, since we need to start/stop the master during this operation and thus it can't be executed from the master
  - removes LUMasterFailover and its associated opcode
Notes: Ubuntu's /etc/lsb-base-logging.sh is dumb, so the 'not master' messages are not seen during startup on non-master nodes.
Reviewed-by: ultrotter
-
Iustin Pop authored
Since we need to compute this from outside utils.py, we change this to a public function. Reviewed-by: ultrotter
-
Iustin Pop authored
This patch moves the CheckMaster function from ganeti-masterd to ssconf (most logical place, it cannot go in utils since we would have recursive imports between ssconf and utils) and changes ganeti-rapi to also call this function. This is needed so that starting ganeti-rapi on a non-master node does the right thing. Reviewed-by: ultrotter
-
Iustin Pop authored
This patch adds a new, unused for now, parameter to the start and stop master operations in backend. The idea behind it is that we need to be able to control whether the IP (de)activation is coupled with daemon startup/shutdown. The callers are also modified to pass this parameter (even if unused for now). Reviewed-by: ultrotter
-
- 29 Jul, 2008 6 commits
-
-
Michael Hanselmann authored
Reviewed-by: iustinp
-
Michael Hanselmann authored
The passed parameters were not correct. Reviewed-by: iustinp, ultrotter
-
Iustin Pop authored
Reviewed-by: imsnah
-
Iustin Pop authored
Reviewed-by: imsnah
-
Iustin Pop authored
We cannot depend on all environments having start-stop-daemon or a similar tool. Instead, we implement a KillProcess function that behaves similarly to “start-stop-daemon --retry”. Note that the attached unittest can hang in the foreground if the child misbehaves (doesn't write to the internal pipe). Since unittests are either run in the foreground or run with a timeout from an automated framework, I think this is an acceptable trade-off (as opposed to using hardcoded timeouts in the test). Reviewed-by: imsnah
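A minimal Python 3 sketch of the retry idea, not the code from this commit: send SIGTERM, poll until the process is gone or a timeout expires, then escalate to SIGKILL. The names kill_process, is_process_alive and _reap_if_child, as well as the default timeout and polling interval, are assumptions made for illustration.

```python
import os
import signal
import time


def _reap_if_child(pid):
    """Collect the exit status if pid is our own child; otherwise do nothing."""
    try:
        os.waitpid(pid, os.WNOHANG)
    except ChildProcessError:
        pass  # not our child, or already reaped


def is_process_alive(pid):
    """Return True if a process with this PID exists (hypothetical helper)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 performs only the existence/permission check
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def kill_process(pid, timeout=30.0, poll_interval=0.1):
    """Send SIGTERM, wait up to timeout seconds, then fall back to SIGKILL."""
    if pid <= 0:
        raise ValueError("invalid pid %r" % (pid,))
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return  # already gone
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        _reap_if_child(pid)  # a dead but unreaped child still counts as existing
        if not is_process_alive(pid):
            return
        time.sleep(poll_interval)
    try:
        os.kill(pid, signal.SIGKILL)  # last resort; this sketch does not wait again
    except ProcessLookupError:
        pass
```

The non-blocking waitpid call matters when the target is a child of the caller: until it is reaped, a terminated child remains as a zombie and the signal-0 check keeps succeeding.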
-
Iustin Pop authored
We already have a function to test if a PID is alive, so it makes more sense to use function composition than to force callers through a combined function (since we need to read PIDs from files in other places too). Now IsProcessAlive returns False for PIDs <= 0, since this is the error return from ReadPidFile. The patch also adds a unittest for checking that WriteFile raises the correct exception, and checks that an invalid or missing file causes ReadPidFile to return zero. The unittest tearDown method will try to clean up the temp directory too (otherwise it leaves stuff behind). Reviewed-by: ultrotter
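The composition described above can be sketched as follows in Python 3. This is an illustration under assumptions, not the project's code: read_pid_file, is_process_alive and is_pid_file_alive are hypothetical names, and the zero return value for a missing or invalid file follows the description in the message.

```python
import os


def read_pid_file(path):
    """Return the PID stored in path, or 0 if the file is missing or invalid."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return 0


def is_process_alive(pid):
    """Return False for PIDs <= 0, so the error value of read_pid_file is safe."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks that the process exists
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True


def is_pid_file_alive(path):
    """Compose the two helpers instead of duplicating the file parsing."""
    return is_process_alive(read_pid_file(path))
```

Because the error value 0 is rejected by is_process_alive, callers can chain the two functions without an extra check in between.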
-
- 28 Jul, 2008 5 commits
-
-
Michael Hanselmann authored
All other daemons have their main code in themselves and not in a module. This patch does the same to ganeti-rapi by moving the code from lib/rapi/RESTHTTPServer.py to daemons/ganeti-rapi. Reviewed-by: iustinp
-
Michael Hanselmann authored
The generic HTTP server doesn't know about httperror based exceptions and would treat them as unknown exceptions, thereby not doing the right thing with HTTP errors. Reviewed-by: iustinp
-
Michael Hanselmann authored
Locking is not completely right due to a deadlock when the job calls UpdateJob after changing its status. Reviewed-by: ultrotter
-
Michael Hanselmann authored
Reviewed-by: iustinp
-
Michael Hanselmann authored
Reviewed-by: ultrotter
-
- 25 Jul, 2008 2 commits
-
-
Michael Hanselmann authored
It might come in handy at some point and makes the code a bit easier to read. Reviewed-by: iustinp
-
Oleksiy Mishchenko authored
The set triggers an exception on the list-tags command and on RAPI calls for tags, since a set is not serializable by JSON. Reviewed-by: iustinp
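The problem can be reproduced with the standard json module in Python 3 (the project may have used a different JSON library at the time, so treat this as an illustration). The helper serialize_tags is a hypothetical name; converting to a sorted list is one possible fix, not necessarily the one applied in this commit.

```python
import json


def serialize_tags(tags):
    """Serialize a collection of tags; a set must become a list first."""
    return json.dumps(sorted(tags))


def can_serialize(value):
    """Return True if json.dumps accepts the value as-is."""
    try:
        json.dumps(value)
    except TypeError:
        return False
    return True
```

Here can_serialize({"web"}) is False, while serialize_tags({"web", "production"}) yields a JSON array.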
-
- 24 Jul, 2008 3 commits
-
-
Oleksiy Mishchenko authored
Reviewed-by: imsnah
-
Michael Hanselmann authored
So far no error reporting to the client is done: clients are not notified if a job doesn't exist or couldn't be archived because of its current status. When the preconditions hold, the internal cache is always cleaned to make sure that the actual on-disk status will be reread next time. Reviewed-by: iustinp
-
Michael Hanselmann authored
Reviewed-by: iustinp
-
- 23 Jul, 2008 12 commits
-
-
Michael Hanselmann authored
A later patch will add a memory based job storage class, hence this code is going into a separate class. It also changes the number format to always use at least 10 digits, allowing up to 9'999'999'999 jobs to be sorted without using a custom function. Reviewed-by: iustinp
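A short Python 3 sketch of why zero padding makes plain string sorting sufficient. The function name format_job_number is an assumption; only the minimum width of 10 digits comes from the message above.

```python
def format_job_number(serial):
    """Zero-pad the serial to at least 10 digits so strings sort numerically."""
    return "%010d" % serial


serials = [10, 9, 123456]
unpadded = sorted(str(n) for n in serials)            # lexicographic, not numeric
padded = sorted(format_job_number(n) for n in serials)
```

Without padding, "9" sorts after "123456"; with padding, the lexicographic order matches the numeric order for every value up to 9999999999.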
-
Guido Trotter authored
WritePidFile is a helper function that writes the current pid in a pidfile within the ganeti run directory. RemovePidFile tries to delete it. Reviewed-by: iustinp
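A minimal Python 3 sketch of such a pair of helpers. The names, the run_dir parameter and the "<name>.pid" file naming are assumptions for illustration; the real functions take their directory from the Ganeti configuration.

```python
import os


def write_pid_file(run_dir, name):
    """Write the current PID to <run_dir>/<name>.pid and return the path."""
    path = os.path.join(run_dir, "%s.pid" % name)
    with open(path, "w") as handle:
        handle.write("%d\n" % os.getpid())
    return path


def remove_pid_file(run_dir, name):
    """Try to delete the pidfile; a missing file is not an error."""
    path = os.path.join(run_dir, "%s.pid" % name)
    try:
        os.remove(path)
    except OSError:
        pass
```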
-
Guido Trotter authored
This helper function reads a pid from a file containing it and checks whether it refers to a live process. Reviewed-by: iustinp
-
Guido Trotter authored
An implementation mistake from the original design caused nodes to be locked before instances, rather than after. This patch inverts the level numbering, changing also the relevant unittests and the recursive locking function starting point. Reviewed-by: iustinp
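The effect of the level numbering can be sketched as below in Python 3. The constant names and values are assumptions chosen to match the description (instances before nodes); acquisition_order is a hypothetical helper, not part of the locking module.

```python
# Assumed numbering after the fix: lower levels are acquired first.
LEVEL_CLUSTER = 0
LEVEL_INSTANCE = 1
LEVEL_NODE = 2


def acquisition_order(levels):
    """Return the requested levels in the order they must be acquired."""
    return sorted(set(levels))
```

With this numbering, a request for node and instance locks is always served as instances first, then nodes, regardless of the order in which the caller lists them.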
-
Oleksiy Mishchenko authored
Reviewed-by: iustinp
-
Michael Hanselmann authored
Reviewed-by: iustinp
-
Michael Hanselmann authored
The job ID is now a string, hence logging must use %s instead of %d. Reviewed-by: iustinp
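A small Python 3 illustration of the formatting issue; the function names and message text are made up for the example.

```python
def describe_job(job_id):
    """Format a message for a job whose ID is a string."""
    return "Processing job %s" % job_id


def describe_job_old(job_id):
    """Old-style formatting: %d rejects a string job ID with TypeError."""
    return "Processing job %d" % job_id
```

Calling describe_job_old("42") raises TypeError, whereas describe_job accepts both strings and integers.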
-
Iustin Pop authored
We can use zip for simplifying this function. Actually, at this point I'm not sure if it needs to be a separate function at all. Reviewed-by: imsnah
-
Michael Hanselmann authored
The docstring says that _NewSerialUnlocked returns “a string representing the job identifier”. Until now it returned an integer and this patch changes it. Reviewed-by: iustinp
-
Iustin Pop authored
This patch adds distribution of the queue serial file after each write to it (but before a new job is created and written with that ID, and before a response is returned, so we should be safe from crashes in between). Currently it only logs if a node cannot be contacted; it should abort if > 50% errors are seen. Reviewed-by: imsnah
-
Iustin Pop authored
This will be needed for master failover. If we don't have a valid queue directory, we need to reinitialize it, but we should keep the existing serial number. As such, we abstract the reading of the serial and if we find a valid serial, we do not reset it. Reviewed-by: imsnah
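The "read first, reset only if invalid" logic can be sketched like this in Python 3. The function names, the single-file layout and the initial value 0 are assumptions for illustration, not the queue code itself.

```python
def read_serial(path):
    """Return the serial stored in path, or None if missing or invalid."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


def init_serial(path, initial=0):
    """Keep a valid existing serial; write the initial value otherwise."""
    serial = read_serial(path)
    if serial is None:
        serial = initial
        with open(path, "w") as handle:
            handle.write("%d\n" % serial)
    return serial
```

Reinitializing therefore never moves an existing counter backwards, which is the property needed when a new master takes over the queue.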
-
Guido Trotter authored
This was a TODO for 2.0. Reviewed-by: iustinp
-
- 22 Jul, 2008 8 commits
-
-
Guido Trotter authored
Grab a lock for the instance we're working on, and update its params. Reviewed-by: iustinp
-
Guido Trotter authored
When we set the instance params we're not adding a new instance, but just updating an existing one, so why use AddInstance? Reviewed-by: iustinp
-
Guido Trotter authored
For ConnectConsole we just need to lock the instance we're connecting to. We make a few RPCs to its primary node, but node daemons can now handle multiple queries, and nodes cannot be removed as long as they have instances on them anyway. Note that since we return the ssh command, and that's executed outside of the ganeti daemon without any locks held, the instance can then be subject to operations while we're connected to it, but that was the previous behavior as well. Reviewed-by: iustinp
-
Guido Trotter authored
LUs that take an instance name as input and need to expand its name and lock it can use it to simplify their ExpandNames call. Possibly an _ExpandAndLockNode will come as well. Reviewed-by: iustinp
-
Guido Trotter authored
LUQueryClusterInfo and LUDumpClusterConfig can be made concurrent and don't need to acquire any locks. In fact they don't interact with the cluster at all, but just with its configuration, which is thread-safe by design. Reviewed-by: iustinp
-
Guido Trotter authored
Two top level definitions were separated only by one empty line. Fixing this. Reviewed-by: imsnah
-
Oleksiy Mishchenko authored
Reviewed-by: imsnah
-
Michael Hanselmann authored
Not passing the argument means it has the value None. Iterating None doesn't work:
  >>> "123" in None
  Traceback (most recent call last):
    File "<stdin>", line 1, in ?
  TypeError: iterable argument required
Hence I rename it to "exclude" instead of "exceptions", which may be confusing, and make it mandatory. If one wants to clean all cache entries, an empty list can be passed. Reviewed-by: iustinp
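A Python 3 sketch of a cache-cleaning helper with a mandatory exclude argument. The name clean_cache and the dict-based cache are assumptions; only the parameter name "exclude" and the "empty list cleans everything" convention come from the message.

```python
def clean_cache(cache, exclude):
    """Remove every cache entry whose key is not listed in exclude."""
    for key in list(cache):  # copy the keys, since we delete while iterating
        if key not in exclude:
            del cache[key]


cache = {"1": "job one", "2": "job two", "3": "job three"}
clean_cache(cache, exclude=["2"])  # keeps only job "2"
```

Passing an empty list removes all entries, and omitting the argument fails immediately with a TypeError at the call site instead of deep inside the membership test.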
-