- Jul 26, 2010
-
-
Iustin Pop authored
Currently, the master IP activation is done in the Exec function. Since the original masterd process returns after forking, and Exec is run in the (grand)child process, this means that after 'ganeti-masterd' has returned there are still initialization tasks running. Normally this is not a problem, but in cases where one does quick master failovers, this creates a race condition which hits the QA scripts especially hard. To solve this, and make the startup process cleaner (the system is in steady state after the command has returned, even though masterd startup could still fail), we move the IP activation to Check(). This also allows error messages about the IP activation to be seen on the console. With this patch enabled, I can no longer reproduce the double-failover errors, which were occuring before in 4/5 cases. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Iustin Pop authored
This is needed because not just the cli scripts need this decorator, but the master daemon too (and it already duplicated the code once). In cli.py we just leave a stub, so that we don't have to modify all the scripts to import rpc.py. We then change the master daemon code to reuse this decorator, instead of duplicating it. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Iustin Pop authored
This patch implements a few changes to the instance handling. First, old instances which no longer exist on the cluster are removed from the state file, to keep things clean. Second, the instance restart counters are reset every 8 hours, since some error cases might be transient (e.g. networking issues, or machine temporarily down), and if the problem takes more than 5 restarts but is not permanent, watcher will not restart the instance. The value of 8 hours is, I think, both conservative (as not to hammer the cluster too often with restarts) and fast enough to clear semi-transient problems. And last, if an instance is not restarted due to exhausted retries, this should be warned, otherwise it's hard to understand why watcher doesn't want to restart an ERROR_down instance. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
- Jul 16, 2010
-
-
Michael Hanselmann authored
Instead of using our custom HTTP client, using PycURL's multi interface allows us to get rid of the HTTP client threadpool. The majority of the code is still in the ganeti.http.client module. A simple per-thread HTTP client pool gives cURL a chance to cache and retain as much information as possible (e.g. SSL certs). Unused HTTP clients (e.g. due to removed nodes) are deleted after 25 requests going through the pool. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Jul 12, 2010
-
-
Manuel Franceschini authored
This patch series basically adds a new parameter 'family' to the constructors of daemon.AsyncUDPSocket and confd.client.ConfdUDPClient. This enables the users of these two classes to support IPv6. In ganeti-confd.ConfdAsyncUDPClient a method to check the address families of all peers is added. Furthermore it adds unittests for the added functionality. Signed-off-by:
Manuel Franceschini <livewire@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Jul 09, 2010
-
-
Manuel Franceschini authored
This patch moves network utility functions to a dedicated module. Signed-off-by:
Manuel Franceschini <livewire@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Jul 07, 2010
-
-
Luca Bigliardi authored
Node daemon prints a lot of warnings if --no-mlock option is not specified and ctypes module is not present. With the following patch the warning is printed only at noded startup. Signed-off-by:
Luca Bigliardi <shammash@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Jul 06, 2010
-
-
Luca Bigliardi authored
Signed-off-by:
Luca Bigliardi <shammash@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Jul 02, 2010
-
-
Iustin Pop authored
This was "broken" for almost a year :) Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- Jul 01, 2010
-
-
Michael Hanselmann authored
Currently the RAPI client uses the urllib2 and httplib modules from Python's standard library. They're used with pyOpenSSL in a very fragile way, and there are known issues when receiving large responses from a RAPI server. By switching to PycURL we leverage the power and stability of the widely-used curl library (libcurl). This brings us much more flexibility than before, and timeouts were easily implemented (something that would have involved a lot of work with the built-in modules). There's one small drawback: Programs using libcurl have to call curl_global_init(3) (available as pycurl.global_init) while exactly one thread is running (e.g. before other threads) and are supposed to call curl_global_cleanup(3) (available as pycurl.global_cleanup) upon exiting. See the manpages for details. A decorator is provided to simplify this. Unittests for the new code are provided, increasing the test coverage of the RAPI client from 74% to 89%. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Jun 30, 2010
-
-
Manuel Franceschini authored
Signed-off-by:
Manuel Franceschini <livewire@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Guido Trotter authored
Why it's needed here but not a few lines above is a mistery that only pylint understands. Also fix an indentation error in another disable, for the same function. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Jun 29, 2010
-
-
Guido Trotter authored
Each luxi connection now creates an asyncore MasterClientHandler (which is an AsyncTerminatedMessageStream subclass, sending each message to a client worker). This makes it harder to DOS the master daemon by just creating luxi connections, as each of them will use memory and file descriptors, but not a dedicated thread. Each connection will only handle one message at a time. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Jun 23, 2010
-
-
Iustin Pop authored
While we only support the 'parameters' check today, the RPC call is generic enough that will be able to support other checks in the future. The backend function will both validate the parameters list (so as to make sure we don't pass in extra parameters that the OS validation doesn't care about) and the parameter values, via the OS verify script. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- Jun 14, 2010
-
-
Michael Hanselmann authored
This “magic” value will be used to ensure that we don't accidentially connect to the wrong daemon (e.g. due to a bug), comparable to DRBD's per-disk secret. Just depending on the SSL certificate isn't enough as it's always per instance and not per disk. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Michael Hanselmann authored
The hostname and port received from the remote cluster should be validated, just in case. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Michael Hanselmann authored
Upon sending signals, ESRCH can be reported when the target no longer exists. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- Jun 11, 2010
-
-
Guido Trotter authored
This call was introduced but never used. In two years. Since it's just creating/removing a file it can also be in simpler ways, without a special rpc call, if/when we need it again. In the meantime, let's give it to history. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Jun 10, 2010
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- Jun 08, 2010
-
-
Michael Hanselmann authored
Once we have a size for an export (in the context of the import/export daemon), we can provide the user with a percentage and ETA. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Michael Hanselmann authored
This reports the amount of data transferred and the throughput (averaged over 60 seconds) to the master daemon. While not yet fully implemented, once the export scripts report the expected data size, we can even provide an ETA and percentage. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- Jun 04, 2010
-
-
Guido Trotter authored
Sometimes a node has never been a master. Or ran rapi. In that case we need to create the file (because if later rapi gets started, it won't be able to create it itself). Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
René Nussbaumer authored
This is a workaround until we fully switched to user separation and fixes the owners of directories/log files so ganeti-rapi will start flawlessly. This is right now run for every daemon but as it operates on a relatively small subset its impact is small. Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
René Nussbaumer authored
Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
René Nussbaumer authored
Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Jun 03, 2010
-
-
Guido Trotter authored
Not much changes with this patch. The main loop for the IOServer is repaced by mainloop.Run() and the main thread now uses asyncore to handle connections to the master socket. Once it accepts them, though, it just pushes them to the current infrastructure, and everything proceeds as before. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Tom Limoncelli authored
Update ganeti-watcher so that it tests the master's RAPI port with a simple test (in this case GetVersion). If it fails, make one attempt at restarting ganeti-rapi and retest. - daemons/ganeti-watcher: Test rapi and make one attempt at restarting it. - lib/utils.py: add StopDaemon() function. Signed-off-by:
Tom Limoncelli <tlim@google.com> Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- May 28, 2010
-
-
Michael Hanselmann authored
The code parsing the child process' output is moved to a separate class in the impexpd module. As more programs are added, it'll become more complex and should be separated. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Michael Hanselmann authored
The import/export daemon code is already large. Moving some code to a separate module will make it smaller and easier to test. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Michael Hanselmann authored
Instead of passing around many variables for building the executed command, they're now kept as instance variables. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- May 25, 2010
-
-
Guido Trotter authored
This mixes AsyncNotifier with GanetiBaseAsyncoreDispatcher to provide an AsyncNotifier which will log errors, rather than bail out. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- May 21, 2010
-
-
Michael Hanselmann authored
The X509 key name and CA are passed from cmdlib all the way to the backend import/export daemon. With the addition of an option to choose the compression method, another parameter would have to be passed all the way. By moving these options to a separate object, adding new ones will become much easier. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
For example, exports on the same node shouldn't be compressed. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Guido Trotter authored
Cut&Paste, plus the following changes: - The class is renamed to SingleFileEventHandler - The monitored filename must be passed in and doesn't default to the ganeti cluster config file - A small docstring is added to the class - Pylint disables for except: and method names are added This makes it possible to write unittests for this class. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Guido Trotter authored
This exception is caught, but never thrown. It became useless when we moved confd from on/off to enabled/disabled, but always running on all nodes. Removing its definition and the code catching it can do no harm. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- May 18, 2010
-
-
Michael Hanselmann authored
Importing/exporting an instance to a remote machine creates X509 certificates which expire after some time. They need to be removed from the nodes as they become useless. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Guido Trotter authored
Currently the EOM terminator is hardcoded on the server side, and is customizable in the Transport object (with the default being the same as the value found in the server), but not in the luxi client. With this patch we move the value to constants, and remove the "fake" customizability, which would just break client/server communication. If we ever need to have a luxi transport with a different terminator it's easy enough to add it back. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Logfiles can be useful for debugging. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- May 17, 2010
-
-
Michael Hanselmann authored
Ganeti errors should also be logged with a backtrace. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- May 14, 2010
-
-
Guido Trotter authored
While mlock on noded is definitely good in most situations, there are some - namely my laptop - where it has no benefit, and uses precious non-swappable memory. To avoid this we make it optional, with a new --no-mlock option. Note that only the main node daemon and its http children are affected: the powercycle node child still uses mlock, which doesn't harm, since it's a short lived process happening just before node reboot anyway. The manpage is updated. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Luca Bigliardi <shammash@google.com>
-