- Oct 13, 2010
-
-
Iustin Pop authored
Currently, masterd startup with old software versions is very confusing for users: we present two tracebacks, with a message in the middle about "version mismatch". This can lead to users believing that all that needs to be done is to fix the config file. This patch attempts to improve this by handling this case in masterd itself (not in the child), and showing a more friendly message for this case. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Oct 07, 2010
-
-
Iustin Pop authored
This makes almost all of the daemons show error messages, and not return until they finished listening on the appropriate sockets. Masterd is the only one "special", as it doesn't do enough initialization in the server creation, only later. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Currently, GenericMain does a two-staged workflow:

  - Check, before forking
  - then Exec, after forking

This means we don't have any possibility to treat preparation work (before the daemon is ready for work) differently from the actual work. The patch adds another function, PreExec, that is run just before Exec and which should ensure that the daemon is ready to serve clients before it returns. Its result is then sent as the third argument to Exec.

Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
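The three-stage flow described in this commit can be sketched roughly as follows. This is only an illustration, not the actual GenericMain: the name `generic_main_sketch`, its parameters, and the `fork_fn` hook are hypothetical, and where a real daemon would fork this sketch only calls a placeholder.

```python
def generic_main_sketch(check_fn, prep_fn, exec_fn, options, args, fork_fn=None):
    """Hypothetical three-stage daemon startup: check, prepare, execute."""
    # Stage 1: validate before forking, so errors can reach the console.
    check_fn(options, args)

    # Placeholder for daemonization; a real daemon would fork here.
    if fork_fn is not None:
        fork_fn()

    # Stage 2: preparation work (for example binding sockets). Once this
    # returns, the daemon is expected to be ready to serve clients.
    prep_data = prep_fn(options, args)

    # Stage 3: the actual work; the preparation result is the third argument.
    return exec_fn(options, args, prep_data)


# Usage: record the order of the stages and pass a value from prep to exec.
calls = []
result = generic_main_sketch(
    lambda opts, args: calls.append("check"),
    lambda opts, args: calls.append("prep") or "server",
    lambda opts, args, prep: calls.append("exec") or prep,
    options={}, args=[])
```

Keeping preparation separate from execution is what lets a caller tell "failed to start" apart from "failed while serving".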
-
- Sep 24, 2010
-
-
Michael Hanselmann authored
As already noted in the design document, an opcode's priority is increased when the lock(s) can't be acquired within a certain amount of time, except at the highest priority, where in such a case a blocking acquire is used. A unittest is provided. Priorities are not yet used for acquiring the lock(s)—this will need further changes on mcpu. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
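The escalation rule described above can be sketched with a plain `threading.Lock`. The numeric priority scale (smaller number means higher priority), the constant `PRIO_HIGHEST`, and the function name are assumptions made for this sketch; as the message notes, the priority is not yet passed to the lock itself.

```python
import threading

# Assumption for this sketch: smaller numbers mean higher priority.
PRIO_HIGHEST = 0


def acquire_with_escalation(lock, priority, timeout):
    """Try timed acquires, raising the priority after each timeout.

    Returns the priority at which the lock was finally acquired.
    """
    while True:
        if priority <= PRIO_HIGHEST:
            # Highest priority reached: fall back to a blocking acquire.
            lock.acquire()
            return priority
        if lock.acquire(timeout=timeout):
            return priority
        # Timed out: increase the priority and try again.
        priority -= 1
```

With a free lock the call returns at the starting priority; with a contended lock it steps up one level per timeout and only blocks indefinitely once it has reached the highest level.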
-
- Sep 07, 2010
-
-
René Nussbaumer authored
Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Aug 24, 2010
-
-
Michael Hanselmann authored
This patch adds an initial implementation of a lock monitor, accessible for the user through “gnt-debug locks”. It currently shows all resource locks: BGL, nodes and instances. Config and job queue locks could be shown too, but wouldn't be of much help. The current owner(s) and mode are also shown. Showing pending acquires will require further changes on the SharedLock internals and is not yet implemented.

Example output:

  $ gnt-debug locks -o name,mode,owner
  Name            Mode      Owner
  BGL/BGL         shared    JobQueue19/Job147
  instances/inst1 exclusive JobQueue19/Job147
  instances/inst2 -         -
  instances/inst3 -         -
  instances/inst4 -         -
  nodes/node1     exclusive JobQueue19/Job147
  nodes/node2     exclusive JobQueue19/Job147

Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Aug 18, 2010
-
-
Manuel Franceschini authored
This patch enables IPv6 name resolution by using socket.getaddrinfo instead of socket.gethostbyname_ex. It renames the HostInfo class to Hostname and unifies its use throughout the code. This is achieved by using static calls where no object is needed and removes some obsolete code. For now, we just resolve to IPv4 addresses, but this will change once it is needed. Signed-off-by:
Manuel Franceschini <livewire@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
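A minimal sketch of resolving a name through `socket.getaddrinfo` while still restricting the result to IPv4, as the message describes. The helper name `resolve_ipv4` is hypothetical and is not the Hostname class from the patch.

```python
import socket


def resolve_ipv4(hostname):
    """Return the first IPv4 address found for hostname."""
    # Restricting the family to AF_INET keeps the result IPv4-only;
    # AF_INET6 or AF_UNSPEC would allow IPv6 results as well.
    results = socket.getaddrinfo(hostname, None, socket.AF_INET,
                                 socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET, sockaddr is an (address, port) pair.
    return results[0][4][0]
```

Because the address family is an argument, moving to IPv6 later is a change of one parameter rather than a change of API, which `gethostbyname_ex` does not offer.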
-
- Jul 29, 2010
-
-
Michael Hanselmann authored
By changing it to a normal parameter, which must be a sequence, we can start using keyword parameters. Before this patch all arguments to “AddTask(self, *args)” were passed as arguments to the worker's “RunTask” method. Priorities, which should be optional and will be implemented in a future patch, must be passed as a keyword parameter. This means “*args” can no longer be used as one can't combine *args and keyword parameters in a clean way:

  >>> def f(name=None, *args):
  ...   print "%r, %r" % (args, name)
  ...
  >>> f("p1", "p2", "p3", name="thename")
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  TypeError: f() got multiple values for keyword argument 'name'

Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
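The resulting calling convention might look like the sketch below, written in Python 3. The class `WorkerPoolSketch`, the method name `add_task`, and the default priority of 0 are hypothetical and only illustrate why a single sequence parameter leaves room for keyword parameters.

```python
class WorkerPoolSketch:
    """Hypothetical stand-in for a worker pool; not the real class."""

    def __init__(self):
        self.tasks = []

    def add_task(self, args, priority=0):
        # "args" is one ordinary parameter holding a sequence, so optional
        # keyword parameters such as "priority" cannot collide with it.
        if not isinstance(args, (list, tuple)):
            raise TypeError("args must be a sequence")
        self.tasks.append((priority, tuple(args)))


pool = WorkerPoolSketch()
pool.add_task(("p1", "p2", "p3"))
pool.add_task(("p4",), priority=5)
```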
-
- Jul 26, 2010
-
-
Iustin Pop authored
Currently, the master IP activation is done in the Exec function. Since the original masterd process returns after forking, and Exec is run in the (grand)child process, this means that after 'ganeti-masterd' has returned there are still initialization tasks running. Normally this is not a problem, but in cases where one does quick master failovers, this creates a race condition which hits the QA scripts especially hard. To solve this, and make the startup process cleaner (the system is in steady state after the command has returned, even though masterd startup could still fail), we move the IP activation to Check(). This also allows error messages about the IP activation to be seen on the console. With this patch enabled, I can no longer reproduce the double-failover errors, which were occurring before in 4/5 cases. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Iustin Pop authored
This is needed because not just the cli scripts need this decorator, but the master daemon too (and it already duplicated the code once). In cli.py we just leave a stub, so that we don't have to modify all the scripts to import rpc.py. We then change the master daemon code to reuse this decorator, instead of duplicating it. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
- Jul 16, 2010
-
-
Michael Hanselmann authored
Instead of using our custom HTTP client, using PycURL's multi interface allows us to get rid of the HTTP client threadpool. The majority of the code is still in the ganeti.http.client module. A simple per-thread HTTP client pool gives cURL a chance to cache and retain as much information as possible (e.g. SSL certs). Unused HTTP clients (e.g. due to removed nodes) are deleted after 25 requests going through the pool. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
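One way to read the expiry rule ("deleted after 25 requests going through the pool") is sketched below. It leaves out both PycURL and the per-thread aspect; the class name, the factory callback, and the exact ageing logic are assumptions rather than the code from this patch.

```python
MAX_IDLE_REQUESTS = 25  # threshold taken from the commit message


class ClientPoolSketch:
    """Hypothetical pool that forgets clients nobody asks for anymore."""

    def __init__(self, factory):
        self._factory = factory
        self._clients = {}  # key -> [client, requests since last use]

    def get(self, key):
        # Every request through the pool ages all other clients.
        for other, entry in list(self._clients.items()):
            if other == key:
                continue
            entry[1] += 1
            if entry[1] >= MAX_IDLE_REQUESTS:
                del self._clients[other]
        if key not in self._clients:
            self._clients[key] = [self._factory(key), 0]
        entry = self._clients[key]
        entry[1] = 0  # the requested client is in use again
        return entry[0]

    def __contains__(self, key):
        return key in self._clients


pool = ClientPoolSketch(lambda name: "client-" + name)
pool.get("node1")
```

Counting requests instead of wall-clock time means an idle pool never expires anything, so cached state survives quiet periods.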
-
- Jul 09, 2010
-
-
Manuel Franceschini authored
This patch moves network utility functions to a dedicated module. Signed-off-by:
Manuel Franceschini <livewire@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Jun 30, 2010
-
-
Guido Trotter authored
Why it's needed here but not a few lines above is a mystery that only pylint understands. Also fix an indentation error in another disable, for the same function. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Jun 29, 2010
-
-
Guido Trotter authored
Each luxi connection now creates an asyncore MasterClientHandler (which is an AsyncTerminatedMessageStream subclass, sending each message to a client worker). This makes it harder to DOS the master daemon by just creating luxi connections, as each of them will use memory and file descriptors, but not a dedicated thread. Each connection will only handle one message at a time. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Jun 04, 2010
-
-
René Nussbaumer authored
Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Jun 03, 2010
-
-
Guido Trotter authored
Not much changes with this patch. The main loop for the IOServer is replaced by mainloop.Run() and the main thread now uses asyncore to handle connections to the master socket. Once it accepts them, though, it just pushes them to the current infrastructure, and everything proceeds as before. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- May 18, 2010
-
-
Guido Trotter authored
Currently the EOM terminator is hardcoded on the server side, and is customizable in the Transport object (with the default being the same as the value found in the server), but not in the luxi client. With this patch we move the value to constants, and remove the "fake" customizability, which would just break client/server communication. If we ever need to have a luxi transport with a different terminator it's easy enough to add it back. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
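A shared terminator constant lets both sides frame messages the same way. The sketch below shows the idea; the byte value chosen for `EOM` and the helper name are illustrative assumptions, not values taken from this patch.

```python
# Assumption: a single-byte terminator; the actual value lives in constants.
EOM = b"\x03"


def split_messages(buffer):
    """Split buffer into complete messages and the unterminated remainder."""
    *messages, rest = buffer.split(EOM)
    return messages, rest
```

The remainder is whatever arrived after the last terminator, so a caller would prepend it to the next chunk read from the socket.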
-
- May 17, 2010
-
-
Michael Hanselmann authored
Ganeti errors should also be logged with a backtrace. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Apr 23, 2010
-
-
Michael Hanselmann authored
This can be very useful if client programs run as non-root. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- Apr 09, 2010
-
-
Guido Trotter authored
Under squeeze pylint reports the following errors:

  ************* Module ganeti.serializer
  E1103:155:LoadSignedJson: Instance of 'False' has no 'get' member (but some types could not be inferred)
  ************* Module ganeti-masterd
  E1103:166:ClientRqHandler.handle: Instance of 'False' has no 'get' member (but some types could not be inferred)
  E1103:167:ClientRqHandler.handle: Instance of 'False' has no 'get' member (but some types could not be inferred)
  ************* Module gnt-instance
  E1103:431:BatchCreate: Instance of 'False' has no 'keys' member (but some types could not be inferred)

For the first two cases it's actually wrong: we had checked before that the variable on which "get" is called is actually a dict. In the third case, though, no such check exists, so we add it. Then we silence the error all three times.

Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
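The kind of check the message refers to can be sketched as follows: verify that decoded data is a dict before calling `get` on it. The function name and the `"method"`/`"args"` keys are assumptions made for illustration.

```python
import json


def load_request(raw):
    """Decode a JSON request and return its (method, args) pair."""
    data = json.loads(raw)
    # Decoded JSON may be any type (for example False), so check before
    # calling dict methods; static checkers may still not infer this.
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data.get("method"), data.get("args")
```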
-
- Mar 18, 2010
-
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- Feb 18, 2010
-
-
Michael Hanselmann authored
This function could be useful in other places and this way we can easily unittest it. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Jan 22, 2010
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Also fix a typo in http/__init__.py and add unittests for the LUXI parsing and formatting functions. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Jan 13, 2010
-
-
Michael Hanselmann authored
Having a proper name instead of just a number makes debugging easier. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Jan 05, 2010
-
-
Iustin Pop authored
This changes from submitting jobs to get the tags (in cli scripts) to queries, which (since the tags query is a cheap one) should be much faster. The tags queries are already done without locks (in the generic query paths for instances/nodes/cluster), so this shouldn't break tags query via gnt-* list-tags. On a small cluster, the runtime of gnt-cluster/gnt-instance list tags more than halves; on a big cluster (with many MCs) I expect it to be more than 5 times faster. The speed of the tags query is not the main gain; the main gain is eliminating a job when a simple query is enough. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
- Jan 04, 2010
-
-
Iustin Pop authored
Many of our functions have to follow a given API, and thus we have to keep a given signature, but pylint doesn't understand this. Therefore, we silence this warning. The patch does a few other cleanups. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Olivier Tharan <olive@google.com>
-
Iustin Pop authored
Of all daemons, only rapi aborted when given arguments. None of our daemons use any arguments, but they accepted them blindly. This is a very bad experience for the user. This patch adds checking and exiting in all daemons, in a uniform way. One other option would have been to add a flag to GenericMain (noargs=True). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Olivier Tharan <olive@google.com>
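A uniform check of this kind could look like the sketch below. The helper name, the message text, and the exit code are assumptions, not the code added by the patch.

```python
import sys

EXIT_FAILURE = 1  # assumption: conventional non-zero failure code


def reject_arguments(prog, args):
    """Exit with an error if a daemon was given positional arguments."""
    if args:
        print("%s does not take any arguments" % prog, file=sys.stderr)
        sys.exit(EXIT_FAILURE)
```

Each daemon would call this once after option parsing, passing whatever positional arguments remain.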
-
Iustin Pop authored
This patch should have only:

  - pylint disables
  - docstring changes
  - whitespace changes

Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Olivier Tharan <olive@google.com>
-
- Dec 01, 2009
-
-
Michael Hanselmann authored
Passing it in as a parameter seems more logical. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- Nov 25, 2009
-
-
Iustin Pop authored
This patch removes the quotes from CommaJoin and converts most of the callers (that I could find) to it. Since CommaJoin does str(i) for i in param, we can remove these, thus simplifying slightly a few calls. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
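The behaviour described for CommaJoin (no quotes, `str()` applied to every element) can be sketched as below. The lowercase name marks it as an illustration rather than the project's function.

```python
def comma_join(names):
    """Join any iterable into a comma-separated string, without quotes."""
    # str() is applied here, so callers need not convert elements first.
    return ", ".join(str(val) for val in names)
```

Because the conversion happens inside the helper, a call such as `comma_join([1, 2, 3])` works without a list comprehension at the call site.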
-
- Nov 06, 2009
-
-
Guido Trotter authored
This is ok because adding a node or instance cannot happen in a query. We get the ec id from the LU and pass it to _EnsureUUID, which for now does not use it. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Guido Trotter authored
When the processor is executing a job, it can export the execution id to its callers. This is not supported for Queries, as they're not executed in a job. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Nov 02, 2009
-
-
Iustin Pop authored
This finishes the conversion of OpPrereqError creation to two-argument style. Any leftovers as one-argument are not breaking anything, just losing information about the errors. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Sep 29, 2009
-
-
Iustin Pop authored
This is not needed anymore - the original change was more than a year ago when masterd was in its incipient phase. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Sep 25, 2009
-
-
Iustin Pop authored
Currently the luxi error handling is hardcoded as special encoding on the masterd-side and special decoding on the client side. This patch moves it to errors.py such that other parts of the code can reuse the same encoding. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com> (cherry picked from commit 6956e9cd)
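A shared encoding for errors might be sketched as a pair of helpers like these. The class names, the registry, and the (class name, args) wire format are assumptions for illustration and may differ from what errors.py actually does.

```python
class GenericErrorSketch(Exception):
    """Hypothetical base class standing in for the project's error type."""


class PrereqErrorSketch(GenericErrorSketch):
    """Hypothetical error carrying a message and an error code."""


# Registry of error classes that may be rebuilt on the receiving side.
_KNOWN_ERRORS = {cls.__name__: cls
                 for cls in (GenericErrorSketch, PrereqErrorSketch)}


def encode_exception(err):
    """Turn an exception into a JSON-friendly (class name, args) pair."""
    return (type(err).__name__, list(err.args))


def decode_exception(encoded):
    """Rebuild a known exception from its encoded form, else return None."""
    name, args = encoded
    cls = _KNOWN_ERRORS.get(name)
    return cls(*args) if cls is not None else None
```

Keeping both directions in one module means the server and every client agree on the format by construction.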
-
- Sep 17, 2009
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Sep 15, 2009
-
-
Michael Hanselmann authored
This can be useful for debugging locking problems. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
There are two major arguments for this:

  - There will be more callbacks (e.g. for lock debugging) and extending the parameter list is a lot of work.
  - In the jqueue module this allows us to keep per-job or per-opcode variables in a separate class. Instead of having to clean up the worker class after processing one job, these references will automatically go out of scope.

Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Aug 31, 2009
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-