- Jun 22, 2011
Apollon Oikonomopoulos authored
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
-
- May 02, 2011
Iustin Pop authored
With the current code, it's possible to mistake a ^C for a protocol error:

  node1# gnt-job info 221691
  [press ^C]
  Unhandled protocol error while talking to the master daemon: Error while deserializing response:

(note the empty error message).
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
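One way to make this failure mode easier to diagnose is to include the offending payload in the deserialization error, so an empty response (for instance from a read interrupted by ^C) no longer produces a blank message. This is only a hedged sketch of that idea; the function and error names are illustrative, not the actual luxi.py code.

    import json

    class ProtocolError(Exception):
        pass

    def parse_response(raw):
        """Deserialize a response, reporting the raw data on failure."""
        try:
            return json.loads(raw)
        except ValueError as err:
            # Mentioning the payload makes raw == "" (an interrupted or
            # empty read) immediately visible instead of an empty message.
            raise ProtocolError("Error while deserializing response %r: %s" %
                                (raw, err))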
-
- Jan 07, 2011
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Jan 06, 2011
Michael Hanselmann authored
Locks can now be queried using “Query(what="lock", …)” over LUXI.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Dec 20, 2010
Michael Hanselmann authored
If the socket can't be read in time, it raises “socket.timeout”, for which there is special handling code. Unfortunately the exception blocks were in the wrong order and “socket.error” caught it first.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
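Since socket.timeout is a subclass of socket.error and except clauses are tried in order, the more specific handler has to come first. A minimal, self-contained sketch of the correct ordering (buffer size and error messages are illustrative, not the actual luxi.py code):

    import socket

    def recv_once(sock, timeout=10.0):
        """Read from a socket, distinguishing a timeout from other errors."""
        sock.settimeout(timeout)
        try:
            return sock.recv(4096)
        except socket.timeout:
            # Must be listed before socket.error, otherwise the generic
            # handler below would swallow the timeout.
            raise RuntimeError("no data received within %.1fs" % timeout)
        except socket.error as err:
            raise RuntimeError("socket error: %s" % err)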
-
- Dec 13, 2010
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
-
- Dec 01, 2010
Adeodato Simo authored
This also updates masterd.py.
Signed-off-by: Adeodato Simo <dato@google.com>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Nov 01, 2010
Guido Trotter authored
This is already disabled for the same type of request a couple of lines above. The new code was introduced in e986f20c but didn't have the disables.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
- Oct 28, 2010
Michael Hanselmann authored
A new constant, LUXI_VERSION, is used to verify the peer's version. The version is optional, so old(er) clients and servers talking to peers not supporting it won't break. Example with a mismatching library:

  $ gnt-instance list
  Unhandled Ganeti error: LUXI version mismatch, server 2020000, request 1010000

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
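A hedged sketch of an optional version check of this kind; the message key, constant value and error type below are assumptions for illustration, not the actual Ganeti code:

    LUXI_VERSION = 2020000  # illustrative value

    def check_peer_version(message):
        """Reject a message only if it carries a mismatching version."""
        version = message.get("version")  # hypothetical key name
        if version is not None and version != LUXI_VERSION:
            raise ValueError("LUXI version mismatch, server %s, request %s" %
                             (LUXI_VERSION, version))
        # Peers that don't send a version at all are accepted unchanged.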
-
Michael Hanselmann authored
This allows LUXI errors to be encoded and serialized.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Aug 24, 2010
Michael Hanselmann authored
This patch adds an initial implementation of a lock monitor, accessible to the user through “gnt-debug locks”. It currently shows all resource locks: BGL, nodes and instances. Config and job queue locks could be shown too, but wouldn't be of much help. The current owner(s) and mode are also shown. Showing pending acquires will require further changes to the SharedLock internals and is not yet implemented. Example output:

  $ gnt-debug locks -o name,mode,owner
  Name            Mode      Owner
  BGL/BGL         shared    JobQueue19/Job147
  instances/inst1 exclusive JobQueue19/Job147
  instances/inst2 -         -
  instances/inst3 -         -
  instances/inst4 -         -
  nodes/node1     exclusive JobQueue19/Job147
  nodes/node2     exclusive JobQueue19/Job147

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Jul 28, 2010
Iustin Pop authored
This patch adds handling of permission errors so that we don't show tracebacks when a non-root user runs a gnt-* command. Since in the future we'll have different permissions, we need to handle this in RAPI too. It also fixes a typo in a RAPI error message and in the docstrings of LUXI errors.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
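A hedged sketch of catching a permission error on the master socket and turning it into a clean message rather than a traceback; the socket path, the chosen errno values and the wording are assumptions, not Ganeti's actual code:

    import errno
    import socket

    def connect_to_master(path="/var/run/ganeti/master.sock"):  # hypothetical path
        """Connect to the master daemon, failing cleanly on permission problems."""
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.connect(path)
        except socket.error as err:
            sock.close()
            if err.errno in (errno.EACCES, errno.EPERM):
                raise SystemExit("Permission denied while connecting to the master "
                                 "daemon; this command may need to be run as root")
            raise
        return sock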
-
- May 18, 2010
Guido Trotter authored
Currently the EOM terminator is hardcoded on the server side, and is customizable in the Transport object (with the default being the same as the value found in the server), but not in the luxi client. With this patch we move the value to constants, and remove the "fake" customizability, which would just break client/server communication. If we ever need a luxi transport with a different terminator, it's easy enough to add it back.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
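A minimal framing sketch of the shared-terminator idea: both sides use the same end-of-message constant, so messages can be delimited on a stream socket. The constant's value and the helper names are illustrative assumptions (and the reader assumes one message in flight at a time), not Ganeti's actual ones.

    EOM = b"\x03"  # illustrative end-of-message byte, shared by client and server

    def send_message(sock, payload):
        """Send one framed message (payload must not contain the EOM byte)."""
        sock.sendall(payload + EOM)

    def recv_message(sock):
        """Read from the socket until one full message has arrived."""
        buf = b""
        while not buf.endswith(EOM):
            chunk = sock.recv(4096)
            if not chunk:
                raise EOFError("connection closed before end-of-message")
            buf += chunk
        return buf[:-len(EOM)]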
-
- May 11, 2010
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Feb 22, 2010
Michael Hanselmann authored
Jobs submitted via the standard command line utilities didn't give any indication that anything was happening while they were waiting in the job queue (e.g. due to other jobs using all worker threads) or acquiring locks. This could be very confusing for people not familiar with Ganeti's architecture. Now they'll show a message after the first WaitForJobChanges timeout.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
If too many clients try to connect to the master at the same time, some of them might fail if the master doesn't accept the connections fast enough.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
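A common way to smooth over this kind of transient failure is to retry the connect for a bounded time, as sketched below. The retry window, delay and errno selection are assumptions, not the actual luxi.py behaviour.

    import errno
    import socket
    import time

    def connect_with_retries(path, total_timeout=10.0, delay=0.1):
        """Keep retrying connect() on transient errors until the deadline."""
        deadline = time.time() + total_timeout
        while True:
            sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
            try:
                sock.connect(path)
                return sock
            except socket.error as err:
                sock.close()
                transient = err.errno in (errno.ECONNREFUSED, errno.EAGAIN)
                if not transient or time.time() >= deadline:
                    raise
                time.sleep(delay)  # give the master a chance to accept() its backlog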
-
- Jan 22, 2010
Michael Hanselmann authored
Also fix a typo in http/__init__.py and add unittests for the LUXI parsing and formatting functions.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Having only one exception hierarchy makes catching them simpler. Previously, ProtocolError derived directly from Exception; with this patch it is also part of the hierarchy defined by the ganeti.errors module. Separating encoding and decoding errors is not necessary at this point, as they're never handled separately, and merging them removes a few lines from the code.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
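The benefit of a single hierarchy is that one except clause covers all LUXI errors. A tiny sketch, with class names assumed for illustration rather than taken from ganeti.errors:

    class GenericError(Exception):
        """Assumed root of the shared error hierarchy."""

    class ProtocolError(GenericError):
        """Any LUXI transport or (de)serialization problem."""

    try:
        raise ProtocolError("short read while receiving the response")
    except GenericError as err:
        # A single handler now also covers protocol problems.
        print("caught via the common base class: %s" % err)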
-
- Jan 05, 2010
Iustin Pop authored
This changes tag retrieval in the cli scripts from submitting jobs to using queries, which (since the tags query is a cheap one) should be much faster. The tags queries are already done without locks (in the generic query paths for instances/nodes/cluster), so this shouldn't break tag queries via gnt-* list-tags. On a small cluster, the runtime of gnt-cluster/gnt-instance list-tags more than halves; on a big cluster (with many MCs) I expect it to be more than 5 times faster. The speed of getting the tags is not the main gain; it is eliminating a job when a simple query is enough.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
-
- Jan 04, 2010
Iustin Pop authored
This patch should have only:
- pylint disables
- docstring changes
- whitespace changes
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>
-
- Oct 13, 2009
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
-
- Sep 25, 2009
Iustin Pop authored
Currently the luxi error handling is hardcoded as special encoding on the masterd side and special decoding on the client side. This patch moves it to errors.py so that other parts of the code can reuse the same encoding.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com> (cherry picked from commit 6956e9cd)
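A hedged sketch of the kind of encoding this refers to: an error travels as its class name plus its arguments, and the receiving side maps the name back to a known class. All names below are illustrative, not the actual errors.py API.

    class GenericError(Exception):
        """Assumed common base class for the example."""

    class JobQueueDrainedError(GenericError):
        pass

    _KNOWN_ERRORS = {cls.__name__: cls
                     for cls in (GenericError, JobQueueDrainedError)}

    def encode_error(err):
        """Serialize an exception as (class name, arguments)."""
        return [err.__class__.__name__, list(err.args)]

    def decode_error(payload):
        """Rebuild the exception; unknown class names degrade to the base class."""
        name, args = payload
        return _KNOWN_ERRORS.get(name, GenericError)(*args)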
-
- Aug 27, 2009
Iustin Pop authored
Currently the luxi error handling is hardcoded as special encoding on the masterd side and special decoding on the client side. This patch moves it to errors.py so that other parts of the code can reuse the same encoding.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
- Aug 26, 2009
Michael Hanselmann authored
This can be used during maintenance work.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Jul 19, 2009
Iustin Pop authored
As a workaround for the job submit timeouts that we have, this patch adds a new luxi call for multi-job submit; the advantage is that all the jobs are added to the queue first, and only then can the workers start processing them. This is definitely faster than per-job submit, where the submission of new jobs competes with the workers processing jobs.

On a pure no-op OpDelay opcode (not on master, not on nodes), we have:
- 100 jobs:
  - individual: submit time ~21s, processing time ~21s
  - multiple: submit time 7-9s, processing time ~22s
- 250 jobs:
  - individual: submit time ~56s, processing time ~57s (run 2: ~54s, ~55s)
  - multiple: submit time ~20s, processing time ~51s (run 2: ~17s, ~52s)

This shows that we indeed gain on the client side, and maybe even on the total processing time for a high number of jobs. For just 10 or so jobs I expect the difference to be just noise.

This will probably require increasing the timeout a little when submitting too many jobs - 250 jobs at ~20 seconds is close to the current rw timeout of 60s.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com> (cherry picked from commit 2971c913)
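The reasoning above is that one round-trip carrying N job definitions beats N round-trips that compete with workers already running. A minimal sketch of the two call patterns, with the client interface and fake implementation assumed purely for illustration:

    class _FakeClient:
        """Stand-in for a LUXI client, just to make the sketch runnable."""
        def __init__(self):
            self.queue = []
        def call(self, method, args):
            if method == "SubmitManyJobs":
                ids = list(range(len(self.queue), len(self.queue) + len(args)))
                self.queue.extend(args)
                return ids
            self.queue.append(args)
            return len(self.queue) - 1

    def submit_jobs_individually(client, jobs):
        """One round-trip per job; submission competes with running workers."""
        return [client.call("SubmitJob", job) for job in jobs]

    def submit_many_jobs(client, jobs):
        """A single round-trip queues every job before any worker picks one up."""
        return client.call("SubmitManyJobs", jobs)

    print(submit_many_jobs(_FakeClient(), [["op1"], ["op2"], ["op3"]]))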
-
- Jul 07, 2009
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
If a user used ^Z to stop the program, poll() in socket.recv would return EAGAIN due to SIGSTOP. This patch changes luxi.Transport.Recv to ignore EAGAIN.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
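A minimal sketch of retrying a read when it fails with EAGAIN after the process is stopped and resumed; the buffer size and surrounding error handling are assumptions, not the actual Transport.Recv code.

    import errno
    import socket

    def recv_retrying_eagain(sock):
        """Read from the socket, retrying if the call fails with EAGAIN."""
        while True:
            try:
                return sock.recv(4096)
            except socket.error as err:
                if err.errno != errno.EAGAIN:
                    raise
                # Likely stopped (^Z) and resumed; just try the read again.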
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- May 21, 2009
Iustin Pop authored
As a workaround for the job submit timeouts that we have, this patch adds a new luxi call for multi-job submit; the advantage is that all the jobs are added to the queue first, and only then can the workers start processing them. This is definitely faster than per-job submit, where the submission of new jobs competes with the workers processing jobs.

On a pure no-op OpDelay opcode (not on master, not on nodes), we have:
- 100 jobs:
  - individual: submit time ~21s, processing time ~21s
  - multiple: submit time 7-9s, processing time ~22s
- 250 jobs:
  - individual: submit time ~56s, processing time ~57s (run 2: ~54s, ~55s)
  - multiple: submit time ~20s, processing time ~51s (run 2: ~17s, ~52s)

This shows that we indeed gain on the client side, and maybe even on the total processing time for a high number of jobs. For just 10 or so jobs I expect the difference to be just noise.

This will probably require increasing the timeout a little when submitting too many jobs - 250 jobs at ~20 seconds is close to the current rw timeout of 60s.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
-
- Feb 04, 2009
Iustin Pop authored
This is the last query that RAPI executes via opcodes, and it is purely static (config values only). As such, we can safely convert it to a query instead of a job.
Reviewed-by: imsnah
-
Iustin Pop authored
This patch adds the framework for, and enables, lockless OpQueryInstances. This means that instances will be shown in ERROR_up or ERROR_down state, even though this is not an error (but just an in-progress job).

The framework is implemented as follows:
- the OpQueryInstances, OpQueryNodes and OpQueryExports opcodes take an additional “use_locking” flag which denotes whether to lock or not; this patch only implements this for LUQueryInstances
- the luxi query functions take an additional argument use_locking which is passed to the master daemon, and then passed to the above opcodes
- cli.py exports a new SYNC_OPT command line option which sets this flag to true
- except for gnt-instance list, which uses this option, and for name-only queries (e.g. QueryNodes(fields=["names"])), all other callers set this flag to True
- RAPI also sets the flag to True

The patch was tested with a continuous (0.2s sleep in-between) gnt-instance list during a burnin, and no problems were observed.
Reviewed-by: ultrotter
-
- Jan 22, 2009
Iustin Pop authored
This is less of an actual issue for regular gnt-* clients, but it's easily reproducible with burnin and possible with RAPI (depending on how the program uses luxi.Client(s)).

In the case of burnin, if we interrupt the client (^C) while it polls the job, it will abort and raise an error. After that, burnin issues a remove instance job, and at this point we send the submit job (remove) call, but the first thing we read from the socket will be the response to the previous poll job request, since that was already queued by the master.

To solve this, whenever we detect an error in Transport.Call(), we close that transport and re-create a new one, so that we start anew. The other alternative would be to introduce a sequence number into the protocol, but that would be a design-level change and it's not recommended at this stage.
Reviewed-by: imsnah
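A sketch of the recovery strategy this describes: if a call fails for any reason, throw the connection away so the next call starts fresh and can never read a stale, previously queued response. Class, method and attribute names here are illustrative, not the actual luxi.py ones.

    class ResettingClient:
        """Wraps a transport factory and drops the transport on any call error."""

        def __init__(self, transport_factory):
            self._factory = transport_factory
            self._transport = None

        def call(self, request):
            if self._transport is None:
                self._transport = self._factory()
            try:
                return self._transport.call(request)
            except Exception:
                # The stream may still hold the answer to an aborted earlier
                # call, so discard it and reconnect on the next request.
                self._transport.close()
                self._transport = None
                raise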
-
- Jan 20, 2009
Guido Trotter authored
Reviewed-by: iustinp
-
- Dec 18, 2008
Michael Hanselmann authored
With a large job queue, auto-archiving jobs can take a very long time, causing timeouts on the luxi RPC layer. With this change, auto-archive returns after half of the RPC timeout has passed. The user will see how many jobs are left unchecked.
Reviewed-by: ultrotter
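A sketch of bounding the work by a deadline derived from the RPC timeout, as described above; the 60s timeout value, the half-of-timeout choice and the helper names are taken from the surrounding descriptions or assumed for illustration.

    import time

    RPC_TIMEOUT = 60.0  # assumed client read/write timeout in seconds

    def auto_archive(jobs, archive_one):
        """Archive jobs until done or until half of the RPC timeout has passed.

        Returns (archived, unchecked) so the caller can report leftover work.
        """
        deadline = time.time() + RPC_TIMEOUT / 2.0
        archived = 0
        for idx, job in enumerate(jobs):
            if time.time() >= deadline:
                return archived, len(jobs) - idx
            if archive_one(job):
                archived += 1
        return archived, 0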
-
- Oct 16, 2008
Iustin Pop authored
This adds the set/reset in the jqueue and luxi modules, a way to query it in OpQueryConfigValues, and also the command line interface for it:

  $ gnt-cluster queue info
  The drain flag is unset
  $ gnt-cluster queue drain
  $ gnt-cluster queue info
  The drain flag is set
  $ gnt-cluster queue undrain
  $ gnt-cluster queue info
  The drain flag is unset

The choice of making the setting via luxi and not an opcode is that opcodes can't be executed when drained, but we don't query via luxi since in the future it might become a cluster property as opposed to a node one.
Reviewed-by: imsnah
-
- Oct 15, 2008
Iustin Pop authored
This patch adds a generic method to identify the ganeti error given its class name, and implements this across the luxi protocol. Reviewed-by: imsnah
-
- Oct 06, 2008
Iustin Pop authored
This patch adds a new luxi call that implements auto-archiving of jobs older than a certain age (or -1 for all completed jobs), and the gnt-job command that makes use of this (with 'all' for -1). Reviewed-by: imsnah
-
- Oct 01, 2008
Michael Hanselmann authored
This can be used to retrieve certain cluster config values from within clients. OpDumpClusterConfig was not used anywhere, hence I'm just reusing it. The way ConfigWriter.DumpConfig returned the configuration was not thread-safe, anyway (no deepcopy). Reviewed-by: iustinp
-
- Aug 29, 2008
Iustin Pop authored
This patch alters the WaitForJobChanges luxi-RPC call to have a configurable timeout, so that the call behaves nicely with long jobs that have no updates. We do this by adding a timeout parameter to the RPC call, and returning a special constant when the timeout is reached without an update. The luxi client will repeatedly call WaitForJobChanges until it gets a real change. The timeout is hardcoded as half the RWTO value.

The patch also removes an unused variable (new_state) from the WaitForJobChanges method.
Reviewed-by: imsnah,ultrotter
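A sketch of the client side of this scheme: the server's wait returns a sentinel when its per-call timeout expires without changes, and the client simply asks again until something real comes back. The sentinel value and call interface are assumptions for illustration, not the actual constants.

    JOB_NOT_CHANGED = "nochange"  # hypothetical sentinel returned on timeout

    def wait_for_job_changes(call_once, timeout=30.0):
        """Keep re-issuing the wait call until a real change is reported."""
        while True:
            result = call_once(timeout)
            if result != JOB_NOT_CHANGED:
                return result
            # Timeout without changes: loop and wait again (a client could
            # print a progress message here, as a later change does).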
-
- Aug 28, 2008
Michael Hanselmann authored
Reported by Iustin. Reviewed-by: iustinp
-