- 24 Apr, 2014 13 commits
-
Klaus Aehlig authored
Add a derived parameter for nodes, providing the ratio of virtual CPUs per CPU-speed-weighted physical CPU.
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Petr Pudlak <pudlak@google.com>
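The derived ratio described above can be sketched as follows. This is a minimal illustration, not Ganeti's actual Haskell implementation; the function name and signature are hypothetical.

```python
def vcpu_ratio(vcpus, pcpus, cpu_speed=1.0):
    """Virtual CPUs per CPU-speed-weighted physical CPU.

    `cpu_speed` is the node's speed relative to a "normal" node in
    its group (defaulting to 1, as in the commits below)."""
    return vcpus / (pcpus * cpu_speed)

# A node with 16 vCPUs on 4 physical CPUs running at twice normal
# speed counts as 16 / (4 * 2.0) = 2.0 vCPUs per weighted pCPU.
print(vcpu_ratio(16, 4, 2.0))   # 2.0
```

A faster node (higher `cpu_speed`) thus yields a lower ratio for the same load, which is what lets the balancer prefer it.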
-
Klaus Aehlig authored
Make the htools luxi backend also query for cpu_speed and take the result into account.
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Petr Pudlak <pudlak@google.com>
-
Klaus Aehlig authored
Extend the text format by an optional column for each node containing the relative CPU speed, if provided.
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Petr Pudlak <pudlak@google.com>
-
Klaus Aehlig authored
Add a function on nodes modifying the CPU speed parameter.
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Petr Pudlak <pudlak@google.com>
-
Klaus Aehlig authored
Add an additional parameter to the representation of a node for the relative CPU speed, initially set to 1.
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Petr Pudlak <pudlak@google.com>
-
Klaus Aehlig authored
In other words, remove "cpu_speed" from all "nodeparams" where it is present, be it cluster, group, or node. Note that upgrading is no problem, as the default value will be used implicitly.
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Petr Pudlak <pudlak@google.com>
-
Klaus Aehlig authored
This parameter will describe the speed of the CPU relative to the speed of a "normal" node in this node group.
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Petr Pudlak <pudlak@google.com>
-
Klaus Aehlig authored
...in order not to have to declare floating-point values as VTypeInt, relying on the JSON specification's laxity in not distinguishing between integers and floating-point numbers.
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Petr Pudlak <pudlak@google.com>
-
Klaus Aehlig authored
This document really only talks about CPU speed.
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Petr Pudlak <pudlak@google.com>
-
Petr Pudlak authored
In this case, the call trying to acquire a shared lock always succeeds, because the daemon itself already holds the exclusive lock; the success is then misread as the job having died.
Signed-off-by: Petr Pudlak <pudlak@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>
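The pitfall behind this fix is that POSIX record locks never conflict within a single process. A small self-contained sketch (the helper name is hypothetical; Ganeti's real code is elsewhere):

```python
import fcntl
import tempfile

def shared_lock_succeeds(path):
    """Probe for a shared lock non-blocking; True if it can be taken."""
    with open(path) as probe:
        try:
            fcntl.lockf(probe, fcntl.LOCK_SH | fcntl.LOCK_NB)
            return True
        except OSError:
            return False

# The daemon itself holds an exclusive lock on a job's lock file...
holder = tempfile.NamedTemporaryFile()
fcntl.lockf(holder, fcntl.LOCK_EX)

# ...yet a shared-lock probe from the SAME process still succeeds,
# because POSIX fcntl locks never conflict within one process.  A
# naive death check would misread this as "the job has died".
result = shared_lock_succeeds(holder.name)
print(result)   # True
```

This is why the probe for a job's lock must not be performed by the process that already holds the lock.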
-
Petr Pudlak authored
In particular, distinguish the case when a job could not be cancelled from the case when it has already finished.
Signed-off-by: Petr Pudlak <pudlak@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>
-
Petr Pudlak authored
Signed-off-by: Petr Pudlak <pudlak@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>
-
Petr Pudlak authored
.. because modifying the queue inside the handler can have unexpected consequences. Since Python 2 doesn't have a nice way to modify a variable from an inner function, we have to use a list as a wrapper. (Python 3 has the "nonlocal" keyword for this.)
Signed-off-by: Petr Pudlak <pudlak@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>
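The list-as-wrapper idiom mentioned above looks like this (a generic sketch of the pattern, not the queue code itself):

```python
def make_counter():
    count = [0]              # one-element list as a mutable cell
    def tick():
        # In Python 2 we cannot rebind `count` itself from here,
        # but we can mutate the list it refers to.
        count[0] += 1
        return count[0]
    return tick

tick = make_counter()
tick()
print(tick())   # 2
```

In Python 3 the cell would simply be `nonlocal count` with a plain integer.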
-
- 22 Apr, 2014 6 commits
-
Klaus Aehlig authored
When failing a job, add an entry to the reason trail, indicating what made the job fail (e.g., failed to fork or detected job death).
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Petr Pudlak <pudlak@google.com>
-
Klaus Aehlig authored
...to simplify manipulation of them.
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Petr Pudlak <pudlak@google.com>
-
Klaus Aehlig authored
...to be able to operate on the MetaOpCode that is behind an InputOpCode (if we're in the right component of the sum).
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Petr Pudlak <pudlak@google.com>
-
Klaus Aehlig authored
...so that manipulations deep within such an object become simpler.
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Petr Pudlak <pudlak@google.com>
-
Klaus Aehlig authored
Move all the object definitions to a separate file. This way, the lens module for JQueue can use these objects, while JQueue can use the lenses. For use outside, we re-export the objects.
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Petr Pudlak <pudlak@google.com>
-
Klaus Aehlig authored
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Reviewed-by: Petr Pudlak <pudlak@google.com>
-
- 17 Apr, 2014 21 commits
-
Petr Pudlak authored
.. and get rid of an unnecessary variable binding.
Signed-off-by: Petr Pudlak <pudlak@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>
-
Petr Pudlak authored
.. because with the new mechanism the process can be slower, and the job sometimes finishes successfully before it can be cancelled.
Signed-off-by: Petr Pudlak <pudlak@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>
-
Klaus Aehlig authored
Make the onTimeWatcher of the job queue scheduler also verify that all notionally running jobs are indeed alive. If a job is found dead, remove it from the list of running jobs and update the job file to reflect the unexpected death.
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Signed-off-by: Petr Pudlak <pudlak@google.com>
Reviewed-by: Petr Pudlak <pudlak@google.com>
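A conventional way to probe whether a notionally running process is still alive is a signal-0 check. This is a hedged Python sketch of the idea only; Ganeti's actual liveness test is based on the job's lock file, not on signals, and the helper name here is hypothetical:

```python
import errno
import os

def process_alive(pid):
    """Signal-0 probe: reports whether a process with `pid` exists."""
    try:
        os.kill(pid, 0)      # delivers nothing; only checks existence
        return True
    except OSError as err:
        # EPERM means the process exists but belongs to another user.
        return err.errno == errno.EPERM

print(process_alive(os.getpid()))   # True
```

A watcher would run such a check periodically over its list of running jobs and mark any dead ones as failed in their job files.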
-
Petr Pudlak authored
We can only send the signal if the job is alive and if there is a process ID in the job file (which means that the signal handler has been installed). If it's missing, we need to wait and retry. In addition, after we send the signal, we wait for the job to actually die, to retain the original semantics.
Signed-off-by: Petr Pudlak <pudlak@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>
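Both waits described above (for the PID to appear in the job file, and for the job to actually die after signalling) follow the same retry-with-timeout shape. A generic, hypothetical sketch of that loop:

```python
import time

def retry_until(action, timeout=5.0, interval=0.05):
    """Call `action` until it returns a non-None value or `timeout`
    expires; returns None on timeout.  (Illustrative helper, not
    Ganeti's actual code.)"""
    deadline = time.monotonic() + timeout
    while True:
        result = action()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

Cancellation would then be: `retry_until(read_pid_from_job_file)`, send the signal, `retry_until(lambda: None if still_alive() else True)`.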
-
Petr Pudlak authored
.. so that it can be seen which lock file was tested and with what result.
Signed-off-by: Petr Pudlak <pudlak@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>
-
Petr Pudlak authored
The functionality is kept the same, but instead of comparing for equality, a more general version based on a predicate is added. This allows basing the condition on only a part of the output. In addition, 'bracket' is added so that the inotify data structure is properly cleaned up even if the inner IO action throws an exception.
Signed-off-by: Petr Pudlak <pudlak@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>
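Haskell's `bracket` guarantees that a release action runs even when the guarded action throws. For readers more at home in Python, the same acquire/release guarantee can be sketched with a context manager (an analog of the idea, not the Haskell code):

```python
from contextlib import contextmanager

@contextmanager
def bracket(acquire, release):
    """Run `acquire`, hand its result to the body, and guarantee that
    `release` runs afterwards -- even if the body raises."""
    resource = acquire()
    try:
        yield resource
    finally:
        release(resource)
```

Here the inotify watch corresponds to the acquired resource, and its teardown is the release action.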
-
Petr Pudlak authored
.. so that it's possible to use logging operations there.
Signed-off-by: Petr Pudlak <pudlak@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>
-
Petr Pudlak authored
This is a bit problematic, as there is no portable way to list all open file descriptors, and we can't track them all, because they're also opened by third-party libraries such as inotify. Therefore we use /proc/self/fd and /dev/fd, which should work for all Linux flavors and most *BSDs as well. If both are missing, we don't do anything and just log a warning.
Signed-off-by: Petr Pudlak <pudlak@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>
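The fallback scheme described above can be sketched as follows (the Ganeti code is Haskell; this Python version mirrors the approach, and the function name is hypothetical):

```python
import os

def open_fds():
    """Best-effort list of this process's open file descriptors via
    /proc/self/fd (Linux) or /dev/fd (most *BSDs); returns None if
    neither exists, in which case the caller just logs a warning."""
    for path in ("/proc/self/fd", "/dev/fd"):
        try:
            # Note: the listing transiently includes the descriptor
            # that was opened to read the directory itself.
            return sorted(int(name) for name in os.listdir(path))
        except OSError:
            continue
    return None

print(open_fds())
```

On a typical Linux system the result starts with the standard descriptors 0, 1, and 2.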
-
Petr Pudlak authored
`orElse` works just like `mplus` of ResultT, but it only requires `MonadError` and doesn't accumulate the errors; it just returns the second one if both actions fail.
Signed-off-by: Petr Pudlak <pudlak@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>
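The described semantics (run the first action; fall back to the second; if both fail, only the second error propagates) can be mirrored in Python with exceptions. This is an analog of the idea only, not the Haskell combinator:

```python
def or_else(first, second):
    """Run `first`; if it raises, run `second` instead.  If `second`
    also raises, only that second error propagates -- the first error
    is not accumulated, matching the `orElse` semantics above."""
    try:
        return first()
    except Exception:
        return second()
```

With Haskell's `mplus` on ResultT, by contrast, both error messages would be combined.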
-
Petr Pudlak authored
If the endpoint (such as Luxid or WConfd) isn't running, don't fail immediately. Instead retry (within the given timeout) and try to reconnect.
Signed-off-by: Petr Pudlak <pudlak@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>
-
Petr Pudlak authored
On the Python side it was assumed that the blacklisted private parameters were always dictionaries, but since they're optional, they could be 'None' as well.
Signed-off-by: Petr Pudlak <pudlak@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>
-
Petr Pudlak authored
Since now each process only creates a 1-job queue, trying to use file locks only causes job deadlock. Also reduce the number of threads running in a job queue to 1. Later the job queue will be removed completely.
Signed-off-by: Petr Pudlak <pudlak@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>
-
Petr Pudlak authored
If a Haskell program is compiled with -threaded, then inheriting open file descriptors doesn't work, which breaks our job death detection mechanism. (And on older GHC versions even forking doesn't work.) Therefore let the Luxi daemon check, and fail to start, if it detects it has been compiled with -threaded.
Signed-off-by: Petr Pudlak <pudlak@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>
-
Klaus Aehlig authored
As luxid forks off processes now, it may receive SIGCHLD signals. Hence add a handler for this. Since we obtain the success of the child from the job file, ignoring is good enough.
Signed-off-by: Klaus Aehlig <aehlig@google.com>
Signed-off-by: Petr Pudlak <pudlak@google.com>
Reviewed-by: Petr Pudlak <pudlak@google.com>
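The POSIX idea behind "ignoring is good enough" is that setting the SIGCHLD disposition to SIG_IGN lets the kernel reap exited children automatically, so no zombies accumulate even without calling wait(). A one-line Python illustration (the actual daemon is Haskell):

```python
import signal

# Ignore SIGCHLD: exited children are reaped automatically, and the
# daemon reads each job's outcome from its job file instead of from
# wait().  (Python sketch of the POSIX mechanism, not luxid itself.)
signal.signal(signal.SIGCHLD, signal.SIG_IGN)
```

The trade-off is that the parent can no longer collect exit statuses, which is acceptable here precisely because the job file carries the result.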
-
Petr Pudlak authored
.. instead of just letting the master daemon handle them. We try to start all given jobs independently and requeue those that failed.
Signed-off-by: Petr Pudlak <pudlak@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>
-
Petr Pudlak authored
.. which will be used if the Luxi daemon attempts to start a job, but fails.
Signed-off-by: Petr Pudlak <pudlak@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>
-
Petr Pudlak authored
The ID of the current process is stored in the job file.
Signed-off-by: Petr Pudlak <pudlak@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>
-
Petr Pudlak authored
This will make it possible to check whether a particular job is alive, and to send signals to it when it's running. For backwards compatibility, the fields aren't serialized if missing.
Signed-off-by: Petr Pudlak <pudlak@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>
-
Petr Pudlak authored
.. using the POSIX type ProcessID.
Signed-off-by: Petr Pudlak <pudlak@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>
-
Petr Pudlak authored
They will be used by the Luxi daemon to spawn jobs as separate processes. The communication protocol between the Luxi daemon and a spawned process is described in the documentation of module Ganeti.Query.Exec.
Signed-off-by: Petr Pudlak <pudlak@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>
-
Petr Pudlak authored
Use the function where appropriate. The handling of CancelJob is also slightly refactored to use ResultT, which is used by the new function.
Signed-off-by: Petr Pudlak <pudlak@google.com>
Reviewed-by: Klaus Aehlig <aehlig@google.com>
-