- 15 Oct, 2014 1 commit
-
-
Niklas Hambuechen authored
This implements the operational part of the design doc "Filtering of jobs for the Ganeti job queue" (design-optables.rst). It includes:
- respecting filter rules when jobs are scheduled
- cancelling running jobs rejected by filters
- re-running the scheduler when filter rules are changed
- handling of the filter actions ACCEPT, CONTINUE, PAUSE, REJECT and RATE_LIMIT
- implementation of the "jobid", "opcode" and "reason" predicates
Signed-off-by:
Niklas Hambuechen <niklash@google.com> Reviewed-by:
Klaus Aehlig <aehlig@google.com>
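The filter behaviour summarised in the commit above (a job is matched against rule predicates and the action of the first matching rule is applied) can be pictured with a short Python sketch. The rule and job layout, the helper names and the simplified handling of CONTINUE and RATE_LIMIT below are illustrative assumptions based on design-optables.rst, not the actual Haskell job-queue code.

```python
# Minimal sketch of first-match filter evaluation; the data layout is
# made up for illustration and does not mirror Ganeti's filter objects.

ACCEPT, CONTINUE, PAUSE, REJECT, RATE_LIMIT = (
    "ACCEPT", "CONTINUE", "PAUSE", "REJECT", "RATE_LIMIT")

def predicate_matches(predicate, job):
    """Evaluate a single (name, value) predicate against a job dict."""
    name, value = predicate
    if name == "jobid":
        return job["id"] == value
    if name == "opcode":
        return value in job["opcodes"]
    if name == "reason":
        return any(value in reason for reason in job["reason_trail"])
    raise ValueError("unknown predicate %r" % name)

def evaluate_filters(rules, job):
    """Return the action of the first rule whose predicates all match.

    Rules are checked in priority order; CONTINUE means "keep looking",
    and a job matching no rule is accepted by default.  RATE_LIMIT would
    additionally need a slot count and is not modelled here.
    """
    for rule in sorted(rules, key=lambda r: r["priority"]):
        if all(predicate_matches(p, job) for p in rule["predicates"]):
            if rule["action"] == CONTINUE:
                continue
            return rule["action"]
    return ACCEPT

if __name__ == "__main__":
    rules = [
        {"priority": 0, "predicates": [("reason", "maintenance")],
         "action": PAUSE},
        {"priority": 1, "predicates": [("opcode", "OP_INSTANCE_CREATE")],
         "action": REJECT},
    ]
    job = {"id": 42, "opcodes": ["OP_INSTANCE_CREATE"],
           "reason_trail": ["gnt:opcode:instance_create"]}
    print(evaluate_filters(rules, job))  # -> REJECT
```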
-
- 14 Oct, 2014 2 commits
-
-
Klaus Aehlig authored
We cannot avoid the race on death detection after forcefully killing a job: the only guarantee the operating system gives us is that the process will die eventually. However, we can improve the chance of being able to successfully clean up a job by retrying death detection. Do this. Signed-off-by:
Klaus Aehlig <aehlig@google.com> Reviewed-by:
Niklas Hambuechen <niklash@google.com>
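Retrying death detection is essentially a bounded polling loop around a "did the process go away yet?" check. A minimal Python sketch of the idea, assuming a hypothetical cleanup_if_dead helper that, like the cleanupIfDead change below, reports whether the job has died; the retry count and delay are made up.

```python
import errno
import os
import time

def cleanup_if_dead(pid):
    """Return True iff the process is gone (stand-in for cleanupIfDead).

    os.kill(pid, 0) only probes for existence; it delivers no signal.
    """
    try:
        os.kill(pid, 0)
    except OSError as err:
        if err.errno == errno.ESRCH:
            # Process no longer exists; the real code would now clean up
            # the corresponding job-queue entry.
            return True
        raise
    return False

def retry_death_detection(pid, attempts=5, delay=0.2):
    """After SIGKILL the process dies only "eventually"; poll a few times."""
    for _ in range(attempts):
        if cleanup_if_dead(pid):
            return True
        time.sleep(delay)
    return False
```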
-
Klaus Aehlig authored
Make cleanupIfDead report the death status of the job, so that a caller can decide to retry. Signed-off-by:
Klaus Aehlig <aehlig@google.com> Reviewed-by:
Niklas Hambuechen <niklash@google.com>
-
- 08 Oct, 2014 1 commit
-
-
Klaus Aehlig authored
...that now is a good time to check if some resource owners have died. Signed-off-by:
Klaus Aehlig <aehlig@google.com> Reviewed-by:
Niklas Hambuechen <niklash@google.com>
-
- 07 Oct, 2014 2 commits
-
-
Niklas Hambuechen authored
This implements the management (add, remove, replace) of job filter rules as specified by `doc/design-optables.rst`; it does not yet implement the filtering logic. This commit also includes the implementation of filters in the RAPI client.py and its tests, because the client tests check that all available RAPI resources are used by the client, and the only way to satisfy that check is to write tests for them. Signed-off-by:
Niklas Hambuechen <niklash@google.com> Reviewed-by:
Klaus Aehlig <aehlig@google.com>
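For a feel of what managing filter rules over RAPI looks like, here is a rough Python sketch against the `/2/filters` and `/2/filters/[uuid]` resources described in design-optables.rst. The endpoint URL, credentials, payload field names and the use of the `requests` library are illustrative assumptions, not the actual client.py API.

```python
import requests

BASE = "https://cluster.example.com:5080"  # hypothetical RAPI endpoint
AUTH = ("rapi-user", "secret")             # hypothetical credentials

def list_filters():
    """GET the list of filter rules."""
    resp = requests.get(BASE + "/2/filters", auth=AUTH, verify=False)
    resp.raise_for_status()
    return resp.json()

def add_filter(priority, predicates, action):
    """POST a new filter rule; the body field names are assumptions."""
    body = {"priority": priority, "predicates": predicates, "action": action}
    resp = requests.post(BASE + "/2/filters", json=body, auth=AUTH,
                         verify=False)
    resp.raise_for_status()
    return resp.json()

def delete_filter(uuid):
    """DELETE an existing filter rule by UUID."""
    resp = requests.delete("%s/2/filters/%s" % (BASE, uuid), auth=AUTH,
                           verify=False)
    resp.raise_for_status()
```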
-
Niklas Hambuechen authored
We will need to query things that have no names (only UUIDs) as well. Signed-off-by:
Niklas Hambuechen <niklash@google.com> Reviewed-by:
Klaus Aehlig <aehlig@google.com>
-
- 26 Sep, 2014 1 commit
-
-
Aaron Karper authored
The configuration for the data collector intervals is settable via gnt-cluster modify. The info is visible in gnt-cluster info. Signed-off-by:
Aaron Karper <akarper@google.com> Signed-off-by:
Michele Tartara <mtartara@google.com> Reviewed-by:
Michele Tartara <mtartara@google.com>
-
- 24 Sep, 2014 3 commits
-
-
Klaus Aehlig authored
Once we have sent SIGKILL to a process, there is a high chance that it has actually died. So this is a good point in time to verify its death and clean up the job queue. Signed-off-by:
Klaus Aehlig <aehlig@google.com> Reviewed-by:
Petr Pudlak <pudlak@google.com>
-
Klaus Aehlig authored
To do so, extend the cancelJob function with an additional kill flag. Also, if we kill a job, we expect it to die regardless of whether it has already acquired all the needed locks. Signed-off-by:
Klaus Aehlig <aehlig@google.com> Reviewed-by:
Petr Pudlak <pudlak@google.com>
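The kill flag boils down to choosing which signal is sent to the already-running job process. A minimal Python sketch of that choice; the function name and the assumption that the job's PID is at hand are illustrative only.

```python
import os
import signal

def cancel_job_process(pid, kill=False):
    """Ask a running job process to stop.

    Without the kill flag we send SIGTERM and rely on the job to shut
    down cleanly; with it we send SIGKILL, which cannot be caught or
    ignored, so the job is expected to die regardless of which locks
    it currently holds.
    """
    os.kill(pid, signal.SIGKILL if kill else signal.SIGTERM)
```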
-
Klaus Aehlig authored
This flag will instruct luxid to send a SIGKILL instead of a SIGTERM if the job to be canceled has already started. To avoid having to change RAPI all in one big change, we keep the wire protocol backwards compatible, even though this is not strictly necessary for the luxi protocol. Signed-off-by:
Klaus Aehlig <aehlig@google.com> Reviewed-by:
Petr Pudlak <pudlak@google.com>
-
- 12 Sep, 2014 1 commit
-
-
Klaus Aehlig authored
It was decided that Ganeti is relicensed under the 2-clause BSD license. Update the license statements accordingly (issue #936). Signed-off-by:
Klaus Aehlig <aehlig@google.com> Reviewed-by:
Petr Pudlak <pudlak@google.com>
-
- 28 Aug, 2014 2 commits
-
-
Aaron Karper authored
Adds data collectors to gnt-cluster info, currently only showing the activation state. Signed-off-by:
Aaron Karper <akarper@google.com> Signed-off-by:
Michele Tartara <mtartara@google.com> Reviewed-by:
Michele Tartara <mtartara@google.com>
-
Aaron Karper authored
Replaces gnt-cluster options `--enable_data_collector` / `--disable_data_collectors` with `--enabled_data_collectors`, which takes key-value pairs and sets the data collector activation state to the given value. E.g. gnt-cluster modify --enabled-data-collectors=cpu-avg-load=true,inst-status-xen=false Signed-off-by:
Aaron Karper <akarper@google.com> Signed-off-by:
Michele Tartara <mtartara@google.com> Reviewed-by:
Michele Tartara <mtartara@google.com>
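Parsing such a `name=true,other=false` option into a per-collector activation map is straightforward; a small Python sketch of the idea (the helper name and the strictness about accepted values are assumptions, not the actual gnt-cluster code).

```python
def parse_enabled_collectors(optvalue):
    """Turn 'cpu-avg-load=true,inst-status-xen=false' into a dict of bools."""
    result = {}
    for pair in optvalue.split(","):
        name, _, value = pair.partition("=")
        if value.lower() not in ("true", "false"):
            raise ValueError("invalid value for %r: %r" % (name, value))
        result[name.strip()] = value.lower() == "true"
    return result

print(parse_enabled_collectors("cpu-avg-load=true,inst-status-xen=false"))
# {'cpu-avg-load': True, 'inst-status-xen': False}
```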
-
- 26 Aug, 2014 1 commit
-
-
Aaron Karper authored
Generally a data collector can be deactivated in the config, and each data collector defines its own conditions to show/hide it. A hidden data collector is not shown in /1/list/collectors or /1/report/all. The relevant config is requested from the RConfD.
* Added new constant dataCollectorNames: the consistency is checked at compile time thanks to Template Haskell.
* cfgupgrade: add a datacollectors section in cluster. The section currently has a single entry: 'active'.
* Added Arbitrary instances for the relevant types.
* Move Ganeti.Monitoring.{Server,Types}.DataCollector
* Move {Monitoring.Server,DataCollectors.Types}.DataCollector
Implements Issue 870: inst-status-xen should not be shown for KVM cluster. Signed-off-by:
Aaron Karper <akarper@google.com> Signed-off-by:
Michele Tartara <mtartara@google.com> Reviewed-by:
Michele Tartara <mtartara@google.com>
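The cfgupgrade part (adding a datacollectors section with an 'active' entry) can be pictured as a small dict transformation on the configuration data. The commit message only names the section and its single entry; the exact layout guessed below, and the choice to default every known collector to active so upgraded clusters keep their behaviour, are assumptions for illustration.

```python
def upgrade_cluster_config(cluster, collector_names):
    """Add a missing 'datacollectors' section to a cluster config dict.

    We guess that 'active' maps collector names to booleans, defaulting
    to True so that an upgraded cluster behaves as before.
    """
    section = cluster.setdefault("datacollectors", {})
    active = section.setdefault("active", {})
    for name in collector_names:
        active.setdefault(name, True)
    return cluster

cluster = {"cluster_name": "example"}
upgrade_cluster_config(cluster, ["cpu-avg-load", "inst-status-xen"])
print(cluster["datacollectors"]["active"]["inst-status-xen"])  # True
```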
-
- 02 Jul, 2014 1 commit
-
-
Klaus Aehlig authored
As we have introduced a new cluster parameter, it should also be visible when querying the cluster configuration. Signed-off-by:
Klaus Aehlig <aehlig@google.com> Reviewed-by:
Petr Pudlak <pudlak@google.com>
-
- 26 Jun, 2014 1 commit
-
-
Klaus Aehlig authored
When a job whose priority should be changed has already started, inform the job by signalling it. Signed-off-by:
Klaus Aehlig <aehlig@google.com> Reviewed-by:
Petr Pudlak <pudlak@google.com>
-
- 10 Jun, 2014 1 commit
-
-
Jose A. Lopes authored
... including object, queries, opcode, LU, command line, upgrade, etc. Signed-off-by:
Jose A. Lopes <jabolopes@google.com> Reviewed-by:
Hrvoje Ribicic <riba@google.com> Reviewed-by:
Helga Velroyen <helgav@google.com>
-
- 03 Jun, 2014 2 commits
-
-
Klaus Aehlig authored
...so that other daemons can use this functionality as well, without the need to duplicate code. Signed-off-by:
Klaus Aehlig <aehlig@google.com> Reviewed-by:
Petr Pudlak <pudlak@google.com>
-
Klaus Aehlig authored
...so that it can be used by other daemons as well. Signed-off-by:
Klaus Aehlig <aehlig@google.com> Reviewed-by:
Petr Pudlak <pudlak@google.com>
-
- 02 Jun, 2014 2 commits
-
-
Klaus Aehlig authored
As luxid now starts jobs, make it verify that it is running on the master node by carrying out the Ganeti voting process. Signed-off-by:
Klaus Aehlig <aehlig@google.com> Reviewed-by:
Petr Pudlak <pudlak@google.com>
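The voting idea — ask every node who it believes the master is and require a strict majority for yourself before acting as master — can be sketched in a few lines of Python. The node-querying callback and the quorum rule shown here are simplified assumptions, not the actual Ganeti implementation.

```python
def has_majority(my_name, node_names, ask_master_of):
    """Return True iff a strict majority of nodes name us as master.

    ask_master_of(node) should return the master name that node reports,
    or None if the node cannot be reached.
    """
    votes = 0
    for node in node_names:
        if ask_master_of(node) == my_name:
            votes += 1
    return votes * 2 > len(node_names)

# Toy usage: three nodes, two of which agree that "node1" is master.
opinions = {"node1": "node1", "node2": "node1", "node3": "node2"}
print(has_majority("node1", list(opinions), opinions.get))  # True
```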
-
Klaus Aehlig authored
Move the starting of the job scheduler to a later stage in the startup. In particular, only start it after the job-queue lock file is obtained. Signed-off-by:
Klaus Aehlig <aehlig@google.com> Reviewed-by:
Petr Pudlak <pudlak@google.com>
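The point here is ordering: the scheduler must not start before the job-queue lock file is held. A hedged Python sketch of that ordering using an fcntl lock; the lock-file path handling and the scheduler callback are placeholders, not the daemon's actual startup code.

```python
import fcntl

def start_with_queue_lock(lockfile_path, start_scheduler):
    """Acquire the job-queue lock first, then start the scheduler.

    The exclusive, non-blocking lock raises OSError if another process
    already owns the queue, so the scheduler is never started without
    the lock being held.
    """
    lockfile = open(lockfile_path, "w")
    fcntl.lockf(lockfile, fcntl.LOCK_EX | fcntl.LOCK_NB)
    start_scheduler()
    return lockfile  # keep it open: closing the file releases the lock
```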
-
- 19 May, 2014 1 commit
-
-
Petr Pudlak authored
A small refactoring was done in handling ArchiveJob so that it was possible to use 'withLock'. Signed-off-by:
Petr Pudlak <pudlak@google.com> Reviewed-by:
Klaus Aehlig <aehlig@google.com>
-
- 14 May, 2014 2 commits
-
-
Jose A. Lopes authored
The 'Cluster.install_image' param holds the location of the image to be used for the safe installation of instances. Signed-off-by:
Jose A. Lopes <jabolopes@google.com> Reviewed-by:
Hrvoje Ribicic <riba@google.com>
-
Klaus Aehlig authored
...as masterd is no more. Signed-off-by:
Klaus Aehlig <aehlig@google.com> Reviewed-by:
Petr Pudlak <pudlak@google.com>
-
- 13 May, 2014 3 commits
-
-
Klaus Aehlig authored
This is the last task currently done by masterd, so by making luxid take this over, we can get rid of masterd. Signed-off-by:
Klaus Aehlig <aehlig@google.com> Reviewed-by:
Petr Pudlak <pudlak@google.com>
-
Klaus Aehlig authored
...instead of replicating the functionality on the fly. Signed-off-by:
Klaus Aehlig <aehlig@google.com> Reviewed-by:
Petr Pudlak <pudlak@google.com>
-
Hrvoje Ribicic authored
This patch makes the myriad of changes necessary for the compression tool parameter to be added. The filtering of compression tools for suspicious entries has been added for this exact purpose. Signed-off-by:
Hrvoje Ribicic <riba@google.com> Reviewed-by:
Thomas Thrainer <thomasth@google.com>
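Filtering compression tool names for suspicious entries essentially means rejecting anything that could be abused on a command line. A small Python sketch with an assumed whitelist pattern; the actual constant and regular expression used by Ganeti may differ.

```python
import re

# Assumed pattern: plain tool names only, no paths, spaces or shell
# metacharacters.
_TOOL_NAME_RE = re.compile(r"^[-_a-zA-Z0-9]+$")

def filter_compression_tools(tools):
    """Split tool names into (accepted, rejected) lists."""
    accepted, rejected = [], []
    for tool in tools:
        (accepted if _TOOL_NAME_RE.match(tool) else rejected).append(tool)
    return accepted, rejected

print(filter_compression_tools(["gzip", "lzop", "gzip; rm -rf /"]))
# (['gzip', 'lzop'], ['gzip; rm -rf /'])
```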
-
- 24 Apr, 2014 1 commit
-
-
Petr Pudlak authored
In this case, the call trying to acquire a shared lock always succeeds, because the daemon itself already holds an exclusive lock, and therefore the check falsely reports that the job has died. Signed-off-by:
Petr Pudlak <pudlak@google.com> Reviewed-by:
Klaus Aehlig <aehlig@google.com>
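The bug is easiest to see with POSIX record locks: fcntl locks never conflict with locks held by the same process, so a liveness probe that tries to acquire a shared lock from within the daemon that already holds the exclusive lock will "succeed" and wrongly conclude the job has died. A small self-contained Python demonstration of the pitfall (the real Ganeti code is Haskell; file and variable names here are arbitrary).

```python
import fcntl
import os
import tempfile

# POSIX record locks never conflict with locks held by the same process,
# so an in-process liveness probe that tries to take a shared lock on
# the job file always succeeds.
handle, path = tempfile.mkstemp()
try:
    fcntl.lockf(handle, fcntl.LOCK_EX)  # the daemon's own exclusive lock
    try:
        # The probe: attempt a non-blocking shared lock.
        fcntl.lockf(handle, fcntl.LOCK_SH | fcntl.LOCK_NB)
        print("shared lock acquired -> job wrongly reported as dead")
    except OSError:
        print("probe correctly sees the job as alive")
finally:
    os.close(handle)
    os.unlink(path)
```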
-
- 17 Apr, 2014 5 commits
-
-
Petr Pudlak authored
We can only send the signal if the job is alive and if there is a process ID in the job file (which means that the signal handler has been installed). If it's missing, we need to wait and retry. In addition, after we send the signal, we wait for the job to actually die, to retain the original semantics. Signed-off-by:
Petr Pudlak <pudlak@google.com> Reviewed-by:
Klaus Aehlig <aehlig@google.com>
-
Petr Pudlak authored
If a Haskell program is compiled with -threaded, then inheriting open file descriptors doesn't work, which breaks our job death detection mechanism. (And on older GHC versions even forking doesn't work.) Therefore let the Luxi daemon check this and fail to start if it detects that it has been compiled with -threaded. Signed-off-by:
Petr Pudlak <pudlak@google.com> Reviewed-by:
Klaus Aehlig <aehlig@google.com>
-
Klaus Aehlig authored
As luxid forks off processes now, it may receive SIGCHLD signals. Hence add a handler for this. Since we obtain the success of the child from the job file, ignoring is good enough. Signed-off-by:
Klaus Aehlig <aehlig@google.com> Signed-off-by:
Petr Pudlak <pudlak@google.com> Reviewed-by:
Petr Pudlak <pudlak@google.com>
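Ignoring SIGCHLD is a one-liner; below is a Python equivalent of what the commit describes (the real handler is installed in the Haskell daemon). Setting the disposition to SIG_IGN also lets the kernel reap exited children automatically, so no zombies accumulate.

```python
import signal

# Ignore SIGCHLD: the daemon learns about a job's outcome from the job
# file, not from waitpid(), so the "child exited" notification is not
# needed.
signal.signal(signal.SIGCHLD, signal.SIG_IGN)
```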
-
Petr Pudlak authored
Use the function where appropriate. Also handling of CancelJob is slightly refactored to use ResultT, which is used by the new function. Signed-off-by:
Petr Pudlak <pudlak@google.com> Reviewed-by:
Klaus Aehlig <aehlig@google.com>
-
Petr Pudlak authored
.. of the locked file so that it can be closed later, if needed. Signed-off-by:
Petr Pudlak <pudlak@google.com> Reviewed-by:
Klaus Aehlig <aehlig@google.com>
-
- 15 Apr, 2014 1 commit
-
-
Klaus Aehlig authored
When WaitForJobChange is queried for a non-existent job, report this as an error. Signed-off-by:
Klaus Aehlig <aehlig@google.com> Reviewed-by:
Petr Pudlak <pudlak@google.com>
-
- 08 Apr, 2014 1 commit
-
-
Hrvoje Ribicic authored
This patch adds the zeroing-image option to gnt-cluster and the OpBackupExport params. The many changes are all minor, yet necessary. Signed-off-by:
Hrvoje Ribicic <riba@google.com> Reviewed-by:
Jose A. Lopes <jabolopes@google.com>
-
- 28 Feb, 2014 1 commit
-
-
Dimitris Bliablias authored
Include the mac-prefix setting in the output of the 'gnt-cluster info' command. This fixes part of issue 239. Signed-off-by:
Dimitris Bliablias <bl.dimitris@gmail.com> Reviewed-by:
Jose A. Lopes <jabolopes@google.com>
-
- 27 Feb, 2014 2 commits
-
-
Michele Tartara authored
Not only SubmitJobToDrainedQueue (and therefore SubmitJob) but also SubmitManyJobs has to add "gnt:opcode:*" entries to the reason trail. Signed-off-by:
Michele Tartara <mtartara@google.com> Reviewed-by:
Klaus Aehlig <aehlig@google.com>
-
Michele Tartara authored
The entry used to be added in jqueue.py, but after switching the queue management from masterd to luxid it had been lost. Now, make LuxiD responsible for adding it. Signed-off-by:
Michele Tartara <mtartara@google.com> Reviewed-by:
Klaus Aehlig <aehlig@google.com>
-
- 14 Feb, 2014 1 commit
-
-
Petr Pudlak authored
.. as long as they're instances of "MonadBaseControl IO" and "MonadLog". This allows the UDSServer to call functions like "fork" within monads such as "ResultT e IO" or "ReaderT IO". Signed-off-by:
Petr Pudlak <pudlak@google.com> Reviewed-by:
Klaus Aehlig <aehlig@google.com>
-
- 13 Feb, 2014 1 commit
-
-
Jose A. Lopes authored
* Add parameter 'instance_communication_network' to the Python 'ganeti.objects.Cluster' and the Haskell 'Ganeti.Objects.Cluster'.
* Update Haskell 'QueryClusterInfo' to also return the 'instance_communication_network' parameter.
* Update Python 'LUClusterQuery' to also return the 'instance_communication_network' parameter.
* Update Python 'ShowClusterConfig' to include information about the 'instance_communication_network' parameter.
* Update 'ganeti.objects.Cluster.UpgradeConfig' to also upgrade the 'instance_communication_network' parameter to the empty string, if unspecified.
* Update the configuration upgrade tool (i.e., 'tools/cfgupgrade') to handle upgrading as well as downgrading of the 'instance_communication_network' parameter.
Signed-off-by:
Jose A. Lopes <jabolopes@google.com> Reviewed-by:
Helga Velroyen <helgav@google.com>
-