- May 12, 2009
-
-
Iustin Pop authored
This big patch converts the documentation build system to sphinx (http://sphinx.pocoo.org/). Since that uses reStructuredText sources too, there is no change (yet) in the documents themselves, just in the build system. As before, the docs are pre-built by the maintainer, and the end user doesn't need sphinx or other rst tools to build the docs. Note that we are not distributing PDFs, so building those will require the tools. The docs will be stored under doc/html, and the build system also needs an extra directory, doc/build. These are considered (by automake) maintainer-related objects and are removed at maintainer-clean time. The patch also fixes some small issues: add a docpng variable, add doc/api (also generated by the maintainer) to maintainer-clean-local, etc.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
This patch removes the autogeneration of the RAPI docs from the code (based on docstrings) and moves the current autogenerated output to the rapi.rst file. There are multiple reasons behind this:
- the build system becomes a little simpler (though this could also have been achieved by distributing the built documentation)
- it's hard to actually write documentation in docstrings; you have to fit reStructuredText inside the docstrings, and the resulting output is not very nice
- even though it is close to the code, the documentation manages to get out of sync (people don't pay attention to docstrings)
This will also help with the move to sphinx.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
-
- May 06, 2009
-
-
Guido Trotter authored
Sometimes reinstalls are slightly different from new installs. For example, certain partitions may need to be preserved across reinstalls. In order to do that on a per-OS basis, we pass in the INSTANCE_REINSTALL variable to inform the create script that a reinstall is happening.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
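A create script could branch on this variable roughly as follows. This is a minimal sketch, not the actual Ganeti code: only the variable name INSTANCE_REINSTALL comes from the commit message. The helper name `is_reinstall` and the assumption that any non-empty value means "reinstall" are illustrative.

```python
import os


def is_reinstall(environ=None):
    """Return True when the reinstall marker is present.

    Assumption: any non-empty INSTANCE_REINSTALL value signals a reinstall.
    """
    if environ is None:
        environ = os.environ
    return bool(environ.get("INSTANCE_REINSTALL"))


if __name__ == "__main__":
    if is_reinstall():
        print("reinstall: keeping partitions that must be preserved")
    else:
        print("new install: creating all partitions")
```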
-
Guido Trotter authored
This document contains a skeleton for the 2.1 design process. For now it just has introductory paragraphs and a structure for the design of the various areas, but some sections don't have any text yet, as we're still in the early design phases.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- May 05, 2009
-
-
Carlos Valiente authored
Python 2.6 complains about module 'sha' being deprecated. This makes execution of Ganeti commands a bit annoying, and when you run 'ganeti-watcher' in cron jobs, you get a mail message after every execution. Tests pass under Python 2.6 and Python 2.4.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
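A common way to silence this warning while staying compatible with older interpreters is to prefer `hashlib` and fall back to the old module. The sketch below illustrates that pattern; it is not taken from the patch itself, and the helper name `fingerprint` is hypothetical.

```python
try:
    # hashlib exists since Python 2.5 and in every Python 3 release.
    from hashlib import sha1
except ImportError:
    # Python 2.4 fallback: the old 'sha' module (deprecated in 2.6).
    import sha
    sha1 = sha.new


def fingerprint(data):
    """Return the hexadecimal SHA-1 digest of a bytes object."""
    return sha1(data).hexdigest()
```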
-
Iustin Pop authored
This patch fixes two issues with LUSetClusterParams and argument checking. First, this LU used the wrong function name (CheckParameters instead of CheckArguments), which means that no parameter checking was done at all; this impacted the candidate_pool_size checks (the only one done at this stage). Second, int() can raise both ValueError and TypeError, and we should correctly handle both.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
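The second point can be illustrated with a small stand-alone sketch. `int("abc")` raises ValueError while `int(None)` raises TypeError, so both need to be caught. The function name, the error messages, and the "at least 1" bound below are assumptions made for illustration, not the code in LUSetClusterParams.

```python
def parse_candidate_pool_size(value):
    """Convert value to a positive int, or raise ValueError with a clear message."""
    try:
        size = int(value)
    except (ValueError, TypeError) as err:
        # ValueError: unparsable string; TypeError: None, list, and similar.
        raise ValueError("Invalid candidate_pool_size value: %s" % err)
    if size < 1:
        # Assumed lower bound, for illustration only.
        raise ValueError("At least one master candidate is needed")
    return size
```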
-
- May 04, 2009
-
-
Iustin Pop authored
The current validation routine just says "failed", without specifying the node name. This is very confusing, and we should log the node name too.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Alexander Schreiber <als@google.com>
-
Iustin Pop authored
The current implementation of “gnt-cluster getmaster” doesn't work on non-master nodes, which is a regression from 1.2. This patch implements it (again) via ssconf.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Alexander Schreiber <als@google.com>
-
- Apr 27, 2009
-
-
Iustin Pop authored
Reviewed-by: ultrotter
-
- Apr 24, 2009
-
-
Guido Trotter authored
Add the --all argument, and reword the basic information a bit. Reviewed-by: iustinp
-
Guido Trotter authored
Don't show info for all instances by default, but require --all to be passed for this time-consuming operation. Reviewed-by: iustinp
-
Iustin Pop authored
Since the “list OSes” call is exported via RAPI, this can be used pretty easily to DoS the master daemon during long jobs. The implementation of LUDiagnoseOS makes an RPC call to all nodes; we lock nodes here in order to prevent node removal. However, after closer examination, the worst case is:
- we get the list of nodes from the config
- another thread removes a node
- our RPC queries reach the removed node
At this point, if ganeti-noded is stopped or doesn't accept our queries, the RPC call will return failed, and in the current implementation all OSes will become invalid. If we change the ‘failed RPC’ handling to ignore such nodes, this allows us both to remove locking and to handle transient RPC failures better (not invalidating all OSes). This patch does both these things, with a single drawback: in gnt-os diagnose, the down nodes do not appear at all. I think this is a small drawback, and the alternative is to add them with status failed; this works (3-line patch), but then the output of “list” and “diagnose” will no longer be consistent. As such, my proposal is to not list the nodes.
Reviewed-by: ultrotter
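The "ignore failed nodes" handling described above can be sketched as follows. The result shape (a mapping from node name to a `(failed, os_list)` pair) and the function name are assumptions for illustration; the real RPC result objects in Ganeti differ.

```python
def collect_valid_os_results(rpc_results):
    """Drop nodes whose RPC call failed and keep the rest.

    rpc_results maps node name -> (failed, os_list); this shape is assumed.
    """
    return dict(
        (node, os_list)
        for node, (failed, os_list) in rpc_results.items()
        if not failed
    )
```

With this approach a single unreachable node no longer invalidates the OS list; it simply does not contribute to the result.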
-
Iustin Pop authored
When a remote node returns invalid LVM data, we detect it, but we don't stop; instead we continue with the rest of the checks (which require a valid volume group). This raises an internal error and breaks verify disks. This seems to have been unchanged for a long while; I don't know why it surfaced just recently. Reviewed-by: ultrotter
-
Iustin Pop authored
When vg_name is not returned at all, we currently abort with an internal error. This is because we don't catch KeyError. This patch adds a custom message for this case, and also adds KeyError to the list of caught exceptions, just for safety. On the other hand, we could also just remove this piece of code, since the ["dfree"] value is not used at all. Reviewed-by: ultrotter
-
- Apr 15, 2009
-
-
Iustin Pop authored
This patch fixes several issues, reported both externally and internally:
- missing SGML tags (Issue 54), report and patch by superdupont
- wrong variable used in the init.d script, report and patch by Karsten Keil <karsten-keil@t-online.de>
- man page for gnt-instance reinstall needs clarification (Issue 56)
- gnt-instance man page missing --disks documentation for replace-disks
- gnt-node modify help output is unclear about the -C/-D/-O input format, and the man page doesn't document this command at all
- “gnt-node modify -C yes” for offline or drained nodes had the wrong error message
- “gnt-instance reinstall --select-os” has the wrong prompt; we only accept a number for the OS and not the template name
Reviewed-by: ultrotter
-
- Apr 14, 2009
-
-
Alexander Schreiber authored
Reviewed-by: iustinp
-
- Apr 08, 2009
-
-
Iustin Pop authored
Burnin tests were successful, release rc3. Reviewed-by: imsnah
-
- Apr 07, 2009
-
-
Iustin Pop authored
This patch changes the way documentation is built in order to distribute the generated output in the 'dist' archive, and thus no longer require the presence of the docbook/rst toolchains at build time. This will lower the requirements for installation and also makes the build time insignificant. First, we remove the docbook2pdf rules and variables, since we no longer build this kind of docs. Furthermore, the rst source files are not (today) processed via replace_vars_sed, so the whole .in rules for doc/ go away. Next, we change the ".sgml|.rst -> replace_vars_sed -> .in -> processor -> final file" processing to ".sgml|.rst -> generator -> .in -> replace_vars_sed -> final file"; this means we first process the file using the formatter, with the @VARIABLE@ entries in it, and save the output as .in; this output we distribute, and on the user side, replace_vars_sed will use the new configure flags to transform the (almost final) .in form to the final form, without needing the toolchain. In configure.ac we also change from ERROR to WARN for the documentation generators, and extra tests in Makefile.am check that the programs have been found. This was tested with distcheck and works as expected. Reviewed-by: ultrotter
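The last step of the new pipeline (".in -> replace_vars_sed -> final file") is a plain placeholder substitution that needs no documentation toolchain. The real build does this with sed; the Python sketch below only illustrates the idea. The function name and the LOCALSTATEDIR example variable are assumptions, not names taken from the patch.

```python
import re

# Matches placeholders of the form @VARIABLE@.
_VAR_RE = re.compile(r"@([A-Z][A-Z0-9_]*)@")


def replace_vars(text, values):
    """Substitute known @VARIABLE@ entries; leave unknown ones untouched."""
    def _sub(match):
        name = match.group(1)
        return values.get(name, match.group(0))
    return _VAR_RE.sub(_sub, text)
```

For example, `replace_vars("logs in @LOCALSTATEDIR@/log", {"LOCALSTATEDIR": "/var"})` yields `"logs in /var/log"`.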
-
- Apr 06, 2009
-
-
Iustin Pop authored
This patch raises an error in the master daemon in case the user requests a locking query; accordingly, all clients were modified to send only lockless queries. This is a short-term fix; for a proper fix, the clients should be modified to submit a job when the user requests a locking query. The other approach would be to ignore the flag passed by the client; this would be worse, as clients wouldn't even get an error. The possible impact of this is multiple:
- some commands might not have been converted, and thus fail; this can be remedied easily
- the consistency of commands is lost; e.g. node failover will not lock the node *while we get the node info*, so we could miss some data; this is again in the thread of atomic operations, which are missing in the current model of query-and-act from gnt-* scripts
Reviewed-by: imsnah, ultrotter
-
Iustin Pop authored
Currently the watcher spews error messages on non-master nodes. This cleans it up. Reviewed-by: imsnah
-
Iustin Pop authored
As per the mailing list discussion, this patch changes the watcher to use a single job (two opcodes) for getting the cluster state (node list and instance list); it will then compute the needed actions based on this data. The patch also archives this job and the verify-disks job. Reviewed-by: imsnah
-
Iustin Pop authored
This patch fixes the Xen soft reboot ("xm reboot") by polling for a specific time for either a changed domain ID or a decreased CPU run-time. This should prevent the race conditions discussed on the mailing list for reboots. Reviewed-by: imsnah
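The polling logic can be sketched as below. This is an illustrative sketch under stated assumptions, not the hypervisor code: `get_state` is a hypothetical callable returning `(domain_id, cpu_time)` or None while the domain is not visible, and the timeout and interval defaults are made up.

```python
import time


def wait_for_reboot(get_state, old_domid, old_cpu_time,
                    timeout=60.0, interval=1.0,
                    sleep=time.sleep, clock=time.time):
    """Poll until the domain ID changes or the CPU run-time decreases.

    Returns True if a reboot was observed before the timeout, else False.
    """
    deadline = clock() + timeout
    while True:
        state = get_state()
        if state is not None:
            domid, cpu_time = state
            # A new domain ID or a smaller CPU time means the domain restarted.
            if domid != old_domid or cpu_time < old_cpu_time:
                return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real delays.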
-
Iustin Pop authored
Since the cluster tags are/should be more-or-less static, add them as an ssconf key, so that querying them is possible without creating a job/requiring the masterd to be running. Reviewed-by: imsnah
-
Iustin Pop authored
This patch will log data about queries, which are today completely invisible (at the default log level) in the master log file. Reviewed-by: imsnah
-
- Mar 27, 2009
-
-
Iustin Pop authored
This updates the NEWS file and bumps up the version number. Reviewed-by: ultrotter
-
- Mar 20, 2009
-
-
Guido Trotter authored
Allow expressions longer than one character to match. Reviewed-by: imsnah
-
Guido Trotter authored
Set timeout_needs_update to False after calculating the timeout. Reviewed-by: imsnah
-
Guido Trotter authored
# gnt-cluster queue foo
Failure: prerequisites not met for this operation: Command 'foo' is not valid.
Reviewed-by: iustinp
-
- Mar 12, 2009
-
-
Guido Trotter authored
There is a bug in kvm: when binding vnc to a specific address, the constant 'vnc_bind_address' is passed in instead of the actual requested address. This patch fixes it. Reviewed-by: iustinp
-
Iustin Pop authored
This patch adds the newly-introduced node flags to the design document, as they currently are missing from there. The patch also reduces the TOC depth to 3, as it was too big. Reviewed-by: ultrotter
-
Iustin Pop authored
Similar to the --disk fixes a while ago, --net is broken too. This patch fixes it. Reviewed-by: imsnah
-
- Mar 10, 2009
-
-
Guido Trotter authored
s/"vnc_bind_address"/constants.HV_VNC_BIND_ADDRESS/ Reviewed-by: imsnah
-
- Mar 09, 2009
-
-
Iustin Pop authored
Currently, the watcher startup sequence does:
- open a luxi client
- get the instance list
- get the node boot ids
- open and lock the status file, and:
  - archive jobs
  - restart the down instances
  - check disks
This, of course, can lead to problems when a node is (genuinely or not) locked for more than (watcher interval * maximum query clients) time. At that time, the master is completely unresponsive until the node is unlocked and all the watchers exit with error due to the state file being locked by the first instance. This patch reworks the startup sequence to first open/lock the status file, and only then open a luxi client. This should prevent the above case.
Reviewed-by: ultrotter
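The reordered startup ("lock first, then talk to the master") can be sketched as follows. This is a minimal illustration using `fcntl.flock`; the function names and the `open_client` callable are hypothetical, and the real watcher's locking helpers are not shown in the commit message.

```python
import fcntl
import os


def lock_status_file(path):
    """Open path and take an exclusive, non-blocking lock.

    Returns the file descriptor, or None if another process holds the lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except (IOError, OSError):
        os.close(fd)
        return None
    return fd


def run_watcher(path, open_client):
    """Lock the status file first; only then open a client to the master."""
    fd = lock_status_file(path)
    if fd is None:
        # Another watcher is active: exit without contacting the master.
        return False
    try:
        open_client()
        # ... query the cluster state and act on it ...
        return True
    finally:
        os.close(fd)  # closing the descriptor releases the lock
```

Because a second watcher fails at the lock step, it never opens a client, so stuck watchers cannot pile up and exhaust the master's query clients.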
-
Iustin Pop authored
Currently cluster-verify doesn't handle the (admittedly invalid) case where we have reservations for instances that were removed in the meantime. This patch adds a check for this and prevents code errors in cluster-verify in this case:
* Verifying node node4.example.com (master candidate)
  - ERROR: ghost instance 'instance3.example.com' in temporary DRBD map
Reviewed-by: imsnah
-
Iustin Pop authored
Currently the _CreateSingleBlockDev function only raises OpExecError and not BlockDeviceError. This means that we don't release the instance's temporary minors properly, and this creates problems later if the instance is removed without master restart. We could just use OpExecError, but adding it and leaving BlockDeviceError in seems safer. Reviewed-by: imsnah
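The "catch both" approach mentioned above looks roughly like this. The two exception classes here are local stand-ins for the Ganeti ones, and `create_disks`, `create_fn` and `release_minors` are hypothetical names used only to show the cleanup pattern.

```python
class OpExecError(Exception):
    """Stand-in for the Ganeti exception of the same name."""


class BlockDeviceError(Exception):
    """Stand-in for the Ganeti exception of the same name."""


def create_disks(create_fn, release_minors):
    """Run create_fn; on either error type, release the minors and re-raise."""
    try:
        create_fn()
    except (OpExecError, BlockDeviceError):
        # Release temporary reservations whichever error was raised.
        release_minors()
        raise
```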
-
- Mar 06, 2009
-
-
Iustin Pop authored
The instance objects did not get a serial_no field. This patch adds a new constant for the field name and uses it for all three cases (cluster, nodes, instances). Reviewed-by: imsnah
-
- Mar 05, 2009
-
-
Guido Trotter authored
Now it displays:
--hypervisor-parameters hypervisor:hv-param=value [ ,hv-param=value ... ]
--backend-parameters be-param=value [ ,be-param=value ... ]
Sorry for the super-long lines :( Is there a better way to insert spaces without pushing them to the resulting man page?
Reviewed-by: iustinp
-
- Mar 04, 2009
-
-
Iustin Pop authored
This patch makes the cfgupgrade script handle:
- instance changes
- disk changes
- further cluster fixes
It also adds configuration checks at the end, in non-dry-run mode.
Reviewed-by: ultrotter
-
Iustin Pop authored
This patch makes cfgupgrade work on an empty cluster (i.e. no instances), to the point that the config file can be converted from 1.2 to 2.0. This is not yet complete, though. Reviewed-by: ultrotter
-
Iustin Pop authored
“copyfile” takes a file argument, so we enable file-completion for it. “gnt-cluster command” takes a command, so we enable command completion. Reviewed-by: imsnah
-