Commits · 8eb148ae5f1189e390ea7bb1ff50de07bd8afc10 · itminedu / snf-ganeti

May 04, 2009

Fix gnt-cluster getmaster on non-master nodes · 8eb148ae

Iustin Pop authored 16 years ago


The current implementation of “gnt-cluster getmaster” doesn't work on
non-master nodes, which is a regression from 1.2. This patch implements
it (again) via ssconf.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Alexander Schreiber <als@google.com>

8eb148ae

Apr 27, 2009

Release 2.0rc4 · d1908b41

Iustin Pop authored 16 years ago

Reviewed-by: ultrotter

v2.0.0rc4

d1908b41

Apr 24, 2009

Update gnt-instance(8) for info · d09ebf6f

Guido Trotter authored 16 years ago

Add the --all argument, and reword the basic information a bit.

Reviewed-by: iustinp

d09ebf6f

gnt-instance info --all · 220cde0b

Guido Trotter authored 16 years ago

Don't show info for all instances by default; require --all to be passed
for this time-consuming operation.

Reviewed-by: iustinp

220cde0b

LUDiagnoseOS: change locking and error handling · a6ab004b

Iustin Pop authored 16 years ago

Since the “list OSes” call is exported via RAPI, this can be used pretty
easily to DoS the master daemon during long jobs.

The implementation of LUDiagnoseOS makes an RPC call to all nodes; we
lock nodes here in order to prevent node removal.

However, after closer examination, the worst case is:
  - we get the list of nodes from the config
  - another thread removes a node
  - our RPC queries reach the removed node

At this point, if ganeti-noded is stopped or doesn't accept our queries,
the RPC call will return failed, and in the current implementation all
OSes will become invalid.

If we change the ‘failed RPC’ handling to ignore such nodes, this allows
us to both remove locking, and to handle transient RPC failures better
(not invalidating all OSes).

This patch does both these things, with a single drawback: in gnt-os
diagnose, the down nodes do not appear at all. I think this is a small
drawback, and the alternative is to add them with status failed; this
works (3-line patch), but then the output of “list” and “diagnose” will
no longer be consistent. As such, my proposal is to not list the nodes.

Reviewed-by: ultrotter

a6ab004b
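
A minimal sketch of the "ignore nodes whose RPC failed" handling described in a6ab004b. The function name and the shape of the per-node results are assumptions made for illustration, not the actual LUDiagnoseOS code:

```python
def merge_os_results(node_results):
    """Map OS name -> {node: status}, skipping nodes whose RPC failed.

    node_results maps a node name to a list of (os_name, status) tuples,
    or to None when the RPC call to that node failed (hypothetical shape).
    """
    all_os = {}
    for node, result in node_results.items():
        if result is None:
            # Failed RPC: leave the node out instead of invalidating every OS.
            continue
        for os_name, status in result:
            all_os.setdefault(os_name, {})[node] = status
    return all_os


results = {
    "node1": [("debian", "valid")],
    "node2": None,  # removed or unreachable node
    "node3": [("debian", "valid")],
}
merged = merge_os_results(results)
```

With this approach a down node simply does not appear in the merged data, which matches the drawback noted in the commit message.
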

Fix verify-disks with broken volume groups · ea9ddc07

Iustin Pop authored 16 years ago

When a remote node returns invalid LVM data, we detect it, but instead
of stopping we continue with the rest of the checks (which require a
valid volume group). This raises an internal error and breaks verify
disks.

This code seems to have been unchanged for a long while; I don't know
why the problem surfaced only recently.

Reviewed-by: ultrotter

ea9ddc07

Prevent errors when xenvg is broken cluster verify · 9a198532

Iustin Pop authored 16 years ago

When vg_name is not returned at all, we currently abort with an internal
error. This is because we don't catch KeyError.

This patch adds a custom message for this case, and also adds KeyError
to the list of caught exceptions, just for safety.

On the other hand, we could also just remove this piece of code, since
the ["dfree"] value is not used at all.

Reviewed-by: ultrotter

9a198532
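
A minimal sketch of the kind of defensive lookup 9a198532 describes. Only the "dfree" key comes from the commit message; the function name, return shape and messages are hypothetical:

```python
def free_vg_space(node_info):
    """Return (free_space, error_message) from a node's reported data."""
    try:
        return int(node_info["dfree"]), None
    except KeyError:
        # The node returned no volume group data at all.
        return None, "node did not return volume group data"
    except (ValueError, TypeError):
        # The key is present but the value is not a usable number.
        return None, "node returned invalid volume group data"
```

Catching KeyError separately allows a specific message for the missing-data case instead of an internal error.
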

Apr 15, 2009

A bunch of doc and other small fixes · 949bdabe

Iustin Pop authored 16 years ago

This patch fixes a number of issues reported both externally and
internally:
  - missing SGML tags (Issue 54), report and patch by superdupont
  - wrong variable used in the init.d script, report and patch by
    Karsten Keil <karsten-keil@t-online.de>
  - man page for gnt-instance reinstall needs clarification (Issue 56)
  - gnt-instance man page missing --disks documentation for
    replace-disks
  - gnt-node modify help output is unclear about the -C/-D/-O input
    format, and the man page doesn't document this command at all
  - “gnt-node modify -C yes” for offline or drained nodes had wrong
    error message
  - “gnt-instance reinstall --select-os” has wrong prompt, we only
    accept a number for the OS and not the template name

Reviewed-by: ultrotter

949bdabe

Apr 14, 2009

Trivial typo fix in error message · 8c7aaa72

Alexander Schreiber authored 16 years ago

Reviewed-by: iustinp

8c7aaa72

Apr 08, 2009

Release 2.0rc3 · 5bbefdec

Iustin Pop authored 16 years ago

Burnin tests were successful, release rc3.

Reviewed-by: imsnah

v2.0.0rc3

5bbefdec

Apr 07, 2009

Distribute built documentation · 2ab2b9f5

Iustin Pop authored 16 years ago

This patch changes the way documentation is built in order to distribute
the generated output in the 'dist' archive, so that the docbook/rst
toolchains are no longer required at build time. This lowers the
installation requirements and also makes the build time insignificant.

First, we remove the docbook2pdf rules and variables, since we no longer
build this kind of docs. Furthermore, the rst source files are not
(today) processed via replace_vars_sed, so the whole .in rules for doc/
go away.

Next, we change the ".sgml|.rst -> replace_vars_sed -> .in -> processor
-> final file" processing to ".sgml|.rst -> generator -> .in ->
replace_vars_sed -> final file"; this means we first process the file
using the formatter, with the @VARIABLE@ entries in it, and save the
output as .in; this output we distribute, and on the user side, the
replace_vars_sed will use the new configure flags to transform the
(almost final .in form) to the final form, without needing the
toolchain.

In configure.ac we also change from ERROR to WARN for the documentation
generators, and extra tests in Makefile.am check that the programs have
been found.

This was tested with distcheck and works as expected.

Reviewed-by: ultrotter

2ab2b9f5
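
The final step of the new pipeline in 2ab2b9f5 (substituting @VARIABLE@ entries in the already-generated .in file) can be illustrated with a small Python stand-in. The real build uses a sed script (replace_vars_sed); the function name and the variable names below are assumptions:

```python
import re

# Matches placeholders such as @SYSCONFDIR@.
_VAR_RE = re.compile(r"@([A-Z][A-Z0-9_]*)@")


def replace_vars(text, variables):
    """Replace known @NAME@ placeholders; leave unknown ones untouched."""
    def substitute(match):
        return variables.get(match.group(1), match.group(0))
    return _VAR_RE.sub(substitute, text)


generated = "Config lives in @SYSCONFDIR@/ganeti (built by @UNKNOWN@)."
final = replace_vars(generated, {"SYSCONFDIR": "/etc"})
```

Because the substitution runs on the formatter's output, the user side needs only this step and not the documentation toolchain.
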

Apr 06, 2009

Disable synchronous (locking) queries · 77921a95

Iustin Pop authored 16 years ago

This patch raises an error in the master daemon in case the user
requests a locking query; accordingly, all clients were modified to send
only lockless queries. This is a short-term fix; for a proper fix, the
clients should be modified to submit a job when the user requests a
locking query.

The other approach would be to ignore the flag passed by the client;
this would be worse, as clients wouldn't even get an error.

The possible impact of this is twofold:
  - some commands may not have been converted, and thus fail; this
    can be remedied easily
  - the consistency of commands is lost; e.g. node failover will not
    lock the node *while we get the node info*, so we could miss some
    data; this is again in the thread of atomic operations which are
    missing in the current model of query-and-act from gnt-* scripts

Reviewed-by: imsnah, ultrotter

77921a95

Fix the output of watcher on non-master nodes · 2c404217

Iustin Pop authored 16 years ago

Currently the watcher spews error messages on non-master nodes. This
cleans it up.

Reviewed-by: imsnah

2c404217

Change the watcher to use jobs instead of queries · 6dfcc47b

Iustin Pop authored 16 years ago

As per the mailing list discussion, this patch changes the watcher to
use a single job (two opcodes) for getting the cluster state (node list
and instance list); it will then compute the needed actions based on
this data.

The patch also archives this job and the verify-disks job.

Reviewed-by: imsnah

6dfcc47b

Fix Xen soft reboot via polling · 7dd106d3

Iustin Pop authored 16 years ago

This patch fixes the Xen soft reboot ("xm reboot") via polling for a specific
time for either changed domain ID or decreased CPU run-time.

This should prevent the race conditions discussed on the mailing list for
reboots.

Reviewed-by: imsnah

7dd106d3
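
A rough sketch of the polling idea in 7dd106d3: wait for a limited time until the domain ID has changed or the CPU run-time has decreased. The function, its parameters and the callback returning (domain_id, cpu_time) are all hypothetical, not the hypervisor code itself:

```python
import time


def wait_for_reboot(get_domain_info, old_id, old_cpu_time,
                    retries=15, delay=1.0, sleep=time.sleep):
    """Poll until the domain looks rebooted; return True on success.

    get_domain_info() is assumed to return (domain_id, cpu_time), or None
    while the domain is not visible.
    """
    for _ in range(retries):
        info = get_domain_info()
        if info is not None:
            dom_id, cpu_time = info
            # A new domain ID or a smaller CPU run-time means a fresh boot.
            if dom_id != old_id or cpu_time < old_cpu_time:
                return True
        sleep(delay)
    return False
```

The sleep parameter exists only so the loop can be exercised without real delays.
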

Add a new ssconf file with the cluster tags · 5d60b3bd

Iustin Pop authored 16 years ago

Since the cluster tags are/should be more-or-less static, add them as an
ssconf key, so that querying them is possible without creating a
job/requiring the masterd to be running.

Reviewed-by: imsnah

5d60b3bd

Add some more debugging info to masterd · e566ddbd

Iustin Pop authored 16 years ago

This patch will log data about queries, which are today completely
invisible (at the default log level) in the master log file.

Reviewed-by: imsnah

e566ddbd

Mar 27, 2009

Release 2.0rc2 · f06d91f2

Iustin Pop authored 16 years ago

This updates the NEWS file and bumps up the version number.

Reviewed-by: ultrotter

f06d91f2

Mar 20, 2009

Fix _NOQUOTE regexp · 8a088b79

Guido Trotter authored 16 years ago

Allow expressions longer than one character to match.

Reviewed-by: imsnah

8a088b79

Mainloop: avoid calculating timeout every time · 53d47a06

Guido Trotter authored 16 years ago

Set timeout_needs_update to False after calculating the timeout.

Reviewed-by: imsnah

53d47a06

Raise on invalid gnt-cluster queue commands · 2e668b38

Guido Trotter authored 16 years ago

 # gnt-cluster queue foo
 Failure: prerequisites not met for this operation:
 Command 'foo' is not valid.

Reviewed-by: iustinp

2e668b38

Mar 12, 2009

kvm: use the correct vnc bind address · 19498d6c

Guido Trotter authored 16 years ago

There is a bug in kvm: when binding vnc to a specific address, the
constant 'vnc_bind_address' is passed in instead of the actual
requested address. This patch fixes it.

Reviewed-by: iustinp

19498d6c

Add the 2.0-specific node flags to the design doc · e0eb13de

Iustin Pop authored 16 years ago

This patch adds the newly-introduced node flags to the design document,
as they currently are missing from there.

The patch also reduces the TOC depth to 3, as it was too big.

Reviewed-by: ultrotter

e0eb13de

Fix the --net option to gnt-instance add · dc30b0e4

Iustin Pop authored 16 years ago

Similar to the --disk fixes a while ago, --net is broken too. This patch
fixes it.

Reviewed-by: imsnah

dc30b0e4

Mar 10, 2009

Xen: Remove one hardcoded constant · 6b405598

Guido Trotter authored 16 years ago

s/"vnc_bind_address"/constants.HV_VNC_BIND_ADDRESS/

Reviewed-by: imsnah

6b405598

Mar 09, 2009

watcher: fix startup sequence locking the master · cc962d58

Iustin Pop authored 16 years ago

Currently, the watcher startup sequence does:
  - open a luxi client
  - get the instance list
  - get the node boot ids
  - open and lock the status file, and:
    - archive jobs
    - restart the down instances
    - check disks

This, of course, can lead to problems when a node is (genuinely or not)
locked for more than (watcher interval * maximum query clients) time. At
that point, the master is completely unresponsive until the node is
unlocked and all the watchers exit with an error due to the state file
being locked by the first instance.

This patch reworks the startup sequence to first open/lock the status
file, and only then open a luxi client. This should prevent the above
case.

Reviewed-by: ultrotter

cc962d58
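
The reordered startup in cc962d58 (lock the status file first, open the client only afterwards) could look roughly like this. The function and callback names are hypothetical, and flock-based locking is an assumption about the mechanism:

```python
import fcntl
import os


def run_watcher(state_path, open_client, do_work):
    """Lock the status file before opening a client.

    Returns False without contacting the master if another watcher
    already holds the lock.
    """
    fd = os.open(state_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            # Another instance is running: exit before opening a client.
            return False
        client = open_client()
        do_work(client)
        return True
    finally:
        # Closing the descriptor also releases the lock.
        os.close(fd)
```

A second watcher started while the first is blocked then exits immediately instead of queueing another query on the master.
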

Handle ghost instances in temp DRBD map · c614e5fb

Iustin Pop authored 16 years ago

Currently cluster-verify doesn't handle the (admittedly invalid) case where we
have reservations for instances that were removed in the meantime.

This patch adds a check for this and prevents code errors in cluster-verify in
this case:
 * Verifying node node4.example.com (master candidate)
   - ERROR: ghost instance \'instance3.example.com\' in temporary DRBD map

Reviewed-by: imsnah

c614e5fb

Fix error handling in replace-disks with new node · 82759cb1

Iustin Pop authored 16 years ago

Currently the _CreateSingleBlockDev function only raises OpExecError and not
BlockDeviceError. This means that we don't release the instance's temporary
minors properly, and this creates problems later if the instance is removed
without master restart.

We could just use OpExecError, but adding it and leaving
BlockDeviceError in seems safer.

Reviewed-by: imsnah

82759cb1
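
A minimal sketch of the pattern in 82759cb1: catch both exception types so the temporary minors are always released. The exception classes here are stand-ins defined for the example, and the helper names are hypothetical:

```python
class OpExecError(Exception):
    """Stand-in for the exception of the same name mentioned above."""


class BlockDeviceError(Exception):
    """Stand-in for the exception of the same name mentioned above."""


def create_device(create, release_minors):
    """Run create(); on either error, release the minors and re-raise."""
    try:
        return create()
    except (OpExecError, BlockDeviceError):
        release_minors()
        raise
```

Keeping both types in the except clause costs nothing and covers callers that may still raise the older exception.
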

Mar 06, 2009

Fix serial_no field on instances · 6f285030

Iustin Pop authored 16 years ago

The instance objects did not get a serial_no field. This patch adds a
new constant for the field name and uses it for all three cases
(cluster, nodes, instances).

Reviewed-by: imsnah

6f285030

Mar 05, 2009

Update gnt-cluster(8) for be/hyp parameter syntax · 555918b3

Guido Trotter authored 16 years ago

Now it displays:

--hypervisor-parameters hypervisor:hv-param=value [ ,hv-param=value ... ]
--backend-parameters be-param=value [ ,be-param=value ... ]

Sorry for the super-long lines :( Is there a better way to insert spaces
without pushing them to the resulting man page?

Reviewed-by: iustinp

555918b3

Mar 04, 2009

Complete the cfgupgrade script for 2.0 migrations · ac4d25b6

Iustin Pop authored 16 years ago

This patch makes the cfgupgrade script handle:
  - instance changes
  - disk changes
  - further cluster fixes
  - configuration checks at the end, in non-dry-run mode

Reviewed-by: ultrotter

ac4d25b6

First run at cfgupgrade for 2.0 upgrades · a421fdeb

Iustin Pop authored 16 years ago

This patch makes cfgupgrade work on an empty cluster (i.e. no instances),
to the point that the config file can be converted from 1.2 to 2.0.
This is not yet complete, though.

Reviewed-by: ultrotter

a421fdeb

Fix bash completion for cluster copyfile/command · 75615bd3

Iustin Pop authored 16 years ago

“copyfile” takes a file argument, so we enable file-completion for it.
“gnt-cluster command” takes a command, so we enable command completion.

Reviewed-by: imsnah

75615bd3

Mar 02, 2009

Release 2.0rc1 · a2370b24

Iustin Pop authored 16 years ago

This patch updates the NEWS file and increases the version to 2.0 rc1.

Reviewed-by: ultrotter

a2370b24

Export tags to cluster verify hooks · 35e994e9

Iustin Pop authored 16 years ago

This patch exports the cluster and node tags to the cluster verify hook
scripts. The tags are exported as a space-separated list, which allows
easy parsing from the shell (e.g. “for tag in $GANETI_CLUSTER_TAGS; do
...”) and therefore requires the previous “Don't allow spaces in tag
names” patch.

The patch also fixes a minor line length style problem.

Reviewed-by: ultrotter

35e994e9
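
As an illustration of the space-separated export in 35e994e9: only the GANETI_CLUSTER_TAGS name comes from the commit message; the helper name and the sorting are assumptions:

```python
def tags_to_env(cluster_tags):
    """Build the hook environment entry for the cluster tags."""
    # Sorting is an assumption, used here only to make the output stable.
    return {"GANETI_CLUSTER_TAGS": " ".join(sorted(cluster_tags))}


env = tags_to_env({"prod", "rack_a"})
# A hook can recover the individual tags by splitting on whitespace.
tags = env["GANETI_CLUSTER_TAGS"].split()
```

Splitting on whitespace only round-trips correctly if no tag contains a space, which is why this change depends on the tag-name restriction.
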

Don't allow spaces in tag names · 28ab6fed

Iustin Pop authored 16 years ago

This patch restricts the use of spaces in tags, as this does not allow
nice exporting of tags to environment in hooks. One can use underscores
or dashes instead of spaces.

Reviewed-by: schreiberal

28ab6fed
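
A minimal sketch of a check in the spirit of 28ab6fed. The function name, the error type and the exact rule (reject any whitespace) are assumptions, not the actual validation code:

```python
def check_tag(tag):
    """Return the tag unchanged, or raise ValueError if it is unusable."""
    if not tag:
        raise ValueError("empty tag")
    if any(ch.isspace() for ch in tag):
        raise ValueError("tag %r contains whitespace" % tag)
    return tag
```

Tags such as "web_frontend" or "web-frontend" pass this check, while "web frontend" is rejected.
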

Update the iallocator documentation · 77031881

Iustin Pop authored 16 years ago

This updates the iallocator documentation to 2.0, bumps up the
iallocator version (and moves a constant to lib/constants.py), and
fixes a style issue in install.rst.

Reviewed-by: ultrotter

77031881

Fix a bug in utils.EnsureDirs · 1b2c8f85

Iustin Pop authored 16 years ago

This fixes a bug introduced in rev 2562 and also fixes the indentation.

Reviewed-by: ultrotter

1b2c8f85

A doc update and a small indentation fix · b806661b

Iustin Pop authored 16 years ago

This adds a small paragraph about the “master” role of a node, and fixes
a wrong indentation in the bash completion file.

Reviewed-by: imsnah

b806661b

Feb 27, 2009

Use EnsureDirs in KVM as well. · 9afb67fe

Guido Trotter authored 16 years ago

The KVM hypervisor also has code to ensure that a list of directories
exists. Replace it with our new utils function.

Reviewed-by: iustinp

9afb67fe
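
For readers unfamiliar with the helper, a function that ensures a list of directories exists might look like the sketch below. This is an illustration under assumed names and an assumed (path, mode) argument shape, not the actual utils.EnsureDirs implementation:

```python
import os


def ensure_dirs(dirs):
    """Create each (path, mode) directory unless it already exists."""
    for path, mode in dirs:
        try:
            os.mkdir(path, mode)
        except FileExistsError:
            # An existing directory is fine; any other kind of file is not.
            if not os.path.isdir(path):
                raise
```

Calling it twice with the same list is harmless, which is what makes it suitable for daemon or hypervisor startup code.
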