Commits · 4a8b186acb4e26ee6ea0fa4246d708411e4a772a · itminedu / snf-ganeti

Oct 01, 2008

Move functions from ssconf.py elsewhere · 4a8b186a

Michael Hanselmann authored 16 years ago

These functions will be used to access config values instead of using
ssconf.

Reviewed-by: iustinp

4a8b186a

Add simple configuration reader/writer classes · 856c67e1

Michael Hanselmann authored 16 years ago

This will be used to read the configuration file in the node daemon.
The write functionality is needed for master failover.

Reviewed-by: iustinp

856c67e1

Fix the watcher with down nodes · 37b77b18

Iustin Pop authored 16 years ago

The watcher didn't handle down nodes; fix this by ignoring (in
secondary node reboot checks) any node that doesn't return a boot id.

Reviewed-by: imsnah

37b77b18

Fix the watcher not restarting instance bug · b7309a0d

Iustin Pop authored 16 years ago

The watcher was using conflicting attributes of the instance:
  - it queried the admin_/oper_state, which are booleans
  - but it compared those to the status (which is a text field)

The code was changed to query the aggregated 'status' field, as that
also reports node problems, so this single field can be used for all
decisions. We still ask for the admin_state field, as that is needed
for the activate disks check (in secondary node restart).

The patch also touches the watcher in some other parts:
  - log exceptions more nicely
  - convert a method to @staticmethod
  - remove unused imports

Reviewed-by: imsnah

b7309a0d

Remove last use of utils.RunCmd from the watcher · 5188ab37

Iustin Pop authored 16 years ago

The watcher has one last use of ganeti commands as opposed to sending
requests via luxi. The patch changes this to use the cli functions.

The patch also has two other changes:
  - fix the docstring for OpVerifyDisks (found out while converting
    this)
  - enable stderr logging on the watcher when “-d” is passed

Reviewed-by: imsnah

5188ab37

Fix unittests broken by revision 1727 · 36b8c2c1

Michael Hanselmann authored 16 years ago

Reviewed-by: iustinp

36b8c2c1

Add cluster options from ssconf to configuration · f6bd6e98

Michael Hanselmann authored 16 years ago

ssconf will become write-only from ganeti-masterd's point of view,
therefore all settings in there need to go into the main configuration
file.

Reviewed-by: iustinp

f6bd6e98

Move instantiation of config into bootstrap.py · b9eeeb02

Michael Hanselmann authored 16 years ago

Future patches will add even more variables to the cluster config.
Adding more parameters wouldn't make the function easier to use and
it doesn't make sense to pass them to another function, as it's
only done once in bootstrap.py on cluster initialization.

Reviewed-by: iustinp

b9eeeb02

Change the results from cli.PollJob · 53c04d04

Iustin Pop authored 16 years ago

Currently PollJob accepts a generic job, but (as a historical artifact)
returns only the first opcode result. This is wrong, as it doesn't
allow polling of a job with multiple results.

Its only caller (for now) is also changed, so no functional changes
should happen.

Reviewed-by: ultrotter, amishchenko

53c04d04

Sep 30, 2008

Add list of build dependencies · b2fc7ea1

Michael Hanselmann authored 16 years ago

Reviewed-by: iustinp

b2fc7ea1

Build HTML from RST input · f05c99f3

Michael Hanselmann authored 16 years ago

This patch also adds the design documents to Makefile.am.

Reviewed-by: iustinp

f05c99f3

Fix ‘gnt-job info’ with no arguments · b27b39b0

Iustin Pop authored 16 years ago

I didn't realize that my zip will break when no args are passed...

Reviewed-by: imsnah

b27b39b0

Add output of job/opcode timestamps · aad81f98

Iustin Pop authored 16 years ago

This patch adds the possibility of selecting job/opcode timestamps in
gnt-job list and info.

The code handling the possible cases (None or a valid timestamp) is
ugly though...

Reviewed-by: imsnah

aad81f98

Enhance the job-related timestamps · c56ec146

Iustin Pop authored 16 years ago

This patch adds start, stop, and received timestamp for jobs (and allows
querying of them), and allows querying of the opcode timestamps.

Reviewed-by: imsnah

c56ec146

Small fixes for master daemon design document · efd0d44f

Michael Hanselmann authored 16 years ago

It said CLI/RAPI will talk to master using HTTP, which isn't true. Add
a reference to the job queue design document. Fix small typos.

Reviewed-by: iustinp

efd0d44f

Import design doc for commandline arguments · b6c64863

Alexander Schreiber authored 16 years ago

Reviewed-by: iustinp

b6c64863

locking design: code path and declarations · 4e8d0685

Guido Trotter authored 16 years ago

Reviewed-by: iustinp

4e8d0685

locking design: explain use of async mode · 6e4f6dfa

Guido Trotter authored 16 years ago

Previously we discussed this possible future feature and its
drawbacks, but not its usefulness. This patch corrects this.

Reviewed-by: iustinp

6e4f6dfa

locking design: talk about removing locks · 164a5bcb

Guido Trotter authored 16 years ago

Reviewed-by: iustinp

164a5bcb

Import (and update) granular locking design doc · 040408a3

Guido Trotter authored 16 years ago

Reviewed-by: iustinp

040408a3

Abstract the timestamp formatting into cli.py · 3386e7a9

Iustin Pop authored 16 years ago

Currently we format the timestamp inside the gnt-job info function. We
will need this in more places in the future, so move it to cli.py as a
separate, exported function.
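
A minimal sketch of what such a shared helper might look like. The
function name `format_timestamp`, the (seconds, microseconds) input
shape and the "?" fallback are assumptions for illustration and are not
taken from the patch.

```python
import time


def format_timestamp(ts):
    """Format a (seconds, microseconds) pair as a readable local time.

    Hypothetical helper: input shape and fallback value are assumptions.
    """
    # Anything that is not a two-element pair is shown as unknown.
    if not isinstance(ts, (tuple, list)) or len(ts) != 2:
        return "?"
    sec, usec = ts
    base = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(sec))
    return base + ".%06d" % usec
```

Keeping the fallback inside the helper means callers such as a job
listing do not have to special-case missing timestamps themselves.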

Reviewed-by: imsnah

3386e7a9

Sep 29, 2008

Add job queue design document · b2cee5e5

Michael Hanselmann authored 16 years ago

Reviewed-by: iustinp

b2cee5e5

Add an 'index' of design documents · 84f4dc28

Iustin Pop authored 16 years ago

This will be an overview document, enumerating the changes without going
into details and pointing to the actual documents.

Reviewed-by: ultrotter

84f4dc28

Add opcode execution log in job info · 5b23c34c

Iustin Pop authored 16 years ago

This patch adds the job execution log in “gnt-job info” and also allows
its selection in “gnt-job list” (however here it's not very useful as
it's not easy to parse). It does this by adding a new field in the query
job call, named ‘oplog’.

With this, one can get a very clear examination of the job. What remains
to be added would be timestamps for start/stop of the processing for the
job itself and its opcodes.

Reviewed-by: imsnah

5b23c34c

Move a hardcoded constant to constants.py · 3c03759a

Iustin Pop authored 16 years ago

For now we only use the ‘C’ protocol so we can put it in constants.py
instead of hardcoding it.

Reviewed-by: imsnah

3c03759a

Enable the use of shared secrets · 2899d9de

Iustin Pop authored 16 years ago

This patch enables the use of the shared secrets for DRBD8 disks, using
(hardcoded in constants.py) the md5 digest algorithm.

To make this more flexible, we can either implement a cluster parameter
(once the new model is in place) or make it ./configure-time
selectable.

Reviewed-by: imsnah

2899d9de

Extend DRBD disks with shared secret attribute · f9518d38

Iustin Pop authored 16 years ago

This patch, which is similar to r1679 (Extend DRBD disks with minors
attribute), extends the logical and physical id of the DRBD disks with a
shared secret attribute. This is generated at disk creation time and
saved in the config file.

The generation of the secret is done so that we don't have duplicates in
the configuration (otherwise the goal of preventing cross-connection
will not be reached), so we add to config.py more than just a simple
call to utils.GenerateSecret().
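
The idea of generating a secret that does not collide with any already
stored in the configuration can be sketched roughly as follows. This is
an illustration only: `generate_unique_secret`, its parameters and the
use of Python's `secrets` module are assumptions, not the code added to
config.py.

```python
import secrets


def generate_unique_secret(existing, nbytes=20, max_tries=64):
    """Return a random hex secret not present in `existing`.

    `existing` is a set of secrets already in use (hypothetical stand-in
    for the secrets recorded in the configuration). The new secret is
    added to it so that later calls cannot return the same value.
    """
    for _ in range(max_tries):
        candidate = secrets.token_hex(nbytes)
        if candidate not in existing:
            existing.add(candidate)
            return candidate
    raise RuntimeError("could not generate a unique secret")
```

Recording the new secret immediately matters when several disks are
created before the configuration is written back.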

The patch does not yet enable the use of the secrets.

Reviewed-by: imsnah

f9518d38

Add an info subcommand to gnt-job · 191712c0

Iustin Pop authored 16 years ago

Currently, it is hard to examine a job in detail; the output of ‘gnt-job
list’ is not easy to parse.

The patch adds a ‘gnt-job info’ command that is (vaguely) similar to
‘gnt-instance info’ in that it shows in a somewhat easy to understand
format the details of a job.

The result formatter is the most complicated part, since the results are
not standardized; the code attempts to format nicely the most common
result types (as taken from a random job list), via a generic algorithm.

Reviewed-by: imsnah

191712c0

Implement job summary in gnt-job list · 60dd1473

Iustin Pop authored 16 years ago

It is not currently possible to show a summary of the job in the output
of “gnt-job list”. The closest is listing the whole opcode(s), but that
is too verbose. Also, the default output (id, status) is not very
useful, unless one looks for (and knows about) an exact job ID.

The patch adds a “summary” description of a job composed of the list of
OP_ID of the individual opcodes. Moreover, if an opcode has a ‘logical’
target in a certain opcode field (e.g. start instance has the instance
name as the target), then it is included in the formatting also. It's
easier to explain via a sample output:

gnt-job list
ID Status  Summary
1  error   NODE_QUERY
2  success NODE_ADD(gnta2)
3  success CLUSTER_QUERY
4  success NODE_REMOVE(gnta2.example.com)
5  error   NODE_QUERY
6  success NODE_ADD(gnta2)
7  success NODE_QUERY
8  success OS_DIAGNOSE
9  success INSTANCE_CREATE(instance1.example.com)
10 success INSTANCE_REMOVE(instance1.example.com)
11 error   INSTANCE_CREATE(instance1.example.com)
12 success INSTANCE_CREATE(instance1.example.com)
13 success INSTANCE_SHUTDOWN(instance1.example.com)
14 success INSTANCE_ACTIVATE_DISKS(instance1.example.com)
15 error   INSTANCE_CREATE(instance2.example.com)
16 error   INSTANCE_CREATE(instance2.example.com)
17 success INSTANCE_CREATE(instance2.example.com)
18 success INSTANCE_ACTIVATE_DISKS(instance1.example.com)
19 success INSTANCE_ACTIVATE_DISKS(instance2.example.com)
20 success INSTANCE_SHUTDOWN(instance1.example.com)
21 success INSTANCE_SHUTDOWN(instance2.example.com)

This is done by a simple change to the opcode classes, which allows an
opcode to format itself. The additional function is small enough that it
can go in opcodes.py, where it could also be used by a client if needed.
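
A rough sketch of an opcode that formats itself, matching the sample
output above. The class names, the `TARGET_FIELD` attribute and the
`summary`/`job_summary` functions are hypothetical names chosen for
illustration; the real opcode classes may differ.

```python
class OpCode:
    """Base opcode; subclasses set OP_ID and, optionally, TARGET_FIELD."""

    OP_ID = "ABSTRACT"
    # Hypothetical: name of the attribute holding the 'logical' target.
    TARGET_FIELD = None

    def __init__(self, **fields):
        for name, value in fields.items():
            setattr(self, name, value)

    def summary(self):
        """Return OP_ID, plus the target in parentheses when there is one."""
        target = None
        if self.TARGET_FIELD is not None:
            target = getattr(self, self.TARGET_FIELD, None)
        if target is None:
            return self.OP_ID
        return "%s(%s)" % (self.OP_ID, target)


class OpQueryNodes(OpCode):
    OP_ID = "NODE_QUERY"


class OpAddNode(OpCode):
    OP_ID = "NODE_ADD"
    TARGET_FIELD = "node_name"


def job_summary(opcodes):
    """Summarize a job as the list of its opcodes' summaries."""
    return [op.summary() for op in opcodes]
```

Because the formatting lives on the opcode class, a client holding the
opcodes can produce the same summary without asking the master.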

Reviewed-by: imsnah

60dd1473

Nicely sort the job list · 3b87986e

Iustin Pop authored 16 years ago

Unless we decide to change the job identifiers to integers, we should at
least sort the list returned by _GetJobIDsUnlocked.
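
To illustrate why plain string sorting is not enough for identifiers
such as "9" and "10", here is a small natural-sort sketch. The helper
names are hypothetical, and this is not necessarily the sorting routine
the patch uses.

```python
import re


def nice_sort_key(name):
    """Split a string into text and number parts for natural ordering.

    re.split with a capturing group alternates text and digit runs, so
    keys always compare text with text and numbers with numbers.
    """
    parts = re.split(r"(\d+)", name)
    return [int(part) if part.isdigit() else part for part in parts]


def nice_sort(names):
    """Return the names sorted so that "9" comes before "10"."""
    return sorted(names, key=nice_sort_key)
```

With plain `sorted()`, the string "10" would sort before "9"; the key
function above avoids that without converting the identifiers
themselves to integers.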

Reviewed-by: imsnah

3b87986e

Sep 28, 2008

Move the pseudo-secret generation to utils.py · 33081d90

Iustin Pop authored 16 years ago

The bootstrap code needs a pseudo-secret and this is currently generated
inside the InitGanetiServerSetup function. Since more users will need
this, move it to utils.py.

Reviewed-by: ultrotter

33081d90

Fix a bug related to static minors · d48663e4

Iustin Pop authored 16 years ago

When the node does not yet have any minors allocated, the first minor
(0) will not be entered in the ConfigWriter._temporary_drbds structure.
This does not happen for our current usage, since we always ask for two
minors (so the next call will not match this case), but it will be
triggered if we only ask for one minor, and then ask again before adding
the instance to the config file.

Reviewed-by: ultrotter

d48663e4

Sep 27, 2008

Add checks for tcp/udp port collisions · 48ce9fd9

Iustin Pop authored 16 years ago

In case the config file is manually modified, or in case of bugs, the
tcp/udp ports could be reused, which will create various problems
(instances not able to start, or drbd disks not able to communicate).

This patch extends the ConfigWriter.VerifyConfig() method (which is used
in cluster verify) to check for duplicates between:
  - the ports used for DRBD disks
  - the ports used for network console
  - the ports marked as free in the config file

Also, if the cluster parameter ‘highest_used_port’ is actually lower
than the computed highest used port, this is also flagged as an error.

The output from gnt-cluster verify will show (output manually wrapped):

node1 # gnt-cluster verify
* Verifying global settings
  - ERROR: tcp/udp port 11006 has duplicates: instance3.example.com/network port,
instance2.example.com/drbd disk sda
  - ERROR: tcp/udp port 11017 has duplicates: instance3.example.com/drbd disk sda,
instance3.example.com/drbd disk sdb, cluster/port marked as free
  - ERROR: Highest used port mismatch, saved 11010, computed 11017
* Gathering data (2 nodes)
...
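
The duplicate check described above can be sketched as follows. This is
an assumption-laden illustration: the function names and the list of
(port, owner) pairs are hypothetical and do not reflect how
ConfigWriter.VerifyConfig() is actually structured.

```python
from collections import defaultdict


def find_port_duplicates(port_users):
    """Map each port used more than once to the list of its users.

    `port_users` is a list of (port, owner description) pairs gathered
    from DRBD disks, network consoles and the free-port pool.
    """
    owners = defaultdict(list)
    for port, owner in port_users:
        owners[port].append(owner)
    return {port: names for port, names in owners.items() if len(names) > 1}


def check_highest_port(port_users, saved_highest):
    """Return an error string if the saved value is below the real maximum."""
    computed = max((port for port, _ in port_users), default=0)
    if saved_highest < computed:
        return ("Highest used port mismatch, saved %d, computed %d"
                % (saved_highest, computed))
    return None
```

Collecting every user of a port first, rather than stopping at the
first collision, is what lets the error message name all the
conflicting parties at once.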

Reviewed-by: ultrotter

48ce9fd9

Update the cluster serial_no on certain operations · b9f72b4e

Iustin Pop authored 16 years ago

This patch updates the cluster serial number on:
  - add/remove node (as the cluster's node list is changed)
  - add/remove/rename instance (as the cluster's instance list is changed)
  - change the volume group name

The rule is to update this attribute when cluster-wide properties are
changed, but not individual node/instance ones.

There are other remaining cases to handle, pending the ssconf
changes.

Reviewed-by: ultrotter

b9f72b4e

Allow listing of the serial_no via gnt-* list · 38d7239a

Iustin Pop authored 16 years ago

This patch adds listing of the serial_no attribute in gnt-instance and
gnt-node list, and updates the manpages to reflect the change.

Reviewed-by: ultrotter

38d7239a

Initialize and update the serial_no on objects · b989e85d

Iustin Pop authored 16 years ago

This patch adds initialization of the serial_no on instances and nodes,
and updates the field whenever an object is updated: in the generic
case via ConfigWriter.Update(obj), and in the specific case of an
instance's state being modified manually.

Reviewed-by: ultrotter

b989e85d

Switch the global serial_no to the top object · 9d38c6e1

Iustin Pop authored 16 years ago

Currently the serial_no that is incremented every time the configuration
file is written is located on the 'cluster' object in the configuration
structure. However, this is wrong as the cluster serial_no should be
incremented only when the cluster state is changed (for whatever
definition of “changed” we will use), not simply because the
configuration file is written.

This patch changes this so that ConfigWriter._BumpSerialNo affects the
top-level ConfigData object.
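
The separation between the two counters can be pictured with the
following simplified sketch. The classes here are stand-ins written for
illustration (only the names ConfigData and _BumpSerialNo come from the
message above); the real objects carry much more state.

```python
class Cluster:
    """Stand-in for the 'cluster' object; counts cluster state changes."""

    def __init__(self):
        self.serial_no = 1


class ConfigData:
    """Stand-in for the top-level object; counts configuration writes."""

    def __init__(self):
        self.serial_no = 1
        self.cluster = Cluster()


class ConfigWriter:
    def __init__(self, data):
        self._data = data

    def _BumpSerialNo(self):
        # Affects only the top-level object, not the cluster object.
        self._data.serial_no += 1

    def write(self):
        """Hypothetical write method; the actual file I/O is omitted."""
        self._BumpSerialNo()
```

After any number of writes the cluster's own serial_no is unchanged, so
it remains free to track cluster-level changes only.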

Reviewed-by: ultrotter

9d38c6e1

Add serial_no attributes to objects · be1fa613

Iustin Pop authored 16 years ago

This patch adds the ‘serial_no’ attribute to the other top-level objects
(the configuration object itself, the nodes and the instances).

Reviewed-by: ultrotter

be1fa613

Replace a cfg.AddInstance with UpdateInstance · 97abc79f

Iustin Pop authored 16 years ago

This seems to be the last (deprecated) use of AddInstance in order to
update an instance.

The patch also removes a whitespace-at-eol case.

Reviewed-by: ultrotter

97abc79f

Add design doc for the disk changes · fbd6f863

Iustin Pop authored 16 years ago

Reviewed-by: imsnah

fbd6f863