- Oct 08, 2008
-
Iustin Pop authored
Since in 2.0 the user will interact more with the hypervisor names, we sanitize them by removing the version numbers (the version can be a prerequisite for the Ganeti installation; we shouldn't encode it in variable names). Reviewed-by: schreiberal
-
Oleksiy Mishchenko authored
Reviewed-by: iustinp
-
Iustin Pop authored
This (big) patch moves the hypervisor type from the cluster to the instance level; the cluster attribute remains as the default hypervisor and will be renamed accordingly in a later patch. The cluster also gains the ‘enable_hypervisors’ attribute, and instances can be created with any of the enabled hypervisors (no provision yet for changing that attribute). The many changes in the rpc/backend layer are due to the fact that all backend code used to read the hypervisor from the local copy of the config, and now we have to send it (either in the instance object or as a separate parameter) for each function. The node list will by default show the node free/total memory for the default hypervisor; a new flag should be added to select another hypervisor. Instance list has a new field, hypervisor, that shows the instance's hypervisor. Cluster verify runs for all enabled hypervisor types. The new FIXMEs are related to IAllocator, since the node total/free/used memory counts are now wrong (we can't reliably compute the free memory). Reviewed-by: imsnah
-
- Oct 07, 2008
-
Iustin Pop authored
Currently the call_instance_migrate call only passes the instance name; we need to pass the whole object for the hypervisor_type changes (all the other individual instance rpc calls already pass the instance object). Reviewed-by: imsnah
-
Iustin Pop authored
Background: when we have multiple jobs in the queue (more than just a few), many of them (up to the number of threads) will be in state 'running', although many could actually be blocked, waiting for some locks. This is not good, as one cannot easily see what is happening. The patch extends the possible opcode/job statuses with a new one, 'waiting', which shows that the LU is in the lock-acquisition phase. The mechanism for doing so is simple: we initialize (in the job queue) the opcode with OP_STATUS_WAITLOCK, and when the processor is ready to give control to the LU's Exec, it calls a notifier back into the _JobQueueWorker that sets the opcode status to OP_STATUS_RUNNING (with the proper queue locking). Because this mechanism does not save the job, all opcodes on disk will be in status WAITLOCK instead of RUNNING, so we also change the load sequence to consider WAITLOCK as RUNNING. With the patch applied, creating five instances in parallel (via burnin) on a five-node cluster shows only two executing while three wait for locks. Reviewed-by: imsnah
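A minimal sketch of the notifier mechanism described here; the status names follow the commit message, but the class, method, and attribute names are illustrative assumptions, not the actual Ganeti job-queue code:

```python
import threading

OP_STATUS_WAITLOCK = "waiting"
OP_STATUS_RUNNING = "running"

class _JobQueueWorker:
    def __init__(self):
        self._lock = threading.Lock()  # stands in for the queue lock

    def RunOpCode(self, op, processor):
        # Opcodes start out waiting for locks; the processor invokes
        # notify_start() right before handing control to the LU's Exec.
        op.status = OP_STATUS_WAITLOCK

        def notify_start():
            with self._lock:  # the "proper queue locking"
                op.status = OP_STATUS_RUNNING

        processor.ExecOpCode(op, notify_start)

def _LoadOpStatus(status):
    # The notifier does not persist the job, so on-disk opcodes may
    # still read "waiting"; treat that as "running" when loading.
    return OP_STATUS_RUNNING if status == OP_STATUS_WAITLOCK else status
```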
-
- Oct 06, 2008
-
Iustin Pop authored
This patch adds a new luxi call that implements auto-archiving of jobs older than a given age (or -1 for all completed jobs), and a gnt-job command that makes use of it (with 'all' mapping to -1). Reviewed-by: imsnah
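A sketch of the age check such a call could perform; the queue interface and field names here are assumptions, not the actual luxi API:

```python
import time

ARCHIVE_ALL = -1  # the "all completed jobs" special case above

def AutoArchiveJobs(queue, age):
    """Archive completed jobs whose end timestamp is older than age seconds."""
    now = time.time()
    for job in queue.GetCompletedJobs():  # hypothetical accessor
        if age == ARCHIVE_ALL or now - job.end_timestamp > age:
            queue.ArchiveJob(job.id)
```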
-
Iustin Pop authored
This function will be used for auto-archiving jobs via the command line. The function is pretty simple: we only support units up to weeks, since months and longer are not 'precise' entities, and dealing with them would require calendar functions. Reviewed-by: imsnah
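The parser could look roughly like this; the suffix letters and error handling are assumptions, shown only to illustrate the weeks-and-below limitation:

```python
_SUFFIX_SECONDS = {
    "s": 1,
    "m": 60,
    "h": 3600,
    "d": 24 * 3600,
    "w": 7 * 24 * 3600,  # weeks are the largest fixed-length unit
}

def ParseTimespec(value):
    """Parse e.g. "30m", "2d" or "1w" into a number of seconds."""
    if not value:
        raise ValueError("Empty time specification")
    if value[-1].isdigit():
        return int(value)  # plain number of seconds
    suffix, number = value[-1], value[:-1]
    if suffix not in _SUFFIX_SECONDS:
        raise ValueError("Unknown suffix %r" % suffix)
    return int(number) * _SUFFIX_SECONDS[suffix]
```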
-
Iustin Pop authored
Currently there are three functions in backend.py that need the cluster name in order to instantiate an SshRunner. The patch changes these to get the cluster name from the master via the rpc call; once the multi-hypervisor change is implemented, very few places will remain in the backend where we need the SimpleConfigReader (SCR). Reviewed-by: killerfoxi, imsnah
-
Iustin Pop authored
Since the objects read from the config file are passed to the various threads, it's unsafe to re-read the config file (and throw away ConfigWriter._config_data). As such, we disable re-reading of the file (since the master is now the owner of the file, it makes no sense to re-read it), and any modifications to the file must be done offline, otherwise they will be overwritten. Reviewed-by: imsnah
-
Iustin Pop authored
In case the job object doesn't have a timestamp (which is a separate issue), the listing should not break. We fix this by changing the FormatTimestamp function itself to return '?' in case the timestamp doesn't look valid (note that it can still break if non-integers are returned, but this is unlikely). Reviewed-by: imsnah
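A sketch of the defensive check, assuming timestamps are (seconds, microseconds) pairs as elsewhere in the job queue; the formatting details are illustrative:

```python
import time

def FormatTimestamp(ts):
    # A missing or malformed timestamp yields "?" instead of a
    # TypeError that would abort the whole job listing.
    if not isinstance(ts, (tuple, list)) or len(ts) != 2:
        return "?"
    sec, usec = ts
    return "%s.%06d" % (
        time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(sec)), usec)
```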
-
Iustin Pop authored
Since our locks are not gathered nicely, we can have jobs that are actually blocking on locks (parallel burnin shows this), so at the least we need to increase the number of threads above the usual number of jobs we could have in such a case. Reviewed-by: imsnah
-
Iustin Pop authored
More places actually use the SshRunner than just the gnt-cluster commands. Reviewed-by: ultrotter
-
Iustin Pop authored
Currently the SshRunner uses a SimpleConfigReader instance, but this is not ideal. We change it to use the cluster name directly (its constructor now takes the name as a parameter, instead of an SCR), and its callers are changed to pass the name directly. As a consequence, we can now remove the initialization of the SCR in gnt-cluster (copyfile and command) and instead query the master for the cluster name. Reviewed-by: imsnah
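In terms of the constructor, the change amounts to something like the sketch below; the BuildCmd method and the HostKeyAlias usage are assumptions about how the name is consumed, shown only for context:

```python
class SshRunner:
    # Before: SshRunner(cfg), with cfg a SimpleConfigReader used only
    # to look up the cluster name. After: the name is passed directly.
    def __init__(self, cluster_name):
        self.cluster_name = cluster_name

    def BuildCmd(self, hostname, user, command):
        # Hypothetical use of the name as the ssh host-key alias.
        return ["ssh", "-oHostKeyAlias=%s" % self.cluster_name,
                "%s@%s" % (user, hostname), command]
```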
-
- Oct 05, 2008
-
Iustin Pop authored
The ssconf migration left this out. Reviewed-by: imsnah, ultrotter
-
- Oct 01, 2008
-
Michael Hanselmann authored
Remove leftovers from ssconf. Reviewed-by: iustinp
-
Michael Hanselmann authored
sstore is no longer used in LUs. Reviewed-by: iustinp
-
Michael Hanselmann authored
Replace ssconf with configuration. Reviewed-by: iustinp
-
Michael Hanselmann authored
Replacing ssconf with configuration. Cluster rename is broken and stays that way. Reviewed-by: iustinp
-
Michael Hanselmann authored
Get rid of ssconf and convert to configuration instead. Reviewed-by: iustinp
-
Michael Hanselmann authored
Replacing ssconf with utility functions. Reviewed-by: iustinp
-
Michael Hanselmann authored
Replacing ssconf with configuration. Reviewed-by: iustinp
-
Michael Hanselmann authored
Replacing ssconf with configuration. Reviewed-by: iustinp
-
Michael Hanselmann authored
The configuration version is now again in the configuration file. Reviewed-by: iustinp
-
Michael Hanselmann authored
Replacing ssconf with simpleconfig. Reviewed-by: iustinp
-
Michael Hanselmann authored
This can be used to retrieve certain cluster config values from within clients. OpDumpClusterConfig was not used anywhere, hence I'm just reusing it. The way ConfigWriter.DumpConfig returned the configuration was not thread-safe, anyway (no deepcopy). Reviewed-by: iustinp
-
Michael Hanselmann authored
These functions will be used to access config values instead of using ssconf. Reviewed-by: iustinp
-
Michael Hanselmann authored
This will be used to read the configuration file in the node daemon. The write functionality is needed for master failover. Reviewed-by: iustinp
-
Iustin Pop authored
The watcher has one last use of ganeti commands as opposed to sending requests via luxi. The patch changes this to use the cli functions. The patch also has two other changes:
- fix the docstring for OpVerifyDisks (found while converting this)
- enable stderr logging in the watcher when “-d” is passed
Reviewed-by: imsnah
-
Michael Hanselmann authored
ssconf will become write-only from ganeti-masterd's point of view, therefore all settings in there need to go into the main configuration file. Reviewed-by: iustinp
-
Michael Hanselmann authored
Future patches will add even more variables to the cluster config. Adding more parameters wouldn't make the function easier to use and it doesn't make sense to pass them to another function, as it's only done once in bootstrap.py on cluster initialization. Reviewed-by: iustinp
-
Iustin Pop authored
Currently PollJob accepts a generic job but returns (a historical artifact) only the first opcode's result. This is wrong, as it doesn't allow polling a job with multiple results. Its only caller (for now) is also changed, so there should be no functional changes. Reviewed-by: ultrotter, amishchenko
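A sketch of the changed return value; the client interface and the field names ("status", "opresult") are assumptions about a luxi-like API, not the actual one:

```python
import time

def PollJob(client, job_id):
    """Wait for a job to finish and return the results of all its opcodes."""
    while True:
        ((status, result),) = client.QueryJobs([job_id],
                                               ["status", "opresult"])
        if status in ("success", "error"):
            break
        time.sleep(1)
    if status == "error":
        raise RuntimeError("Job %s failed: %s" % (job_id, result))
    return result  # previously: return result[0]
```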
-
- Sep 30, 2008
-
Iustin Pop authored
This patch adds start, stop, and received timestamps for jobs (and allows querying them), and allows querying the opcode timestamps. Reviewed-by: imsnah
-
Iustin Pop authored
Currently we format the timestamp inside the gnt-job info function. We will need this more times in the future, so move it to cli.py as a separate, exported function. Reviewed-by: imsnah
-
- Sep 29, 2008
-
Iustin Pop authored
This patch adds the job execution log in “gnt-job info” and also allows its selection in “gnt-job list” (although there it's not very useful, as it's not easy to parse). It does this by adding a new field in the query-job call, named ‘oplog’. With this, one can get a very clear examination of the job. What remains to be added are timestamps for the start/stop of processing for the job itself and its opcodes. Reviewed-by: imsnah
-
Iustin Pop authored
For now we only use the ‘C’ protocol so we can put it in constants.py instead of hardcoding it. Reviewed-by: imsnah
-
Iustin Pop authored
This patch enables the use of the shared secrets for DRBD8 disks, using (hardcoded in constants.py) the md5 digest algorithm. For making this more flexible, either we implement a cluster parameter (once the new model is in place), or we can make it ./configure-time selectable. Reviewed-by: imsnah
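In constants.py terms, the hardcoding described above amounts to something like the following excerpt; the constant names are assumptions:

```python
# Hypothetical constants.py excerpt: the DRBD network protocol (see the
# previous entry) and the digest algorithm for shared secrets, both
# fixed for now rather than configurable.
DRBD_NET_PROTOCOL = "C"
DRBD_DIGEST = "md5"
```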
-
Iustin Pop authored
This patch, which is similar to r1679 (Extend DRBD disks with minors attribute), extends the logical and physical id of the DRBD disks with a shared secret attribute. This is generated at disk creation time and saved in the config file. The generation of the secret is done so that we don't have duplicates in the configuration (otherwise the goal of preventing cross-connections would not be reached), so we add to config.py more than just a simple call to utils.GenerateSecret(). The patch does not yet enable the use of the secrets. Reviewed-by: imsnah
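A sketch of the uniqueness logic config.py needs beyond a plain utils.GenerateSecret() call; the helper names and retry bound are illustrative assumptions:

```python
import binascii
import os

def GenerateSecret():
    # Stand-in for utils.GenerateSecret(): a short random hex string.
    return binascii.hexlify(os.urandom(10)).decode("ascii")

def _GenerateUniqueSecret(existing, retries=64):
    """Return a secret not already present in the configuration.

    Duplicate secrets would defeat the cross-connection check, so we
    retry until an unused one is found.
    """
    for _ in range(retries):
        secret = GenerateSecret()
        if secret not in existing:
            return secret
    raise RuntimeError("Unable to generate a unique DRBD secret")
```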
-
Iustin Pop authored
It is not currently possible to show a summary of a job in the output of “gnt-job list”. The closest is listing the whole opcode(s), but that is too verbose. Also, the default output (id, status) is not very useful unless one looks for (and knows about) an exact job ID. The patch adds a “summary” description of a job, composed of the list of OP_IDs of the individual opcodes. Moreover, if an opcode has a ‘logical’ target in a certain opcode field (e.g. start instance has the instance name as the target), it is included in the formatting as well. It's easier to explain via a sample output:
gnt-job list
ID Status  Summary
1  error   NODE_QUERY
2  success NODE_ADD(gnta2)
3  success CLUSTER_QUERY
4  success NODE_REMOVE(gnta2.example.com)
5  error   NODE_QUERY
6  success NODE_ADD(gnta2)
7  success NODE_QUERY
8  success OS_DIAGNOSE
9  success INSTANCE_CREATE(instance1.example.com)
10 success INSTANCE_REMOVE(instance1.example.com)
11 error   INSTANCE_CREATE(instance1.example.com)
12 success INSTANCE_CREATE(instance1.example.com)
13 success INSTANCE_SHUTDOWN(instance1.example.com)
14 success INSTANCE_ACTIVATE_DISKS(instance1.example.com)
15 error   INSTANCE_CREATE(instance2.example.com)
16 error   INSTANCE_CREATE(instance2.example.com)
17 success INSTANCE_CREATE(instance2.example.com)
18 success INSTANCE_ACTIVATE_DISKS(instance1.example.com)
19 success INSTANCE_ACTIVATE_DISKS(instance2.example.com)
20 success INSTANCE_SHUTDOWN(instance1.example.com)
21 success INSTANCE_SHUTDOWN(instance2.example.com)
This is done by a simple change to the opcode classes, which allows an opcode to format itself. The additional function is small enough that it can go in opcodes.py, where it could also be used by a client if needed. Reviewed-by: imsnah
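A sketch of how such self-formatting opcodes could look; the OP_DSC_FIELD mechanism for naming the "logical target" field is an assumption used for illustration:

```python
class OpCode:
    OP_ID = "OP_ABSTRACT"
    OP_DSC_FIELD = None  # name of the "logical target" field, if any

    def Summary(self):
        # "OP_INSTANCE_SHUTDOWN" -> "INSTANCE_SHUTDOWN", optionally
        # with the value of the target field appended in parentheses.
        summary = self.OP_ID[3:]
        if self.OP_DSC_FIELD:
            summary += "(%s)" % getattr(self, self.OP_DSC_FIELD)
        return summary

class OpInstanceShutdown(OpCode):
    OP_ID = "OP_INSTANCE_SHUTDOWN"
    OP_DSC_FIELD = "instance_name"

    def __init__(self, instance_name):
        self.instance_name = instance_name

# OpInstanceShutdown("instance1.example.com").Summary()
# -> "INSTANCE_SHUTDOWN(instance1.example.com)"
```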
-
Iustin Pop authored
Unless we decide to change the job identifiers to integers, we should at least sort the list returned by _GetJobIDsUnlocked. Reviewed-by: imsnah
-
- Sep 28, 2008
-
Iustin Pop authored
The bootstrap code needs a pseudo-secret and this is currently generated inside the InitGanetiServerSetup function. Since more users will need it, move it to utils.py. Reviewed-by: ultrotter
-