Commits · 16f323ce3a60679ffc933ef462396a108446b18d · itminedu / snf-ganeti

Dec 05, 2008

Cleanup the config file on demotion from candidate · 56aa9fd5

Iustin Pop authored 16 years ago

This patch adds a simple rpc which makes a backup of the config file and
then removes it. This is done so that cluster verify doesn't complain
immediately after demoting a node.

Reviewed-by: imsnah

56aa9fd5

watcher: handle offline nodes better · cbfc4681

Iustin Pop authored 16 years ago

This patch changes the LUQueryInstances to show a different state for
offline nodes and also modifies the watcher to understand the offline
state in its checks.

Reviewed-by: ultrotter

cbfc4681

Dec 04, 2008

ganeti-rapi: Convert to new HTTP server · bc2929fc

Michael Hanselmann authored 16 years ago

Reviewed-by: amishchenko

bc2929fc

ganeti-noded: Migrate to new HTTP server · 19205c39

Michael Hanselmann authored 16 years ago

Reviewed-by: amishchenko

19205c39

Rename all HTTP classes to camel case · 84f2756e

Michael Hanselmann authored 16 years ago

It should be consistent.

Reviewed-by: amishchenko

84f2756e

Dec 02, 2008

Fix master failover · bbe19c17

Iustin Pop authored 16 years ago

The ssconf files were not updated by the master failover. We need to
push them, and since we already have RPC initialized, we can use the
standard ConfigWriter to do so - this will take care of both the config
file and the ssconf files.

Reviewed-by: imsnah

bbe19c17

Nov 26, 2008

ganeti-masterd: create RUN_GANETI_DIR as well · 1cb8d376

Guido Trotter authored 16 years ago

Since we're not sure ganeti-noded has started yet, we need to create
RUN_GANETI_DIR before SOCKET_DIR as well, with the proper permissions.

Reviewed-by: imsnah

1cb8d376

convert run dir mode to constant · 817a030d

Guido Trotter authored 16 years ago

ganeti-noded used to create all directories under /var/run with a
hard-coded mode. Convert it to a constant.

Reviewed-by: imsnah

817a030d

Nov 25, 2008

Move the MASTER_SOCKET to SOCKET_DIR · 227647ac

Guido Trotter authored 16 years ago

Previously it was in the abstract Linux namespace, where unfortunately we
couldn't easily check from Python the credentials of the connecting
clients. Now we also have to remove the file on exit and when starting.

Reviewed-by: imsnah

227647ac

ganeti-masterd: create SOCKET_DIR · d823660a

Guido Trotter authored 16 years ago

If SOCKET_DIR doesn't exist we create it in the master daemon, before
trying to put a socket inside it.

Reviewed-by: imsnah

d823660a

Pass ssconf values from master to node · 03d1dba2

Michael Hanselmann authored 16 years ago

Instead of parsing the configuration on the node, we pass the ssconf
values from the master.

Reviewed-by: iustinp

03d1dba2

Nov 21, 2008

Use SSL for master/node RPC · eafd8762

Michael Hanselmann authored 16 years ago

This patch enables SSL between masterd and noded.

Reviewed-by: iustinp

eafd8762

Get rid of node daemon password · ec17d09c

Michael Hanselmann authored 16 years ago

With the new SSL client certificate stuff it's no longer needed.

Reviewed-by: iustinp

ec17d09c

ganeti-masterd: Remove PID file at the end · 15486fa7

Michael Hanselmann authored 16 years ago

Removing the PID file should be the last thing done. This patch makes
sure it's also removed when master.server_cleanup() throws an exception.

Also initialize logging only after writing the PID file.

Reviewed-by: iustinp

15486fa7
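
The ordering described in 15486fa7 (PID file written first, removed last, even if cleanup fails) can be sketched with nested try/finally blocks. This is a minimal illustration, not the ganeti-masterd code: `run_daemon`, `serve` and `cleanup` are hypothetical names, with `cleanup` standing in for master.server_cleanup().

```python
import os


def run_daemon(pidfile, serve, cleanup):
    """Write the PID file, run the daemon, and always remove the PID file last.

    `serve` and `cleanup` are hypothetical callables standing in for the
    daemon main loop and its cleanup routine.
    """
    with open(pidfile, "w") as handle:
        handle.write("%d\n" % os.getpid())
    try:
        try:
            serve()
        finally:
            # Cleanup may raise; the outer finally still runs afterwards.
            cleanup()
    finally:
        # Last step: the PID file goes away even if cleanup() raised.
        os.remove(pidfile)
```

Any exception from `cleanup` still propagates to the caller; the outer block only guarantees that the file is gone by then.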

Reuse HTTP client pool for RPC · 4331f6cd

Michael Hanselmann authored 16 years ago

ganeti-masterd: Add initialization and shutdown of RPC pool. It needs
to be shut down before forking.

ganeti.cli: Add decorator function to initialize and shut down RPC pool.

ganeti.rpc: Add functions to initialize and shut down RPC pool. Throw
exception when used without proper initialization.

gnt-cluster, gnt-node: Use decorator function to initialize and shut down
RPC pool.

Reviewed-by: iustinp

4331f6cd
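
A rough sketch of the decorator pattern described in 4331f6cd: the pool is initialized before the wrapped command runs and shut down afterwards, and using RPC without initialization raises. All names here (`init_pool`, `shutdown_pool`, `get_pool`, `uses_rpc`, `RpcPoolNotInitialized`) are assumptions for illustration, and the pool itself is a placeholder object rather than a real HTTP client pool.

```python
import functools

_rpc_pool = None  # hypothetical module-level handle to the client pool


class RpcPoolNotInitialized(Exception):
    """Raised when RPC is used before the pool has been initialized."""


def init_pool():
    global _rpc_pool
    _rpc_pool = object()  # placeholder for a real HTTP client pool


def shutdown_pool():
    global _rpc_pool
    _rpc_pool = None


def get_pool():
    if _rpc_pool is None:
        raise RpcPoolNotInitialized("RPC pool used without initialization")
    return _rpc_pool


def uses_rpc(fn):
    """Decorator: initialize the pool, run fn, always shut the pool down."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        init_pool()
        try:
            return fn(*args, **kwargs)
        finally:
            shutdown_pool()
    return wrapper


@uses_rpc
def list_nodes():
    # A command body would issue RPC calls through get_pool() here.
    return get_pool() is not None
```

The try/finally in the wrapper means the pool is released even when the command fails, which keeps the command functions themselves free of setup and teardown code.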

Add RPC call to update ssconf files · 6ddc95ec

Michael Hanselmann authored 16 years ago

Reviewed-by: iustinp

6ddc95ec

Nov 11, 2008

Abstract runtime creation of dirs into a function · 8adbffaa

Iustin Pop authored 16 years ago

Currently the dir creation in ganeti-noded is in the main function. This
is not nice, so we move it into a separate function and also add creation
of the OS_LOG_DIR (with different permissions, but in the same way).
This will permit cleanup of the creation of the OS_LOG_DIR from the
backend module (it's currently done in multiple places).

Reviewed-by: imsnah

8adbffaa

Oct 24, 2008

Document HttpServer.__init__ · 23e46494

Michael Hanselmann authored 16 years ago

At the same time, simplify the interface a bit by not using a tuple.

Reviewed-by: killerfoxi, ultrotter

23e46494

Oct 23, 2008

Export the disk index in the import/export scripts · 74c47259

Iustin Pop authored 16 years ago

We want to export the disk index as some OSes will only want to export
the first disk (or the second one, etc.), even if we have multiple
disks.

The patch also updates the backend.ExportSnapshot docstring.

Reviewed-by: ultrotter

74c47259

Oct 22, 2008

Convert ImportOSIntoInstance to OS API 10 · 6c0af70e

Guido Trotter authored 16 years ago

- Change ImportOSIntoInstance not to get any "os_disk" and "swap_disk"
  arguments but to accept multiple target images to import, and to
  return a list of booleans with the result of each import
- Change the relevant rpc call and the only caller to conform
- Pass arguments to the import script through the environment
- Run one import os script for each disk image, passing an IMPORT_DEVICE

Reviewed-by: iustinp

6c0af70e

Oct 21, 2008

Pass request headers in to RAPI handlers. · 7a8f64da

Oleksiy Mishchenko authored 16 years ago

Reviewed-by: iustinp

7a8f64da

Oct 20, 2008

Convert the job queue rpcs to address-based · 99aabbed

Iustin Pop authored 16 years ago

The two main multi-node job queue RPC calls (jobqueue_update,
jobqueue_rename) are converted to address-based calls, in order to speed
up queue changes. For this, we need to change the _nodes attribute on
the jobqueue to be a dict {name: ip}, instead of a set.

Reviewed-by: imsnah

99aabbed

Remove the logger.py module · 82d9caef

Iustin Pop authored 16 years ago

Since we now use only one function from the logger module
(SetupLogging), we move it to utils.py (which is already imported by all
users of this function), and we remove the module.

Reviewed-by: imsnah

82d9caef

Oct 17, 2008

Cleanup os_add/rename rpc for OS API 10 · d15a9ad3

Guido Trotter authored 16 years ago

- remove now unused osdev and swapdev arguments from backend, noded,
  rpc, cmdlib
- convert docstrings to epydoc

Reviewed-by: iustinp

d15a9ad3

ETag passing support. · 713faea6

Oleksiy Mishchenko authored 16 years ago

Reviewed-by: imsnah

713faea6

Oct 16, 2008

rapi: Convert to new HTTP server class · 16a8967d

Michael Hanselmann authored 16 years ago

Requests are no longer logged to a separate file.

Reviewed-by: amishchenko

16a8967d

Improvements to the master startup checks · d7cdb55d

Iustin Pop authored 16 years ago

In order to account for future improvements to master failover, we move
the actual data gathering capabilities from ganeti-masterd into
bootstrap.py, and we leave only the verification in masterd.

The verification procedure is then changed to retry multiple times (up
to one minute) in case most nodes do not respond, and also the algorithm
is changed to require at least half (but not half+1) votes, since our
vote also should count (and we vote for ourselves).

Example for consistent (config-wise) cluster:
  - 5 node cluster, 2 nodes down: still start
  - 4 node cluster, 2 nodes down: retry for one minute, abort

Reviewed-by: ultrotter

d7cdb55d
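
The commit message for d7cdb55d does not spell out the exact comparison, so the following is only one reading that reproduces both of its examples: count our own vote together with the responding peers and require a strict majority of the whole cluster. The function names, the retry count and the delay are assumptions chosen to give roughly one minute of retries.

```python
import time


def can_start_master(total_nodes, responding_peers):
    """Return True if we hold a strict majority once our own vote is counted.

    Assumes every responding peer agrees that we are the master.
    """
    our_votes = responding_peers + 1  # agreeing peers plus our own vote
    return 2 * our_votes > total_nodes


def wait_for_agreement(total_nodes, poll_peers, retries=6, delay=10.0,
                       sleep=time.sleep):
    """Poll the peers a few times before giving up.

    `poll_peers` is a hypothetical callable returning how many peers
    currently answer and agree.
    """
    for attempt in range(retries):
        if can_start_master(total_nodes, poll_peers()):
            return True
        if attempt < retries - 1:
            sleep(delay)
    return False
```

With this reading, a 5 node cluster with 2 nodes down gives 3 votes out of 5 and starts, while a 4 node cluster with 2 nodes down gives 2 votes out of 4 and aborts after the retries.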

Add an interface for the drain flag changes/query · 3ccafd0e

Iustin Pop authored 16 years ago

This adds the set/reset in the jqueue and luxi modules, and a way to
query it in OpQueryConfigValues, and also the command line interface for
it:

  $ gnt-cluster queue info
  The drain flag is unset
  $ gnt-cluster queue drain
  $ gnt-cluster queue info
  The drain flag is set
  $ gnt-cluster queue undrain
  $ gnt-cluster queue info
  The drain flag is unset

The setting is made via luxi and not an opcode because opcodes can't be
executed when the queue is drained. The query, however, does not go via
luxi, since in the future the flag might become a cluster property as
opposed to a node one.

Reviewed-by: imsnah

3ccafd0e

Oct 15, 2008

Add a rpc call for changing the drain flag · 5d672980

Iustin Pop authored 16 years ago

A new multi-node call is added that sets/resets the drain flag.

Reviewed-by: imsnah

5d672980

Implement transport of ganeti errors across luxi · 6797ec29

Iustin Pop authored 16 years ago

This patch adds a generic method to identify the ganeti error given its
class name, and implements this across the luxi protocol.

Reviewed-by: imsnah

6797ec29

rapi: Whitespace fixes · a2f92677

Michael Hanselmann authored 16 years ago

Reviewed-by: ultrotter

a2f92677

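
The idea in 6797ec29 (identify an error by its class name so it can cross the luxi protocol) might look roughly like the sketch below. The class hierarchy, the JSON wire format and the helper names are assumptions made for illustration, not the actual ganeti.errors or luxi code.

```python
import json


class GenericError(Exception):
    """Illustrative base class for all errors sent over the wire."""


class OpPrereqError(GenericError):
    pass


class JobQueueDrainError(GenericError):
    pass


def _all_subclasses(cls):
    """Map class names to classes for cls and all of its descendants."""
    found = {cls.__name__: cls}
    for sub in cls.__subclasses__():
        found.update(_all_subclasses(sub))
    return found


def get_error_class(name):
    """Return the error class with the given name, or None if unknown."""
    return _all_subclasses(GenericError).get(name)


def encode_error(err):
    """Serialize an error as its class name plus its arguments."""
    return json.dumps({"class": type(err).__name__, "args": list(err.args)})


def decode_error(payload):
    """Rebuild the error on the receiving side from its class name."""
    data = json.loads(payload)
    cls = get_error_class(data["class"])
    if cls is None:
        # Unknown name: fall back to the base class, keeping the name.
        return GenericError(data["class"], *data["args"])
    return cls(*data["args"])
```

Restricting the lookup to subclasses of one base class means a client can never be tricked into instantiating an arbitrary class from a name received over the socket.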
Oct 14, 2008

Export the hypervisor.ValidateParameters over RPC · 6217e295

Iustin Pop authored 16 years ago

The newly-added node-specific ValidateParams hypervisor method is
exported over RPC, using the semi-standard (success, message) return
value. It is a multi-node call, so that we call on both primary and
secondary at once.

Reviewed-by: ultrotter

6217e295

Oct 13, 2008

Fix a few rpc-related errors · 16ad1a83

Iustin Pop authored 16 years ago

This fixes:
  - whitespace change, double lines between methods
  - duplication of call_upload_file, introduced by mistake in rev 1795
    and which went undetected because of the many changes in that ref
    (only diff -b shows it clearly)
  - call_instance_info didn't pass the hypervisor name parameter, but
    the backend requires it

Reviewed-by: ultrotter

16ad1a83

Oct 12, 2008

Abstract checking own address into a function · caad16e2

Iustin Pop authored 16 years ago

Currently, we check if we have a given ip address (i.e. it's alive on
one of our interfaces) by manually calling TcpPing(source=localhost).
This works, but having it spread all over the code makes it hard to
change the implementation.

The patch abstracts this into a separate utils.OwnIpAddress(addr)
function. We add a rpc call for it, which we use instead of the
(single use of) call_node_tcp_ping. We leave node_tcp_ping in, as it seems
useful; eventually it should be removed in a separate patch.

Reviewed-by: imsnah

caad16e2
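
For illustration only, here is one way a "do we own this address" helper could be written. Note that it swaps in a different technique from the one in caad16e2: the commit wraps TcpPing(source=localhost), while this sketch simply tries to bind a socket to the address. The function name and the IPv4-only restriction are assumptions.

```python
import socket


def own_ip_address(addr):
    """Return True if addr can be bound locally, i.e. it is one of ours.

    Bind-based sketch, IPv4 only. It is not the TcpPing-based check
    from the commit.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((addr, 0))  # port 0: let the kernel pick a free port
        return True
    except OSError:
        return False
    finally:
        sock.close()
```

The benefit of a single helper is the one the commit names: callers no longer care how the check is done. One caveat for the bind approach is that systems configured to allow non-local binds will report True for any address.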

Oct 10, 2008

Convert ganeti-noded to new HTTP server class · cc28af80

Michael Hanselmann authored 16 years ago

Reviewed-by: iustinp

cc28af80

Convert rpc module to RpcRunner · 72737a7f

Iustin Pop authored 16 years ago

This big patch changes the call model used in inter-node RPC from
standalone function calls in the rpc module to calls via an RpcRunner
class that holds all the methods. This can be used in the future to enable
smarter processing in the RPC layer itself (for example, setting the
DiskID only once in each rpc call instead of from cmdlib code).

There are a few RPC calls that are made outside of the LU code, and
these calls are left as staticmethods, so they can be used without a
class instance (which requires a ConfigWriter instance).

Reviewed-by: imsnah

72737a7f
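
The shape described in 72737a7f, instance methods that need configuration plus a few static methods usable without an instance, can be sketched as follows. The method bodies, the `_send` helper and the dict-based configuration are hypothetical stand-ins; only the class name RpcRunner and the general split come from the commit message.

```python
class RpcRunner:
    """Holds the RPC calls; most need configuration, a few do not."""

    def __init__(self, cfg):
        # cfg is a stand-in for a ConfigWriter: here a dict of name -> address.
        self._cfg = cfg

    def call_instance_info(self, node, instance, hypervisor):
        # Instance method: resolves the node through the configuration.
        address = self._cfg[node]
        return self._send(address, "instance_info", [instance, hypervisor])

    @staticmethod
    def call_version(address):
        # Static method: usable outside LU code, without a class instance.
        return RpcRunner._send(address, "version", [])

    @staticmethod
    def _send(address, procedure, args):
        # Placeholder transport: a real runner would issue an HTTP request.
        return {"address": address, "procedure": procedure, "args": args}
```

Usage under these assumptions: `RpcRunner({"node1": "192.0.2.10"}).call_instance_info("node1", "inst1", "xen-pvm")` goes through the configuration, while `RpcRunner.call_version("192.0.2.10")` needs no instance at all.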

Oct 08, 2008

Move the hypervisor attribute to the instances · e69d05fd

Iustin Pop authored 16 years ago

This (big) patch moves the hypervisor type from the cluster to the
instance level; the cluster attribute remains as the default hypervisor,
and will be renamed accordingly in a next patch. The cluster also gains
the ‘enable_hypervisors’ attribute, and instances can be created with
any of the enabled ones (no provision yet for changing that attribute).

The many many changes in the rpc/backend layer are due to the fact that
all backend code read the hypervisor from the local copy of the config,
and now we have to send it (either in the instance object, or as a
separate parameter) for each function.

The node list by default will list the node free/total memory for the
default hypervisor; a new flag should be added to it to select another
hypervisor. Instance list has a new field, hypervisor, that shows the
instance hypervisor. Cluster verify runs for all enabled hypervisor
types.

The new FIXMEs are related to IAllocator, since now the node
total/free/used memory counts are wrong (we can't reliably compute the
free memory).

Reviewed-by: imsnah

e69d05fd

Oct 07, 2008

rpc.call_instance_migrate: pass the whole instance · 9f0e6b37

Iustin Pop authored 16 years ago

Currently the call_instance_migrate call only passes the instance name;
we need to pass the whole object for the hypervisor_type changes (all
the other individual instance rpc calls already pass the instance
object).

Reviewed-by: imsnah

9f0e6b37

Implement job 'waiting' status · e92376d7

Iustin Pop authored 16 years ago

Background: when we have multiple jobs in the queue (more than just a
few), many of the jobs (up to the number of threads) will be in state
'running', although many of them could be actually blocked, waiting for
some locks. This is not good, as one cannot easily see what is
happening.

The patch extends the opcode/job possible statuses with another one,
waiting, which shows that the LU is in the acquire locks phase. The
mechanism for doing so is simple: we initialize (in the job queue) the
opcode with OP_STATUS_WAITLOCK, and when the processor is ready to give
control to the LU's Exec, it will call a notifier back into the
_JobQueueWorker that sets the opcode status to OP_STATUS_RUNNING (with
the proper queue locking). Because this mechanism does not save the job,
all opcodes on disk will be in status WAITLOCK and not RUNNING anymore,
so we also change the load sequence to consider WAITLOCK as RUNNING.

With the patch applied, creating in parallel (via burnin) five instances
on a five node cluster shows that only two are executing, while three
are waiting for locks.

Reviewed-by: imsnah

e92376d7
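
The status flow described in e92376d7 can be sketched as below: the opcode starts in the waiting state, a notifier flips it to running just before Exec, and a load from disk treats waiting as running. The constant values, the `OpCode` and `Worker` classes and the callable arguments are assumptions for illustration, not the real job queue code.

```python
import threading

# Status names follow the commit message; the string values are assumed.
OP_STATUS_QUEUED = "queued"
OP_STATUS_WAITLOCK = "waiting"
OP_STATUS_RUNNING = "running"


class OpCode:
    def __init__(self, name):
        self.name = name
        self.status = OP_STATUS_QUEUED


class Worker:
    """Simplified stand-in for the job queue worker."""

    def __init__(self):
        self._queue_lock = threading.Lock()

    def _notify_start(self, op):
        # Called right before Exec: the locks are held, real work begins.
        with self._queue_lock:
            op.status = OP_STATUS_RUNNING

    def run(self, op, acquire_locks, exec_fn):
        with self._queue_lock:
            op.status = OP_STATUS_WAITLOCK
        acquire_locks()  # may block for a long time
        self._notify_start(op)
        return exec_fn()


def normalize_loaded_status(status):
    """On load from disk, treat WAITLOCK as RUNNING, as the commit does."""
    if status == OP_STATUS_WAITLOCK:
        return OP_STATUS_RUNNING
    return status
```

Because the notifier only changes the in-memory status, the on-disk copy stays at waiting, which is why the load path needs the normalization step.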

Oct 06, 2008

Implement job auto-archiving · 07cd723a

Iustin Pop authored 16 years ago

This patch adds a new luxi call that implements auto-archiving of jobs
older than a certain age (or -1 for all completed jobs), and the gnt-job
command that makes use of this (with 'all' for -1).

Reviewed-by: imsnah

07cd723a
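
A small sketch of the selection rule from 07cd723a: archive completed jobs older than a given age, with -1 meaning all completed jobs and 'all' on the command line mapping to -1. The function names and the (job id, finish time) tuples are hypothetical; the real luxi call and gnt-job code are not shown in the log.

```python
def parse_age(arg):
    """Map the command line argument to an age in seconds ('all' -> -1)."""
    return -1 if arg == "all" else int(arg)


def jobs_to_archive(jobs, age, now):
    """Return ids of completed jobs older than `age` seconds.

    `jobs` is an iterable of (job_id, finished_at) pairs, where finished_at
    is a timestamp or None for jobs that have not completed. An age of -1
    selects every completed job.
    """
    selected = []
    for job_id, finished_at in jobs:
        if finished_at is None:
            continue  # never archive jobs that are still pending or running
        if age == -1 or now - finished_at > age:
            selected.append(job_id)
    return selected
```

For example, with `now=1000` and jobs finished at 100 and 950, an age of 60 selects only the first, while `parse_age("all")` selects both.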