Commits · 8979196a7448011345c62ec7ca6572ef93caf71b · itminedu / snf-ganeti

Aug 03, 2009

Add new opcode to list physical volumes · 9e5442ce

Michael Hanselmann authored 15 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

9e5442ce

Jul 31, 2009

cmdlib: Add new opcode to migrate node · 80cb875c

Michael Hanselmann authored 15 years ago


It migrates all primary instances from the node to their secondaries.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

80cb875c

Jul 22, 2009

Add new opcode to evacuate nodes · 7ffc5a86

Michael Hanselmann authored 15 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

7ffc5a86

Jul 17, 2009

Optimize OpCode loading · 363acb1e

Iustin Pop authored 15 years ago


This patch converts the opcode loading to a pre-built map (at import
time) instead of iteration over the globals dict at each call.

Microbenchmarks show that this should be around three times faster, and
burnin still passes.
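
The difference between the two lookup strategies can be sketched as below. This is a minimal illustration and not the actual Ganeti code: the two example opcode classes, `OP_MAPPING`, `_build_mapping` and `load_opcode` are hypothetical names chosen for the sketch.

```python
class OpCode(object):
    """Base class for the illustrative opcodes below."""
    OP_ID = "OP_ABSTRACT"


class OpQueryNodes(OpCode):
    OP_ID = "OP_NODE_QUERY"


class OpAddNode(OpCode):
    OP_ID = "OP_NODE_ADD"


def _build_mapping(namespace):
    """Collect every OpCode subclass found in a namespace, keyed by OP_ID."""
    return dict(
        (value.OP_ID, value)
        for value in namespace.values()
        if isinstance(value, type) and issubclass(value, OpCode)
    )


# Built once, at import time, instead of scanning globals() on every call.
OP_MAPPING = _build_mapping(globals())


def load_opcode(op_id):
    """Instantiate an opcode by ID using a single dictionary lookup."""
    try:
        return OP_MAPPING[op_id]()
    except KeyError:
        raise ValueError("Unknown opcode ID %r" % (op_id,))
```

With the map in place, each load costs one dictionary access rather than a pass over all module globals.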

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

363acb1e

Jun 19, 2009

LU execution: implement dry-run framework · 20777413

Iustin Pop authored 15 years ago


This patch adds a new (global) opcode flag 'dry_run' which, when True,
causes early exit from the LU workflow, returning a special value from
the LU object (initialized in the parent LogicalUnit class, and which if
not overridden by child LUs will be None).
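
A rough sketch of the early-exit idea follows. It assumes a simplified workflow; `run_lu`, `dry_run_result`, `Op` and the example LU are hypothetical names for illustration, not the real Ganeti processor.

```python
class LogicalUnit(object):
    """Hypothetical base class for logical units."""

    def __init__(self, op):
        self.op = op
        # Value returned on a dry run; child LUs may override it.
        self.dry_run_result = None

    def CheckPrereq(self):
        """Validate prerequisites; a no-op in this sketch."""

    def Exec(self):
        raise NotImplementedError


class LUStartInstance(LogicalUnit):
    """Illustrative LU that records whether Exec was reached."""

    def __init__(self, op):
        LogicalUnit.__init__(self, op)
        self.executed = False

    def Exec(self):
        self.executed = True
        return "started"


class Op(object):
    """Minimal opcode stand-in carrying only the global dry_run flag."""

    def __init__(self, dry_run=False):
        self.dry_run = dry_run


def run_lu(lu):
    """Run the checks, then either exit early (dry run) or execute."""
    lu.CheckPrereq()
    if getattr(lu.op, "dry_run", False):
        return lu.dry_run_result
    return lu.Exec()
```

In this sketch the prerequisite checks still run on a dry run; only the execution step is skipped.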

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

20777413

Introduce __slots__ deriving in opcodes.py · 4f05fd3b

Iustin Pop authored 15 years ago


This simple patch adds to all opcodes extension of the base opcode
__slots__. This way we can add slots across all opcodes, for example
'dry-run'.
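
As a minimal sketch of slot deriving, assuming the base class validates keyword arguments against `__slots__` (the constructor and the example opcode here are illustrative, not copied from opcodes.py):

```python
class BaseOpCode(object):
    """Hypothetical base: accepts only keyword arguments named in __slots__."""
    __slots__ = []

    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            if key not in self.__slots__:
                raise TypeError("Unknown parameter %r for %s" %
                                (key, type(self).__name__))
            setattr(self, key, value)


class OpCode(BaseOpCode):
    # A slot added here becomes available to every derived opcode.
    __slots__ = BaseOpCode.__slots__ + ["dry_run"]


class OpStartupInstance(OpCode):
    # Extend the parent's slots instead of replacing them.
    __slots__ = OpCode.__slots__ + ["instance_name", "force"]


op = OpStartupInstance(instance_name="instance1.example.com", dry_run=True)
```

Note that Python's documentation discourages redeclaring a base-class slot in a subclass; the sketch mirrors the commit's description rather than recommending the pattern in general.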

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

4f05fd3b

Jun 08, 2009

Allow modifying of default nic parameters · 5af3da74

Guido Trotter authored 15 years ago


Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

5af3da74

May 27, 2009

Add a node powercycle command · f5118ade

Iustin Pop authored 15 years ago


This (somewhat big) patch adds support for remotely rebooting the nodes
via whatever support the hypervisor has for such a concept.

For KVM/fake (and containers in the future) this just uses sysrq plus a
‘reboot’ call if the sysrq method failed. For Xen, it first tries the
above, and then Xen-hypervisor reboot (we first try sysrq since that
just requires opening a file handle, whereas xen reboot means launching
an external utility).

The user interface is:

    # gnt-node powercycle node5
    Are you sure you want to hard powercycle node node5?
    y/[n]/?: y
    Reboot scheduled in 5 seconds

The node reboots hopefully after sending the reply. In case the clock is
broken, “time.sleep(5)” might take ages (but then I suspect SSL
negotiation wouldn't work).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

f5118ade

May 19, 2009

Add -H/-B startup parameters to gnt-instance · d04aaa2f

Iustin Pop authored 15 years ago


This patch modifies the start instance script, opcode and logical unit
to support temporary startup parameters.

Unlike 1.2, where only the kernel arguments could be changed (which made
the feature xen-pvm specific), this version supports changing all
hypervisor and backend parameters (with appropriate checks).

This is much more flexible, and allows for example:
  - start with different, temporary kernel
  - start with different memory size

Note: in later versions, this should be extended to cover disk
parameters as well (e.g. start with drbd without flushes, start with
drbd in async mode, etc.).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

d04aaa2f

Feb 24, 2009

Remove the extra_args parameter in instance start · 07813a9e

Iustin Pop authored 16 years ago

This patch removes the extra_args parameter and instead switches the
instance to the HV_KERNEL_ARGS hypervisor option.

This is a big change, but it is a needed cleanup: this extra parameter on
all RPC calls is not generic, and we also need to have a persistent value
here.

Reviewed-by: imsnah

07813a9e

Feb 10, 2009

Implement modification of the drained flag · c9d443ea

Iustin Pop authored 16 years ago

This patch adds LU and cli-level support for modification of the node
drained flag. It is similar to the offline changes.

Reviewed-by: imsnah

c9d443ea

Feb 06, 2009

Fix rapi job listing · ee69c97f

Iustin Pop authored 16 years ago

This patch fixes a couple of issues with the job listing:
  - in case of a non-existing job, nicely raise 404 instead of 500
  - in the job detail listing, also list the job log, the job
    timestamps, etc.
  - the opcode migrate instance was missing its description field

Reviewed-by: imsnah

ee69c97f

Feb 04, 2009

Implement lockless query operations · ec79568d

Iustin Pop authored 16 years ago

This patch adds the framework for, and enables lockless OpQueryInstances. This
means that instances will be shown in ERROR_up or ERROR_down state, even though
this is not an error (but just an in-progress job).

The framework is implemented as follows:
  - the OpQueryInstances, OpQueryNodes and OpQueryExports opcodes take
    an additional “use_locking” flag which will denote whether to lock
    or not; this patch only implements this for LUQueryInstances
  - the luxi query functions take an additional argument use_locking
    which is passed to the master daemon, and then passed to the above
    opcodes
  - cli.py exports a new SYNC_OPT command line option which implements
    setting this flag to true
  - except for gnt-instance list, which uses this option, and for
    name-only queries (e.g. QueryNodes(fields=["names"])), all other
    callers are setting this flag to True
  - RAPI also sets the flag to True
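
The effect of the flag can be sketched roughly as follows. This is only an illustration under simplifying assumptions: `InstanceQuery`, its single `threading.Lock` and the in-memory config list are hypothetical, whereas the real code uses the Ganeti locking framework.

```python
import threading


class InstanceQuery(object):
    """Hypothetical query object that can optionally skip locking."""

    def __init__(self, config, lock):
        self._config = config
        self._lock = lock

    def query(self, fields, use_locking=True):
        if use_locking:
            # Wait for any in-progress job holding the lock.
            with self._lock:
                return self._collect(fields)
        # Lockless path: returns immediately, but may observe
        # intermediate state while a job is running.
        return self._collect(fields)

    def _collect(self, fields):
        return [[inst[name] for name in fields] for inst in self._config]


lock = threading.Lock()
query = InstanceQuery([{"name": "instance1.example.com"}], lock)

# Simulate a running job holding the lock; the lockless query still returns.
lock.acquire()
lockless_result = query.query(["name"], use_locking=False)
lock.release()
```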

The patch was tested with a continuous (0.2s sleep in-between)
gnt-instance list during a burnin, and no problems were observed.

Reviewed-by: ultrotter

ec79568d

Jan 20, 2009

Fix a couple of epydoc warnings · 2f907a8c

Iustin Pop authored 16 years ago

Reviewed-by: ultrotter

2f907a8c

Jan 13, 2009

Forward port the live migration from 1.2 branch · 53c776b5

Iustin Pop authored 16 years ago

This is a forward port via copy (rather than cherry-picking individual
patches) of the latest migration-related code on the 1.2 branch.

The changes compared to 1.2 are the fact that we don't need the
IdentifyDisks step anymore (the drbd rpc calls are independent now), and
the rpc module improvements.

Reviewed-by: ultrotter

53c776b5

Jan 12, 2009

Introduce a very simple LU to force config updates · afee0879

Iustin Pop authored 16 years ago

This LU can be used to force a push of the config in case it's needed,
for example after an upgrade to update the ssconf_release_version file.

Reviewed-by: imsnah

afee0879

Dec 08, 2008

gnt-node modify: add the offline attribute · 3a5ba66a

Iustin Pop authored 16 years ago

This patch changes gnt-node modify and the associated opcode/lu to allow
modification of the node offline attribute.

Setting a node into offline mode automatically demotes it from the
master role.

Reviewed-by: ultrotter

3a5ba66a

Dec 02, 2008

Add cluster candidate pool size parameter · 4b7735f9

Iustin Pop authored 16 years ago

This patch adds a new cluster parameter "candidate_pool_size" which
tracks the desired size of the list of nodes with the master_candidate
flag set.

Reviewed-by: imsnah

4b7735f9

Add a gnt-node modify operation · b31c8676

Iustin Pop authored 16 years ago

This patch adds the OpCode, LogicalUnit and gnt-node command for
modifying node parameters, more specifically the master candidate flag
for a node.

Reviewed-by: imsnah

b31c8676

Nov 25, 2008

Implement support for multi devices changes · 24991749

Iustin Pop authored 16 years ago

This big patch adds support for:
  - changing NIC/disks in the multi-device model
  - adding/removing NICs
  - adding/removing disks

The patch is big and not very nice; the error checking paths are not
very clear.

The biggest problem is that from a simple instance.ATTR=VAL change
(which didn't throw errors before) now we are creating and removing
disks in this LU.

Reviewed-by: imsnah

24991749

Nov 24, 2008

IAllocator: use the right hypervisor · 8cc7e742

Guido Trotter authored 16 years ago

Since the hypervisor is instance dependent we'll get one on instance creation,
and use the one in the instance config on relocation.

Reviewed-by: iustinp

8cc7e742

Nov 20, 2008

Initial multi-disk/multi-nic support · 08db7c5c

Iustin Pop authored 16 years ago

This patch adds support for multi-disk/multi-nic in:
  - instance add
  - burnin

The start/stop/failover/cluster verify work as expected. Replace disk
and grow disk are TODO.

There's also a change to gnt-job to allow dictionaries to be listed in
gnt-job info.

Reviewed-by: imsnah

08db7c5c

Oct 16, 2008

Enable gnt-cluster modify to hv/beparams · 779c15bb

Iustin Pop authored 16 years ago

This patch enables the cluster modify to change:
  - enabled hypervisor list
  - hvparams (per hypervisor)
  - beparams (only the default group)

Syntax:
  gnt-cluster modify -B vcpus=3 -H xen-pvm:no_initrd_path

Validation for parameters is somewhat missing: the individual
hypervisors will be checked for syntax and validation, but beparams
doesn't have validation yet (anywhere); it should be added here once we
have a global method (will come soon).

Reviewed-by: imsnah

779c15bb

Oct 14, 2008

grow-disk: wait until resync is completed · 6605411d

Iustin Pop authored 16 years ago

The patch adds a new ‘--no-wait-for-sync’ parameter to grow-disk similar
to the one in instance add, and changes the default to wait.

This is cleaner as at the moment when the command returns, we either
have a fully synced disk or there is an error.

This is a forward-port of rev 1183 on the 1.2 branch.

Reviewed-by: ultrotter

6605411d

Change over to beparams · 338e51e8

Iustin Pop authored 16 years ago

This big patch changes the master code to use the beparams. Errors might
have crept in, but it passes a small burnin.

Reviewed-by: ultrotter

338e51e8

Allow instance info to only query the config file · 57821cac

Iustin Pop authored 16 years ago

This patch adds a new '-s' parameter to ‘gnt-instance info’ that makes
it return only 'static' information. This is much faster, especially for
drbd instances.

This is a forward-port of rev 1570 on the ganeti-1.2 branch, resending
due to some conflicts.

Reviewed-by: imsnah

57821cac

Change gnt-instance modify to the hvparams model · 74409b12

Iustin Pop authored 16 years ago

Reviewed-by: imsnah

74409b12

Switch instance hypervisor parameters to hvparams · 6785674e

Iustin Pop authored 16 years ago

This big patch changes instance create to the new hvparams structure.
Old parameters are removed, so old jobs or old instances file will break
current clusters.

Reviewed-by: ultrotter

6785674e

Oct 08, 2008

Move the hypervisor attribute to the instances · e69d05fd

Iustin Pop authored 16 years ago

This (big) patch moves the hypervisor type from the cluster to the
instance level; the cluster attribute remains as the default hypervisor,
and will be renamed accordingly in a next patch. The cluster also gains
the ‘enable_hypervisors’ attribute, and instances can be created with
any of the enabled ones (no provision yet for changing that attribute).

The many many changes in the rpc/backend layer are due to the fact that
all backend code read the hypervisor from the local copy of the config,
and now we have to send it (either in the instance object, or as a
separate parameter) for each function.

The node list by default will list the node free/total memory for the
default hypervisor; a new flag should be added to select another
hypervisor. Instance list has a new field, hypervisor, that shows the
instance hypervisor. Cluster verify runs for all enabled hypervisor
types.

The new FIXMEs are related to IAllocator, since now the node
total/free/used memory counts are wrong (we can't reliably compute the
free memory).

Reviewed-by: imsnah

e69d05fd

Oct 01, 2008

Add new query to get cluster config values · ae5849b5

Michael Hanselmann authored 16 years ago

This can be used to retrieve certain cluster config values from
within clients.

OpDumpClusterConfig was not used anywhere, hence I'm just reusing
it. The way ConfigWriter.DumpConfig returned the configuration
was not thread-safe, anyway (no deepcopy).
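
The thread-safety remark can be illustrated with a small sketch. `ConfigStore` and its two methods are hypothetical stand-ins, not the real ConfigWriter interface; the point is only the difference between handing out the live object and handing out a deep copy.

```python
import copy
import threading


class ConfigStore(object):
    """Hypothetical stand-in for a shared, lock-protected configuration."""

    def __init__(self, data):
        self._data = data
        self._lock = threading.Lock()

    def dump_shared(self):
        # Unsafe: the caller receives the live object and keeps using it
        # after the lock has been released.
        with self._lock:
            return self._data

    def dump_copy(self):
        # Safer: the snapshot is detached from the internal state.
        with self._lock:
            return copy.deepcopy(self._data)


store = ConfigStore({"cluster": {"name": "cluster.example.com"}})
snapshot = store.dump_copy()
# Changing the snapshot does not touch the stored configuration.
snapshot["cluster"]["name"] = "changed"
```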

Reviewed-by: iustinp

ae5849b5

Remove last use of utils.RunCmd from the watcher · 5188ab37

Iustin Pop authored 16 years ago

The watcher has one last use of ganeti commands as opposed to sending
requests via luxi. The patch changes this to use the cli functions.

The patch also has two other changes:
  - fix the docstring for OpVerifyDisks (found out while converting
    this)
  - enable stderr logging on the watcher when “-d” is passed

Reviewed-by: imsnah

5188ab37

Sep 29, 2008

Implement job summary in gnt-job list · 60dd1473

Iustin Pop authored 16 years ago

It is not currently possible to show a summary of the job in the output
of “gnt-job list”. The closest is listing the whole opcode(s), but that
is too verbose. Also, the default output (id, status) is not very
useful, unless one looks for (and knows about) an exact job ID.

The patch adds a “summary” description of a job composed of the list of
OP_ID of the individual opcodes. Moreover, if an opcode has a ‘logical’
target in a certain opcode field (e.g. start instance has the instance
name as the target), then it is included in the formatting also. It's
easier to explain via a sample output:

    gnt-job list
    ID Status  Summary
    1  error   NODE_QUERY
    2  success NODE_ADD(gnta2)
    3  success CLUSTER_QUERY
    4  success NODE_REMOVE(gnta2.example.com)
    5  error   NODE_QUERY
    6  success NODE_ADD(gnta2)
    7  success NODE_QUERY
    8  success OS_DIAGNOSE
    9  success INSTANCE_CREATE(instance1.example.com)
    10 success INSTANCE_REMOVE(instance1.example.com)
    11 error   INSTANCE_CREATE(instance1.example.com)
    12 success INSTANCE_CREATE(instance1.example.com)
    13 success INSTANCE_SHUTDOWN(instance1.example.com)
    14 success INSTANCE_ACTIVATE_DISKS(instance1.example.com)
    15 error   INSTANCE_CREATE(instance2.example.com)
    16 error   INSTANCE_CREATE(instance2.example.com)
    17 success INSTANCE_CREATE(instance2.example.com)
    18 success INSTANCE_ACTIVATE_DISKS(instance1.example.com)
    19 success INSTANCE_ACTIVATE_DISKS(instance2.example.com)
    20 success INSTANCE_SHUTDOWN(instance1.example.com)
    21 success INSTANCE_SHUTDOWN(instance2.example.com)

This is done by a simple change to the opcode classes, which allows an
opcode to format itself. The additional function is small enough that it
can go in opcodes.py, where it could also be used by a client if needed.
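
One way such self-formatting could look is sketched below. The method name `Summary`, the class attribute `OP_DSC_FIELD` and the keyword-argument constructor are assumptions made for this illustration; the message above only states that an opcode can format itself from its OP_ID and an optional target field.

```python
class OpCode(object):
    """Illustrative opcode base class with a self-formatting summary."""
    OP_ID = "OP_ABSTRACT"
    # Name of the field holding the 'logical' target, if any (assumed name).
    OP_DSC_FIELD = None

    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)

    def Summary(self):
        """Return the OP_ID without its prefix, plus the target if known."""
        txt = self.OP_ID
        if txt.startswith("OP_"):
            txt = txt[3:]
        if self.OP_DSC_FIELD is not None:
            target = getattr(self, self.OP_DSC_FIELD, None)
            if target is not None:
                txt = "%s(%s)" % (txt, target)
        return txt


class OpQueryNodes(OpCode):
    OP_ID = "OP_NODE_QUERY"


class OpCreateInstance(OpCode):
    OP_ID = "OP_INSTANCE_CREATE"
    OP_DSC_FIELD = "instance_name"


def job_summary(ops):
    """Join the per-opcode summaries of a job into one string."""
    return ",".join(op.Summary() for op in ops)
```

Because the formatting lives on the opcode class itself, a client holding opcode objects could produce the same summary without asking the master.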

Reviewed-by: imsnah

60dd1473

Sep 01, 2008

Pass the force param to SetInstanceParms · 4300c4b6

Guido Trotter authored 16 years ago

It was already allowed in gnt-instance modify, but ignored.
It will be used to force skipping parameter checks.

This is a forward-port from branches/ganeti-1.2

Original-Reviewed-by: imsnah
Reviewed-by: iustinp

4300c4b6

Aug 29, 2008

Merge r1536 from branches/ganeti/ganeti-1.2 · 5397e0b7

Alexander Schreiber authored 16 years ago

Add HVM device type flags 2/3

Reviewed-by: ultrotter

5397e0b7

Aug 08, 2008

Two small style fixes · 0a7bed64

Michael Hanselmann authored 16 years ago

Reviewed-by: iustinp

0a7bed64

Jul 30, 2008

Rework master startup/shutdown/failover · b1b6ea87

Iustin Pop authored 16 years ago

This (big) patch reworks the master startup/shutdown and fixes the
master failover.

What does the patch do?

For master start/stop:
  - remove the old ganeti-master script and its associated man page
  - moves the ip start/stop directly into the backend.(Start|Stop)Master
  - adds start/stop of the master/rapi daemon into these functions,
    selectively based on the start/stop arguments
  - makes the master call via rpc StartMaster(start_daemons=False) to
    the local node so that the master IP is started
  - and finally changes the example init.d script to directly start and
    stop all three daemons, since they do the right thing (depending on
    master/not master role)

For master failover:
  - moves the code from LUMasterFailover into bootstrap.MasterFailover,
    since we need to start/stop the master during this operation and
    thus it can't be executed from the master
  - removes the LUMasterFailover and its associated opcode

Note: Ubuntu's /etc/lsb-base-logging.sh is dumb, so the messages 'not
master' are not seen during startup on non-master nodes.

Reviewed-by: ultrotter

b1b6ea87

Jul 15, 2008

Documentation updates · a7399f66

Iustin Pop authored 16 years ago

Reviewed-by: imsnah

a7399f66

Rename BaseJO to BaseOpCode · 0e46916d

Iustin Pop authored 16 years ago

Since we don't have for now a job definition object anymore, we rename
this class to BaseOpCode. It's still useful (and not merged with OpCode)
since it holds all the 'pure' logic (no custom field handling, etc.)
whereas OpCode holds opcode specific data (OP_ID handling, etc).

The patch also fixes the module's docstring.

Reviewed-by: imsnah

0e46916d

Jul 09, 2008

Remove old job queue code · 2467e0d3

Michael Hanselmann authored 16 years ago

Reviewed-by: iustinp

2467e0d3

Jun 23, 2008

Fix gnt-cluster “command” and “copyfile” · b3989551

Iustin Pop authored 16 years ago

Since the disabling of forking in the master daemon, the two ssh-based
subcommands were not working anymore. However, there is no need at all
for the commands to be run from the master daemon (permissions to read
the cluster private ssh key notwithstanding), they can be run directly
from the command line utilities.

The patch removes the two opcodes OpRunClusterCommand and
OpClusterCopyFile (and their associated LUs) and changes the code in
‘gnt-cluster’ to query the list of nodes and run directly the SshRunner
over the list. As such, all forking is done from the gnt-cluster script,
and the commands are working again.

Reviewed-by: imsnah

b3989551