  1. Apr 12, 2010
    • Add an identify-defaults option for import · e588764d
      Iustin Pop authored
      
      When importing an instance, all the saved values will be used as
      explicitly specified values, overriding the cluster defaults. This means
      export+import will change the status of parameters from default to
      explicitly specified.
      
      This patch adds a new option that changes the behaviour to identify
      parameter values which are equal to the current cluster defaults and
      mark them as such. It does this for hv, be and nic parameters.
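
      A minimal, hypothetical sketch of that comparison (not the actual
      Ganeti code): values equal to the current cluster default are dropped,
      so the parameter keeps its "default" status after the import.

        def identify_defaults(imported_params, cluster_defaults):
            # Keep only the parameters that differ from the cluster defaults;
            # the rest fall back to (and track) the defaults.
            return {name: value
                    for name, value in imported_params.items()
                    if cluster_defaults.get(name) != value}

        explicit = identify_defaults({"vcpus": 1, "memory": 2048},
                                     {"vcpus": 1, "memory": 512})
        assert explicit == {"memory": 2048}  # vcpus stays a cluster default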
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
  2. Mar 15, 2010
    • Implement conversion from plain to drbd · e29e9550
      Iustin Pop authored
      
      This patch adds a new mode to instance modify, the changing of the disk
      template. For now only plain to drbd conversion is supported, and the
      new secondary node must be specified manually (no iallocator support).
      
      The procedure for conversion works as follows:
      
      - a completely new disk template is created, matching the count, size
        and mode of the instance's current disks
      - we manually create (not via _CreateDisks) all the missing volumes
      - we rename the LVs on the primary to their new names
      - we manually create the DRBD devices
      
      Failures during the creation of volumes will leave orphan volumes.
      Failure during the rename might leave some disks renamed and some not,
      leading to an inconsistent instance.
      
      Once the disks are renamed, we update the instance information and wait
      for resync. Any failures of the DRBD sync must be manually handled (like
      a normal failure, e.g. by running replace-disks, etc.).
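
      The ordering above, and where a failure leaves the instance, can be
      summarised in a toy, self-contained sketch (stand-in helpers only, no
      real LVM/DRBD calls):

        def create_missing_volumes(disks):
            print("creating %d new volumes" % len(disks))  # failure: orphan volumes

        def rename_lvs_on_primary(disks):
            print("renaming LVs on the primary")  # partial failure: inconsistent instance

        def create_drbd_devices(disks):
            print("assembling DRBD devices")

        def wait_for_resync(disks):
            print("waiting for resync; sync errors are handled manually")

        def convert_plain_to_drbd(disks):
            create_missing_volumes(disks)
            rename_lvs_on_primary(disks)
            create_drbd_devices(disks)
            wait_for_resync(disks)

        convert_plain_to_drbd(["disk/0", "disk/1"])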
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
  3. Mar 09, 2010
    • Rework the node modify for mc-demotion · 601908d0
      Iustin Pop authored
      
      The current code in LUSetNodeParms regarding the demotion from master
      candidate role is complicated and duplicates the code in ConfigWriter,
      where such decisions should be made. Furthermore, we still cannot demote
      nodes (not even with force), if other regular nodes exist.
      
      This patch adds a new opcode attribute ‘auto_promote’, and changes the
      decision tree as follows:
      
      - if the node will be set to offline or drained or explicitly demoted
        from master candidate, and this parameter is set, then we lock all
        nodes in ExpandNames()
      - later, in CheckPrereq(), if the node is
        indeed a master candidate, and the future state (as computed via
        GetMasterCandidateStats with the current node in the exception list)
        has fewer nodes than it should, and we didn't lock all nodes, we exit
        with an exception
      - in Exec, if we locked all nodes, we do an AdjustCandidatePool() run,
        to ensure nodes are locked as needed (we do it before updating the
        node, which removes a warning and avoids leaving an inconsistent
        state if the LU fails between the two steps)
      
      Note that in Exec we run the AdjustCP irrespective of any node state
      change (just based on lock status), so we might simplify CheckPrereq
      even more by not checking the future state, basically requiring
      auto_promote/lock_all for master candidates, since the case where we
      have more master candidates than needed is rarer; OTOH, this would
      prevent manual promotion of another node ahead of time, which is why I
      didn't choose this way.
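
      A toy sketch of the CheckPrereq decision described above, with made-up
      names (the real logic lives in the LU's ExpandNames/CheckPrereq/Exec
      and in ConfigWriter's GetMasterCandidateStats):

        class PrereqError(Exception):
            pass

        def check_mc_demotion(is_master_candidate, future_candidates,
                              needed_candidates, locked_all_nodes):
            # Refuse the demotion if the candidate pool would become too
            # small and we did not lock all nodes (i.e. no auto_promote).
            if (is_master_candidate
                    and future_candidates < needed_candidates
                    and not locked_all_nodes):
                raise PrereqError("not enough master candidates; "
                                  "use auto_promote to lock all nodes")

        # With auto_promote all nodes are locked and Exec later runs
        # AdjustCandidatePool, so the demotion is allowed.
        check_mc_demotion(True, future_candidates=2, needed_candidates=3,
                          locked_all_nodes=True)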
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
    • Add support for per-os-hypervisor parameters · 17463d22
      René Nussbaumer authored
      
      This patch implements all modifications to support per-os-hypervisor
      parameters in the framework.
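
      Purely as an illustration of what per-os-hypervisor parameters mean
      (the layout and names below are assumptions, not Ganeti's actual data
      model), per-OS values override the cluster-wide hypervisor parameters:

        cluster_hvparams = {"xen-pvm": {"kernel_path": "/boot/vmlinuz",
                                        "root_path": "/dev/xvda1"}}
        os_hvparams = {"debian-image": {"xen-pvm": {"root_path": "/dev/xvda2"}}}

        def effective_hvparams(os_name, hypervisor):
            # Start from the cluster defaults, then apply the per-OS override.
            params = dict(cluster_hvparams.get(hypervisor, {}))
            params.update(os_hvparams.get(os_name, {}).get(hypervisor, {}))
            return params

        assert effective_hvparams("debian-image", "xen-pvm")["root_path"] == "/dev/xvda2"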
      
      Signed-off-by: René Nussbaumer <rn@google.com>
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
  4. Feb 10, 2010
    • Fix dumpers/loaders after __slots__ cleanup · adf385c7
      Iustin Pop authored
      
      Commit 154b9580 changed (correctly) the __slots__ usage, but this broke
      the dumpers/loaders, since we relied directly on the class's own
      __slots__ field.
      
      To compensate, we introduce a simple function for computing the slots
      across all parent classes (if any), and use this instead of __slots__
      directly.
      
      Note: the _all_slots() function is duplicated between objects.py and
      opcodes.py, but the only other option is to introduce a lang.py for
      such very basic language items.
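
      A minimal sketch of such a helper, assuming the usual approach of
      walking the MRO and collecting each class's own __slots__ (hypothetical
      code, not the exact Ganeti implementation):

        def _all_slots(obj):
            # Collect the slot names declared by obj's class and all parents.
            slots = []
            for cls in type(obj).__mro__:
                slots.extend(getattr(cls, "__slots__", ()))
            return slots

        class Base(object):
            __slots__ = ["name"]

        class Child(Base):
            __slots__ = ["size"]

        assert set(_all_slots(Child())) == {"name", "size"}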
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
  5. Feb 09, 2010
    • Add an early release lock/storage for disk replace · 7ea7bcf6
      Iustin Pop authored
      
      This patch adds an early_release parameter in the OpReplaceDisks and
      OpEvacuateNode opcodes, allowing earlier release of storage and more
      importantly of internal Ganeti locks.
      
      The behaviour of the early release is that any locks and storage on all
      secondary nodes are released early. This is valid for change secondary
      (where we remove the storage on the old secondary, and release the locks
      on the old and new secondary) and replace on secondary (where we remove
      the old storage and release the lock on the secondary node).
      
      Using this, on a three-node setup:
      
      - instance1 on nodes A:B
      - instance2 on nodes C:B
      
      It is possible to run in parallel a replace-disks -s (on secondary) for
      instances 1 and 2.
      
      Replace on primary will remove the storage, but not the locks, as we use
      the primary node later in the LU to check consistency.
      
      It is debatable whether to also remove the locks on the primary node,
      thus letting replace-disks hold zero locks during the sync. While this
      would allow greatly enhanced parallelism, let's first see how the
      removal of secondary locks works.
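
      A toy illustration (made-up names) of which node locks a secondary
      replace holds during the long resync, with and without early_release:

        def locks_held_during_sync(primary, secondary, early_release):
            held = {primary, secondary}
            if early_release:
                held.discard(secondary)   # storage freed, lock dropped early
            return held

        # instance1 on A:B and instance2 on C:B can resync in parallel only
        # because both operations drop the lock on the shared secondary B.
        assert locks_held_during_sync("A", "B", early_release=True) == {"A"}
        assert locks_held_during_sync("C", "B", early_release=True) == {"C"}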
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
  6. Nov 02, 2009
    • Some improvements to gnt-node repair-storage · 7e9c6a78
      Iustin Pop authored
      
      Currently the repair-storage operation has two issues:

      - down instances abort the operation, even though they should be
        ignored (it's not technically possible to know their disk status
        without activating their disks)
      - if the VG is so broken that disks cannot be activated via gnt-instance
        activate-disks or gnt-instance startup, it's not possible to repair
        the VG at all
      
      The patch makes the opcode skip down instances and also introduces an
      ``--ignore-consistency`` flag for forcing the execution of the LU.
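
      A hedged sketch of the two behavioural changes, with hypothetical names
      (the real checks live in the repair-storage LU): down instances are
      skipped, and --ignore-consistency lets the repair proceed even when an
      instance's disks cannot be activated:

        def check_repair_allowed(instances, ignore_consistency=False):
            for inst in instances:
                if not inst["running"]:
                    continue              # down instance: skip, don't abort
                if not inst["disks_active"] and not ignore_consistency:
                    raise RuntimeError("disks of %s are not active; use "
                                       "--ignore-consistency" % inst["name"])

        # A down instance no longer aborts the operation.
        check_repair_allowed([{"name": "inst1", "running": False,
                               "disks_active": False}])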
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
  7. Sep 17, 2009
    • Add an error-simulation mode to cluster verify · a0c9776a
      Iustin Pop authored
      
      One of the issues we have in ganeti is that it's very hard to test the
      error-handling paths; QA and burnin only test the OK code-path, since
      it's hard to simulate errors.
      
      LUVerifyCluster is special amongst the LUs in that a) it has a lot of
      error paths and b) the error paths only log the error; they don't do
      any rollback or other similar actions. Thus, it's enough for this LU
      to separate the testing of the error condition from the logging of the
      error condition.
      
      This patch does this by replacing code blocks of the form:
      
        if x:
          log_error()
          [y]
      
      into:
      
        log_error_if(x)
        [if x:
          y
        ]
      
      After this change, it's simple enough to turn on logging of all errors
      by adding a special case inside log_error_if such that if the incoming
      opcode has a special ‘debug_simulate_errors’ attribute set to true, it
      will log the error unconditionally.
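
      A self-contained sketch of the pattern (hypothetical helper; the real
      LU stores the error state on the LU instance rather than in a list):

        class VerifyState(object):
            def __init__(self, simulate_errors=False):
                self.simulate_errors = simulate_errors
                self.bad = False
                self.messages = []

            def error_if(self, condition, fmt, *args):
                # Log the error when the condition holds, or unconditionally
                # when error simulation is enabled.
                if condition or self.simulate_errors:
                    self.bad = True
                    self.messages.append(fmt % args)

        state = VerifyState(simulate_errors=True)
        state.error_if(False, "drbd minor %d of instance %s is not active",
                       1, "inst1")
        assert state.messages  # simulation exercises every error path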
      
      Surprisingly this also turns into an absolute code reduction, since some
      of the if blocks were simplified. The only downside to this patch is
      that the various _VerifyX() functions are now stateful (modifying an
      attribute on the LU instance) instead of returning a boolean result.
      
      Last note: yes, this discovered some error cases in the logging.
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
    • Introduce parseable error codes in LUVerifyCluster · 7c874ee1
      Iustin Pop authored
      
      Currently the output of cluster verify can be parsed for 'ERROR'
      messages, but that is the only indication we get (error or no error). In
      order to allow monitoring tools to separate different error conditions,
      this patch introduces a new output format (“gnt-cluster verify
      --error-codes”) that changes the output from human-friendly to
      machine-friendly. In this mode, an error line changes from:
        ERROR: node node1: drbd minor 1 of instance inst1 is not active
      
      to:
        ERROR:ENODEDRBD:node:node1:drbd minor 1 of instance inst1 is not active
      
      i.e. the error message is a ‘:’-separated list of fields, with ERROR
      first, the error code second, the object type (cluster, node,
      instance) third, the name of the object (for nodes/instances) fourth,
      and then the free-text message.
      
      The patch also removes some of the verbosity of the operation
      (“Verifying instance X”, “Verifying node X”) since on big clusters these
      informational messages can quickly fill up an entire screen. The
      original behaviour can be restored via the ‘--verbose’ option.
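
      Since the format is machine-friendly, a monitoring tool can split the
      fields directly; a minimal parsing sketch following the field order
      described above:

        def parse_error_line(line):
            # ERROR:<code>:<object type>:<object name>:<free-text message>
            tag, code, obj_type, obj_name, message = line.split(":", 4)
            assert tag == "ERROR"
            return {"code": code, "type": obj_type,
                    "name": obj_name, "message": message}

        parsed = parse_error_line("ERROR:ENODEDRBD:node:node1:"
                                  "drbd minor 1 of instance inst1 is not active")
        assert parsed["code"] == "ENODEDRBD" and parsed["name"] == "node1"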
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>