Commits · 4bba8e4cb44ca5b357cbcf895c06a9ea4b4677f5 · itminedu / snf-ganeti

May 09, 2011

Add 2 new variables to the OS scripts environment · 519719fd


Add INSTANCE_PRIMARY_NODE and INSTANCE_SECONDARY_NODES. These new
values are useful for OS scripts that need to know the nodes where
the instance lives or has lived.

Signed-off-by: Iustin Pop <iustin@google.com>
[iustin@google.com: fixed small issue with SECONDARY_NODES]
Reviewed-by: Iustin Pop <iustin@google.com>

519719fd
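
A minimal sketch of how an OS script might read the two new variables. It assumes that INSTANCE_SECONDARY_NODES holds a whitespace-separated list of node names, which the commit message does not state; the helper name `instance_nodes` is hypothetical.

```python
import os


def instance_nodes(env=None):
    """Return (primary, secondaries) from an OS-script style environment."""
    env = os.environ if env is None else env
    primary = env.get("INSTANCE_PRIMARY_NODE", "")
    # Assumption: secondary node names are separated by whitespace.
    secondaries = env.get("INSTANCE_SECONDARY_NODES", "").split()
    return primary, secondaries
```

Passing an explicit dictionary instead of `os.environ` keeps the helper easy to exercise outside a real OS script run.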

May 02, 2011

Cluster verify: check for missing bridges · 20d317d4

Iustin Pop authored 14 years ago


Currently cluster verify doesn't check for bridge information; the
only checks are done at instance create and failover/migrate
time. This means a cluster that seems healthy will fail creation jobs.

This patch implements a simple verification that all nodes (across the
entire cluster, so this does not work well for multi-group setups) have
all the required bridges: the default one plus any instance bridge.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

20d317d4
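
The check described above amounts to a set difference per node. A sketch under that reading; the function name and the shape of its inputs are assumptions for illustration, not the code in cmdlib.py:

```python
def missing_bridges(default_bridge, instance_bridges, node_bridges):
    """Map each node name to the sorted list of required bridges it lacks.

    node_bridges: dict of node name -> iterable of bridges present on it.
    Nodes that have every required bridge are left out of the result.
    """
    required = {default_bridge} | set(instance_bridges)
    result = {}
    for node, present in node_bridges.items():
        missing = sorted(required - set(present))
        if missing:
            result[node] = missing
    return result
```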

Apr 20, 2011

Shared storage instance migration · b9187ba2

Apollon Oikonomopoulos authored 14 years ago


Modify LUMigrateInstance and TLMigrateInstance to allow instance migrations for
instances with DTS_EXT_MIRROR disk templates.

Migrations of shared storage instances require either a target node, or an
iallocator to determine the target node. If none is given, the cluster default
iallocator is used.

Locking behaviour: If the iallocator is used, then initially all nodes are
locked and subsequently only the locks on the source node and the target node
selected by the iallocator are retained.

Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
[iustin@google.com: small changes in cmdlib.py]
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

b9187ba2

Add bdev_sizes RPC call · 69266fae

Apollon Oikonomopoulos authored 14 years ago


The bdev_sizes multi-node RPC call returns the sizes of the requested
block devices on the desired nodes. Its intended use is to verify the
existence of a block device on a given node for shared block storage
support.

Block device paths are expected to lie under constants.BLOCKDEV_DIR
("/dev/disk" by default), where persistent symlinks for block devices
are assumed to exist.

Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
[iustin@google.com: small changes in backend.py]
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

69266fae
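
A rough sketch of the idea for a single node: accept only paths under a base directory and report the size of each one that can be opened. The function name is borrowed from the RPC call for readability, but the body is an assumption; in particular it does not verify that the path is a block device, and the default base directory simply mirrors the "/dev/disk" value quoted above.

```python
import os


def bdev_sizes(paths, base_dir="/dev/disk"):
    """Return {path: size_in_bytes} for openable paths under base_dir."""
    base = os.path.normpath(base_dir)
    sizes = {}
    for path in paths:
        # Normalise without resolving symlinks: the persistent links
        # themselves live under base_dir, their targets usually do not.
        norm = os.path.normpath(path)
        if not norm.startswith(base + os.sep):
            continue  # outside the allowed directory: ignored
        try:
            fd = os.open(norm, os.O_RDONLY)
        except OSError:
            continue  # missing or unreadable: not reported
        try:
            # Seeking to the end yields the size in bytes.
            sizes[path] = os.lseek(fd, 0, os.SEEK_END)
        finally:
            os.close(fd)
    return sizes
```

A path missing from the result can then be read as "this device does not exist on this node".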

Core shared file storage support · 53197381

Apollon Oikonomopoulos authored 14 years ago


This patch introduces core shared file storage support, consisting of the following:

A configure-time switch for enabling/disabling shared file storage
support and controlling the shared file storage location:
--with-shared-file-storage-dir=.  Shared file storage configuration is then
available as _autoconf.ENABLE_SHARED_FILE_STORAGE and
_autoconf.SHARED_FILE_STORAGE_DIR and there is a cluster-wide ssconf
key named "shared_file_storage_dir" for changing the file location.

A new disk template named "sharedfile" (DT_SHARED_FILE), using
ganeti.bdev.FileStorage.

Auxiliary functions in lib/config.py to handle shared file storage.

Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
[iustin@google.com: small style fixes]
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

53197381

Feb 03, 2011

backend: Disable compression in export info file · 775b8743

Michael Hanselmann authored 14 years ago


The new import/export infrastructure in Ganeti 2.2 and up handles
compression differently. It no longer writes compressed files to the
destination. Unfortunately changing this behaviour would be non-trivial,
so in the meantime setting “compression = none” will hopefully avoid
some confusion.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

775b8743

Jan 28, 2011

Re-create instance disk symlinks on activate · c417e115

Iustin Pop authored 14 years ago


This patch implements recreation of instance disk symlinks when the
activate-disks operation is run. Until now, it was not possible to
re-create these symlinks without stopping and starting or migrating an
instance as the RPC call where this is done was in instance startup
and migration.

In order to do this, the blockdev_assemble rpc call needs the disk
index too, which is added to the protocol. This is a change from 2.3
and makes instance startup incompatible (FYI).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

c417e115

Jan 27, 2011

cluster verify: add hvparams verification · 58a59652

Iustin Pop authored 14 years ago


Currently, the validity of the hypervisor parameters is only checked
at init/modification time, and not in the cluster verify. This is bad,
as it can lead to inconsistent state that is only detected when the
next modification (which can be unrelated) is made, leading to
unexpected error messages.

This patch adds both syntax verification (in masterd) and validity
verification on remote nodes. The downside of the patch is that on
clusters with many instances which have custom parameters, it will be
slow. A possible improvement would be to detect duplicate, identical
sets of parameters, and collapse these into a single verification, but
that is left as a TODO (in case it becomes problematic).

An additional change is in utils.ForceDict, where we said 'key',
whereas this function is always used with parameter dicts, so I
changed it to "Unknown parameter".

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

58a59652
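
The improvement left as a TODO above (collapsing identical parameter sets into one verification) could look roughly like this. It is an illustration only, not part of the patch; the function name is hypothetical, and parameter values are assumed to be hashable.

```python
def group_identical_params(instance_params):
    """Group instance names by identical parameter dicts.

    instance_params: dict of instance name -> dict of hypervisor parameters.
    Returns a list of (params, [instance names]) pairs, one per distinct set.
    """
    groups = {}
    for name, params in instance_params.items():
        # A sorted tuple of items is hashable and ignores insertion order.
        key = tuple(sorted(params.items()))
        groups.setdefault(key, (params, []))[1].append(name)
    return list(groups.values())
```

Each distinct parameter set would then need to be verified once, with any error attributed to every instance in its group.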

Jan 26, 2011

Verify disks: increase parallelism and other fixes · 397693d3

Iustin Pop authored 14 years ago


The recent work on multi-VG support has converted LUClusterVerifyDisks
into doing serialised calls to each node, as each node can have
different VGs. This is suboptimal, especially for big clusters, where
this LU is executed by the watcher very often.

This patch changes the logic based on the observation that querying a
node for its VGs and then requesting a LV list for those VGs is
equivalent to simply asking for all LVs, without specifying the VG
name(s). So backend.py needs changes to accept an empty VG list, and
the LU itself partially reverts to the previous version.

Additionally, we do two other fixes to this LU:

- small improvement in getting the instance list from the config
- MapLVsByNode works for all disk types, hence no need to restrict to
  the DRBD template, especially as today we can "recreate" disks for
  plain volumes too (the warning message in gnt-cluster is updated
  too)

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

397693d3
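
The observation above can be illustrated with the `lvs` command line: naming no volume group makes `lvs` report the logical volumes of all of them. The helper below is a sketch; its name and the chosen output columns are assumptions, not the code in backend.py.

```python
def build_lvs_command(vg_names):
    """Build an lvs argv; an empty vg_names list means 'all VGs'."""
    cmd = ["lvs", "--noheadings", "--units=m", "--nosuffix",
           "--separator=|", "-ovg_name,lv_name,lv_size,lv_attr"]
    # With no VG arguments, lvs lists the LVs of every volume group.
    return cmd + list(vg_names)
```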

Jan 20, 2011

Improve import/export timeout settings · 4478301b

Michael Hanselmann authored 14 years ago


With this patch, the exporting node will retry connecting a few times.
The receiving node will make use of the master's increased timeout (see
previous patch).

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

4478301b

Jan 11, 2011

utils: Move I/O-related code into separate file · 3865ca48

Michael Hanselmann authored 14 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

3865ca48

Fix a typo in backend.py · 0f39886a

René Nussbaumer authored 14 years ago


Sorry I thought I did run commit-check but must not have paid attention
to its output. There was a typo in the docstring. This patch fixes this.

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

0f39886a

Add backend method for pause/resume sync of devices · 5119c79e

René Nussbaumer authored 14 years ago


Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

5119c79e

Jan 05, 2011

Adding additional VerifyNode checks to backend · 16f41f24

René Nussbaumer authored 14 years ago


This adds checks for out of band support. The helpers have to exist and
they have to be executable.

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

16f41f24
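
The two conditions named above (the helper exists, the helper is executable) can be checked as follows. This is a sketch; the function name and the error strings are made up for illustration.

```python
import os


def check_helper(path):
    """Return an error string, or None if path is an executable file."""
    if not os.path.isfile(path):
        return "helper %s is missing or not a regular file" % path
    if not os.access(path, os.X_OK):
        return "helper %s is not executable" % path
    return None
```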

Dec 21, 2010

Allow customisation of the disk index separator · 3536c792

Iustin Pop authored 14 years ago


As per issue 124, some Xen versions (or packaging) don't deal nicely
with the colon being part of a disk name. Therefore we add a
configure-time option for customising this.

Note: setting the separator to interesting values like / is not
handled by the code. This being a configure-time option (e.g. to be
set by distribution packagers), we assume the person building the code
knows what they are doing.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

3536c792
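
To make the effect concrete, here is a sketch of how a disk name could be derived from an instance name and a disk index. The constant and function names are hypothetical; the constant stands in for the configure-time value.

```python
DISK_SEPARATOR = ":"  # assumption: placeholder for the configure-time value


def disk_link_name(instance_name, idx, sep=DISK_SEPARATOR):
    """Return the per-disk name, e.g. 'inst1.example.com:0'."""
    return "%s%s%d" % (instance_name, sep, idx)
```

With `sep="/"` the result would contain a path separator, which is the kind of value the note above says the code does not handle.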

Dec 02, 2010

Make snapshots multi-vg aware · 800ac399

Iustin Pop authored 14 years ago

Currently, the Snapshot() function of LogicalVolume returns only the
logical volume path, with the assumption that we only have one VG. But
with the recent changes, it makes more sense to return the full data (vg
and lv) from it, so as to not require computing it in the master.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

800ac399

Make rpc.call_lv_list() VG-aware · 84d7e26b

Dmitry Chernyak authored 14 years ago


Changes to backend.GetVolumeList():
- now accepts a list of VGs instead of one VG
- returns LV names in the form "vg_name/lv_name"

Corresponding changes are done in: VerifyDisks, VerifyNode,
LUCreateInstance (for both disk creation and adoption cases)

Now the syntax
"gnt-instance add ... --disk N:adopt=LV_NAME,vg=VG_NAME"
as was described earlier in the man page works.

Signed-off-by: Dmitry Chernyak <dmi.chernyak@gmail.com>
[iustin@google.com: QA changes for reserved LVs, style fixes and a few
 extra error checks, reviewed by hansmi/rn]
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

84d7e26b
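
A sketch of the new naming scheme: each logical volume is keyed as "vg_name/lv_name". The input format (one `vg|lv|size` record per line) and the function name are assumptions for illustration, not the actual output handling in backend.GetVolumeList().

```python
def parse_lv_list(output, sep="|"):
    """Parse 'vg<sep>lv<sep>size' lines into {"vg/lv": size_string}."""
    result = {}
    for line in output.splitlines():
        fields = [field.strip() for field in line.split(sep)]
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        vg, lv, size = fields[:3]
        result["%s/%s" % (vg, lv)] = size
    return result
```

Qualifying each name with its VG keeps two volumes with the same LV name in different VGs from colliding.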

Dec 01, 2010

Adding backend functionality to call oob helper · b2f29800

René Nussbaumer authored 14 years ago


Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

b2f29800

Nov 29, 2010

backend: Add support for IPv6 in import/export · 855d2fc7

Michael Hanselmann authored 14 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

855d2fc7

Nov 28, 2010

Move compilation of some regexes to init time · 0b5303da

Iustin Pop authored 14 years ago


I have found a few regexes which are static and whose creation can thus
be moved to load time rather than run time.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

0b5303da
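
The pattern described above, in a minimal form. The regex and the names are hypothetical; only the placement of `re.compile` matters.

```python
import re

# Compiled once, when the module is loaded, instead of on every call.
_VALID_NAME_RE = re.compile(r"[a-zA-Z0-9._-]+")  # hypothetical pattern


def is_valid_name(name):
    """Return True if the whole string matches the precompiled pattern."""
    return _VALID_NAME_RE.fullmatch(name) is not None
```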

Nov 26, 2010

RPC call_node_info: change protocol · cb6a0296

Iustin Pop authored 14 years ago


Currently, the call_node_info RPC always checks both the VG free
space and the hypervisor information. However, in ⅔ of the uses, we only
care about one or the other. Therefore, we change it so that if any of
the passed parameters is None, we don't perform the respective check. We
also modify its callers to only pass in what they need.

This also helps if the "default" hypervisor is broken and we want to
create an instance for another hypervisor.

With this patch, the duration of this rpc changes from 500ms to 90ms for
a normal LVM+Xen PVM node, when we only require the LVM data; when we
only require the hypervisor data, it doesn't change (as the “xm list”
time is dominant).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

cb6a0296
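
The protocol change can be sketched as follows: a check runs only when its parameter is not None. The two query callables are injected here so that the example stays self-contained; they and the function name are assumptions, not the backend code.

```python
def node_info(vg_name, hypervisor_name, get_vg_info, get_hv_info):
    """Collect node data, skipping any check whose parameter is None."""
    result = {}
    if vg_name is not None:
        result.update(get_vg_info(vg_name))
    if hypervisor_name is not None:
        result.update(get_hv_info(hypervisor_name))
    return result
```

A caller that only needs LVM data passes `None` as the hypervisor name and never pays for the hypervisor query.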

Nov 03, 2010

Fix disk checks in “gnt-cluster verify” · c6a9dffa

Michael Hanselmann authored 14 years ago


Tests have shown that the changes in commit b8d26c6e don't work as
wanted. If any disk wasn't found on the node, all disks located on the
same node would show as faulty. The cause was incorrect exception
handling on the node.

This patch changes the RPC call to return a per-disk success/error
status, avoiding the problem.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Luca Bigliardi <shammash@google.com>

c6a9dffa
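
A sketch of the per-disk status idea: the exception is caught inside the loop, so one missing disk no longer marks every disk on the node as faulty. The names are hypothetical, and `find_disk` stands in for whatever lookup the node performs.

```python
def check_disks(disks, find_disk):
    """Return one (success, payload) pair per disk, in input order."""
    results = []
    for disk in disks:
        try:
            results.append((True, find_disk(disk)))
        except Exception as err:  # broad on purpose: report, do not abort
            results.append((False, str(err)))
    return results
```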

Oct 28, 2010

Add support for vm_capable in cluster verify · 8964ee14

Iustin Pop authored 14 years ago


The method to make vm_capable integrate easily into cluster verify is as follows:

- we add a new NV_VMNODES that represents *non*-vm-capable nodes
- the LU populates this list (it's expected that non-vm_capable nodes
  are few compared to vm_capable nodes)
- backend skips the checks that are related to VM hosting
- in the LU, we reorder the VM-related checks so that they occur after
  the non-VM (generic) tests, and we only execute them conditionally

Additionally, we add some support to the instance checks to detect
instances living on bad nodes.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

8964ee14

Oct 26, 2010

Second iteration over backend.BlockdevWipe · da63bb4e

René Nussbaumer authored 14 years ago


This patch now uses dd entirely to wipe the disk, making it
much easier to wipe in blocks so we can give interactive feedback
about the status.

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

da63bb4e
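
Wiping in blocks could be sketched as a generator of `dd` invocations, one per chunk, each paired with the progress reached once that chunk is written. The chunk size, the function name and the exact set of `dd` operands are assumptions for illustration; the commands are only built here, not executed.

```python
def wipe_commands(path, size_mib, chunk_mib=1024):
    """Yield (dd_argv, percent_done_after_chunk) for each chunk."""
    if size_mib <= 0 or chunk_mib <= 0:
        raise ValueError("sizes must be positive")
    offset = 0
    while offset < size_mib:
        count = min(chunk_mib, size_mib - offset)
        # seek and count are expressed in units of bs (1 MiB here).
        argv = ["dd", "if=/dev/zero", "of=%s" % path, "bs=1M",
                "seek=%d" % offset, "count=%d" % count, "oflag=direct"]
        offset += count
        yield argv, 100.0 * offset / size_mib
```

A caller would run each command in turn and report the percentage after it completes.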

Oct 25, 2010

Simplify and extend the instance OS env · f2165b8a

Iustin Pop authored 14 years ago


Some parameters were missing (uuid, c/mtime). We simplify the export
method; unfortunately we cannot simply iterate over __slots__ since the
mapping is not 1:1.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

f2165b8a

Oct 22, 2010

backend.Upload: switch to utils.SafeWriteFile · 8f065ae2

Iustin Pop authored 14 years ago


This allows serialization of updates to a given file, with respect to
other cooperating writers.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

8f065ae2
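
One common way to serialise cooperating writers is an advisory lock combined with an atomic rename. The sketch below shows that general technique on POSIX systems; it is not the implementation of utils.SafeWriteFile, and the function name and the sidecar ".lock" file are assumptions.

```python
import fcntl
import os
import tempfile


def locked_write(path, data):
    """Atomically replace path with data (bytes) under an advisory lock."""
    lock_fd = os.open(path + ".lock", os.O_CREAT | os.O_RDWR, 0o600)
    try:
        # Blocks until any other cooperating writer releases the lock.
        fcntl.flock(lock_fd, fcntl.LOCK_EX)
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
        try:
            with os.fdopen(fd, "wb") as tmp_file:
                tmp_file.write(data)
            os.replace(tmp, path)  # atomic rename on POSIX
        except BaseException:
            os.unlink(tmp)  # the temporary file is still there on failure
            raise
    finally:
        os.close(lock_fd)  # closing the descriptor releases the lock
```

Writers that do not take the lock are not held back by it, which is why the text above speaks of cooperating writers.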

Adding backend method to wipe a block device · 69dd363f

René Nussbaumer authored 14 years ago


Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

69dd363f

Sep 30, 2010

Abstract OS name/variant functions · 870dc44c

Iustin Pop authored 14 years ago


Currently, the computation of the 'pure' name or the variant is
hardcoded and spread around the functions that need it. This is not
nice, and in the future we'd spread it even more with more usage of
variants/pure os names.

This patch abstracts these functions into the OS class, and then
replaces the hardcoded uses with the new functions.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

870dc44c
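
A sketch of the two helpers being abstracted. It assumes that the full OS name has the form "name+variant" with "+" as the separator, which this commit message does not spell out; the function names are hypothetical.

```python
def os_name(full_name):
    """Return the 'pure' OS name, without any variant."""
    return full_name.split("+", 1)[0]


def os_variant(full_name):
    """Return the variant, or an empty string if none is given."""
    parts = full_name.split("+", 1)
    return parts[1] if len(parts) > 1 else ""
```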

Sep 23, 2010

Migrate call from backend._GetVGInfo to bdev.LogicalVolume.GetVGInfo · 673cd9c4

René Nussbaumer authored 14 years ago


This patch removes duplicate code found in backend which also needs to
get VG information. To make it simpler we moved the call to
bdev.LogicalVolume.GetVGInfo.

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

673cd9c4

Sep 13, 2010

Fix OS_VARIANT variable setting · a025e535

Vitaly Kuznetsov authored 14 years ago


This was introduced in efaa9b06.

In OSCoreEnv, inst_os.name is the pure operating system name (without
variant), as the variant is stripped in OSFromDisk(). So we always get
variant = inst_os.supported_variants[0] (the first variant in the
variants list). Adding an os_name argument with the full name (including
variant) solves this problem.

Signed-off-by: Vitaly Kuznetsov <vitty@altlinux.ru>
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
[modified by iustin to handle the call to OSCoreEnv from ValidateOS too]
Reviewed-by: Michael Hanselmann <hansmi@google.com>

a025e535

Sep 07, 2010

Move job queue to new ganeti.runtime · 82b22e19

René Nussbaumer authored 14 years ago


Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

82b22e19

Sep 03, 2010

Log warning instead of raising OpExecError for ndisc6 · 2dc1237c

Manuel Franceschini authored 14 years ago


Signed-off-by: Manuel Franceschini <livewire@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

2dc1237c

Aug 23, 2010

Add RPC calls to update /etc/hosts · 19ddc57a

René Nussbaumer authored 14 years ago


Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

19ddc57a

Aug 20, 2010

Use family in backend.StartMaster · d8e0caa6

Manuel Franceschini authored 14 years ago


This patch changes the StartMaster method to consult the cluster
primary ip version when deciding whether to use arping or ndisc6 after
activating the master ip.

Signed-off-by: Manuel Franceschini <livewire@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

d8e0caa6

Aug 19, 2010

Removing all ssh setup code from the core · e8d61457

René Nussbaumer authored 14 years ago


Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

e8d61457

Support IPv6 cluster init · e7323b5e

Manuel Franceschini authored 14 years ago


Signed-off-by: Manuel Franceschini <livewire@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

e7323b5e

Aug 18, 2010

Support for resolving hostnames to IPv6 addresses · b705c7a6

Manuel Franceschini authored 14 years ago


This patch enables IPv6 name resolution by using socket.getaddrinfo
instead of socket.gethostbyname_ex.

It renames the HostInfo class to Hostname and unifies its use throughout
the code. This is achieved by using static calls where no object is
needed; the patch also removes some obsolete code.

For now, we just resolve to IPv4 addresses, but this will change once it
is needed.

Signed-off-by: Manuel Franceschini <livewire@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

b705c7a6
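
A small sketch of name resolution with socket.getaddrinfo, which can return both IPv4 and IPv6 results depending on the requested family. The wrapper name is hypothetical and this is not the Hostname class itself.

```python
import socket


def resolve(hostname, family=socket.AF_UNSPEC):
    """Return the distinct addresses getaddrinfo reports for hostname."""
    infos = socket.getaddrinfo(hostname, None, family, socket.SOCK_STREAM)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        # sockaddr[0] is the address string for both IPv4 and IPv6.
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses
```

Passing `socket.AF_INET` restricts the result to IPv4, which matches the "for now" behaviour described above.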

Introduce new IPAddress classes · 8b312c1d

Manuel Franceschini authored 14 years ago


This patch unifies the netutils functions dealing with IP addresses into
three classes:
- IPAddress: Common IP address functionality
- IPv4Address: IPv4-specific functionality
- IPv6Address: IPv6-specific functionality

Furthermore it adds methods to check whether an address is a loopback
address, replacing the .startswith("127") for IPv4 and adding IPv6
support.

It also provides the basis for future IPv6 address handling. Methods to
convert IP strings to their corresponding integer values will allow
canonicalizing IPv6 addresses.

Signed-off-by: Manuel Franceschini <livewire@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

8b312c1d
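
The operations mentioned above (loopback check, conversion to an integer, canonical form) can be illustrated with Python's standard `ipaddress` module. This is a swapped-in stand-in for illustration, not the classes this patch adds to netutils; the function names are hypothetical.

```python
import ipaddress


def is_loopback(address):
    """True for 127.0.0.0/8 (IPv4) and ::1 (IPv6)."""
    return ipaddress.ip_address(address).is_loopback


def to_int(address):
    """Return the integer value of an IPv4 or IPv6 address string."""
    return int(ipaddress.ip_address(address))


def canonical(address):
    """Return the compressed, canonical text form of the address."""
    return str(ipaddress.ip_address(address))
```

Two spellings of the same IPv6 address yield the same integer, which is what makes canonicalisation through the integer value possible.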

Jul 29, 2010

Instance migration: remove error on missing link · b8ebd37b

Iustin Pop authored 14 years ago


Since we don't support upgrades from 1.2.4 without restarting the
instance, the 'not restarted since 1.2.5' check/error is
wrong/misleading.

Since live migration works without the links (it recreates them during
the disk reconfiguration anyway), we remove the check and transform it
into a warning (to the node daemon log only, unfortunately).

For 2.3, we'll need to change the symlink creation from instance start
time to disk activation time (but that requires more RPC changes).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

b8ebd37b

Jul 26, 2010

Change the meaning of call_node_start_master · 91492e57

Iustin Pop authored 14 years ago


Currently, backend.StartMaster (the function behind this RPC call) will
activate the master IP and then, if the start_daemons parameter is true,
it will also activate the master role.

While this works, it has two issues:

- first, it will activate the master IP unconditionally, even if this
  node will not start the master daemon due to missing votes
- second, the activation of the IP is done twice if start_daemons is
  true, because the master daemon does its own activation too

This behaviour seems to be unmodified since Summer 2008, so probably any
rationale on why this is done in two places is forgotten.

The patch changes this so that the function does *either* IP activation
or master role activation, but not both. So the IP will be activated only
once (from the master daemon or from LURenameCluster), and it will only
be done if the masterd got enough votes for startup.

I can see only one downside to this change: if masterd won't actually
start (due to missing votes), RAPI will still start, and without the
master IP activated. But this is no worse than before, when both RAPI
was running and the IP was activated.

Note that the behaviour of StopMaster remains the same, as no one else
does the IP removal.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

91492e57