Commits · e3443b36a580a1eb5fdb1a34d46e501e82bd3da9 · itminedu / snf-ganeti

Aug 04, 2009

Add ignore size support in _AssembleInstanceDisks · e3443b36

Iustin Pop authored 15 years ago


This patch adds an optional parameter to _AssembleInstanceDisks that
allows ignoring of size information by making a copy of the disk
structure and setting the size to zero.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

e3443b36

Add an objects.Disk.UnsetSize() method · a805ec18

Iustin Pop authored 15 years ago


This method recursively resets the size of the disk and its children to
zero.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

a805ec18
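The recursive reset described in a805ec18, combined with the throw-away copy used by _AssembleInstanceDisks, could look roughly like the sketch below. The Disk class here is a hypothetical stand-in, not the real objects.Disk, and the use of copy.deepcopy is an assumption made only to keep the example self-contained.

```python
import copy
from dataclasses import dataclass, field
from typing import List


@dataclass
class Disk:
    """Hypothetical stand-in for objects.Disk: a size plus child disks."""
    size: int
    children: List["Disk"] = field(default_factory=list)

    def unset_size(self) -> None:
        """Recursively reset the size of this disk and its children to zero."""
        for child in self.children:
            child.unset_size()
        self.size = 0


def copy_ignoring_size(disk: Disk) -> Disk:
    """Return a throw-away copy with all sizes zeroed (assumed deep copy).

    The original disk structure is left untouched.
    """
    clone = copy.deepcopy(disk)
    clone.unset_size()
    return clone
```

Working on a copy means the recorded sizes in the configuration stay as they were; only the structure handed to the assemble step carries the zero sizes.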

bdev: allow ignoring of size in Assemble() · 60bca04a

Iustin Pop authored 15 years ago


This patch changes the DRBD8 class (the only one to use the size in
Assemble) to ignore the size in Assemble when a zero size is passed.
This will allow activation of disks even when the size recorded in the
configuration is wrong.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

60bca04a

Fix instance import net option · dc922da0

Iustin Pop authored 15 years ago


This is identical to dc30b0e4 but applied to gnt-backup. Thanks to user
ocaner for catching it.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

dc922da0

Simplify the devel/upload script · c5159571

Iustin Pop authored 15 years ago


Instead of multiple uploads to each node, this script copies everything
it needs into a temporary directory, laid out exactly as it is to be
installed on the destination machine, then runs only one rsync per host.

This is more dangerous (we can break /etc now), but for development
machines it is fine.

The patch then also uploads the bash completions and the current name
for the cron job (I think that ganeti-master-cron is a deprecated name,
not that someone actually intends to upload a file named like that). A
flag --no-cron is added to skip uploading the cron file if desired.

The patch also changes rsync to propagate the file permissions.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

c5159571

Aug 03, 2009

Add a Copy method to objects.ConfigObject · e8d563f3

Iustin Pop authored 15 years ago


This small patch adds a simple Copy method that can be used for
'throw-away' copies of objects.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

e8d563f3

Jul 29, 2009

Extend call_node_start_master rpc with no_voting · 2503680f

Guido Trotter authored 15 years ago


When the parameter is set to True and start_daemons is also True,
ganeti-masterd will be started with the new --no-voting --yes-do-it
options.

This new option is set to True only on masterfailover, when no_voting is
used. This changes the behavior from 2.0, where we didn't start the
master daemon at all when this option was used.

The manpage is also updated to remove the 2.0 only change.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

2503680f

Jul 17, 2009

Update NEWS and version for 2.0.2 release · 550a995a

Iustin Pop authored 15 years ago


Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

550a995a

Jul 16, 2009

Improve the description of node flags in man page · 253ba78f

Raiford Storey authored 15 years ago


[iustin@google.com: slightly reworded the explanation for offline and
changed the commit message]
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

253ba78f

Change default stripe count to 1 · 7b3ac94d

Iustin Pop authored 15 years ago


In order not to change the default during a stable series, we modify
configure.ac to default to one stripe, in effect keeping the status quo
(well, minus the LVM Attach() changes).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

7b3ac94d

Use full-stripe size in LVM growth · 38256320

Iustin Pop authored 15 years ago


LVM has issues when growing striped volumes, so it's best to specify
the growth in exact multiples of the full stripe size (as precisely as
possible). For this we need to make a couple of changes:
  - in LVM Attach(), we additionally query the VG extent size and the LV
    stripe count; since this makes lvs return a (possibly) multi-line
    output, we now split it into lines and only take the last one
  - in LVM Grow(), we round up the increase to a multiple of the full
    stripe size

The patch also sets the correct target size in DRBD growth.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>

38256320
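The rounding described for LVM Grow() in 38256320 can be sketched as follows. This is a minimal illustration, assuming the full stripe size is the VG extent size multiplied by the LV stripe count and that all values share one unit; the function and parameter names are hypothetical.

```python
def round_up_to_full_stripe(amount: int, extent_size: int, stripes: int) -> int:
    """Round a growth amount up to a multiple of the full stripe size.

    All sizes use the same unit (for example MiB). The full stripe size
    is assumed to be extent_size * stripes.
    """
    if extent_size <= 0 or stripes <= 0:
        raise ValueError("extent size and stripe count must be positive")
    full_stripe = extent_size * stripes
    rest = amount % full_stripe
    if rest:
        # Pad the request up to the next full-stripe boundary.
        amount += full_stripe - rest
    return amount
```

With a 4 MiB extent and 3 stripes, a request to grow by 100 MiB becomes 108 MiB, while a request for 24 MiB is already aligned and stays unchanged.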

Jul 13, 2009

RAPI: implement instance reinstall · e5b7c4ca

Iustin Pop authored 15 years ago


This patch adds instance reinstall to RAPI, with two optional parameters:
  - ‘os’, in order to change the OS on reinstall
  - ‘nostartup’, in order to leave the instance down after reinstall

The call will first shut down the instance, then reinstall it, and unless
‘nostartup’ has been passed and is equal to 1, the instance will be started
automatically.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

e5b7c4ca
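The order of operations described in e5b7c4ca can be sketched as a small planning function. This is not the RAPI code; the function name, the step labels and the handling of a malformed ‘nostartup’ value (treated here as "not set") are all assumptions for illustration.

```python
from typing import Dict, List, Optional, Tuple


def plan_reinstall(query: Dict[str, str]) -> List[Tuple[str, Optional[str]]]:
    """Return the ordered steps for a reinstall request.

    'os' (optional) selects a new OS; 'nostartup' equal to 1 leaves the
    instance down after the reinstall.
    """
    new_os = query.get("os")  # None keeps the current OS
    try:
        no_startup = int(query.get("nostartup", "0")) == 1
    except ValueError:
        no_startup = False  # assumption: a malformed value means "not set"
    steps: List[Tuple[str, Optional[str]]] = [
        ("shutdown", None),
        ("reinstall", new_os),
    ]
    if not no_startup:
        steps.append(("startup", None))
    return steps
```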

Jul 08, 2009

Create a new --no-voting option for masterfailover · 8e2524c3

Guido Trotter authored 15 years ago


This allows failing over in certain corner cases, such as a 2 node
cluster with one node down. The man page is also updated to document
this dangerous option and how to recover from this situation.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

8e2524c3

ganeti-masterd: allow non-interactive --no-voting · 5e96d216

Guido Trotter authored 15 years ago


This will be used by ganeti-noded to start ganeti-masterd in a
--no-voting masterfailover.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

5e96d216

Jul 01, 2009

Increase maximum accepted size for a DRBD meta dev · 1dc10972

Iustin Pop authored 15 years ago


With the change to striped LVs, the actual size of a meta device (which
is small) can be more than we expected (for non-striped LVs). This
patch increases the accepted size from 160MB to 1GB, and updates the
comment with the rationale behind this change.

Note that we do want even meta devices striped, since it can increase
metadata update speed.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>

1dc10972

Jun 30, 2009

Cleanup config data when draining nodes · dec0d9da

Iustin Pop authored 15 years ago


Currently, when draining nodes we reset their master candidate flag, but
we don't instruct them to demote themselves. This leads to “ERROR: file
'/var/lib/ganeti/config.data' should not exist on non master candidates
(and the file is outdated)”.

This patch simply adds a call to node_demote_from_mc in this case.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

dec0d9da

Fix node readd issues · a8ae3eb5

Iustin Pop authored 15 years ago


This patch fixes a few node readd issues.

Currently, the node readd consists of two opcodes:
  - OpSetNodeParms, which resets the offline/drained flags
  - OpAddNode (with readd=True), which reconfigures the node

The problem is that between these two, the configuration is inconsistent
for certain cluster configurations. Thus, this patch removes the first
opcode and modifies LUAddNode to deal with this case too.

The patch also modifies the computation of the intended master_candidate
status, and actually sets the readded node to master candidate if
needed. Previously, we didn't modify the existing node at all.

Finally, the patch modifies the bottom of the Exec() function for this
LU to:
  - trigger a node update, which in turn redistributes the ssconf files
    to all nodes (and thus the new node too)
  - if the new node is not a master candidate, then call the
    node_demote_from_mc RPC so that old master files are cleared

My testing shows this behaves correctly for various cases.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

a8ae3eb5

backend.DemoteFromMC: don't fail for missing files · 9a5cb537

Iustin Pop authored 15 years ago

If the config file is missing when the DemoteFromMC() function is
called, it will raise a ProgrammerError. Instead of changing the
utils.CreateBackup() function, which is called from multiple places, for
now we only change the DemoteFromMC() function to not call it if the file
does not exist (we rely on the master to prevent race conditions here).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>

9a5cb537
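The guard described in 9a5cb537 amounts to checking for the file before backing it up. A minimal sketch follows, using only the standard library instead of Ganeti's utils module; the function name, the ".backup" suffix and the removal step are assumptions.

```python
import os
import shutil


def demote_config(path: str) -> bool:
    """Back up and remove a config file, tolerating its absence.

    Returns True if a file was found and removed, False if there was
    nothing to do. Like the commit, this relies on the caller to
    prevent races between the existence check and the copy.
    """
    if not os.path.isfile(path):
        return False  # nothing to back up; not treated as an error
    shutil.copy2(path, path + ".backup")  # hypothetical backup naming
    os.remove(path)
    return True
```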

Allow GetMasterCandidateStats to ignore some nodes · 23f06b2b

Iustin Pop authored 15 years ago


This patch modifies ConfigWriter.GetMasterCandidateStats to allow it to
ignore some nodes in the calculation, so that we can use it to predict
cluster state without some nodes (which we know we will modify, and thus
we should not rely on their state).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>

23f06b2b
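The idea in 23f06b2b of predicting cluster state without certain nodes can be sketched like this. The Node class, the returned tuple and the rule that offline or drained nodes are not eligible are assumptions for illustration, not the ConfigWriter implementation.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class Node:
    """Hypothetical node record with only the flags needed here."""
    name: str
    master_candidate: bool = False
    offline: bool = False
    drained: bool = False


def master_candidate_stats(
    nodes: Iterable[Node],
    pool_size: int,
    exceptions: Optional[Iterable[str]] = None,
) -> Tuple[int, int]:
    """Return (current candidates, desired candidates), skipping named nodes."""
    skip = set(exceptions or ())
    current = eligible = 0
    for node in nodes:
        if node.name in skip:
            continue  # this node is about to change; ignore its state
        if not (node.offline or node.drained):
            eligible += 1
        if node.master_candidate:
            current += 1
    return current, min(eligible, pool_size)
```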

Fix error message for extra files on non MC nodes · e631cb25

Iustin Pop authored 15 years ago


Currently the message for extraneous files on non master candidates is
confusing, to say the least. This patch hopefully makes it clearer.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>

e631cb25

Jun 29, 2009

Fix adjustment of candidates in cluster modify · 75e914fb

Iustin Pop authored 15 years ago

The code for adjusting the candidate pool size ran after the config
update, which means we triggered the save of the config file without
fixing the candidate pool, and this aborts with an error.

The patch just moves it above. The old comment was valid, but we save
the config file in MaintainCandidatePool anyway, so this should be safe.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

75e914fb

Add a new node list field · c120ff34

Iustin Pop authored 15 years ago


This patch adds a ‘role’ node list field, which shows a one-character
node status. This is a simpler way to see the node status than selecting
all the flags individually.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

c120ff34

Jun 23, 2009

Fix HTTP server library handling of credentials · 81b59aaf

Iustin Pop authored 15 years ago


Currently the http library only checks credentials when authentication
is required. This means that any credentials are accepted on the root
resource, for example, which makes problems hard to diagnose: the
user/pw works for all queries, until one tries to do a modification, at
which point it fails.

This patch changes the PreHandleRequest() function to not ignore
credentials when passed, even if we don't require authentication. This
makes the behavior of RAPI more predictable.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

81b59aaf
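The behavior change in 81b59aaf can be illustrated with a simplified check. This is only a sketch: the function signature, the Unauthorized exception and the plain-text password table are hypothetical and do not reflect the actual http library code.

```python
from typing import Dict, Optional, Tuple


class Unauthorized(Exception):
    """Hypothetical error for missing or invalid credentials."""


def pre_handle_request(
    credentials: Optional[Tuple[str, str]],
    auth_required: bool,
    users: Dict[str, str],
) -> None:
    """Validate credentials whenever they are passed, even if optional."""
    if credentials is None:
        if auth_required:
            raise Unauthorized("authentication required")
        return  # anonymous access to a resource that allows it
    user, password = credentials
    # Credentials were supplied, so they must be valid regardless of
    # whether this resource requires authentication.
    if users.get(user) != password:
        raise Unauthorized("invalid credentials")
```

Under this rule a wrong password fails on the first read-only query, instead of only surfacing later on a modification.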

Fix a typo in backend.InstanceReboot docstring · 73e5a4f4

Iustin Pop authored 15 years ago


The documentation for the reboot was wrong. This patch fixes it and
updates the docstring with more details.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

73e5a4f4

Jun 17, 2009

Fix handling of 'vcpus' in instance list · c1ce76bb

Iustin Pop authored 15 years ago


Currently running “gnt-instance list -o+vcpus” fails with a cryptic message:
  Unhandled Ganeti error: vcpus

This is due to multiple issues:
  - in some corner cases cmdlib.py raises an errors.ParameterError but
    this is not handled by cli.py
  - LUQueryInstances declares ‘vcpus’ as a supported field, but doesn't handle
    it, so instead of failing with unknown parameter, e.g.:
      Failure: prerequisites not met for this operation:
      Unknown output fields selected: vcpuscd
    it raises the ParameterError message

This patch:
  - adds handling of 'vcpus' to LUQueryInstances
  - adds handling of the ParameterError exception to cli.py
  - changes the 'else: raise errors.ParameterError' in the field handling of
    LUQueryInstances to an assert, since it's a programmer error if we reached
    this step

With this, a future unhandled parameter will show:
  gnt-instance list -o+vcpus
  Unhandled protocol error while talking to the master daemon:
  Caught exception: Declared but unhandled parameter 'vcpus'

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

c1ce76bb
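The distinction drawn in c1ce76bb between an unknown field (a user error) and a declared but unhandled field (a programmer error) can be sketched as below. The field set and function are hypothetical; the commit uses an assert, while this sketch raises AssertionError explicitly so the behavior does not depend on interpreter flags.

```python
DECLARED_FIELDS = frozenset(["name", "vcpus"])


def field_value(instance: dict, field: str, declared=DECLARED_FIELDS):
    """Return the value of one output field for an instance record."""
    if field not in declared:
        # User error: the field was never declared as supported.
        raise ValueError("Unknown output fields selected: %s" % field)
    if field == "name":
        return instance["name"]
    if field == "vcpus":
        return instance["vcpus"]
    # Programmer error: the field is declared but no branch handles it.
    raise AssertionError("Declared but unhandled parameter '%s'" % field)
```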

Fix checking for valid OS in instance create · 6dfad215

Iustin Pop authored 15 years ago


The current check in LUCreateInstance.CheckPrereq() is wrong - it only checks
if we got an OS, but not if we got a valid OS. This patch fixes it.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

6dfad215

Show disk size in instance info · c98162a7

Iustin Pop authored 15 years ago


The size of the instance's disk was not shown in “gnt-instance info”.
This patch adds it and formats it nicely if possible.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

c98162a7

Jun 16, 2009

gnt-cluster(8) fix --backend-parameters opt name · 280b79b3

Guido Trotter authored 15 years ago


It was mistakenly called --backend

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

280b79b3

LUQueryInstances: fix querying for nic data · 39a02558

Guido Trotter authored 15 years ago


Currently we support querying for "mac", "ip" or "bridge", meaning "the
one of the first nic". We are not checking that there is a first nic,
though, and thus could run into errors. This patch fixes it by returning
"None" should there be no such nic, as is done when explicitly asking
for a nic via nic.<field>/<N>

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

39a02558
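The fix in 39a02558 comes down to a bounds check before indexing the NIC list. A minimal sketch, assuming NICs are represented as plain dictionaries (a simplification, not the real object model) and using a hypothetical helper name:

```python
from typing import Dict, List, Optional

NIC_FIELDS = ("mac", "ip", "bridge")


def nic_field(nics: List[Dict[str, str]], field: str, index: int = 0) -> Optional[str]:
    """Return `field` of the NIC at `index`, or None if there is no such NIC."""
    if field not in NIC_FIELDS:
        raise ValueError("unknown nic field: %s" % field)
    if not 0 <= index < len(nics):
        return None  # no such NIC, including an instance with no NICs at all
    return nics[index].get(field)
```

Querying the first NIC then behaves the same way as an explicit indexed query when the NIC is missing.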

Specify the object type in two docstrings · a2a24f4c

Guido Trotter authored 15 years ago


Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

a2a24f4c

Merge branch 'master' into next · c57f169e

Guido Trotter authored 15 years ago

* master:
  Update NEWS and version for 2.0.1 release
  gnt-{instance,backup}(8) --nic is actually --net
  Fix a wrong function name in backend.DrbdAttachNet
  GNT-CLUSTER(8) fix search-tags example

c57f169e

Update NEWS and version for 2.0.1 release · 0dea942c

Iustin Pop authored 15 years ago


Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

0dea942c

gnt-{instance,backup}(8) --nic is actually --net · 091c2c64

Guido Trotter authored 15 years ago


Fix a typo in the man pages that used the wrong option name.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

091c2c64

Jun 15, 2009

Fix a wrong function name in backend.DrbdAttachNet · c738375b

Iustin Pop authored 15 years ago


Commit cf8df3f3 "bdev: forward-port
ReAttachNet/DisconnectNet" forward-ported 1.2's bdev.DRBD8.ReAttachNet()
to 2.0 while renaming it to AttachNet(), but commit
6b93ec9d "Forward-port DrbdNetReconfig"
didn't rename all the calls to it and left one ReAttachNet call in
backend.py.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

c738375b

Jun 11, 2009

GNT-CLUSTER(8) fix search-tags example · 2f49d1d2

Guido Trotter authored 15 years ago


Reported in issue 59.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

2f49d1d2

Jun 08, 2009

Enable striped LVs · fecbe9d5

Iustin Pop authored 15 years ago


This patch enables striped LVs, falling back to non-striped if the
striped creation fails. If the configure-time lvm-stripecount is 1,
this patch becomes a noop (with an insignificant python-level overhead,
but no extra lvm calls).

The effect of this patch is that new instances will get striped LVs
from the start, whereas old instances will have their LVs striped as
soon as replace-disks is run for them.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

fecbe9d5
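The fallback described in fecbe9d5 can be sketched as follows. The `create` callable is a hypothetical stand-in for whatever actually invokes LVM; it is assumed to return True on success.

```python
from typing import Callable


def create_lv(
    create: Callable[[str, int, int], bool],
    name: str,
    size: int,
    stripes: int,
) -> int:
    """Try striped creation first, then fall back to a single stripe.

    Returns the stripe count that was actually used.
    """
    if stripes > 1 and create(name, size, stripes):
        return stripes
    # Either striping is disabled (stripes == 1) or the striped attempt
    # failed; with stripes == 1 this is the only call that is made.
    if create(name, size, 1):
        return 1
    raise RuntimeError("LV creation failed for %s" % name)
```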

Add a lvm stripecount configure parameter · 3736cb6b

Iustin Pop authored 15 years ago


This patch adds a configure-time customizable parameter that will be
used to enable striped LVs. The default of the parameter is 3.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

3736cb6b

Add more constants for DRBD and change sync tests · 3c003d9d

Iustin Pop authored 15 years ago


This patch adds constants for the connection status, peer roles and disk
status, and it changes the rules for when the disk is considered as
“resyncing” - previously it was only for syncsource/synctarget, but
there are many other transient statuses which could be misinterpreted as
‘degraded’ (because they were not considered as resyncing, but the disk
is not consistent in these statuses).

Furthermore, cmdlib.py:WaitForSync determines if a device is syncing or
not based on sync_percent not being None. Not all DRBD resync statuses
offer a percent done, so if we are syncing but don't have a sync
percent, we'll report a zero sync percent (and no time estimate).

The patch also removes a few unused variables (is_sync_target,
peer_sync_target, is_resync) whose value doesn't make sense anymore with
the new sync rules.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

3c003d9d
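The sync percent rule described in 3c003d9d can be sketched like this. The set of connection states below is illustrative only; the exact list treated as resyncing by the patch is not given in the commit message, and the function name is hypothetical.

```python
from typing import Optional

# Illustrative set of DRBD connection states treated as "resyncing";
# the membership here is an assumption, not taken from the patch.
RESYNC_STATES = frozenset([
    "SyncSource", "SyncTarget",
    "WFBitMapS", "WFBitMapT",
    "PausedSyncS", "PausedSyncT",
])


def sync_percent(cstatus: str, reported: Optional[float]) -> Optional[float]:
    """Return None when not resyncing, else the reported percent or 0.0.

    A caller that tests `sync_percent is not None` therefore sees every
    resyncing state as syncing, even those that report no progress.
    """
    if cstatus not in RESYNC_STATES:
        return None
    return 0.0 if reported is None else reported
```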

Merge branch 'master' into next · 5ce92cd3

Iustin Pop authored 15 years ago

* master:
  Wait for a while in failed resyncs
  Fix two issues with exports and snapshot errors

5ce92cd3

Jun 04, 2009

Wait for a while in failed resyncs · fbafd7a8

Iustin Pop authored 15 years ago


This patch is an attempt at fixing some very rare occurrences of messages like:
  - "There are some degraded disks for this instance", or:
  - "Cannot resync disks on node node3.example.com: [True, 100]"

What I believe happens is that drbd has finished syncing, but not all
fields are updated in 'Connected' state; maybe it's in WFBitmap[ST], or
in some other transient state we don't handle well.

The patch will change the _WaitForSync method to recheck up to a
hardcoded number of times if we're finished syncing but we're degraded
(using the same condition as the 'break' clause of the loop).

The downside of this change is that a normal, really-degraded state due to
a network or disk failure will cause an extra delay before it aborts. For
this, I'm happy to choose other values.

A better, long-term fix is to handle more DRBD states correctly (see the
bdev.DRBD8Status class).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

fbafd7a8
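The recheck loop described in fbafd7a8 can be sketched as below. This is not the _WaitForSync code: the `poll` callable, the retry count and the delay are hypothetical, and `sleep` is injectable only so the sketch can be exercised without waiting.

```python
import time
from typing import Callable, Optional, Tuple


def wait_for_sync(
    poll: Callable[[], Tuple[Optional[float], bool]],
    degraded_retries: int = 10,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Wait until syncing is done; return True if the disks are not degraded.

    `poll()` returns (sync_percent or None, is_degraded). When syncing
    has finished but the disks still look degraded, recheck a bounded
    number of times before giving up.
    """
    retries = degraded_retries
    while True:
        percent, degraded = poll()
        done = percent is None
        if done and degraded and retries > 0:
            retries -= 1  # possibly a transient state; look again
            sleep(delay)
            continue
        if done:
            return not degraded
        sleep(delay)
```

The trade-off noted in the commit is visible here: a genuinely degraded device costs `degraded_retries` extra polls before the function reports failure.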