  1. Jun 30, 2009
    • Cleanup config data when draining nodes · dec0d9da
      Iustin Pop authored
      
      Currently, when draining nodes we reset their master candidate flag, but
      we don't instruct them to demote themselves. This leads to “ERROR: file
      '/var/lib/ganeti/config.data' should not exist on non master candidates
      (and the file is outdated)”.
      
      This patch simply adds a call to node_demote_from_mc in this case.
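
      In outline (a toy sketch with hypothetical stand-in classes; only the
      node_demote_from_mc RPC name comes from this patch):

        class FakeRpc:
            def call_node_demote_from_mc(self, name):
                print("demoting %s: cleaning up master-only files" % name)

        class Node:
            def __init__(self, name):
                self.name = name
                self.master_candidate = True
                self.drained = False

        def drain_node(node, rpc):
            node.drained = True
            if node.master_candidate:
                node.master_candidate = False
                rpc.call_node_demote_from_mc(node.name)  # the added call

        drain_node(Node("node1.example.com"), FakeRpc())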
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
    • Fix node readd issues · a8ae3eb5
      Iustin Pop authored
      
      This patch fixes a few node readd issues.
      
      Currently, the node readd consists of two opcodes:
        - OpSetNodeParms, which resets the offline/drained flags
        - OpAddNode (with readd=True), which reconfigures the node
      
      The problem is that between these two, the configuration is inconsistent
      for certain cluster configurations. Thus, this patch removes the first
      opcode and modifies LUAddNode to deal with this case too.
      
      The patch also modifies the computation of the intended master_candidate
      status, and actually sets the readded node to master candidate if
      needed. Previously, we didn't modify the existing node at all.
      
      Finally, the patch modifies the bottom of the Exec() function for this
      LU to:
        - trigger a node update, which in turn redistributes the ssconf files
          to all nodes (and thus the new node too)
        - if the new node is not a master candidate, then call the
          node_demote_from_mc RPC so that old master files are cleared
      
      My testing shows this behaves correctly for various cases.
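
      The reworked bottom of Exec() amounts to something like this sketch
      (hypothetical stand-in objects; only the update/demote sequence is
      from the patch):

        class FakeConfig:
            def Update(self, node):
                print("updated %s; ssconf pushed to all nodes" % node.name)

        class FakeRpc:
            def call_node_demote_from_mc(self, name):
                print("clearing old master files on %s" % name)

        class Node:
            def __init__(self, name, master_candidate):
                self.name = name
                self.master_candidate = master_candidate

        def finish_readd(cfg, rpc, node):
            cfg.Update(node)  # redistributes ssconf, reaching the new node
            if not node.master_candidate:
                rpc.call_node_demote_from_mc(node.name)

        finish_readd(FakeConfig(), FakeRpc(), Node("node2.example.com", False))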
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
    • backend.DemoteFromMC: don't fail for missing files · 9a5cb537
      Iustin Pop authored
      
      If the config file is missing when the DemoteFromMC() function is
      called, it will raise a ProgrammerError. Instead of changing
      utils.CreateBackup(), which is called from multiple places, for now we
      only change DemoteFromMC() to skip that call when the file does not
      exist (we rely on the master to prevent race conditions here).
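
      The guard boils down to the following sketch (the backup helper stands
      in for utils.CreateBackup and is passed in only to keep the example
      self-contained):

        import os

        CONFIG_FILE = "/var/lib/ganeti/config.data"

        def demote_from_mc(backup_fn):
            # Only back up the config if it is present; the master, not
            # this function, is responsible for preventing races.
            if os.path.exists(CONFIG_FILE):
                backup_fn(CONFIG_FILE)
            # ... continue removing master-only files ...

        demote_from_mc(lambda path: print("backing up", path))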
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Olivier Tharan <olive@google.com>
    • Allow GetMasterCandidateStats to ignore some nodes · 23f06b2b
      Iustin Pop authored
      
      This patch modifies ConfigWriter.GetMasterCandidateStats to allow it to
      ignore some nodes in the calculation, so that we can use it to predict
      the cluster state without those nodes (which we know we will modify,
      and whose current state we therefore should not rely on).
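
      A toy model of the new calculation (the real method works on
      ConfigWriter data; this hypothetical version uses a plain dict):

        def get_mc_stats(nodes, exceptions=None):
            """Count master candidates, ignoring some node names."""
            exceptions = set(exceptions or ())
            mc_now = total = 0
            for name, is_mc in nodes.items():
                if name in exceptions:
                    continue  # predict state as if this node were absent
                total += 1
                mc_now += int(is_mc)
            return mc_now, total

        nodes = {"n1": True, "n2": True, "n3": False}
        print(get_mc_stats(nodes, exceptions={"n2"}))  # (1, 2)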
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Olivier Tharan <olive@google.com>
    • Fix error message for extra files on non MC nodes · e631cb25
      Iustin Pop authored
      
      Currently the message for extraneous files on non-master-candidate
      nodes is confusing, to say the least. This patch hopefully makes it
      clearer.
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Olivier Tharan <olive@google.com>
  2. Jun 17, 2009
    • Fix handling of 'vcpus' in instance list · c1ce76bb
      Iustin Pop authored
      
      Currently running “gnt-instance list -o+vcpus” fails with a cryptic message:
        Unhandled Ganeti error: vcpus
      
      This is due to multiple issues:
        - in some corner cases cmdlib.py raises an errors.ParameterError but
          this is not handled by cli.py
        - LUQueryInstances declares ‘vcpus’ as a supported field, but doesn't
          handle it, so instead of failing with an “unknown parameter” error,
          e.g.:
            Failure: prerequisites not met for this operation:
            Unknown output fields selected: vcpuscd
          it raises ParameterError
      
      This patch:
        - adds handling of 'vcpus' to LUQueryInstances
        - adds handling of the ParameterError exception to cli.py
        - changes the 'else: raise errors.ParameterError' in the field
          handling of LUQueryInstances to an assert, since it's a programmer
          error if we reach this step
      
      With this, a future unhandled parameter will show:
        gnt-instance list -o+vcpus
        Unhandled protocol error while talking to the master daemon:
        Caught exception: Declared but unhandled parameter 'vcpus'
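
      The field-handling change, reduced to a toy example (hypothetical
      dispatch code; only the 'vcpus' field and the assert message follow
      the patch):

        def instance_field(instance, field):
            if field == "vcpus":          # newly handled by this patch
                return instance["vcpus"]
            if field == "name":
                return instance["name"]
            # Previously: raise errors.ParameterError(field).  Unknown
            # fields are rejected earlier, so reaching this point is a
            # programmer error:
            assert False, "Declared but unhandled parameter '%s'" % field

        print(instance_field({"name": "web1", "vcpus": 2}, "vcpus"))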
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
    • Fix checking for valid OS in instance create · 6dfad215
      Iustin Pop authored
      
      The current check in LUCreateInstance.CheckPrereq() is wrong - it only checks
      if we got an OS, but not if we got a valid OS. This patch fixes it.
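
      In miniature (hypothetical helper and OS names, for illustration
      only):

        def check_os(os_name, known_oses):
            if not os_name:
                raise ValueError("No OS specified")
            if os_name not in known_oses:   # the check that was missing
                raise ValueError("OS '%s' not found" % os_name)

        check_os("debian-etch", {"debian-etch", "fedora-10"})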
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
    • Show disk size in instance info · c98162a7
      Iustin Pop authored
      
      The size of the instance's disk was not shown in “gnt-instance info”.
      This patch adds it and formats it nicely if possible.
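
      The “nicely if possible” part could look like this sketch (a
      hypothetical formatter; Ganeti disk sizes are in mebibytes):

        def format_disk_size(mib):
            try:
                if mib >= 1024 and mib % 1024 == 0:
                    return "%.1fG" % (mib / 1024.0)
                return "%dM" % mib
            except TypeError:
                return str(mib)  # odd value: show it unformatted

        print(format_disk_size(10240))  # 10.0G
        print(format_disk_size(512))    # 512M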
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
  3. Jun 08, 2009
    • Enable striped LVs · fecbe9d5
      Iustin Pop authored
      
      This patch enables striped LVs, falling back to non-striped creation
      if the striped creation fails. If the configure-time lvm-stripecount
      is 1, this patch becomes a no-op (with an insignificant Python-level
      overhead, but no extra LVM calls).
      
      The effect of this patch is that new instances will get striped LVs
      from the start, whereas old instances will have their LVs striped as
      soon as replace-disks is run for them.
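
      The fallback logic, sketched with a pluggable command runner (the
      lvcreate -i/-L/-n flags are standard LVM; everything else here is
      hypothetical):

        def lvcreate_args(vg, name, size_mb, stripes):
            args = ["lvcreate", "-L", "%dm" % size_mb, "-n", name]
            if stripes > 1:               # stripecount 1: plain create
                args.extend(["-i", str(stripes)])
            args.append(vg)
            return args

        def create_lv(run, vg, name, size_mb, stripes):
            if stripes > 1 and run(lvcreate_args(vg, name, size_mb,
                                                 stripes)) == 0:
                return True
            # Retry (or start) without striping:
            return run(lvcreate_args(vg, name, size_mb, 1)) == 0

        print(create_lv(lambda args: print(" ".join(args)) or 0,
                        "xenvg", "disk0", 10240, 3))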
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
    • Add a lvm stripecount configure parameter · 3736cb6b
      Iustin Pop authored
      
      This patch adds a configure-time customizable parameter that will be
      used to enable striped LVs. The parameter's default value is 3.
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
    • Add more constants for DRBD and change sync tests · 3c003d9d
      Iustin Pop authored
      
      This patch adds constants for the connection status, peer roles and
      disk status, and it changes the rules for when the disk is considered
      to be “resyncing”: previously this was only syncsource/synctarget, but
      there are many other transient statuses which could be misinterpreted
      as “degraded” (because they were not considered as resyncing, even
      though the disk is not consistent in those statuses).
      
      Furthermore, cmdlib.py:WaitForSync decides whether a device is syncing
      based on sync_percent not being None. Not all DRBD resync statuses
      offer a percent done, so if we are syncing but don't have a sync
      percent, we'll report a zero sync percent (and no time estimate).
      
      The patch also removes a few unused variables (is_sync_target,
      peer_sync_target, is_resync) whose value doesn't make sense anymore with
      the new sync rules.
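
      A simplified model of the new rules (the state names are real DRBD 8
      connection states, but the exact classification here is illustrative):

        STABLE_CSTATES = {"Connected", "StandAlone"}

        def sync_status(cstate, sync_percent):
            resyncing = cstate not in STABLE_CSTATES
            if resyncing and sync_percent is None:
                sync_percent = 0.0  # transient state, no percent counter
            return resyncing, sync_percent

        print(sync_status("SyncTarget", 42.5))  # (True, 42.5)
        print(sync_status("WFBitMapS", None))   # (True, 0.0)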
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
    • Merge branch 'master' into next · 5ce92cd3
      Iustin Pop authored
      * master:
        Wait for a while in failed resyncs
        Fix two issues with exports and snapshot errors
  4. Jun 04, 2009
    • Wait for a while in failed resyncs · fbafd7a8
      Iustin Pop authored
      
      This patch is an attempt at fixing some very rare occurrences of messages like:
        - "There are some degraded disks for this instance", or:
        - "Cannot resync disks on node node3.example.com: [True, 100]"
      
      What I believe happens is that DRBD has finished syncing, but not all
      fields are updated in the ‘Connected’ state; maybe it's in
      WFBitMap[ST], or in some other transient state we don't handle well.
      
      The patch will change the _WaitForSync method to recheck up to a
      hardcoded number of times if we're finished syncing but we're degraded
      (using the same condition as the 'break' clause of the loop).
      
      The downside of this change is that a normal, genuinely degraded state
      (due to network or disk failure) will cause an extra delay before the
      operation aborts. Because of that, I'm open to choosing other values.
      
      A better, long-term fix is to handle more DRBD states correctly (see
      the bdev.DRBD8Status class).
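
      The recheck loop, in sketch form (hypothetical names; the retry count
      is hardcoded, as in the patch, and the value below is illustrative):

        import time

        def wait_for_sync(get_status, max_rechecks=3, delay=1.0):
            rechecks = 0
            while True:
                done, degraded = get_status()
                if done:
                    if not degraded:
                        return True    # clean finish
                    rechecks += 1      # maybe a transient DRBD state
                    if rechecks > max_rechecks:
                        return False   # genuinely degraded: give up
                time.sleep(delay)

        # Simulate a device that reports degraded once after finishing:
        states = iter([(False, True), (True, True), (True, False)])
        print(wait_for_sync(lambda: next(states), delay=0.01))  # True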
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
  5. Jun 03, 2009
    • Assemble DRBD using the known size · f069addf
      Iustin Pop authored
      
      This patch changes DRBD disk attachment to force the wanted size, as opposed to
      letting the device auto-discover its size.
      
      This should make the disks more resilient to small differences in size
      (e.g. due to LVM rounding). Disk growth still works, but the instance
      needs to be fully restarted (including its disks) in that case.
      
      This passes a full burnin without problems, but it's still a tricky
      change: if config.data is not in sync with reality, we might tell DRBD
      a wrong size. At least this will fail outright (and not introduce
      silent errors), as DRBD (per a quick check of the sources) tracks the
      size in the meta device and does not allow shrinking consistent
      devices.
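
      The attach-time change amounts to building the command with an
      explicit size (a sketch assuming the drbdsetup 8.x "-d"/disk-size
      option; device paths and the meta index are illustrative):

        def assemble_args(minor, backing_dev, meta_dev, size_mb):
            args = ["drbdsetup", "/dev/drbd%d" % minor, "disk",
                    backing_dev, meta_dev, "0"]
            if size_mb:
                args.extend(["-d", "%sm" % size_mb])  # no auto-discovery
            return args

        print(assemble_args(0, "/dev/xenvg/disk0_data",
                            "/dev/xenvg/disk0_meta", 10240))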
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
    • Fix two issues with exports and snapshot errors · a97da6b7
      Iustin Pop authored
      
      This patch fixes two issues related to failed snapshots during exports:
        - first, the error messages used disk.logical_id[1], which is a node
          name for DRBD, and it resulted in strange error messages like
          "cannot snapshot block device node1 on node2"
        - second, if snapshotting fails for any disk, rpc.call_finalize_export
          fails, as it doesn't handle booleans (backend.FinalizeExport does)
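
      The first fix, in miniature (hypothetical attribute layout modelled
      on the description above):

        class Disk:
            iv_name = "disk/0"                  # the disk's own identifier
            logical_id = ("node1.example.com",  # a node name, for DRBD
                          "node2.example.com", 11000)

        def snapshot_error(disk, node):
            # before: used disk.logical_id[1], i.e. a node name
            return ("cannot snapshot block device %s on node %s"
                    % (disk.iv_name, node))

        print(snapshot_error(Disk(), "node2.example.com"))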
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
  6. May 28, 2009
    • Set the size on new DRBDs in replace secondary · 8a6c7011
      Iustin Pop authored
      
      Currently the code in cmdlib doesn't set the device size on new DRBD
      devices in replace-secondary, but we need to do so, otherwise the size
      gets initialized to None.
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
    • Change the bdev init signatures · 464f8daf
      Iustin Pop authored
      
      This patch changes all the bdev.BlockDev constructors to take an
      additional ‘size’ parameter, changes all the backend functions that
      call them to pass it, and also changes backend.BlockdevCreate() to not
      use the size passed via the RPC call but instead directly disk.size
      (this is the only way it's called).

      Note that this patch doesn't do anything with the parameter yet, it
      just stores it on the blockdev objects.

      With the patch, we have a more uniform init sequence (before, create
      had the parameter but the other functions did not).
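
      The uniform constructor, as a toy model (hypothetical class bodies):

        class BlockDev:
            def __init__(self, unique_id, children, size):
                self.unique_id = unique_id
                self.children = children
                self.size = size      # stored, but not yet acted upon

        class LogicalVolume(BlockDev):
            pass

        class DRBD8(BlockDev):
            pass

        lv = LogicalVolume(("xenvg", "disk0_data"), [], 10240)
        drbd = DRBD8(("node1", "node2", 11000), [lv], 10240)
        print(lv.size, drbd.size)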
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
    • Merge branch 'next' · 2cd855dd
      Iustin Pop authored
      
      * next: (34 commits)
        watcher: automatically restart noded/rapi
        watcher: handle full and drained queue cases
        rapi: rework error handling
        Fix backend.OSEnvironment be/hv parameters
        rapi: make tags query not use jobs
        Change failover instance when instance is stopped
        Export more instance information in hooks
        watcher: write the instance status to a file
        Fix the SafeEncoding behaviour
        Move more hypervisor strings into constants
        Add -H/-B startup parameters to gnt-instance
        call_instance_start: add optional hv/be parameters
        Fix gnt-job list argument handling
        Instance reinstall: don't mix up errors
        Don't check memory at startup if instance is up
        gnt-cluster modify: fix --no-lvm-storage
        LUSetClusterParams: improve volume group removal
        gnt-cluster info: show more cluster parameters
        LUQueryClusterInfo: return a few more fields
        Add the new DRBD test files to the Makefile
        ...
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
  7. May 25, 2009
    • watcher: automatically restart noded/rapi · c4f0219c
      Iustin Pop authored
      
      This patch makes the watcher automatically restart the node and rapi
      daemons if they are not running (as determined from the PID file).
      
      This is not an exhaustive test; a better one would be a TCP connect to
      the port, and an even better one a simple protocol ping (e.g. GET /
      for rapi and a rpc_call_alive for noded), but since we don't know how
      the daemons were started, we can't implement that today. rapi would
      need to write its SSL settings and port to a file, and noded something
      similar, so that we know how to connect to them.
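
      The PID-file liveness check could look roughly like this (paths and
      the restart command are hypothetical):

        import os
        import subprocess

        def ensure_daemon(name, pidfile):
            try:
                with open(pidfile) as pf:
                    pid = int(pf.read().strip())
                os.kill(pid, 0)   # signal 0: existence check only
                return            # daemon looks alive
            except (OSError, IOError, ValueError):
                pass              # missing/stale pidfile or dead process
            subprocess.call(["/etc/init.d/%s" % name, "restart"])

        # ensure_daemon("ganeti-noded", "/var/run/ganeti-noded.pid")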
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
    • watcher: handle full and drained queue cases · 24edc6d4
      Iustin Pop authored
      
      Currently the watcher is broken when the queue is full, and thus does
      not fulfil its job as a queue cleaner. It also doesn't handle the
      drained-queue status nicely.
      
      This patch does a few changes:
        - first archive jobs, and only afterwards submit new ones; this
          fixes the case where the queue is already full and there are jobs
          suited for archiving (but not the case where the jobs are all too
          young to be archived)
        - handle the queue-full and queue-drained cases nicely: instead of
          tracebacks, log such cases cleanly
        - reverse the initial value and special cases for update_file; we
          now whitelist instead of blacklist cases, since we have many more
          blacklist cases than whitelist ones, and we set the flag to True
          only after the run is successful
      
      The last change, especially, is a significant one: errors during the
      watcher run will no longer update the status file, and thus they will
      no longer be lost in the logs.
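
      The archive-first ordering and the whitelisted update_file flag, in
      sketch form (hypothetical callables and exception types):

        class QueueFullError(Exception): pass
        class QueueDrainedError(Exception): pass

        def watcher_run(archive_jobs, submit_jobs, write_state, log):
            update_file = False            # whitelist: assume failure
            try:
                archive_jobs()             # now done *before* submitting
                submit_jobs()
                update_file = True         # only a fully clean run counts
            except QueueFullError:
                log("job queue full, nothing submitted")   # no traceback
            except QueueDrainedError:
                log("job queue drained, nothing submitted")
            if update_file:
                write_state()

        watcher_run(lambda: None, lambda: None,
                    lambda: print("state file updated"), print)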
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
    • rapi: rework error handling · 59b4eeef
      Iustin Pop authored
      
      Currently the rapi code doesn't have any custom error handling; any
      exception raised is simply converted into an HTTP 500 error, without
      much explanation.
      
      This patch adds a couple of generic SubmitJob/GetClient functions that
      handle some errors specially so that they are transformed into HTTP
      errors, with more detailed information.
      
      With this patch, the behaviour of rapi when the queue is full or
      drained, or when the master is down, is easier to understand.
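
      A sketch of such a SubmitJob wrapper (the HTTP codes and exception
      names are illustrative, not the real rapi code):

        class HttpError(Exception):
            def __init__(self, code, message):
                super().__init__("%d %s" % (code, message))
                self.code = code

        class QueueFullError(Exception): pass
        class QueueDrainedError(Exception): pass

        def submit_job(client, ops):
            try:
                return client.submit(ops)
            except QueueFullError:
                raise HttpError(503, "Job queue is full")
            except QueueDrainedError:
                raise HttpError(503, "Job queue is drained")
            except ConnectionError:
                raise HttpError(502, "Cannot reach the master daemon")

        class FakeClient:
            def submit(self, ops):
                raise QueueFullError()

        try:
            submit_job(FakeClient(), [])
        except HttpError as err:
            print(err)   # 503 Job queue is full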
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
    • Fix backend.OSEnvironment be/hv parameters · 030b218a
      Iustin Pop authored
      
      Commit 67fc3042 added some more
      variables to be exported to OSEnvironment, but it has two bugs:
        - wrong variable name (env vs. result)
        - in OSEnvironment we don't have the automatic conversion to strings
          that we do in hooks, so we must enforce this manually
      
      With this patch, instance creation works again.
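
      Both fixes, in a toy version (the variable-name scheme here is
      hypothetical; only the env-vs-result mix-up and the explicit str()
      come from the message):

        def os_environment(be_params, hv_params):
            result = {}                    # bug 1: values went into 'env'
            for key, value in be_params.items():
                result["BE_%s" % key.upper()] = str(value)  # bug 2: str()
            for key, value in hv_params.items():
                result["HV_%s" % key.upper()] = str(value)
            return result

        print(os_environment({"memory": 128, "vcpus": 1},
                             {"kernel_path": ""}))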
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>