Commits · 0dea942cf492a34c738e12c227f7741c8c895e67 · itminedu / snf-ganeti

Jun 16, 2009

Update NEWS and version for 2.0.1 release · 0dea942c

Iustin Pop authored 15 years ago


Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

0dea942c

gnt-{instance,backup}(8) --nic is actually --net · 091c2c64

Guido Trotter authored 15 years ago


Fix a typo in the man pages that used the wrong option name.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

091c2c64

Jun 15, 2009

Fix a wrong function name in backend.DrbdAttachNet · c738375b

Iustin Pop authored 15 years ago


Commit cf8df3f3 "bdev: forward-port ReAttachNet/DisconnectNet"
forward-ported 1.2's bdev.DRBD8.ReAttachNet() to 2.0 while renaming it
to AttachNet(), but commit 6b93ec9d "Forward-port DrbdNetReconfig"
didn't rename all the calls to it and left one ReAttachNet call in
backend.py.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

c738375b

Jun 11, 2009

GNT-CLUSTER(8) fix search-tags example · 2f49d1d2

Guido Trotter authored 15 years ago


Reported in issue 59.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

2f49d1d2

Jun 04, 2009

Wait for a while in failed resyncs · fbafd7a8

Iustin Pop authored 15 years ago


This patch is an attempt at fixing some very rare occurrences of messages like:
  - "There are some degraded disks for this instance", or:
  - "Cannot resync disks on node node3.example.com: [True, 100]"

What I believe happens is that drbd has finished syncing, but not all
fields are updated in 'Connected' state; maybe it's in WFBitmap[ST], or
in some other transient state we don't handle well.

The patch will change the _WaitForSync method to recheck up to a
hardcoded number of times if we're finished syncing but we're degraded
(using the same condition as the 'break' clause of the loop).

The downside of this change is that a normal, really-degraded state due
to network or disk failure will cause an extra delay before it aborts.
For this, I'm happy to choose other values.

A better, long-term fix is to handle more DRBD states correctly (see the
bdev.DRBD8Status class).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

fbafd7a8
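
A minimal sketch of the recheck logic described in fbafd7a8, not the actual _WaitForSync code. The `poll` callable, the `(done, degraded)` return shape and the default retry count are assumptions made for illustration:

```python
import time


def wait_for_sync(poll, max_degraded_retries=10, delay=1.0, sleep=time.sleep):
    """Poll until the sync is done; recheck a few times if still degraded.

    poll() is a hypothetical callable returning a (done, degraded) pair.
    Returns True if the disks end up healthy, False if they stay degraded.
    """
    retries_left = max_degraded_retries
    while True:
        done, degraded = poll()
        if done and not degraded:
            return True
        if done and degraded:
            # Possibly a transient state: recheck a bounded number of times.
            if retries_left == 0:
                return False
            retries_left -= 1
        sleep(delay)
```

With these assumptions, a really degraded disk costs at most `max_degraded_retries * delay` extra seconds before the caller gives up, which is the trade-off the message mentions.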

Jun 03, 2009

Fix two issues with exports and snapshot errors · a97da6b7

Iustin Pop authored 15 years ago


This patch fixes two issues related to failed snapshots during exports:
  - first, the error messages used disk.logical_id[1], which is a node
    name for DRBD, and it resulted in strange error messages like
    "cannot snapshot block device node1 on node2"
  - second, if snapshotting fails for any disk, rpc.call_finalize_export
    fails as it didn't handle booleans (backend.FinalizeExport does)

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

a97da6b7

May 28, 2009

Merge branch 'next' · 2cd855dd

Iustin Pop authored 15 years ago


* next: (34 commits)
  watcher: automatically restart noded/rapi
  watcher: handle full and drained queue cases
  rapi: rework error handling
  Fix backend.OSEnvironment be/hv parameters
  rapi: make tags query not use jobs
  Change failover instance when instance is stopped
  Export more instance information in hooks
  watcher: write the instance status to a file
  Fix the SafeEncoding behaviour
  Move more hypervisor strings into constants
  Add -H/-B startup parameters to gnt-instance
  call_instance_start: add optional hv/be parameters
  Fix gnt-job list argument handling
  Instance reinstall: don't mix up errors
  Don't check memory at startup if instance is up
  gnt-cluster modify: fix --no-lvm-storage
  LUSetClusterParams: improve volume group removal
  gnt-cluster info: show more cluster parameters
  LUQueryClusterInfo: return a few more fields
  Add the new DRBD test files to the Makefile
  ...

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

2cd855dd

May 27, 2009

Release 2.0.0 final · 7a8994d4

Iustin Pop authored 15 years ago


This is simply a version bump, no changes from rc5.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

7a8994d4

May 25, 2009

watcher: automatically restart noded/rapi · c4f0219c

Iustin Pop authored 15 years ago


This patch makes the watcher automatically restart the node and rapi
daemons, if they are not running (as per the PID file).

This is not an exhaustive test; a better one would be TCP connect to the
port, and an even better one a simple protocol ping (e.g. get / for rapi
and a rpc_call_alive for noded), but since we don't know how they've
been started we can't implement it today. rapi would need to write the
SSL/port to a file, and noded something similar, so that we know how to
connect.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

c4f0219c
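
A rough sketch of a PID-file liveness check of the kind c4f0219c describes, assuming a POSIX system. The function name is hypothetical and this is not the watcher's actual code:

```python
import os


def is_running(pid_file):
    """Return True if the PID recorded in pid_file belongs to a live process."""
    try:
        with open(pid_file) as handle:
            pid = int(handle.read().strip())
    except (OSError, ValueError):
        return False  # missing or malformed PID file
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks that the process exists
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```

As the message notes, this only shows that some process has that PID; it does not prove the daemon is answering requests.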

watcher: handle full and drained queue cases · 24edc6d4

Iustin Pop authored 15 years ago


Currently the watcher is broken when the queue is full, thus not
fulfilling its job as a queue cleaner. It also doesn't handle the queue
drained status gracefully.

This patch makes a few changes:
  - first archive jobs, and only afterwards submit jobs; this fixes the
    case where the queue is already full and there are jobs suited for
    archiving (but not the case where the jobs are all too young to be
    archived)
  - handle the job queue full and drained cases gracefully: instead of
    tracebacks, log such cases
  - reverse the initial value and special cases for update_file; we now
    whitelist instead of blacklist cases, since we have many more
    blacklist cases than whitelist ones, and we set the flag to True only
    after the run is successful

The last change, especially, is a significant one: now errors during the
watcher run will not update the status file, and thus they won't be lost
again in the logs.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

24edc6d4
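
A small sketch of the "whitelist" handling of update_file described in 24edc6d4. The `cycle` and `write_status` callables are hypothetical stand-ins for the watcher run and the status-file update:

```python
import logging


def run_watcher(cycle, write_status):
    """Run one watcher cycle; update the status file only on success."""
    update_file = False  # default is "do not update" (whitelist approach)
    try:
        cycle()
        update_file = True  # set only after the whole run succeeded
    except Exception as err:  # log instead of dying with a traceback
        logging.error("watcher run failed: %s", err)
    if update_file:
        write_status()
    return update_file
```

Starting from False means any error path that nobody thought of leaves the status file untouched.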

rapi: rework error handling · 59b4eeef

Iustin Pop authored 15 years ago


Currently the rapi code doesn't have any custom error handling; any
exceptions raised are simply converted into an HTTP 500 error, without
much explanation.

This patch adds a couple of generic SubmitJob/GetClient functions that
handle some errors specially so that they are transformed into HTTP
errors, with more detailed information.

With this patch, the errors rapi returns when the queue is full or
drained, or when the master is down, are more readable.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

59b4eeef
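
An illustrative sketch of the idea behind 59b4eeef: a generic submit helper that turns known queue errors into HTTP errors with a message. The exception classes, status codes and messages below are local stand-ins chosen for the example, not Ganeti's own definitions:

```python
class JobQueueFull(Exception):
    """Stand-in for a 'queue is full' error."""


class JobQueueDrained(Exception):
    """Stand-in for a 'queue is drained' error."""


class HttpError(Exception):
    """Stand-in HTTP error carrying a status code and a message."""

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code
        self.message = message


def submit_job(submit, op):
    """Call submit(op), translating known failures into HttpError."""
    try:
        return submit(op)
    except JobQueueFull:
        raise HttpError(503, "Job queue is full, needs archiving")
    except JobQueueDrained:
        raise HttpError(503, "Job queue is drained, cannot submit")
```

Any other exception still propagates, so unexpected failures would continue to surface as a generic HTTP 500.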

Fix backend.OSEnvironment be/hv parameters · 030b218a

Iustin Pop authored 15 years ago


Commit 67fc3042 added some more variables to be exported to
OSEnvironment, but it has two bugs:
  - wrong variable name (env vs. result)
  - in OSEnvironment we don't have the automatic conversion to strings
    that we do in hooks, so we must manually enforce this

With this patch instance creations work again.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

030b218a

rapi: make tags query not use jobs · 25e39bfa

Iustin Pop authored 15 years ago


Currently the rapi tags query implementation is similar to the command
line one: it submits OpGetTags jobs. This is not good, since, being an
API, it can be used a lot and can pollute the job queue with many such
trivial jobs.

This patch converts it to use either queries (for nodes/instances) or
direct read from ssconf (for the cluster case). For ssconf, we added a
function to the ssconf.SimpleStore class for reading the tags.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

25e39bfa

May 21, 2009

Change failover instance when instance is stopped · d27776f0

Iustin Pop authored 15 years ago


Currently, if the instance is stopped, we still check for enough memory
on the target node. This is a little bit too strict, since if too many
nodes have failed and one is out of memory, this prevents fixing the
cluster (with the instances down).

We change it to do the memory checks only when the instance will be
started.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

d27776f0
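
The rule in d27776f0 can be sketched as follows. The function name, its parameters and the error type are assumptions for illustration, not the logical unit's real interface:

```python
def check_target_memory(will_start, needed_mb, free_mb):
    """Check memory on the target node only if the instance will be started."""
    if not will_start:
        return  # the instance stays down, so no memory is needed yet
    if free_mb < needed_mb:
        raise RuntimeError(
            "not enough memory on target node: needed %d MiB, free %d MiB"
            % (needed_mb, free_mb))
```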

Export more instance information in hooks · 67fc3042

Iustin Pop authored 15 years ago


Currently hooks lack the instance's hypervisor, hypervisor parameters
and backend parameters. This forces hooks to query back into ganeti,
which is dangerous due to possible luxi sockets exhaustion.

This patch adds these three as INSTANCE_HYPERVISOR, INSTANCE_HV_*,
INSTANCE_BE_*. The hook environment prefixes all keys with “GANETI”, so
the default settings for a xen-pvm instance would be:

  GANETI_INSTANCE_HV_initrd_path=
  GANETI_INSTANCE_HV_kernel_args=ro
  GANETI_INSTANCE_HV_kernel_path=/boot/vmlinuz-2.6-xenU
  GANETI_INSTANCE_HV_root_path=/dev/sda1

Any dashes in parameter names are changed to underscores, since
variables with dashes are not easy to access from the shell
(alternatively we could deny those via a unittest for constants.py).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

67fc3042
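
A sketch of how the variable names described in 67fc3042 could be derived. The function name and the "some-param" key in the usage note are hypothetical, and the string conversion is done inline here for simplicity:

```python
def build_hook_env(hypervisor, hvparams, beparams):
    """Build GANETI_-prefixed hook variables for one instance."""
    env = {"INSTANCE_HYPERVISOR": hypervisor}
    for kind, params in (("HV", hvparams), ("BE", beparams)):
        for key, value in params.items():
            # Dashes are awkward in shell variable names, so use underscores.
            env["INSTANCE_%s_%s" % (kind, key.replace("-", "_"))] = value
    # Every key gets the GANETI_ prefix and every value becomes a string.
    return {"GANETI_" + key: str(value) for key, value in env.items()}
```

For example, `build_hook_env("xen-pvm", {"kernel_args": "ro"}, {"some-param": 1})` yields `GANETI_INSTANCE_HV_kernel_args` set to `ro` and `GANETI_INSTANCE_BE_some_param` set to `1`.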

May 20, 2009

Merge branch 'master' into next · f226f085

Guido Trotter authored 15 years ago


Signed-off-by: Guido Trotter <ultrotter@google.com>

f226f085

watcher: write the instance status to a file · 78f44650

Iustin Pop authored 15 years ago


This patch modifies the watcher to keep on-disk a file with the instance
status; this can be used from outside of ganeti to react to instances
being down (when the watcher cannot restart them).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

78f44650

Release 2.0rc5 · b926bd98

Iustin Pop authored 15 years ago


Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

b926bd98

May 19, 2009

Fix the SafeEncoding behaviour · d392fa34

Iustin Pop authored 15 years ago


Currently we have bad behaviour in SafeEncode:
  - binary strings are actually not handled correctly (ahem)
  - the encoding is not stable, due to use of string_escape

For this reason, we replace the use of string_escape with part of the
code of string_escape (PyString_Repr in Objects/stringobject.c); we
don't escape backslashes or single quotes, since that is what makes it
non-stable. Furthermore, we only use the encode('ascii', ...) for unicode
inputs.

The patch also adds unittests for the function that test basic
behaviour.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

d392fa34
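
A Python 3 sketch of the behaviour d392fa34 describes: escape control and non-ASCII characters, but leave backslashes and quotes alone so that the encoding is stable. This is an illustration written for this log, not the original (Python 2) SafeEncode. The latin-1 decoding of bytes and the `backslashreplace` error handler are choices made for the example:

```python
def safe_encode(text):
    """Return an ASCII-only str with control characters escaped."""
    if isinstance(text, bytes):
        text = text.decode("latin-1")  # map each byte to one code point
    else:
        text = text.encode("ascii", "backslashreplace").decode("ascii")
    pieces = []
    for char in text:
        code = ord(char)
        if char == "\t":
            pieces.append("\\t")
        elif char == "\n":
            pieces.append("\\n")
        elif char == "\r":
            pieces.append("\\r")
        elif code < 32 or code >= 127:
            pieces.append("\\x%02x" % code)
        else:
            pieces.append(char)  # includes backslashes and quotes, unescaped
    return "".join(pieces)
```

Because backslashes pass through untouched, encoding an already encoded string returns it unchanged.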

Move more hypervisor strings into constants · 835528af

Iustin Pop authored 15 years ago


This patch adds constants for the mouse and boot order strings; while
there are still some issues remaining, we're trying to clean up hardcoded
strings from the hypervisors.

Since the formatting of frozensets is currently wrong, we also add a
utility function for this and change all the error messages to use it.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

835528af
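
As an illustration of the frozenset formatting problem mentioned in 835528af: interpolating a frozenset directly into a message prints its repr, so a small join helper reads better. The helper name, the sorting and the sample values are assumptions, not the function the patch added:

```python
def comma_join(values):
    """Format a collection as 'a, b, c' for use in error messages."""
    return ", ".join(sorted(str(value) for value in values))


# Hypothetical set of allowed values, for demonstration only.
allowed = frozenset(["mouse", "tablet"])
message = "Invalid usb_mouse value, must be one of: %s" % comma_join(allowed)
```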

watcher: try to restart the master if down · 7dfb83c2

Iustin Pop authored 15 years ago


Bugs in either our code or in associated libraries can bring the master daemon
down, and this (due to the 2.0 architecture) stops all work on the cluster.

Since the watcher already does periodic checks on the cluster, we modify
it to try to start the master automatically in case of failures to
connect. This will be tried only once per cycle.

Also, in this case, we modify the code so that the watcher status file
is not updated; its timestamp will thus reflect the time of the last
successful connection to the master.

Side note: the except errors.ConfigurationError part could be cleaned
up, since in 2.0 we don't usually get that directly, and if we do it's
an error and we shouldn't touch the file anyway; but that is not a rc5
change.

Signed-off-by: Iustin Pop <iustin@google.com>

7dfb83c2

IAllocator: export total disk size for instances · 88ae4f85

Iustin Pop authored 15 years ago


This patch adds a ‘disk_space_total’ key for current instances, similar
to the key for the new instance in case of new allocations.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

88ae4f85

Add -H/-B startup parameters to gnt-instance · d04aaa2f

Iustin Pop authored 15 years ago


This patch modifies the start instance script, opcode and logical unit
to support temporary startup parameters.

Unlike 1.2, where only the kernel arguments could be changed (and this
was thus xen-pvm specific), this version supports changing all
hypervisor and backend parameters (with appropriate checks).

This is much more flexible, and allows for example:
  - start with different, temporary kernel
  - start with different memory size

Note: in later versions, this should be extended to cover disk
parameters as well (e.g. start with drbd without flushes, start with
drbd in async mode, etc.).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

d04aaa2f

call_instance_start: add optional hv/be parameters · 0eca8e0c

Iustin Pop authored 15 years ago


This patch modifies the rpc.call_instance_start - the master side - to
take optional hv/be parameters. The noded side is unchanged and
oblivious to the change.

This will allow implementation of single-user capability and such on
startup (temporary, as opposed to permanent).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

0eca8e0c

May 18, 2009

Fix gnt-job list argument handling · dcbd6288

Guido Trotter authored 15 years ago


Currently QueryJob returns "None" when a wrong job ID is passed.
Handle this in gnt-job list, by printing an error for each wrong job,
and still giving output for all the jobs which actually do exist.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

dcbd6288
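
A sketch of the handling described in dcbd6288, where a None result stands for an unknown job ID. The function name, the row format and the error text are assumptions for illustration:

```python
import sys


def print_jobs(job_ids, results, out=sys.stdout, err=sys.stderr):
    """Print one line per known job; report unknown job IDs on err.

    results is assumed to hold one entry per requested ID, with None
    standing for a job that does not exist. Returns the number of
    missing jobs.
    """
    missing = 0
    for job_id, row in zip(job_ids, results):
        if row is None:
            print("Job %s not found" % job_id, file=err)
            missing += 1
            continue
        print("\t".join(str(field) for field in row), file=out)
    return missing
```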

May 15, 2009

Instance reinstall: don't mix up errors · b4874c9e

Guido Trotter authored 15 years ago


If the remote info rpc call fails we can't assume that the instance is
up.

Signed-off-by: Guido Trotter <ultrotter@google.com>

b4874c9e

Don't check memory at startup if instance is up · f1926756

Guido Trotter authored 15 years ago


Signed-off-by: Guido Trotter <ultrotter@google.com>

f1926756

May 13, 2009

gnt-cluster modify: fix --no-lvm-storage · b8a8fbe1

Guido Trotter authored 15 years ago


Currently doing a gnt-cluster modify --no-lvm-storage is silently
ignored, as it passes a None value in vg_name, which is the same as not
modifying that parameter. Explicitly set the passed value to '', so the
non-true not-None value can be evaluated to actually remove a volume
group.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

b8a8fbe1

LUSetClusterParams: improve volume group removal · b2482333

Guido Trotter authored 15 years ago


Currently LUSetClusterParams will remove the volume group if the vg_name
field passed in is not true, but not None. Setting the target volume
group to False or the empty string, though, is a bad idea because it's
not a boolean value, and at cluster init we set it to None if
--no-lvm-storage is passed. With this fix we handle '' (or any other
non-None false value) as the "unset" value, but actually store None in
the config.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

b2482333
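
The vg_name convention from b8a8fbe1 and b2482333 can be sketched as below. The function name is hypothetical; only the None / '' / name distinction comes from the two messages:

```python
def resolve_vg_name(current, requested):
    """Return the volume group name to store in the config."""
    if requested is None:
        return current  # parameter not passed: leave the setting unchanged
    if not requested:
        return None  # '' or another false value: unset, stored as None
    return requested  # a real name: switch to that volume group
```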

gnt-cluster info: show more cluster parameters · a8001106

Guido Trotter authored 15 years ago


Even if we cannot modify all of them, they are useful information about
the current cluster.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

a8001106

LUQueryClusterInfo: return a few more fields · 7a56b411

Guido Trotter authored 15 years ago


Some fields can be set at cluster init, and perhaps even modified with
SetClusterParams, but there's no way to query them. With this patch we
export them in the cluster info query.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

7a56b411

May 12, 2009

KVMHypervisor: return memory and cpus as integers · 2a7e887b

Guido Trotter authored 15 years ago


Currently the KVM hypervisor returns strings for the memory and cpu
values, while the xen hypervisor returns integers. Make this uniform by
converting the values to integers in KVM as well.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

2a7e887b

LUSetInstanceParam: don't assume memory is integer · ade0e8cd

Guido Trotter authored 15 years ago


LUSetInstanceParam currently assumes that the 'memory' value of a
call_instance_info result is an integer, while the rest of the code
explicitly converts it to int(). Converting it to int works around a
bug which prevents changing the memory allocation of a live instance if
the remote call returns the memory in string format.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

ade0e8cd
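
A tiny sketch of why the explicit int() matters in ade0e8cd and 2a7e887b: arithmetic on a value that may arrive as a string fails unless it is converted first. The function name and the formula are assumptions for illustration:

```python
def missing_memory(new_mem, current_mem, node_free):
    """Return how many MiB the node lacks for the new memory size.

    Any argument may arrive as a string (as the KVM hypervisor used to
    return), so everything is converted with int() before the arithmetic.
    """
    return max(0, int(new_mem) - int(current_mem) - int(node_free))
```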

May 11, 2009

Add the new DRBD test files to the Makefile · d816408a

Iustin Pop authored 15 years ago


These were forgotten in commit 01e2ce3a, and caused “make distcheck” to
fail.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

d816408a

Fix QA and documentation about no initrd case · 5645d16b

Iustin Pop authored 15 years ago


In Ganeti 1.2, “none” was used to signify no initrd. In 2.0 we have
changed to “no_” as a prefix (i.e. “-H no_initrd_path”) and thus we
document this in the manpage.

The QA suite is changed accordingly.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

5645d16b

Remove an unused function · dad1e806

Iustin Pop authored 15 years ago


The _TransformPath function is not used anymore in 2.0, let's remove it.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

dad1e806

Exporting the instance network_port on the RAPI · a8b16c4e

Tim Boring authored 15 years ago


Patch for adding network_port to the instance attributes exported by the
RAPI.

[iustin@google.com: slightly changed the formatting]
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

a8b16c4e

May 09, 2009

Minor patch to rapi documentation · 4fb301b5

Tim Boring authored 15 years ago


Minor patch to clarify the URL necessary for accessing the RAPI.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

4fb301b5

Small doc change in README · 342046f4

Iustin Pop authored 15 years ago


The version is 2.0, and we don't build PDFs by default, only HTML
files.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

342046f4

May 07, 2009

Remove some superfluous imports · bd45767b

Carlos Valiente authored 15 years ago


This is for Python 2.6 compatibility.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

bd45767b