Commits · 35dd762d89851fef7f6141ee3a5a00e87431a783 · itminedu / snf-ganeti

Jan 05, 2011

Import upgrade notes into documentation · 35dd762d

Michael Hanselmann authored 14 years ago

This patch formats the upgrade notes currently in the wiki[1] as reST
and adds them to the documentation.

[1] http://code.google.com/p/ganeti/wiki/UpgradeNotes



Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


Dec 31, 2010

Fix typo in gnt-instance manpage · ab737f24

Michael Hanselmann authored 14 years ago


s/os-name/os-type/. This was reported in issue 133.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


Dec 29, 2010

jqueue: Fix cancelling while in waitlock in queue · 30c945d0

Michael Hanselmann authored 14 years ago


Since the recent change to leave jobs in the “waitlock” status (commit
5fd6b694), cancelling a job while it's back in the queue would break.
This patch handles these cases and adds a unittest.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


Dec 20, 2010

cli: Extend message for LUXI timeouts · cd4c86a8

Michael Hanselmann authored 14 years ago


Point out that jobs already submitted continue to run.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


Fix timeout handling in LUXI client · 28e3e216

Michael Hanselmann authored 14 years ago


If the socket can't be read in time, it raises “socket.timeout”, for
which there is special handling code. Unfortunately the exception blocks
were in the wrong order and “socket.error” caught it first.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
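
The ordering problem described in this commit can be reproduced with a short sketch. It relies only on the fact that “socket.timeout” is a subclass of “socket.error”; the function names are hypothetical and are not taken from the LUXI client.

```python
import socket


def classify_wrong_order(exc):
    """Handlers in the buggy order: the generic one shadows the specific one."""
    try:
        raise exc
    except socket.error:
        return "error"
    except socket.timeout:  # never reached, socket.error matched first
        return "timeout"


def classify_right_order(exc):
    """Handlers in the fixed order: most specific exception first."""
    try:
        raise exc
    except socket.timeout:
        return "timeout"
    except socket.error:
        return "error"
```

With the handlers in the wrong order a timeout is reported as a generic socket error; listing the subclass first restores the special handling.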


Merge branch 'stable-2.3' into devel-2.3 · 43217ac7

Michael Hanselmann authored 14 years ago


* stable-2.3:
  Prepare 2.3.1 release
  Fix disk status verification in LUClusterVerify

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


Prepare 2.3.1 release · bb2dc35a

Michael Hanselmann authored 14 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


Dec 17, 2010

QA: Run cluster-verify as part of all instance tests · d27150a9

Michael Hanselmann authored 14 years ago


“gnt-cluster verify” looks at some per-instance information as well, so
it should be run for each instance type that QA tests.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


QA: Fix typo and add “not” · 65924a12

Michael Hanselmann authored 14 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


Dec 16, 2010

ensure-dirs: Speed up when using big queues · 196d70fa

Michael Hanselmann authored 14 years ago


The “ensure-dirs” script as included in Ganeti 2.3 is very slow when
working with big queues requiring a change of permissions on many or all
files.

$ find /var/lib/ganeti/queue/ | wc -l
52354

Before this change:
$ time /usr/local/lib/ganeti/ensure-dirs -f
real    16m4.739s

While not addressed in this patch, I'd like to record the overall
inefficiency of the “ensure-dirs” script, even after this change:

$ time /usr/local/lib/ganeti/ensure-dirs -f
real    5m57.362s
[…]
$ strace -e clone,execve -f -c /usr/local/lib/ganeti/ensure-dirs -f
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 50.08    5.147090          49    104774           clone
 49.92    5.131094          49    104739           execve

More changes will be needed. Just for comparison, a small Python
snippet changing permissions on all files (“ensure-dirs” changes the
owner too):

$ time python -c 'import os; from ganeti import utils;
[os.chmod(i, 0644) for i in
utils.ListVisibleFiles("/var/lib/ganeti/queue/archive/big")]'
real    0m0.605s
[…]

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
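
The strace output in this commit shows roughly one clone/execve pair per file, which suggests that most of the time goes into spawning processes. The sketch below shows the in-process alternative hinted at by the Python one-liner. It is not the “ensure-dirs” implementation: the function name and the skip-if-already-correct check are assumptions, and it changes only the mode, not the owner.

```python
import os
import stat


def ensure_mode(root, file_mode=0o644):
    """Set file_mode on regular files under root, skipping correct ones.

    Returns the number of files whose mode was changed.
    """
    changed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if not stat.S_ISREG(st.st_mode):
                continue  # leave symlinks and special files alone
            if stat.S_IMODE(st.st_mode) != file_mode:
                os.chmod(path, file_mode)
                changed += 1
    return changed
```

A second run over the same tree changes nothing, so repeated invocations cost only one lstat call per file.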


Dec 15, 2010

Fix gnt-cluster verify with diskless instances · 4f5c2533

Adeodato Simo authored 14 years ago


`gnt-cluster verify` was failing with KeyError if there was any
diskless instance in the cluster. This was because _CollectDiskInfo()
was not including these instances in the returned dictionary, but they
were expected to be present in LUVerifyCluster.Exec().

With this commit, we ensure that the dictionary returned by _CollectDiskInfo
includes entries for diskless instances as well.

Signed-off-by: Adeodato Simo <dato@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
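
The idea of the fix can be sketched as follows. This is an illustration only: the function name, the argument names and the data layout are assumptions and do not reproduce _CollectDiskInfo.

```python
def collect_disk_info(instance_names, reported):
    """Return a dict with one entry per instance, even for diskless ones.

    reported maps an instance name to its per-node disk status and may
    lack instances for which no node reported any disk (assumed layout).
    """
    # Start with an empty entry for every known instance ...
    info = dict((name, {}) for name in instance_names)
    # ... then overlay whatever the nodes actually reported.
    info.update(reported)
    return info
```

A caller can then index the result by any instance name without hitting a KeyError; diskless instances simply map to an empty dictionary.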


jqueue: Keep jobs in “waitlock” while returning to queue · 5fd6b694

Michael Hanselmann authored 14 years ago


Iustin Pop reported that a job's file is updated many times while it
waits for locks held by other thread(s). After an investigation it was
concluded that the reason was a design decision for job priorities to
return jobs to the “queued” status if they couldn't acquire all locks.
Changing a job's status or priority requires an update to permanent
storage.

In a high-level view this is what happens:
1. Mark as waitlock
2. Write to disk as permanent storage (jobs left in this state by a
   crashing master daemon are resumed on restart)
3. Wait for lock (assume lock is held by another thread)
4. Mark as queued
5. Write to disk again
6. Return to workerpool

Another option originally discussed was to leave the job in the
“waitlock” status. Ignoring priority changes, this is what would happen:
1. If not in waitlock
1.1. Assert state == queued
1.2. Mark as waitlock
1.3. Set start_timestamp
1.4. Write to disk as permanent storage
3. Wait for locks (assume lock is held by another thread)
4. Leave in waitlock
5. Return to workerpool

Now let's assume the lock is released by the other thread:
[…]
3. Wait for locks and get them
4. Assert state == waitlock
5. Set state to running
6. Set exec_timestamp
7. Write to disk

As this change reduces the number of writes from two per lock acquire
attempt to two per opcode and one per priority increase (as happens
after 24 acquire attempts (see mcpu._CalculateLockAttemptTimeouts) until
the highest priority is reached), here's the patch to implement it.
Unittests are updated.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
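
The write counts compared in this commit can be checked with a toy model. The sketch below is not the jqueue code: the Job class and both attempt functions are hypothetical, every status change is assumed to cost exactly one write, and priority changes are ignored.

```python
QUEUED, WAITLOCK, RUNNING = "queued", "waitlock", "running"


class Job(object):
    def __init__(self):
        self.status = QUEUED
        self.writes = 0

    def set_status(self, status):
        self.status = status
        self.writes += 1  # each status change is written to disk


def attempt_old(job, got_lock):
    """Old behaviour: go back to "queued" when the lock is not acquired."""
    job.set_status(WAITLOCK)
    job.set_status(RUNNING if got_lock else QUEUED)


def attempt_new(job, got_lock):
    """New behaviour: stay in "waitlock" between attempts."""
    if job.status != WAITLOCK:
        assert job.status == QUEUED
        job.set_status(WAITLOCK)
    if got_lock:
        assert job.status == WAITLOCK
        job.set_status(RUNNING)


def count_writes(attempt, failed_attempts):
    """Run failed_attempts unsuccessful tries followed by one success."""
    job = Job()
    for _ in range(failed_attempts):
        attempt(job, False)
    attempt(job, True)
    return job.writes
```

With five failed lock attempts before the lock is obtained, `count_writes(attempt_old, 5)` gives 12 writes while `count_writes(attempt_new, 5)` gives 2.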


Improve jqueue unittests · ebb2a2a3

Michael Hanselmann authored 14 years ago


- Verify job file updates
- Ensure queue lock is released while executing opcode

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


Dec 14, 2010

Update manpages to display version 2.3 · e7441f80

Miguel Di Ciurcio Filho authored 14 years ago


Signed-off-by: Miguel Di Ciurcio Filho <miguel.filho@gmail.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>


Dec 09, 2010

Fix disk status verification in LUClusterVerify · d41d07d4

Iustin Pop authored 14 years ago


Commit b8d26c6e added disk status verification, but it has two
(different) bugs for unhealthy nodes.

For offline nodes, we don't add the disk status at all to the
instance/node dict, with the result that the instance is not present in
the instdisk dict if all of its nodes are offline. This creates a
KeyError later when we call VerifyInstance with instdisk[instance].

For online nodes that don't return a valid disk status, we simply
set the status to None for each disk, but the code in _VerifyInstance
presumes and requires that each status is a valid tuple of length two.

For both these bugs, we redo the instdisk computations to always include
valid data, and we enhance the asserts to check for consistency.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
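
One way to "always include valid data" is to normalise whatever a node returns before it is stored. The sketch below is an assumption-laden illustration, not the code from this patch: the function name, the error strings and the reading of the two-element tuple as (success, payload) are all hypothetical.

```python
def normalize_disk_status(node_offline, raw_statuses, disk_count):
    """Return exactly disk_count two-element tuples for one node."""
    if node_offline:
        return [(False, "node offline")] * disk_count
    if not isinstance(raw_statuses, list) or len(raw_statuses) != disk_count:
        return [(False, "invalid result from node")] * disk_count
    result = []
    for status in raw_statuses:
        if isinstance(status, (tuple, list)) and len(status) == 2:
            result.append(tuple(status))
        else:
            # e.g. None: replace it with a well-formed failure entry
            result.append((False, "invalid status"))
    return result
```

Every instance then has an entry per node and per disk, so later code can unpack each status without special cases.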


Merge branch 'devel-2.2' into devel-2.3 · d1a0ab50

Guido Trotter authored 14 years ago


* devel-2.2:
  Fix rename for file-backed instances

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


Merge branch 'stable-2.2' into devel-2.2 · be9f4904

Guido Trotter authored 14 years ago


* stable-2.2:
  Fix rename for file-backed instances

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


Merge branch 'stable-2.2' into stable-2.3 · 5500ca1f

Guido Trotter authored 14 years ago


* stable-2.2:
  Fix rename for file-backed instances

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


Fix rename for file-backed instances · 3721d2fe

Guido Trotter authored 14 years ago


Currently the code wrongly changes the disk logical/physical id
component representing the path from "$storage_dir/$iname/disk$seq" to
"$storage_dir/$iname/disk/$seq" (note the additional slash) breaking the
rename.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
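
The difference between the two path forms is easy to reproduce. The commit message does not show how the extra slash arose, so the sketch below only contrasts the two results; both function names and the example storage directory are hypothetical, and POSIX path separators are assumed.

```python
import os.path


def disk_path(storage_dir, instance_name, seq):
    """Build "$storage_dir/$iname/disk$seq": no slash before the index."""
    return os.path.join(storage_dir, instance_name, "disk%d" % seq)


def buggy_disk_path(storage_dir, instance_name, seq):
    """Build the wrong form, "$storage_dir/$iname/disk/$seq"."""
    return os.path.join(storage_dir, instance_name, "disk", str(seq))
```

For `("/srv/file-storage", "instance1", 0)` the first function yields `/srv/file-storage/instance1/disk0`, the second `/srv/file-storage/instance1/disk/0`.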


Dec 02, 2010

Merge branch 'stable-2.3' into devel-2.3 · 9a91d357

Michael Hanselmann authored 14 years ago


* stable-2.3:
  Bump version for 2.3.1~rc1 release

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


Dec 01, 2010

locking: Clarify message for removed locks · e1137eb6

Michael Hanselmann authored 14 years ago


Just being told that a lock doesn't exist can be confusing. One case
where this happens is when a job (e.g. instance modify) waits for a job
removing the instance (e.g. export with remove).

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


Bump version for 2.3.1~rc1 release · 563d5e72

Michael Hanselmann authored 14 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


impexpd: Disable OpenSSL compression in socat if possible · 29e8788e

Michael Hanselmann authored 14 years ago


This uses an option only available in patched socat versions. More
information is available from the INSTALL update included in this
patch.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


Merge branch 'stable-2.3' into devel-2.3 · cd22574b

Michael Hanselmann authored 14 years ago


* stable-2.3:
  Bump version for 2.3.0

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


Bump version for 2.3.0 · 7c324b88

Michael Hanselmann authored 14 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


Nov 30, 2010

Merge branch 'devel-2.2' into devel-2.3 · 5d9f9cba

Michael Hanselmann authored 14 years ago


* devel-2.2:
  Correct version check for release candidates
  Fix version check
  Add script to check version format

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


Correct version check for release candidates · cdb303ab

Michael Hanselmann authored 14 years ago


The tilde needs to be escaped and I forgot the space which should be
used instead.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


config.py: need explicit %-formatting in errors.OpPrereqError. · c49b0092

Adeodato Simo authored 14 years ago


Signed-off-by: Adeodato Simo <dato@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>


Nov 25, 2010

Fix version check · 35576615

Michael Hanselmann authored 14 years ago


Don't ask … all I say is distcheck.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


Nov 24, 2010

Add script to check version format · 96602be4

Michael Hanselmann authored 14 years ago


Only versions of the format “x.y.z” and “x.y.z~(rc|beta)N” (for N>0) are
allowed.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
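
The accepted format can be expressed as a regular expression. The sketch below is not the script added by this commit: the function name is hypothetical, and rejecting leading zeros in N (so that “rc01” fails) is an assumption about what “N>0” means.

```python
import re

# x.y.z, optionally followed by ~rcN or ~betaN with N > 0
_VERSION_RE = re.compile(r"\d+\.\d+\.\d+(~(rc|beta)[1-9]\d*)?")


def is_valid_version(version):
    """Return True if version has the form x.y.z or x.y.z~(rc|beta)N."""
    return _VERSION_RE.fullmatch(version) is not None
```

For example, `2.3.1` and `2.3.1~rc1` are accepted, while `2.3`, `2.3.1~rc0` and `2.3.1-rc1` are rejected.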


Merge branch 'devel-2.2' into devel-2.3 · b6ac86e0

Iustin Pop authored 14 years ago


Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>


Fix coverage reports · 577b170b

Iustin Pop authored 14 years ago


Currently, the coverage reports include the unittests themselves, and
this unfairly skews the reports, as the coverage for the tests is very
high (since they all run).

To fix this, we export the ganeti temp dir from run-in-temp-dir, and we
use that to exclude the tests directory. The patch also fixes a bug
related to multiple directories to be omitted (--omit a --omit b is
wrong, it needs to be --omit a,b).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>


Nov 19, 2010

Updates NEWS and configure.ac for 2.3.0~rc1 · ca6c2dcd

Iustin Pop authored 14 years ago


Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>


Merge branch 'devel-2.2' into devel-2.3 · 2b613de4

Iustin Pop authored 14 years ago


* devel-2.2:
  Update NEWS & configure.ac for the 2.2.2 release
  Fix documentation regarding conversion to drbd

Conflicts:
	NEWS         (integrated 2.2 changes)
	configure.ac (kept our version)

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>


Update NEWS & configure.ac for the 2.2.2 release · 2596526d

Iustin Pop authored 14 years ago


This imports the 2.1.8 NEWS entry and adds the 2.2.2 one, then updates the
configure.ac version.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>


Fix documentation regarding conversion to drbd · a22eb33b

Iustin Pop authored 14 years ago


Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>


Fix documentation regarding conversion to drbd · 3e039592

Iustin Pop authored 14 years ago


Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>


Nov 18, 2010

Reinstall instance: disallow offline secondaries · 9aacb199

Iustin Pop authored 14 years ago


Currently, reinstallation of a DRBD instance with the secondary node offline does:

node1# gnt-instance reinstall -f instance1
Waiting for job 139053 for instance1...
Thu Nov 18 01:36:09 2010  - WARNING: Could not prepare block device disk/0 on node node3 (is_primary=False, pass=1): Node is marked offline
Thu Nov 18 01:36:09 2010  - WARNING: Could not shutdown block device disk/0 on node node3: Node is marked offline
Job 139053 for instance1 has failed: Failure: command execution error:
Disk consistency error

Since this fails anyway, let's check the secondary nodes, thus
preventing any modifications to the instance (e.g. OS type change):

node1# gnt-instance reinstall -f instance1
Waiting for job 139058 for instance1...
Job 139058 for instance1 has failed: Failure: prerequisites not met for this operation:
error type: wrong_state, error details:
Instance secondary node offline, cannot reinstall: node3

The patch needs modifications to the _CheckNodeOnline function, in order
to display meaningful messages ("Can't use offline node" would be very
confusing for an instance reinstall, since we didn't select a node
manually).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
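
The change to the node check can be pictured as an optional message argument. The sketch below is hypothetical throughout: the exception class, the function signature and the way offline nodes are passed in are assumptions and do not reproduce _CheckNodeOnline.

```python
class PrereqError(Exception):
    """Hypothetical stand-in for a "prerequisites not met" error."""


def check_node_online(node_name, offline_nodes, msg=None):
    """Raise PrereqError if node_name is offline, using msg if given."""
    if msg is None:
        msg = "Can't use offline node"  # generic default wording
    if node_name in offline_nodes:
        raise PrereqError("%s: %s" % (msg, node_name))
```

Calling it with `msg="Instance secondary node offline, cannot reinstall"` for node3 produces the error details shown in the second transcript, while other callers keep the generic wording.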


QA: check that doubly modifying an OS state is OK · 89e8af70

Iustin Pop authored 14 years ago


This would have prevented the bug fixed in the previous patch :(

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>


Fix breakage in OS state modify · e2334900

Iustin Pop authored 14 years ago


I was using the feedback_fn function incorrectly (it doesn't
automatically expand the arguments).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
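
The kind of mistake described here can be illustrated with a callback that accepts one ready-made string. This is a sketch under assumptions: the real feedback_fn signature is not shown in the commit message, and the helper name, message text and OS name below are hypothetical.

```python
def make_feedback_fn(sink):
    """Return a feedback function that records exactly one message string."""
    def feedback_fn(message):
        sink.append(message)
    return feedback_fn


messages = []
feedback_fn = make_feedback_fn(messages)
os_name = "debian-etch"

# Logging-style usage such as feedback_fn("OS %s modified", os_name) does
# not work with this callback: the arguments are not expanded for the
# caller, so the message has to be formatted explicitly first.
feedback_fn("OS %s modified" % os_name)
```

Unlike the functions of the logging module, such a callback leaves the %-formatting to the caller.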
