- Dec 24, 2010
Iustin Pop authored
While the LU does return the final name, it's useful to log the actual DNS resolution process (input and output) in order to help with the diagnosis of failures. The patch also fixes the docstring of the Exec() function.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
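A minimal sketch of the idea, using plain logging and the standard socket resolver for illustration rather than the actual LU/feedback machinery:

import logging
import socket

def resolve_and_log(name):
    # Illustrative only, not the Ganeti LU code: log both the input name
    # and the resolver's answer so failures are easier to diagnose.
    logging.debug("Resolving %s", name)
    try:
        fqdn, _, addrs = socket.gethostbyname_ex(name)
    except socket.error as err:
        logging.error("Could not resolve %s: %s", name, err)
        raise
    logging.debug("%s resolved to %s (%s)", name, fqdn, ", ".join(addrs))
    return fqdn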
-
- Dec 21, 2010
Michael Hanselmann authored
The list of fields is not only sorted, but sorted in a nice way.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Long story short: time.strftime("%Z", time.localtime()) doesn't work, even though it's documented to be equivalent to time.strftime("%Z").

$ TZ=America/Sao_Paulo python -c 'import time; print time.strftime("%Z"), time.strftime("%Z", time.localtime())'
BRST LMT

References:
http://bugs.python.org/issue762963
https://bugs.launchpad.net/ubuntu/+source/python2.6/+bug/564607
http://stackoverflow.com/questions/4367896/issue-with-timezone-with-time-strftime
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
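The discrepancy can be checked from a small standalone script; whether it shows up depends on the Python build and platform (on affected Python 2.x builds the explicit form can return "LMT"). An illustrative check, not part of the patch:

import os
import time

os.environ["TZ"] = "America/Sao_Paulo"
time.tzset()  # not available on Windows

implicit = time.strftime("%Z")                    # uses the current time internally
explicit = time.strftime("%Z", time.localtime())  # documented as equivalent
print(implicit, explicit)  # may print e.g. "BRST LMT" on affected builds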
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Iustin Pop authored
As per issue 124, some Xen versions (or packaging) don't deal nicely with the colon being part of a disk name. Therefore we add a configure-time option for customising this. Note: setting the separator to interesting values like / is not handled by the code. This being a configure-time option (e.g. to be set by distribution packagers), we assume the person building the code knows what they are doing.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Michael Hanselmann authored
- Update docstrings to explicitly mention Epoch
- Fix timezone bug in FormatTimestampWithTZ, where it would use GMT/UTC when it should use the local timezone
- Add unittests for time formatting functions
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
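To illustrate the timezone point above with a small standalone snippet (the timestamp and format string are example values, not taken from the patch):

import time

epoch_ts = 1292500000  # example Epoch timestamp in seconds

# gmtime() yields UTC; a function advertised as "...WithTZ" should instead
# format in the local timezone, i.e. use localtime().
print(time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(epoch_ts)))     # UTC
print(time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(epoch_ts)))  # local time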
-
Michael Hanselmann authored
It'll be used for querying locks.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Also replace “sorted” with “utils.NiceSort” now that it supports a key function.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Dec 20, 2010
Michael Hanselmann authored
* devel-2.3:
  Prepare 2.3.1 release
  Fix disk status verification in LUClusterVerify

Conflicts:
  NEWS: Trivial
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
* stable-2.3:
  Prepare 2.3.1 release
  Fix disk status verification in LUClusterVerify
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
pylint is not yet included as the code needs some work for that.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Dec 17, 2010
René Nussbaumer authored
As we can't test this on the master node anymore (if we flagged the node offline we would change the master role on the master itself), we use the first non-master node we find in the configuration.
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
René Nussbaumer authored
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
René Nussbaumer authored
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
René Nussbaumer authored
This patch makes sure we cross-verify the state the node is in with our view:
- power off: the node has to be set offline
- modify -O no: the node has to be powered
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
René Nussbaumer authored
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
* devel-2.3:
  QA: Run cluster-verify as part of all instance tests
  QA: Fix typo and add “not”
  ensure-dirs: Speed up when using big queues
  Fix gnt-cluster verify with diskless instances

Conflicts:
  lib/cmdlib.py: Trivial
  qa/ganeti-qa.py: Trivial
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
-
Michael Hanselmann authored
This patch changes utils.NiceSort to use the built-in “sorted()” and gets rid of the intermediate list. Instead of wrapping the items ourselves, a key function is used. The caller can specify another key function (useful to sort objects by their name, e.g. “utils.NiceSort(instances, key=operator.attrgetter("name"))”). Unittests are provided.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
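A minimal sketch of what such a natural-sort key can look like (an illustration of the technique, not the exact utils.NiceSort implementation):

import re

def nice_sort_key(value):
    # Split the string into digit and non-digit runs so embedded numbers
    # compare numerically; tag each piece so types never get mixed.
    return [(0, int(part)) if part.isdigit() else (1, part)
            for part in re.split(r"(\d+)", value)]

print(sorted(["node10", "node1", "node2"], key=nice_sort_key))
# -> ['node1', 'node2', 'node10']

# With objects, a caller-supplied key can extract the name first, e.g.:
#   sorted(instances, key=lambda inst: nice_sort_key(inst.name))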
-
Michael Hanselmann authored
“gnt-cluster verify” looks at some per-instance information as well, so it should be run as part of the QA tests for each instance type.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Iustin Pop authored
For a secondary node that is offline, we should not consider that the disk shutdown has failed, as it can never succeed under this cluster state and (by virtue of the fact that the secondary node is offline) the disks are already "shutdown". The patch also fixes a tiny typo.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
data ≫ code, eom.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
-
- Dec 16, 2010
Michael Hanselmann authored
The “ensure-dirs” script as included in Ganeti 2.3 is very slow when working with big queues requiring a change of permissions on many or all files.

$ find /var/lib/ganeti/queue/ | wc -l
52354

Before this change:

$ time /usr/local/lib/ganeti/ensure-dirs -f
real    16m4.739s

While not addressed in this patch, I'd like to record the overall inefficiency of the “ensure-dirs” script, even after this change:

$ time /usr/local/lib/ganeti/ensure-dirs -f
real    5m57.362s
[…]
$ strace -e clone,execve -f -c /usr/local/lib/ganeti/ensure-dirs -f
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 50.08    5.147090          49    104774           clone
 49.92    5.131094          49    104739           execve

More changes will be needed. Just for comparison, a small Python snippet changing permissions on all files (“ensure-dirs” changes the owner too):

$ time python -c 'import os; from ganeti import utils; [os.chmod(i, 0644) for i in utils.ListVisibleFiles("/var/lib/ganeti/queue/archive/big")]'
real    0m0.605s
[…]
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
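The gist of the speed-up is to do ownership and permission changes in the running Python process instead of spawning an external chown/chmod per file. A rough sketch of that pattern (function and variable names are assumptions for illustration, not the ensure-dirs code):

import grp
import os
import pwd

def ensure_owner_and_mode(paths, owner, group, mode):
    # Resolve the owner/group once, then fix up every path in-process,
    # avoiding one fork/exec pair per file.
    uid = pwd.getpwnam(owner).pw_uid
    gid = grp.getgrnam(group).gr_gid
    for path in paths:
        os.chown(path, uid, gid)
        os.chmod(path, mode)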
-
René Nussbaumer authored
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
René Nussbaumer authored
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
René Nussbaumer authored
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
- Dec 15, 2010
Adeodato Simo authored
`gnt-cluster verify` was failing with KeyError if there was any diskless instance in the cluster. This was because _CollectDiskInfo() was not including these instances in the returned dictionary, but they were expected to be present in LUVerifyCluster.Exec(). With this commit, we ensure that the dictionary returned by _CollectDiskInfo includes entries for diskless instances as well.
Signed-off-by: Adeodato Simo <dato@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
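The shape of such a fix, as a minimal sketch (not the actual _CollectDiskInfo code; names and structure are assumed for illustration):

def collect_disk_info(instance_names, disks_by_instance):
    # Seed the result with an (empty) entry for every instance, so diskless
    # instances are present in the dictionary instead of triggering a
    # KeyError in the caller.
    result = {name: [] for name in instance_names}
    for name, disks in disks_by_instance.items():
        result[name] = disks
    return result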
-
Miguel Di Ciurcio Filho authored
The error contained a typo and is slightly cumbersome. It changes from:
- ERROR: node a: not enough memory on to accommodate failovers should peer node b fail
to:
- ERROR: node a: not enough memory to accomodate instance failovers should node b fail
Signed-off-by: Miguel Di Ciurcio Filho <miguel.filho@gmail.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
-
René Nussbaumer authored
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
* devel-2.3:
  jqueue: Keep jobs in “waitlock” while returning to queue
  Improve jqueue unittests
  Update manpages to display version 2.3

Conflicts:
  man/ganeti-cleaner.sgml: Removed
  man/ganeti-confd.sgml: Removed
  man/ganeti-masterd.sgml: Removed
  man/ganeti-noded.sgml: Removed
  man/ganeti-os-interface.sgml: Removed
  man/ganeti-rapi.sgml: Removed
  man/ganeti-watcher.sgml: Removed
  man/ganeti.sgml: Removed
  man/gnt-backup.sgml: Removed
  man/gnt-cluster.sgml: Removed
  man/gnt-debug.sgml: Removed
  man/gnt-instance.sgml: Removed
  man/gnt-job.sgml: Removed
  man/gnt-node.sgml: Removed
  man/gnt-os.sgml: Removed
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Iustin Pop reported that a job's file is updated many times while it waits for locks held by other thread(s). After an investigation it was concluded that the reason was a design decision for job priorities to return jobs to the “queued” status if they couldn't acquire all locks. Changing a job's status or priority requires an update to permanent storage. In a high-level view this is what happens:

1. Mark as waitlock
2. Write to disk as permanent storage (jobs left in this state by a crashing master daemon are resumed on restart)
3. Wait for lock (assume lock is held by another thread)
4. Mark as queued
5. Write to disk again
6. Return to workerpool

Another option originally discussed was to leave the job in the “waitlock” status. Ignoring priority changes, this is what would happen:

1. If not in waitlock
   1.1. Assert state == queued
   1.2. Mark as waitlock
   1.3. Set start_timestamp
   1.4. Write to disk as permanent storage
3. Wait for locks (assume lock is held by another thread)
4. Leave in waitlock
5. Return to workerpool

Now let's assume the lock is released by the other thread:

[…]
3. Wait for locks and get them
4. Assert state == waitlock
5. Set state to running
6. Set exec_timestamp
7. Write to disk

As this change reduces the number of writes from two per lock acquire attempt to two per opcode and one per priority increase (as happens after 24 acquire attempts (see mcpu._CalculateLockAttemptTimeouts) until the highest priority is reached), here's the patch to implement it. Unittests are updated.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
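A compact sketch of the resulting transition logic (illustrative only, with simplified names; the real code lives in the job queue and also tracks priority changes):

import time

def mark_waiting_for_locks(job, write_to_disk):
    # Persist the job only on its first transition into "waitlock"; on later
    # failed lock-acquire attempts it simply stays in that state, so no
    # extra writes happen per attempt.
    if job.status != "waitlock":
        assert job.status == "queued"
        job.status = "waitlock"
        if job.start_timestamp is None:
            job.start_timestamp = time.time()
        write_to_disk(job)

def mark_running(job, write_to_disk):
    # Once all locks are finally acquired, go straight from "waitlock" to
    # "running" and write once.
    assert job.status == "waitlock"
    job.status = "running"
    job.exec_timestamp = time.time()
    write_to_disk(job)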
-
Michael Hanselmann authored
- Verify job file updates
- Ensure queue lock is released while executing opcode
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
René Nussbaumer authored
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
René Nussbaumer authored
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Dec 14, 2010
Michael Hanselmann authored
Pylint complained that the “lambda may not be necessary”. Turns out it was right.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
-