Commits · e1876432f740aa4937efc64fa1aa496b1bc341d3 · itminedu / snf-ganeti

Jul 24, 2009

Get rid of constants.RAPI_ENABLE · e1876432

Guido Trotter authored 15 years ago


This constant is unused, except in qa. Removing it since it's always True.

This patch also removes the unused qa_rapi.PrintRemoteAPIWarning
function, and removes a comment about temporary constants "until we have
cluster parameters".

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

e1876432

Jul 23, 2009

Remove references to utils.debug · 68b1fcd5

Guido Trotter authored 15 years ago


Various modules set it to True when called in debugging mode, but the
utils module supports no such global.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

68b1fcd5

ganeti-rapi, replace hardcoded exit value · be73fc79

Guido Trotter authored 15 years ago


Substitute exit(1) with exit(constants.EXIT_FAILURE).
Also fix a wrongly indented line.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

be73fc79

Add the bind-address option to ganeti-rapi · 8790ac54

Guido Trotter authored 15 years ago


Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

8790ac54

Jul 22, 2009

noded: Abstract hard-coded sys.exit value · 46479775

Guido Trotter authored 15 years ago


On machines without the SSL file noded exits with '5'.
Changing this to constants.EXIT_NOTCLUSTER.

Also utils.GetNodeDaemonPort hasn't raised errors.ConfigurationError for
a while, so removing that try/except block.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

46479775

Add an example "ethers" hook · 93db3d8f

Guido Trotter authored 15 years ago


This hook can be used to update /etc/ethers with the instances' MAC
addresses. A DHCP server on the nodes can then serve the instances
their correct addresses. (This has been tested with dnsmasq's DHCP
implementation.)

Signed-off-by: Guido Trotter <ultrotter@google.com>

93db3d8f

Jul 21, 2009

burnin: move batch init/commit into a decorator · c70481ab

Iustin Pop authored 15 years ago


Many burnin steps initialize the batch queue at the beginning and commit
it at the end of their operation. This patch moves this code to a
decorator, in order to reduce redundant code.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>

c70481ab
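As a rough illustration of the pattern described in c70481ab, the sketch below wraps a step so that its queue is reset before the step runs and committed afterwards. The names `batched`, `Burner`, `clear_queue` and `commit_queue` are hypothetical and are not taken from the burnin source.

```python
import functools


def batched(fn):
    """Hypothetical decorator: reset the queue before a step, commit it after."""
    @functools.wraps(fn)
    def wrapper(self, *args, **kwargs):
        self.clear_queue()
        result = fn(self, *args, **kwargs)
        # Only reached if the step itself did not raise.
        self.commit_queue()
        return result
    return wrapper


class Burner:
    """Toy stand-in for the burnin driver; not the real class."""

    def __init__(self):
        self.queue = []
        self.committed = []

    def clear_queue(self):
        self.queue = []

    def commit_queue(self):
        self.committed.extend(self.queue)
        self.queue = []

    @batched
    def reinstall_all(self, instances):
        # The step body only queues work; init/commit live in the decorator.
        for name in instances:
            self.queue.append(("reinstall", name))
```

With this shape, each step body contains only the work that is specific to it.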

burnin: move instance alive checks to a decorator · d9b7a0b4

Iustin Pop authored 15 years ago


Many burnin steps do a manual check of instance aliveness, via duplicate
code. This patch moves this code to a decorator.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

d9b7a0b4

burnin: Implement retryable operations · 73ff3118

Iustin Pop authored 15 years ago


Some burnin steps are idempotent: e.g. reinstalling an instance (from
burnin's p.o.v.) can be done multiple times without any side-effects that
would affect later burnin steps. As such, failing the whole burnin
process due to a reinstall failure is undesirable.

This patch modifies burnin by marking each opcode (in case of individual
execution) and job set retryable or not. Retryable actions will be
retried up to a number of times, after which we give up and return
failure.

One side-effect is that in case of full-failure in retryable job sets we
lose the original exception (but we do log its string format), so we
have a little bit less information in this case.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

73ff3118
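The retry behaviour described in 73ff3118 could look roughly like the sketch below. The helper name `run_with_retries` and the default of three attempts are assumptions made for illustration; the actual retry count and structure in burnin may differ.

```python
def run_with_retries(fn, retryable, max_attempts=3):
    """Call fn(); retry on failure only if the action is marked retryable.

    Hypothetical helper: name and default attempt count are illustrative.
    """
    attempts = max_attempts if retryable else 1
    errors = []
    for _ in range(attempts):
        try:
            return fn()
        except Exception as err:
            # Keep only the string form, mirroring the side-effect noted in
            # the commit message: the original exception object is lost.
            errors.append(str(err))
    raise RuntimeError(
        "giving up after %d attempt(s): %s" % (attempts, "; ".join(errors))
    )
```

A non-retryable action is attempted exactly once, so its failure still aborts the run immediately.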

Jul 20, 2009

Ignore vim swap files · 699d856f

Michael Hanselmann authored 15 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

699d856f

Jul 19, 2009

burnin: fix removal errors hiding real errors · 8629a543

Iustin Pop authored 15 years ago


A long-standing bug in burnin makes errors during the removal phase
(e.g. because an import has failed, or because the initial creation has
failed) hide the original error.

This patch suppresses removal errors if we are already in ‘has_err’
mode, and otherwise it displays them normally.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

8629a543
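A minimal sketch of the error handling described in 8629a543, assuming a driver that takes its steps as callables. The function and parameter names are hypothetical; only the 'has_err' flag comes from the commit message.

```python
def burnin(run_steps, remove_instances, log):
    """Run the steps, then always attempt removal (illustrative only)."""
    has_err = True
    try:
        run_steps()
        has_err = False
    finally:
        try:
            remove_instances()
        except Exception as err:
            if has_err:
                # A step already failed: log the removal error and let the
                # original exception keep propagating.
                log("ignoring removal error: %s" % err)
            else:
                # Removal is the first failure, so report it normally.
                raise
```

Because the removal error is swallowed only when an earlier failure is already in flight, the caller always sees the first real error.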

backend: Only build once the list of upload files · 360b0dc2

Iustin Pop authored 15 years ago


The list of upload files is currently built at every UploadFile() call.
This patch moves it to a separate variable which is initialized only
once.

This won't make much difference but I regard it as cleanup.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

360b0dc2

Fix gnt-instance reinstall · b8f31860

Iustin Pop authored 15 years ago


Commit 55efe6da "Convert instance
reinstall to multi instance model" actually broke instance reinstall for
single-instance cases. This one-liner fixes it.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
(cherry picked from commit b6e243ab)

b8f31860

Fix a couple of epydoc warnings · 6af6270a

Iustin Pop authored 15 years ago


It seems epydoc needs fully-qualified references, and doesn't deal with
relative ones (not even in the current module) if there are any
ambiguities.

There are other epydoc warnings, in the rapi docstrings, but those are
left as-is as they're removed in 2.1.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

6af6270a

job queue: fix loss of finalized opcode result · 34327f51

Iustin Pop authored 15 years ago


Currently, unclean master daemon shutdown overwrites all of a job's
opcode status and result with error/None. This is incorrect, since
any already finished opcode(s) should have their status and result
preserved, and only not-yet-processed opcodes should be marked as
‘error’. Cancelling jobs between opcodes does the same (but this is not
allowed currently by the code, so it's not as important as unclean
shutdown).

This patch adds a new _QueuedJob function that only overwrites the
status and result of non-finalized opcodes, which is then used in job
queue init and in the cancel job functions. The patch also adds some
comments and a new set of constants in constants.py highlighting the
finalized vs. non-finalized opcode statuses.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

34327f51
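The idea in 34327f51 can be sketched as follows. The status strings, the `QueuedOp` class and the function name `mark_unfinished_ops` are stand-ins chosen for this example and should not be read as the actual jqueue.py or constants.py definitions.

```python
# Illustrative status values; the real constants live in constants.py.
OP_STATUS_QUEUED = "queued"
OP_STATUS_RUNNING = "running"
OP_STATUS_CANCELED = "canceled"
OP_STATUS_SUCCESS = "success"
OP_STATUS_ERROR = "error"

# Statuses that must never be overwritten once reached.
OPS_FINALIZED = frozenset(
    [OP_STATUS_CANCELED, OP_STATUS_SUCCESS, OP_STATUS_ERROR]
)


class QueuedOp:
    """Toy opcode record holding only status and result."""

    __slots__ = ["status", "result"]

    def __init__(self, status, result=None):
        self.status = status
        self.result = result


def mark_unfinished_ops(ops, status, result):
    """Overwrite status/result only for opcodes that are not finalized."""
    for op in ops:
        if op.status in OPS_FINALIZED:
            continue
        op.status = status
        op.result = result
```

Called at queue init after an unclean shutdown, a helper like this leaves finished opcodes intact and marks only the remaining ones as failed.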

Switch gnt-debug submit-job to JobExecutor · b59252fe

Iustin Pop authored 15 years ago


Currently gnt-debug submits jobs individually, but in 2.1 JobExecutor
uses the optimized SubmitManyJobs luxi call and as such should be used
whenever multiple jobs need to be submitted.

This patch converts gnt-debug submit-job to use it and also removes an
extra empty line in the JobExecutor class.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

b59252fe

Convert instance reinstall to multi instance model · 3d2ca95d

Iustin Pop authored 15 years ago


This patch converts ‘gnt-instance reinstall’ from single-instance to
multi-instance model; since this is dangerous, it's required to pass
“--force --force-multiple” to skip the confirmation.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
(cherry picked from commit 55efe6da)

3d2ca95d

gnt-instance batch-create: use the job executor · dd7dcca7

Iustin Pop authored 15 years ago


This small patch changes the batch create functionality to use the job
executor instead of single-job submits.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
(cherry picked from commit d4dd4b74)

dd7dcca7

Modify cli.JobExecutor to use SubmitManyJobs · f2921752

Iustin Pop authored 15 years ago


This patch changes the generic "multiple job executor" to use the many
jobs submit model, which automatically makes all its users use the new
model.

This makes, for example, startup/shutdown of a full cluster much more
logical (all the submitted job IDs are visible fast, and then waiting
for them proceeds normally).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
(cherry picked from commit 23b4b983)

f2921752

Add a luxi call for multi-job submit · 56d8ff91

Iustin Pop authored 15 years ago


As a workaround for the job submit timeouts that we have, this patch
adds a new luxi call for multi-job submit; the advantage is that all the
jobs are added to the queue first, and only afterwards can the workers
start processing them.

This is definitely faster than per-job submit, where the submission of
new jobs competes with the workers processing jobs.

On a pure no-op OpDelay opcode (not on master, not on nodes), we have:
  - 100 jobs:
    - individual: submit time ~21s, processing time ~21s
    - multiple:   submit time 7-9s, processing time ~22s
  - 250 jobs:
    - individual: submit time ~56s, processing time ~57s
                  run 2:      ~54s                  ~55s
    - multiple:   submit time ~20s, processing time ~51s
                  run 2:      ~17s                  ~52s

which shows that we indeed gain on the client side, and maybe even on
the total processing time for a high number of jobs. For just 10 or so I
expect the difference to be just noise.

This will probably require increasing the timeout a little when
submitting too many jobs - 250 jobs at ~20 seconds is close to the
current rw timeout of 60s.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
(cherry picked from commit 2971c913)

56d8ff91

job queue: fix interrupted job processing · f6424741

Iustin Pop authored 15 years ago


If a job with more than one opcode is being processed, and the master
daemon crashes between two opcodes, we have the first N opcodes marked
successful, and the rest marked as queued. This means that the overall
job status is queued, and thus on master daemon restart it will be
resent for completion.

However, the RunTask() function in jqueue.py doesn't deal with
partially-completed jobs. This patch makes it simply skip the opcodes
that have already completed successfully.

An alternative option would be to not mark partially-completed jobs as
QUEUED but instead RUNNING, which would result in aborting of the job at
restart time.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

f6424741
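A simplified picture of the fix in f6424741: when a job is resumed, opcodes already marked successful are skipped instead of being executed again. The dictionary layout and the `run_job` helper are assumptions for this sketch and do not reflect the real RunTask() code.

```python
def run_job(ops, execute):
    """Process a job's opcodes in order, skipping completed ones.

    `ops` is a list of dicts with "input", "status" and "result" keys;
    this layout is hypothetical and used only for illustration.
    """
    for op in ops:
        if op["status"] == "success":
            # Finished before the crash: keep the stored result untouched.
            continue
        op["status"] = "running"
        op["result"] = execute(op["input"])
        op["status"] = "success"
```

Re-running the same job after a restart therefore only executes the opcodes that were still queued.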

Fix an error path in job queue worker's RunTask · ed21712b

Iustin Pop authored 15 years ago


In case the job fails, we try to set the job's run_op_idx to -1.
However, this is the wrong attribute name, which went undetected until
the __slots__ addition. The correct name is run_op_index.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

ed21712b
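The reason __slots__ exposed this bug can be shown with a small stand-alone example. The `QueuedJob` class below is a toy and not the real _QueuedJob; only the attribute names run_op_index and run_op_idx come from the commit message.

```python
class QueuedJob:
    """Toy job object that declares its attributes via __slots__."""

    __slots__ = ["id", "run_op_index"]

    def __init__(self, job_id):
        self.id = job_id
        self.run_op_index = -1


def set_misspelled_index(job):
    """Try the misspelled attribute; report whether Python rejected it."""
    try:
        job.run_op_idx = -1  # typo: not listed in __slots__
    except AttributeError:
        return True
    return False
```

Without __slots__, the misspelled assignment would silently create a new attribute on the instance, which is why the error path appeared to work before.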

Jul 17, 2009

Add __slots__ on objects in jqueue · 66d895a8

Iustin Pop authored 15 years ago


Adding slots to _QueuedOpCode decreases memory usage (of these objects)
by roughly four times. It is a lesser change for _QueuedJobs.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

66d895a8

ganeti.initd: Pass $*_ARGS to programs when restarting them · 7f5e61b4

Michael Hanselmann authored 15 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

7f5e61b4

Optimize OpCode loading · 363acb1e

Iustin Pop authored 15 years ago


This patch converts the opcode loading to a pre-built map (at import
time) instead of iteration over the globals dict at each call.

Microbenchmarks show that this should be around three times faster, and
burnin still passes.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

363acb1e
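The optimization in 363acb1e amounts to building an ID-to-class map once and doing a dictionary lookup per load. The sketch below builds the map from `OpCode.__subclasses__()`, which is an assumption made to keep the example self-contained; the class names, OP_ID strings and `load_opcode` helper are likewise illustrative.

```python
class OpCode:
    """Toy base class; real opcodes carry typed fields."""

    OP_ID = None

    def __init__(self, **fields):
        self.fields = fields


class OpDelay(OpCode):
    OP_ID = "OP_TEST_DELAY"


class OpReinstallInstance(OpCode):
    OP_ID = "OP_INSTANCE_REINSTALL"


# Built once at import time instead of scanning a namespace on every call.
OP_MAPPING = {cls.OP_ID: cls for cls in OpCode.__subclasses__()}


def load_opcode(data):
    """Instantiate the opcode class named by data["OP_ID"]."""
    fields = dict(data)
    op_id = fields.pop("OP_ID", None)
    if op_id not in OP_MAPPING:
        raise ValueError("unknown opcode %r" % (op_id,))
    return OP_MAPPING[op_id](**fields)
```

Each load is then a single dictionary lookup rather than a scan over every name in the module.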

Yet another fallout from the pylint fixes · b0c63e2b

Iustin Pop authored 15 years ago


Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>

b0c63e2b

Merge branch 'master' into next · 2a061e15

Guido Trotter authored 15 years ago

* master:
  Update NEWS and version for 2.0.2 release
  Improve the description of node flags in man page
  Change default stripe count to 1
  Use full-stripe size in LVM growth
  RAPI: implement instance reinstall

2a061e15

Fix another issue with hypervisor_name change · 3df6e710

Iustin Pop authored 15 years ago


Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

3df6e710

Update NEWS and version for 2.0.2 release · 550a995a

Iustin Pop authored 15 years ago


Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

550a995a

Jul 16, 2009

Improve the description of node flags in man page · 253ba78f

Raiford Storey authored 15 years ago


[iustin@google.com: slightly reworded the explanation for offline and
changed the commit message]
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

253ba78f

Add enabled hypervisors to TestConfigRunner · 529d13a4

Guido Trotter authored 15 years ago


This parameter is now mandatory for the cluster config to work.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

529d13a4

Add a few more checks to verify config · 9a5fba23

Guido Trotter authored 15 years ago


- Check that the enabled hypervisors list is valid
- Check that the master node is a valid node

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

9a5fba23

Make sure enabled_hypervisors list is valid · b119bccb

Guido Trotter authored 15 years ago


Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

b119bccb

Change default stripe count to 1 · 7b3ac94d

Iustin Pop authored 15 years ago


In order not to change the default during a stable series, we modify
configure.ac to default to one stripe, in effect keeping the status quo
(well, minus the LVM Attach() changes).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

7b3ac94d

Use full-stripe size in LVM growth · 38256320

Iustin Pop authored 15 years ago


LVM has issues when growing striped volumes, so it's best to specify
the growth in exact multiples of the full stripe size (as precise as
possible). For this we need to do a couple of changes:
  - in LVM Attach(), we query additionally the VG extent size and the LV
    stripe count; since this makes lvs return a (possibly) multi-line
    output, we now split it into lines and only take the last one
  - in LVM Grow(), we round up the increase in multiples of the full
    stripe size

The patch also sets the correct target size in DRBD growth.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Olivier Tharan <olive@google.com>

38256320
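The rounding step described for LVM Grow() in 38256320 can be sketched as plain arithmetic, assuming the full stripe size is the VG extent size multiplied by the LV stripe count. The function name and the MiB units are assumptions for this example.

```python
def round_up_to_full_stripe(amount_mib, pe_size_mib, stripes):
    """Round a growth amount up to a multiple of the full stripe size.

    Hypothetical helper: the full stripe is taken to be
    extent size * stripe count, all expressed in MiB.
    """
    full_stripe = pe_size_mib * stripes
    remainder = amount_mib % full_stripe
    if remainder:
        amount_mib += full_stripe - remainder
    return amount_mib
```

For instance, with 4 MiB extents and 3 stripes the full stripe is 12 MiB, so a requested growth of 100 MiB would be rounded up to 108 MiB.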

Jul 14, 2009

Remove ConfigWriter.InitConfig · e52019f7

Guido Trotter authored 15 years ago


It's been replaced by the simpler bootstrap.InitConfig function, which
does the same job, so ConfigWriter.InitConfig is currently unused.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

e52019f7

Remove SimpleConfigWriter.SetMasterNode · 48c8887b

Guido Trotter authored 15 years ago


This function is not used.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

48c8887b

_GenerateDiskTemplate: use base_index in the name · fb4b324b

Guido Trotter authored 15 years ago


Currently if a disk is added later the base_index is not considered, and
all the disks are called disk0. This patch fixes it.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

fb4b324b
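The naming fix in fb4b324b can be pictured with the small sketch below. The helper `generate_disk_names` is hypothetical and much simpler than _GenerateDiskTemplate; it only shows the effect of offsetting by base_index.

```python
def generate_disk_names(disk_count, base_index=0):
    """Name new disks starting from base_index instead of always from 0."""
    return ["disk%d" % (base_index + i) for i in range(disk_count)]
```

When a disk is added to an instance that already has two disks, passing base_index=2 produces "disk2" rather than a second "disk0".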

ganeti-masterd: avoid SimpleConfigReader · b2890442

Guido Trotter authored 15 years ago

SimpleStore is a lot less heavyweight than SimpleConfigReader, and to
just get the master name we can use that. This is the only usage of
SimpleConfigReader currently, but we're not going to delete the class,
as new usages will come in for ganeti-confd (in 2.1). Using it there,
though, will make the class even heavier to load, so it makes sense
for this simple usage to be converted.

Signed-off-by: Guido Trotter <ultrotter@google.com>

b2890442

Jul 13, 2009

cmdlib: Fix typo in LUQueryClusterInfo · b8810fec

Michael Hanselmann authored 15 years ago


This was broken by my pylint fixes patch.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

b8810fec