Commits · e366273a9e4c61957a5f51278c737f29064ae69b · itminedu / snf-ganeti

Oct 18, 2011

Revert "rapi.client.ModifyNode should PUT rather than POST" · e366273a

Guido Trotter authored 13 years ago


This was a mistake on my side because ModifyGroup and ModifyInstance
were PUT, and I was not aware of the discussion and the rationale why
this one had to be POST.

This reverts commit 55ef0cf6.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>


Revert "Added SPICE TLS option and related cert paths" · 9849cec7

Guido Trotter authored 13 years ago


This reverts commit bfe86c76.
This commit will be readded on master.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>


Revert "Implementation of TLS-protected SPICE connections" · 0aee8ee9

Guido Trotter authored 13 years ago


This reverts commit b6267745.
This commit will be readded on master.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>


Revert "Updated man pages with new SPICE TLS options" · 1027b01f

Guido Trotter authored 13 years ago


This reverts commit b8a10435.
This commit will be readded on master.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>


Revert "Add tls_ciphers and use_vdagent options" · 53328375

Guido Trotter authored 13 years ago


This reverts commit 3e40b587.
This commit will be readded on master.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>


rapi.client.ModifyNode should PUT rather than POST · 55ef0cf6

Guido Trotter authored 13 years ago


This was caught (albeit in a sibylline manner) by unittests on master
which are not present in 2.5.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>


Fix RAPI node modify client and server calls · 370f2042

Guido Trotter authored 13 years ago


rapi.client.ModifyNode accepts a "group" and not a "node" param.
(this bug is invisible but still not nice)

rlib2.R_2_nodes_name_modify submits the opcode with instance_name rather
than node_name as a param. This would break the call.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>


Oct 17, 2011

xen: changes to facilitate "xl" support (xen 4.1) · 6555373d

Guido Trotter authored 13 years ago


- Copy the xl config file, if there is one
- Start instances by config file, not name (also xm compatible)
- Start paused domains with -p and not --paused (also xm compatible)
- Add a fixme for migration (changes are not xm compatible)
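
A minimal sketch of the start-by-config-file change, assuming the command is assembled as an argument list. The function name and the config path are hypothetical; only the `create` subcommand and the `-p` flag are taken from the xm and xl tools.

```python
def start_command(tool, config_path, paused=False):
    """Build the argv used to start a domain from its config file.

    tool is "xm" or "xl"; config_path is an illustrative path, not
    necessarily the one Ganeti uses.
    """
    cmd = [tool, "create"]
    if paused:
        # The short flag is understood by both xm and xl.
        cmd.append("-p")
    # Pass the config file rather than the instance name.
    cmd.append(config_path)
    return cmd


if __name__ == "__main__":
    print(start_command("xl", "/etc/xen/instance1.example.com", paused=True))
```

Keeping the tool name as a parameter mirrors the earlier commit that abstracts the 'xm' command as a constant.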

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


xen: abstract instance config file naming · c2be2532

Guido Trotter authored 13 years ago


Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


Abstract xen's 'xm' command as a constant · 2876c2d6

Guido Trotter authored 13 years ago


Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


Oct 13, 2011

Fix RAPI documentation build · c22d3bce

Michael Hanselmann authored 13 years ago


*mumble*

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


rapi: Allow auto-promotion on node role change · 8de8e68d

Michael Hanselmann authored 13 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


rapi: Add resource for modifying node · 94497dd1

Michael Hanselmann authored 13 years ago


A separate patch will add “auto-promote” through
“/2/nodes/[node_name]/role”.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


opcodes: Add comment to *SetParams result description · b3d2ee31

Michael Hanselmann authored 13 years ago


Explicitly say that the second element of the tuple is the new value.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


Oct 12, 2011

Merge branch 'stable-2.5' into devel-2.5 · 5b0ac1a5

Michael Hanselmann authored 13 years ago


* stable-2.5:
  rpc: Disable HTTP client pool and reduce memory consumption
  hail: Fix result for node evacuation
  Fix assertion error on unclean master shutdown

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


Merge branch 'devel-2.4' into stable-2.5 · 58f6738c

Michael Hanselmann authored 13 years ago


* devel-2.4:
  rpc: Disable HTTP client pool and reduce memory consumption
  Fix assertion error on unclean master shutdown

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


rpc: Disable HTTP client pool and reduce memory consumption · 05927995

Michael Hanselmann authored 13 years ago

We noticed that “ganeti-masterd” can use large amounts of memory,
especially on large clusters. Measurements showed a single PycURL client
using about 500 kB of heap memory (the actual usage depends on versions,
build options and settings).

The RPC client uses a per-thread HTTP client pool with one client per
node. At this time there are 41 non-main threads (25 for the job queue
and 16 for client requests). This means the HTTP client pools use a lot
of memory (ca. 200 MB for 10 nodes, ca. 1 GB for 50 nodes).

This patch disables the per-thread HTTP client pool. No cleanup of
unused code is done. That will be done in the master branch only.
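
The estimate above can be reproduced with simple arithmetic. This is a rough sketch that assumes every non-main thread ends up holding one client per node; the function name is made up for illustration and the figures come from the message above.

```python
THREADS = 25 + 16   # job queue threads + client request threads
CLIENT_KB = 500     # approximate heap use of a single PycURL client


def pool_memory_mb(nodes, threads=THREADS, client_kb=CLIENT_KB):
    """Estimate the memory held by all per-thread HTTP client pools, in MB."""
    return threads * nodes * client_kb / 1000.0


if __name__ == "__main__":
    for nodes in (10, 50):
        print("%d nodes: ca. %.0f MB" % (nodes, pool_memory_mb(nodes)))
```

With 10 nodes this gives 205 MB and with 50 nodes 1025 MB, which matches the rounded figures quoted above.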

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


Oct 11, 2011

Preserve bridge MTU in KVM ifup script · a1ec8695

Andrea Spadaccini authored 13 years ago


Closes: #201 - KVM_IFUP does not set bridge-MTU on tap devices
Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>


Oct 07, 2011

hail: Fix result for node evacuation · 1ab94e48

Michael Hanselmann authored 13 years ago


According to the iallocator documentation the “node-evacuate” call needs
to return a list of jobs, not a list of lists of jobs.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


Oct 04, 2011

Merge branch 'stable-2.5' into devel-2.5 · a080bab8

Andrea Spadaccini authored 13 years ago


Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>


cluster-merge: log an info message at node readd · 419bb2ef

Guido Trotter authored 13 years ago


Node readd can take a long time, so it's good to have info messages to
see progress.

Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Andrea Spadaccini <spadaccio@google.com>


Bump version to 2.5.0~rc1 · 07cea902

Michael Hanselmann authored 13 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


Fix issue when verifying cluster files · 170b02b7

Michael Hanselmann authored 13 years ago


If a cluster has any non-master-candidate nodes, those don't contain all
files (e.g. config.data). With commit aef59ae7 (March 31st, 2011)
the logic was changed and subsequently verifying a cluster with non-mc
nodes would complain.

This patch fixes this issue by changing the algorithm. It also adds an
additional check for files which shouldn't exist on a machine. A newly
added unittest is included.
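
A minimal sketch of such a check, not the actual Ganeti algorithm. It assumes the file lists are split into files every node needs and files only master candidates hold; all names except config.data are hypothetical.

```python
def verify_files(node_files, all_files, mc_only_files, is_master_candidate):
    """Return (missing, unexpected) sets of file names for one node."""
    expected = set(all_files)
    if is_master_candidate:
        expected |= set(mc_only_files)
    present = set(node_files)
    # Files the node should have but does not.
    missing = expected - present
    # Master-candidate-only files found on a node that should not have them.
    unexpected = (present & set(mc_only_files)) - expected
    return missing, unexpected


if __name__ == "__main__":
    # A non-master-candidate node that still carries config.data.
    print(verify_files(["known_hosts", "config.data"],
                       ["known_hosts"], ["config.data"], False))
```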

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


Oct 03, 2011

Revert "utils.log: Write error messages to stderr" · d728ac75

Michael Hanselmann authored 13 years ago


This reverts commit 34aa8b7c. Writing
error messages to stderr would also include backtraces, something we
tried to avoid in the past.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


Fix adding nodes after commit · ca6b16e5

Michael Hanselmann authored 13 years ago


Commit 64c7b383 changed the RPC call for verifying SSH connections.
Unfortunately this case in adding nodes was missed.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


Sep 30, 2011

LUClusterVerifyGroup: Spread SSH checks over more nodes · 64c7b383

Michael Hanselmann authored 13 years ago


When verifying a group the code would always check SSH to all nodes in
the same group, as well as the first node for every other group. On big
clusters this can cause issues since many nodes will try to connect to
the first node of another group at the same time. This patch changes the
algorithm to choose a different node every time.

A unittest for the selection algorithm is included.
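
One way to spread the checks is a round-robin over each foreign group, sketched below. This is an illustration under assumptions, not the selection code from the patch: the function name and data shapes are made up, and a node is assumed not to check itself.

```python
import itertools


def select_ssh_targets(group_nodes, other_groups):
    """Map each node of the verified group to the nodes it should check.

    other_groups maps a group name to its list of node names.
    """
    # One endless iterator per foreign group, so consecutive nodes are
    # handed consecutive targets instead of all hitting the first node.
    cyclers = {name: itertools.cycle(sorted(nodes))
               for name, nodes in other_groups.items() if nodes}
    result = {}
    for node in sorted(group_nodes):
        targets = [n for n in sorted(group_nodes) if n != node]
        targets.extend(next(cyclers[name]) for name in sorted(cyclers))
        result[node] = targets
    return result


if __name__ == "__main__":
    print(select_ssh_targets(["a1", "a2", "a3"], {"B": ["b1", "b2"]}))
```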

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


Optimise cli.JobExecutor with many pending jobs · 11705e3d

Iustin Pop authored 13 years ago


When we submit many pending jobs (> 100) to the masterd, the
JobExecutor 'spams' the master daemon with requests for the status of
all the jobs, even though in the end it will only choose a single job
for polling.

This is very sub-optimal, because when the master is busy processing
small/fast jobs, this query forces it to read all the jobs. Restricting
the 'window' of jobs that we query from the entire set to a smaller
subset makes a huge difference (masterd only, 0s delay jobs, all jobs
on tmpfs, thus no I/O involved):

- submitting/waiting for 500 jobs:
  - before: ~21 s
  - after:   ~5 s
- submitting/waiting for 1K jobs:
  - before: ~76 s
  - after:   ~8 s

This is with a batch of 25 jobs. With a batch of 50 jobs, it goes from
8s to 12s. I think that choosing the 'best' job for nice output only
matters with a small number of jobs, and that for more than that
people will not actually watch the jobs. So changing from 'perfect
job' to 'best job in the first 25' should be OK.

Note that most jobs won't execute as fast as 0 delay, but this is
still a good improvement.
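
The windowing idea can be sketched as follows. This is not the cli.JobExecutor code: `query_status` is a hypothetical callable standing in for the status query, and the preference order (running, then waiting, then the first job) is an assumption made for the example.

```python
WINDOW = 25  # batch size mentioned in the message above


def choose_job_to_poll(pending_ids, query_status, window=WINDOW):
    """Pick the job to poll next, looking only at the first `window` jobs.

    query_status takes a list of job ids and returns their status strings.
    """
    if not pending_ids:
        raise ValueError("no pending jobs")
    # Query a bounded prefix instead of the whole pending set.
    candidates = pending_ids[:window]
    statuses = query_status(candidates)
    for wanted in ("running", "waiting"):
        for job_id, status in zip(candidates, statuses):
            if status == wanted:
                return job_id
    return candidates[0]


if __name__ == "__main__":
    ids = list(range(1000))
    print(choose_job_to_poll(ids, lambda js: ["queued"] * len(js)))
```

However many jobs are pending, each call asks for at most `window` statuses, which is where the saving comes from.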

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>


Merge branch 'stable-2.5' into devel-2.5 · cea3abbd

Andrea Spadaccini authored 13 years ago


* stable-2.5:
  listrunner: Don't pass arguments if there are none
  ssh: Quote strings in error message
  utils.log: Write error messages to stderr
  Add signal handling doc to hbal man page
  Fix handling of cluster verify hooks
  Redistribute the RAPI certificate
  QA: Add tests for instance start/stop via RAPI
  RAPI: Fix wrong check on instance shutdown
  baserlib: Accept empty body in FillOpcode

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>


Use --yes to deactivate master ip in cluster merge · aeb24d97

Guido Trotter authored 13 years ago


Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Andrea Spadaccini <spadaccio@google.com>


Use deactivate-master-ip in cluster-merge · a3fad332

Andrea Spadaccini authored 13 years ago


Use the gnt-cluster deactivate-master-ip command in cluster-merge to
disable the master IP.

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
(cherry picked from commit e87e5afb)


Add gnt-cluster commands to toggle the master IP · fb44c6db

Andrea Spadaccini authored 13 years ago


lib/client/gnt_cluster.py:
* Add activate-master-ip and deactivate-master-ip commands

man/gnt-cluster.rst:
* Document the new commands

lib/opcodes.py lib/cmdlib.py
* Add two opcodes and the LU that call the relevant RPCs

test/docs_unittest.py
* Silence an error about RAPI not implemented for the two new opcodes

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
(cherry picked from commit fb926117)

Conflicts:

	test/docs_unittest.py
	  - kept devel-2.5 version, without the RAPI opcode checks


Split starting and stopping master IP and daemons · c06e0c83

Andrea Spadaccini authored 13 years ago


lib/backend.py
* split StartMaster() in ActivateMasterIp() and StartMasterDaemons()
* split StopMaster() in DeactivateMasterIp() and StopMasterDaemons()

lib/server/noded.py, lib/rpc.py
* adapt the call chains to the new functions, define new RPCs

lib/bootstrap.py, lib/cmdlib.py, lib/server/masterd.py
* use the new RPCs

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
(cherry picked from commit fb460cf7)


listrunner: Don't pass arguments if there are none · 0c009cc5

Michael Hanselmann authored 13 years ago


If no arguments were specified the “exec_args” variable was “None”,
leading to the command being run as “… ./… None”.
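
The failure mode is easy to reproduce with plain string formatting. The sketch below uses made-up function names and is not the listrunner code; it only contrasts the buggy formatting with a guard that skips missing arguments.

```python
def build_command_buggy(script, exec_args=None):
    """Format the command unconditionally; None ends up in the string."""
    return "./%s %s" % (script, exec_args)


def build_command(script, exec_args=None):
    """Build the argv, appending arguments only when some were given."""
    cmd = ["./" + script]
    if exec_args:
        cmd.append(exec_args)
    return cmd


if __name__ == "__main__":
    print(build_command_buggy("check.sh"))
    print(build_command("check.sh"))
```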

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>


ssh: Quote strings in error message · 9dc45ab1

Michael Hanselmann authored 13 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>


utils.log: Write error messages to stderr · 34aa8b7c

Michael Hanselmann authored 13 years ago


When “gnt-cluster copyfile” failed it would only print “Copy of file …
to node … failed”. A detailed message is written using logging.error.
Writing error messages to stderr can be helpful in figuring out what
went wrong (the messages also go to the log file, but not everyone might
know about it).
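
A generic way to get this behaviour with the standard logging module is a second handler that only passes ERROR and above. This is a sketch under assumptions, not the utils.log change itself; the logger name and function name are hypothetical.

```python
import io
import logging
import sys


def setup_logging(log_stream, err_stream=None):
    """Return a logger that writes everything to log_stream and
    messages of level ERROR or higher to err_stream as well."""
    logger = logging.getLogger("stderr-sketch")
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    logger.handlers = []  # start clean when called repeatedly

    everything = logging.StreamHandler(log_stream)
    everything.setLevel(logging.DEBUG)
    logger.addHandler(everything)

    errors_only = logging.StreamHandler(
        sys.stderr if err_stream is None else err_stream)
    errors_only.setLevel(logging.ERROR)
    logger.addHandler(errors_only)
    return logger


if __name__ == "__main__":
    log = setup_logging(io.StringIO())
    log.info("only in the log")
    log.error("in the log and on stderr")
```

Note that `logging.exception` also logs at ERROR level and appends a traceback, so a handler like this prints backtraces to stderr too.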

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>


Add signal handling doc to hbal man page · 2b634302

Iustin Pop authored 13 years ago


Also remove a bug note, since hbal has been able to execute jobs
directly for a long time.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>


Sep 28, 2011

Migration: warn the user about hv version mismatch · 34fbc862

Andrea Spadaccini authored 13 years ago


* hv_kvm.py, hv_xen.py
  - return the hypervisor version (if available) from GetNodeInfo

* cmdlib.py
  - if hypervisor version is available during the migration, and the
    versions differ, warn the user
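
The check described above might look roughly like this. It is a sketch, not the cmdlib.py code: the "hv_version" key and the function name are assumptions about how the value from GetNodeInfo is exposed.

```python
import logging


def warn_on_hv_mismatch(src_info, dst_info, log=None):
    """Log a warning and return True if both nodes report a hypervisor
    version and the versions differ; otherwise return False."""
    log = log or logging.getLogger(__name__)
    src = src_info.get("hv_version")
    dst = dst_info.get("hv_version")
    if src is None or dst is None:
        # Version not available on at least one side: nothing to compare.
        return False
    if src != dst:
        log.warning("Hypervisor version mismatch: source %s, target %s",
                    src, dst)
        return True
    return False


if __name__ == "__main__":
    logging.basicConfig()
    warn_on_hv_mismatch({"hv_version": (4, 0)}, {"hv_version": (4, 1)})
```

The migration proceeds either way; the function only reports the difference.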

Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>


Fix handling of cluster verify hooks · 3656c889

Iustin Pop authored 13 years ago


The change to enforce boolean results for cluster verify group opcode
missed the HooksCallBack, which uses a very ugly 1/0
logic. Furthermore, the logic is wrong, since it unconditionally
resets the verify result to true.

The patch is changed to simply treat hook failures as failures, and do
nothing for offline/nodes.
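
The intended behaviour can be sketched with plain booleans. The data shape and function name below are hypothetical and do not reflect the real HooksCallBack signature; "offline/nodes" is read here as offline nodes.

```python
def combine_hook_results(hook_results, verify_result):
    """Fold per-node hook results into the overall verify result.

    hook_results maps a node name to (offline, [(script, failed), ...]).
    """
    for _node, (offline, scripts) in sorted(hook_results.items()):
        if offline:
            continue  # nothing is checked on offline nodes
        for _script, failed in scripts:
            if failed:
                # A hook failure is a verify failure. The result is only
                # ever lowered here, never reset to True.
                verify_result = False
    return verify_result


if __name__ == "__main__":
    print(combine_hook_results({"node1": (False, [("10-check", True)])}, True))
```

Passing the earlier result in and only ever lowering it avoids the unconditional reset described above.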

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>


Redistribute the RAPI certificate · 835f8b23

Iustin Pop authored 13 years ago


This reverts to the old behaviour in Ganeti 2.4 and before.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>


Sep 22, 2011

QA: Add tests for instance start/stop via RAPI · a7418448

Michael Hanselmann authored 13 years ago


This would have detected the issue fixed in the previous patch.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
