Commits · f936c1538e472f38ecd479c0c7e785644ba807be · itminedu / snf-ganeti

Oct 26, 2010

Rename node.nodegroup to node.group · f936c153

Iustin Pop authored 14 years ago


In the context of a node, its group has (at least today) only one
meaning, that is the node's node group. As such, we rename
node.nodegroup to just node.group.

Note: if we want to keep node in there, it should be at least
node_group, for consistency with the other node attributes.

Similarly, we rename the OpAddNode nodegroup attribute to group.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Rename --nodegroup to --node-group · a285fcfd

Iustin Pop authored 14 years ago


For consistency with other CLI options.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Export node group data in iallocator · 622444e5

Iustin Pop authored 14 years ago


Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Split IAllocator._ComputeClusterData · acd34ea7

Iustin Pop authored 14 years ago


The node and instance computations were all in this big function; we
separate them out for more clarity.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Putting the pieces together and invoke the wipe in cmdlib · a03fcb26

René Nussbaumer authored 14 years ago


Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Adding RPC call for blockdev_wipe · 271b7cf9

René Nussbaumer authored 14 years ago


Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Second iteration over backend.BlockdevWipe · da63bb4e

René Nussbaumer authored 14 years ago


This patch now uses dd entirely to wipe the disk, making it
much easier to wipe in blocks so we can give interactive feedback
about the status.
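
A minimal sketch of the block-wise idea, not the actual backend.BlockdevWipe
code: the device is split into chunks, one dd command is built per chunk, and a
completion percentage is known after each one. The function names, the 1 MiB
block size and the device path are assumptions made for illustration; nothing
is executed here.

```python
def wipe_chunks(total_mib, chunk_mib):
    """Yield (offset_mib, size_mib) pairs that cover the whole device."""
    offset = 0
    while offset < total_mib:
        size = min(chunk_mib, total_mib - offset)
        yield offset, size
        offset += size


def dd_command(device, offset_mib, size_mib):
    """Build a dd invocation that zeroes one chunk, using 1 MiB blocks."""
    return [
        "dd", "if=/dev/zero", "of=%s" % device,
        "bs=1M", "seek=%d" % offset_mib, "count=%d" % size_mib,
    ]


def plan_wipe(device, total_mib, chunk_mib):
    """Return (percent_done_after_chunk, command) for every chunk."""
    plan = []
    for offset, size in wipe_chunks(total_mib, chunk_mib):
        done = 100.0 * (offset + size) / total_mib
        plan.append((done, dd_command(device, offset, size)))
    return plan
```

A caller could run each command in turn and report the percentage after it
returns, which is where the interactive feedback would come from.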

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Oct 25, 2010

Simplify and extend the instance OS env · f2165b8a

Iustin Pop authored 14 years ago


Some parameters were missing (uuid, c/mtime). We simplify the export
method; unfortunately we cannot simply iterate over __slots__ since the
mapping is not 1:1.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Fix QA mixup of node/instance tests · 729c4377

Iustin Pop authored 14 years ago

There are two node tests that are run from RunCommonInstanceTests, which is the
wrong place; it causes these node tests to be run three times instead of once.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Oct 22, 2010

ConfigWriter: prevent using a foreign config · eb180fe2

Iustin Pop authored 14 years ago


If the configuration file doesn't denote this node as master, we prevent
startup. This would have detected our previous race condition more
easily, hence we add it as a permanent check.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Fix bootstrap.MasterFailover race with watcher · 21004460

Iustin Pop authored 14 years ago


This fixes a recently diagnosed race condition between master failover
and the watcher.

Currently, the master failover first stops the master daemon, checks
that the IP is no longer reachable, and then distributes the updated
configuration. Between the stop and the distribution, it can happen that
the watcher starts the master daemon on the old node again, since ssconf
still points the master to it (and all nodes vote so).

In even weirder cases, the master daemon starts but the configuration
file is updated before the daemon manages to open it, which means the master
will respond to QueryClusterInfo with another node as the real master.

This patch reorders the actions during master failover:

- first, we redistribute a fixed config; this means the old master will
  refuse to update its own config file and ssconf, and that most jobs
  that change state will fail to finish
- we then immediately kill it; after this step, the watcher will be
  unable to start it, since the master will refuse startup
- and only then we check for IP reachability, etc.
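
The effect of the reordering can be sketched with a toy model. This is an
illustration only, not the bootstrap.MasterFailover code: the MasterDaemon
class and both functions are hypothetical, and the startup refusal stands in
for the ConfigWriter foreign-config check added in a separate commit.

```python
class MasterDaemon:
    """Toy stand-in for the master daemon on the old master node."""

    def __init__(self, node_name):
        self.node_name = node_name
        self.config_master = node_name  # master named in the local config
        self.running = True

    def start(self):
        """Start only if the local config names this node as master."""
        if self.config_master != self.node_name:
            return False  # foreign config: refuse startup
        self.running = True
        return True

    def stop(self):
        self.running = False


def failover_old_order(old, new_master, watcher_fires):
    """Previous order: stop first, distribute the config afterwards."""
    old.stop()
    if watcher_fires:                # watcher runs inside the race window
        old.start()
    old.config_master = new_master
    return old.running


def failover_new_order(old, new_master, watcher_fires):
    """New order: distribute the config, kill, then do the other checks."""
    old.config_master = new_master   # 1. redistribute the fixed config
    old.stop()                       # 2. kill the old master daemon
    if watcher_fires:
        old.start()                  # refused: config names another node
    # 3. IP reachability checks would follow here
    return old.running
```

With the old order, a watcher run between the stop and the distribution leaves
the old daemon running; with the new order the restart attempt is refused.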

I've tested the new version against concurrent launch of the watcher;
while my tests are not very exhaustive, two things can happen: the watcher
sees the daemons as dead and tries to restart them, which also fails; or
it simply gets an error while reading from the master daemon. Both of these
should be OK.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

ConfigWriter: protect against multiple writers · bd407597

Iustin Pop authored 14 years ago


This should fix the case where there are two masters that both try to
distribute the configuration file to the cluster. The first one that does so
will "win" the ownership of config.data.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

backend.Upload: switch to utils.SafeWriteFile · 8f065ae2

Iustin Pop authored 14 years ago


This allows serialization of updates to a given file, with respect to
other cooperating writers.
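
One way cooperating writers can serialize updates is sketched below, assuming
a file "ID" made of device, inode and mtime, and a side lock file. The names
get_file_id, safe_write_file and FileChangedError, as well as the ".lock"
suffix, are hypothetical and are not taken from utils.SafeWriteFile.

```python
import fcntl
import os
import tempfile


class FileChangedError(Exception):
    """The file on disk no longer matches the ID the writer last saw."""


def get_file_id(path):
    """Return a tuple that changes whenever the file is replaced."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino, st.st_mtime)


def safe_write_file(path, expected_id, data):
    """Replace path only if it still matches expected_id; return the new ID."""
    with open(path + ".lock", "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # one cooperating writer at a time
        if expected_id is not None and get_file_id(path) != expected_id:
            raise FileChangedError(path)
        # Write to a temporary file, then atomically rename it into place.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
        with os.fdopen(fd, "w") as handle:
            handle.write(data)
        os.rename(tmp, path)
        return get_file_id(path)
```

A writer that holds a stale ID gets FileChangedError instead of silently
overwriting a newer version of the file.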

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Add a "safe" file wrapper over WriteFile · 4138d39f

Iustin Pop authored 14 years ago


Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Add functions to read and compare file 'ID's · 9e100285

Iustin Pop authored 14 years ago


Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

LUSetInstanceParams: Remove unused attribute · 574d1b7b

Michael Hanselmann authored 14 years ago


“os_new” is not used anywhere, removing it.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Adding backend method to wipe a block device · 69dd363f

René Nussbaumer authored 14 years ago


Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Allow to specify wipe command and flags at configure time · 6e991d0e

René Nussbaumer authored 14 years ago


Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Fix typo introduced in · edb8b377

Iustin Pop authored 14 years ago


Commit 8d8c4eff broke instance reinstall with different OS, due to an
attribute typo.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

Oct 21, 2010

Adjust the error message of setup-ssh if join check fails · e389d95b

René Nussbaumer authored 14 years ago


Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Fix clearing of the default iallocator · e725bee0

Iustin Pop authored 14 years ago


And also update the man page.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

gnt-instance reinstall: Allow overriding OS parameters · 8d8c4eff

Michael Hanselmann authored 14 years ago


This allows OS installation scripts to make use of special parameters,
e.g. to retain some data on reinstallation.

The RAPI resource is not updated as it takes all parameters via the
query string and encoding arbitrary data in a query string is tricky.
The resource will need to be changed to use the POST body instead.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Oct 20, 2010

Add option to ignore offline node on instance start/stop · b44bd844

Michael Hanselmann authored 14 years ago


In some cases it can be useful to mark an instance as started
or stopped while its primary node is offline. With this patch,
a new option, “--ignore-offline”, is introduced to “gnt-instance
start” and “… stop”.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

utils: Add function to find items in dictionary using regex · 691c81b7

Michael Hanselmann authored 14 years ago


This basically extracts a small piece of code from ganeti-rapi and puts
it into a utility function. RAPI resources are found using a dictionary
in which the keys can either be static strings or compiled regular
expressions. This might be handy in other places, hence extracting it
and adding unittests.
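
A minimal sketch of such a lookup, assuming it returns the matched value
together with any regex groups; the name find_match, the return shape and the
sample routes are illustrative and may differ from the function actually added
to utils.

```python
import re


def find_match(data, name):
    """Look up name in a dict whose keys are strings or compiled regexes.

    Returns (value, groups) for the first match, or None if nothing matches.
    """
    # Exact string keys take precedence over regular expressions.
    if name in data:
        return (data[name], [])

    for key, value in data.items():
        # Compiled patterns expose a match() method; plain strings do not.
        if hasattr(key, "match"):
            found = key.match(name)
            if found:
                return (value, list(found.groups()))

    return None


# Hypothetical routing table in the style described above.
routes = {
    "/version": "version-resource",
    re.compile(r"^/2/instances/([\w.-]+)$"): "instance-resource",
}
```

Checking exact keys first keeps the common case a single dictionary lookup;
the regular expressions are only tried when that fails.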

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

QA RAPI: Test HTTP 404 and 501 · 4d2bd00a

Michael Hanselmann authored 14 years ago


This tests the HTTP Not Found and Not Implemented errors.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

QA: Add test for “gnt-node modify” · d0cb68cb

Michael Hanselmann authored 14 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Oct 19, 2010

Let gnt-cluster support prealloc_wipe_disks · b18ecea2

René Nussbaumer authored 14 years ago


This includes a new option for gnt-cluster init and appropriate output
in gnt-cluster info, though gnt-cluster modify is not yet prepared.

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Merge branch 'devel-2.2' · b02e3172

Michael Hanselmann authored 14 years ago


* devel-2.2:
  Bump version to 2.2.1, update NEWS

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Bump version to 2.2.1, update NEWS · dcb95afb

Michael Hanselmann authored 14 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Oct 15, 2010

Merge branch 'devel-2.2' · 25c45709

Michael Hanselmann authored 14 years ago


* devel-2.2:
  http.client: Disable SSL session ID cache
  Crude workaround for pylint breakage

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

http.client: Disable SSL session ID cache · 4ba4fe14

Apollon Oikonomopoulos authored 14 years ago


This patch disables the SSL session ID cache for all cURL operations.
This is needed because http.HttpBase's PyOpenSSL implementation does not
currently set a context using SSL_set_session_id_context(3SSL), while cURL
tries to re-use the session ID; according to
SSL_set_session_id_context(3SSL):

 If the session id context is not set on an SSL/TLS server and client
 certificates are used, stored sessions will not be reused but a fatal
 error will be flagged and the handshake will fail.

Ideally, session caching should be either controlled or disabled in
HttpBase; however, PyOpenSSL does not seem to implement either
SSL_CTX_set_session_cache_mode or SSL_CTX_set_session_id_context, which
are used for these purposes (it seems that only M2Crypto's SSL module
supports these).

Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

http.client: Disable SSL session ID cache · 7b70d7a8

Apollon Oikonomopoulos authored 14 years ago


This patch disables the SSL session ID cache for all cURL operations.
This is needed because http.HttpBase's PyOpenSSL implementation does not
currently set a context using SSL_set_session_id_context(3SSL), while cURL
tries to re-use the session ID; according to
SSL_set_session_id_context(3SSL):

 If the session id context is not set on an SSL/TLS server and client
 certificates are used, stored sessions will not be reused but a fatal
 error will be flagged and the handshake will fail.

Ideally, session caching should be either controlled or disabled in
HttpBase; however, PyOpenSSL does not seem to implement either
SSL_CTX_set_session_cache_mode or SSL_CTX_set_session_id_context, which
are used for these purposes (it seems that only M2Crypto's SSL module
supports these).

Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Crude workaround for pylint breakage · f1763373

Iustin Pop authored 14 years ago


The way we currently call pylint, the exact order in which it inspects
modules in lib/http/ depends on the filesystem order. This is not good, and if
lib/http/server.py is loaded before lib/http/__init__.py, it will throw
a "R0921:763:HttpMessageReader: Abstract class not referenced" (as that
class is used in server.py).

For the short-term fix, we just add server.py after "ganeti", so that
it gets parsed (again?) and pylint sees the usage of the class.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

http.auth: Fix docstring error · c6e7edb8

Michael Hanselmann authored 14 years ago


This was missing from commit 2287b920.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

devnotes.rst: Remove hardcoded Python version · b3a8bebf

Michael Hanselmann authored 14 years ago


Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Oct 14, 2010

Merge branch 'stable-2.2' · 744061f3

Iustin Pop authored 14 years ago


* stable-2.2:
  Release 2.2.1~rc1
  Require aclocal 1.11.1 or above for devel/release
  Revert "Require aclocal 1.11.1 or above for autogen.sh"
  Add mising --units in gnt-instance list man page
  Set list of trusted SSL CAs for client to verify
  Require aclocal 1.11.1 or above for autogen.sh

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Brown-bag fix for leftover comment · 76917d97

Iustin Pop authored 14 years ago


I forgot this in the original patch. Sorry!

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Rework QA interaction with the watcher · 8201b996

Iustin Pop authored 14 years ago


The interaction with cron-launched watcher is a well-known failure mode of QA:

---- 2010-10-14 06:54:55.464839 time=0:00:56.764827 Test tools/move-instance

For the following tests it's recommended to turn off the ganeti-watcher cronjob.

---- 2010-10-14 06:54:55.465255 start Test automatic restart of instance by ganeti-watcher
…
Error: Domain 'instance1' does not exist.
Command: ssh -oEscapeChar=none -oBatchMode=yes -l root -t -oStrictHostKeyChecking=yes
  -oClearAllForwardings=yes -oForwardAgent=yes node2 'ganeti-watcher -d'
2010-10-13 23:55:04,479:  pid=1659 ganeti-watcher:626
 ERROR Can't acquire lock on state file /var/lib/ganeti/watcher.data: File already locked
---- 2010-10-14 06:55:04.513948 time=0:00:09.048693 Test automatic restart of instance by ganeti-watcher

In order to fix this, we disable the watcher during these tests, and
re-enable it afterwards. To protect against the watcher being disabled, we
enable it unconditionally at the start of the QA (we do want it enabled,
in order to see the interaction between the watcher and many
creation/disk replace jobs, etc.).

Note: even after this patch, if a cron-watcher was started and is still
running during the test, we'll have locking issues. I think for now this
is OK, we'll have to see how often that happens.
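
The disable-then-re-enable pattern described above can be sketched as a
context manager. This is an assumption-laden illustration, not the QA code:
the run callable, the watcher_paused name and the default duration are
hypothetical, and the commands follow the "gnt-cluster watcher pause" and
"gnt-cluster watcher continue" forms.

```python
from contextlib import contextmanager


@contextmanager
def watcher_paused(run, duration="4h"):
    """Pause the watcher for the enclosed tests and always resume it.

    run is any callable that executes a command given as a list of strings.
    """
    run(["gnt-cluster", "watcher", "pause", duration])
    try:
        yield
    finally:
        # Resume even if a test raised, so the watcher is never left paused.
        run(["gnt-cluster", "watcher", "continue"])
```

The try/finally is the important part: a failing test must not leave the
watcher paused for the tests that follow.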

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Add a new watcher option --ignore-pause · 46c8a6ab

Iustin Pop authored 14 years ago


During cluster maintenance, when the watcher is disabled, it's useful to
run it just once. This is inconvenient to do currently, as the watcher
needs to be unpaused, then run, then paused again.

This patch adds an option “--ignore-pause” that can be used to ignore
the cluster-level setting. Also the man page is updated as it was
missing the options available.
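
How such a flag might override the cluster-level pause is sketched below.
Only the option name comes from this commit; the use of argparse, the
should_run helper and the help text are assumptions, not the watcher's actual
option handling.

```python
import argparse


def parse_args(argv):
    """Parse the subset of watcher options relevant to this sketch."""
    parser = argparse.ArgumentParser(prog="ganeti-watcher")
    parser.add_argument("--ignore-pause", action="store_true",
                        help="run even if the watcher is paused cluster-wide")
    return parser.parse_args(argv)


def should_run(cluster_paused, ignore_pause):
    """Decide whether this watcher run should proceed."""
    return ignore_pause or not cluster_paused
```

With this shape, a single maintenance run only needs the extra flag; the
cluster-level pause itself stays untouched.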

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Release 2.2.1~rc1 · 24440be4

Iustin Pop authored 14 years ago


Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
