Commits · 271b7cf9979c3ef6eb67dbd43a619ba18c7a62c0 · itminedu / snf-ganeti

Oct 26, 2010

Adding RPC call for blockdev_wipe · 271b7cf9

René Nussbaumer authored 14 years ago


Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Second iteration over backend.BlockdevWipe · da63bb4e

René Nussbaumer authored 14 years ago


This patch now uses dd entirely to wipe the disk, which makes it
much easier to wipe in blocks so we can give interactive feedback
about the status.

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
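
A minimal sketch of the chunked approach. It assumes GNU dd, /dev/zero as the source and 1 MiB blocks. wipe_commands is a hypothetical helper, not Ganeti's backend.BlockdevWipe. It only builds one dd argument list per chunk, so a caller can run the chunks in turn and report progress between them.

```python
def wipe_commands(path, size_mb, chunk_mb):
    """Return one dd argv list per chunk; seek/count are in units of bs (1 MiB)."""
    if chunk_mb <= 0:
        raise ValueError("chunk size must be positive")
    commands = []
    offset = 0
    while offset < size_mb:
        count = min(chunk_mb, size_mb - offset)  # last chunk may be shorter
        commands.append([
            "dd", "if=/dev/zero", "of=%s" % path,
            "bs=1048576",             # 1 MiB blocks
            "seek=%d" % offset,       # skip already-wiped blocks of the output
            "count=%d" % count,
            "oflag=direct",           # GNU dd: bypass the page cache
            "conv=notrunc",           # never truncate if the target is a file
        ])
        offset += count
    return commands
```

For a 250 MiB device in 100 MiB chunks, this yields three commands: seek=0, 100 and 200, with the last using count=50. Running each one (for example via subprocess) and logging after it returns is one way to provide the status feedback.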

Oct 25, 2010

Simplify and extend the instance OS env · f2165b8a

Iustin Pop authored 14 years ago


Some parameters were missing (uuid, c/mtime). We simplify the export
method; unfortunately we cannot simply iterate over __slots__ since the
mapping is not 1:1.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
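
The following sketch shows why the export needs an explicit mapping. It uses a hypothetical Instance class and hypothetical variable names; Ganeti's real objects and OS API names may differ. Scalar slots map one-to-one to variables, but a dict-valued slot such as hvparams expands into several variables, so a plain loop over __slots__ would not produce the right environment.

```python
class Instance(object):
    """Hypothetical stand-in for an instance object with __slots__."""
    __slots__ = ["name", "uuid", "os", "ctime", "mtime", "hvparams"]

    def __init__(self, **kwargs):
        for slot in self.__slots__:
            setattr(self, slot, kwargs.get(slot))


# Explicit attribute -> variable mapping (variable names are assumptions).
_ENV_MAP = [
    ("name", "INSTANCE_NAME"),
    ("uuid", "INSTANCE_UUID"),
    ("os", "INSTANCE_OS"),
    ("ctime", "INSTANCE_CTIME"),
    ("mtime", "INSTANCE_MTIME"),
]


def instance_os_env(inst):
    """Build the OS script environment for an instance."""
    env = {}
    for attr, var in _ENV_MAP:
        value = getattr(inst, attr)
        if value is not None:  # unset attributes are simply not exported
            env[var] = str(value)
    # A dict-valued slot expands to one variable per key: not a 1:1 mapping.
    for key, value in sorted((inst.hvparams or {}).items()):
        env["INSTANCE_HV_%s" % key] = str(value)
    return env
```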

Oct 22, 2010

ConfigWriter: prevent using a foreign config · eb180fe2

Iustin Pop authored 14 years ago


If the configuration file doesn't denote this node as master, we prevent
startup. This would have detected our previous race condition more
easily, hence we add it as a permanent check.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
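
Here is a sketch of such a startup check. It assumes the configuration is a dict with a cluster.master_node entry holding the master's hostname. check_master_ownership and ConfigurationError are hypothetical names.

```python
import socket


class ConfigurationError(Exception):
    """Hypothetical error raised when the config belongs to another master."""


def check_master_ownership(config_data, my_hostname=None):
    """Refuse startup unless the configuration names this node as master."""
    if my_hostname is None:
        my_hostname = socket.getfqdn()
    master = config_data.get("cluster", {}).get("master_node")
    if master != my_hostname:
        raise ConfigurationError(
            "Configuration denotes node %r as master, not this node (%r);"
            " refusing to start" % (master, my_hostname))
```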

Fix bootstrap.MasterFailover race with watcher · 21004460

Iustin Pop authored 14 years ago


This fixes a recently diagnosed race condition between master failover
and the watcher.

Currently, the master failover first stops the master daemon, checks
that the IP is no longer reachable, and then distributes the updated
configuration. Between the stop and the distribution, it can happen that
the watcher starts the master daemon on the old node again, since ssconf
still points the master to it (and all nodes vote so).

In even stranger cases, the master daemon starts and, before it manages
to open the configuration file, it is updated, which means the master
will respond to QueryClusterInfo with another node as the real master.

This patch reorders the actions during master failover:

- first, we redistribute a fixed config; this means the old master will
  refuse to update its own config file and ssconf, and that most jobs
  that change state will fail to finish
- we then immediately kill it; after this step, the watcher will be
  unable to start it, since the master will refuse startup
- and only then we check for IP reachability, etc.

I've tested the new version against concurrent launch of the watcher;
while my tests are not very exhaustive, two things can happen: the
watcher sees the daemons as dead and tries to restart them, which also
fails; or it simply gets an error while reading from the master daemon.
Both these should be OK.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

ConfigWriter: protect against multiple writers · bd407597

Iustin Pop authored 14 years ago


This should fix the case where there are two masters that both try to
distribute the configuration file to the cluster. The first one that does so
will "win" the ownership of the config.data.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

backend.Upload: switch to utils.SafeWriteFile · 8f065ae2

Iustin Pop authored 14 years ago


This allows serialization of updates to a given file, with respect to
other cooperating writers.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Add a "safe" file wrapper over WriteFile · 4138d39f

Iustin Pop authored 14 years ago


Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
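
Here is one way to build such a wrapper, sketched under several assumptions. Cooperating writers take an exclusive flock() on the target, the file ID is a (device, inode, mtime) tuple, and the new content is written to a temporary file and renamed into place. safe_write_file and LockError are hypothetical names, not necessarily utils.SafeWriteFile's signature.

```python
import fcntl
import os
import tempfile


class LockError(Exception):
    """Hypothetical: the target changed since the caller last read it."""


def _file_id(path):
    # Assumed ID format: (device, inode, mtime).
    st = os.stat(path)
    return (st.st_dev, st.st_ino, st.st_mtime)


def safe_write_file(path, expected_id, data):
    """Replace path with data if it is still the file identified by
    expected_id (None skips the check); return the new file ID."""
    lock_fd = os.open(path, os.O_RDONLY | os.O_CREAT, 0o644)
    try:
        # Serialize cooperating writers on an exclusive lock.
        fcntl.flock(lock_fd, fcntl.LOCK_EX)
        # Check the file the *path* points to now, not the locked inode,
        # so a writer that waited on a since-replaced file still notices.
        if expected_id is not None and _file_id(path) != expected_id:
            raise LockError("%s was modified by another writer" % path)
        # Write a temporary file in the same directory, then rename it over
        # the target so readers never see partially written content.
        tmp_fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
        try:
            with os.fdopen(tmp_fd, "w") as tmp:
                tmp.write(data)
            os.rename(tmp_path, path)
        except BaseException:
            if os.path.exists(tmp_path):
                os.unlink(tmp_path)
            raise
        return _file_id(path)
    finally:
        os.close(lock_fd)  # closing the descriptor also releases the flock
```

A writer passes the ID it got from its last read or write. If another writer has replaced the file in the meantime, the IDs differ and the update is refused instead of silently overwriting the other writer's data. Note that mkstemp creates the file with mode 0600; a real implementation would set the intended permissions.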

Add functions to read and compare file 'ID's · 9e100285

Iustin Pop authored 14 years ago


Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
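
These helpers might take the following form. The sketch assumes a file ID is the (device, inode, mtime) tuple from stat(); get_file_id and verify_file_id are illustrative names, and the exact comparison rule is an assumption.

```python
import os


def get_file_id(path):
    """Identify the file currently at path (assumed: device, inode, mtime)."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino, st.st_mtime)


def verify_file_id(fi_disk, fi_ours):
    """True if the file on disk is the inode we recorded and has not been
    modified after our recorded mtime."""
    (dev_disk, ino_disk, mtime_disk) = fi_disk
    (dev_ours, ino_ours, mtime_ours) = fi_ours
    return (dev_disk == dev_ours and ino_disk == ino_ours
            and mtime_disk <= mtime_ours)
```

A rename over the path produces a new inode, so a writer holding a stale ID can detect that someone else replaced the file.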

LUSetInstanceParams: Remove unused attribute · 574d1b7b

Michael Hanselmann authored 14 years ago


“os_new” is not used anywhere, removing it.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Adding backend method to wipe a block device · 69dd363f

René Nussbaumer authored 14 years ago


Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Allow to specify wipe command and flags at configure time · 6e991d0e

René Nussbaumer authored 14 years ago


Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Fix typo introduced in · edb8b377

Iustin Pop authored 14 years ago


Commit 8d8c4eff broke instance reinstall with different OS, due to an
attribute typo.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>

Oct 21, 2010

Fix clearing of the default iallocator · e725bee0

Iustin Pop authored 14 years ago


And also update the man page.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

gnt-instance reinstall: Allow overriding OS parameters · 8d8c4eff

Michael Hanselmann authored 14 years ago


This allows OS installation scripts to make use of special parameters,
e.g. to retain some data on reinstallation.

The RAPI resource is not updated as it takes all parameters via the
query string and encoding arbitrary data in a query string is tricky.
The resource will need to be changed to use the POST body instead.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Oct 20, 2010

Add option to ignore offline node on instance start/stop · b44bd844

Michael Hanselmann authored 14 years ago


In some cases it can be useful to mark an instance as started
or stopped while its primary node is offline. With this patch,
a new option, “--ignore-offline”, is introduced to “gnt-instance
start” and “… stop”.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

utils: Add function to find items in dictionary using regex · 691c81b7

Michael Hanselmann authored 14 years ago


This basically extracts a small piece of code from ganeti-rapi and puts
it into a utility function. RAPI resources are found using a dictionary
in which the keys can either be static strings or compiled regular
expressions. This might be handy in other places, hence extracting it
and adding unittests.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
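
A sketch of such a lookup follows. It assumes that regex keys must match the whole name and that exact string keys take precedence. find_match is a hypothetical name, and the (value, groups) return shape is an assumption.

```python
import re


def find_match(data, name):
    """Look up name in a dict whose keys are strings or compiled regexes.

    Returns (value, groups) for the first match, or None if nothing matches."""
    # Exact string keys first (compiled patterns never compare equal to str).
    if name in data:
        return (data[name], [])
    # Regex keys are tried in dict insertion order.
    for key, value in data.items():
        if hasattr(key, "fullmatch"):  # compiled regular expression
            match = key.fullmatch(name)
            if match:
                return (value, list(match.groups()))
    return None


resources = {
    "/2/info": "info",
    re.compile(r"/2/instances/([\w.-]+)"): "instance",
}
```

The captured groups give a RAPI-style resource handler its arguments. For example, "/2/instances/inst1.example.com" maps to ("instance", ["inst1.example.com"]).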

Oct 19, 2010

Let gnt-cluster support prealloc_wipe_disks · b18ecea2

René Nussbaumer authored 14 years ago


This adds a new option to gnt-cluster init and appropriate output
to gnt-cluster info. gnt-cluster modify is not yet prepared for it.

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Oct 15, 2010

http.client: Disable SSL session ID cache · 4ba4fe14

Apollon Oikonomopoulos authored 14 years ago


This patch disables the SSL session ID cache for all cURL operations.
This is needed because http.HttpBase's PyOpenSSL implementation does not
currently set a context using SSL_set_session_id_context(3SSL) while cURL
tries to re-use the session ID, and, according to
SSL_set_session_id_context(3SSL):

 If the session id context is not set on an SSL/TLS server and client
 certificates are used, stored sessions will not be reused but a fatal
 error will be flagged and the handshake will fail.

Ideally, session caching should be either controlled or disabled in
HttpBase; however, PyOpenSSL does not seem to implement either
SSL_CTX_set_session_cache_mode or SSL_CTX_set_session_id_context, which
are used for these purposes (it seems that only M2Crypto's SSL module
supports these).

Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

http.client: Disable SSL session ID cache · 7b70d7a8

Apollon Oikonomopoulos authored 14 years ago


This patch disables the SSL session ID cache for all cURL operations.
This is needed because http.HttpBase's PyOpenSSL implementation does not
currently set a context using SSL_set_session_id_context(3SSL) while cURL
tries to re-use the session ID, and, according to
SSL_set_session_id_context(3SSL):

 If the session id context is not set on an SSL/TLS server and client
 certificates are used, stored sessions will not be reused but a fatal
 error will be flagged and the handshake will fail.

Ideally, session caching should be either controlled or disabled in
HttpBase; however, PyOpenSSL does not seem to implement either
SSL_CTX_set_session_cache_mode or SSL_CTX_set_session_id_context, which
are used for these purposes (it seems that only M2Crypto's SSL module
supports these).

Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

http.auth: Fix docstring error · c6e7edb8

Michael Hanselmann authored 14 years ago


This was missing from commit 2287b920.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Oct 13, 2010

Fix compatibility with Pyinotify 0.8 · ac96953d

Michael Hanselmann authored 14 years ago


I didn't know why the code previously used
“pyinotify.EventsCodes.ALL_FLAGS” instead of using the flags from
“pyinotify.EventsCodes” directly. Turns out that Pyinotify 0.8 has them
in “pyinotify”, not “pyinotify.EventsCodes”.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Extract base class from SingleFileEventHandler · e543a42f

Michael Hanselmann authored 14 years ago


The base class can contain code useful to other inotify users.
As it is, “SingleFileEventHandler” cannot be used in ganeti-rapi;
therefore, ganeti-rapi will use its own small inotify handler class
based on this base class.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

http.auth.ReadPasswordFile: Don't read file directly · 2287b920

Michael Hanselmann authored 14 years ago


Reading the file before calling this function allows for better error
reporting.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
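
Splitting reading from parsing might look like the sketch below. Both function names and the "username password [option,...]" line format are assumptions for illustration. The point is that the caller does the file I/O, so an error can be reported together with the file name.

```python
def parse_password_file(contents):
    """Parse password-file text into {user: (password, [options])}.

    Blank lines and lines starting with '#' are ignored."""
    users = {}
    for line in contents.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 2)
        if len(parts) < 2:
            # A real implementation might log and skip the line instead.
            raise ValueError("Invalid line: %r" % line)
        options = parts[2].split(",") if len(parts) == 3 else []
        users[parts[0]] = (parts[1], [o.strip() for o in options if o.strip()])
    return users


def read_password_file(path):
    # File access happens here, where an I/O error can mention the path.
    with open(path) as f:
        return parse_password_file(f.read())
```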

Move the parameter types to their own module · 62e0e880

Iustin Pop authored 14 years ago


This is for cleanup, and for later reuse in other parts of the code
(outside of LUs).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

"Fix" handling of old software versions on startup · 4b63dc7a

Iustin Pop authored 14 years ago


Currently, masterd startup with old software versions is very confusing
for users: we present two tracebacks, with a message in the middle about
"version mismatch". This can lead to users believing that all that needs
to be done is to fix the config file.

This patch attempts to improve this by handling this case in masterd
itself (not in the child), and showing a more friendly message for this
case.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Export more information via LUQueryInstances/RAPI · 90224407

Iustin Pop authored 14 years ago


Currently, the custom instance parameters (hv, be, nicp) are only
queryable via LUQueryInstanceData. LUQueryInstances returns only the
filled parameters, thus its users (especially RAPI) have no way to know
if a parameter is custom or the default value.

This patch adds three new parameters: custom_hvparams, custom_beparams,
custom_nicparams, that are also exported in RAPI.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
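
The distinction between filled and custom parameters can be sketched with plain dicts. query_instance is a hypothetical helper; only the custom_beparams field name comes from the text above.

```python
def fill_params(defaults, custom):
    """Effective values: cluster defaults overridden by instance-level ones."""
    filled = dict(defaults)
    filled.update(custom)
    return filled


def query_instance(cluster_beparams, inst_beparams):
    # Returning both lets a client tell an explicitly set value apart from
    # an inherited default, even when the two happen to be equal.
    return {
        "beparams": fill_params(cluster_beparams, inst_beparams),
        "custom_beparams": dict(inst_beparams),
    }
```

With cluster defaults {"memory": 128, "vcpus": 1} and an instance override {"memory": 512}, the filled view shows both values. Only custom_beparams reveals that vcpus is inherited rather than set on the instance.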

Oct 12, 2010

Set list of trusted SSL CAs for client to verify · 2d93a6a7

Apollon Oikonomopoulos authored 14 years ago

As per SSL_CTX_set_client_CA_list(3SSL), set the list of acceptable CAs
advertised to SSL clients to include the server's own certificate. This
evidently fixes the pycurl/gnutls RPC client.

During the TLS handshake, when client verification is requested, the
server sends a CertificateRequest message which states that the client
should send a valid certificate as a response. The CertificateRequest
message contains a section called "certificate_authorities", which,
according to the standard, is a list of the Distinguished Names (DNs) of
acceptable certification authorities. The client uses this list to send
a certificate signed by one of the acceptable CAs.

Under OpenSSL's server implementation, this list must be set manually
using some appropriate call, otherwise the list is empty. TLS 1.0[1]
does not state whether the list may be left blank, whereas TLS 1.1[2]
and 1.2[3] state that in case the list is blank, then the client *may*
send any certificate of a valid type (valid types are specified
elsewhere in the handshake).

OpenSSL clients seem to obey the behaviour specified in TLS 1.1+,
whereas at least curl+GnuTLS does not send any certificates if the list
is empty (which is not wrong per the spec, but also evidently not
configurable).

[1] http://tools.ietf.org/html/rfc2246
[2] http://tools.ietf.org/html/rfc4346
[3] http://tools.ietf.org/html/rfc5246



Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>

Show instance state in instance console failures · bd631b02

Iustin Pop authored 14 years ago


The current message is not entirely clear, as it doesn't show the reason
why the instance is not running.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

Fix epydoc errors · 614244bd

Iustin Pop authored 14 years ago


And sorry!

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>

jqueue: Fix bug when cancelling jobs · 9e49dfc5

Michael Hanselmann authored 14 years ago


If a job was cancelled while it was waiting for locks, an assertion
would've failed. This patch fixes the problem and provides a unit
test to check for this situation.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

jqueue: Resume jobs from “waitlock” status (2nd try) · 320d1daf

Michael Hanselmann authored 14 years ago


Commit 5ef699a0 had to roll back an earlier attempt at implementing
this. With the improved job queue processor, this is finally possible.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

jqueue/gnt-job: Add job priority fields for display · b8802cc4

Michael Hanselmann authored 14 years ago


These fields can help with debugging.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

mcpu: Raise directly in _AcquireLocks · 900df6cd

Michael Hanselmann authored 14 years ago


Removes code duplication.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Add prealloc_wipe_disks as a cluster-wide configuration variable · 3d914585

René Nussbaumer authored 14 years ago


This is the first step for the support of wiping block devices prior
to creation of the instance.

Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Oct 11, 2010

RPC: disable curl's Expect header · 8e29563f

Iustin Pop authored 14 years ago


This patch fixes the very slow (~8-9 seconds) gnt-instance modify
behaviour. More generally, it fixes slow RPC calls, but the problem was
most visible in that LU.

It seems that curl's behaviour with regard to file uploads (via PUT) and
the 'Expect' header interacts badly with our http server.

First, our http server doesn't properly handle this header. According to
RFC 2616:

  Requirements for HTTP/1.1 origin servers: Upon receiving a request
  which includes an Expect request-header field with the "100-continue"
  expectation, an origin server MUST either respond with 100 (Continue)
  status and continue to read from the input stream, or respond with a
  final status code.

Our server doesn't do this, and hence it triggers this behaviour in curl
(from the curl FAQ):

  4.16 My HTTP POST or PUT requests are slow!

  libcurl makes all POST and PUT requests (except for POST requests with a
  very tiny request body) use the "Expect: 100-continue" header. This header
  allows the server to deny the operation early so that libcurl can bail out
  already before having to send any data. This is useful in authentication
  cases and others.

  However, many servers don't implement the Expect: stuff properly and if the
  server doesn't respond (positively) within 1 second libcurl will continue
  and send off the data anyway.

  You can disable libcurl's use of the Expect: header the same way you disable
  any header, using -H / CURLOPT_HTTPHEADER, or by forcing it to use HTTP 1.0.

This behaviour was detected by watching the captured traffic (in non-SSL
mode): after the initial HTTP headers (ending with the Expect one),
there was a ~1-2 second pause before curl sent the body.
Properly RTFM-ing would have saved ~1 day of digging around, but hey…

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
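
The RFC 2616 requirement quoted above can be sketched as a small decision function for an origin server. This is hypothetical and not Ganeti's http server. Given the request headers, it returns the response to send before reading the body, or None if no Expect header is present. The commit itself took the client-side route from the curl FAQ instead, disabling the header through CURLOPT_HTTPHEADER.

```python
def expect_response(headers, accept_body=True):
    """Return the bytes to send before reading the request body, or None.

    headers: dict of request headers; names are compared case-insensitively.
    accept_body: whether the server is willing to receive the body."""
    expect = None
    for name, value in headers.items():
        if name.lower() == "expect":
            expect = value.strip().lower()
    if expect is None:
        return None  # no expectation: just read the body
    if expect != "100-continue":
        # Unsupported expectations are answered with 417.
        return b"HTTP/1.1 417 Expectation Failed\r\nContent-Length: 0\r\n\r\n"
    if accept_body:
        # Interim response: the client may now send the body.
        return b"HTTP/1.1 100 Continue\r\n\r\n"
    # A final status also satisfies the requirement; a real server would then
    # close the connection or drain any body the client sends anyway.
    return b"HTTP/1.1 403 Forbidden\r\nContent-Length: 0\r\n\r\n"
```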

Oct 07, 2010

jqueue, CancelJob: Check status only once per call · 86b16e9d

Michael Hanselmann authored 14 years ago


This simplifies the code a bit--the status is only checked once.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>

Fix a rare bug in StartDaemonChild and GenericMain · ed3920e3

Iustin Pop authored 14 years ago


I've seen cases where the result from str(sys.exc_info()[1]) is ""; this
breaks the error reporting as the parent relies on non-empty error
messages to properly detect child status (otherwise it will try to read
the pid and fail, and so on).

While this always happened with assertions, we need to ensure this doesn't
happen. Therefore we abstract this functionality (writing the error
message) and ensure we write a non-empty string in the new function.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
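
Here is a sketch of the fallback, using a hypothetical error_message helper. An exception raised without arguments, such as AssertionError(), has an empty str(), so the class name is used instead.

```python
import sys


def error_message(exc_value):
    """Return a non-empty description of an exception."""
    msg = str(exc_value)
    if not msg:
        # e.g. AssertionError() from a bare assert: str() is ""
        msg = "%s (no error message)" % exc_value.__class__.__name__
    return msg


try:
    raise AssertionError()
except AssertionError:
    print(error_message(sys.exc_info()[1]))  # prints: AssertionError (no error message)
```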

Enhance the error reporting · 3e87c1bf

Iustin Pop authored 14 years ago


Since daemon startup errors will often be related to socket errors, it
makes sense to change the original reporting:

  Error when starting daemon process: "(98, 'Address already in use')"

Into:

  Error when starting daemon process: 'Socket-related error: Address
  already in use (errno=98)'

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
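
The improved formatting can be sketched as below; format_startup_error is a hypothetical name. In Python 3, socket.error is an alias of OSError, so this sketch treats any OSError carrying an errno as socket-related. A narrower check may be appropriate.

```python
import socket


def format_startup_error(err):
    """Describe a daemon startup error; errno-carrying errors get details."""
    if isinstance(err, socket.error) and err.errno is not None:
        return "Socket-related error: %s (errno=%s)" % (err.strerror, err.errno)
    return str(err)
```

For socket.error(98, "Address already in use"), this returns the "Socket-related error: Address already in use (errno=98)" text shown above. Errno 98 is EADDRINUSE on Linux; other systems use different numbers.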

Change daemon.GenericMain/utils.Daemonize workflow · b78aa8c2

Iustin Pop authored 14 years ago


This patch copies the pipe-based error reporting functionality from
utils.StartDaemon (I gave up for now on trying to merge the two).

This patch will fix two longstanding bugs:

- if we fork, we lose all error reporting from the child to the original
  parent
- if we fork, the original parent exits before the child is ready to
  "work" (whatever the work might be)

Both these are fixed once the users of daemon.GenericMain are converted
to the three-state setup, as we'll get error reporting via the pipe and
also not exit until the PrepFn is done.

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
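
A pipe-based handshake of the kind described above can be sketched as follows. It is POSIX only, and start_with_pipe and prep_fn are hypothetical names, not the API of utils.Daemonize. The child runs its preparation step and either writes an error message to the pipe or closes it without writing. The parent blocks on the pipe, so it neither loses the child's error nor exits before the child is ready.

```python
import os


def start_with_pipe(prep_fn):
    """Fork a child that runs prep_fn; return its error message ("" on success).

    The parent blocks until the child either writes an error to the pipe or
    closes it empty after a successful preparation step."""
    rfd, wfd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child
        status = 1
        try:
            os.close(rfd)
            try:
                prep_fn()
                status = 0
            except Exception as err:
                # Never send an empty message: data on the pipe means failure.
                os.write(wfd, (str(err) or err.__class__.__name__).encode())
            os.close(wfd)  # EOF without data tells the parent "ready"
            # A real daemon would enter its main loop here instead of exiting.
        finally:
            os._exit(status)
    os.close(wfd)
    chunks = []
    while True:
        data = os.read(rfd, 4096)
        if not data:  # EOF: the child closed its end (or exited)
            break
        chunks.append(data)
    os.close(rfd)
    # In this sketch the child exits right away, so reap it; a real
    # daemonizing parent would simply exit at this point.
    os.waitpid(pid, 0)
    return b"".join(chunks).decode()
```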
