- Oct 26, 2010
-
-
René Nussbaumer authored
Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
René Nussbaumer authored
This patch now uses dd exclusively to wipe the disk, which makes it much easier to wipe in blocks so that we can give interactive feedback about the status. Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
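
A minimal sketch of wiping in blocks with dd and reporting progress after each block. The function name, the chunk size, and the `report` callback are hypothetical and not taken from the patch; `bs=1M` assumes GNU dd.

```python
import subprocess


def wipe_in_chunks(path, size_mib, chunk_mib=64, report=print):
    """Zero `path` chunk by chunk with dd, reporting after each chunk.

    Hypothetical helper: sizes are given in MiB and progress is passed
    to `report` as a plain string.
    """
    offset = 0
    while offset < size_mib:
        count = min(chunk_mib, size_mib - offset)
        subprocess.run(
            ["dd", "if=/dev/zero", "of=%s" % path, "bs=1M",
             "seek=%d" % offset, "count=%d" % count,
             # notrunc keeps data beyond the current chunk intact when
             # the target is a regular file rather than a block device
             "conv=notrunc"],
            check=True, stderr=subprocess.DEVNULL)
        offset += count
        report("wiped %d of %d MiB (%d%%)"
               % (offset, size_mib, 100 * offset // size_mib))
```

Because each dd invocation covers only one chunk, the caller gets a natural point between invocations at which to emit feedback.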
-
- Oct 25, 2010
-
-
Iustin Pop authored
Some parameters were missing (uuid, c/mtime). We simplify the export method; unfortunately we cannot simply iterate over __slots__ since the mapping is not 1:1. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Oct 22, 2010
-
-
Iustin Pop authored
If the configuration file doesn't denote this node as master, we prevent startup. This would have detected our previous race condition more easily, hence we add it as a permanent check. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
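
The check described above can be sketched as follows. This is only an illustration: `NotMasterError`, `check_is_master`, and the way the master name is obtained from the configuration are assumptions, not the names used in the patch.

```python
import socket


class NotMasterError(Exception):
    """Raised when the configuration names another node as master."""


def check_is_master(config_master, own_name=None):
    """Refuse startup unless the configuration denotes this node as master.

    `config_master` is the master node name read from the configuration
    file; `own_name` defaults to the local hostname (an assumption).
    """
    if own_name is None:
        own_name = socket.gethostname()
    if config_master != own_name:
        raise NotMasterError(
            "configuration names %s as master, not %s; refusing to start"
            % (config_master, own_name))
```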
-
Iustin Pop authored
This fixes a recently diagnosed race condition between master failover and the watcher. Currently, the master failover first stops the master daemon, checks that the IP is no longer reachable, and then distributes the updated configuration. Between the stop and the distribution, it can happen that the watcher starts the master daemon on the old node again, since ssconf still points the master to it (and all nodes vote so). In even weirder cases, the master daemon starts and, before it manages to open the configuration file, the file is updated, which means the master will respond to QueryClusterInfo with another node as the real master.

This patch reorders the actions during master failover:
- first, we redistribute a fixed config; this means the old master will refuse to update its own config file and ssconf, and that most jobs that change state will fail to finish
- we then immediately kill it; after this step, the watcher will be unable to start it, since the master will refuse startup
- and only then we check for IP reachability, etc.

I've tested the new version against concurrent launch of the watcher; while my tests are not very exhaustive, two things can happen: the watcher sees the daemons as dead and tries to restart them, which also fails; or it simply gets an error while reading from the master daemon. Both of these should be OK. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This should fix the case where two masters both try to distribute the configuration file to the cluster. The first one that does so will "win" the ownership of config.data. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This allows serialization of updates to a given file, with respect to other cooperating writers. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
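
A minimal sketch of serializing updates among cooperating writers, assuming an advisory `flock` on the file itself. The helper name and the `transform` callback are hypothetical; the patch may lock differently.

```python
import fcntl
import os


def update_locked(path, transform):
    """Rewrite `path` as transform(old_bytes) under an exclusive lock.

    The lock is advisory: it only serializes writers that also take it.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+b") as f:
        # Blocks until every other cooperating writer has released the lock
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            new = transform(f.read())
            f.seek(0)
            f.truncate()
            f.write(new)
            f.flush()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
    return new
```

Reading and writing inside the same locked region is what prevents two writers from each basing an update on stale contents.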
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Michael Hanselmann authored
“os_new” is not used anywhere, removing it. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
René Nussbaumer authored
Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
René Nussbaumer authored
Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Iustin Pop authored
Commit 8d8c4eff broke instance reinstall with different OS, due to an attribute typo. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
- Oct 21, 2010
-
-
Iustin Pop authored
And also update the man page. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Michael Hanselmann authored
This allows OS installation scripts to make use of special parameters, e.g. to retain some data on reinstallation. The RAPI resource is not updated as it takes all parameters via the query string and encoding arbitrary data in a query string is tricky. The resource will need to be changed to use the POST body instead. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Oct 20, 2010
-
-
Michael Hanselmann authored
In some cases it can be useful to mark an instance as started or stopped while its primary node is offline. With this patch, a new option, “--ignore-offline”, is introduced to “gnt-instance start” and “… stop”. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
This basically extracts a small piece of code from ganeti-rapi and puts it into a utility function. RAPI resources are found using a dictionary in which the keys can either be static strings or compiled regular expressions. This might be handy in other places, hence extracting it and adding unittests. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
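
A sketch of such a lookup, in which dictionary keys are either static strings or compiled regular expressions. The function name, return shape, and example paths are illustrative assumptions rather than the actual utility function.

```python
import re


def find_match(table, name):
    """Return (value, groups) for the first key matching `name`, else None.

    String keys must be equal to `name`; compiled patterns are applied
    with .match() and contribute their captured groups.
    """
    for key, value in table.items():
        if isinstance(key, str):
            if key == name:
                return (value, [])
        else:
            m = key.match(name)
            if m:
                return (value, list(m.groups()))
    return None


# Hypothetical resource table
RESOURCES = {
    "/version": "version-handler",
    re.compile(r"^/2/instances/([\w.-]+)$"): "instance-handler",
}
```

For example, `find_match(RESOURCES, "/2/instances/web1.example.com")` yields the instance handler together with the captured instance name.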
-
- Oct 19, 2010
-
-
René Nussbaumer authored
This includes a new option for gnt-cluster init and appropriate output in gnt-cluster info, though gnt-cluster modify is not yet prepared. Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Oct 15, 2010
-
-
Apollon Oikonomopoulos authored
This patch disables the SSL session ID cache for all cURL operations. This is needed because http.HttpBase's PyOpenSSL implementation does not currently set a context using SSL_set_session_id_context(3SSL), so cURL tries to re-use the session ID and, according to SSL_set_session_id_context(3SSL):

  If the session id context is not set on an SSL/TLS server and client certificates are used, stored sessions will not be reused but a fatal error will be flagged and the handshake will fail.

Ideally, session caching should be either controlled or disabled in HttpBase; however, PyOpenSSL does not seem to implement SSL_CTX_set_session_cache_mode or SSL_CTX_set_session_id_context, which are used for these purposes (it seems that only M2Crypto's SSL module supports these). Signed-off-by:
Apollon Oikonomopoulos <apollon@noc.grnet.gr> Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Michael Hanselmann authored
This was missing from commit 2287b920. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Oct 13, 2010
-
-
Michael Hanselmann authored
I didn't know why the code previously used “pyinotify.EventsCodes.ALL_FLAGS” instead of using the flags from “pyinotify.EventsCodes” directly. Turns out that Pyinotify 0.8 has them in “pyinotify”, not “pyinotify.EventsCodes”. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
The base class can contain code useful to other inotify users. As it is, “SingleFileEventHandler” cannot be used in ganeti-rapi; therefore ganeti-rapi will use its own small inotify handler class based on this base class. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Reading the file before this function allows for better error reporting. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Iustin Pop authored
This is for cleanup, and for later reuse in other parts of the code (outside of LUs). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Currently, masterd startup with old software versions is very confusing for users: we present two tracebacks, with a message in the middle about "version mismatch". This can lead to users believing that all that needs to be done is to fix the config file. This patch attempts to improve this by handling this case in masterd itself (not in the child), and showing a more friendly message for this case. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Currently, the custom instance parameters (hv, be, nicp) are only queryable via LUQueryInstanceData. LUQueryInstance returns only the filled parameters, thus its users (especially RAPI) have no way to know if a parameter is custom or the default value. This patch adds three new parameters: custom_hvparams, custom_beparams, custom_nicparams, that are also exported in RAPI. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
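
The distinction between filled and custom parameters can be illustrated with a small sketch. The helper names and the example values are assumptions; only the `custom_beparams` field name comes from the description above.

```python
def fill_params(defaults, custom):
    """Return the defaults overridden by the instance's custom values."""
    filled = dict(defaults)
    filled.update(custom)
    return filled


def export_beparams(defaults, custom):
    """Export both views so a client can tell custom values from defaults."""
    return {
        "beparams": fill_params(defaults, custom),
        "custom_beparams": dict(custom),
    }


# Hypothetical values for illustration only
cluster_defaults = {"memory": 128, "vcpus": 1}
instance_custom = {"memory": 512}
```

With only the filled view, a client cannot tell that `vcpus` is inherited from the defaults while `memory` was set on the instance; the custom view makes that visible.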
-
- Oct 12, 2010
-
-
Apollon Oikonomopoulos authored
As per SSL_CTX_set_client_CA_list(3SSL), set the list of acceptable CAs advertised to SSL clients to include the server's own certificate. This evidently fixes the pycurl/gnutls RPC client.

During the TLS Handshake, when client verification is requested, the Server sends a CertificateRequest message which states that the client should send a valid certificate as a response. The CertificateRequest message contains a section called "certificate_authorities", which, according to the standard, is a list of the Distinguished Names (DNs) of acceptable certification authorities. The client uses this list to send a certificate signed by one of the acceptable CAs.

Under OpenSSL's server implementation, this list must be set manually using some appropriate call, otherwise the list is empty. TLS 1.0[1] does not state whether the list may be left blank, whereas TLS 1.1[2] and 1.2[3] state that in case the list is blank, then the client *may* send any certificate of a valid type (valid types are specified elsewhere in the handshake). OpenSSL clients seem to obey the behaviour specified in TLS 1.1+, whereas at least curl+GnuTLS does not send any certificates if the list is empty (which is not wrong per the spec, but also evidently not configurable).

[1] http://tools.ietf.org/html/rfc2246
[2] http://tools.ietf.org/html/rfc4346
[3] http://tools.ietf.org/html/rfc5246

Signed-off-by:
Apollon Oikonomopoulos <apollon@noc.grnet.gr> Reviewed-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
The current message is not entirely clear, as it doesn't show the reason why the instance is not running. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
And sorry! Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Michael Hanselmann authored
If a job was cancelled while it was waiting for locks, an assertion would've failed. This patch fixes the problem and provides a unit test to check for this situation. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Commit 5ef699a0 had to roll back an earlier attempt at implementing this. With the improved job queue processor, this is finally possible. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
These fields can help with debugging. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Removes code duplication. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
René Nussbaumer authored
This is the first step for the support of wiping block devices prior to creation of the instance. Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Oct 11, 2010
-
-
Iustin Pop authored
This patch solves the very slow (~8-9 seconds) gnt-instance modify behaviour. Well, it solves the slow RPC behaviour in general, but it was most visible in that LU.

It seems that curl's behaviour with regard to file uploads (via PUT) and the 'Expect' header interacts badly with our http server. First, our http server doesn't properly handle this header. According to RFC 2616:

  Requirements for HTTP/1.1 origin servers: Upon receiving a request which includes an Expect request-header field with the "100-continue" expectation, an origin server MUST either respond with 100 (Continue) status and continue to read from the input stream, or respond with a final status code.

Our server doesn't do this, and hence it triggers this behaviour in curl (from the curl FAQ):

  4.16 My HTTP POST or PUT requests are slow! libcurl makes all POST and PUT requests (except for POST requests with a very tiny request body) use the "Expect: 100-continue" header. This header allows the server to deny the operation early so that libcurl can bail out already before having to send any data. This is useful in authentication cases and others. However, many servers don't implement the Expect: stuff properly and if the server doesn't respond (positively) within 1 second libcurl will continue and send off the data anyway. You can disable libcurl's use of the Expect: header the same way you disable any header, using -H / CURLOPT_HTTPHEADER, or by forcing it to use HTTP 1.0.

This behaviour was detected by watching the captured traffic (in non-SSL mode), where after the initial HTTP headers (ending with the Expect one) there was a ~1-2 second pause before curl sent the body.

Properly RTFM-ing would have saved ~1 day of digging around, but hey… Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
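
One of the remedies named in the curl FAQ, disabling the header through CURLOPT_HTTPHEADER, can be sketched as below. This is an assumption about how a client could apply it, not a description of the patch: the helper name is hypothetical, and `curl` is assumed to be a `pycurl.Curl`-like handle exposing `setopt` and the `HTTPHEADER` option constant.

```python
def disable_expect_header(curl, headers=()):
    """Ask libcurl not to send "Expect: 100-continue" on this handle.

    libcurl drops a header that is passed with nothing after the colon,
    so "Expect:" removes the header it would otherwise add.
    """
    all_headers = list(headers) + ["Expect:"]
    curl.setopt(curl.HTTPHEADER, all_headers)
    return all_headers
```

Since setting the option replaces the whole custom header list, any other custom headers have to be passed in the same call, which is why the helper accepts them as an argument.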
-
- Oct 07, 2010
-
-
Michael Hanselmann authored
This simplifies the code a bit--the status is only checked once. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Iustin Pop authored
I've seen cases where the result from str(sys.exc_info()[1]) is ""; this breaks the error reporting, as the parent relies on non-empty error messages to properly detect child status (otherwise it will try to read the pid and fail, and so on). While this has always happened with asserts, we need to ensure it cannot happen at all. Therefore we abstract this functionality (writing the error message) and ensure we write a non-empty string in the new function. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
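
A sketch of a helper that guarantees a non-empty message. The function name and the fallback texts are assumptions; the patch may choose a different fallback.

```python
def format_child_error(err):
    """Return a message for `err` that is never the empty string.

    str() of an exception raised without arguments (a bare assert, for
    example) is "", so fall back to repr() and then to a fixed text.
    """
    return str(err) or repr(err) or "<unknown error>"
```

For a bare `AssertionError()` this returns its repr instead of an empty string, so a parent that treats an empty message as success is not misled.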
-
Iustin Pop authored
Since daemon startup errors will often be related to socket errors, it makes sense to change the original reporting:

  Error when starting daemon process: "(98, 'Address already in use')"

into:

  Error when starting daemon process: 'Socket-related error: Address already in use (errno=98)'

Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
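
The formatting change above could be sketched like this. The function name is hypothetical, and the sketch targets Python 3, where `socket.error` is an alias of `OSError`, so other OS-level errors carrying an errno would receive the same prefix.

```python
import socket


def format_startup_error(err):
    """Render socket errors with their message and errno spelled out."""
    if isinstance(err, socket.error) and err.errno is not None:
        return ("Socket-related error: %s (errno=%s)"
                % (err.strerror, err.errno))
    # Anything else keeps its plain string form
    return str(err)
```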
-
Iustin Pop authored
This patch copies the pipe-based error reporting functionality from utils.StartDaemon (I gave up for now on trying to merge the two). This patch will fix two longstanding bugs:
- if we fork, we lose all error reporting from the child to the original parent
- if we fork, the original parent exits before the child is ready to "work" (whatever the work might be)

Both of these are fixed once the users of daemon.GenericMain are converted to the three-state setup, as we'll get error reporting via the pipe and also not exit until the PrepFn is done. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
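
A minimal sketch of pipe-based error reporting across a fork, assuming a POSIX system. It is not the code of utils.StartDaemon or daemon.GenericMain: the function name, the `prep_fn` callback, and the convention that an empty read means success are illustrative assumptions.

```python
import os


def start_with_status_pipe(prep_fn):
    """Fork and block in the parent until the child reports on prep_fn.

    Returns (child_pid, error_message); error_message is "" on success.
    """
    rd, wr = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: run the preparation step and report failures via the pipe
        os.close(rd)
        try:
            prep_fn()
        except Exception as err:
            os.write(wr, (str(err) or repr(err)).encode("utf-8"))
            os._exit(1)
        os.close(wr)  # closing without writing signals success
        # A real daemon would enter its main loop here
        os._exit(0)
    # Parent: read until EOF, which arrives once the child closes its end
    os.close(wr)
    chunks = []
    while True:
        data = os.read(rd, 4096)
        if not data:
            break
        chunks.append(data)
    os.close(rd)
    # Reaping is only for this sketch, where the child exits right away
    os.waitpid(pid, 0)
    return pid, b"".join(chunks).decode("utf-8")
```

The parent cannot return before the child has either written an error or closed the pipe, which covers both bugs listed above: errors reach the original parent, and the parent does not exit before the preparation step is done.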
-