- Oct 26, 2010
-
-
Iustin Pop authored
In the context of a node, its group has (at least today) only one meaning, that is the node's node group. As such, we rename node.nodegroup to just node.group. Note: if we want to keep node in there, it should be at least node_group, for consistency with the other node attributes. Similarly, we rename the OpAddNode nodegroup attribute to group. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
For consistency with other CLI options. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
The node and instance computations were all in this big function; we separate them out for more clarity. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
René Nussbaumer authored
Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
René Nussbaumer authored
Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
René Nussbaumer authored
This patch now uses dd entirely to wipe the disk, make it much easier to wipe in blocks so we can give interactive feedback about the status. Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Oct 25, 2010
-
-
Iustin Pop authored
Some parameters were missing (uuid, c/mtime). We simplify the export method; unfortunately we cannot simply iterate over __slots__ since the mapping is not 1:1. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
There are two node tests that are run from RunCommonInstanceTests, which is the bad place—it causes these node tests to be run three times instead of once. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Oct 22, 2010
-
-
Iustin Pop authored
If the configuration file doesn't denote this node as master, we prevent startup. This would have detected our previous race condition more easily, hence we add it as a permanent check. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This fixes a recently diagnosed race condition between master failover and the watcher. Currently, the master failover first stops the master daemon, checks that the IP is no longer reachable, and then distributes the updated configuration. Between the stop and the distribution, it can happen that the watcher starts the master daemon on the old node again, since ssconf still points the master to it (and all nodes vote so). In even more weird cases, the master daemon starts and before it manages to open the configuration file, it is updated, which means the master will respond to QueryClusterInfo with another node as the real master. This patch reorders the actions during master failover: - first, we redistribute a fixed config; this means the old master will refuse to update its own config file and ssconf, and that most jobs that change state will fail to finish - we then immediately kill it; after this step, the watcher will be unable to start it, since the master will refuse startup - and only then we check for IP reachability, etc. I've tested the new version against concurrent launch of the watcher; while my tests are not very exhaustive, two things can happen: watcher see the daemons as dead, and tries to restart them, which also fail; or it simply get an error while reading from the master daemon. Both these should be OK. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This should fix the case where there are two masters that both try to distribute the configuration file to the cluster. The first one that does so, will "win" the ownership of the config.data. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This allows serialization of updates to a given file, with respect to other cooperating writers. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Michael Hanselmann authored
“os_new” is not used anywhere, removing it. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
René Nussbaumer authored
Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
René Nussbaumer authored
Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Iustin Pop authored
Commit 8d8c4eff broke instance reinstall with different OS, due to an attribute typo. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
- Oct 21, 2010
-
-
René Nussbaumer authored
Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Iustin Pop authored
And also update the man page. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Michael Hanselmann authored
This allows OS installation scripts to make use of special parameters, e.g. to retain some data on reinstallation. The RAPI resource is not updated as it takes all parameters via the query string and encoding arbitrary data in a query string is tricky. The resource will need to be changed to use the POST body instead. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Oct 20, 2010
-
-
Michael Hanselmann authored
In some cases it can be useful to mark as an instance as started or stopped while its primary node is offline. With this patch, a new option, “--ignore-offline”, is introduced to “gnt-instance start” and “… stop”. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
This basically extracts a small piece of code from ganeti-rapi and puts it into a utility function. RAPI resources are found using a dictionary in which the keys can either be static strings or compiled regular expressions. This might be handy in other places, hence extracting it and adding unittests. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Michael Hanselmann authored
This tests the HTTP Not Found and Not Implemented errors. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Oct 19, 2010
-
-
René Nussbaumer authored
This includes a new option gnt-cluster init and approriate output on gnt-cluster info. Though gnt-cluster modify is not yet prepared. Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
* devel-2.2: Bump version to 2.2.1, update NEWS Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Oct 15, 2010
-
-
Michael Hanselmann authored
* devel-2.2: http.client: Disable SSL session ID cache Crude workaround for pylint breakage Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Apollon Oikonomopoulos authored
This patch disables the SSL session ID cache for all cURL operations. This is needed because http.HttpBase's PyOpenSSL implementation does not currently set a context using SSL_set_session_id_context(3SSL), cURL tries to re-use the session ID and, according to SSL_set_session_id_context(3SSL): If the session id context is not set on an SSL/TLS server and client certificates are used, stored sessions will not be reused but a fatal error will be flagged and the handshake will fail. Ideally, session caching should be either controlled, or disabled in HttpBase, however PyOpenSSL does not seem to implement SSL_CTX_set_session_cache_mode nor SSL_CTX_set_session_id_context which are used for these purposes (it seems that only M2Crypto's SSL module supports these). Signed-off-by:
Apollon Oikonomopoulos <apollon@noc.grnet.gr> Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Apollon Oikonomopoulos authored
This patch disables the SSL session ID cache for all cURL operations. This is needed because http.HttpBase's PyOpenSSL implementation does not currently set a context using SSL_set_session_id_context(3SSL), cURL tries to re-use the session ID and, according to SSL_set_session_id_context(3SSL): If the session id context is not set on an SSL/TLS server and client certificates are used, stored sessions will not be reused but a fatal error will be flagged and the handshake will fail. Ideally, session caching should be either controlled, or disabled in HttpBase, however PyOpenSSL does not seem to implement SSL_CTX_set_session_cache_mode nor SSL_CTX_set_session_id_context which are used for these purposes (it seems that only M2Crypto's SSL module supports these). Signed-off-by:
Apollon Oikonomopoulos <apollon@noc.grnet.gr> Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
The way we currently call pylint, the exact order it inspect modules in lib/http/ depends on the filesystem order. This is not good, and if lib/http/server.py is loaded before lib/http/__init__.py, it will throw a "R0921:763:HttpMessageReader: Abstract class not referenced" (as that class is used in server.py). For the short-term fix, we just add server.py after "ganeti", so that it gets parsed (again?) and pylint sees the usage of the class. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Michael Hanselmann authored
This was missing from commit 2287b920. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Oct 14, 2010
-
-
Iustin Pop authored
* stable-2.2: Release 2.2.1~rc1 Require aclocal 1.11.1 or above for devel/release Revert "Require aclocal 1.11.1 or above for autogen.sh" Add mising --units in gnt-instance list man page Set list of trusted SSL CAs for client to verify Require aclocal 1.11.1 or above for autogen.sh Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
I did forgot this in the original patch. Sorry!!!! Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
The interaction with cron-launched watcher is a well-known failure mode of QA: ---- 2010-10-14 06:54:55.464839 time=0:00:56.764827 Test tools/move-instance For the following tests it's recommended to turn off the ganeti-watcher cronjob. ---- 2010-10-14 06:54:55.465255 start Test automatic restart of instance by ganeti-watcher … Error: Domain 'instance1' does not exist. Command: ssh -oEscapeChar=none -oBatchMode=yes -l root -t -oStrictHostKeyChecking=yes -oClearAllForwardings=yes -oForwardAgent=yes node2 'ganeti-watcher -d' 2010-10-13 23:55:04,479: pid=1659 ganeti-watcher:626 ERROR Can't acquire lock on state file /var/lib/ganeti/watcher.data: File already locked ---- 2010-10-14 06:55:04.513948 time=0:00:09.048693 Test automatic restart of instance by ganeti-watcher In order to fix this, we disable the watcher during these tests, and re-enable it afterwards. To protect against watcher being disabled, we enable it unconditionally at the start of the QA (we do want it enabled, in order to see the interaction between the watcher and many creation/disk replace jobs, etc.). Note: even after this patch, if a cron-watcher was started and is still running during the test, we'll have locking issues. I think for now this is OK, we'll have to see how often that happens. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
During cluster maintenance, when the watcher is disabled, it's useful to run it just once. This is incovenient to do currently, as the watcher needs to be unpaused, then run, then paused again. This patch adds an option “--ignore-pause” that can be used to ignore the cluster-level setting. Also the man page is updated as it was missing the options available. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-