- 29 May, 2012 1 commit
-
-
Iustin Pop authored
These were using exactly 80 chars, and I like them smaller. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
- 06 Jan, 2012 1 commit
-
-
Michael Hanselmann authored
“FileStatHelper” can be used together with “ReadFile” to retrieve a file's status while it's opened. This avoids certain race conditions. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
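A minimal sketch of the idea in the commit above, not Ganeti's actual code: the status is taken from the already-open file descriptor, so the status and the contents are guaranteed to belong to the same file. The names `StatCollector` and `read_file_with_stat` are hypothetical.

```python
import os


class StatCollector:
    """Hypothetical callback object that remembers the stat result it gets."""

    def __init__(self):
        self.st = None

    def __call__(self, st):
        self.st = st


def read_file_with_stat(path, on_stat):
    """Read a file and report its status while it is still open."""
    with open(path, "rb") as fh:
        # fstat() works on the open descriptor, so the result cannot belong
        # to a different file that replaced the path in the meantime.
        on_stat(os.fstat(fh.fileno()))
        return fh.read()
```

Calling `os.stat(path)` before or after reading would leave a window in which the path could be replaced, which is the kind of race the commit message refers to.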
-
- 17 Nov, 2011 1 commit
-
-
Iustin Pop authored
If confd is disabled, do not automatically restart it. Furthermore, we can't run maintenance actions if it is disabled so log a warning. Note that I haven't completely disabled the NodeMaintenance class with ENABLE_CONFD = False because I think they are at two different levels (e.g. we might have other maintenance actions done even with confd disabled). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- 12 Oct, 2011 1 commit
-
-
Iustin Pop authored
We currently use 'filter' as the OpCode, QueryRequest and RAPI field name for representing a query filter. However, since 'filter' is a built-in function, we actually have to use filter_ throughout the code in order to not override the built-in function. This patch simply goes and does a global sed over the code. Due to the fact that the RAPI interface already exposed this field, we add compatibility code for now which handles both forms. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
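A minimal sketch of compatibility code that accepts both field names, as described in the commit above. The commit message does not state the new field name, so `qfilter` is an assumption here, and `get_query_filter` is a hypothetical helper rather than the actual RAPI code.

```python
def get_query_filter(body, new_name="qfilter", old_name="filter"):
    """Return the query filter from a request body, accepting both names.

    "qfilter" is an assumed name for the renamed field; "filter" is the
    old name that shadows the built-in function when used as a variable.
    """
    if new_name in body:
        return body[new_name]
    # Fall back to the old field name for clients not yet updated.
    return body.get(old_name)
```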
-
- 30 Aug, 2011 2 commits
-
-
Andrea Spadaccini authored
Running pylint 0.24.0 revealed 2 errors and 1 warning. Here is how I fixed them:
* jqueue.py: silenced E1101
* netutils.py: rewrote the list comprehension using extend()
* watcher/__init__.py: fixed a missing format string parameter
These changes are backwards-compatible with pylint 0.21.1. Signed-off-by:
Andrea Spadaccini <spadaccio@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Andrea Spadaccini authored
In version 0.21, pylint unified all the disable-* (and enable-*) directives to disable (resp. enable). This leads to a lot of DeprecationWarning being emitted even if one uses the recommended version of pylint (0.21.1, as stated in devnotes.rst). This commit changes all the disable-msg directives to disable. Signed-off-by:
Andrea Spadaccini <spadaccio@google.com> Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- 22 Aug, 2011 1 commit
-
-
Michael Hanselmann authored
This patch retains the behaviour of ganeti-watcher in previous Ganeti versions. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- 12 Aug, 2011 1 commit
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
- 05 Aug, 2011 2 commits
-
-
Michael Hanselmann authored
The first argument to str.split is the separator, not the maximum number of splits. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
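The pitfall named in the commit above can be illustrated with a short example (the input string is made up for illustration):

```python
line = "a b c d"

# The first positional argument of str.split is the separator, so passing a
# number there is a mistake: line.split(1) raises TypeError because 1 is not
# a string. To limit the number of splits, pass the separator first (None
# means "any whitespace") and the limit second.
parts = line.split(None, 1)
print(parts)  # ['a', 'b c d']
```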
-
Michael Hanselmann authored
Each per-group watcher process writes its own instance status file. Once that's done it tries to acquire an exclusive lock on the global file and will proceed to read all status files, merging them based on each file's mtime. If an instance is moved to another group, the newer status will supersede that of an older file which hasn't yet been updated. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
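A minimal sketch of the mtime-based merge described in the commit above. The function name and the data layout (pairs of mtime and a name-to-status dictionary) are assumptions made for illustration; the real watcher reads these from files.

```python
def merge_statuses(files):
    """Merge per-group instance statuses; the newest file wins per instance.

    `files` is an iterable of (mtime, {instance_name: status}) pairs.
    """
    merged = {}
    for mtime, statuses in files:
        for name, status in statuses.items():
            # Keep this entry only if no newer file already reported it.
            if name not in merged or mtime > merged[name][0]:
                merged[name] = (mtime, status)
    return {name: status for name, (_, status) in merged.items()}
```

With this rule, an instance that moved to another group takes its status from whichever group file was written most recently.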
-
- 04 Aug, 2011 1 commit
-
-
Michael Hanselmann authored
This patch brings a huge change to ganeti-watcher to make it aware of node groups. Each node group is processed in its own subprocess, reducing the impact of long-running operations. The global watcher state file, $datadir/ganeti/watcher.data, is replaced with a state file per node group ($datadir/ganeti/watcher.${uuid}.data).
Previously a lock on the state file was used to ensure only one instance of watcher was running at the same time. Some operations, e.g. “gnt-cluster renew-crypto”, blocked the watcher by acquiring an exclusive lock on the state file. Since the watcher processes now use different files, this method is no longer usable. Locking multiple files isn't atomic. Instead a dedicated lock file is used and every watcher process acquires a shared lock on it. If a Ganeti command wants to block the watcher it acquires the lock in exclusive mode. Each per-nodegroup watcher process also acquires an exclusive lock on its state file. This prevents multiple watchers from running for the same nodegroup.
The code is reorganized heavily to clear up dependencies between functions and to get rid of the global “client” variable. The utility class “Watcher” is removed in favour of stand-alone utility functions.
Since the parent watcher process won't wait for its children by default, a new option (--wait-children) was added. It is used, for example, by QA. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
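The shared/exclusive locking scheme described in the commit above can be sketched with `fcntl.flock` on Unix systems. This is an illustration under assumptions, not the watcher's actual code; `try_lock` is a hypothetical helper.

```python
import fcntl
import os


def try_lock(path, exclusive):
    """Try to lock `path` without blocking; return the fd, or None on failure."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    mode = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    try:
        # LOCK_NB makes flock() fail immediately instead of waiting.
        fcntl.flock(fd, mode | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        return None
    return fd
```

Any number of watcher processes can hold the shared lock at once, while a command that wants to block them asks for the exclusive lock, which is only granted when no shared holder remains. The lock is released when the descriptor is closed.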
-
- 29 Jul, 2011 7 commits
-
-
Michael Hanselmann authored
For now this will do another query to the master daemon, but with the split for node groups this issue will go away. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Until now the state class would receive instances as objects (ganeti.watcher.Instance), but this is not necessary. By using strings the interface is simplified. This patch also simplifies some code accessing the internal structures, e.g. setting a key of a dictionary. Some instances of “del dict[key]” are replaced with “dict.pop(key, None)” to suppress any exceptions if the key doesn't exist. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
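The difference between the two forms mentioned in the commit above, shown on a made-up state dictionary:

```python
state = {"inst1": {"restart_count": 2}}

# "del state[key]" raises KeyError when the key is missing ...
try:
    del state["inst2"]
    raised = False
except KeyError:
    raised = True

# ... while pop() with a default removes the key if it is present and
# silently does nothing otherwise.
state.pop("inst2", None)
state.pop("inst1", None)
```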
-
Michael Hanselmann authored
Also, remove punctuation from one error message. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Make them match with style guide. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
“upfile” is a bad name. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- 28 Jul, 2011 1 commit
-
-
Michael Hanselmann authored
The node maintenance class is standalone. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- 26 Jul, 2011 1 commit
-
-
Michael Hanselmann authored
Until now verifying disks, which is also used by the watcher, would lock all nodes and instances. With this patch the opcode is changed to operate per node group, requiring fewer locks. Both “gnt-cluster” and “ganeti-watcher” are changed for the new interface. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- 19 Apr, 2011 1 commit
-
-
Michael Hanselmann authored
If “utils.RunParts” were to raise an exception, a log message was written and the code continued to run. Due to the exception the “results” variable would not be defined. Also change the code to log a backtrace (getting an exception is rather unlikely and having a backtrace is useful) and update one comment. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
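A minimal sketch of the pattern the commit above describes, assuming a callable `run_parts` that stands in for “utils.RunParts”; the wrapper name `run_hooks` is hypothetical.

```python
import logging


def run_hooks(run_parts, hooks_dir):
    """Run hooks; return their results, or None if running them failed."""
    try:
        results = run_parts(hooks_dir)
    except Exception:
        # logging.exception records the message together with a backtrace.
        logging.exception("RunParts %s failed", hooks_dir)
        # Return early: "results" was never assigned, so continuing to code
        # that uses it would fail with a NameError.
        return None
    return results
```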
-
- 24 Mar, 2011 1 commit
-
-
Iustin Pop authored
Add some debug logging to detail why we don't run some steps. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
- 17 Mar, 2011 1 commit
-
-
Michael Hanselmann authored
When “ganeti-watcher” is called with an argument, it would hint at a non-existing “-f” parameter. With this patch the separate usage string is no longer necessary. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- 23 Feb, 2011 1 commit
-
-
Michael Hanselmann authored
They've been hardcoded for too long. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
- 02 Feb, 2011 1 commit
-
-
Michael Hanselmann authored
It's passed in by most users (daemons, CLI scripts) and for the others (burnin, watcher) it certainly doesn't hurt, especially when using syslog. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
- 27 Jan, 2011 1 commit
-
-
René Nussbaumer authored
When the secondary node was offline and not evacuated, the watcher kept trying to activate disks endlessly. This is useless, as an offline secondary cannot respond. This patch skips disk activation if the secondary is bad but the instance is up and running. Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- 18 Jan, 2011 5 commits
-
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
- 29 Oct, 2010 1 commit
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
- 14 Oct, 2010 1 commit
-
-
Iustin Pop authored
During cluster maintenance, when the watcher is disabled, it's useful to run it just once. This is inconvenient to do currently, as the watcher needs to be unpaused, then run, then paused again. This patch adds an option “--ignore-pause” that can be used to ignore the cluster-level setting. Also the man page is updated as it was missing the options available. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
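A minimal sketch of how an “--ignore-pause” option could be wired up, using the standard `optparse` module. Only the option name comes from the commit above; the help text and the functions `parse_args` and `should_run` are assumptions.

```python
from optparse import OptionParser


def parse_args(argv):
    """Parse watcher-style command line arguments (illustrative only)."""
    parser = OptionParser(usage="%prog [options]")
    parser.add_option("--ignore-pause", dest="ignore_pause", default=False,
                      action="store_true",
                      help="Run even if the watcher is paused")
    return parser.parse_args(argv)


def should_run(paused, ignore_pause):
    """The pause setting is honoured unless it is explicitly ignored."""
    return ignore_pause or not paused
```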
-
- 02 Sep, 2010 1 commit
-
-
Iustin Pop authored
Since the RAPI certificate is not necessarily self-signed, and we currently don't have any configuration variable for the real CA file, we disable for now the CA checks. This fixes the 'restart RAPI every 5 minutes' problem with non-self-signed certs. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- 18 Aug, 2010 1 commit
-
-
Manuel Franceschini authored
This patch enables IPv6 name resolution by using socket.getaddrinfo instead of socket.gethostbyname_ex. It renames the HostInfo class to Hostname and unifies its use throughout the code. This is achieved by using static calls where no object is needed and removes some obsolete code. For now, we just resolve to IPv4 addresses, but this will change once it is needed. Signed-off-by:
Manuel Franceschini <livewire@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
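A minimal sketch of name resolution via `socket.getaddrinfo`, restricted to IPv4 as the commit above describes. The function `resolve_ipv4` is hypothetical and is not the Hostname class itself.

```python
import socket


def resolve_ipv4(name):
    """Return the first IPv4 address for `name` using getaddrinfo."""
    infos = socket.getaddrinfo(name, None, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); for AF_INET
    # the sockaddr is an (address, port) pair.
    return infos[0][4][0]
```

Unlike `socket.gethostbyname_ex`, `getaddrinfo` takes the address family as a parameter, so passing `socket.AF_UNSPEC` or `socket.AF_INET6` instead would also return IPv6 results.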
-
- 26 Jul, 2010 1 commit
-
-
Iustin Pop authored
This patch implements a few changes to the instance handling. First, old instances which no longer exist on the cluster are removed from the state file, to keep things clean. Second, the instance restart counters are reset every 8 hours, since some error cases might be transient (e.g. networking issues, or machine temporarily down), and if the problem takes more than 5 restarts but is not permanent, watcher will not restart the instance. The value of 8 hours is, I think, both conservative (so as not to hammer the cluster too often with restarts) and fast enough to clear semi-transient problems. And last, if an instance is not restarted due to exhausted retries, a warning should be logged, otherwise it's hard to understand why watcher doesn't want to restart an ERROR_down instance. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
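A minimal sketch of the restart-counter logic described in the commit above. The limits (5 restarts, 8 hours) come from the commit message; the record layout, the constant names, and the choice to measure the 8 hours from the last restart attempt are assumptions.

```python
import logging

MAX_RETRIES = 5               # restarts allowed before giving up
RETRY_EXPIRATION = 8 * 3600   # seconds after which the counter is reset


def should_restart(record, now):
    """Decide whether to restart an instance; updates `record` in place.

    `record` is a dict with "restart_count" and "restart_when" (timestamp
    of the last restart attempt).
    """
    if now - record["restart_when"] > RETRY_EXPIRATION:
        # The earlier failures may have been transient, so start over.
        record["restart_count"] = 0
    if record["restart_count"] >= MAX_RETRIES:
        logging.warning("Not restarting instance: retries exhausted")
        return False
    record["restart_count"] += 1
    record["restart_when"] = now
    return True
```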
-
- 09 Jul, 2010 1 commit
-
-
Manuel Franceschini authored
This patch moves network utility functions to a dedicated module. Signed-off-by:
Manuel Franceschini <livewire@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- 01 Jul, 2010 1 commit
-
-
Michael Hanselmann authored
Currently the RAPI client uses the urllib2 and httplib modules from Python's standard library. They're used with pyOpenSSL in a very fragile way, and there are known issues when receiving large responses from a RAPI server. By switching to PycURL we leverage the power and stability of the widely-used curl library (libcurl). This brings us much more flexibility than before, and timeouts were easily implemented (something that would have involved a lot of work with the built-in modules). There's one small drawback: Programs using libcurl have to call curl_global_init(3) (available as pycurl.global_init) while exactly one thread is running (e.g. before other threads) and are supposed to call curl_global_cleanup(3) (available as pycurl.global_cleanup) upon exiting. See the manpages for details. A decorator is provided to simplify this. Unittests for the new code are provided, increasing the test coverage of the RAPI client from 74% to 89%. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
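The commit above mentions a decorator that simplifies the global init/cleanup calls. A generic sketch of such a decorator follows; it is not the one from the RAPI client, and `with_global_setup` is a hypothetical name. The init and cleanup callables are passed in so that the example runs without PycURL installed.

```python
import functools


def with_global_setup(init, cleanup):
    """Build a decorator that calls init() before and cleanup() after fn."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            init()
            try:
                return fn(*args, **kwargs)
            finally:
                # Runs even if fn raises, mirroring "upon exiting".
                cleanup()
        return wrapper
    return decorator
```

With PycURL, `init` would wrap `pycurl.global_init` and `cleanup` would be `pycurl.global_cleanup`; the decorated function should be the program's entry point so that initialization happens before any other thread starts.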
-
- 30 Jun, 2010 1 commit
-
-
Manuel Franceschini authored
Signed-off-by:
Manuel Franceschini <livewire@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
- 03 Jun, 2010 1 commit
-
-
Tom Limoncelli authored
Update ganeti-watcher so that it tests the master's RAPI port with a simple test (in this case GetVersion). If it fails, make one attempt at restarting ganeti-rapi and retest.
- daemons/ganeti-watcher: Test rapi and make one attempt at restarting it.
- lib/utils.py: add StopDaemon() function.
Signed-off-by:
Tom Limoncelli <tlim@google.com> Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
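The check-restart-retest flow from the commit above, sketched with injected callables. `ensure_rapi`, `is_responding` and `restart` are hypothetical names; in the watcher the check would be a RAPI request such as GetVersion.

```python
def ensure_rapi(is_responding, restart):
    """Return True if RAPI responds, restarting it at most once."""
    if is_responding():
        return True
    # Exactly one restart attempt, followed by a single retest.
    restart()
    return is_responding()
```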
-