- Dec 24, 2010
Iustin Pop authored
While the LU does return the final name, it's useful to log the actual DNS resolution process (input and output) in order to help with the diagnosis of failures. The patch also fixes the docstring of the Exec() function.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
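A minimal sketch of the idea, using plain logging and the standard socket resolver for illustration rather than the actual LU/feedback machinery:

import logging
import socket

def resolve_and_log(name):
    # Illustrative only, not the Ganeti LU code: log both the input name
    # and the resolver's answer so failures are easier to diagnose.
    logging.debug("Resolving %s", name)
    try:
        fqdn, _, addrs = socket.gethostbyname_ex(name)
    except socket.error as err:
        logging.error("Could not resolve %s: %s", name, err)
        raise
    logging.debug("%s resolved to %s (%s)", name, fqdn, ", ".join(addrs))
    return fqdn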
-
- Dec 21, 2010
Michael Hanselmann authored
The list of fields is not only sorted, but sorted in a nice way.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Long story short: time.strftime("%Z", time.localtime()) doesn't work, even though it's documented to be equivalent to time.strftime("%Z").

$ TZ=America/Sao_Paulo python -c 'import time; print time.strftime("%Z"), time.strftime("%Z", time.localtime())'
BRST LMT

References:
http://bugs.python.org/issue762963
https://bugs.launchpad.net/ubuntu/+source/python2.6/+bug/564607
http://stackoverflow.com/questions/4367896/issue-with-timezone-with-time-strftime
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
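The discrepancy can be checked from a small standalone script; whether it shows up depends on the Python build and platform (on affected Python 2.x builds the explicit form can return "LMT"). An illustrative check, not part of the patch:

import os
import time

os.environ["TZ"] = "America/Sao_Paulo"
time.tzset()  # not available on Windows

implicit = time.strftime("%Z")                    # uses the current time internally
explicit = time.strftime("%Z", time.localtime())  # documented as equivalent
print(implicit, explicit)  # may print e.g. "BRST LMT" on affected builds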
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Iustin Pop authored
As per issue 124, some Xen versions (or packaging) don't deal nicely with the colon being part of a disk name. Therefore we add a configure-time option for customising this. Note: setting the separator to interesting values like / is not handled by the code. This being a configure-time option (e.g. to be set by distribution packagers), we assume the person building the code knows what they are doing.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Michael Hanselmann authored
- Update docstrings to explicitly mention Epoch
- Fix timezone bug in FormatTimestampWithTZ, where it would use GMT/UTC when it should use the local timezone
- Add unittests for time formatting functions
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
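To illustrate the timezone point above with a small standalone snippet (the timestamp and format string are example values, not taken from the patch):

import time

epoch_ts = 1292500000  # example Epoch timestamp in seconds

# gmtime() yields UTC; a function advertised as "...WithTZ" should instead
# format in the local timezone, i.e. use localtime().
print(time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(epoch_ts)))     # UTC
print(time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(epoch_ts)))  # local time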
-
Michael Hanselmann authored
It'll be used for querying locks.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Also replace “sorted” with “utils.NiceSort” now that it supports a key function.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Dec 20, 2010
Michael Hanselmann authored
* devel-2.3:
  Prepare 2.3.1 release
  Fix disk status verification in LUClusterVerify

Conflicts:
  NEWS: Trivial
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
* stable-2.3:
  Prepare 2.3.1 release
  Fix disk status verification in LUClusterVerify
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
pylint is not yet included as the code needs some work for that.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Dec 17, 2010
René Nussbaumer authored
As we can't test this on the master node anymore (if we flagged the node offline we would change the master role on the master itself), we use the first non-master node we find in the configuration.
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
René Nussbaumer authored
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
René Nussbaumer authored
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
René Nussbaumer authored
This patch makes sure we cross-verify the state the node is in with our view:
- power off: the node has to be set offline
- modify -O no: the node has to be powered
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
René Nussbaumer authored
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
* devel-2.3:
  QA: Run cluster-verify as part of all instance tests
  QA: Fix typo and add “not”
  ensure-dirs: Speed up when using big queues
  Fix gnt-cluster verify with diskless instances

Conflicts:
  lib/cmdlib.py: Trivial
  qa/ganeti-qa.py: Trivial
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
-
Michael Hanselmann authored
This patch changes utils.NiceSort to use the built-in “sorted()” and gets rid of the intermediate list. Instead of wrapping the items ourselves, a key function is used. The caller can specify another key function (useful to sort objects by their name, e.g. “utils.NiceSort(instances, key=operator.attrgetter("name"))”). Unittests are provided.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
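A minimal sketch of what such a natural-sort key can look like (an illustration of the technique, not the exact utils.NiceSort implementation):

import re

def nice_sort_key(value):
    # Split the string into digit and non-digit runs so embedded numbers
    # compare numerically; tag each piece so types never get mixed.
    return [(0, int(part)) if part.isdigit() else (1, part)
            for part in re.split(r"(\d+)", value)]

print(sorted(["node10", "node1", "node2"], key=nice_sort_key))
# -> ['node1', 'node2', 'node10']

# With objects, a caller-supplied key can extract the name first, e.g.:
#   sorted(instances, key=lambda inst: nice_sort_key(inst.name))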
-
Michael Hanselmann authored
“gnt-cluster verify” looks at some per-instance information as well, so it should be run as part of the QA tests for each instance type.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Iustin Pop authored
For a secondary node that is offline, we should not consider that the disk shutdown has failed, as it can never succeed under this cluster state and (by virtue of the fact that the secondary node is offline) the disks are already "shutdown". The patch also fixes a tiny typo.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
data ≫ code, eom.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
-
- Dec 16, 2010
Michael Hanselmann authored
The “ensure-dirs” script as included in Ganeti 2.3 is very slow when working with big queues requiring a change of permissions on many or all files.

$ find /var/lib/ganeti/queue/ | wc -l
52354

Before this change:

$ time /usr/local/lib/ganeti/ensure-dirs -f
real    16m4.739s

While not addressed in this patch, I'd like to record the overall inefficiency of the “ensure-dirs” script, even after this change:

$ time /usr/local/lib/ganeti/ensure-dirs -f
real    5m57.362s
[…]
$ strace -e clone,execve -f -c /usr/local/lib/ganeti/ensure-dirs -f
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 50.08    5.147090          49    104774           clone
 49.92    5.131094          49    104739           execve

More changes will be needed. Just for comparison, a small Python snippet changing permissions on all files (“ensure-dirs” changes the owner too):

$ time python -c 'import os; from ganeti import utils; [os.chmod(i, 0644) for i in utils.ListVisibleFiles("/var/lib/ganeti/queue/archive/big")]'
real    0m0.605s
[…]
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
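The gist of the speed-up is to do ownership and permission changes in the running Python process instead of spawning an external chown/chmod per file. A rough sketch of that pattern (function and variable names are assumptions for illustration, not the ensure-dirs code):

import grp
import os
import pwd

def ensure_owner_and_mode(paths, owner, group, mode):
    # Resolve the owner/group once, then fix up every path in-process,
    # avoiding one fork/exec pair per file.
    uid = pwd.getpwnam(owner).pw_uid
    gid = grp.getgrnam(group).gr_gid
    for path in paths:
        os.chown(path, uid, gid)
        os.chmod(path, mode)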
-
René Nussbaumer authored
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
René Nussbaumer authored
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
René Nussbaumer authored
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
- Dec 15, 2010
Adeodato Simo authored
`gnt-cluster verify` was failing with KeyError if there was any diskless instance in the cluster. This was because _CollectDiskInfo() was not including these instances in the returned dictionary, but they were expected to be present in LUVerifyCluster.Exec(). With this commit, we ensure that the dictionary returned by _CollectDiskInfo includes entries for diskless instances as well.
Signed-off-by: Adeodato Simo <dato@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
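The shape of such a fix, as a minimal sketch (not the actual _CollectDiskInfo code; names and structure are assumed for illustration):

def collect_disk_info(instance_names, disks_by_instance):
    # Seed the result with an (empty) entry for every instance, so diskless
    # instances are present in the dictionary instead of triggering a
    # KeyError in the caller.
    result = {name: [] for name in instance_names}
    for name, disks in disks_by_instance.items():
        result[name] = disks
    return result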
-
Miguel Di Ciurcio Filho authored
The error contained a typo and is slightly cumbersome. It changes from:
- ERROR: node a: not enough memory on to accommodate failovers should peer node b fail
to:
- ERROR: node a: not enough memory to accomodate instance failovers should node b fail
Signed-off-by: Miguel Di Ciurcio Filho <miguel.filho@gmail.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
-
René Nussbaumer authored
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
* devel-2.3:
  jqueue: Keep jobs in “waitlock” while returning to queue
  Improve jqueue unittests
  Update manpages to display version 2.3

Conflicts:
  man/ganeti-cleaner.sgml: Removed
  man/ganeti-confd.sgml: Removed
  man/ganeti-masterd.sgml: Removed
  man/ganeti-noded.sgml: Removed
  man/ganeti-os-interface.sgml: Removed
  man/ganeti-rapi.sgml: Removed
  man/ganeti-watcher.sgml: Removed
  man/ganeti.sgml: Removed
  man/gnt-backup.sgml: Removed
  man/gnt-cluster.sgml: Removed
  man/gnt-debug.sgml: Removed
  man/gnt-instance.sgml: Removed
  man/gnt-job.sgml: Removed
  man/gnt-node.sgml: Removed
  man/gnt-os.sgml: Removed
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Iustin Pop reported that a job's file is updated many times while it waits for locks held by other thread(s). After an investigation it was concluded that the reason was a design decision for job priorities to return jobs to the “queued” status if they couldn't acquire all locks. Changing a job's status or priority requires an update to permanent storage. In a high-level view this is what happens:

1. Mark as waitlock
2. Write to disk as permanent storage (jobs left in this state by a crashing master daemon are resumed on restart)
3. Wait for lock (assume lock is held by another thread)
4. Mark as queued
5. Write to disk again
6. Return to workerpool

Another option originally discussed was to leave the job in the “waitlock” status. Ignoring priority changes, this is what would happen:

1. If not in waitlock
   1.1. Assert state == queued
   1.2. Mark as waitlock
   1.3. Set start_timestamp
   1.4. Write to disk as permanent storage
3. Wait for locks (assume lock is held by another thread)
4. Leave in waitlock
5. Return to workerpool

Now let's assume the lock is released by the other thread:

[…]
3. Wait for locks and get them
4. Assert state == waitlock
5. Set state to running
6. Set exec_timestamp
7. Write to disk

As this change reduces the number of writes from two per lock acquire attempt to two per opcode and one per priority increase (as happens after 24 acquire attempts (see mcpu._CalculateLockAttemptTimeouts) until the highest priority is reached), here's the patch to implement it. Unittests are updated.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
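A compact sketch of the resulting transition logic (illustrative only, with simplified names; the real code lives in the job queue and also tracks priority changes):

import time

def mark_waiting_for_locks(job, write_to_disk):
    # Persist the job only on its first transition into "waitlock"; on later
    # failed lock-acquire attempts it simply stays in that state, so no
    # extra writes happen per attempt.
    if job.status != "waitlock":
        assert job.status == "queued"
        job.status = "waitlock"
        if job.start_timestamp is None:
            job.start_timestamp = time.time()
        write_to_disk(job)

def mark_running(job, write_to_disk):
    # Once all locks are finally acquired, go straight from "waitlock" to
    # "running" and write once.
    assert job.status == "waitlock"
    job.status = "running"
    job.exec_timestamp = time.time()
    write_to_disk(job)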
-
Michael Hanselmann authored
- Verify job file updates
- Ensure queue lock is released while executing opcode
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
René Nussbaumer authored
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
René Nussbaumer authored
Signed-off-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Dec 14, 2010
Michael Hanselmann authored
Pylint complained that the “lambda may not be necessary”. Turns out it was right.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
-