- Dec 16, 2010
-
-
Michael Hanselmann authored
The “ensure-dirs” script as included in Ganeti 2.3 is very slow when working with big queues requiring a change of permissions on many or all files.

  $ find /var/lib/ganeti/queue/ | wc -l
  52354

Before this change:

  $ time /usr/local/lib/ganeti/ensure-dirs -f
  real 16m4.739s

While not addressed in this patch, I'd like to record the overall inefficiency of the “ensure-dirs” script, even after this change:

  $ time /usr/local/lib/ganeti/ensure-dirs -f
  real 5m57.362s
  […]

  $ strace -e clone,execve -f -c /usr/local/lib/ganeti/ensure-dirs -f
  % time     seconds  usecs/call     calls    errors syscall
  ------ ----------- ----------- --------- --------- ----------------
   50.08    5.147090          49    104774           clone
   49.92    5.131094          49    104739           execve

More changes will be needed. Just for comparison, a small Python snippet changing permissions on all files (“ensure-dirs” changes the owner too):

  $ time python -c 'import os; from ganeti import utils; [os.chmod(i, 0644) for i in utils.ListVisibleFiles("/var/lib/ganeti/queue/archive/big")]'
  real 0m0.605s
  […]

Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
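The strace numbers above suggest that most of the time goes into spawning one process per file. As a rough illustration only (this is not the patch, and the function name `fix_modes` and the default modes are assumptions made for the example), an in-process walk that changes only the entries whose mode is wrong could look like this:

```python
import os
import stat


def fix_modes(root, file_mode=0o644, dir_mode=0o755):
    """Walk root and chmod entries whose mode differs.

    Hypothetical helper; the modes are illustrative defaults, not the
    values used by ensure-dirs. Returns the number of entries changed.
    """
    changed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        # Each directory is visited exactly once as dirpath.
        entries = [(dirpath, dir_mode)]
        entries += [(os.path.join(dirpath, name), file_mode)
                    for name in filenames]
        for path, wanted in entries:
            st = os.lstat(path)
            if stat.S_ISLNK(st.st_mode):
                continue  # leave symlinks alone
            if stat.S_IMODE(st.st_mode) != wanted:
                os.chmod(path, wanted)
                changed += 1
    return changed
```

Skipping entries that already have the right mode avoids a system call per file on repeated runs, and nothing is forked at all.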
-
- Dec 15, 2010
-
-
Adeodato Simo authored
`gnt-cluster verify` was failing with KeyError if there was any diskless instance in the cluster. This was because _CollectDiskInfo() was not including these instances in the returned dictionary, but they were expected to be present in LUVerifyCluster.Exec(). With this commit, we ensure that the dictionary returned by _CollectDiskInfo includes entries for diskless instances as well. Signed-off-by:
Adeodato Simo <dato@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
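A simplified sketch of the idea, not the actual `_CollectDiskInfo` code: the function name, arguments and data shapes below are assumptions chosen for illustration. Seeding the result with every instance name first guarantees that diskless instances get an (empty) entry, so later lookups cannot raise KeyError.

```python
def collect_disk_info(instance_disks, disk_status):
    """Group per-disk status by instance, keeping diskless instances.

    Hypothetical data shapes:
      instance_disks: instance name -> list of disk names (may be empty)
      disk_status: (instance name, disk name) -> status string
    """
    # Seed every instance so diskless ones appear with an empty dict.
    info = {name: {} for name in instance_disks}
    for (instance, disk), status in disk_status.items():
        info.setdefault(instance, {})[disk] = status
    return info


result = collect_disk_info(
    {"web1": ["disk/0"], "diskless1": []},
    {("web1", "disk/0"): "ok"},
)
```

Here `result["diskless1"]` is an empty dictionary rather than a missing key.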
-
Michael Hanselmann authored
Iustin Pop reported that a job's file is updated many times while it waits for locks held by other thread(s). After an investigation it was concluded that the reason was a design decision for job priorities to return jobs to the “queued” status if they couldn't acquire all locks. Changing a job's status or priority requires an update to permanent storage.

In a high-level view this is what happens:

1. Mark as waitlock
2. Write to disk as permanent storage (jobs left in this state by a crashing master daemon are resumed on restart)
3. Wait for lock (assume lock is held by another thread)
4. Mark as queued
5. Write to disk again
6. Return to workerpool

Another option originally discussed was to leave the job in the “waitlock” status. Ignoring priority changes, this is what would happen:

1. If not in waitlock
  1.1. Assert state == queued
  1.2. Mark as waitlock
  1.3. Set start_timestamp
  1.4. Write to disk as permanent storage
3. Wait for locks (assume lock is held by another thread)
4. Leave in waitlock
5. Return to workerpool

Now let's assume the lock is released by the other thread:

[…]
3. Wait for locks and get them
4. Assert state == waitlock
5. Set state to running
6. Set exec_timestamp
7. Write to disk

As this change reduces the number of writes from two per lock acquire attempt to two per opcode and one per priority increase (as happens after 24 acquire attempts (see mcpu._CalculateLockAttemptTimeouts) until the highest priority is reached), here's the patch to implement it. Unittests are updated.

Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
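To make the write counts concrete, here is a toy model of the two flows described in this commit message. It is only a sketch: the `Job` class and both functions are invented for the example, and priority changes and timestamps are ignored.

```python
class Job:
    """Toy job; _save() stands in for a write to permanent storage."""

    def __init__(self):
        self.status = "queued"
        self.writes = 0

    def _save(self):
        self.writes += 1


def attempt_old(job, got_locks):
    # Old flow: always mark waitlock, then fall back to queued on failure.
    job.status = "waitlock"
    job._save()
    job.status = "running" if got_locks else "queued"
    job._save()


def attempt_new(job, got_locks):
    # New flow: enter waitlock once and stay there until locks are acquired.
    if job.status != "waitlock":
        assert job.status == "queued"
        job.status = "waitlock"
        job._save()
    if got_locks:
        assert job.status == "waitlock"
        job.status = "running"
        job._save()


def count_writes(attempt, failed_attempts):
    job = Job()
    for _ in range(failed_attempts):
        attempt(job, got_locks=False)
    attempt(job, got_locks=True)
    return job.writes
```

In this model, 24 failed attempts followed by a successful one cost 50 writes in the old flow and 2 in the new one.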
-
Michael Hanselmann authored
  - Verify job file updates
  - Ensure queue lock is released while executing opcode

Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Dec 14, 2010
-
-
Miguel Di Ciurcio Filho authored
Signed-off-by:
Miguel Di Ciurcio Filho <miguel.filho@gmail.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Dec 09, 2010
-
-
Guido Trotter authored
* devel-2.2: Fix rename for file-backed instances Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Guido Trotter authored
* stable-2.2: Fix rename for file-backed instances Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Guido Trotter authored
Currently the code wrongly changes the disk logical/physical id component representing the path from "$storage_dir/$iname/disk$seq" to "$storage_dir/$iname/disk/$seq" (note the additional slash) breaking the rename. Signed-off-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
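The difference between the two path forms is easy to see in a small example. The helper names and sample values below are hypothetical; only the two path layouts come from the commit message.

```python
import posixpath


def disk_path(storage_dir, instance_name, seq):
    """Intended layout: $storage_dir/$iname/disk$seq."""
    return posixpath.join(storage_dir, instance_name, "disk%d" % seq)


def disk_path_buggy(storage_dir, instance_name, seq):
    """Layout described as wrong: $storage_dir/$iname/disk/$seq."""
    return posixpath.join(storage_dir, instance_name, "disk", str(seq))


good = disk_path("/srv/ganeti/file-storage", "inst1.example.com", 0)
bad = disk_path_buggy("/srv/ganeti/file-storage", "inst1.example.com", 0)
```

`good` ends in `disk0`, while `bad` ends in `disk/0`, which points at a file that does not exist after the rename.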
-
- Dec 02, 2010
-
-
Michael Hanselmann authored
* stable-2.3: Bump version for 2.3.1~rc1 release Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Dec 01, 2010
-
-
Michael Hanselmann authored
Just being told that a lock doesn't exist can be confusing. One case where this happens is when a job (e.g. instance modify) waits for a job removing the instance (e.g. export with remove). Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
This uses an option only available in patched socat versions. More information is available from the INSTALL update included in this patch. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
* stable-2.3: Bump version for 2.3.0 Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Nov 30, 2010
-
-
Michael Hanselmann authored
* devel-2.2:
  Correct version check for release candidates
  Fix version check
  Add script to check version format

Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
The tilde needs to be escaped and I forgot the space which should be used instead. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Adeodato Simo authored
Signed-off-by:
Adeodato Simo <dato@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Nov 25, 2010
-
-
Michael Hanselmann authored
Don't ask … all I say is distcheck. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Nov 24, 2010
-
-
Michael Hanselmann authored
Only versions of the format “x.y.z” and “x.y.z~(rc|beta)N” (for N>0) are allowed. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
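The rule above can be written as a regular expression. This is a sketch in Python rather than the script added by the commit, and it assumes x, y and z are plain decimal numbers; the name `is_valid_version` is made up for the example.

```python
import re

# x.y.z, optionally followed by ~rcN or ~betaN with N > 0.
_VERSION_RE = re.compile(r"\d+\.\d+\.\d+(?:~(?:rc|beta)[1-9]\d*)?")


def is_valid_version(version):
    """Return True if version matches the allowed release formats."""
    return _VERSION_RE.fullmatch(version) is not None
```

With this pattern `2.3.0`, `2.3.1~rc1` and `2.3.1~beta12` are accepted, while `2.3`, `2.3.1~rc0` and `2.3.1-rc1` are rejected.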
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Currently, the coverage reports include the unittests themselves, and this unfairly skews the reports, as the coverage for the tests is very high (since they all run). To fix this, we export the ganeti temp dir from run-in-temp-dir, and we use that to exclude the tests directory. The patch also fixes a bug related to multiple directories to be omitted (--omit a --omit b is wrong, it needs to be --omit a,b). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
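The `--omit` fix amounts to joining all excluded paths into one comma-separated argument. A minimal sketch, with a hypothetical helper name and example paths:

```python
def omit_args(paths):
    """Build a single --omit argument from several path prefixes."""
    if not paths:
        return []
    # One option with comma-separated values, not one option per path.
    return ["--omit", ",".join(paths)]


args = omit_args(["/usr", "/tmp/gnt-test/test"])
```

`args` is `["--omit", "/usr,/tmp/gnt-test/test"]`, which can be appended to the coverage report command line.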
-
- Nov 19, 2010
-
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
* devel-2.2:
  Update NEWS & configure.ac for the 2.2.2 release
  Fix documentation regarding conversion to drbd

Conflicts:
  NEWS (integrated 2.2 changes)
  configure.ac (kept our version)

Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This imports the 2.1.8 NEWS entry and adds the 2.2.2 one, then updates the configure.ac version. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
- Nov 18, 2010
-
-
Iustin Pop authored
Currently, reinstallation of a DRBD instance with the secondary node offline does:

  node1# gnt-instance reinstall -f instance1
  Waiting for job 139053 for instance1...
  Thu Nov 18 01:36:09 2010 - WARNING: Could not prepare block device disk/0 on node node3 (is_primary=False, pass=1): Node is marked offline
  Thu Nov 18 01:36:09 2010 - WARNING: Could not shutdown block device disk/0 on node node3: Node is marked offline
  Job 139053 for instance1 has failed: Failure: command execution error: Disk consistency error

Since this fails anyway, let's check the secondary nodes, thus preventing any modifications to the instance (e.g. OS type change):

  node1# gnt-instance reinstall -f instance1
  Waiting for job 139058 for instance1...
  Job 139058 for instance1 has failed: Failure: prerequisites not met for this operation: error type: wrong_state, error details: Instance secondary node offline, cannot reinstall: node3

The patch needs modifications to the _CheckNodeOnline function, in order to display meaningful messages ("Can't use offline node" would be very confusing for an instance reinstall, since we didn't select a node manually).

Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
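The shape of the check can be sketched as follows. This is not the Ganeti implementation: `PrereqError`, `check_node_online` and `check_reinstall_prereqs` are stand-ins invented for the example, and the real `_CheckNodeOnline` has a different signature. The point is that the prerequisite check runs before anything is modified and accepts a context-specific message.

```python
class PrereqError(Exception):
    """Stand-in for a 'prerequisites not met' error."""


def check_node_online(node, offline_nodes, msg=None):
    """Raise PrereqError if node is offline, using a caller-supplied message."""
    if node in offline_nodes:
        if msg is None:
            msg = "Can't use offline node"
        raise PrereqError("%s: %s" % (msg, node))


def check_reinstall_prereqs(secondary_nodes, offline_nodes):
    """Fail early, before any change is made to the instance."""
    for node in secondary_nodes:
        check_node_online(
            node, offline_nodes,
            msg="Instance secondary node offline, cannot reinstall")
```

Calling `check_reinstall_prereqs(["node3"], {"node3"})` raises an error whose text matches the message shown in the second transcript above.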
-
Iustin Pop authored
This would have prevented the bug fixed in the previous patch :( Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Iustin Pop authored
I was using the feedback_fn function incorrectly (it doesn't automatically expand the arguments). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
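A short illustration of the pitfall, using a stand-in callback rather than Ganeti's real one (the assumption here is that the callback takes a single, already formatted message): unlike the `logging` functions, such a callback does not apply `%`-formatting to extra arguments, so the caller has to format the string first.

```python
def make_feedback_collector():
    """Return a message list and a one-argument callback that fills it."""
    messages = []

    def feedback_fn(message):
        # No automatic "%" expansion: the message is stored as given.
        messages.append(message)

    return messages, feedback_fn


messages, feedback_fn = make_feedback_collector()
disk, node = "disk/0", "node3"
# Format before calling; feedback_fn("... %s on %s", disk, node) would fail.
feedback_fn("Could not prepare %s on %s" % (disk, node))
```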
-
- Nov 17, 2010
-
-
Iustin Pop authored
* devel-2.2:
  QA: add tests for gnt-cluster modify -B
  LUSetClusterParms: fix validation of beparams

Conflicts:
  lib/cmdlib.py (reverted & applied manually the change)

Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Guido Trotter <ultrotter@google.com>
-
Iustin Pop authored
Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Since the contents of the dict are validated via ForceDictType, we can simply require that it is a dict here. The previous check was wrong, as it was copied from the HV checks (which also don't verify the leaf dict type). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Nov 11, 2010
-
-
Iustin Pop authored
And fix an error message. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
David Knowles authored
Note: It appears this has been around since the initial checkin of TemporaryReservationManager. I have no idea what this could break, so someone else may want to test this more thoroughly. Signed-off-by:
David Knowles <dknowles@google.com> Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Nov 09, 2010
-
-
Michael Hanselmann authored
* devel-2.2:
  devel/release: Use release-specific Makefile targets
  Makefile: Add new dist target for releases
  Makefile: Stricter checks for release distchecks

Conflicts:
  Makefile.am: Trivial

Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
A new script, autotools/check-tar, is used to check the resulting .tar.gz file for unwanted contents like wrong file owners or permissions. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
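For illustration, a tarball check along these lines can be written with Python's `tarfile` module. This is not the autotools/check-tar script: the function name, the set of acceptable owners and the world-writable test are assumptions, since the commit message does not say which owners or permissions count as wrong.

```python
import tarfile


def find_bad_members(tar_path, allowed_owners=("root", "")):
    """Return (member name, problem) pairs for suspicious archive members.

    The criteria are illustrative: an unexpected owner name, or a mode
    that is writable by everyone.
    """
    problems = []
    with tarfile.open(tar_path, "r:*") as tar:
        for member in tar:
            if member.uname not in allowed_owners:
                problems.append((member.name, "owner %s" % member.uname))
            if member.mode & 0o002:
                problems.append((member.name, "world-writable"))
    return problems
```

A release target could call this on the generated .tar.gz and abort when the returned list is not empty.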
-
- Nov 08, 2010
-
-
Apollon Oikonomopoulos authored
man/ganeti-os-interface.sgml lacked complete information for the NIC-related environment variables. Added a reference to NIC_%N_LINK and NIC_%N_MODE and clarified the reference to NIC_%N_BRIDGE. Signed-off-by:
Apollon Oikonomopoulos <apollon@noc.grnet.gr> Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- Nov 04, 2010
-
-
Michael Hanselmann authored
Including empty files can cause unnecessary warnings for packagers. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Michael Hanselmann authored
After commit e7e23e73 the build would fail in distcheck on systems with Automake 1.10. An investigation identified Automake bug #533[1] as the cause. Applying the changes in Automake commit 3a12ed5e[2] to the generated Makefile.in file made distcheck work again.

The underlying problem is that in our case both doc/html and doc/html/.dir were included in the distributed files. When distcheck copied the former from the source to the staging directory, it was marked as read-only (distcheck makes the whole source read-only). It then tried to copy doc/html/.dir from the build directory, which failed. Automake 1.11 and newer avoid this problem by adjusting the permissions.

Since depending on Automake 1.11 or above is not an option at this time, a work-around was found by not using a “.dir” file in doc/html, but using “index.html” as a flag for creating the directory.

[1] http://sourceware.org/cgi-bin/gnatsweb.pl?cmd=view&database=automake&pr=533
[2] http://git.savannah.gnu.org/gitweb/?p=automake.git;a=commit;h=3a12ed5e97dc193a38dd14e031658cbd329b50ca

Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-