- Jan 06, 2011
Michael Hanselmann authored
This will allow distributions to install the file as text documentation.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Jan 05, 2011
Michael Hanselmann authored
This patch formats the upgrade notes currently in the wiki [1] as reST and adds them to the documentation.
[1] http://code.google.com/p/ganeti/wiki/UpgradeNotes
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Dec 31, 2010
Michael Hanselmann authored
s/os-name/os-type/. This was reported in issue 133.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Dec 29, 2010
Michael Hanselmann authored
Since the recent change to leave jobs in the “waitlock” status (commit 5fd6b694), cancelling a job while it's back in the queue would break. This patch handles these cases and adds a unittest.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
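
As an illustration of the states involved, a toy Python sketch (made-up names, not ganeti.jqueue) of what a cancel request has to handle under the new “waitlock” behaviour:

class ToyJob(object):
    # Toy stand-in for a queued job (not ganeti.jqueue), used only to show
    # the states a cancel request has to deal with.
    def __init__(self, status="queued"):
        self.status = status

def cancel_job(job):
    if job.status == "queued":
        job.status = "canceled"    # never started, can be dropped right away
        return True
    if job.status == "waitlock":
        job.status = "canceling"   # noticed by the executing thread later
        return True
    return False                   # already running or finished: too late

assert cancel_job(ToyJob("waitlock"))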
-
- Dec 20, 2010
Michael Hanselmann authored
Point out that jobs already submitted continue to run.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
If the socket can't be read in time, it raises “socket.timeout”, for which there is special handling code. Unfortunately the exception handlers were in the wrong order, so “socket.error” caught it first.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
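
A minimal Python sketch of the ordering issue (hypothetical helper, not the actual Ganeti code): since socket.timeout is a subclass of socket.error, the more specific handler has to be listed first.

import socket

def recv_or_none(sock, bufsize=4096):
    # Hypothetical helper showing why handler order matters.
    try:
        return sock.recv(bufsize)
    except socket.timeout:
        return None                      # timeout gets its own handling
    except socket.error as err:          # would also match socket.timeout
        raise RuntimeError("socket failure: %s" % err)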
-
Michael Hanselmann authored
* stable-2.3:
  Prepare 2.3.1 release
  Fix disk status verification in LUClusterVerify
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Dec 17, 2010
Michael Hanselmann authored
“gnt-cluster verify” looks at some per-instance information as well, so it should be run in the QA tests for each instance type.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Dec 16, 2010
Michael Hanselmann authored
The “ensure-dirs” script as included in Ganeti 2.3 is very slow when working with big queues requiring a change of permissions on many or all files.

  $ find /var/lib/ganeti/queue/ | wc -l
  52354

Before this change:

  $ time /usr/local/lib/ganeti/ensure-dirs -f
  real    16m4.739s

While not addressed in this patch, I'd like to record the overall inefficiency of the “ensure-dirs” script, even after this change:

  $ time /usr/local/lib/ganeti/ensure-dirs -f
  real    5m57.362s
  […]
  $ strace -e clone,execve -f -c /usr/local/lib/ganeti/ensure-dirs -f
  % time     seconds  usecs/call     calls    errors syscall
  ------ ----------- ----------- --------- --------- ----------------
   50.08    5.147090          49    104774           clone
   49.92    5.131094          49    104739           execve

More changes will be needed. Just for comparison, a small Python snippet changing permissions on all files (“ensure-dirs” changes the owner too):

  $ time python -c 'import os; from ganeti import utils; [os.chmod(i, 0644) for i in utils.ListVisibleFiles("/var/lib/ganeti/queue/archive/big")]'
  real    0m0.605s
  […]

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
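
For illustration, a rough sketch of the in-process approach (made-up helper and parameters, not the real ensure-dirs code): doing the chown/chmod from Python avoids forking one external process per file, which is where the clone/execve counts in the strace output above come from.

import os
import stat

def ensure_queue_perms(queue_dir, uid, gid, file_mode=0o600):
    # Illustrative only: change owner and mode of every regular file in
    # queue_dir using syscalls instead of spawning chown/chmod per file.
    for name in os.listdir(queue_dir):
        path = os.path.join(queue_dir, name)
        if not stat.S_ISREG(os.lstat(path).st_mode):
            continue
        os.chown(path, uid, gid)
        os.chmod(path, file_mode)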
-
- Dec 15, 2010
Adeodato Simo authored
`gnt-cluster verify` was failing with KeyError if there was any diskless instance in the cluster. This was because _CollectDiskInfo() was not including these instances in the returned dictionary, but they were expected to be present in LUVerifyCluster.Exec(). With this commit, we ensure that the dictionary returned by _CollectDiskInfo includes entries for diskless instances as well.
Signed-off-by: Adeodato Simo <dato@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Iustin Pop reported that a job's file is updated many times while it waits for locks held by other thread(s). After an investigation it was concluded that the reason was a design decision for job priorities to return jobs to the “queued” status if they couldn't acquire all locks. Changing a job's status or priority requires an update to permanent storage. In a high-level view this is what happens:

1. Mark as waitlock
2. Write to disk as permanent storage (jobs left in this state by a crashing master daemon are resumed on restart)
3. Wait for lock (assume lock is held by another thread)
4. Mark as queued
5. Write to disk again
6. Return to workerpool

Another option originally discussed was to leave the job in the “waitlock” status. Ignoring priority changes, this is what would happen:

1. If not in waitlock
1.1. Assert state == queued
1.2. Mark as waitlock
1.3. Set start_timestamp
1.4. Write to disk as permanent storage
3. Wait for locks (assume lock is held by another thread)
4. Leave in waitlock
5. Return to workerpool

Now let's assume the lock is released by the other thread:

[…]
3. Wait for locks and get them
4. Assert state == waitlock
5. Set state to running
6. Set exec_timestamp
7. Write to disk

As this change reduces the number of writes from two per lock acquire attempt to two per opcode and one per priority increase (as happens after 24 acquire attempts (see mcpu._CalculateLockAttemptTimeouts) until the highest priority is reached), here's the patch to implement it. Unittests are updated.

Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
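
A toy Python sketch of the write pattern described above (hypothetical names, not ganeti.jqueue); it only illustrates where the two disk writes per opcode happen while waiting leaves the status untouched.

import time

class SketchJob(object):
    """Toy job object for the sketch, not ganeti.jqueue."""
    def __init__(self):
        self.status = "queued"
        self.start_timestamp = None
        self.exec_timestamp = None
        self.disk_writes = 0

    def write_to_disk(self):
        self.disk_writes += 1  # stands in for serializing the job file

def start_opcode(job, acquire_locks):
    # First attempt: move queued -> waitlock and record it once on disk.
    if job.status != "waitlock":
        assert job.status == "queued"
        job.status = "waitlock"
        job.start_timestamp = time.time()
        job.write_to_disk()              # write #1: entering waitlock
    # Lock held elsewhere: stay in waitlock, no extra write, retry later.
    if not acquire_locks():
        return False
    assert job.status == "waitlock"
    job.status = "running"
    job.exec_timestamp = time.time()
    job.write_to_disk()                  # write #2: locks acquired, executing
    return True

job = SketchJob()
start_opcode(job, lambda: False)   # lock busy: only the first write happens
start_opcode(job, lambda: True)    # lock free: the second write happens
assert job.disk_writes == 2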
-
Michael Hanselmann authored
- Verify job file updates
- Ensure queue lock is released while executing opcode
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Dec 14, 2010
Miguel Di Ciurcio Filho authored
Signed-off-by: Miguel Di Ciurcio Filho <miguel.filho@gmail.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
- Dec 09, 2010
Iustin Pop authored
Commit b8d26c6e added disk status verification, but it has two (different) bugs for unhealthy nodes. For offline nodes, we don't add the disk status to the instance/node dict at all, with the result that the instance is not present in the instdisk dict if all of its nodes are offline. This creates a KeyError later when we call VerifyInstance with instdisk[instance]. For online nodes which don't return a valid disk status, we simply set the status to None for each disk, but the code in _VerifyInstance presumes and requires that each status is a valid tuple of length two. For both these bugs, we redo the instdisk computations to always include valid data, and we enhance the asserts to check for consistency.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
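
A rough sketch of the idea behind the fix (invented names, not the LUClusterVerify internals): every (instance, node) pair gets an entry, even for offline nodes or nodes returning no usable data, so later lookups never raise KeyError and each per-disk status keeps the two-element (success, payload) shape.

def collect_disk_info(instance_nodes, instance_disks, node_results):
    # instance_nodes: inst -> list of node names
    # instance_disks: inst -> number of disks
    # node_results:   node -> {inst: [(success, payload), ...]}
    instdisk = {}
    for inst, nodes in instance_nodes.items():
        instdisk[inst] = {}
        for node in nodes:
            status = node_results.get(node, {}).get(inst)
            if not status:
                # Offline node or invalid answer: substitute valid dummy data
                status = [(False, None)] * instance_disks[inst]
            instdisk[inst][node] = status
    return instdisk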
-
Guido Trotter authored
* devel-2.2:
  Fix rename for file-backed instances
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Guido Trotter authored
* stable-2.2:
  Fix rename for file-backed instances
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Guido Trotter authored
* stable-2.2:
  Fix rename for file-backed instances
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Guido Trotter authored
Currently the code wrongly changes the disk logical/physical id component representing the path from "$storage_dir/$iname/disk$seq" to "$storage_dir/$iname/disk/$seq" (note the additional slash), breaking the rename.
Signed-off-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
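
For illustration, a small helper showing the intended layout (hypothetical function, not the cmdlib code): the per-disk file is named "disk<N>" inside the instance directory, with no slash between "disk" and the index.

import os

def file_disk_path(storage_dir, instance_name, disk_index):
    # Correct: the index is appended to "disk", not joined as a subdirectory.
    return os.path.join(storage_dir, instance_name, "disk%d" % disk_index)

# file_disk_path("/srv/ganeti/file-storage", "inst1.example.com", 0)
#   -> "/srv/ganeti/file-storage/inst1.example.com/disk0"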
-
- Dec 02, 2010
Michael Hanselmann authored
* stable-2.3:
  Bump version for 2.3.1~rc1 release
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Dec 01, 2010
Michael Hanselmann authored
Just being told that a lock doesn't exist can be confusing. One case where this happens is when a job (e.g. instance modify) waits for a job removing the instance (e.g. export with remove).
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
This uses an option only available in patched socat versions. More information is available from the INSTALL update included in this patch.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Guido Trotter <ultrotter@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
* stable-2.3:
  Bump version for 2.3.0
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Nov 30, 2010
Michael Hanselmann authored
* devel-2.2:
  Correct version check for release candidates
  Fix version check
  Add script to check version format
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
The tilde needs to be escaped, and I forgot the space which should be used instead.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
Adeodato Simo authored
Signed-off-by: Adeodato Simo <dato@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
- Nov 25, 2010
Michael Hanselmann authored
Don't ask … all I say is distcheck.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
-
- Nov 24, 2010
Michael Hanselmann authored
Only versions of the format “x.y.z” and “x.y.z~(rc|beta)N” (for N>0) are allowed.
Signed-off-by: Michael Hanselmann <hansmi@google.com>
Reviewed-by: Iustin Pop <iustin@google.com>
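
A sketch of such a check as a Python regular expression, encoding only the rule stated above (the actual script may differ):

import re

_VERSION_RE = re.compile(r"^\d+\.\d+\.\d+(~(rc|beta)[1-9]\d*)?$")

def version_ok(version):
    return _VERSION_RE.match(version) is not None

assert version_ok("2.3.1") and version_ok("2.3.1~rc1")
assert not version_ok("2.3.1~rc0")   # N must be greater than zero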
-
Iustin Pop authored
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Currently, the coverage reports include the unittests themselves, and this unfairly skews the reports, as the coverage for the tests is very high (since they all run). To fix this, we export the ganeti temp dir from run-in-temp-dir, and we use that to exclude the tests directory. The patch also fixes a bug related to omitting multiple directories (--omit a --omit b is wrong, it needs to be --omit a,b).
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
- Nov 19, 2010
Iustin Pop authored
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
* devel-2.2:
  Update NEWS & configure.ac for the 2.2.2 release
  Fix documentation regarding conversion to drbd

Conflicts:
  NEWS (integrated 2.2 changes)
  configure.ac (kept our version)

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This imports the 2.1.8 NEWS entry and adds the 2.2.2 one, then updates the configure.ac version.
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
-
Iustin Pop authored
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
-
Iustin Pop authored
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
-
- Nov 18, 2010
Iustin Pop authored
Currently, reinstallation of a DRBD instance with the secondary node offline does:

  node1# gnt-instance reinstall -f instance1
  Waiting for job 139053 for instance1...
  Thu Nov 18 01:36:09 2010 - WARNING: Could not prepare block device disk/0 on node node3 (is_primary=False, pass=1): Node is marked offline
  Thu Nov 18 01:36:09 2010 - WARNING: Could not shutdown block device disk/0 on node node3: Node is marked offline
  Job 139053 for instance1 has failed: Failure: command execution error: Disk consistency error

Since this fails anyway, let's check the secondary nodes, thus preventing any modifications to the instance (e.g. OS type change):

  node1# gnt-instance reinstall -f instance1
  Waiting for job 139058 for instance1...
  Job 139058 for instance1 has failed: Failure: prerequisites not met for this operation: error type: wrong_state, error details: Instance secondary node offline, cannot reinstall: node3

The patch needs modifications to the _CheckNodeOnline function, in order to display meaningful messages ("Can't use offline node" would be very confusing for an instance reinstall, since we didn't select a node manually).

Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
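
A rough sketch of the kind of prerequisite check described (made-up names, not the _CheckNodeOnline change itself): fail early with a message that says why the node cannot be used, instead of the generic "Can't use offline node".

class OpPrereqError(Exception):
    """Stand-in for the prerequisite error raised by logical units."""

def check_secondary_nodes_online(instance_name, secondary_nodes, offline_nodes):
    # Abort before any instance modification if a secondary node is offline.
    for node in secondary_nodes:
        if node in offline_nodes:
            raise OpPrereqError("Instance %s secondary node offline,"
                                " cannot reinstall: %s" % (instance_name, node))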
-
Iustin Pop authored
This would have prevented the bug fixed in the previous patch :(
Signed-off-by: Iustin Pop <iustin@google.com>
Reviewed-by: René Nussbaumer <rn@google.com>
-