1. 26 Jan, 2011 7 commits
  2. 20 Jan, 2011 2 commits
  3. 07 Jan, 2011 2 commits
  4. 06 Jan, 2011 2 commits
  5. 29 Dec, 2010 1 commit
  6. 20 Dec, 2010 2 commits
  7. 15 Dec, 2010 2 commits
    • Fix gnt-cluster verify with diskless instances · 4f5c2533
      Adeodato Simo authored
      `gnt-cluster verify` was failing with KeyError if there was any
      diskless instance in the cluster. This was because _CollectDiskInfo()
      was not including these instances in the returned dictionary, but they
      were expected to be present in LUVerifyCluster.Exec().
      With this commit, we ensure that the dictionary returned by _CollectDiskInfo
      includes entries for diskless instances as well.
      Signed-off-by: Adeodato Simo <dato@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
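      A minimal sketch of the idea behind the fix, assuming a simplified data model. The names `collect_disk_info`, `instances` and `node_results` are hypothetical and do not reproduce Ganeti's actual `_CollectDiskInfo` signature. Every instance is given a key up front, so a later lookup for a diskless instance cannot raise KeyError.

```python
from typing import Dict, List, Tuple

# Hypothetical per-disk status: (success, payload)
Status = Tuple[bool, str]


def collect_disk_info(
    instances: Dict[str, Dict[str, List[str]]],
    node_results: Dict[str, Dict[str, List[Status]]],
) -> Dict[str, Dict[str, List[Status]]]:
    """Return {instance: {node: [status per disk]}} for every instance.

    instances maps name -> {"nodes": [...], "disks": [...]} (hypothetical).
    node_results maps node -> {instance: [status, ...]} as reported by nodes.
    """
    # Seed an entry for every instance, diskless ones included.
    info: Dict[str, Dict[str, List[Status]]] = {name: {} for name in instances}
    for name, inst in instances.items():
        if not inst["disks"]:
            continue  # diskless: keep the empty entry, nothing to query
        for node in inst["nodes"]:
            info[name][node] = node_results.get(node, {}).get(name, [])
    return info
```

      With this shape, a verify loop that does `info[name]` for every configured instance works unchanged for diskless instances, which simply map to an empty dict.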
    • jqueue: Keep jobs in “waitlock” while returning to queue · 5fd6b694
      Michael Hanselmann authored
      Iustin Pop reported that a job's file is updated many times while it
      waits for locks held by other thread(s). After an investigation it was
      concluded that the reason was a design decision for job priorities to
      return jobs to the “queued” status if they couldn't acquire all locks.
      Changing a job's status or priority requires an update to permanent
      storage. In a high-level view this is what happens:
      In a high-level view this is what happens:
      1. Mark as waitlock
      2. Write to disk as permanent storage (jobs left in this state by a
         crashing master daemon are resumed on restart)
      3. Wait for lock (assume lock is held by another thread)
      4. Mark as queued
      5. Write to disk again
      6. Return to workerpool
      Another option originally discussed was to leave the job in the
      “waitlock” status. Ignoring priority changes, this is what would happen:
      1. If not in waitlock
      1.1. Assert state == queued
      1.2. Mark as waitlock
      1.3. Set start_timestamp
      1.4. Write to disk as permanent storage
      3. Wait for locks (assume lock is held by another thread)
      4. Leave in waitlock
      5. Return to workerpool
      Now let's assume the lock is released by the other thread:
      3. Wait for locks and get them
      4. Assert state == waitlock
      5. Set state to running
      6. Set exec_timestamp
      7. Write to disk
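      The proposed flow can be modelled as a small state machine. This is a sketch under simplifying assumptions, not the jqueue code. The `Job` class, its `writes` counter and the `try_acquire` callback are hypothetical, and priority changes are ignored as in the description above.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

QUEUED, WAITLOCK, RUNNING = "queued", "waitlock", "running"


@dataclass
class Job:
    status: str = QUEUED
    start_timestamp: Optional[float] = None
    exec_timestamp: Optional[float] = None
    writes: int = 0  # number of writes to permanent storage

    def write_to_disk(self) -> None:
        # Stand-in for serializing the job file; here we only count writes.
        self.writes += 1


def attempt(job: Job, try_acquire: Callable[[], bool]) -> bool:
    """One lock acquire attempt; returns True once the job is running."""
    if job.status != WAITLOCK:
        assert job.status == QUEUED
        job.status = WAITLOCK
        job.start_timestamp = time.time()
        job.write_to_disk()
    if not try_acquire():
        # Lock held by another thread: stay in waitlock, no write needed.
        return False
    assert job.status == WAITLOCK
    job.status = RUNNING
    job.exec_timestamp = time.time()
    job.write_to_disk()
    return True
```

      Two failed attempts followed by a successful one leave the job running after exactly two writes. In the first flow above, the same sequence would write twice per attempt.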
      This change reduces the number of writes from two per lock acquire
      attempt to two per opcode plus one per priority increase (which happens
      after 24 acquire attempts, see mcpu._CalculateLockAttemptTimeouts, until
      the highest priority is reached), so here's the patch to implement it.
      Unittests are updated.
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
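      A back-of-the-envelope comparison of the write counts. This sketch rests on an assumption of ours: the priority is raised once after every 24 failed attempts, up to a hypothetical cap `max_increases`. The real schedule is defined by `mcpu._CalculateLockAttemptTimeouts`.

```python
def writes_old(attempts: int) -> int:
    # Old flow: each attempt writes once when marked waitlock and once more
    # when marked queued (or running, for the final successful attempt).
    return 2 * attempts


def writes_new(attempts: int, max_increases: int,
               attempts_per_increase: int = 24) -> int:
    # New flow: one write entering waitlock, one entering running, plus one
    # write per priority increase (assumed: after every
    # `attempts_per_increase` failed attempts, capped at `max_increases`).
    failed = attempts - 1
    increases = min(failed // attempts_per_increase, max_increases)
    return 2 + increases
```

      For a job that needs 50 attempts, the model gives `writes_old(50) == 100` against `writes_new(50, max_increases=5) == 4`.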
  8. 09 Dec, 2010 2 commits
    • Fix disk status verification in LUClusterVerify · d41d07d4
      Iustin Pop authored
      Commit b8d26c6e added disk status verification, but it has two
      (different) bugs for unhealthy nodes.
      For offline nodes, we don't add the disk status to the instance/node
      dict at all, so the instance is not present in the instdisk dict if all
      of its nodes are offline. This causes a KeyError later when we call
      _VerifyInstance with instdisk[instance].
      For online nodes that don't return a valid disk status, we simply set
      the status to None for each disk, but the code in _VerifyInstance
      presumes and requires that each status is a valid tuple of length two.
      For both these bugs, we redo the instdisk computations to always include
      valid data, and we enhance the asserts to check for consistency.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
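      A sketch of an instdisk computation that always yields valid data, under a simplified model. The function `build_instdisk`, its arguments and the placeholder statuses `OFFLINE` and `INVALID` are hypothetical, not the LUClusterVerify code. Offline nodes still get one entry per disk, malformed replies are replaced with 2-tuples, and the final asserts check consistency.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical per-disk status: (success, payload), always a 2-tuple.
Status = Tuple[bool, str]
OFFLINE: Status = (False, "node offline")
INVALID: Status = (False, "invalid disk status")


def _is_valid(status: object) -> bool:
    return isinstance(status, tuple) and len(status) == 2


def build_instdisk(
    instances: Dict[str, Tuple[List[str], int]],
    offline_nodes: Set[str],
    node_results: Dict[str, Optional[Dict[str, list]]],
) -> Dict[str, Dict[str, List[Status]]]:
    """instances maps name -> (nodes, number of disks); node_results maps
    node -> {instance: [status, ...]}, or None if the node call failed."""
    instdisk: Dict[str, Dict[str, List[Status]]] = {}
    for iname, (nodes, ndisks) in instances.items():
        per_node = instdisk.setdefault(iname, {})
        for node in nodes:
            if node in offline_nodes:
                # First bug: offline nodes must still produce an entry.
                per_node[node] = [OFFLINE] * ndisks
                continue
            raw = (node_results.get(node) or {}).get(iname) or []
            # Second bug: never store None; only 2-tuples are allowed.
            statuses = [s if _is_valid(s) else INVALID for s in raw]
            if len(statuses) != ndisks:
                statuses = [INVALID] * ndisks
            per_node[node] = statuses

    # Consistency checks: every node present, one valid status per disk.
    for iname, (nodes, ndisks) in instances.items():
        assert set(instdisk[iname]) == set(nodes)
        for statuses in instdisk[iname].values():
            assert len(statuses) == ndisks
            assert all(_is_valid(s) for s in statuses)
    return instdisk
```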
    • Fix rename for file-backed instances · 3721d2fe
      Guido Trotter authored
      Currently the code wrongly changes the disk logical/physical id
      component representing the path from "$storage_dir/$iname/disk$seq" to
      "$storage_dir/$iname/disk/$seq" (note the additional slash) breaking the
      Signed-off-by: Guido Trotter <ultrotter@google.com>
      Reviewed-by: René Nussbaumer <rn@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
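      A sketch of the path layout described above, using the message's placeholders. `disk_path` and `buggy_disk_path` are hypothetical helpers, not Ganeti functions, and `/srv/storage` is only an example directory. The bug amounts to an extra separator between `disk` and the sequence number. `posixpath` is used because these are Unix paths.

```python
import posixpath


def disk_path(storage_dir: str, iname: str, seq: int) -> str:
    # Intended layout: "$storage_dir/$iname/disk$seq" (no slash before $seq).
    return posixpath.join(storage_dir, iname, "disk%d" % seq)


def buggy_disk_path(storage_dir: str, iname: str, seq: int) -> str:
    # Broken layout produced on rename: "$storage_dir/$iname/disk/$seq".
    return posixpath.join(storage_dir, iname, "disk", str(seq))
```

      On rename, only the `$iname` component should change, for example `disk_path("/srv/storage", "new-name", 0)`.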
  9. 01 Dec, 2010 2 commits
  10. 30 Nov, 2010 1 commit
  11. 18 Nov, 2010 2 commits
    • Reinstall instance: disallow offline secondaries · 9aacb199
      Iustin Pop authored
      Currently, reinstallation of a DRBD instance with the secondary node offline does:
      node1# gnt-instance reinstall -f instance1
      Waiting for job 139053 for instance1...
      Thu Nov 18 01:36:09 2010  - WARNING: Could not prepare block device disk/0 on node node3 (is_primary=False, pass=1): Node is marked offline
      Thu Nov 18 01:36:09 2010  - WARNING: Could not shutdown block device disk/0 on node node3: Node is marked offline
      Job 139053 for instance1 has failed: Failure: command execution error:
      Disk consistency error
      Since this fails anyway, let's check the secondary nodes in the
      prerequisite checks, thus preventing any modifications to the instance
      (e.g. an OS type change):
      node1# gnt-instance reinstall -f instance1
      Waiting for job 139058 for instance1...
      Job 139058 for instance1 has failed: Failure: prerequisites not met for this operation:
      error type: wrong_state, error details:
      Instance secondary node offline, cannot reinstall: node3
      The patch needs modifications to the _CheckNodeOnline function in order
      to display meaningful messages ("Can't use offline node" would be very
      confusing for an instance reinstall, since we didn't select a node).
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: René Nussbaumer <rn@google.com>
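      A sketch of a node-online check that accepts a caller-specific message, modelled on the behaviour shown in the second transcript. `check_node_online`, `check_reinstall_prereqs` and the local `OpPrereqError` class are hypothetical stand-ins. They are not Ganeti's actual `_CheckNodeOnline` signature or error class.

```python
from typing import Iterable, Optional, Set

ECODE_STATE = "wrong_state"


class OpPrereqError(Exception):
    """Local stand-in: raised with (message, error type)."""


def check_node_online(node: str, offline_nodes: Set[str],
                      msg: Optional[str] = None) -> None:
    # The default message suits node selection; callers such as reinstall
    # can pass a message that makes sense in their own context.
    if msg is None:
        msg = "Can't use offline node"
    if node in offline_nodes:
        raise OpPrereqError("%s: %s" % (msg, node), ECODE_STATE)


def check_reinstall_prereqs(secondary_nodes: Iterable[str],
                            offline_nodes: Set[str]) -> None:
    # Runs before any change (e.g. an OS type change) touches the instance.
    for node in secondary_nodes:
        check_node_online(
            node, offline_nodes,
            msg="Instance secondary node offline, cannot reinstall")
```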
    • Fix breakage in OS state modify · e2334900
      Iustin Pop authored
      I was using the feedback_fn function incorrectly (it doesn't
      automatically expand the arguments).
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: René Nussbaumer <rn@google.com>
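      A minimal illustration of the pitfall. It assumes a feedback function that takes one already formatted message. The `feedback_fn` below is a hypothetical stand-in that records messages, and `example-os` is a made-up OS name. The real callback's signature is not confirmed by the message above.

```python
from typing import List

messages: List[str] = []


def feedback_fn(msg: str) -> None:
    # Takes a single, already formatted message; it does not apply
    # printf-style formatting to extra arguments.
    messages.append(msg)


os_name = "example-os"

# Wrong: passing the format arguments separately. With this one-argument
# stand-in the call raises TypeError and nothing is recorded.
try:
    feedback_fn("OS %s modified", os_name)
except TypeError:
    pass

# Right: expand the arguments before calling.
feedback_fn("OS %s modified" % os_name)
```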
  12. 17 Nov, 2010 1 commit
  13. 11 Nov, 2010 2 commits
  14. 03 Nov, 2010 2 commits
  15. 01 Nov, 2010 8 commits
  16. 29 Oct, 2010 2 commits