  1. Jan 26, 2011
    • Fix bug in “gnt-node list-storage” · 5ae7cd11
      Michael Hanselmann authored
      
      LVM PV storage units would always show as allocatable, even when they
      weren't. For some reason I have not been able to determine, the function
      parsing the attributes (“_GetAllocatable”) was not even called and the
      list opcode simply returned the attribute string as the value (e.g.
      “a-”).  Removing “@staticmethod” did the trick and then I just moved it
      to module level.
      
      A QA test is included.
      
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
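To illustrate the fix described in the commit above, here is a minimal sketch of a module-level helper that turns an LVM PV attribute string such as “a-” into a boolean. The name `parse_allocatable` and the handling of an empty string are assumptions made for this example and are not the actual Ganeti code. The sketch relies on the LVM convention that the first character of the PV attribute string is “a” when the physical volume is allocatable.

```python
def parse_allocatable(attr):
    """Return True if an LVM PV attribute string marks the PV as allocatable.

    Hypothetical helper, not Ganeti's implementation. In LVM's PV attribute
    field the first character is "a" for an allocatable physical volume and
    "-" otherwise.
    """
    if not attr:
        raise ValueError("empty PV attribute string")
    return attr[0] == "a"


if __name__ == "__main__":
    for value in ("a-", "--"):
        print(value, parse_allocatable(value))
```

Keeping the parser as a plain module-level function means any caller can invoke it directly, whether or not it is looked up through a class.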
  2. Jan 20, 2011
  3. Jan 14, 2011
  4. Jan 12, 2011
  5. Jan 07, 2011
  6. Jan 06, 2011
  7. Jan 05, 2011
  8. Dec 31, 2010
  9. Dec 29, 2010
  10. Dec 20, 2010
  11. Dec 17, 2010
  12. Dec 16, 2010
    • ensure-dirs: Speed up when using big queues · 196d70fa
      Michael Hanselmann authored
      
      The “ensure-dirs” script as included in Ganeti 2.3 is very slow when
      working with big queues requiring a change of permissions on many or all
      files.
      
      $ find /var/lib/ganeti/queue/ | wc -l
      52354
      
      Before this change:
      $ time /usr/local/lib/ganeti/ensure-dirs -f
      real    16m4.739s
      
      While not addressed in this patch, I'd like to record the overall
      inefficiency of the “ensure-dirs” script, even after this change:
      
      $ time /usr/local/lib/ganeti/ensure-dirs -f
      real    5m57.362s
      […]
      $ strace -e clone,execve -f -c /usr/local/lib/ganeti/ensure-dirs -f
      % time     seconds  usecs/call     calls    errors syscall
      ------ ----------- ----------- --------- --------- ----------------
       50.08    5.147090          49    104774           clone
       49.92    5.131094          49    104739           execve
      
      More changes will be needed. Just for comparison, a small Python
      snippet changing permissions on all files (“ensure-dirs” changes the
      owner too):
      
      $ time python -c 'import os; from ganeti import utils;
      [os.chmod(i, 0644) for i in
      utils.ListVisibleFiles("/var/lib/ganeti/queue/archive/big")]'
      real    0m0.605s
      […]
      
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
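The strace output in the commit above shows roughly two process spawns per file, which suggests that most of the time goes into starting external programs. As a rough sketch of the in-process alternative that the commit's Python one-liner hints at, the following walks a directory tree and only calls `os.chmod` on regular files whose mode differs from the target. The function name `ensure_file_modes` and the default mode are assumptions for this example, it is written for Python 3, and unlike “ensure-dirs” it does not change ownership.

```python
import os
import stat


def ensure_file_modes(root, file_mode=0o644):
    """Set file_mode on every regular file below root; return files changed.

    Hypothetical helper for illustration only. Everything happens in this
    process, so no child process is started per file.
    """
    changed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks and special files
            if stat.S_IMODE(st.st_mode) != file_mode:
                os.chmod(path, file_mode)
                changed += 1
    return changed
```

Checking the current mode first avoids a write for files that are already correct, so a second run over an unchanged queue issues no `chmod` calls.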
  13. Dec 15, 2010
    • Fix gnt-cluster verify with diskless instances · 4f5c2533
      Adeodato Simo authored
      
      `gnt-cluster verify` was failing with KeyError if there was any
      diskless instance in the cluster. This was because _CollectDiskInfo()
      was not including these instances in the returned dictionary, but they
      were expected to be present in LUVerifyCluster.Exec().
      
      With this commit, we ensure that the dictionary returned by _CollectDiskInfo
      includes entries for diskless instances as well.
      
      Signed-off-by: Adeodato Simo <dato@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
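The pattern behind the fix above can be sketched as follows: seed the result dictionary with every known instance before filling in disk data, so that an instance without disks still gets an (empty) entry and later lookups do not raise KeyError. The function name, argument shapes and status values here are assumptions for illustration; they do not mirror the real `_CollectDiskInfo` signature.

```python
def collect_disk_info(instance_disks, node_reports):
    """Build {instance: {node: [(disk_index, status), ...]}}.

    Hypothetical data shapes:
      instance_disks: {instance_name: [disk_index, ...]} (empty if diskless)
      node_reports:   {node_name: {(instance_name, disk_index): status}}
    """
    # Seed with every instance so diskless ones are present with no disks.
    info = {name: {} for name in instance_disks}
    for node, report in node_reports.items():
        for (instance, disk_index), status in report.items():
            per_node = info.setdefault(instance, {})
            per_node.setdefault(node, []).append((disk_index, status))
    return info


if __name__ == "__main__":
    result = collect_disk_info(
        {"web1": [0], "diskless1": []},
        {"node1": {("web1", 0): "ok"}},
    )
    print(result["diskless1"])  # lookup succeeds even without disks
```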
    • jqueue: Keep jobs in “waitlock” while returning to queue · 5fd6b694
      Michael Hanselmann authored
      
      Iustin Pop reported that a job's file is updated many times while it
      waits for locks held by other thread(s). After an investigation it was
      concluded that the reason was a design decision for job priorities to
      return jobs to the “queued” status if they couldn't acquire all locks.
      Changing a job's status or priority requires an update to permanent
      storage.
      
      In a high-level view this is what happens:
      1. Mark as waitlock
      2. Write to disk as permanent storage (jobs left in this state by a
         crashing master daemon are resumed on restart)
      3. Wait for lock (assume lock is held by another thread)
      4. Mark as queued
      5. Write to disk again
      6. Return to workerpool
      
      Another option originally discussed was to leave the job in the
      “waitlock” status. Ignoring priority changes, this is what would happen:
      1. If not in waitlock
      1.1. Assert state == queued
      1.2. Mark as waitlock
      1.3. Set start_timestamp
      1.4. Write to disk as permanent storage
      3. Wait for locks (assume lock is held by another thread)
      4. Leave in waitlock
      5. Return to workerpool
      
      Now let's assume the lock is released by the other thread:
      […]
      3. Wait for locks and get them
      4. Assert state == waitlock
      5. Set state to running
      6. Set exec_timestamp
      7. Write to disk
      
      This patch implements the second option. It reduces the number of
      writes from two per lock acquire attempt to two per opcode, plus one
      per priority increase (which happens after 24 acquire attempts, see
      mcpu._CalculateLockAttemptTimeouts, until the highest priority is
      reached). Unittests are updated.
      
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
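The two flows described in the commit above can be modelled with a small toy state machine that counts writes to permanent storage. This is a sketch under stated assumptions, not Ganeti's jqueue code: the `Job` class, `attempt_old`, `attempt_new` and `count_writes` are hypothetical names, priority changes are ignored, and the successful path of the old flow (mark as waitlock, write, mark as running, write) is an assumption, since the commit message only spells out its failing path.

```python
import time

QUEUED = "queued"
WAITLOCK = "waitlock"
RUNNING = "running"


class Job:
    """Toy job that counts how often it is written to permanent storage."""

    def __init__(self):
        self.status = QUEUED
        self.start_timestamp = None
        self.exec_timestamp = None
        self.writes = 0

    def write_to_disk(self):
        # Stand-in for serializing the job file; it only counts calls.
        self.writes += 1


def attempt_old(job, lock_available):
    """Old flow: go back to "queued" whenever the lock is not acquired."""
    job.status = WAITLOCK
    job.write_to_disk()
    if not lock_available:
        job.status = QUEUED
        job.write_to_disk()
        return False
    job.status = RUNNING  # assumed success path, see lead-in
    job.write_to_disk()
    return True


def attempt_new(job, lock_available):
    """New flow: stay in "waitlock" between attempts."""
    if job.status != WAITLOCK:
        assert job.status == QUEUED
        job.status = WAITLOCK
        job.start_timestamp = time.time()
        job.write_to_disk()
    if not lock_available:
        return False  # left in waitlock, nothing written
    assert job.status == WAITLOCK
    job.status = RUNNING
    job.exec_timestamp = time.time()
    job.write_to_disk()
    return True


def count_writes(attempt, failed_attempts):
    """Run failed_attempts unsuccessful tries, then one successful one."""
    job = Job()
    for _ in range(failed_attempts):
        attempt(job, lock_available=False)
    attempt(job, lock_available=True)
    return job.writes
```

In this model, five failed attempts followed by a successful one cost 12 writes with `attempt_old` and 2 writes with `attempt_new`, which matches the "two per opcode" figure when no priority increase occurs.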
    • Improve jqueue unittests · ebb2a2a3
      Michael Hanselmann authored
      
      - Verify job file updates
      - Ensure queue lock is released while executing opcode
      
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
  14. Dec 14, 2010
  15. Dec 09, 2010
  16. Dec 02, 2010
  17. Dec 01, 2010
  18. Nov 30, 2010