1. 12 Mar, 2010 3 commits
  2. 11 Mar, 2010 2 commits
  3. 09 Mar, 2010 4 commits
      Rework the node modify for mc-demotion · 601908d0
      Iustin Pop authored
      
      
      The current code in LUSetNodeParms regarding demotion from the master
      candidate role is complicated and duplicates the code in ConfigWriter,
      where such decisions should be made. Furthermore, we still cannot demote
      nodes (not even with force) if other regular nodes exist.
      
      This patch adds a new opcode attribute ‘auto_promote’, and changes the
      decision tree as follows:
      
      - if the node will be set to offline or drained or explicitly demoted
        from master candidate, and this parameter is set, then we lock all
        nodes in ExpandNames()
      - later, in CheckPrereq(), if the node is
        indeed a master candidate, and the future state (as computed via
        GetMasterCandidateStats with the current node in the exception list)
        has fewer nodes than it should, and we didn't lock all nodes, we exit
        with an exception
      - in Exec, if we locked all nodes, we do an AdjustCandidatePool() run, to
        ensure nodes are promoted as needed (we do it before updating the node,
        to avoid a warning and so that, if the LU fails between these two
        steps, we're not left in an inconsistent state)
      
      Note that in Exec we run the AdjustCP irrespective of any node state
      change (just based on lock status), so we might simplify CheckPrereq
      even further by not checking the future state and simply requiring
      auto_promote/lock_all for master candidates, since the case where we
      have more master candidates than needed is rarer; OTOH, that would
      prevent manually promoting another node ahead of time, which is why I
      didn't choose this way.
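
      As a rough, standalone illustration of the decision tree above (the
      function and variable names here are ours, not the actual LUSetNodeParms
      code):

        # Sketch only, assuming the flow described in this commit message.
        def should_lock_all_nodes(demoting_or_offlining, auto_promote):
            """ExpandNames step: lock all nodes only if auto_promote is set."""
            return demoting_or_offlining and auto_promote

        def check_demotion(node_is_mc, future_mc_count, needed_mc_count,
                           locked_all_nodes):
            """CheckPrereq step: refuse the demotion if it would leave too few
            master candidates and we did not lock all nodes for promotion."""
            if (node_is_mc and future_mc_count < needed_mc_count
                    and not locked_all_nodes):
                raise RuntimeError("would leave too few master candidates; "
                                   "use auto_promote to promote another node")

        # Demoting the only master candidate without auto_promote is rejected.
        locked = should_lock_all_nodes(True, auto_promote=False)
        try:
            check_demotion(True, future_mc_count=0, needed_mc_count=1,
                           locked_all_nodes=locked)
        except RuntimeError as err:
            print(err)
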
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
      Fix typo that makes cluster verify ignore hooks · 6d7b472a
      Iustin Pop authored
      
      
      The return from LUVerifyCluster should be True (or equivalent) for pass,
      and False (or equivalent) for fail. The HooksCallBack function uses '1'
      (= True) when a hook fails, which is exactly the opposite of what we
      want - it makes failed hooks reset the result to success, overriding
      actual failures in cluster verify.
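
      A minimal sketch of the inverted boolean described above (not the real
      HooksCallBack code; the helper name is illustrative):

        # Convention from the commit message: truthy = pass, falsy = fail.
        def combine_with_hooks(verify_result, hook_failed):
            """Fold the hook outcome into the overall cluster-verify result."""
            if hook_failed:
                return False      # the buggy version returned 1 (True) here
            return verify_result

        # A failed hook must not turn an earlier failure into success.
        assert combine_with_hooks(verify_result=False, hook_failed=True) is False
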
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: René Nussbaumer <rn@google.com>
      Fix redistribute config and offline nodes · 6819dc49
      Iustin Pop authored
      
      
      We need to manually filter out offline nodes before using
      rpc.call_upload_file and rpc.call_write_ssconf_files, since these methods
      are static (they work without a ConfigWriter instance) and thus do not
      know which nodes are offline and which are not.
      
      Note that we add a new ConfigWriter._UnlockedGetOnlineNodeList() method
      rather than hardcoding the filtering of online nodes in _WriteConfig.
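
      A small sketch of the filtering; the Node tuple and helper name are
      illustrative stand-ins, not the real Ganeti objects:

        from collections import namedtuple

        Node = namedtuple("Node", ["name", "offline"])

        def online_node_names(nodes):
            """Return names of nodes not marked offline, suitable as the
            target list for the static RPC calls mentioned above."""
            return [node.name for node in nodes if not node.offline]

        nodes = [Node("node1", False), Node("node2", True), Node("node3", False)]
        print(online_node_names(nodes))   # ['node1', 'node3']

        # Usage sketch (call names taken from the commit message):
        #   rpc.call_upload_file(online_node_names(nodes), file_name)
        #   rpc.call_write_ssconf_files(online_node_names(nodes), values)
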
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
      Add support for per-os-hypervisor parameters · 17463d22
      René Nussbaumer authored
      
      
      This patch implements all modifications to support per-os-hypervisor
      parameters in the framework.
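
      As an illustration only, one plausible shape for such parameters is a
      nested mapping from OS to hypervisor to overrides, merged on top of the
      cluster-wide hypervisor parameters (this layout is an assumption, not
      taken from the patch):

        cluster_hvparams = {"xen-pvm": {"kernel_path": "/boot/vmlinuz-2.6-xenU"}}
        os_hvp = {"debian": {"xen-pvm": {"kernel_path": "/boot/vmlinuz-custom"}}}

        def effective_hvparams(os_name, hypervisor):
            """Merge cluster-level hypervisor parameters with per-OS overrides."""
            params = dict(cluster_hvparams.get(hypervisor, {}))
            params.update(os_hvp.get(os_name, {}).get(hypervisor, {}))
            return params

        print(effective_hvparams("debian", "xen-pvm"))
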
      Signed-off-by: René Nussbaumer <rn@google.com>
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
  4. 08 Mar, 2010 3 commits
      Validate the hostnames at creation time · 44caf5a8
      Iustin Pop authored
      
      
      This patch adds validation of the new names used at cluster init time,
      node add time, and instance creation.
      
      For instances, especially when using «--no-name-check» (which skips DNS
      checks), we should validate the given name, and also normalize it
      (otherwise, we could have two instances named inst1 and Inst1).
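
      A toy version of the validate-and-normalize idea (the regular expression
      and helper name are illustrative, not the actual Ganeti checks):

        import re

        _NAME_RE = re.compile(r"^[a-z0-9]([a-z0-9.-]*[a-z0-9])?$")

        def normalize_and_check_name(name):
            """Lower-case the name and reject clearly invalid host names."""
            normalized = name.lower()
            if not _NAME_RE.match(normalized):
                raise ValueError("invalid name: %r" % name)
            return normalized

        # 'Inst1' and 'inst1' now map to the same name, avoiding duplicates.
        assert normalize_and_check_name("Inst1") == normalize_and_check_name("inst1")
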
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
      Implement disabling of file-based storage · cb7c0198
      Iustin Pop authored
      
      
      Rationale: the file-based storage backend can add/remove files under a
      certain directory. However, the master node also controls the setting of
      the file-based root directory, which means we cannot prevent arbitrary
      modifications of the node's filesystem by the master.
      
      To mitigate this for setups where file-based storage is not used, we
      introduce a new ./configure-time setting that enables or disables
      file-based storage. Since this setting is not modifiable by the master
      (over RPC), it is now possible in this case to prevent unintended
      modifications of the node's filesystem by the master.
      
      The new setting is used in bdev.py to not expose the file-based storage
      at all, and in cmdlib.py to prevent attempts at creation of such
      instances.
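
      A sketch of the pattern, assuming a build-time boolean constant (the
      constant and function names here are illustrative):

        # Substituted at ./configure time in the real setup; hardcoded here.
        ENABLE_FILE_STORAGE = False

        def check_disk_template(disk_template):
            """Refuse file-based instances when file storage is disabled."""
            if disk_template == "file" and not ENABLE_FILE_STORAGE:
                raise ValueError("file-based storage is disabled at build time")

        check_disk_template("drbd")      # accepted
        # check_disk_template("file")    # would raise ValueError
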
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
      Switch from os.path.join to utils.PathJoin · c4feafe8
      Iustin Pop authored
      
      
      This passes a full burnin with lots of instances, and should be safe, as
      we mostly join a known root (various constants) to a run-time variable.
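
      A minimal sketch of a stricter join, assuming semantics along the lines
      the commit describes (a known root plus run-time components); this is
      not the actual utils.PathJoin implementation:

        import os.path

        def safe_path_join(root, *components):
            """Join components onto a known root, refusing escapes from it."""
            for comp in components:
                if os.path.isabs(comp) or ".." in comp.split("/"):
                    raise ValueError("unsafe path component: %r" % comp)
            return os.path.join(root, *components)

        print(safe_path_join("/var/lib/ganeti", "instance1.conf"))
        # safe_path_join("/var/lib/ganeti", "../etc/passwd") would raise
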
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
  5. 26 Feb, 2010 1 commit
  6. 25 Feb, 2010 1 commit
  7. 22 Feb, 2010 6 commits
  8. 17 Feb, 2010 3 commits
  9. 15 Feb, 2010 2 commits
      Release all node locks during disk replace · d5cd389c
      Iustin Pop authored
      This patch extends commit 7ea7bcf6 by releasing all node locks in disk
      replace for the early release mode. The rationale behind this is:
      
      - LUCreateInstance already releases all node locks while waiting for
        disk synchronization, and does an instance startup later
      - WaitForSync only runs (for disk template 'drbd') 'lvs' and reads
        /proc/drbd on the primary node, which should be (modulo bugs in LVM)
        safe to run in parallel
      
      In any case, the worst I could foresee is a node having N lvs commands
      run in parallel on it, while being a primary for disk storage. Based on
      create instance doing this safely, and the fact that burnin with more
      than two instances per node is safe, I think this can be applied.
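
      A self-contained toy of the ordering described above (the helpers are
      stubs, not the real cmdlib code path):

        def wait_for_sync(instance):
            # In the real LU this only runs 'lvs' and reads /proc/drbd on the
            # primary node, which is why it is safe without the node locks.
            print("waiting for %s to sync" % instance)

        def replace_disks_early_release(instance, held_node_locks):
            """Release all node locks first, then do the (long) sync wait."""
            held_node_locks.clear()
            wait_for_sync(instance)

        replace_disks_early_release("instance1", {"nodeA", "nodeB"})
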
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
      Auto-enable early release for offline old nodes · 9af0fa6a
      Iustin Pop authored
      
      
      In case the old node is offline, we won't be able to talk to it to
      remove the storage, and in most cases the node is powered
      off/unreachable.
      
      In this case, it makes no sense to delay the storage release, so we
      automatically enable early_release mode, gaining parallelism during node
      evacuation.
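
      The rule itself is tiny; a hedged sketch (names are illustrative):

        def effective_early_release(requested_early_release, old_node_offline):
            """Force early_release when the old node cannot be contacted."""
            return requested_early_release or old_node_offline

        assert effective_early_release(False, old_node_offline=True) is True
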
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
  10. 11 Feb, 2010 4 commits
  11. 09 Feb, 2010 1 commit
      Add an early release lock/storage for disk replace · 7ea7bcf6
      Iustin Pop authored
      
      
      This patch adds an early_release parameter in the OpReplaceDisks and
      OpEvacuateNode opcodes, allowing earlier release of storage and more
      importantly of internal Ganeti locks.
      
      The behaviour of the early release is that any locks and storage on all
      secondary nodes are released early. This is valid for change secondary
      (where we remove the storage on the old secondary, and release the locks
      on the old and new secondary) and replace on secondary (where we remove
      the old storage and release the lock on the secondary node).
      
      Using this, on a three-node setup:
      
      - instance1 on nodes A:B
      - instance2 on nodes C:B
      
      It is possible to run replace-disks -s (on secondary) for instances 1
      and 2 in parallel.
      
      Replace on primary will remove the storage, but not the locks, as we use
      the primary node later in the LU to check consistency.
      
      It is debatable whether to also remove the locks on the primary node,
      and thus make replace-disks keep zero locks during the sync. While this
      would allow greatly enhanced parallelism, let's first see how removal of
      secondary locks works.
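
      A self-contained toy of the three-node example above; this is not the
      Ganeti lock manager, just an illustration of why releasing the shared
      secondary's lock early helps:

        node_locks = {"A": None, "B": None, "C": None}

        def acquire(job, nodes):
            """Take the given node locks, or fail if any is held by another job."""
            if any(node_locks[n] not in (None, job) for n in nodes):
                return False
            for n in nodes:
                node_locks[n] = job
            return True

        def early_release(job, node):
            """Drop a node lock as soon as its old storage has been removed."""
            if node_locks[node] == job:
                node_locks[node] = None

        assert acquire("replace instance1", ["A", "B"])
        assert not acquire("replace instance2", ["C", "B"])   # blocked on B
        early_release("replace instance1", "B")               # early release
        assert acquire("replace instance2", ["C", "B"])       # can proceed now
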
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
  12. 08 Feb, 2010 1 commit
  13. 03 Feb, 2010 3 commits
  14. 28 Jan, 2010 1 commit
  15. 25 Jan, 2010 2 commits
  16. 11 Jan, 2010 1 commit
  17. 05 Jan, 2010 1 commit
  18. 04 Jan, 2010 1 commit