1. 15 Apr, 2010 1 commit
    • Fix cluster behaviour with disabled file storage · 0e3baaf3
      Iustin Pop authored
      
      
      There are a few issues with disabled file storage:
      - cluster initialization is broken by default, as it uses the 'no'
        setting which is not a valid path
      - some other parts of the code require the file storage dir to be a
        valid path; we work around this by skipping such code paths when it
        is disabled
      
      A side effect is that we abstract the storage type checks into a
      separate function and add validation in RepairNodeStorage (previously,
      a luxi client that didn't use cli.py and submitted an invalid type
      would get "storage units of type 'foo' can not be repaired").
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: René Nussbaumer <rn@google.com>
  2. 12 Apr, 2010 11 commits
  3. 08 Apr, 2010 1 commit
  4. 23 Mar, 2010 4 commits
    • Allow file storage to be grown · 2c42c5df
      Guido Trotter authored
      
      Signed-off-by: Guido Trotter <ultrotter@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
    • Fix burnin error when trying to grow a file volume · 728489a3
      Guido Trotter authored
      
      
      Abstract the growable disk types into a Ganeti constant, and run disk
      grow from burnin only on those.
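      
      A minimal sketch of the idea (the constant names mirror the Ganeti
      style, but the exact membership here is illustrative):
      
        DT_PLAIN = "plain"
        DT_DRBD8 = "drbd"
        DT_FILE = "file"
        
        # one authoritative set instead of ad-hoc checks in burnin
        DTS_GROWABLE = frozenset([DT_PLAIN, DT_DRBD8, DT_FILE])
        
        def maybe_grow(disk_template, grow_fn):
          """Run a disk grow only for templates that support it."""
          if disk_template in DTS_GROWABLE:
            grow_fn()
      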
      Signed-off-by: Guido Trotter <ultrotter@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
    • Some epydoc fixes · 3a488770
      Iustin Pop authored
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
    • A rewrite of LUClusterVerify · 02c521e4
      Iustin Pop authored
      
      
      Per issue 90, the current cluster verify is very brittle. It is one of
      the oldest pieces of code, having seen only additions, without
      cleanups, over the years.
      
      Among its problems:
      
      - data initialization interspersed with verification of RPC results,
        leading to non-initialized data for some branches
      - due to the above, some checks are strictly ordered, so that e.g. a
        bad node time result will skip the checking of node volumes
      - very many local variables, with each new check adding a new dict,
        leading to a spaghetti of dicts in the main Exec function
      - monolithic code, both Exec() and _NodeVerify() do a lot of
        independent checks
      
      This patch does an imperfect rewrite, but at least we gain:
      
      - a clear infrastructure for adding more checks (the new NodeImage
        class, with its clear and documented fields), and removal of most
        per-node dicts from the Exec() function
      - the new NodeImage object should allow better type safety, e.g. by
        allowing pylint to check the actual object attributes rather than
        strings as dict keys
      - a-priori initialization of data fields, eliminating the need to
        introduce dependencies between checks
      - per-result-key status field, allowing elimination of duplicate error
        messages (where we want)
      - split of most independent checks into separate functions, for greater
        clarity
      
      The new code, being new, will probably introduce more bugs in the short
      term than it removes. However, it should offer a much better way of
      extending cluster verify in the future.
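      
      A condensed sketch of the NodeImage idea (field names here are
      illustrative, picked to show the up-front initialization):
      
        class NodeImage(object):
          """Per-node verification data, initialised a-priori.
          
          Attribute access (instead of dict keys) lets pylint catch
          typos, and every field exists before any check runs, so no
          check depends on another having filled in a dict first.
          """
          def __init__(self, name):
            self.name = name
            self.rpc_fail = False   # node RPC failed entirely
            self.lvm_fail = False   # LVM data could not be gathered
            self.volumes = {}       # volume name -> size
            self.instances = []     # instances running on this node
            self.pinst = []         # instances with this node as primary
            self.sinst = []         # instances with this node as secondary
            self.mfree = 0          # free memory, in MiB
            self.dfree = 0          # free disk, in MiB
      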
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
  5. 17 Mar, 2010 3 commits
  6. 16 Mar, 2010 1 commit
  7. 15 Mar, 2010 8 commits
    • Rightname confd's HMAC key · 6b7d5878
      Michael Hanselmann authored
      
      
      Currently, ganeti-confd's HMAC key is called “cluster HMAC key” or
      simply “HMAC key” everywhere. With the implementation of inter-cluster
      instance moves, another HMAC key will be introduced for signing
      critical data. They cannot be the same, so this patch clarifies the
      purpose of the “cluster HMAC key” by renaming it. The actual file name
      is not changed.
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
    • Implement conversion from drbd to plain · 2f414c48
      Iustin Pop authored
      
      
      This is much simpler than the opposite direction, with fewer
      possibilities of failure.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
    • Implement conversion from plain to drbd · e29e9550
      Iustin Pop authored
      
      
      This patch adds a new mode to instance modify, the changing of the disk
      template. For now only plain to drbd conversion is supported, and the
      new secondary node must be specified manually (no iallocator support).
      
      The procedure for conversion works as follows:
      
      - a completely new disk template is created, matching the count, size
        and mode of the instance's current disks
      - we create manually (not via _CreateDisks) all the missing volumes
      - we rename on the primary the LVs to the new name
      - we create manually the DRBD devices
      
      Failures during the creation of volumes will leave orphan volumes.
      Failure during the rename might leave some disks renamed and some not,
      leading to an inconsistent instance.
      
      Once the disks are renamed, we update the instance information and wait
      for resync. Any failures of the DRBD sync must be manually handled (like
      a normal failure, e.g. by running replace-disks, etc.).
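      
      In rough pseudo-Python, the flow reads as below (all helpers are
      local stubs standing in for the real cmdlib internals):
      
        def _create_block_dev(disk):
          print("creating %s" % disk)             # stub
        
        def _rename_lvs(node, rename_pairs):
          print("renaming on %s" % node)          # stub
        
        def _assemble_drbd(disk):
          print("assembling drbd for %s" % disk)  # stub
        
        def convert_plain_to_drbd(primary_node, new_disks, rename_pairs):
          # 1. the caller builds a disk set matching the count, size and
          #    mode of the instance's current disks (new_disks here)
          # 2. create the missing volumes by hand, not via _CreateDisks;
          #    failures at this point leave orphan volumes
          for disk in new_disks:
            _create_block_dev(disk)
          # 3. rename the LVs on the primary; a partial failure here
          #    leaves the instance inconsistent
          _rename_lvs(primary_node, rename_pairs)
          # 4. create the DRBD devices on top of the renamed LVs
          for disk in new_disks:
            _assemble_drbd(disk)
          # 5. afterwards: update the instance config and wait for
          #    resync; DRBD sync failures are handled manually
      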
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
    • Abstract check that an instance is down · 31624382
      Iustin Pop authored
      
      
      Multiple LUs require that an instance is not running while they operate
      on the instance (reinstall, rename, modify, recreate disks, deactivate
        disks). The code to do this check is duplicated many times, and is
        not very consistent (some use call_instance_list, some
        call_instance_info).
      
      The patch moves this check into a separate function that is then reused.
      The only drawback is that _SafeShutdownInstanceDisks now raises an
      OpPrereqError (even though it is run during Exec()), but this use case
      is fine (there are no other modifications in that Exec).
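      
      The shared helper boils down to a sketch like this (RPC access is
      stubbed out; the real code queries the primary node through the LU):
      
        class OpPrereqError(Exception):
          """Stand-in for ganeti.errors.OpPrereqError."""
        
        def _instance_running(instance):
          return False  # stub: would ask the primary node over RPC
        
        def check_instance_down(instance, reason):
          """Single check reused by reinstall, rename, modify,
          recreate disks and deactivate disks."""
          if _instance_running(instance):
            raise OpPrereqError("Instance %s is running, %s requires it"
                                " to be down" % (instance, reason))
        
        check_instance_down("inst1.example.com", "rename")
      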
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
    • Abstract node free disk space check · 701384a9
      Iustin Pop authored
      
      
      Both create instance and grow disk check the free disk space on nodes
      using duplicated copies of the same code. Since we'll need this in
      other places in the future, we abstract the check into a new function.
      
      The patch adjusts the error message to be more in-line with the one for
      memory checking, and fixes the exception raised for RPC errors.
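      
      A sketch of the abstracted check (node query stubbed; the real
      version reads free space from the node info RPC result and raises on
      RPC failures):
      
        class OpPrereqError(Exception):
          """Stand-in for ganeti.errors.OpPrereqError."""
        
        def _node_free_disk_mib(node):
          return 10240  # stub: would come from the node info RPC
        
        def check_nodes_free_disk(nodes, requested_mib):
          """Verify each node has at least requested_mib of free disk."""
          for node in nodes:
            free = _node_free_disk_mib(node)
            if free < requested_mib:
              raise OpPrereqError("Not enough disk space on node %s:"
                                  " required %d MiB, available %d MiB" %
                                  (node, requested_mib, free))
      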
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
    • Abstract disk template verification · 5d55819e
      Iustin Pop authored
      
      
      This is a simple check, but we'll need it in multiple places.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
    • LUCreateInstance: implement disk adoption mode · c3589cf8
      Iustin Pop authored
      
      
      This new mode, valid only for the plain disk template, allows creation
      of an instance based on existing logical volumes (preserving data),
      rather than creation of new volumes and OS creation.
      
      The new mode works as follows:
      
      - instead of size, all disks passed in must have an 'adopt' key, which
        signifies the LV name to be used
      - all disks must have this key, or none should
      - we check the volume existence, and from the result we fill in the
        actual size
      - online (in-use) volumes are not allowed
      - 'stealing' of another instance's volumes is prevented via reservation
        of the LV names
      - during creation, we rename the logical volumes to the standard Ganeti
        format (based on UUID)
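      
      A sketch of the two central checks (dict shapes and helper names are
      illustrative):
      
        def check_adoption_keys(disks):
          """All disks carry an 'adopt' key (the LV name), or none do."""
          has_adopt = ["adopt" in d for d in disks]
          if any(has_adopt) and not all(has_adopt):
            raise ValueError("Either all disks are adopted or none is")
          return all(has_adopt)
        
        def fill_adopted_sizes(disks, lv_info):
          """Fill sizes from the existing LVs, rejecting in-use ones.
          
          lv_info maps LV name -> (size_mib, is_online); the real code
          also reserves the LV names so another instance's volumes
          cannot be stolen, then renames them to the UUID-based Ganeti
          format during creation.
          """
          for disk in disks:
            size_mib, is_online = lv_info[disk["adopt"]]
            if is_online:
              raise ValueError("LV %s is in use" % disk["adopt"])
            disk["size"] = size_mib
      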
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
    • LUCreateInstance: Move parameter init earlier · df4272e5
      Iustin Pop authored
      
      
      This way, the parameters are available in CheckArguments too.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
  8. 12 Mar, 2010 3 commits
  9. 11 Mar, 2010 2 commits
  10. 09 Mar, 2010 4 commits
    • Rework the node modify for mc-demotion · 601908d0
      Iustin Pop authored
      
      
      The current code in LUSetNodeParms regarding the demotion from master
      candidate role is complicated and duplicates the code in ConfigWriter,
      where such decisions should be made. Furthermore, we still cannot
      demote nodes (not even with force) if other regular nodes exist.
      
      This patch adds a new opcode attribute ‘auto_promote’, and changes the
      decision tree as follows:
      
      - if the node will be set to offline or drained or explicitly demoted
        from master candidate, and this parameter is set, then we lock all
        nodes in ExpandNames()
      - later, in CheckPrereq(), if the node is
        indeed a master candidate, and the future state (as computed via
        GetMasterCandidateStats with the current node in the exception list)
        has fewer nodes than it should, and we didn't lock all nodes, we exit
        with an exception
      - in Exec, if we locked all nodes, we do an AdjustCandidatePool() run,
        to ensure nodes are promoted as needed (we do it before updating the
        node, to remove a warning and to prevent the situation where, if the
        LU fails in between, we are left with an inconsistent state)
      
      Note that in Exec we run the AdjustCP irrespective of any node state
      change (just based on lock status), so we might simplify the CheckPrereq
      even more by not checking the future state, basically requiring
      auto_promote/lock_all for master candidates, since the case where we
      have more master candidates than needed is rarer; OTOH, this would
      prevent manually promoting another node ahead of time, which is why I
      didn't choose this way.
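      
      Schematically, the CheckPrereq decision reduces to something like the
      following sketch (stub names are illustrative):
      
        def _future_mc_shortfall(node):
          return 1  # stub: GetMasterCandidateStats() with node excluded
        
        def check_demotion(node, demoting, locked_all_nodes):
          """Demoting a master candidate is allowed only if all nodes
          were locked (the auto_promote path) or the pool stays full."""
          if not demoting:
            return
          if _future_mc_shortfall(node) > 0 and not locked_all_nodes:
            raise ValueError("Not enough master candidates; use"
                             " auto_promote to promote another node")
      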
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
    • Fix typo that makes cluster verify ignore hooks · 6d7b472a
      Iustin Pop authored
      
      
      The return value of LUVerifyCluster should be True (or equivalent) for
      pass, and False (or equivalent) for fail. The HooksCallBack function
      uses '1' (= True) when a hook fails, which is exactly the opposite of
      what we want: it makes failed hooks reset the result to success,
      overriding actual failures in cluster verify.
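      
      Reduced to its essence, the inverted logic and the fix look like this
      (illustrative, not the actual cmdlib code):
      
        def hooks_callback(hook_failed, lu_result):
          """Fold hook results into the LU result; truthy == pass."""
          if hook_failed:
            lu_result = 0  # the buggy version assigned 1 here, turning
                           # any hook failure into overall success
          return lu_result
      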
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: René Nussbaumer <rn@google.com>
    • Fix redistribute config and offline nodes · 6819dc49
      Iustin Pop authored
      
      
      We need to manually filter out offline nodes before using
      rpc.call_upload_file and rpc.call_write_ssconf_files, since these
      methods are static (they work without a ConfigWriter instance) and thus
      do not know which nodes are offline and which are not.
      
      Note that we add a new ConfigWriter._UnlockedGetOnlineNodeList() method
      rather than hardcoding the filtering of online nodes in _WriteConfig.
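      
      A sketch of the new accessor (the internals are stubbed; only the
      method name follows the commit):
      
        class ConfigWriter(object):
          def __init__(self, node_offline_map):
            self._nodes = node_offline_map  # name -> offline flag
          
          def _UnlockedGetOnlineNodeList(self):
            """Names of nodes not marked offline; _WriteConfig uses this
            before calling the static upload/write RPC helpers."""
            return [name for name, offline in self._nodes.items()
                    if not offline]
      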
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
    • Add support for per-os-hypervisor parameters · 17463d22
      René Nussbaumer authored
      
      
      This patch implements all modifications to support per-os-hypervisor
      parameters in the framework.
      Signed-off-by: René Nussbaumer <rn@google.com>
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
  11. 08 Mar, 2010 2 commits
    • Validate the hostnames at creation time · 44caf5a8
      Iustin Pop authored
      
      
      This patch adds validation of new names used, i.e. at cluster init time,
      node add time, and instance creation.
      
      For instances, especially when using «--no-name-check» (which skips DNS
      checks), we should validate the given name, and also normalize it
      (otherwise, we could have two instances named inst1 and Inst1).
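      
      A sketch of the validate-and-normalize step (the regex is
      illustrative, not the exact one Ganeti uses):
      
        import re
        
        _NAME_RE = re.compile(r"^[a-z0-9.-]+$")
        
        def normalize_and_check_name(name):
          """Lower-case and validate, so inst1 and Inst1 cannot end up
          as two distinct instances."""
          name = name.lower()
          if not _NAME_RE.match(name):
            raise ValueError("Invalid name '%s'" % name)
          return name
      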
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
    • Implement disabling of file-based storage · cb7c0198
      Iustin Pop authored
      
      
      Rationale: the file-based storage backend can add/remove files under a
      certain directory. However, the master node also controls the setting
      of the file-based root directory, so basically we cannot prevent
      arbitrary modifications of a node's filesystem by the master.
      
      In order to mitigate this for setups where the file-based storage is not
      used, we introduce a new setting at ./configure time, that controls the
      enable/disable of file-based storage. Since this is not modifiable by
      the master (over RPC), it is now possible in this case to prevent
      unintended modifications of the node's filesystem from the master.
      
      The new setting is used in bdev.py to not expose the file-based storage
      at all, and in cmdlib.py to prevent attempts at creation of such
      instances.
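      
      Schematically (the constant name is illustrative; the real value is
      fixed at build time by ./configure and is not changeable over RPC):
      
        ENABLE_FILE_STORAGE = False  # baked in at ./configure time
        
        def check_disk_template(template):
          """cmdlib-style guard against creating file-based instances
          when file storage is compiled out."""
          if template == "file" and not ENABLE_FILE_STORAGE:
            raise ValueError("file-based storage is disabled at"
                             " configure time")
      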
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>