1. 03 Dec, 2007 1 commit
  2. 14 Nov, 2007 1 commit
    • When an assembly error occurs log it too · 20a0c9ef
      Guido Trotter authored
      Right now an assembly error produces an exception but not a log message. This
      is bad because the exception suggests looking at the log, but the log itself
      contains many errors that are not really a problem and only a few that really
      are. To make it clear where in the log the problem occurred, we now log a
      message too, before raising the exception.
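
A minimal sketch of the log-then-raise pattern described above. The logger name, the `BlockDeviceError` class and the `assemble_or_fail` helper are hypothetical stand-ins, not the project's actual code:

```python
import logging

logger = logging.getLogger("node-daemon")  # hypothetical logger name


class BlockDeviceError(Exception):
    """Stand-in for the project's block device error class."""


def assemble_or_fail(device_name, assemble):
    """Run assemble(); on failure, log first and then raise."""
    if not assemble():
        msg = "Can't assemble device %s" % device_name
        # The log line marks the exact place of the failure in the log
        logger.error(msg)
        raise BlockDeviceError(msg)
    return True
```

The same message goes to both the log and the exception, so the text from the exception can be searched for in the log.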
      
      Reviewed-by: iustinp
  3. 12 Nov, 2007 1 commit
    • Fix a wrong comparison in _RecursiveAssembleBD · 7803d4d3
      Iustin Pop authored
      We want to prevent sending too many 'None' children to a device.
      However, the test as it is today is wrong, as we want to test the
      situation after adding a new child, and not before. This patch fixes
      this by testing greater-or-equal instead of just greater.
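
The off-by-one can be illustrated with a small sketch. The `add_child` helper and the `max_none` limit are assumptions made for illustration; the real check lives in _RecursiveAssembleBD and may look different:

```python
def add_child(children, child, max_none):
    """Append child, refusing to exceed max_none missing (None) children.

    max_none is a hypothetical limit on how many None entries are allowed.
    """
    if child is None:
        none_before = children.count(None)
        # Testing `none_before > max_none` would only fire one child too
        # late; `>=` checks the state the list would have after appending.
        if none_before >= max_none:
            raise ValueError("too many missing children")
    children.append(child)
    return children
```

Because the count is taken before the append, `none_before >= max_none` is equivalent to "count after adding > max_none".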
      
      Reviewed-by: imsnah
  4. 09 Nov, 2007 1 commit
  5. 07 Nov, 2007 1 commit
    • Enhance secondary node replace for drbd8 · 0834c866
      Iustin Pop authored
      This (big) patch does two things:
        - add "local disk status" to the block device checks
          (BlockDevice.GetSyncStatus and the rpc calls that call this
          function, and therefore cmdlib._CheckDiskConsistency)
        - improve the drbd8 secondary replace operation using the above
          functionality
      
      The "local disk status" adds a new variable to the result of
      GetSyncStatus that shows the degradation of the local storage of the
      device. Of course, not all devices support this - for now, we only modify
      LogicalVolumes and DRBD8 to return degraded in some cases; other devices
      always return non-degraded. This variable should be a subset of
      is_degraded - whenever this variable is true, is_degraded should
      also be true.
      
      The drbd8 secondary replace uses this variable as we don't care if the
      primary drbd device is network-degraded, only if it has good local disk
      data (ldisk is False).
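
As a rough sketch of the relationship between the two flags: the `SyncStatus` tuple and both helper functions below are hypothetical, and the actual GetSyncStatus result may carry different or additional fields.

```python
from typing import NamedTuple, Optional


class SyncStatus(NamedTuple):
    """Hypothetical shape of a sync status result."""
    sync_percent: Optional[float]  # None when not resyncing
    is_degraded: bool              # degraded for any reason
    ldisk: bool                    # True when local storage is degraded


def is_consistent(status):
    """ldisk may only be set when is_degraded is also set."""
    return status.is_degraded or not status.ldisk


def primary_usable_for_secondary_replace(status):
    """Only the local disk state matters, not network degradation."""
    return not status.ldisk
```

A primary that is network-degraded but has good local data (`is_degraded=True, ldisk=False`) is still acceptable as the source for a secondary replace.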
      
      The patch also increases the protocol version (due to rpc changes).
      
      Reviewed-by: imsnah
  6. 06 Nov, 2007 2 commits
    • Allow DRBD8 operation without backing storage · fc1dc9d7
      Iustin Pop authored
      This patch adds the following functionality:
        - DRBD8 devices can assemble without local storage (done by allowing
          None in the list of children, and making DRBD8 ignore all
          children if any is None)
        - DRBD8 devices can attach to (i.e. identify) a device which is not
          connected to backing storage but is connected to the correct network
          ports; this is a rare case in normal operation (it's what would happen
          if one manually detaches the local disk, and the backing LV still exists)
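
The "ignore all children if any is None" rule could be sketched as follows. The `effective_children` helper is a hypothetical name used only for illustration:

```python
def effective_children(children):
    """Return the children a device should use, or [] if any is missing."""
    if any(child is None for child in children):
        # One missing child means no usable backing storage at all
        return []
    return list(children)
```

With a data and a metadata child, losing either one makes the device run diskless rather than with half of its backing storage.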
      
      Reviewed-by: imsnah
    • Change the way remove children is called in bdev · e739bd57
      Iustin Pop authored
      For some cases, we don't have to have access to the children of a device
      in order to remove them (e.g. md over lvs, or drbd over lvs). In order
      to ease the removal process, skip over finding the child if it provides
      a static dev path.
      
      This is needed in order to support removal of children when the
      underlying storage has gone away.
      
      Reviewed-by: imsnah
  7. 05 Nov, 2007 3 commits
    • Fix an unhandled error case in device creation · cf5a8306
      Iustin Pop authored
      The block device creation process is the following:
        - device create
        - device assembly (on the primary or, depending on dev_type, on the
          secondary too)
        - set sync speed
        - return
      
      The problem is that device assembly after creation was not checked for
      errors, and as this is a very unusual case, we did not have problems
      with it (or we didn't detect them). The recent DevCacheManager however
      tripped on this case (because the dev_path of the device is None if the
      assembly fails) and the creation aborted with an unclear error message.
      
      The patch adds a check for the assembly success and aborts the creation
      of the device in this case - the error is quite clear in the instance
      add, for example. The patch also changes DevCacheManager to log the
      cases when dev_path is None but not raise an error (keeping consistent
      with the goal that the cache manager should be transparent to the code).
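
A minimal sketch of the two changes, assuming hypothetical callables for the device operations and a plain dict in place of DevCacheManager:

```python
import logging


def create_block_device(create, assemble, set_sync_speed):
    """Create a device, then assemble it and fail early if that breaks.

    All three arguments are hypothetical callables standing in for the
    real device operations; assemble() returns True on success.
    """
    device = create()
    if not assemble(device):
        # Abort here instead of continuing with a half-built device
        raise RuntimeError("can't assemble device after creation")
    set_sync_speed(device)
    return device


def update_cache(cache, dev_path, owner):
    """Record dev_path in the cache; a missing path is logged, not raised."""
    if dev_path is None:
        logging.error("cache update requested without a dev_path")
        return
    cache[dev_path] = owner
```

The creation path is strict about assembly failures, while the cache stays transparent to its callers.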
      
      For the record, this error case was detected with a mismatch between
      drbd kernel module and utilities.
      
      Reviewed-by: imsnah
    • Miscellaneous style fixes · 65fe4693
      Iustin Pop authored
      This patch fixes some minor pylint warnings (unused variables, wrong
      indentation, etc.) and a real bug in the recovery for drbd8 rename
      procedure.
      
      Reviewed-by: imsnah
    • Make DiagnoseOS use the modified OS objects · 8fa42c7c
      Guido Trotter authored
      Modify backend.py so that DiagnoseOS only returns OS objects rather than
      InvalidOS errors, and make sure gnt-os understands the new objects. Also delete
      the deprecated helper functions from gnt-os.
      
      Reviewed-By: iustinp
      
  8. 04 Nov, 2007 1 commit
  9. 02 Nov, 2007 1 commit
    • Implement device to instance mapping cache · 3f78eef2
      Iustin Pop authored
      Currently, troubleshooting DRBD problems involves a manual process of going
      backwards from the DRBD device to the instance that owns it.
      
      This patch adds a weak (i.e. not guaranteed to be correct or up-to-date)
      cache of device to instance. In normal operation the cache should hold
      correct information, as the only times when devices change paths are
      when they are started/stopped, and the code in backend.py adds cache
      updates to exactly these operations.
      
      The only drawback of this implementation is that we don't fully update
      the cache on renames of devices (we clean the old entries but we don't
      add new ones). Since the rename changes the path only for LVs (and not
      drbd and md), this is less of a problem as the target of this code is
      debugging DRBD and MD issues.
      
      The patch writes files named bdev_drbd<N> (or bdev_md<N>,
      bdev_xenvg_...) in /var/run/ganeti (more exactly, LOCALSTATEDIR/ganeti).
      The files start with 'bdev_' and continue with the path of the device
      under /dev/ (this prefix stripped), and contain the following values,
      space separated:
        - instance name
        - primary or secondary (depending on whether the device is on the
          primary or the secondary node)
        - instance visible name: sda or sdb or not_visible, the latter case
          when the device is not the top-level device (i.e. remote_raid1
          templates will have sd[ab] for the md, but not_visible for drbd and
          logical volumes)
      
      The cache is designed to not raise any errors; if there is an I/O error
      it will only be logged in the node daemon log file. This is in order to
      reduce the possible impact of the cache on the block device activation
      and shutdown code.
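
The file naming and contents described above could look roughly like this. The function names are hypothetical, and replacing "/" with "_" for nested paths is an assumption inferred from the bdev_xenvg_... example:

```python
import logging
import os

CACHE_DIR = "/var/run/ganeti"  # assumed value of LOCALSTATEDIR/ganeti


def cache_file_path(dev_path, cache_dir=CACHE_DIR):
    """Map /dev/drbd0 to <cache_dir>/bdev_drbd0."""
    name = dev_path
    if name.startswith("/dev/"):
        name = name[len("/dev/"):]
    # Assumption: nested paths such as xenvg/<lv> are flattened with "_"
    name = name.replace("/", "_")
    return os.path.join(cache_dir, "bdev_" + name)


def update_cache(dev_path, instance, on_primary, iv_name, cache_dir=CACHE_DIR):
    """Write the cache entry; I/O errors are logged, never raised."""
    state = "primary" if on_primary else "secondary"
    line = "%s %s %s\n" % (instance, state, iv_name or "not_visible")
    try:
        with open(cache_file_path(dev_path, cache_dir), "w") as handle:
            handle.write(line)
    except OSError as err:
        logging.error("Can't update bdev cache for %s: %s", dev_path, err)
```

Looking up the owner of a device is then a matter of reading one small file, for example the one named bdev_drbd0 for /dev/drbd0.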
      
      Reviewed-by: imsnah
  10. 01 Nov, 2007 1 commit
  11. 29 Oct, 2007 3 commits
    • Fix a non-clear error message · 233d06c5
      Iustin Pop authored
      Reviewed-by: imsnah
    • Implement replace-disks for drbd8 devices · a9e0c397
      Iustin Pop authored
      This patch adds three modes of disk replacement for drbd8:
        - replace the disk on the primary node
        - replace the disk on the secondary node
        - replace the secondary node
      
      It also adds some debugging code to backend.py and increments the
      protocol version for the recent changes of the rpc layer.
      
      Reviewed-by: imsnah
    • Implement block device renaming · f3e513ad
      Iustin Pop authored
      This patch adds code for renaming a device; more precisely, for changing
      the unique_id of the device. This means:
        - logical volumes, rename the volume
        - drbd8, change the remote peer
      
      This is needed in order to be able to replace disks for drbd8.
      
      Reviewed-by: imsnah
  12. 25 Oct, 2007 1 commit
    • Modify two mirror-device related rpc calls · 153d9724
      Iustin Pop authored
      The two calls mirror_addchild and mirror_removechild take only one child
      for addition/removal. While this is enough for our md usage, for local
      disk replacement in drbd8, we need to be able to specify both the data
      and metadata device. This patch modifies these two rpc calls (and their
      backend implementation and their usage in cmdlib) to take a list of
      children to add/remove.
      
      Reviewed-by: imsnah
  13. 19 Oct, 2007 1 commit
    • Abstract more string values into constants · fe96220b
      Iustin Pop authored
      Currently, the disk types are defined using literal strings in the code.
      Convert those into constants so that we can easily find them and check
      their usage.
      
      Note that we don't rename the values of the constants as they are used
      in the configuration file, and as such it's best to leave them as they
      are.
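
A small sketch of the idea. The constant names and their values here are illustrative assumptions; the point is only that the name is new while the string value stays whatever the configuration file already contains:

```python
# Hypothetical constant names; the values must stay identical to the
# strings already stored in the configuration file.
LD_LV = "lvm"
LD_DRBD8 = "drbd8"


def is_drbd8(dev_type):
    """Compare against the named constant instead of a bare string."""
    return dev_type == LD_DRBD8
```

Searching for `LD_DRBD8` now finds every use of the disk type, which a search for the bare string could not guarantee.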
      
      Reviewed-by: imsnah
  14. 17 Oct, 2007 1 commit
    • Patch series for reboot feature, part 1 · 007a2f3e
      Alexander Schreiber authored
      This patch series implements the reboot command for gnt-instance. It
      supports three types of reboot: soft (hypervisor reboot), hard (instance
      config rebuild and reboot) and full (full instance shutdown and startup
      again).
      
      This patch contains the backend and rpc part of the patch.
      
      
      Reviewed-by: iustinp
      
  15. 16 Oct, 2007 1 commit
    • Replace more ssh paths with proper constants · 70d9e3d8
      Iustin Pop authored
      The node's ssh keys filenames are now provided as constants; this should
      allow easier customization.
      
      Also, computing the user's ssh keys has been abstracted into ssh.py.
      
      Reviewed-by: imsnah
  16. 15 Oct, 2007 1 commit
  17. 12 Oct, 2007 2 commits
    • Remove some hardcoded names/paths from backend.py · 7900ed01
      Iustin Pop authored
      This patch does the following:
        - add constants.GANETI_RUNAS = "root", which is used to compute
          the homedir (and thus the .ssh directory) instead of hardcoding
          "/root/.ssh" in backend.AddNode and backend.LeaveCluster
        - add constants.SSH_CONFIG_DIR (currently hardcoded to /etc/ssh) that
          is used in backend instead of hardcoding it (preparation for
          selecting that at ./configure time)
        - some more internal cleanup in backend.AddNode
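
Deriving the .ssh directory from the run-as user, rather than hardcoding "/root/.ssh", might be sketched like this. The `user_ssh_dir` helper is a hypothetical name; the two constants mirror the ones named in the list above:

```python
import os
import pwd

GANETI_RUNAS = "root"
SSH_CONFIG_DIR = "/etc/ssh"


def user_ssh_dir(user=GANETI_RUNAS):
    """Return the .ssh directory inside the given user's home directory."""
    home = pwd.getpwnam(user).pw_dir
    return os.path.join(home, ".ssh")
```

Looking the home directory up in the password database keeps the code correct on systems where root's home is not /root.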
      
      Reviewed-by: imsnah
    • Do not walk the whole DATA_DIR on node leave · 71eca7c3
      Iustin Pop authored
      Since we remove only files from DATA_DIR and not from subdirectories,
      let's not walk the entire tree, a simple listdir suffices. Also switch
      to utils.RemoveFile from simple os.unlink.
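
A rough sketch of the listdir-based cleanup. `remove_file` is a hypothetical stand-in for utils.RemoveFile, assumed here to ignore files that are already gone:

```python
import errno
import os


def remove_file(path):
    """Remove a file, ignoring the case where it is already gone."""
    try:
        os.unlink(path)
    except OSError as err:
        if err.errno != errno.ENOENT:
            raise


def clean_data_dir(data_dir):
    """Remove regular files directly inside data_dir; skip subdirectories."""
    for name in os.listdir(data_dir):
        full = os.path.join(data_dir, name)
        if os.path.isfile(full):
            remove_file(full)
```

Since nothing below the top level is touched, a single `os.listdir` call does the same job as a full tree walk.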
      
      Reviewed-by: imsnah
  18. 10 Oct, 2007 1 commit
    • Remove the shebang from modules · 2f31098c
      Iustin Pop authored
      Since modules are not directly executable, remove the shebang from
      them. This helps with lintian warnings.
      
      Also make the autogenerated _autoconf.py contain two comment lines at
      the beginning, like the other modules.
      
      Reviewed-by: ultrotter
  19. 08 Oct, 2007 2 commits
  20. 04 Oct, 2007 2 commits
    • Remove redundant check. · 0ee60a28
      Guido Trotter authored
      This isdir() check leads to a broken error message. Even fixing it leaves some
      cases in which the error message is nebulous and unclear, while removing it
      lets the _OSOndiskVersion checks deal with this situation a lot better.
      
      
      Reviewed-by: iustinp
      
    • Ship (and display) path for InvalidOS errors too. · 305a7297
      Guido Trotter authored
      - Document the expected change to errors.InvalidOS
      - Always pass the additional argument
      - Modify DiagnoseOS output to show the path
      
      
      Reviewed-by: iustinp, imsnah
      
  21. 03 Oct, 2007 2 commits
  22. 28 Sep, 2007 1 commit
  23. 25 Sep, 2007 2 commits
  24. 17 Sep, 2007 3 commits
    • A few minor fixes in backend.py · 9716fdce
      Iustin Pop authored
      This uses the recently-added Instance.FindDisk() method instead of
      hard coded find-disk code.
      
      It also renames one parameter to AddNode from ssh to sshkey in order not
      to shadow the ganeti.ssh module.
      
      Reviewed-by: imsnah
    • Implement instance rename operation · decd5f45
      Iustin Pop authored
      This patch adds support for instance rename operation at all remaining
      layers: RPC, OpCode/LU and CLI.
      
      Reviewed-by: imsnah
    • Add support for rename operation in the OS API · 386b57af
      Iustin Pop authored
      This patch adds support for renaming at OS level. Because of this, we
      need to bump up the version of the OS api from 4 to 5.
      
      The patch also documents the new script interface in the
      ganeti-os-interface(7) man page and adds a section on upgrading the OS
      definitions to the new version.
      
      Reviewed-by: imsnah
  25. 13 Sep, 2007 1 commit
    • Fix the ssh change which breaks remote ssh commands · 72f0f7fd
      Iustin Pop authored
      Explanation: since we use lists and not a string, every argument we give
      is passed unchanged to the remote shell. So, for example, when passing
      '/etc/init.d/ganeti restart', the remote shell will try to run the
      path /etc/init.d/ganeti\ restart, with the space included. This
      breaks, for example, gnt-node add and gnt-cluster command.
      
      The original problem with the backup routines that led to the "'" change
      is that they use a plain " ".join(list), but we don't need to quote the
      whole ssh remote command for this. We can simply use the existing
      utils.ShellQuoteCmd(list) which does the proper quoting of the ';' or
      '&&' metacharacters.
      
      With this change, gnt-node add, gnt-cluster command and
      export/import all work.
      
      This also improves the error-handling behaviour of one cat command by
      making it conditional on the preceding mkdir.
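
The quoting problem can be reproduced with the standard library. In this sketch `shlex.quote` stands in for the project's own quoting helper, and both function names are hypothetical:

```python
import shlex


def remote_command(argv):
    """Join an argument list into one string for the remote shell."""
    return " ".join(shlex.quote(arg) for arg in argv)


def write_file_command(directory, filename):
    """Build 'mkdir -p DIR && cat > DIR/FILE' with the paths quoted."""
    target = directory.rstrip("/") + "/" + filename
    return "mkdir -p %s && cat > %s" % (shlex.quote(directory),
                                        shlex.quote(target))
```

Quoting each argument separately keeps `/etc/init.d/ganeti` and `restart` as two words, whereas quoting the whole string turns it into a single path containing a space. In `write_file_command` the `&&` is left unquoted on purpose, so `cat` only runs if `mkdir` succeeded.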
      
      Reviewed-by: ultrotter
  26. 07 Sep, 2007 1 commit
    • Make import/export use the auxiliary ssh library to build the remote commands. · 00003458
      Guido Trotter authored
      This avoids forgetting some parameters, as is happening right now
      (the correct known hosts file is not being passed).

      In order to do so we split SSHCall into an auxiliary BuildSSHCmd, which builds
      the command but doesn't actually call it, and SSHCall itself, which runs RunCmd
      on top of BuildSSHCmd's result. BuildSSHCmd is then explicitly called by
      import/export, which has to build a more complex command to be run later.
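
The build/run split could be sketched as below. The function names, the known hosts path and the exact option set are assumptions; `subprocess.run` stands in for RunCmd:

```python
import subprocess

KNOWN_HOSTS_FILE = "/var/lib/ganeti/known_hosts"  # hypothetical path


def build_ssh_cmd(hostname, user, command, known_hosts=KNOWN_HOSTS_FILE):
    """Return the ssh argument list without running it."""
    return [
        "ssh",
        "-oBatchMode=yes",
        "-oGlobalKnownHostsFile=%s" % known_hosts,
        "-oUserKnownHostsFile=/dev/null",
        "%s@%s" % (user, hostname),
        command,
    ]


def ssh_call(hostname, user, command):
    """Build the command with build_ssh_cmd and execute it."""
    return subprocess.run(build_ssh_cmd(hostname, user, command),
                          capture_output=True, text=True)
```

Callers that need a more complex pipeline can take the list from `build_ssh_cmd` and embed it in their own command, so the ssh options are defined in one place only.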
  27. 30 Aug, 2007 1 commit
  28. 24 Aug, 2007 1 commit
    • Rework ssh known-hosts handling. · 82122173
      Iustin Pop authored
      This changes:
        - cluster setup, we no longer edit /etc/ssh/ssh_known_hosts but our
          own file
        - node add, we no longer remove root's known_hosts (twice)
        - gnt-instance console, both the LU and the script: since now the ssh
          setup is not standard, we need to build the ssh cmdline in the LU
          (instead of manually building it in the script) with the correct
          parameters and use the command line as returned in the script
        - ssh.py, many changes, split options in module-level constants so
          that building the command line in different places is easier/more
          logical
        - backend.py, we no longer remove root's known_hosts in Add node, and
          we allow our own known_hosts file to be uploaded
      
      Reviewed-by: imsnah