1. 11 Jul, 2008 2 commits
    • Fix backend.NodeVolumes handling of LVM output · a17a7623
      Iustin Pop authored
      This is the same fix as for GetVolumeList.
      
      I've checked manually and all other places that call lvm commands are
      already checking the output validity in terms of correct number of
      fields.
      
      Reviewed-by: ultrotter
    • Fix backend.GetVolumeList handling of LVM output · df4c2628
      Iustin Pop authored
      Sometimes ‘lvs’ can spit error messages on stdout, even when one wants
      to parse the output:
      ...
      Inconsistent metadata copies found - updating to use version 2776
      ...
      
      So we need to validate the output to guard against such cases.
      
      The patch replaces the split on the separator with a regex match and
      extracts the fields via groups. The original separator choice is a
      bad one now :(
      
      Reviewed-by: imsnah
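      A minimal sketch of this kind of validation, written for current
      Python 3 and assuming a "|" separator and three fields (name, size,
      attributes). The pattern, the field set and the names LVS_LINE_RE and
      parse_lvs_output are illustrative assumptions, not the code from the
      patch:

```python
import re

# Hypothetical layout: "name|size|attr", optionally followed by a trailing "|".
# The real separator and field list used by the backend are not shown here.
LVS_LINE_RE = re.compile(r"^\s*([^|]+)\|([0-9]+(?:\.[0-9]+)?)\|([^|\s]+)\|?\s*$")


def parse_lvs_output(output):
    """Return {name: (size, attr)} for well-formed lines, skipping the rest."""
    volumes = {}
    for line in output.splitlines():
        match = LVS_LINE_RE.match(line)
        if match is None:
            # Stray diagnostics (for example the "Inconsistent metadata copies"
            # message) do not match and are skipped instead of breaking parsing.
            continue
        name, size, attr = match.groups()
        volumes[name] = (float(size), attr)
    return volumes


sample = (
    "  Inconsistent metadata copies found - updating to use version 2776\n"
    "  data|1024.00|-wi-ao\n"
    "  swap|512.00|-wi-a-\n"
)
parsed = parse_lvs_output(sample)
```

      Matching whole lines against a pattern means a line is either accepted
      with exactly the expected fields or dropped, which a plain split on the
      separator cannot guarantee.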
  2. 27 Jun, 2008 2 commits
  3. 20 Jun, 2008 1 commit
    • Add a rpc call for BlockDev.Close() · d61cbe76
      Iustin Pop authored
      This patch adds rpc layer calls (in rpc.py and the equivalent in
      ganeti-noded) to close a list of block devices, and the wrapper in
      backend.py that takes a list of Disk objects, identifies them and
      returns correctly formatted results.
      
      The reason why this very basic call was missing until now from the rpc
      layer is that we usually don't care about device closes (though we
      should, and will do so in the future) as only drbd has a meaningful
      Close() operation; right now we directly do Shutdown().
      
      The patch is clean enough that it's actually independent of the live
      migration implementation.
      
      Reviewed-by: imsnah
  4. 16 Jun, 2008 2 commits
    • Expose block device grow in backend.py · 594609c0
      Iustin Pop authored
      This patch adds a wrapper over the block device grow operation that
      converts the input and output parameters as needed for the rpc layer.
      
      Reviewed-by: imsnah
    • Add migration support at the rpc layer · 2a10865c
      Iustin Pop authored
      This patch adds the migration rpc call and its implementation in the
      backend. The patch does not deal with the correct activation of disks.
      
      Because of the new RPC, the protocol version is increased.
      
      Reviewed-by: imsnah
  5. 13 May, 2008 2 commits
    • Implement node daemon connectivity tests · 9d4bfc96
      Iustin Pop authored
      This patch adds checks to gnt-cluster verify for inter-node tcp
      communication on the node daemon port, for both the primary and
      (if defined) secondary networks.
      
      The output looks like (4-node cluster, one with the secondary interface
      down):
      * Verifying node node1.example.com
        - ERROR: tcp communication with node 'node3.example.com': failure using the secondary interface(s)
      * Verifying node node2.example.com
        - ERROR: tcp communication with node 'node3.example.com': failure using the secondary interface(s)
      * Verifying node node3.example.com
        - ERROR: tcp communication with node 'node1.example.com': failure using the secondary interface(s)
        - ERROR: tcp communication with node 'node2.example.com': failure using the secondary interface(s)
        - ERROR: tcp communication with node 'node4.example.com': failure using the secondary interface(s)
      * Verifying node node4.example.com
        - ERROR: tcp communication with node 'node3.example.com': failure using the secondary interface(s)
      
      Reviewed-by: imsnah
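      A rough sketch of such a check in current Python 3. The helper names
      (tcp_ping, check_peers), the peer mapping layout, the addresses and the
      port number are assumptions made for illustration; only the error text
      is modelled on the output shown above:

```python
import socket


def tcp_ping(address, port, timeout=5.0):
    """Return True if a TCP connection to (address, port) can be opened."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_peers(peers, port, ping=tcp_ping):
    """Return one error string per peer that cannot be reached.

    `peers` maps a node name to (primary_ip, secondary_ip); secondary_ip is
    None when the node has no secondary network defined.
    """
    errors = []
    for name, (primary_ip, secondary_ip) in sorted(peers.items()):
        failed = []
        if not ping(primary_ip, port):
            failed.append("primary")
        if secondary_ip is not None and not ping(secondary_ip, port):
            failed.append("secondary")
        if failed:
            errors.append("tcp communication with node '%s': failure using "
                          "the %s interface(s)" % (name, ", ".join(failed)))
    return errors


# Demonstration with a fake ping so that no network access is needed:
# only the (made-up) secondary address of node3 is unreachable.
demo_peers = {
    "node2.example.com": ("192.0.2.2", "198.51.100.2"),
    "node3.example.com": ("192.0.2.3", "198.51.100.3"),
    "node4.example.com": ("192.0.2.4", None),
}
demo_errors = check_peers(demo_peers, 1811,
                          ping=lambda addr, port: addr != "198.51.100.3")
```

      Passing the ping function as a parameter keeps the reporting logic
      testable without opening real sockets.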
    • Reduce chance of ssh failures in verify cluster · b544cfe0
      Iustin Pop authored
      The cluster verify builds a sorted list of nodes and passes that to all
      the nodes (in parallel) for ssh checks. This means that for a cluster
      with N nodes, there will be approximately N simultaneous connections to
      the first node, then to the second node, etc. This, coupled with the
      ssh daemon's “MaxStartups” parameter, can create false alarms about ssh
      connectivity.
      
      This patch randomizes the node list in the backend (therefore, each node
      should have its own order of ssh-ing to the other nodes) and the chance
      of these alarms should be reduced.
      
      Reviewed-by: ultrotter
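      The idea can be sketched as follows in current Python 3; the function
      name shuffled_nodes and the node names are hypothetical:

```python
import random


def shuffled_nodes(node_list, rng=None):
    """Return a randomly ordered copy of node_list; the input is not modified."""
    rng = rng or random.Random()
    result = list(node_list)
    rng.shuffle(result)
    return result


nodes = ["node1", "node2", "node3", "node4"]
# Each node would compute its own order, so the first ssh connection of
# every node no longer targets the same peer.
order = shuffled_nodes(nodes)
```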
  6. 30 Apr, 2008 1 commit
  7. 28 Apr, 2008 1 commit
    • Move iallocator script execution to ganeti-noded · 8d528b7c
      Iustin Pop authored
      Currently the iallocator execution takes place in the master, which is a
      violation of the current architecture, and will create problems with a
      threaded master daemon.
      
      This patch moves the execution to the backend, similar to the hooks
      runner, by:
        - introducing a new class that handles the execution in the backend
          (and could be used also for listing the allocators, etc.)
        - introducing a new rpc call
        - replacing the actual execution in IAllocator.Run() with a rpc call
      
      This passes burnin with the dumb allocator.
      
      Reviewed-by: imsnah
  8. 24 Apr, 2008 1 commit
  9. 10 Apr, 2008 2 commits
    • Move the OS search code into an abstract function · 57c177af
      Iustin Pop authored
      Based on the previous OS search code changes, we can now move the OS
      search code into a generic look-for-file function in utils.py. This
      means that the allocator code can use the same function.
      
      Reviewed-by: ultrotter
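      A generic look-for-file helper of this kind might look like the sketch
      below (current Python 3). The name find_file, its parameters and the
      directory name used in the demonstration are assumptions for
      illustration, not necessarily what utils.py contains:

```python
import os
import tempfile


def find_file(name, search_path, test=os.path.exists):
    """Return the full path of the first match for `name`, or None.

    `search_path` is a list of directories, tried in order; `test` decides
    whether a candidate path qualifies (e.g. os.path.isdir for OS dirs).
    """
    for dir_name in search_path:
        candidate = os.path.join(dir_name, name)
        if test(candidate):
            return candidate
    return None


# Demonstration in throwaway directories.
with tempfile.TemporaryDirectory() as first, \
        tempfile.TemporaryDirectory() as second:
    os.mkdir(os.path.join(second, "debian-etch"))
    expected = os.path.join(second, "debian-etch")
    found = find_file("debian-etch", [first, second], test=os.path.isdir)
    missing = find_file("no-such-os", [first, second])
```

      Taking the test as a parameter is what lets the same function serve
      both OS directories and allocator scripts.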
    • Change backend._OSSearch return values · c34c0cfd
      Iustin Pop authored
      Currently, the function backend._OSSearch() returns the (first) base dir
      in which this OS can be found. Thereafter the full actual path to the OS
      dir is built in the backend.OSFromDisk() function.
      
      This patch changes this so that _OSSearch() always returns the full path
      to the OS directory, and OSFromDisk uses that as returned (it will only
      build it if it gets a base dir in the first place).
      
      This patch is needed before we can abstract the _OSSearch into a generic
      'look for file object' functionality that can be used for allocator
      plugins search too.
      
      Reviewed-by: ultrotter
  10. 05 Apr, 2008 1 commit
    • Backend directory functions for file backend · 778b75bb
      Manuel Franceschini authored
      Add _[Create,Remove,Rename]FileStorageDir functions, which are needed for
      file-based instance management. These functions check whether the given
      directory to operate on is under the cluster-wide defined default file
      storage dir. If this is not the case, they won't do anything and return
      False. This is to prevent cluster manipulation or damage.
      
      Reviewed-by: ultrotter
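      One possible shape for such a containment check, in current Python 3 on
      a POSIX system. The function names, the example base directory and the
      choice to accept the base directory itself are assumptions; the sketch
      also does not resolve symlinks:

```python
import os


def is_under_base(path, base_dir):
    """Return True if path is base_dir or a location beneath it."""
    if not os.path.isabs(path) or not os.path.isabs(base_dir):
        return False
    base = os.path.normpath(base_dir)
    target = os.path.normpath(path)  # collapses ".." components
    return os.path.commonpath([base, target]) == base


def create_file_storage_dir(path, base_dir):
    """Create path only if it lies inside base_dir; return True on success."""
    if not is_under_base(path, base_dir):
        return False  # refuse to touch anything outside the storage dir
    os.makedirs(path, exist_ok=True)
    return True


BASE = "/srv/ganeti/file-storage"  # hypothetical default file storage dir
inside = is_under_base(BASE + "/instance1.example.com", BASE)
escaped = is_under_base(BASE + "/../../etc", BASE)
sibling = is_under_base("/srv/ganeti/file-storage-old", BASE)
```

      Comparing whole path components rather than string prefixes is what
      rejects the sibling directory in the last case.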
  11. 18 Mar, 2008 1 commit
  12. 05 Mar, 2008 1 commit
  13. 29 Feb, 2008 1 commit
    • Fix master role stop on cluster destroy · c9064964
      Iustin Pop authored
      Currently the cluster destroy doesn't remove the master role, which
      means that the IP address of the cluster remains assigned to the master
      node.
      
      This patch fixes this and also a docstring in backend.StopMaster().
      
      Reviewed-by: imsnah
  14. 22 Feb, 2008 2 commits
  15. 14 Feb, 2008 1 commit
    • Alter the device activation code · 40a03283
      Iustin Pop authored
      This tiny patch fixes the breakage introduced by the previous patch about
      activation; it does so by removing the Close() call after activation.
      
      The initial reason for that call was that if the device is already
      active and open, but we need it closed, we close it automatically.
      
      This however conflicts with the 2-step open in the case the instance is
      already open.
      
      It makes sense to remove the call since in the current Ganeti setup,
      just doing Close() is not enough to change the device from (e.g.)
      primary to secondary, as some devices (e.g. md) might need Shutdown not
      Close.
      
      It also gets rid of a Close() in the CreateBlockDevice function, due to
      the same reasoning (although in Create the child should not have a
      different status anyway).
      
      Reviewed-by: imsnah
  16. 30 Jan, 2008 1 commit
    • Export bridge information too · 1cafd236
      Guido Trotter authored
      gnt-backup export used to export the ip and mac of each nic, but not which
      bridge it was connected to. This patch adds that information.
      
      Reviewed-by: iustinp
      
  17. 21 Jan, 2008 1 commit
    • Fix VG listing broken by r510 · d87ae7d2
      Iustin Pop authored
      LVM code sometimes adds an extra separator at the end of the field list.
      Make the code strip it if it exists.
      
      Reviewed-by: imsnah
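      A small sketch of the stripping step in current Python 3, assuming a
      ":" separator; split_fields and the sample line are hypothetical:

```python
def split_fields(line, sep=":"):
    """Split a line into fields, ignoring one optional trailing separator."""
    line = line.strip()
    if line.endswith(sep):
        line = line[:-len(sep)]  # drop only the single extra separator
    return line.split(sep)


with_extra = split_fields("  xenvg:102400.00:51200.00:\n")
without_extra = split_fields("  xenvg:102400.00:51200.00\n")
```

      Both calls yield the same three fields, so a later check on the number
      of fields does not depend on whether the extra separator was emitted.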
  18. 20 Jan, 2008 2 commits
    • Make backend._GetVGInfo check the validity of 'vgs' · f4d377e7
      Iustin Pop authored
      Currently, the function backend._GetVGInfo only checks for errors via
      the exit code of the 'vgs' command. However, there are other ways of
      failure so we need to also check for valid output before parsing.
      
      Furthermore, the checks on the exit code were reported via a 'raise
      LVMError'; however, this exception is not handled anywhere and so the
      remote caller will not get reasonable data.
      
      This patch does two main things:
        - change the calling protocol for this function to not raise an error,
          and instead return the same type of argument always (dict) with the
          requested keys but values changed into None; this allows in the
          parent rpc call node_info to have valid memory information but
          "error" value for disk space, if there's an error with disks
        - check the validity of the output so that in case we fail to parse
          it, we don't abort with a backtrace in the node daemon but instead
          return the default result value (containing errors), and log these
          cases in the node daemon log file
      
      We also bump the protocol version to 11.
      
      Reviewed-by: ultrotter
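      The calling convention described above can be sketched like this in
      current Python 3. The key names, the ":" separator, the field order and
      the function name parse_vg_info are assumptions for illustration:

```python
import logging

VG_KEYS = ("vg_size", "vg_free", "pv_count")  # assumed result keys


def parse_vg_info(output, sep=":"):
    """Parse "size:free:pv_count" output; on any problem return None values."""
    result = dict.fromkeys(VG_KEYS)  # every value starts as None
    fields = output.strip().rstrip(sep).split(sep)
    if len(fields) != len(VG_KEYS):
        logging.error("unexpected vgs output: %r", output)
        return result
    try:
        size, free = float(fields[0]), float(fields[1])
        pv_count = int(fields[2])
    except ValueError:
        logging.error("unparsable vgs output: %r", output)
        return result
    result.update(vg_size=int(round(size)), vg_free=int(round(free)),
                  pv_count=pv_count)
    return result


good = parse_vg_info("  102400.00:51200.00:2:\n")
bad = parse_vg_info("  Volume group xenvg not found\n")
```

      Because the result always has the same keys, a caller can still report
      memory information and show an error marker only for the disk values.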
    • Change a hardcoded path into its proper constant · 97628462
      Iustin Pop authored
      The function backend.UploadFile still uses "/etc/hosts" directly instead
      of the existing constant; this patch fixes this.
      
      Reviewed-by: ultrotter
  19. 16 Jan, 2008 1 commit
  20. 07 Jan, 2008 1 commit
    • Improve verify-disks: broken/missing LV detection · b63ed789
      Iustin Pop authored
      This patch improves the ‘gnt-cluster verify-disks’ command by adding
      support for detecting broken volume groups and missing logical volume
      names.
      
      As such, we no longer try to activate disks for instances where the
      activation is not likely to succeed anyway, and instead report them.
      
      Reviewed-by: schreiberal
  21. 11 Dec, 2007 1 commit
    • Return more data in rpc.call_volume_list · cb2037a2
      Iustin Pop authored
      Currently, the volume_list call returns only the volume size. However,
      it is useful to also have two other things: the 'inactive' state of the
      volume (which might trigger a ‘vgchange -a y’ on the volume group) and
      the online state (which shows if the volume is in use or not).
      
      Since this modifies an RPC call, we also bump the protocol version,
      although the single user of the call didn't care about the dictionary
      values, only about the keys.
      
      Reviewed-by: imsnah
  22. 04 Dec, 2007 3 commits
  23. 03 Dec, 2007 1 commit
  24. 14 Nov, 2007 1 commit
    • When an assembly error occurs log it too · 20a0c9ef
      Guido Trotter authored
      Right now an assembly error produces an exception but not a log message. This
      is bad because the exception suggests looking at the log, but the log itself
      has a lot of errors which are not really a problem and only some which really
      are. In order to make it clear where in the log the problem occurred we log a
      message too, before raising the exception.
      
      Reviewed-by: iustinp
  25. 12 Nov, 2007 1 commit
    • Fix a wrong comparison in _RecursiveAssembleBD · 7803d4d3
      Iustin Pop authored
      We want to prevent sending too many 'None' children to a device.
      However, the test as it is today is wrong, as we want to test the
      situation after adding a new child, and not before. This patch fixes
      this by testing greater-or-equal instead of just greater.
      
      Reviewed-by: imsnah
  26. 09 Nov, 2007 1 commit
  27. 07 Nov, 2007 1 commit
    • Enhance secondary node replace for drbd8 · 0834c866
      Iustin Pop authored
      This (big) patch does two things:
        - add "local disk status" to the block device checks
          (BlockDevice.GetSyncStatus and the rpc calls that call this
          function, and therefore cmdlib._CheckDiskConsistency)
        - improve the drbd8 secondary replace operation using the above
          functionality
      
      The "local disk status" adds a new variable to the result of
      GetSyncStatus that shows the degradation of the local storage of the
      device. Of course, not all devices support this - for now, we only modify
      LogicalVolumes and DRBD8 to return degraded in some cases, other devices
      always return non-degraded. This variable should be a subset of
      is_degraded - whenever this variable is true, the is_degraded should
      also be true.
      
      The drbd8 secondary replace uses this variable as we don't care if the
      primary drbd device is network-degraded, only if it has good local disk
      data (ldisk is False).
      
      The patch also increases the protocol version (due to rpc changes).
      
      Reviewed-by: imsnah
  28. 06 Nov, 2007 2 commits
    • Allow DRBD8 operation without backing storage · fc1dc9d7
      Iustin Pop authored
      This patch adds the following functionality:
        - DRBD8 devices can assemble without local storage (done by allowing
          None in the list of children, and making DRBD8 ignore all
          children if any is None)
        - DRBD8 devices can attach to (i.e. identify) a device which is not
          connected to backing storage but to the correct network ports; this
          is a rare case in normal operation (it's what would happen if one
          manually detaches the local disk, and the backing LV still exists)
      
      Reviewed-by: imsnah
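      The "ignore all children if any is None" rule could be expressed as
      below (current Python 3). The helper name usable_children and the
      device paths are hypothetical and only illustrate the rule:

```python
def usable_children(children):
    """Return [] if any backing device is missing, else a copy of the list."""
    if any(child is None for child in children):
        return []  # run without local storage rather than with partial storage
    return list(children)


diskless = usable_children(["/dev/xenvg/data", None])
normal = usable_children(["/dev/xenvg/data", "/dev/xenvg/meta"])
```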
    • Change the way remove children is called in bdev · e739bd57
      Iustin Pop authored
      For some cases, we don't have to have access to the children of a device
      in order to remove them (e.g. md over lvs, or drbd over lvs). In order
      to ease the removal process, skip over finding the child if it provides
      a static dev path.
      
      This is needed in order to support removal of children when the
      underlying storage has gone away.
      
      Reviewed-by: imsnah
  29. 05 Nov, 2007 2 commits
    • Fix an unhandled error case in device creation · cf5a8306
      Iustin Pop authored
      The block device creation process is the following:
        - device create
        - device assembly (on primary or depending on dev_type, on secondary
          too)
        - set sync speed
        - return
      
      The problem is that device assembly after creation was not checked for
      errors, and as this is a very unusual case, we did not have problems
      with it (or we didn't detect them). The recent DevCacheManager however
      tripped on this case (because the dev_path of the device is None if the
      assembly fails) and the creation aborted with an unclear error message.
      
      The patch adds a check for the assembly success and aborts the creation
      of the device in this case - the error is quite clear in the instance
      add, for example. The patch also changes DevCacheManager to log the
      cases when dev_path is None but not raise an error (keeping consistent
      with the goal that the cache manager should be transparent to the code).
      
      For the record, this error case was detected with a mismatch between
      drbd kernel module and utilities.
      
      Reviewed-by: imsnah
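      The creation sequence with the added check can be sketched as follows
      in current Python 3. FakeDevice, DeviceError, create_block_device, the
      device path and the sync speed value are stand-ins invented for the
      example, not the classes used by the backend:

```python
class DeviceError(Exception):
    """Raised when a block device cannot be created or assembled."""


class FakeDevice:
    """Stand-in for a block device; `can_assemble` controls the outcome."""

    def __init__(self, can_assemble):
        self.can_assemble = can_assemble
        self.dev_path = None
        self.sync_speed = None

    def create(self):
        pass  # a real device would allocate its storage here

    def assemble(self):
        if self.can_assemble:
            self.dev_path = "/dev/fake0"
        return self.can_assemble

    def set_sync_speed(self, speed):
        self.sync_speed = speed


def create_block_device(device, sync_speed):
    """Create, assemble and configure a device; abort if assembly fails."""
    device.create()
    if not device.assemble() or device.dev_path is None:
        # Previously this case went unnoticed and failed later, unclearly.
        raise DeviceError("device could not be assembled after creation")
    device.set_sync_speed(sync_speed)
    return device.dev_path


path = create_block_device(FakeDevice(can_assemble=True), sync_speed=61440)
try:
    create_block_device(FakeDevice(can_assemble=False), sync_speed=61440)
    failed = False
except DeviceError:
    failed = True
```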
    • Miscellaneous style fixes · 65fe4693
      Iustin Pop authored
      This patch fixes some minor pylint warnings (unused variables, wrong
      indentation, etc.) and a real bug in the recovery for drbd8 rename
      procedure.
      
      Reviewed-by: imsnah