1. 07 Nov, 2007 2 commits
    • Enhance secondary node replace for drbd8 · 0834c866
      Iustin Pop authored
      This (big) patch does two things:
        - add "local disk status" to the block device checks
          (BlockDevice.GetSyncStatus and the rpc calls that call this
          function, and therefore cmdlib._CheckDiskConsistency)
        - improve the drbd8 secondary replace operation using the above
          functionality
      
      The "local disk status" adds a new variable to the result of
      GetSyncStatus that shows the degradation of the local storage of the
      device. Of course, not all devices support this: for now, we only modify
      LogicalVolumes and DRBD8 to return degraded in some cases; other devices
      always return non-degraded. This variable should imply is_degraded:
      whenever it is true, is_degraded should also be true.
      
      The drbd8 secondary replace uses this variable as we don't care if the
      primary drbd device is network-degraded, only if it has good local disk
      data (ldisk is False).
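
      A minimal sketch of the invariant and of the check used by the secondary
      replace, assuming a hypothetical SyncStatus record; the actual shape and
      field order of the GetSyncStatus result are not shown in this log, and
      the helper names are illustrative only.

```python
from typing import NamedTuple, Optional


class SyncStatus(NamedTuple):
    """Hypothetical stand-in for the result of GetSyncStatus."""
    sync_percent: Optional[float]  # None when no resync is running
    is_degraded: bool              # degraded in any way (network or local disk)
    ldisk: bool                    # True when the local storage is degraded


def make_status(sync_percent, net_degraded, ldisk):
    # ldisk implies is_degraded: a bad local disk always counts as degraded.
    return SyncStatus(sync_percent, net_degraded or ldisk, ldisk)


def primary_usable_for_secondary_replace(status):
    # Network degradation is tolerated; only the local disk state matters.
    return not status.ldisk
```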
      
      The patch also increases the protocol version (due to rpc changes).
      
      Reviewed-by: imsnah
    • Check whether init.d script is executable. · 7dd30006
      Michael Hanselmann authored
      Reviewed-by: schreiberal
  2. 06 Nov, 2007 1 commit
  3. 05 Nov, 2007 3 commits
    • Handle missing init script at cluster init · c7b46d59
      Iustin Pop authored
      This patch adds a check in the prereq of LUInitCluster for the existence
      of the init script.  This allows a clean abort instead of a stack dump.
      
      Based on a report by admin@steibei.net
      
      Reviewed-by: ultrotter
    • Miscellaneous style fixes · 65fe4693
      Iustin Pop authored
      This patch fixes some minor pylint warnings (unused variables, wrong
      indentation, etc.) and a real bug in the recovery for drbd8 rename
      procedure.
      
      Reviewed-by: imsnah
    • Convert os_get to use OS rather than InvalidOS · dfa96ded
      Guido Trotter authored
      To keep this simple, we leave the OSFromDisk function as-is and convert
      any exception it raises to an OS object in ganeti-noded. This simplifies
      both the unmangling and the code that checks whether the OS is valid.
      
      Reviewed-By: iustinp
      
  4. 04 Nov, 2007 1 commit
    • Make call_os_get a single node function · 00fe9e38
      Guido Trotter authored
      call_os_get is never called with a real list of nodes, so there's no point in
      it being multi-node. Make it single-node until a use for a multi-node call is
      found.
      
      Reviewed-By: iustinp
  5. 03 Nov, 2007 1 commit
    • Implement tag searching · 73415719
      Iustin Pop authored
      This patch adds a search command for locating tags on all objects of the
      cluster using a regex pattern.
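
      As a rough illustration of a regex search over the tags of all objects,
      here is a self-contained sketch. The function name, the object paths and
      the tag values are hypothetical and do not come from the patch.

```python
import re


def search_tags(pattern, tagged_objects):
    """Return sorted (path, tag) pairs whose tag matches the regex pattern."""
    regex = re.compile(pattern)
    return sorted(
        (path, tag)
        for path, tags in tagged_objects.items()
        for tag in tags
        if regex.search(tag)
    )


# Hypothetical object paths and tags, for illustration only.
objects = {
    "/cluster": {"production"},
    "/nodes/node1.example.com": {"rack-a", "prod-db"},
    "/instances/web1.example.com": {"staging"},
}
matches = search_tags(r"^prod", objects)
```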
      
      Reviewed-by: aat
  6. 02 Nov, 2007 1 commit
    • Implement device to instance mapping cache · 3f78eef2
      Iustin Pop authored
      Currently, troubleshooting DRBD problems involves a manual process of going
      backwards from the DRBD device to the instance that owns it.
      
      This patch adds a weak (i.e. not guaranteed to be correct or up-to-date)
      cache mapping devices to instances. In normal operation the cache should
      hold correct information, as devices change paths only when they are
      started or stopped, and the code in backend.py adds cache updates to
      exactly these operations.
      
      The only drawback of this implementation is that we don't fully update
      the cache on renames of devices (we clean the old entries but we don't
      add new ones). Since the rename changes the path only for LVs (and not
      drbd and md), this is less of a problem as the target of this code is
      debugging DRBD and MD issues.
      
      The patch writes files named bdev_drbd<N> (or bdev_md<N>,
      bdev_xenvg_...) in /var/run/ganeti (more exactly, LOCALSTATEDIR/ganeti).
      The files start with 'bdev_' and continue with the path of the device
      under /dev/ (this prefix stripped), and contain the following values,
      space separated:
        - instance name
        - primary or secondary (depending on whether the device is on the
          primary or secondary node)
        - instance visible name: sda or sdb or not_visible, the latter case
          when the device is not the top-level device (e.g. remote_raid1
          templates will have sd[ab] for the md, but not_visible for drbd and
          logical volumes)
      
      The cache is designed to not raise any errors; if there is an I/O error,
      it will only be logged in the node daemon log file. This is in order to
      reduce the possible impact of the cache on the block device activation
      and shutdown code.
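
      The file naming, the space-separated contents and the log-only error
      handling described above could look roughly like the sketch below. The
      function names and signatures are hypothetical, and flattening "/" into
      "_" in the file name is an assumption inferred from the bdev_xenvg_...
      example; this is not the code from backend.py.

```python
import logging
import os


def cache_file_name(cache_dir, dev_path):
    # Strip the /dev/ prefix and flatten the remainder into one file name.
    name = dev_path[len("/dev/"):] if dev_path.startswith("/dev/") else dev_path
    return os.path.join(cache_dir, "bdev_" + name.replace("/", "_"))


def update_cache(cache_dir, dev_path, instance, on_primary, visible_name=None):
    """Write one cache entry; I/O errors are logged, never raised."""
    role = "primary" if on_primary else "secondary"
    line = "%s %s %s\n" % (instance, role, visible_name or "not_visible")
    try:
        with open(cache_file_name(cache_dir, dev_path), "w") as handle:
            handle.write(line)
    except OSError as err:
        logging.error("Can't update bdev cache for %s: %s", dev_path, err)


def remove_cache(cache_dir, dev_path):
    """Drop a cache entry, e.g. on device shutdown; errors are only logged."""
    try:
        os.remove(cache_file_name(cache_dir, dev_path))
    except OSError as err:
        logging.error("Can't remove bdev cache for %s: %s", dev_path, err)
```

      Because both helpers swallow OSError, a missing or read-only cache
      directory cannot make device activation or shutdown fail.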
      
      Reviewed-by: imsnah
  7. 31 Oct, 2007 2 commits
    • More sane handling of errors during failover · 24a40d57
      Iustin Pop authored
      Currently we ignore errors on instance shutdown (on the source node)
      during instance failover. We should do this only if the user gave a
      command-line option allowing it, as it's a dangerous thing to do.

      This patch fixes this by using the same "--ignore-consistency" option
      to decide whether to continue or abort. It also expands the man page a
      bit.
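
      The decision described above can be sketched as follows. All names here
      (FailoverError, shutdown_on_source and its parameters) are hypothetical
      and only illustrate the continue-or-abort logic, not the actual LU code.

```python
class FailoverError(Exception):
    """Hypothetical error type used to abort the failover."""


def shutdown_on_source(shutdown_fn, ignore_consistency, warn_fn):
    """Shut the instance down on the source node before failing over."""
    if shutdown_fn():
        return
    if ignore_consistency:
        # The user explicitly accepted the risk: warn and carry on.
        warn_fn("could not shut down instance on source node; continuing")
    else:
        raise FailoverError("could not shut down instance on source node")
```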
      
      Reviewed-by: imsnah
    • Fix bridge checking in instance failover · 50ff9a7a
      Iustin Pop authored
      The current code checks the bridge on the primary node of the instance,
      but we need to check it on the destination node.
      
      This was caught by testing failover with a down primary node.
      
      Reviewed-by: imsnah
  8. 30 Oct, 2007 1 commit
  9. 29 Oct, 2007 2 commits
    • Change the signature of some methods of mcpu.Processor · 1a8c0ce1
      Iustin Pop authored
      This patch moves the passing of the feedback_fn argument from the
      (Exec|Chain)OpCode to the initialization of the Processor instance.
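
      A rough sketch of the resulting call pattern, using plain strings as
      stand-ins for opcodes. The class body is hypothetical; only the idea of
      binding feedback_fn at construction time comes from the patch.

```python
class Processor:
    """Hypothetical sketch: feedback_fn is bound once, at construction."""

    def __init__(self, feedback_fn):
        self._feedback_fn = feedback_fn

    def ExecOpCode(self, op):
        self._feedback_fn("executing %s" % op)
        return op

    def ChainOpCode(self, op):
        # Chained opcodes reuse the same feedback function automatically.
        return self.ExecOpCode(op)


messages = []
proc = Processor(messages.append)
proc.ExecOpCode("op_a")
proc.ChainOpCode("op_b")
```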
      
      Reviewed-by: imsnah
    • Implement replace-disks for drbd8 devices · a9e0c397
      Iustin Pop authored
      This patch adds three modes of disk replacement for drbd8:
        - replace the disk on the primary node
        - replace the disk on the secondary node
        - replace the secondary node
      
      It also adds some debugging code to backend.py and increments the
      protocol version for the recent changes of the rpc layer.
      
      Reviewed-by: imsnah
  10. 25 Oct, 2007 1 commit
    • Modify two mirror-device related rpc calls · 153d9724
      Iustin Pop authored
      The two calls mirror_addchild and mirror_removechild take only one child
      for addition/removal. While this is enough for our md usage, for local
      disk replacement in drbd8, we need to be able to specify both the data
      and metadata device. This patch modifies these two rpc calls (and their
      backend implementation and their usage in cmdlib) to take a list of
      children to add/remove.
      
      Reviewed-by: imsnah
  11. 24 Oct, 2007 2 commits
    • Initial implementation of drbd8 template type · a1f445d3
      Iustin Pop authored
      This is a partially working drbd8 template type. It does:
        - add/remove
        - startup/failover/shutdown
      
      Replace disks does not work yet; it needs custom code for this template.
      
      Reviewed-by: imsnah
    • Fix a disk handling bug triggered by failover · b352ab5b
      Iustin Pop authored
      This leaves an instance's disks configured for the primary node as after
      disk activation we want to start the instance anyway. As such,
      _GatherBlockDevs in backend.py will need the disks configured for the
      primary.
      
      Reviewed-by: imsnah
  12. 19 Oct, 2007 1 commit
    • Abstract more string values into constants · fe96220b
      Iustin Pop authored
      Currently, the disk types are defined using string literals in the code.
      Convert those into constants so that we can easily find them and check
      their usage.
      
      Note that we don't rename the values of the constants as they are used
      in the configuration file, and as such it's best to leave them as they
      are.
      
      Reviewed-by: imsnah
  13. 18 Oct, 2007 1 commit
    • Patch series for reboot feature, part 2 · bf6929a2
      Alexander Schreiber authored
      This patch series implements the reboot command for gnt-instance. It
      supports three types of reboot: soft (hypervisor reboot), hard (instance
      config rebuild and reboot) and full (full instance shutdown and startup
      again).
      
      This patch contains the opcode and lu part.
      
      Reviewed-by: iustinp
      
  14. 16 Oct, 2007 4 commits
  15. 11 Oct, 2007 2 commits
    • Some small improvements to the hooks environment · 0e137c28
      Iustin Pop authored
      For the configuration update hook, it's useful to have a consistent name
      for the target of the operation. As such, the LU code is modified to
      include a GANETI_OP_TARGET variable that points to the cluster name,
      node name or instance name, depending on the opcode.

      Also, the NoHooksLU is modified such that its build env method returns
      an empty (but conformant) result. This should improve things in case
      this class's BuildHooksEnv is called by mistake.
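
      A minimal sketch of both changes. It assumes, without confirmation from
      this log, that BuildHooksEnv returns a three-element tuple of
      (environment, pre-hook nodes, post-hook nodes); the constructor argument
      is hypothetical as well.

```python
class LogicalUnit:
    """Hypothetical base class; only the hooks-related part is sketched."""

    def __init__(self, target):
        self.target = target  # cluster, node or instance name

    def BuildHooksEnv(self):
        # Assumed result shape: (env dict, pre-hook nodes, post-hook nodes).
        env = {"GANETI_OP_TARGET": self.target}
        return env, [], []


class NoHooksLU(LogicalUnit):
    def BuildHooksEnv(self):
        # Empty but conformant: same shape, nothing to run.
        return {}, [], []
```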
      
      Reviewed-by: imsnah
    • Split the hooks env building in two parts · 4167825b
      Iustin Pop authored
      This patch moves some of the environment processing from _BuildEnv to a
      new _RunWrapper command which does the stringification and adds the
      sstore variables.
      
      The reasoning is that the sstore can be fresher than before the
      execution (e.g. in case of cluster init).

      In order to support this, we also need to modify cmdlib.LUInitCluster:
        - memorize the sstore and cfgw newly created in the Exec function
        - no need to build the custom environment in the BuildHooks
  16. 10 Oct, 2007 2 commits
    • Remove fping as a dependency for Ganeti. · 16abfbc2
      Alexander Schreiber authored
      This patch completely gets rid of fping:
       - replace all fping invocations with TcpPing calls
       - update documentation accordingly
       - associated cleanups (use constant for localhost IP, use more sensible
         defaults for TcpPing and _use_ those)
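
      For readers unfamiliar with the idea, a TCP-based reachability check can
      be sketched with the standard socket module as below. This is not the
      TcpPing implementation from the patch: the function name, signature,
      default timeout and constant name are assumptions.

```python
import socket

LOCALHOST_IP = "127.0.0.1"  # named constant instead of a scattered literal


def tcp_ping(target, port, timeout=10.0):
    """Return True if a TCP connection to (target, port) succeeds."""
    try:
        with socket.create_connection((target, port), timeout=timeout):
            return True
    except OSError:
        return False
```

      Unlike fping, this needs no external binary and no ICMP privileges, but
      it only succeeds if something is listening on the chosen port.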
      
      Reviewed-by: iustinp
      
    • Remove the shebang from modules · 2f31098c
      Iustin Pop authored
      Since modules are not directly executable, remove the shebang from
      them. This helps with lintian warnings.
      
      Also make the autogenerated _autoconf.py contain two comment lines at
      the beginning, like the other modules.
      
      Reviewed-by: ultrotter
  17. 08 Oct, 2007 3 commits
  18. 21 Sep, 2007 1 commit
    • Remove requirement that host names are FQDN · 89e1fc26
      Iustin Pop authored
      We currently require that hostnames are FQDNs, not short names
      (node1.example.com instead of node1). We can allow short names as long
      as:
        - we always resolve the names as returned by socket.gethostname()
        - we rely on having a working resolver
      
      These issues are not as big as they may seem, as we only did
      gethostname() in a few places in order to check for the master; we
      already required a working resolver all over the code for the other
      nodes' names (and thus requiring the same for the current node name is
      normal). The patch moves some resolver calls from the execution path to
      the checking path (which can abort without any problems). It is
      important that after this patch is applied, no name resolving is called
      from the execution path (LU.Exec() or other code that is called from
      within those methods), as this gives us much better code flow.
      
      This patch also changes the functions for doing name lookups and
      encapsulates all functionality in a single class.
      
      The final change is that, by requiring a working resolver at all times, we
      can change the 'return None' into an exception and thus we don't have to
      check manually each time; only some special cases will check
      (ganeti-daemon and ganeti-watcher, which are not covered by the
      generalized exception handling in cli.py). The code is cleaner this way.
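
      The two ideas above, raising instead of returning None and resolving
      names in the checking path, can be sketched as follows. ResolverError,
      lookup and check_prereq are hypothetical names; the resolver is passed in
      as a parameter only so the sketch can be exercised without a network.

```python
import socket


class ResolverError(Exception):
    """Hypothetical error raised when a host name cannot be resolved."""


def lookup(name, resolver=socket.gethostbyname):
    # Raise instead of returning None, so callers need no manual check.
    try:
        return resolver(name)
    except socket.gaierror as err:
        raise ResolverError("Can't resolve %r: %s" % (name, err)) from err


def check_prereq(node_names, resolver=socket.gethostbyname):
    """Resolve everything up front; the execution path then does no lookups."""
    return {name: lookup(name, resolver) for name in node_names}
```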
      
      Reviewed-by: imsnah
  19. 19 Sep, 2007 2 commits
    • Allow 'add instance' to not start the new instance · bdd55f71
      Iustin Pop authored
      This patch allows 'gnt-instance add' to not start the newly-created
      instance. It also allows 'gnt-instance add' and 'gnt-backup import' to
      not check for IP conflicts (only when not starting the instance).
      
      Reviewed-by: ultrotter
    • Change resolved hostname from dict to a class · bcf043c9
      Iustin Pop authored
      The current result of utils.LookupHostname() is a dict, but this does
      not allow static checkers to check the correctness of the code. This
      patch introduces a new class named HostInfo and changes LookupHostname
      to return an instance of this class; this allows better checking of the
      code (and also the code is cleaner).
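
      To illustrate why a class helps here, the sketch below wraps the tuple
      returned by socket.gethostbyname_ex in a small object. The attribute
      names and the lookup_hostname helper are assumptions, not the actual
      HostInfo definition.

```python
import socket


class HostInfo:
    """Hypothetical sketch of a resolved-hostname record."""

    def __init__(self, name, aliases, ips):
        self.name = name
        self.aliases = list(aliases)
        self.ip = ips[0]  # first address is treated as the primary one


def lookup_hostname(hostname, resolver=socket.gethostbyname_ex):
    # gethostbyname_ex returns (canonical name, alias list, address list).
    name, aliases, ips = resolver(hostname)
    return HostInfo(name, aliases, ips)
```

      A misspelled attribute such as info.nmae can be flagged by a static
      checker, whereas a misspelled dict key is only found when the code runs.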
      
      Reviewed-by: ultrotter
  20. 18 Sep, 2007 1 commit
    • Implement cluster rename operation · 07bd8a51
      Iustin Pop authored
      This patch adds a new OpCode (and corresponding LU) that implements the
      cluster rename functionality.
      
      This is done by shutting down the master role, making the needed sstore
      modifications and distributing the changed files to all nodes, and then
      re-enabling the master role.
      
      The modification to the man page of gnt-cluster also moves the section
      on gnt-cluster destroy in order to correct alphabetical ordering.
      
      Reviewed-by: imsnah
  21. 17 Sep, 2007 1 commit
  22. 14 Sep, 2007 5 commits
    • Change OpQueryNodes nodes attribute to names · 246e180a
      Iustin Pop authored
      Change this to have the exact same parameters as OpQueryInstances.
      
      Also fix burnin, which has been broken since r146.
      
      Reviewed-by: imsnah
    • Enable LUQueryInstances to work with a given list of instances · 069dcc86
      Iustin Pop authored
      As per the changes to LUQueryNodes, the QueryInstances LU is modified to
      accept a list of instances for which to compute and return information.
      
      Reviewed-by: imsnah
    • Remove OpQueryNodeData and LUQueryNodeData · 4a72cc75
      Iustin Pop authored
      Now that LUQueryNodes supports all the functionality of LUQueryNodeData,
      let's migrate gnt-node.ShowNodeConfig to use it and remove all traces of
      OpQueryNodeData and LUQueryNodeData.
      
      Reviewed-by: imsnah
    • Change LUQueryNodes to return raw values and support selective listing · ec223efb
      Iustin Pop authored
      LUQueryNodes is very similar to LUQueryNodeData, but it lacks two
      features:
        - instance list (it has count though), both primary and secondary
        - selective node listing

      In order to support these features, we change it to return raw values
      instead of stringified ones (like the recent change to LUQueryInstances)
      and to support querying of a restricted set of nodes.
      
      This CL also modifies the gnt-node script to conform to the new protocol
      and the opcode OpQueryNodes to support the new "nodes" attribute.
      
      Reviewed-by: imsnah
    • Change _GetWanted* to return names instead of objects · a7ba5e53
      Iustin Pop authored
      On closer look, all except one of the current users of _GetWantedNodes are
      using only the name of the nodes and throw away the other attributes. It makes
      sense to make this function return only the name list (as in the future this
      might be faster than computing all attributes).
      
      Reviewed-by: imsnah