  1. Jun 06, 2008
  2. May 31, 2008
  3. May 30, 2008
    • Iustin Pop's avatar
      Complete removal of md/drbd 0.7 code · abdf0113
      Iustin Pop authored
      This patch removes the last of the md and drbd 0.7 code. Clusters that
      have the old device types will break if this patch is applied to them.
      
      Reviewed-by: imsnah
      abdf0113
    • Iustin Pop's avatar
      LURemoveInstance: fix op.ignore_failures usage · 5c54b832
      Iustin Pop authored
      Currently the LURemoveInstance.Exec() method uses the ignore_failures
      attribute of the OpRemoveInstance opcode, but it doesn't check for its
      existence. The patch adds this attribute to _OP_REQP and to all the
      places where this opcode is created.
      
      This attribute is always passed by gnt-instance, but burnin didn't pass
      it, so burnin can fail if it enters the 'fail to remove disks' branch of
      the method (which is why the bug was not triggered until now).
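
A minimal sketch of the kind of required-parameter check this fix relies on. Only `_OP_REQP`, `ignore_failures`, `OpRemoveInstance` and `LURemoveInstance` come from the message above. The class bodies, the `instance_name` attribute, the `check_required` helper and the exception type are illustrative assumptions, not Ganeti's actual code.

```python
class OpRemoveInstance:
    """Stand-in opcode: stores whatever keyword arguments it is given."""

    def __init__(self, **kwargs):
        for name, value in kwargs.items():
            setattr(self, name, value)


class LURemoveInstance:
    # Attributes the opcode must carry before Exec() may run.
    # "instance_name" is an assumed example; "ignore_failures" is the
    # attribute named in the commit message.
    _OP_REQP = ["instance_name", "ignore_failures"]

    def __init__(self, op):
        self.op = op
        self.check_required()

    def check_required(self):
        """Hypothetical helper: reject opcodes missing a required attribute."""
        missing = [name for name in self._OP_REQP if not hasattr(self.op, name)]
        if missing:
            raise ValueError("missing opcode parameters: %s" % ", ".join(missing))
```

With a check like this, a caller that forgets the attribute fails when the opcode is submitted, rather than only when a rarely taken branch reads it.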
      
      Reviewed-by: ultrotter, imsnah
      5c54b832
  4. May 29, 2008
  5. May 24, 2008
  6. May 15, 2008
  7. May 13, 2008
    • Iustin Pop's avatar
      Small style fixes · 8d59409f
      Iustin Pop authored
      [Trunk version]
      
      Reviewed-by: imsnah
      8d59409f
    • Iustin Pop's avatar
      Implement node daemon connectivity tests · 9d4bfc96
      Iustin Pop authored
      This patch adds inter-node tcp communication checks to gnt-cluster
      verify. The checks use the node daemon port and cover both the primary
      and (if defined) secondary networks.
      
      The output looks like (4-node cluster, one with the secondary interface
      down):
      * Verifying node node1.example.com
        - ERROR: tcp communication with node 'node3.example.com': failure using the secondary interface(s)
      * Verifying node node2.example.com
        - ERROR: tcp communication with node 'node3.example.com': failure using the secondary interface(s)
      * Verifying node node3.example.com
        - ERROR: tcp communication with node 'node1.example.com': failure using the secondary interface(s)
        - ERROR: tcp communication with node 'node2.example.com': failure using the secondary interface(s)
        - ERROR: tcp communication with node 'node4.example.com': failure using the secondary interface(s)
      * Verifying node node4.example.com
        - ERROR: tcp communication with node 'node3.example.com': failure using the secondary interface(s)
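
A rough sketch of how such a per-node check could be structured. The function names, the shape of the `nodes` mapping and the timeout are assumptions made for illustration. The actual check runs inside the node daemon and is not shown on this page.

```python
import socket


def tcp_ping(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_node_connectivity(nodes, port, ping=tcp_ping):
    """Return a list of (node_name, interface_kind) pairs that failed.

    `nodes` maps a node name to (primary_ip, secondary_ip or None).
    """
    failures = []
    for name, (primary_ip, secondary_ip) in sorted(nodes.items()):
        if not ping(primary_ip, port):
            failures.append((name, "primary"))
        # The secondary network is only checked when it is defined.
        if secondary_ip is not None and not ping(secondary_ip, port):
            failures.append((name, "secondary"))
    return failures
```

Passing `ping` as a parameter keeps the reporting logic testable without real network access.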
      
      Reviewed-by: imsnah
      9d4bfc96
    • Michael Hanselmann's avatar
      Forward-port changes made to readd in 1.2 · 102b115b
      Michael Hanselmann authored
      qa_node.py: Fix typo in message
      cmdlib.py: Don't add readded node to node list
      ganeti-qa.py: Make sure readd isn't done for master node
      
      Reviewed-by: iustinp
      102b115b
    • Iustin Pop's avatar
      CLI: retry: remove command opts/args in "gnt-X" · 4e713df6
      Iustin Pop authored
      This new version of the patch removes only the listing of the usage in
      the "gnt-X" list, but keeps the strings in since we'll want to enhance
      and use them in "gnt-X $cmd --help".
      
      Reviewed-by: ultrotter
      4e713df6
    • Iustin Pop's avatar
      Revert "CLI: remove command opts/args in "gnt-X"" · 9a033156
      Iustin Pop authored
      This reverts commit 976.
      
      Reviewed-by: ultrotter
      9a033156
    • Iustin Pop's avatar
      CLI: remove command opts/args in "gnt-X" · 57d0151e
      Iustin Pop authored
      [Forward-port of the 1.2 branch patch]
      
      This patch removes all the parameters and options from the output of
      "gnt-X" (i.e. the subcommand list for the command). This is done in
      order to make the output uniform: currently only some parameters are
      shown, and they are not always consistent (e.g. required versus
      important parameters).
      
      Reviewed-by: ultrotter
      57d0151e
    • Iustin Pop's avatar
      Watcher: do not activate disks for started instances · eee1fa2d
      Iustin Pop authored
      Currently the watcher first runs the instance startup and then the
      boot-id method of disk reactivation. However, regardless of whether a
      node has rebooted, if we just started an instance there's no need for
      its disks to be activated again, since the instance start has already
      done that (if it is at all possible).
      
      The patch modifies the watcher to remember all started instances and not
      run activate-disks for them.
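
The idea can be sketched as follows. This is an illustrative model only: the dictionary keys, the function name and the two callbacks are assumptions, and the real watcher works on cluster state rather than plain dicts. The admin_down skip reflects the related watcher change listed on this page.

```python
def run_watcher_pass(instances, rebooted_nodes, start_instance, activate_disks):
    """One watcher pass: restart instances, then reactivate disks.

    `instances` is a list of dicts with the hypothetical keys "name",
    "node", "admin_up" and "running".
    """
    started = set()
    for inst in instances:
        if inst["admin_up"] and not inst["running"]:
            start_instance(inst["name"])
            started.add(inst["name"])

    for inst in instances:
        if inst["node"] not in rebooted_nodes:
            continue
        if inst["name"] in started:
            # Starting the instance has already activated its disks.
            continue
        if not inst["admin_up"]:
            # Administratively down instances are left alone.
            continue
        activate_disks(inst["name"])
    return started
```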
      
      Reviewed-by: ultrotter
      eee1fa2d
    • Iustin Pop's avatar
      Watcher: do not activate disks for admin_down · 0c0f834d
      Iustin Pop authored
      Currently the watcher does activate disks (via bootid mechanisms) even
      for admin_down instances.  This patch logs and skips over these
      instances.
      
      Reviewed-by: ultrotter
      0c0f834d
    • Iustin Pop's avatar
      Reduce chance of ssh failures in verify cluster · b544cfe0
      Iustin Pop authored
      The cluster verify builds a sorted list of nodes and passes that to all
      the nodes (in parallel) for ssh checks. This means that for a cluster
      with N nodes, there will be approximately N simultaneous connections to
      the first node, then to the second node, etc. This, coupled with the
      ssh daemon's “MaxStartups” parameter, can create false alarms about ssh
      connectivity.
      
      This patch randomizes the node list in the backend, so each node should
      have its own order of ssh-ing to the other nodes, and the chance of
      these false alarms should be reduced.
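
A small sketch of the randomization, assuming each node shuffles its own copy of the list it receives. The function name and the optional `rng` parameter are illustrative and not taken from the Ganeti source.

```python
import random


def ssh_check_order(node_list, rng=random):
    """Return a shuffled copy of node_list; the input is left untouched."""
    order = list(node_list)
    rng.shuffle(order)
    return order
```

Because every node shuffles independently, the N simultaneous first connections are spread across the cluster instead of all landing on the first node of the sorted list.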
      
      Reviewed-by: ultrotter
      b544cfe0
  8. May 12, 2008
    • Iustin Pop's avatar
      bdev: always log command output if it failed · 6c896e2f
      Iustin Pop authored
      Currently many error handling code paths in bdev.py log only
      result.fail_reason (i.e. exit code or signal that killed the command)
      but not its output. This makes debugging very hard.
      
      The patch changes all places where we only log fail_reason to also log
      result.output.
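
As an illustration of the logging pattern, here is a sketch in which `RunResult` and `log_failure` are hypothetical stand-ins. Only the `fail_reason` and `output` attribute names come from the message above.

```python
import logging
from collections import namedtuple

# Hypothetical result type mirroring the attributes named in the commit.
RunResult = namedtuple("RunResult", ["failed", "fail_reason", "output"])


def log_failure(action, result, logger=logging):
    """Log both the failure reason and the captured command output."""
    message = "%s failed (%s): %s" % (
        action,
        result.fail_reason,
        result.output.strip(),
    )
    logger.error(message)
    return message
```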
      
      Reviewed-by: ultrotter
      6c896e2f
  9. May 10, 2008
    • Iustin Pop's avatar
      DRBD: Fix another bug in diskless activation · ab6cc81c
      Iustin Pop authored
      DRBD8 requires that we pass ‘--create-device’ to the first command that
      wants to activate a new DRBD minor. We do this currently when we run the
      “drbdsetup ... disk” command which we run before the network setup.
      
      But if the LVs are missing, we skip the ‘disk’ subcommand and run only
      the ‘net’ one, so it might be that the activation fails because the
      minor we selected was never created in the first place.
      
      The patch adds the required parameter to the DRBD8._AssembleNet() call.
      Since it's a no-op for existing minors, it should not create any
      problems (tested and works both with configured and unconfigured
      minors).
      
      Reviewed-by: ultrotter
      ab6cc81c
  10. May 09, 2008
    • Michael Hanselmann's avatar
      Remove utils.CheckDaemonAlive and use “xm info” instead · e3e66f02
      Michael Hanselmann authored
      There are a couple of reasons for doing so:
      - /proc is not OS independent, it's only supported by Linux (there are
        emulations on other systems, but those might differ from the way
        Linux represents data).
      - Checking a daemon's state doesn't necessarily mean it's usable.
        Connecting to the socket using “xm info” is much safer.
      - Reduce code size.
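
A sketch of the replacement check, under the assumption that “xm info” exits with status 0 when the hypervisor daemon is reachable. The function name and the way the command is passed in are illustrative, not the actual Ganeti code.

```python
import subprocess


def hypervisor_usable(cmd=("xm", "info")):
    """Return True if the probe command runs and exits successfully."""
    try:
        proc = subprocess.run(
            list(cmd),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # The binary is missing or cannot be executed.
        return False
    return proc.returncode == 0
```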
      
      Reviewed-by: iustinp
      e3e66f02
  11. May 08, 2008
    • Guido Trotter's avatar
      Improve DRBD8.Open's docstring a bit more · f860ff4e
      Guido Trotter authored
      Reviewed-by: iustinp
      
      f860ff4e
    • Guido Trotter's avatar
      Fix comment typo in bdev.py · 7b62772e
      Guido Trotter authored
      Reviewed-by: iustinp
      
      7b62772e
    • Iustin Pop's avatar
      Fix DRBD8 diskless assembling · bf25af3b
      Iustin Pop authored
      The algorithm for attaching to existing DRBD devices is not trivial. It
      has four alternatives, and there is a bug in the last one when we have
      diskless devices.
      
      In the last case (local disk info matches but remote/network
      configuration doesn't match), we disconnect from the network and
      reattach with the correct info. We do this because a correct local
      device has higher priority than the remote device.
      
      However, the test we use (self._MatchesLocal) can succeed in two cases:
        - we have a disk and it's the same as the one attached
        - we don't have a disk and the drbd is in diskless mode
      
      But this creates problems for the fourth case: when we already have
      one diskless DRBD, activating the next one will do:
        - _MatchesLocal? yes, because both config data and system have no
          disks (with the effect that all diskless devices are identical)
        - _MatchesRemote? no, because this disk is configured to its current
          remote peer, not to our new one
      
      The fix is trivial, although the algorithm is not: we only allow
      overriding the network configuration when the disk information matches
      and we are not diskless, by adding the '"local_dev" in info' test.
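
The guard described above can be sketched as a single predicate. The function name and its boolean parameters are assumptions for illustration. The sketch also assumes that `info` is a dict which contains a "local_dev" key only when a backing disk is attached. The other three attach cases are not described here and are not modelled.

```python
def may_override_network(info, matches_local, matches_remote):
    """Decide whether to disconnect and reattach with new network settings.

    Only a minor with a real, matching local disk may have its network
    configuration overridden; diskless minors all look alike locally.
    """
    return matches_local and not matches_remote and "local_dev" in info
```

With this test, a second diskless device no longer hijacks the minor of an existing diskless device just because both match locally.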
      
      Reviewed-by: ultrotter
      bf25af3b