1. 04 Nov, 2009 1 commit
    • Introduce a wrapper for hostname resolving · 104f4ca1
      Iustin Pop authored
      
      
      Currently a few of the LUs' CheckPrereq methods use utils.HostInfo, which
      raises a resolver error on failure. This is an exception to the
      standard that CheckPrereq should raise an OpPrereqError if the error is
      in the 'pre' phase (so that it can be retried).
      
      This patch adds a new error code (resolver_error) and a wrapper over
      utils.HostInfo that just converts the ResolverError into
      OpPrereqError(…, errors.ECODE_RESOLVER). It then uses this wrapper in
      cmdlib, bootstrap and some scripts.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
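The wrapper described in this commit can be sketched roughly as follows. This is a minimal illustration, not the Ganeti code: the exception classes are simplified stand-ins, and `host_info`, `get_host_info` and the `resolver` parameter are hypothetical names introduced here. Only the idea (catch the resolver error, re-raise it as an OpPrereqError carrying the resolver error code) comes from the commit message.

```python
import socket

# Simplified stand-ins; the real classes live in Ganeti's errors module.
ECODE_RESOLVER = "resolver_error"


class ResolverError(Exception):
    """Raised by the low-level lookup when a name cannot be resolved."""


class OpPrereqError(Exception):
    """Raised when an operation's prerequisites are not met."""


def host_info(name):
    """Hypothetical stand-in for utils.HostInfo: resolve a name to an IP."""
    try:
        return socket.gethostbyname(name)
    except OSError as err:  # socket.gaierror is a subclass of OSError
        raise ResolverError(name, str(err))


def get_host_info(name, resolver=host_info):
    """Convert a ResolverError into OpPrereqError(..., ECODE_RESOLVER)."""
    try:
        return resolver(name)
    except ResolverError as err:
        raise OpPrereqError("The given name (%s) does not resolve: %s" %
                            (name, err), ECODE_RESOLVER)
```

Callers in a 'pre' phase would then only need to handle OpPrereqError, and could inspect the second argument to tell a resolver failure from other prerequisite errors.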
  2. 03 Nov, 2009 3 commits
  3. 02 Nov, 2009 6 commits
    • Some improvements to gnt-node repair-storage · 7e9c6a78
      Iustin Pop authored
      
      
      Currently the repair storage has two issues:
      
      - down instances are aborting the operation, even though they should be
        ignored (it's not technically possible to know their disk status
        unless we activate their disks)
      - if the VG is so broken that disks cannot be activated via gnt-instance
        activate-disks or gnt-instance startup, it's not possible to repair
        the VG at all
      
      The patch makes the opcode skip down instances and also introduces an
      ``--ignore-consistency`` flag for forcing the execution of the LU.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
    • Add ecode to rpc.py's RpcResult.Raise() · 045dd6d9
      Iustin Pop authored
      
      
      This patch adds a new ecode argument to RpcResult.Raise(). This allows
      specifying the error code (for both OpExec and OpPrereq errors).
      
      Note that this patch also makes the OpExecError exceptions raised from
      _FindFaultInstanceDisks have the error code classification.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
    • Introduce two-argument style for OpPrereqError · 5c983ee5
      Iustin Pop authored
      
      
      This patch introduces a two-argument style for OpPrereqError. Only the
      direct raise calls in cmdlib.py are converted; other users will follow.
      
      cli.py is modified to handle both two-argument style and the current
      format. RAPI doesn't need modification as the way we encode errors is
      already using a list for the error arguments, so RAPI users only need to
      start checking the list length and the second argument.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
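Handling both the old one-argument format and the new two-argument style, as described for cli.py and for RAPI users, could look something like the sketch below. The class is a simplified stand-in, `format_prereq_error` is a hypothetical helper, and the output wording is invented for illustration; only the "check the argument count, then read the second argument" logic is taken from the commit message.

```python
class OpPrereqError(Exception):
    """Simplified stand-in for the Ganeti exception of the same name."""


def format_prereq_error(err):
    """Format an error raised in either the one- or two-argument style."""
    args = err.args
    if len(args) == 2:
        # New style: (message, error code)
        message, ecode = args
        return "%s (error type: %s)" % (message, ecode)
    # Old style: a single message argument (or, defensively, none at all)
    return str(args[0]) if args else "unknown error"
```

For example, `format_prereq_error(OpPrereqError("bad name", "resolver_error"))` includes the error code, while `format_prereq_error(OpPrereqError("bad name"))` falls back to the bare message.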
    • Remove the OpRetryError exception · 159d4ec6
      Iustin Pop authored
      
      
      This is only used in two places, in an error path that is no longer
      valid since Ganeti 2.0. We remove the try..except since we should not
      get it anymore (and if we do, then we should catch it in all
      config.Update cases) and we remove the exception class completely.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
    • Activate disks while exporting an instance · 3e53a60b
      Michael Hanselmann authored
      
      
      Exporting an instance that is not running or has no activated disks
      will fail. This patch makes sure to activate disks before
      exporting an instance if it's in the ADMIN_down state.
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
    • Unify the query fields for the storage framework · 620a85fd
      Iustin Pop authored
      
      
      This patch unifies the query fields in the storage framework for all
      types. Note that the information is still computed on-demand, so if e.g.
      the used disk space is not requested for the ‘file’ type, it won't be
      computed on nodes.
      
      Summary of changes:
      - improve the LVM storage type to support multiple lvm fields in the
        LIST_FIELDS declaration and constant (not-computed via lvm commands)
        fields
      - rename utils.GetFilesystemFreeSpace to utils.GetFilesystemStats
        returning tuple of (total, free)
      - add used and free as valid fields for lvm-vg (used being computed as
        vg_size-vg_free)
      - make allocatable accepted for all types (ones which are always
        allocatable always return True)
      - add a new list field ‘type’ that gives the currently selected type; not
        very useful today (except for understanding what the default output
        is) but in the future might help if we want to list multiple types
      - add type, size and allocatable to the default output field list
      - update the man page with details on how, for file storage, size ≠ used
        + free for non-mountpoint cases
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
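A function returning a (total, free) tuple, and the derived "used" value for a volume group, might be sketched as below. This is an assumption-laden illustration rather than the Ganeti implementation: `get_filesystem_stats` and `vg_used` are hypothetical names, the choice of mebibytes as the unit is an assumption, and `os.statvfs` is only available on Unix-like systems.

```python
import os

MIB = 1024 * 1024


def get_filesystem_stats(path):
    """Return (total, free) in MiB for the filesystem containing path.

    The unit is an assumption made for this sketch.
    """
    st = os.statvfs(path)
    total = st.f_blocks * st.f_frsize // MIB
    free = st.f_bavail * st.f_frsize // MIB  # space available to non-root
    return (total, free)


def vg_used(vg_size, vg_free):
    """Used space of a volume group, computed as vg_size - vg_free."""
    return vg_size - vg_free
```

Because these numbers describe the whole filesystem, the space used by a directory that is not itself a mountpoint need not equal total minus free, which is the size ≠ used + free case the man page update refers to.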
  4. 29 Oct, 2009 1 commit
  5. 28 Oct, 2009 2 commits
  6. 27 Oct, 2009 2 commits
    • Provide feedback from redistributing configuration · a4eae71f
      Michael Hanselmann authored
      
      
      This is particularly useful for “gnt-cluster redist-conf”, but
      also for all other cases where the configuration files are
      rewritten on other nodes.
      
      $ gnt-cluster redist-conf
      … Copy of file /var/lib/ganeti/config.data to node … failed: Error while
      executing backend function: [Errno 1] Operation not permitted
      … Error while uploading ssconf files to node …: Error while executing backend
      function: [Errno 1] Operation not permitted
      
      $ gnt-node modify --offline no --force node3.example.com
      … - WARNING: Not enough master candidates (desired 10, new value will be 4)
      … Copy of file /var/lib/ganeti/config.data to node node8.example.com failed:
      Error while executing backend function: [Errno 1] Operation not permitted
      Modified node node3.example.com
       - offline -> True
       - master_candidate -> auto-demotion due to offline
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
    • Fix gnt-node evacuate w. iallocator · e9022531
      Iustin Pop authored


      Commit 2bb5c911 moved around and changed the _RunAllocator function in
      the DiskReplace → TaskLet conversion, but in the process it changed the
      relocate_from argument from a list of nodes to just the secondary node.
      This breaks the protocol and current iallocator scripts.
      
      This patch fixes that but also adds a local variable 'instance' since
      it's not nice to write self.instance so many times.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
  7. 23 Oct, 2009 1 commit
  8. 22 Oct, 2009 1 commit
  9. 20 Oct, 2009 1 commit
  10. 13 Oct, 2009 1 commit
  11. 12 Oct, 2009 1 commit
  12. 09 Oct, 2009 2 commits
  13. 05 Oct, 2009 3 commits
  14. 02 Oct, 2009 4 commits
  15. 01 Oct, 2009 1 commit
  16. 25 Sep, 2009 1 commit
    • Fix the confusing ssh/hostname message in node add · 31821208
      Iustin Pop authored
      
      
      Before, it used to say:
      
        ssh/hostname verification failed node1.example.com -> hostname mismatch, got
        node2
      
      Now it says for wrong hostnames (maybe too verbose):
      
        ssh/hostname verification failed (checking from node1.example.com): hostname
        mismatch, expected node2.example.com but got node3
      
      And for non-FQDN hostnames:
      
        ssh/hostname verification failed (checking from node1.example.com): hostname
        not FQDN: expected node2.example.com but got node2
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
  17. 24 Sep, 2009 3 commits
  18. 21 Sep, 2009 2 commits
  19. 17 Sep, 2009 3 commits
    • Michael Hanselmann's avatar
      3cebe102
    • Add an error-simulation mode to cluster verify · a0c9776a
      Iustin Pop authored
      
      
      One of the issues we have in Ganeti is that it's very hard to test the
      error-handling paths; QA and burnin only test the OK code-path, since
      it's hard to simulate errors.
      
      LUVerifyCluster is special amongst the LUs in the fact that a) it has a
      lot of error paths and b) the error paths only log the error, they don't
      do any rollback or other similar actions. Thus, it's enough for this LU
      to separate the testing of the error condition from the logging of the
      error condition.
      
      This patch does this by converting code blocks of the form:
      
        if x:
          log_error()
          [y]
      
      into:
      
        log_error_if(x)
        [if x:
          y
        ]
      
      After this change, it's simple enough to turn on logging of all errors
      by adding a special case inside log_error_if: if the incoming opcode
      has a special ‘debug_simulate_errors’ attribute and it is true, the
      error is logged unconditionally.
      
      Surprisingly this also turns into an absolute code reduction, since some
      of the if blocks were simplified. The only downside to this patch is
      that the various _VerifyX() functions are now stateful (modifying an
      attribute on the LU instance) instead of returning a boolean result.
      
      Last note: yes, this discovered some error cases in the logging.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
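The transformation described in this commit can be illustrated with a small, self-contained sketch. The `Verifier` class, its attribute names and the message text are hypothetical; in particular, setting `bad` whenever an error is logged (including simulated ones) is an assumption of this sketch, not something the commit message states.

```python
class Verifier:
    """Hypothetical, minimal stand-in for a verification LU."""

    def __init__(self, simulate_errors=False):
        # Mirrors the opcode's 'debug_simulate_errors' attribute
        self.simulate_errors = simulate_errors
        # State updated by the checks instead of returning a boolean
        self.bad = False
        self.messages = []

    def log_error_if(self, cond, msg):
        """Log msg if cond holds, or always when simulating errors."""
        if cond or self.simulate_errors:
            self.messages.append("ERROR: " + msg)
            self.bad = True

    def verify_node(self, name, reachable):
        # Logging is separated from the test of the condition...
        self.log_error_if(not reachable, "node %s: not reachable" % name)
        # ...while any follow-up action still checks the real condition.
        if not reachable:
            return
        # Further checks that need a reachable node would go here.
```

With `Verifier(simulate_errors=True)`, every `log_error_if` call exercises the logging path even on a healthy cluster, which is what makes the error-reporting code testable.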
    • Introduce parseable error codes in LUVerifyCluster · 7c874ee1
      Iustin Pop authored
      
      
      Currently the output of cluster verify can be parsed for 'ERROR'
      messages, but that is the only indication we get (error or no error). In
      order to allow monitoring tools to separate different error conditions,
      this patch introduces a new output format (“gnt-cluster verify
      --error-codes”) that changes the output from human-friendly to
      machine-friendly. In this mode, an error line changes from:
        ERROR: node node1: drbd minor 1 of instance inst1 is not active
      
      to:
        ERROR:ENODEDRBD:node:node1:drbd minor 1 of instance inst1 is not active
      
      i.e. the error message consists of ‘:’-separated fields, with ERROR in
      the first place, the error code in the second, the object type (cluster,
      node, instance) in the third, the name of the object (for
      nodes/instances) in the fourth, and then the text message.
      
      The patch also removes some of the verbosity of the operation
      (“Verifying instance X”, “Verifying node X”) since on big clusters these
      informational messages can quickly fill up an entire screen. The
      original behaviour can be restored via the ‘--verbose’ option.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
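A monitoring tool could split the machine-friendly lines along the following lines. This is a sketch based only on the field layout described in the commit message: `VerifyError` and `parse_error_line` are hypothetical names, and the sketch assumes each error line starts with ERROR after optional leading whitespace.

```python
from collections import namedtuple

# Fields in the order given by the commit message, after the ERROR marker
VerifyError = namedtuple("VerifyError",
                         "code object_type object_name message")


def parse_error_line(line):
    """Parse one --error-codes line; return None for anything else."""
    # Split at most four times so colons inside the message are preserved
    fields = line.strip().split(":", 4)
    if len(fields) != 5 or fields[0] != "ERROR":
        return None
    return VerifyError(*fields[1:])
```

Applied to the example line above, this yields the code `ENODEDRBD`, the object type `node`, the object name `node1` and the remaining text as the message; a human-friendly line such as `ERROR: node node1: ...` does not have five fields and is rejected.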
  20. 16 Sep, 2009 1 commit