1. 15 Jul, 2010 1 commit
  2. 08 Jul, 2010 1 commit
  3. 06 Jul, 2010 1 commit
  4. 01 Jul, 2010 1 commit
  5. 23 Jun, 2010 5 commits
  6. 18 May, 2010 3 commits
  7. 16 Apr, 2010 2 commits
  8. 12 Apr, 2010 1 commit
    • Add an identify-defaults option for import · e588764d
      Iustin Pop authored
      
      
      When importing an instance, all the saved values will be used as
      explicitly specified values, overriding the cluster defaults. This means
      that export+import changes the status of parameters (from default to
      explicitly specified).
      
      This patch adds a new option that changes the behaviour to identify
      parameter values which are equal to the current cluster defaults and
      mark them as such. It does this for hv, be and nic parameters.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
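A minimal sketch of the identification step described in this commit, assuming that "marking as default" means leaving a parameter out of the explicitly specified set. The helper name `identify_defaults` and the sample values are hypothetical and not taken from the patch.

```python
def identify_defaults(saved, cluster_defaults):
    """Split saved parameters into explicit overrides and default-valued names.

    Hypothetical helper: a saved value that equals the current cluster
    default is treated as "default" and left out of the explicit dict.
    """
    explicit = {}
    defaulted = []
    for name, value in saved.items():
        if name in cluster_defaults and cluster_defaults[name] == value:
            defaulted.append(name)
        else:
            explicit[name] = value
    return explicit, sorted(defaulted)


# Made-up backend ("be") parameters from an export and from the cluster.
saved_be = {"memory": 128, "vcpus": 2, "auto_balance": True}
cluster_be = {"memory": 128, "vcpus": 1, "auto_balance": True}

explicit_be, defaulted_be = identify_defaults(saved_be, cluster_be)
print(explicit_be)   # only vcpus remains an explicit override
print(defaulted_be)  # names that follow the cluster default again
```

The same comparison would be applied separately to the hv, be and nic parameter groups.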
  9. 08 Apr, 2010 1 commit
  10. 17 Mar, 2010 2 commits
  11. 15 Mar, 2010 1 commit
    • Implement conversion from plain to drbd · e29e9550
      Iustin Pop authored
      
      
      This patch adds a new mode to instance modify, the changing of the disk
      template. For now only plain to drbd conversion is supported, and the
      new secondary node must be specified manually (no iallocator support).
      
      The procedure for conversion works as follows:
      
      - a completely new disk template is created, matching the count, size
        and mode of the instance's current disks
      - we create manually (not via _CreateDisks) all the missing volumes
      - we rename on the primary the LVs to the new name
      - we create manually the DRBD devices
      
      Failures during the creation of volumes will leave orphan volumes.
      Failure during the rename might leave some disks renamed and some not,
      leading to an inconsistent instance.
      
      Once the disks are renamed, we update the instance information and wait
      for resync. Any failures of the DRBD sync must be manually handled (like
      a normal failure, e.g. by running replace-disks, etc.).
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
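The ordering of the conversion steps, and why a failure leaves orphan volumes or partially renamed disks, can be sketched roughly as follows. This is only an illustration: the function, its arguments and the volume names are hypothetical, and the three callables stand in for whatever node calls the real code makes.

```python
def convert_plain_to_drbd(disks, missing_volumes, create_volume, rename_lv,
                          create_drbd):
    """Run the conversion steps in order and record how far they got.

    All arguments are hypothetical: `disks` is a list of (old_lv, new_lv)
    name pairs and the three callables stand in for the real node calls.
    """
    progress = {"created": [], "renamed": [], "drbd": []}
    try:
        # Create the volumes the new layout needs but the plain disks lack.
        for volume in missing_volumes:
            create_volume(volume)
            progress["created"].append(volume)
        # Rename the existing LVs on the primary to their new names.
        for old_name, new_name in disks:
            rename_lv(old_name, new_name)
            progress["renamed"].append(new_name)
        # Create the DRBD devices on top of the renamed volumes.
        for _, new_name in disks:
            create_drbd(new_name)
            progress["drbd"].append(new_name)
    except RuntimeError as err:
        # No rollback: whatever was created or renamed so far stays behind.
        return False, progress, str(err)
    return True, progress, None


def failing_rename(old_name, new_name):
    # Simulate a failure on the second disk only.
    if old_name == "disk1":
        raise RuntimeError("rename of %s failed" % old_name)


ok, progress, error = convert_plain_to_drbd(
    disks=[("disk0", "disk0_data"), ("disk1", "disk1_data")],
    missing_volumes=["disk0_meta", "disk1_meta"],
    create_volume=lambda volume: None,
    rename_lv=failing_rename,
    create_drbd=lambda name: None,
)
print(ok, progress, error)
```

In the simulated run one disk is renamed and the other is not, which is the inconsistent state the commit message warns about.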
  12. 10 Mar, 2010 1 commit
  13. 09 Mar, 2010 2 commits
    • Rework the node modify for mc-demotion · 601908d0
      Iustin Pop authored
      
      
      The current code in LUSetNodeParms regarding the demotion from master
      candidate role is complicated and duplicates the code in ConfigWriter,
      where such decisions should be made. Furthermore, we still cannot demote
      nodes (not even with force), if other regular nodes exist.
      
      This patch adds a new opcode attribute ‘auto_promote’, and changes the
      decision tree as follows:
      
      - if the node will be set to offline or drained or explicitly demoted
        from master candidate, and this parameter is set, then we lock all
        nodes in ExpandNames()
      - later, in CheckPrereq(), if the node is indeed a master candidate,
        and the future state (as computed via GetMasterCandidateStats with
        the current node in the exception list) has fewer nodes than it
        should, and we didn't lock all nodes, we exit with an exception
      - in Exec, if we locked all nodes, we do an AdjustCandidatePool() run, to
        ensure nodes are locked as needed (we do it before updating the node
        to remove a warning and so that, if the LU fails between the two
        steps, we're not left with an inconsistent state)
      
      Note that in Exec we run the AdjustCP irrespective of any node state
      change (just based on lock status), so we might simplify the CheckPrereq
      even more by not checking the future state, basically requiring
      auto_promote/lock_all for master candidates, since the case where we
      have more than needed master candidates is rarer; OTOH, this would prevent
      manual promotion ahead of time of another node, which is why I didn't
      choose this way.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
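The first two branches of the decision tree in this commit message can be written down as two small predicates. This is a simplified sketch, not the LUSetNodeParms code: the function names, argument names and the `PrereqError` class are placeholders chosen for illustration.

```python
class PrereqError(Exception):
    """Stand-in for the exception raised when prerequisites fail."""


def must_lock_all_nodes(offline, drained, master_candidate, auto_promote):
    """ExpandNames-style decision: lock all nodes only for demoting changes.

    offline/drained/master_candidate are the requested new values
    (None meaning "leave unchanged").
    """
    demoting = offline is True or drained is True or master_candidate is False
    return bool(demoting and auto_promote)


def check_prereq(is_master_candidate, future_mc_count, wanted_mc_count,
                 locked_all):
    """CheckPrereq-style decision: refuse if the pool would end up too small."""
    if (is_master_candidate and future_mc_count < wanted_mc_count
            and not locked_all):
        raise PrereqError("not enough master candidates; use auto_promote")


def try_offline(auto_promote):
    # Offlining a master candidate when the pool would drop from 3 to 2.
    locked = must_lock_all_nodes(True, None, None, auto_promote)
    try:
        check_prereq(True, future_mc_count=2, wanted_mc_count=3,
                     locked_all=locked)
    except PrereqError:
        return "refused"
    return "allowed"


print(try_offline(auto_promote=False))
print(try_offline(auto_promote=True))
```

With `auto_promote` unset the request is refused; with it set, all nodes are locked and the candidate pool can be adjusted in Exec.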
    • Add support for per-os-hypervisor parameters · 17463d22
      René Nussbaumer authored
      
      
      This patch implements all modifications to support per-os-hypervisor
      parameters in the framework.
      Signed-off-by: René Nussbaumer <rn@google.com>
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
  14. 22 Feb, 2010 2 commits
  15. 12 Feb, 2010 1 commit
  16. 11 Feb, 2010 1 commit
  17. 10 Feb, 2010 1 commit
    • Fix dumpers/loaders after __slots__ cleanup · adf385c7
      Iustin Pop authored
      Commit 154b9580 changed (correctly) the __slots__ usage, but this broke
      dumpers/loaders since we relied directly on the class's own __slots__
      field.
      
      To compensate, we introduce a simple function for computing the slots
      across all parent classes (if any), and use this instead of __slots__
      directly.
      
      Note: the _all_slots() function is duplicated between objects.py and
      opcodes.py, but the only other option is to introduce a lang.py for
      such very basic language items.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
  18. 09 Feb, 2010 1 commit
    • Add an early release lock/storage for disk replace · 7ea7bcf6
      Iustin Pop authored
      
      
      This patch adds an early_release parameter in the OpReplaceDisks and
      OpEvacuateNode opcodes, allowing earlier release of storage and more
      importantly of internal Ganeti locks.
      
      The behaviour of the early release is that any locks and storage on all
      secondary nodes are released early. This is valid for change secondary
      (where we remove the storage on the old secondary, and release the locks
      on the old and new secondary) and replace on secondary (where we remove
      the old storage and release the lock on the secondary node).
      
      Using this, on a three node setup:
      
      - instance1 on nodes A:B
      - instance2 on nodes C:B
      
      It is possible to run in parallel a replace-disks -s (on secondary) for
      instances 1 and 2.
      
      Replace on primary will remove the storage, but not the locks, as we use
      the primary node later in the LU to check consistency.
      
      It is debatable whether to also remove the locks on the primary node,
      thus making replace-disks keep zero locks during the sync. While
      this would allow greatly enhanced parallelism, let's first see how
      removal of secondary locks works.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
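The effect of early release on the three-node example can be illustrated by computing which node locks each job still holds during the resync. This is a toy model of the behaviour described above, not Ganeti's locking code; the function name and the mode labels are hypothetical.

```python
def locks_kept_during_sync(mode, primary, secondaries, early_release):
    """Return the node locks still held while the disks resync.

    mode is one of "change_secondary", "replace_on_secondary" or
    "replace_on_primary" (hypothetical labels for the cases in the text).
    """
    held = {primary} | set(secondaries)
    if early_release and mode in ("change_secondary", "replace_on_secondary"):
        # Secondary node locks are dropped early; the primary stays locked.
        held -= set(secondaries)
    # Replace on primary frees storage early but keeps every lock.
    return held


def can_overlap(early_release):
    # instance1 lives on A:B, instance2 on C:B; both replace on secondary.
    job1 = locks_kept_during_sync("replace_on_secondary", "A", ["B"],
                                  early_release)
    job2 = locks_kept_during_sync("replace_on_secondary", "C", ["B"],
                                  early_release)
    return job1.isdisjoint(job2)


print(can_overlap(early_release=False))  # both jobs hold the lock on B
print(can_overlap(early_release=True))   # only A and C remain locked
```

For change secondary, `secondaries` would contain both the old and the new secondary node.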
  19. 27 Jan, 2010 1 commit
  20. 16 Dec, 2009 1 commit
  21. 03 Nov, 2009 1 commit
  22. 02 Nov, 2009 1 commit
    • Some improvements to gnt-node repair-storage · 7e9c6a78
      Iustin Pop authored
      
      
      Currently the repair storage has two issues:
      
      - down instances are aborting the operation, even though they should be
        ignored (it's not technically possible to know their disk status
        unless we would activate their disks)
      - if the VG is so broken that disks cannot be activated via gnt-instance
        activate-disks or gnt-instance startup, it's not possible to repair
        the VG at all
      
      The patch makes the opcode skip down instances and also introduces an
      ``--ignore-consistency`` flag for forcing the execution of the LU.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
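A rough sketch of the two changes, under the assumption that the prerequisite check looks at the disk status of each instance and that ``--ignore-consistency`` bypasses that check entirely. The function, the dictionary layout and the `has_faulty_disks` callable are hypothetical and only illustrate the control flow.

```python
def instances_blocking_repair(instances, has_faulty_disks,
                              ignore_consistency=False):
    """Return the names of instances that would abort the repair.

    `instances` is a list of dicts with "name" and "admin_up" keys and
    `has_faulty_disks` is a callable taking an instance name (both are
    assumptions made for this sketch).
    """
    if ignore_consistency:
        # Forced execution: the consistency check is skipped.
        return []
    blocking = []
    for inst in instances:
        if not inst["admin_up"]:
            # Disk status of a down instance is unknown unless its disks
            # are activated, so it is skipped instead of aborting.
            continue
        if has_faulty_disks(inst["name"]):
            blocking.append(inst["name"])
    return blocking


instances = [
    {"name": "inst1", "admin_up": True},
    {"name": "inst2", "admin_up": False},
    {"name": "inst3", "admin_up": True},
]
faulty = {"inst2", "inst3"}
print(instances_blocking_repair(instances, lambda name: name in faulty))
```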
  23. 13 Oct, 2009 2 commits
  24. 09 Oct, 2009 1 commit
  25. 05 Oct, 2009 1 commit
  26. 17 Sep, 2009 2 commits
    • Add an error-simulation mode to cluster verify · a0c9776a
      Iustin Pop authored
      
      
      One of the issues we have in ganeti is that it's very hard to test the
      error-handling paths; QA and burnin only test the OK code-path, since
      it's hard to simulate errors.
      
      LUVerifyCluster is special amongst the LUs in the fact that a) it has a
      lot of error paths and b) the error paths only log the error, they don't
      do any rollback or other similar actions. Thus, it's enough for this LU
      to separate the testing of the error condition from the logging of the
      error condition.
      
      This patch does this by replacing code blocks of the form:
      
        if x:
          log_error()
          [y]
      
      with:
      
        log_error_if(x)
        [if x:
          y
        ]
      
      After this change, it's simple enough to turn on logging of all errors
      by adding a special case inside log_error_if such that if the incoming
      opcode has a special ‘debug_simulate_errors’ attribute and it's true, it
      will log the error unconditionally.
      
      Surprisingly this also turns into an absolute code reduction, since some
      of the if blocks were simplified. The only downside to this patch is
      that the various _VerifyX() functions are now stateful (modifying an
      attribute on the LU instance) instead of returning a boolean result.
      
      Last note: yes, this discovered some error cases in the logging.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
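The pattern from this commit message can be tried out with a small self-contained class. It is a sketch only: `ClusterVerifier`, `verify_node` and the two checks are made up for illustration, while `log_error_if` and `debug_simulate_errors` follow the names used in the message.

```python
class ClusterVerifier:
    """Toy verifier that separates testing a condition from logging it."""

    def __init__(self, debug_simulate_errors=False):
        self.debug_simulate_errors = debug_simulate_errors
        self.bad = False   # stateful result instead of a boolean return value
        self.log = []

    def log_error_if(self, cond, message):
        # In simulation mode every error is logged, whatever the condition.
        if cond or self.debug_simulate_errors:
            self.log.append("ERROR: " + message)
            self.bad = True

    def verify_node(self, name, reachable, free_mb):
        self.log_error_if(not reachable, "node %s: not reachable" % name)
        if not reachable:
            # The optional follow-up action still uses the real condition.
            return
        self.log_error_if(free_mb < 100, "node %s: low disk space" % name)


normal = ClusterVerifier()
normal.verify_node("node1", reachable=True, free_mb=5000)
print(normal.bad, normal.log)

simulated = ClusterVerifier(debug_simulate_errors=True)
simulated.verify_node("node1", reachable=True, free_mb=5000)
print(simulated.bad, simulated.log)
```

On a healthy node the normal run logs nothing, while the simulated run exercises both logging paths.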
    • Introduce parseable error codes in LUVerifyCluster · 7c874ee1
      Iustin Pop authored
      
      
      Currently the output of cluster verify can be parsed for 'ERROR'
      messages, but that is the only indication we get (error or no error). In
      order to allow monitoring tools to separate different error conditions,
      this patch introduces a new output format (“gnt-cluster verify
      --error-codes”) that changes the output from human-friendly to
      machine-friendly. In this mode, an error line changes from:
        ERROR: node node1: drbd minor 1 of instance inst1 is not active
      
      to:
        ERROR:ENODEDRBD:node:node1:drbd minor 1 of instance inst1 is not active
      
      i.e. the error message is a ‘:’-separated field, with ERROR in the first
      place, the error code in the second, the object type (cluster, node,
      instance) in the third, the name of the object (for nodes/instances) in
      the fourth, and then the text message.
      
      The patch also removes some of the verbosity of the operation
      (“Verifying instance X”, “Verifying node X”) since on big clusters these
      informational messages can quickly fill up an entire screen. The
      original behaviour can be restored via the ‘--verbose’ option.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
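A monitoring tool could split the machine-friendly lines along the lines of the sketch below. The field layout follows the description in this commit message; the `VerifyError` tuple and the `parse_verify_line` function are hypothetical names, and the handling of an empty object name for cluster-level errors is an assumption.

```python
from collections import namedtuple

VerifyError = namedtuple(
    "VerifyError", "level code object_type object_name message")


def parse_verify_line(line):
    """Parse one ':'-separated error line, or return None for other lines."""
    # Split at most four times so that ':' inside the message is preserved.
    fields = line.strip().split(":", 4)
    if len(fields) != 5 or fields[0] != "ERROR":
        return None
    return VerifyError(*fields)


new_style = "ERROR:ENODEDRBD:node:node1:drbd minor 1 of instance inst1 is not active"
old_style = "ERROR: node node1: drbd minor 1 of instance inst1 is not active"

parsed = parse_verify_line(new_style)
print(parsed.code, parsed.object_type, parsed.object_name)
print(parse_verify_line(old_style))  # human-friendly lines are not parsed
```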
  27. 24 Aug, 2009 1 commit
  28. 17 Aug, 2009 1 commit