Skip to content
Snippets Groups Projects
  1. Nov 02, 2009
    • Iustin Pop's avatar
      Some improvements to gnt-node repair-storage · 7e9c6a78
      Iustin Pop authored
      
      Currently the repair storage has two issues:
      
      - down instances are aborting the operation, even though they should be
        ignored (it's not technically possible to know their disk status
        unless we would activate their disks)
      - if the VG is so broken that disks cannot be activated via gnt-instance
        activate-disks or gnt-instance startup, it's not possible to repair
        the VG at all
      
      The patch makes the opcode skip down instances and also introduces an
      ``--ignore-consistency`` flag for forcing the execution of the LU.
      
      Signed-off-by: default avatarIustin Pop <iustin@google.com>
      Reviewed-by: default avatarMichael Hanselmann <hansmi@google.com>
      7e9c6a78
  2. Oct 13, 2009
  3. Oct 09, 2009
  4. Oct 05, 2009
  5. Sep 17, 2009
    • Iustin Pop's avatar
      Add an error-simulation mode to cluster verify · a0c9776a
      Iustin Pop authored
      
      One of the issues we have in ganeti is that it's very hard to test the
      error-handling paths; QA and burnin only test the OK code-path, since
      it's hard to simulate errors.
      
      LUVerifyCluster is special amongst the LUs in the fact that a) it has a
      lot of error paths and b) the error paths only log the error, they don't
      do any rollback or other similar actions. Thus, it's enough for this LU
      to separate the testing of the error condition from the logging of the
      error condition.
      
      This patch does this by replacing code blocks of the form:
      
        if x:
          log_error()
          [y]
      
      into:
      
        log_error_if(x)
        [if x:
          y
        ]
      
      After this change, it's simple enough to turn on logging of all errors
      by adding a special case inside log_error_if such that if the incoming
      opcode has a special ‘debug_simulate_errors’ attribute and it's true, it
      will log unconditionally the error.
      
      Surprisingly this also turns into an absolute code reduction, since some
      of the if blocks were simplified. The only downside to this patch is
      that the various _VerifyX() functions are now stateful (modifying an
      attribute on the LU instance) instead of returning a boolean result.
      
      Last note: yes, this discovered some error cases in the logging.
      
      Signed-off-by: default avatarIustin Pop <iustin@google.com>
      Reviewed-by: default avatarGuido Trotter <ultrotter@google.com>
      Reviewed-by: default avatarMichael Hanselmann <hansmi@google.com>
      a0c9776a
    • Iustin Pop's avatar
      Introduce parseable error codes in LUVerifyCluster · 7c874ee1
      Iustin Pop authored
      
      Currently the output of cluster verify can be parsed for 'ERROR'
      messages, but that is the only indication we get (error or no error). In
      order to allow monitoring tools to separate different error conditions,
      this patch introduces a new output format (“gnt-cluster verify
      --error-codes”) that changes the output from human-friendly to
      machine-friendly. In this mode, an error line changes from:
        ERROR: node node1: drbd minor 1 of instance inst1.is not active
      
      to:
        ERROR:ENODEDRBD:node:node1:drbd minor 1 of instance inst1 is not active
      
      i.e. the error message is a ‘:’-separated field, with ERROR in the first
      place, the error code in the second, the object type (cluster, node,
      instance) in the third, the name of the object (for nodes/instances) in
      the fourth, and then the text message.
      
      The patch also removes some of the verbosity of the operation
      (“Verifying instance X”, “Verifying node X”) since on big clusters these
      informational messages can quickly fill up an entire screen. The
      original behaviour can be restored via the ‘--verbose’ option.
      
      Signed-off-by: default avatarIustin Pop <iustin@google.com>
      Reviewed-by: default avatarMichael Hanselmann <hansmi@google.com>
      7c874ee1
  6. Aug 24, 2009
  7. Aug 17, 2009
  8. Aug 14, 2009
  9. Aug 10, 2009
  10. Aug 04, 2009
  11. Aug 03, 2009
  12. Jul 31, 2009
  13. Jul 22, 2009
  14. Jul 17, 2009
  15. Jun 19, 2009
  16. Jun 08, 2009
  17. May 27, 2009
    • Iustin Pop's avatar
      Add a node powercycle command · f5118ade
      Iustin Pop authored
      
      This (somewhat big) patch adds support for remotely rebooting the nodes
      via whatever support the hypervisor has for such a concept.
      
      For KVM/fake (and containers in the future) this just uses sysrq plus a
      ‘reboot’ call if the sysrq method failed. For Xen, it first tries the
      above, and then Xen-hypervisor reboot (we first try sysrq since that
      just requires opening a file handle, whereas xen reboot means launching
      an external utility).
      
      The user interface is:
      
          # gnt-node powercycle node5
          Are you sure you want to hard powercycle node node5?
          y/[n]/?: y
          Reboot scheduled in 5 seconds
      
      The node reboots hopefully after sending the reply. In case the clock is
      broken, “time.sleep(5)” might take ages (but then I suspect SSL
      negotiation wouldn't work).
      
      Signed-off-by: default avatarIustin Pop <iustin@google.com>
      Reviewed-by: default avatarGuido Trotter <ultrotter@google.com>
      f5118ade
  18. May 19, 2009
    • Iustin Pop's avatar
      Add -H/-B startup parameters to gnt-instance · d04aaa2f
      Iustin Pop authored
      
      This patch modifies the start instance script, opcode and logical unit
      to support temporary startup parameters.
      
      Different from 1.2, where only the kernel arguments were supporting
      changes (and thus xen-pvm specific), this version supports changing all
      hypervisor and backend parameters (with appropriate checks).
      
      This is much more flexible, and allows for example:
        - start with different, temporary kernel
        - start with different memory size
      
      Note: in later versions, this should be extended to cover disk
      parameters as well (e.g. start with drbd without flushes, start with
      drbd in async mode, etc.).
      
      Signed-off-by: default avatarIustin Pop <iustin@google.com>
      Reviewed-by: default avatarGuido Trotter <ultrotter@google.com>
      d04aaa2f
  19. Feb 24, 2009
    • Iustin Pop's avatar
      Remove the extra_args parameter in instance start · 07813a9e
      Iustin Pop authored
      This patch removes the extra_args parameter and instead switches the
      instance to the HV_KERNEL_ARGS hypervisor option.
      
      This is a big change, but it's a needed cleanup, this extra parameter on
      all RPC calls is not generic and we also need to have a persistent value
      here.
      
      Reviewed-by: imsnah
      07813a9e
  20. Feb 10, 2009
  21. Feb 06, 2009
    • Iustin Pop's avatar
      Fix rapi job listing · ee69c97f
      Iustin Pop authored
      This patch fixes a couple of issues with the job listing:
        - in case of a non-existing job, nicely raise 404 instead of 500
        - in the job detail listing, also list the job log, the job
          timestamps, etc.
        - the opcode migrate instance was missing its description field
      
      Reviewed-by: imsnah
      ee69c97f
  22. Feb 04, 2009
    • Iustin Pop's avatar
      Implement lockless query operations · ec79568d
      Iustin Pop authored
      This patch adds the framework for, and enables lockless OpQueryInstances. This
      means that instances will be shown in ERROR_up or ERROR_down state, even though
      this is not an error (but just an in-progress job).
      
      The framework is implemented as follows:
        - the OpQueryInstances, OpQueryNodes and OpQueryExports opcodes take
          an additional “use_locking” flag which will denote whether to lock
          or not; this patch only implements this for LUQueryInstances
        - the luxi query functions take an additional argument use_locking
          which is passed to the master daemon, and then passed to the above
          opcodes
        - cli.py export a new SYNC_OPT command line options which implement
          setting this flag to true
        - except for gnt-instance list, which uses this option, and for
          name-only queries (e.g. QueryNodes(fields=["names"])), all other
          callers are setting this flag to True
        - RAPI also sets the flag to True
      
      The patch was tested with a continuous (0.2s sleep in-between)
      gnt-instance list during a burnin, and no problems were observed.
      
      Reviewed-by: ultrotter
      ec79568d
  23. Jan 20, 2009
  24. Jan 13, 2009
    • Iustin Pop's avatar
      Forward port the live migration from 1.2 branch · 53c776b5
      Iustin Pop authored
      This is forward port via copy (and not individual patches cherry-pick)
      of the latest code on the 1.2 branch related to the migration.
      
      The changes compared to 1.2 are the fact that we don't need the
      IdentifyDisks step anymore (the drbd rpc calls are independent now), and
      the rpc module improvements.
      
      Reviewed-by: ultrotter
      53c776b5
  25. Jan 12, 2009
  26. Dec 08, 2008
    • Iustin Pop's avatar
      gnt-node modify: add the offline attribute · 3a5ba66a
      Iustin Pop authored
      This patch changes gnt-node modify and the associated opcode/lu to allow
      modification of the node offline attribute.
      
      Setting a node into offline mode automatically demotes it from the
      master role.
      
      Reviewed-by: ultrotter
      3a5ba66a
  27. Dec 02, 2008
    • Iustin Pop's avatar
      Add cluster candidate pool size parameter · 4b7735f9
      Iustin Pop authored
      This patch adds a new cluster paramater "candidate_pool_size" which
      tracks the desired size of the list of nodes with the master_candidate
      flag set.
      
      Reviewed-by: imsnah
      4b7735f9
    • Iustin Pop's avatar
      Add a gnt-node modify operation · b31c8676
      Iustin Pop authored
      This patch adds the OpCode, LogicalUnit and gnt-node command for
      modifying node parameters, more specifically the master candidate flag
      for a node.
      
      Reviewed-by: imsnah
      b31c8676
  28. Nov 25, 2008
    • Iustin Pop's avatar
      Implement support for multi devices changes · 24991749
      Iustin Pop authored
      This big patch adds support for:
        - changing NIC/disks in the multi-device model
        - adding/removing NICs
        - adding/removing disks
      
      The patch is big and not very nice; the error checking paths are not
      very clear.
      
      The biggest problem is that from a simple instance.ATTR=VAL change
      (which didn't throw errors before) now we are creating and removing
      disks in this LU.
      
      Reviewed-by: imsnah
      24991749
  29. Nov 24, 2008
  30. Nov 20, 2008
    • Iustin Pop's avatar
      Initial multi-disk/multi-nic support · 08db7c5c
      Iustin Pop authored
      This patch adds support for mult-disk/multi-nic in:
        - instance add
        - burnin
      
      The start/stop/failover/cluster verify work as expected. Replace disk
      and grow disk are TODO.
      
      There's also a change gnt-job to allow dictionaries to be listed in
      gnt-job info.
      
      Reviewed-by: imsnah
      08db7c5c
  31. Oct 16, 2008
    • Iustin Pop's avatar
      Enable gnt-cluster modify to hv/beparams · 779c15bb
      Iustin Pop authored
      This patch enables the cluster modify to change:
        - enabled hypervisor list
        - hvparams (per hypervisor)
        - beparams (only the default group)
      
      Syntax:
        gnt-cluster modify -B vcpus=3 -H xen-pvm:no_initrd_path
      
      Validation for parameters is somewhat missing - the individual
      hypervisors will be checked for syntax and validation, but beparams
      doesn't have validation yes (nowhere), it should be added here once we
      have a global method (will come soon).
      
      Reviewed-by: imsnah
      779c15bb
  32. Oct 14, 2008
    • Iustin Pop's avatar
      grow-disk: wait until resync is completed · 6605411d
      Iustin Pop authored
      The patch adds a new ‘--no-wait-for-sync’ parameter to grow-disk similar
      to the one in instance add, and changes the default to wait.
      
      This is cleaner as at the moment when the command returns, we either
      have a fully synced disk or there is an error.
      
      This is a forward-port of rev 1183 on the 1.2 branch.
      
      Reviewed-by: ultrotter
      6605411d
    • Iustin Pop's avatar
      Change over to beparams · 338e51e8
      Iustin Pop authored
      This big patch changes the master code to use the beparams. Errors might
      have crept in, but it passes a small burnin.
      
      Reviewed-by: ultrotter
      338e51e8
    • Iustin Pop's avatar
      Allow instance info to only query the config file · 57821cac
      Iustin Pop authored
      This patch adds a new '-s' parameter to ‘gnt-instance info’ that makes
      it return only 'static' information. This is much faster, especially for
      drbd instances.
      
      This is a forward-port of rev 1570 on the ganeti-1.2 branch, resending
      due to some conflicts.
      
      Reviewed-by: imsnah
      57821cac
Loading