1. 28 Dec, 2009 3 commits
  2. 14 Dec, 2009 1 commit
  3. 25 Sep, 2009 1 commit
    • Fix the confusing ssh/hostname message in node add · 31821208
      Iustin Pop authored
      
      
      Before, it used to say:
      
        ssh/hostname verification failed node1.example.com -> hostname mismatch, got
        node2
      
      Now it says for wrong hostnames (maybe too verbose):
      
        ssh/hostname verification failed (checking from node1.example.com): hostname
        mismatch, expected node2.example.com but got node3
      
      And for non-FQDN hostnames:
      
        ssh/hostname verification failed (checking from node1.example.com): hostname
        not FQDN: expected node2.example.com but got node2
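
The two new messages can be sketched as follows. This is an illustration only, not the code from the patch: the function name `describe_hostname_check` and its arguments are hypothetical, and the short-name test (comparing against the part before the first dot) is an assumption about how a non-FQDN answer might be detected.

```python
def describe_hostname_check(checker, expected, got):
    """Return None when the hostname matches, else a diagnostic message.

    Hypothetical helper: `checker` is the node doing the verification,
    `expected` the FQDN we wanted, `got` the name the remote node reported.
    """
    if got == expected:
        return None
    prefix = "ssh/hostname verification failed (checking from %s): " % checker
    # Assumption: a reply equal to the first label of the expected FQDN
    # means the remote node reported its short (non-FQDN) name.
    if got == expected.split(".", 1)[0]:
        return prefix + "hostname not FQDN: expected %s but got %s" % (expected, got)
    return prefix + "hostname mismatch, expected %s but got %s" % (expected, got)
```

Keeping the checking node in a parenthesised prefix separates "who checked" from "what was wrong", which is the ambiguity the old message had.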

      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
  4. 08 Sep, 2009 1 commit
  5. 31 Aug, 2009 1 commit
  6. 19 Aug, 2009 1 commit
  7. 14 Aug, 2009 1 commit
  8. 12 Aug, 2009 1 commit
  9. 11 Aug, 2009 1 commit
  10. 10 Aug, 2009 2 commits
  11. 05 Aug, 2009 1 commit
    • export: add meaningful exit code · 084f05a5
      Iustin Pop authored
      
      
      Currently ‘gnt-backup export’ always returns exit code zero, even in the
      face of complete failure during backup (only failure to stop/start the
      instance will cause job failure and thus non-zero exit code). This is
      bad, since one cannot script the backup.
      
      This patch adds some simple results from the LU so that the command line
      script can return a meaningful exit code. It will:
        - return zero for full success (snapshot removal errors are ignored
          though)
        - return one for full failure (finalize export failure or all disks
          failure)
        - return two for partial failure (some disks backed up, some not)
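
The mapping from results to exit codes listed above might look like the sketch below. The function name, its arguments and the constant names are hypothetical; the real LU result format is not shown in this message. Treating an instance with no disks as a success is also an assumption.

```python
# Hypothetical constants mirroring the three outcomes described above.
EXIT_SUCCESS = 0
EXIT_FAILURE = 1
EXIT_PARTIAL = 2


def export_exit_code(finalize_ok, disk_results):
    """Map export results to an exit code.

    finalize_ok: True if finalizing the export succeeded.
    disk_results: list of booleans, one per disk (True = backed up).
    Snapshot removal errors are deliberately not an input here.
    """
    if not finalize_ok:
        return EXIT_FAILURE
    if all(disk_results):
        # Also covers an empty list (assumption: nothing to fail).
        return EXIT_SUCCESS
    if not any(disk_results):
        return EXIT_FAILURE
    return EXIT_PARTIAL
```

A wrapper script can then distinguish "retry everything" (one) from "inspect which disks are missing" (two).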

      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
  12. 04 Aug, 2009 3 commits
  13. 29 Jul, 2009 1 commit
  14. 17 Jul, 2009 2 commits
  15. 16 Jul, 2009 1 commit
  16. 14 Jul, 2009 1 commit
  17. 13 Jul, 2009 1 commit
  18. 07 Jul, 2009 2 commits
  19. 30 Jun, 2009 3 commits
    • Cleanup config data when draining nodes · dec0d9da
      Iustin Pop authored
      
      
      Currently, when draining nodes we reset their master candidate flag, but
      we don't instruct them to demote themselves. This leads to “ERROR: file
      '/var/lib/ganeti/config.data' should not exist on non master candidates
      (and the file is outdated)”.
      
      This patch simply adds a call to node_demote_from_mc in this case.

      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
    • Fix node readd issues · a8ae3eb5
      Iustin Pop authored
      
      
      This patch fixes a few node readd issues.
      
      Currently, the node readd consists of two opcodes:
        - OpSetNodeParams, which resets the offline/drained flags
        - OpAddNode (with readd=True), which reconfigures the node
      
      The problem is that between these two, the configuration is inconsistent
      for certain cluster configurations. Thus, this patch removes the first
      opcode and modifies LUAddNode to deal with this case too.
      
      The patch also modifies the computation of the intended master_candidate
      status, and actually sets the readded node to master candidate if
      needed. Previously, we didn't modify the existing node at all.
      
      Finally, the patch modifies the bottom of the Exec() function for this
      LU to:
        - trigger a node update, which in turn redistributes the ssconf files
          to all nodes (and thus the new node too)
        - if the new node is not a master candidate, then call the
          node_demote_from_mc RPC so that old master files are cleared
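
The ordering of these two final steps can be sketched as a plain function that returns the actions to perform. The function name and the tuple format are hypothetical; only `node_demote_from_mc` is a name taken from this message, and "update_node" merely stands in for whatever triggers the node update.

```python
def readd_followup_actions(node_name, is_master_candidate):
    """Return the ordered follow-up steps for a re-added node (sketch)."""
    # The node update comes first: it is what redistributes the ssconf
    # files to all nodes, including the re-added one.
    actions = [("update_node", node_name)]
    if not is_master_candidate:
        # A non-candidate must drop stale master files such as config.data.
        actions.append(("node_demote_from_mc", node_name))
    return actions
```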
      
      My testing shows this behaves correctly for various cases.

      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
    • Fix error message for extra files on non MC nodes · e631cb25
      Iustin Pop authored
      
      
      Currently the message for extraneous files on non master candidates is
      confusing, to say the least. This patch hopefully makes it clearer.

      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Olivier Tharan <olive@google.com>
  20. 29 Jun, 2009 2 commits
  21. 17 Jun, 2009 3 commits
    • Fix handling of 'vcpus' in instance list · c1ce76bb
      Iustin Pop authored
      
      
      Currently running “gnt-instance list -o+vcpus” fails with a cryptic message:
        Unhandled Ganeti error: vcpus
      
      This is due to multiple issues:
        - in some corner cases cmdlib.py raises an errors.ParameterError but
          this is not handled by cli.py
        - LUQueryInstances declares ‘vcpus’ as a supported field, but doesn't handle
          it, so instead of failing with an unknown-field error, e.g.:
            Failure: prerequisites not met for this operation:
            Unknown output fields selected: vcpuscd
          it raises the ParameterError message
      
      This patch:
        - adds handling of 'vcpus' to LUQueryInstances
        - adds handling of the ParameterError exception to cli.py
        - changes the 'else: raise errors.ParameterError' in the field handling of
          LUQueryInstances to an assert, since it's a programmer error if we reached
          this step
      
      With this, a future unhandled parameter will show:
        gnt-instance list -o+vcpus
        Unhandled protocol error while talking to the master daemon:
        Caught exception: Declared but unhandled parameter 'vcpus'
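
The distinction between an unknown field (a user error) and a declared but unhandled field (a programmer error) can be sketched like this. Everything here is hypothetical: `SUPPORTED_FIELDS`, `field_value` and the dictionary layout are illustrative, and `ValueError` stands in for the prerequisite error that Ganeti would raise.

```python
# Hypothetical set of fields the query declares as supported.
SUPPORTED_FIELDS = frozenset(["name", "vcpus"])


def field_value(instance, field):
    """Return the value of one output field for an instance dict (sketch)."""
    if field not in SUPPORTED_FIELDS:
        # User error: the field was never declared.
        raise ValueError("Unknown output fields selected: %s" % field)
    if field == "name":
        return instance["name"]
    elif field == "vcpus":
        return instance["beparams"]["vcpus"]
    else:
        # Programmer error: declared above but no branch handles it.
        # Raised explicitly so the check also survives `python -O`.
        raise AssertionError("Declared but unhandled parameter '%s'" % field)
```

With this split, a typo on the command line produces the friendly "unknown field" failure, while a field that is declared but forgotten in the dispatch fails loudly during development.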

      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
    • Fix checking for valid OS in instance create · 6dfad215
      Iustin Pop authored
      
      
      The current check in LUCreateInstance.CheckPrereq() is wrong - it only checks
      if we got an OS, but not if we got a valid OS. This patch fixes it.

      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
    • Show disk size in instance info · c98162a7
      Iustin Pop authored
      
      
      The size of the instance's disk was not shown in “gnt-instance info”.
      This patch adds it and formats it nicely if possible.

      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
  22. 16 Jun, 2009 1 commit
  23. 04 Jun, 2009 1 commit
    • Wait for a while in failed resyncs · fbafd7a8
      Iustin Pop authored
      
      
      This patch is an attempt at fixing some very rare occurrences of messages like:
        - "There are some degraded disks for this instance", or:
        - "Cannot resync disks on node node3.example.com: [True, 100]"
      
      What I believe happens is that drbd has finished syncing, but not all
      fields are updated in 'Connected' state; maybe it's in WFBitmap[ST], or
      in some other transient state we don't handle well.
      
      The patch will change the _WaitForSync method to recheck up to a
      hardcoded number of times if we're finished syncing but we're degraded
      (using the same condition as the 'break' clause of the loop).
      
      The downside of this change is that a normal, genuinely degraded state due
      to network or disk failure will cause an extra delay before it aborts. For
      this, I'm happy to choose other values.
      
      A better, long-term fix is to handle more DRBD states correctly (see the
      bdev.DRBD8Status class).
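
The recheck idea can be sketched as a polling loop with a bounded number of extra attempts. This is not the `_WaitForSync` code: the function name, the `poll` callback returning `(done, degraded)`, and the default retry count and delay are all assumptions made for illustration.

```python
import time


def wait_for_sync(poll, degraded_retries=10, delay=1.0, sleep=time.sleep):
    """Wait until disks are synced; tolerate a transient degraded state.

    poll() is a hypothetical callback returning (done, degraded).
    Returns True if the disks end up healthy, False if they are still
    degraded after `degraded_retries` extra checks.
    """
    retries_left = degraded_retries
    while True:
        done, degraded = poll()
        if done:
            if not degraded:
                return True
            # Sync finished but the device still reports degraded: this
            # may be a transient state, so recheck a bounded number of times.
            if retries_left <= 0:
                return False
            retries_left -= 1
        sleep(delay)
```

The extra delay for a genuinely degraded device is bounded by `degraded_retries * delay`, which makes the trade-off mentioned above easy to tune.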

      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
  24. 03 Jun, 2009 1 commit
  25. 28 May, 2009 1 commit
  26. 21 May, 2009 2 commits
    • Change failover instance when instance is stopped · d27776f0
      Iustin Pop authored
      
      
      Currently, if the instance is stopped, we still check for enough memory
      on the target node. This is a little too strict: if too many nodes
      have failed and one is out of memory, the check prevents fixing the
      cluster (with the instances down).
      
      We change it to do the memory checks only when the instance will be
      started.
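
A minimal sketch of the relaxed check, assuming a hypothetical helper that receives whether the instance will be started after the failover plus the memory figures in MiB (the real prerequisite check and its error type are not shown in this message):

```python
def check_failover_memory(will_start, needed_mb, free_mb):
    """Return an error string, or None if the failover may proceed (sketch)."""
    if not will_start:
        # A stopped instance needs no memory on the target node yet.
        return None
    if free_mb < needed_mb:
        return ("Not enough memory on target node: needed %d MiB, "
                "available %d MiB" % (needed_mb, free_mb))
    return None
```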

      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
    • Export more instance information in hooks · 67fc3042
      Iustin Pop authored
      
      
      Currently hooks lack the instance's hypervisor, hypervisor parameters
      and backend parameters. This forces hooks to query back into ganeti,
      which is dangerous due to possible luxi socket exhaustion.
      
      This patch adds these three as INSTANCE_HYPERVISOR, INSTANCE_HV_*,
      INSTANCE_BE_*. The hook environment prefixes all keys with “GANETI”, so
      the default settings for a xen-pvm instance would be:
      
        GANETI_INSTANCE_HV_initrd_path=
        GANETI_INSTANCE_HV_kernel_args=ro
        GANETI_INSTANCE_HV_kernel_path=/boot/vmlinuz-2.6-xenU
        GANETI_INSTANCE_HV_root_path=/dev/sda1
      
      Any dashes in parameter names are changed to underscores, since
      variables with dashes are not easy to access from the shell
      (alternatively we could deny those via a unittest for constants.py).
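
The key construction described above can be sketched as follows. The function name `build_hook_env` and the `some-param` backend parameter in the usage example are hypothetical; the underscore separator after “GANETI” is inferred from the sample variables listed earlier, and converting every value to a string is an assumption.

```python
def build_hook_env(hypervisor, hvparams, beparams):
    """Build hook environment entries for one instance (sketch)."""
    env = {"INSTANCE_HYPERVISOR": hypervisor}
    for prefix, params in (("HV", hvparams), ("BE", beparams)):
        for key, value in params.items():
            # Dashes are awkward in shell variable names, so use underscores.
            env["INSTANCE_%s_%s" % (prefix, key.replace("-", "_"))] = value
    # Every key gets the common prefix; values are exported as strings.
    return dict(("GANETI_" + key, str(value)) for key, value in env.items())


example = build_hook_env(
    "xen-pvm",
    {"kernel_args": "ro", "root_path": "/dev/sda1"},
    {"some-param": 1},  # hypothetical parameter name containing a dash
)
```

A shell hook can then read `$GANETI_INSTANCE_HV_root_path` directly instead of querying the master daemon.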

      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
  27. 19 May, 2009 1 commit