  1. May 21, 2009
    • Change failover instance when instance is stopped · d27776f0
      Iustin Pop authored
      
      Currently, if the instance is stopped, we still check for enough memory
      on the target node. This is too strict: if too many nodes have failed
      and the target node is out of memory, the check prevents fixing the
      cluster (with the instances down).
      
      We change it to do the memory checks only when the instance will be
      started.
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
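
The rule described in this commit can be sketched as follows. This is a minimal illustration, not the actual ganeti code: the function name, its parameters and the PrereqError class are all hypothetical.

```python
class PrereqError(Exception):
    """Hypothetical error raised when a failover precondition fails."""


def check_failover_memory(will_start, needed_mem_mb, free_mem_mb):
    """Check target-node memory only if the instance will be started.

    All names here are assumptions made for illustration.
    """
    if not will_start:
        # Stopped instance: nothing is started on the target node,
        # so a lack of free memory must not block the failover.
        return "skipped"
    if free_mem_mb < needed_mem_mb:
        raise PrereqError(
            "not enough memory on target node: need %d MiB, have %d MiB"
            % (needed_mem_mb, free_mem_mb))
    return "checked"
```

With this shape, a stopped instance can be failed over to a node that has no free memory, while a running instance is still protected by the check.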
    • Export more instance information in hooks · 67fc3042
      Iustin Pop authored
      
      Currently, hooks lack the instance's hypervisor, hypervisor
      parameters and backend parameters. This forces hooks to query back into
      ganeti, which is dangerous due to possible luxi socket exhaustion.
      
      This patch adds these three as INSTANCE_HYPERVISOR, INSTANCE_HV_*,
      INSTANCE_BE_*. The hook environment prefixes all keys with “GANETI”, so
      the default settings for a xen-pvm instance would be:
      
        GANETI_INSTANCE_HV_initrd_path=
        GANETI_INSTANCE_HV_kernel_args=ro
        GANETI_INSTANCE_HV_kernel_path=/boot/vmlinuz-2.6-xenU
        GANETI_INSTANCE_HV_root_path=/dev/sda1
      
      Any dashes in parameter names are changed to underscores, since
      variables with dashes are not easy to access from the shell
      (alternatively we could deny those via a unit test for constants.py).
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
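
The naming scheme above can be sketched like this. It is an illustration under assumptions: the function name is hypothetical, and the dashed parameter name used in the usage note is invented to show the dash-to-underscore rule.

```python
def build_instance_hook_env(hypervisor, hvparams, beparams):
    """Build hook environment variables for one instance.

    Hypothetical helper: only the variable naming follows the commit
    message (INSTANCE_HYPERVISOR, INSTANCE_HV_*, INSTANCE_BE_*).
    """
    env = {"INSTANCE_HYPERVISOR": hypervisor}
    groups = (("INSTANCE_HV_", hvparams), ("INSTANCE_BE_", beparams))
    for prefix, params in groups:
        for name, value in params.items():
            # Dashes are awkward in shell variable names, so map them
            # to underscores.
            env[prefix + name.replace("-", "_")] = str(value)
    # The hook environment prefixes every key with "GANETI_".
    return {"GANETI_" + key: value for key, value in env.items()}
```

For example, calling it with `{"kernel_args": "ro", "root-path": "/dev/sda1"}` as hypervisor parameters yields `GANETI_INSTANCE_HV_kernel_args` and `GANETI_INSTANCE_HV_root_path`.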
  2. May 20, 2009
  3. May 19, 2009
  4. May 18, 2009
  5. May 15, 2009
  6. May 13, 2009
  7. May 12, 2009
  8. May 11, 2009
  9. May 07, 2009
  10. May 05, 2009
  11. May 04, 2009
  12. Apr 24, 2009
    • LUDiagnoseOS: change locking and error handling · a6ab004b
      Iustin Pop authored
      Since the “list OSes” call is exported via RAPI, this can be used pretty
      easily to DoS the master daemon during long jobs.
      
      The implementation of LUDiagnoseOS makes an RPC call to all nodes; we
      lock nodes here in order to prevent node removal.
      
      However, after closer examination, the worst case is:
        - we get the list of nodes from the config
        - another thread removes a node
        - our RPC queries reach the removed node
      
      At this point, if ganeti-noded is stopped or doesn't accept our queries,
      the RPC call will return failed, and in the current implementation all
      OSes will become invalid.
      
      If we change the ‘failed RPC’ handling to ignore such nodes, this allows
      us to both remove locking, and to handle transient RPC failures better
      (not invalidating all OSes).
      
      This patch does both these things, with a single drawback: in gnt-os
      diagnose, the down nodes do not appear at all. I think this is a small
      drawback, and the alternative is to add them with status failed; this
      works (3-line patch), but then the output of “list” and “diagnose” will
      no longer be consistent. As such, my proposal is to not list the nodes.
      
      Reviewed-by: ultrotter
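
A rough sketch of the "ignore failed nodes" handling follows. The data layout (a dict mapping node name to a `(failed, os_list)` pair) and both function names are assumptions for illustration, not the real RPC result format.

```python
def collect_os_by_node(rpc_results):
    """Keep only the OS lists of nodes whose RPC call succeeded.

    rpc_results maps node name -> (failed, os_list); this layout is a
    hypothetical stand-in for the real RPC result objects.
    """
    per_node = {}
    for node, (failed, os_list) in rpc_results.items():
        if failed:
            # A down or removed node is skipped instead of
            # invalidating every OS.
            continue
        per_node[node] = list(os_list)
    return per_node


def os_valid_everywhere(per_node):
    """Return the OS names present on every node that answered."""
    name_sets = [set(names) for names in per_node.values()]
    return set.intersection(*name_sets) if name_sets else set()
```

Note the drawback mentioned above: a node that failed simply does not appear in the result.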
    • Fix verify-disks with broken volume groups · ea9ddc07
      Iustin Pop authored
      When a remote node returns invalid LVM data, we check it, but we don't
      stop there; we continue with the rest of the checks (which require a valid
      volume group). This raises an internal error and breaks verify-disks.
      
      This code seems to have been unchanged for a long while; I don't know why
      the problem surfaced just recently.
      
      Reviewed-by: ultrotter
    • Prevent errors when xenvg is broken cluster verify · 9a198532
      Iustin Pop authored
      When vg_name is not returned at all, we currently abort with an internal
      error. This is because we don't catch KeyError.
      
      This patch adds a custom message for this case, and also adds KeyError
      to the list of caught exceptions, just for safety.
      
      On the other hand, we could also just remove this piece of code, since
      the ["dfree"] value is not used at all.
      
      Reviewed-by: ultrotter
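
The error handling described here might look roughly like the sketch below. The shape of the node reply (a dict keyed by volume group name) and the function name are assumptions; the real data structure is not shown in the commit message.

```python
def check_vg_free(node_reply, vg_name):
    """Return (ok, value_or_message) for the free space of a volume group.

    node_reply is assumed to be a dict mapping volume group names to
    sizes; this layout is hypothetical.
    """
    try:
        return True, int(node_reply[vg_name])
    except KeyError:
        # The volume group is missing from the reply: report it with
        # a custom message instead of aborting with an internal error.
        return False, "volume group '%s' missing from node reply" % vg_name
    except (ValueError, TypeError):
        return False, "invalid size for volume group '%s'" % vg_name
```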
  13. Apr 15, 2009
    • A bunch of doc and other small fixes · 949bdabe
      Iustin Pop authored
      This patch fixes a number of issues reported both externally and
      internally:
        - missing SGML tags (Issue 54), report and patch by superdupont
        - wrong variable used in the init.d script, report and patch by
          Karsten Keil <karsten-keil@t-online.de>
        - man page for gnt-instance reinstall needs clarification (Issue 56)
        - gnt-instance man page missing --disks documentation for
          replace-disks
        - gnt-node modify help output is unclear about the -C/-D/-O input
          format, and the man page doesn't document this command at all
        - “gnt-node modify -C yes” for offline or drained nodes had wrong
          error message
        - “gnt-instance reinstall --select-os” has wrong prompt, we only
          accept a number for the OS and not the template name
      
      Reviewed-by: ultrotter
  14. Apr 06, 2009
    • Fix Xen soft reboot via polling · 7dd106d3
      Iustin Pop authored
      This patch fixes the Xen soft reboot ("xm reboot") by polling, for a limited
      time, for either a changed domain ID or a decreased CPU run-time.
      
      This should prevent the race conditions discussed on the mailing list for
      reboots.
      
      Reviewed-by: imsnah
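
The polling idea can be sketched as below. This is not the actual hypervisor code: the function name, the `get_domain_info` callback (assumed to return a `(domain_id, cpu_time)` pair) and the retry and delay defaults are all hypothetical.

```python
import time


def wait_for_reboot(get_domain_info, old_id, old_cpu_time,
                    retries=15, delay=1.0, sleep=time.sleep):
    """Poll until the domain looks rebooted or the retries run out.

    get_domain_info is a caller-supplied function returning
    (domain_id, cpu_time); it is an assumption of this sketch.
    """
    for _ in range(retries):
        sleep(delay)
        new_id, new_cpu_time = get_domain_info()
        # A reboot shows up either as a new domain ID or as a CPU
        # run-time counter that restarted from a lower value.
        if new_id != old_id or new_cpu_time < old_cpu_time:
            return True
    return False
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without real waiting.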
    • Add a new ssconf file with the cluster tags · 5d60b3bd
      Iustin Pop authored
      Since the cluster tags are/should be more-or-less static, add them as an
      ssconf key, so that querying them is possible without creating a
      job/requiring the masterd to be running.
      
      Reviewed-by: imsnah
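
A consumer of such a file could read the tags without contacting the master daemon, roughly as sketched here. The file location and its exact separator are not stated in the commit message, so the path is left to the caller and the parser accepts any whitespace; both function names are hypothetical.

```python
def parse_cluster_tags(text):
    """Split the contents of the tags file into a list of tags.

    Splitting on any whitespace is an assumption that covers both
    space-separated and newline-separated layouts.
    """
    return text.split()


def read_cluster_tags(path):
    """Read cluster tags from a file; the path is supplied by the caller."""
    with open(path) as handle:
        return parse_cluster_tags(handle.read())
```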
  15. Mar 20, 2009
  16. Mar 12, 2009
    • kvm: use the correct vnc bind address · 19498d6c
      Guido Trotter authored
      There is a bug in the kvm code: when binding vnc to a specific address,
      the constant 'vnc_bind_address' is passed in instead of the actual
      requested address. This patch fixes it.
      
      Reviewed-by: iustinp
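
The class of bug fixed here, passing a parameter's name where its value was meant, can be illustrated with this sketch. The constant name, the function and the fixed display number are assumptions, not the real hypervisor code.

```python
# Hypothetical constant holding the parameter *name*.
HV_VNC_BIND_ADDRESS = "vnc_bind_address"


def vnc_args(hvparams, display=0):
    """Build the kvm -vnc argument from the hypervisor parameters."""
    # Correct: look the address up in the parameters.
    address = hvparams[HV_VNC_BIND_ADDRESS]
    # The bug amounted to using HV_VNC_BIND_ADDRESS itself here, which
    # would produce "vnc_bind_address:0" instead of a real address.
    return ["-vnc", "%s:%d" % (address, display)]
```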
  17. Mar 10, 2009
  18. Mar 09, 2009
    • Handle ghost instances in temp DRBD map · c614e5fb
      Iustin Pop authored
      Currently cluster-verify doesn't handle the (admittedly invalid) case where we
      have reservations for instances that were removed in the meantime.
      
      This patch adds a check for this and prevents code errors in cluster-verify in
      this case:
       * Verifying node node4.example.com (master candidate)
         - ERROR: ghost instance 'instance3.example.com' in temporary DRBD map
      
      Reviewed-by: imsnah
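
The check could be sketched as follows. The map layout (node name to a dict of minor number to instance name) and the function name are assumptions for illustration; only the error text follows the output shown above.

```python
def find_ghost_instances(tmp_drbd_map, known_instances):
    """Report reservations that belong to instances no longer configured.

    tmp_drbd_map maps node name -> {minor: instance name}; this layout
    is hypothetical.
    """
    known = set(known_instances)
    errors = []
    for node in sorted(tmp_drbd_map):
        for _minor, instance in sorted(tmp_drbd_map[node].items()):
            if instance not in known:
                errors.append(
                    (node,
                     "ghost instance '%s' in temporary DRBD map" % instance))
    return errors
```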
    • Fix error handling in replace-disks with new node · 82759cb1
      Iustin Pop authored
      Currently the _CreateSingleBlockDev function only raises OpExecError and not
      BlockDeviceError. This means that we don't release the instance's temporary
      minors properly, and this creates problems later if the instance is removed
      without master restart.
      
      We could just use OpExecError, but adding it and leaving
      BlockDeviceError in seems safer.
      
      Reviewed-by: imsnah
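
The "catch both exception types" approach can be sketched like this. The two exception classes are local stand-ins for the ganeti ones, and the function and callback names are hypothetical.

```python
class OpExecError(Exception):
    """Stand-in for the ganeti exception of the same name."""


class BlockDeviceError(Exception):
    """Stand-in for the ganeti exception of the same name."""


def create_disks_on_new_node(disks, create_fn, release_minors_fn):
    """Create each disk, releasing temporary minors on failure.

    create_fn and release_minors_fn are caller-supplied callbacks
    (assumptions of this sketch).
    """
    try:
        for disk in disks:
            create_fn(disk)
    except (OpExecError, BlockDeviceError):
        # Either error type must trigger the cleanup; otherwise the
        # reserved minors would leak until the master is restarted.
        release_minors_fn()
        raise
```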
  19. Mar 02, 2009
    • Export tags to cluster verify hooks · 35e994e9
      Iustin Pop authored
      This patch exports the cluster and node tags to the cluster verify hook
      scripts. The tags are exported as a space-separated list, which allows
      easy parsing from the shell (e.g. “for tag in $GANETI_CLUSTER_TAGS; do
      ...”) and therefore requires the previous “Don't allow spaces in tag
      names” patch.
      
      The patch also fixes a minor line length style problem.
      
      Reviewed-by: ultrotter
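
The space-separated export, and why it depends on tags containing no spaces, can be sketched as follows. The function name and the sorting are assumptions made for illustration.

```python
def join_tags(tags):
    """Join tags into one space-separated string for a hook variable.

    Hypothetical helper: it rejects tags containing whitespace, since
    such a tag would be split into several words by a shell loop.
    """
    for tag in tags:
        if any(char.isspace() for char in tag):
            raise ValueError("tag %r contains whitespace" % tag)
    # Sorting is a choice of this sketch, made for stable output.
    return " ".join(sorted(tags))
```

A hook script can then iterate over the value word by word, as in the shell loop quoted above.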