1. 24 Apr, 2009 5 commits
    • Update gnt-instance(8) for info · d09ebf6f
      Guido Trotter authored
      Add the --all argument, and reword the basic information a bit.
      Reviewed-by: iustinp
    • gnt-instance info --all · 220cde0b
      Guido Trotter authored
      Don't show info for all instances by default; require --all to be passed
      for this time-consuming operation.
      Reviewed-by: iustinp
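      The behaviour described above can be sketched as follows. This is a
      minimal illustration using argparse, not the actual gnt-instance
      code; the function name parse_info_args and the error texts are
      assumptions.

```python
import argparse


def parse_info_args(argv):
    """Parse 'info' arguments; showing every instance requires --all.

    Hypothetical helper: the real gnt-instance option handling may differ.
    """
    parser = argparse.ArgumentParser(prog="gnt-instance info")
    parser.add_argument("--all", action="store_true", dest="show_all",
                        help="show info on all instances (slow)")
    parser.add_argument("instances", nargs="*",
                        help="names of the instances to show")
    opts = parser.parse_args(argv)
    if opts.show_all and opts.instances:
        parser.error("--all cannot be combined with instance names")
    if not opts.show_all and not opts.instances:
        parser.error("no instances given; pass --all to show all of them")
    return opts
```

      With this shape, running the command with no arguments fails fast
      instead of silently starting the expensive query.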
    • LUDiagnoseOS: change locking and error handling · a6ab004b
      Iustin Pop authored
      Since the “list OSes” call is exported via RAPI, this can be used pretty
      easily to DoS the master daemon during long jobs.
      The implementation of LUDiagnoseOS makes an RPC call to all nodes; we
      lock nodes here in order to prevent node removal.
      However, after closer examination, the worst case is:
        - we get the list of nodes from the config
        - another thread removes a node
        - our RPC queries reach the removed node
      At this point, if ganeti-noded is stopped or doesn't accept our queries,
      the RPC call will return failed, and in the current implementation all
      OSes will become invalid.
      If we change the ‘failed RPC’ handling to ignore such nodes, this allows
      us to both remove locking, and to handle transient RPC failures better
      (not invalidating all OSes).
      This patch does both these things, with a single drawback: in gnt-os
      diagnose, the down nodes do not appear at all. I think this is a small
      drawback, and the alternative is to add them with status failed; this
      works (3-line patch), but then the output of “list” and “diagnose” will
      no longer be consistent. As such, my proposal is to not list the nodes.
      Reviewed-by: ultrotter
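      The "ignore nodes whose RPC failed" handling can be sketched as
      below. This is an illustration only: the function name
      merge_os_results and the convention that None marks a failed RPC
      call are assumptions, not the LUDiagnoseOS data structures.

```python
from typing import Dict, List, Optional


def merge_os_results(
    node_results: Dict[str, Optional[List[str]]],
) -> Dict[str, List[str]]:
    """Map each OS name to the nodes that reported it.

    A value of None stands for a failed RPC call (hypothetical
    convention). Such nodes are skipped, so a single unreachable node
    no longer invalidates every OS.
    """
    per_os: Dict[str, List[str]] = {}
    for node, os_names in sorted(node_results.items()):
        if os_names is None:
            continue  # failed or removed node: leave it out entirely
        for name in os_names:
            per_os.setdefault(name, []).append(node)
    return per_os
```

      As the commit message notes, the skipped nodes simply do not appear
      in the result.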
    • Fix verify-disks with broken volume groups · ea9ddc07
      Iustin Pop authored
      When a remote node returns invalid LVM data, we check it, but we don't
      stop and continue with the rest of the checks (which require a valid
      volume group). This raises an internal error and breaks verify disks.
      This seems unchanged for a long while, I don't know why it surfaced just
      Reviewed-by: ultrotter
    • Prevent errors when xenvg is broken cluster verify · 9a198532
      Iustin Pop authored
      When vg_name is not returned at all, we currently abort with an internal
      error. This is because we don't catch KeyError.
      This patch adds a custom message for this case, and also adds KeyError
      to the list of caught exceptions, just for safety.
      On the other hand, we could also just remove this piece of code, since
      the ["dfree"] value is not used at all.
      Reviewed-by: ultrotter
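      A sketch of the defensive parsing described above, assuming the
      node reply is a plain dict that may lack the "dfree" key. The
      function name parse_free_disk and the message texts are
      hypothetical.

```python
from typing import Optional, Tuple


def parse_free_disk(node_reply: dict) -> Tuple[Optional[int], Optional[str]]:
    """Return (free_disk, error_message) for one node reply."""
    if "dfree" not in node_reply:
        # Custom message instead of an internal error
        return None, "node did not return volume group data"
    try:
        return int(node_reply["dfree"]), None
    except (ValueError, TypeError, KeyError) as err:
        # KeyError is listed only for safety, mirroring the commit message
        return None, "invalid node data: %s" % err
```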
  2. 15 Apr, 2009 1 commit
    • A bunch of doc and other small fixes · 949bdabe
      Iustin Pop authored
      This patch adds a couple of both externally and internally reported
        - missing SGML tags (Issue 54), report and patch by superdupont
        - wrong variable used in the init.d script, report and patch by
          Karsten Keil <karsten-keil@t-online.de>
        - man page for gnt-instance reinstall needs clarification (Issue 56)
        - gnt-instance man page missing --disks documentation for
        - gnt-node modify help output is unclear about the -C/-D/-O input
          format, and the man page doesn't document this command at all
        - “gnt-node modify -C yes” for offline or drained nodes had wrong
          error message
        - “gnt-instance reinstall --select-os” has wrong prompt, we only
          accept a number for the OS and not the template name
      Reviewed-by: ultrotter
  3. 14 Apr, 2009 1 commit
  4. 08 Apr, 2009 1 commit
    • Release 2.0rc3 · 5bbefdec
      Iustin Pop authored
      Burnin tests were successful, release rc3.
      Reviewed-by: imsnah
  5. 07 Apr, 2009 1 commit
    • Distribute built documentation · 2ab2b9f5
      Iustin Pop authored
      This patch changes the way documentation is built in order to distribute
      the generated output in the 'dist' archive, thus no longer requiring
      the presence of the docbook/rst toolchains at build time.
      This will lower the requirements for installation and also make the
      build time insignificant.
      First, we remove the docbook2pdf rules and variables, since we no longer
      build this kind of docs. Furthermore, the rst source files are not
      (today) processed via replace_vars_sed, so the whole .in rules for doc/
      go away.
      Next, we change the ".sgml|.rst -> replace_vars_sed -> .in -> processor
      -> final file" processing to ".sgml|.rst -> generator -> .in ->
      replace_vars_sed -> final file"; this means we first process the file
      using the formatter, with the @VARIABLE@ entries in it, and save the
      output as .in; this output we distribute, and on the user side, the
      replace_vars_sed will use the new configure flags to transform the
      (almost final .in form) to the final form, without needing the
      In configure.ac we also change from ERROR to WARN for the documentation
      generators, and extra tests in Makefile.am check that the programs have
      been found.
      This was tested with distcheck and works as expected.
      Reviewed-by: ultrotter
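      The user-side step, replacing @VARIABLE@ entries in an
      already-generated .in file, can be illustrated as follows. The
      actual build uses sed (replace_vars_sed); this Python stand-in,
      including the name replace_vars and the variable-name pattern, is
      an assumption made for illustration.

```python
import re


def replace_vars(text: str, variables: dict) -> str:
    """Substitute @NAME@ placeholders; unknown names are left untouched."""
    def substitute(match):
        name = match.group(1)
        return variables.get(name, match.group(0))

    return re.sub(r"@([A-Z][A-Z0-9_]*)@", substitute, text)
```

      Because the formatter has already run, this step needs no
      docbook/rst tooling on the user's machine.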
  6. 06 Apr, 2009 6 commits
    • Disable synchronous (locking) queries · 77921a95
      Iustin Pop authored
      This patch raises an error in the master daemon in case the user
      requests a locking query; accordingly, all clients were modified to send
      only lockless queries. This is a short-term fix; for a proper fix, the
      clients should be modified to submit a job when the user requests a
      locking query.
      The other approach would be to ignore the flag passed by the client;
      this would be worse, as clients wouldn't even get an error.
      The possible impact of this is multiple:
        - some commands might not have been converted, and thus fail; this
          can be remedied easily
        - the consistency of commands is lost; e.g. node failover will not
          lock the node *while we get the node info*, so we could miss some
          data; this is again in the thread of atomic operations which are
          missing in the current model of query-and-act from gnt-* scripts
      Reviewed-by: imsnah, ultrotter
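      The "raise an error rather than silently ignore the flag" choice
      can be sketched as below. The names handle_query, QueryError and
      run_query are hypothetical and do not reflect the master daemon's
      real interface.

```python
class QueryError(Exception):
    """Raised when a client asks for an unsupported query mode."""


def handle_query(fields, use_locking, run_query):
    """Serve a lockless query; reject locking ones with a clear error."""
    if use_locking:
        # Failing loudly tells the client its request was not honoured,
        # which silently dropping the flag would not.
        raise QueryError("synchronous (locking) queries are disabled")
    return run_query(fields)
```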
    • Fix the output of watcher on non-master nodes · 2c404217
      Iustin Pop authored
      Currently the watcher spews error messages on non-master nodes. This
      cleans it up.
      Reviewed-by: imsnah
    • Change the watcher to use jobs instead of queries · 6dfcc47b
      Iustin Pop authored
      As per the mailing list discussion, this patch changes the watcher to
      use a single job (two opcodes) for getting the cluster state (node list
      and instance list); it will then compute the needed actions based on
      this data.
      The patch also archives this job and the verify-disks job.
      Reviewed-by: imsnah
    • Fix Xen soft reboot via polling · 7dd106d3
      Iustin Pop authored
      This patch fixes the Xen soft reboot ("xm reboot") by polling for a specific
      time for either a changed domain ID or a decreased CPU run-time.
      This should prevent the race conditions discussed on the mailing list for
      Reviewed-by: imsnah
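      The polling idea can be sketched as follows. This is not the Xen
      hypervisor code: get_state is a hypothetical callable returning
      (domain_id, cpu_time), and the default timeout and interval are
      arbitrary assumptions.

```python
import time
from typing import Callable, Tuple


def wait_for_reboot(
    get_state: Callable[[], Tuple[int, float]],
    timeout: float = 60.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the domain ID changes or the CPU time decreases."""
    old_id, old_cpu = get_state()
    deadline = clock() + timeout
    while clock() < deadline:
        sleep(interval)
        new_id, new_cpu = get_state()
        # Either signal means the domain was restarted
        if new_id != old_id or new_cpu < old_cpu:
            return True
    return False
```

      Injecting sleep and clock keeps the loop testable without real
      waiting.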
    • Add a new ssconf file with the cluster tags · 5d60b3bd
      Iustin Pop authored
      Since the cluster tags are/should be more-or-less static, add them as an
      ssconf key, so that querying them is possible without creating a
      job/requiring the masterd to be running.
      Reviewed-by: imsnah
    • Add some more debugging info to masterd · e566ddbd
      Iustin Pop authored
      This patch will log data about queries, which are today completely
      invisible (at the default log level) in the master log file.
      Reviewed-by: imsnah
  7. 27 Mar, 2009 1 commit
    • Release 2.0rc2 · f06d91f2
      Iustin Pop authored
      This updates the NEWS file and bumps up the version number.
      Reviewed-by: ultrotter
  8. 20 Mar, 2009 3 commits
  9. 12 Mar, 2009 3 commits
  10. 10 Mar, 2009 1 commit
  11. 09 Mar, 2009 3 commits
    • watcher: fix startup sequence locking the master · cc962d58
      Iustin Pop authored
      Currently, the watcher startup sequence does:
        - open a luxi client
        - get the instance list
        - get the node boot ids
        - open and lock the status file, and:
          - archive jobs
          - restart the down instances
          - check disks
      This, of course, can lead to problems when a node is (genuinely or not)
      locked for more than (watcher interval * maximum query clients) time. At
      that time, the master is completely unresponsive until the node is
      unlocked and all the watchers exit with error due to the state file
      being locked by the first instance.
      This patch reworks the startup sequence to first open/lock the status
      file, and only then open a luxi client. This should prevent the above
      Reviewed-by: ultrotter
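      The reordered startup ("lock the status file first, then talk to
      the master") can be sketched like this, using fcntl.flock on Linux.
      The names run_watcher, open_client and do_work are hypothetical
      and stand in for the real watcher logic.

```python
import fcntl
import os


def run_watcher(status_path, open_client, do_work):
    """Run one watcher pass; return False if another watcher is active."""
    fd = os.open(status_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        try:
            # Non-blocking exclusive lock: fail immediately if held
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            # Another watcher holds the lock; exit before ever
            # opening a connection to the master daemon.
            return False
        client = open_client()
        do_work(client)
        return True
    finally:
        os.close(fd)  # closing the descriptor also drops the lock
```

      A second watcher started while the first is still running returns
      early without consuming one of the master's query client slots.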
    • Handle ghost instances in temp DRBD map · c614e5fb
      Iustin Pop authored
      Currently cluster-verify doesn't handle the (admittedly invalid) case where we
      have reservations for instances that were removed in the meantime.
      This patch adds a check for this and prevents code errors in cluster-verify in
      this case:
       * Verifying node node4.example.com (master candidate)
         - ERROR: ghost instance \'instance3.example.com\' in temporary DRBD map
      Reviewed-by: imsnah
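      The check can be sketched as below, assuming the temporary DRBD
      map is a dict of node name to {minor: instance name}. That layout
      and the function name find_ghost_instances are assumptions, not
      the real cluster-verify structures.

```python
def find_ghost_instances(drbd_map, known_instances):
    """Return (node, message) pairs for reservations of removed instances."""
    problems = []
    for node in sorted(drbd_map):
        for _minor, instance in sorted(drbd_map[node].items()):
            if instance not in known_instances:
                problems.append(
                    (node,
                     "ghost instance '%s' in temporary DRBD map" % instance))
    return problems
```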
    • Fix error handling in replace-disks with new node · 82759cb1
      Iustin Pop authored
      Currently the _CreateSingleBlockDev function only raises OpExecError and not
      BlockDeviceError. This means that we don't release the instance's temporary
      minors properly, and this creates problems later if the instance is removed
      without master restart.
      We could just use OpExecError, but adding it and leaving
      BlockDeviceError in seems safer.
      Reviewed-by: imsnah
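      Catching both exception types so that the cleanup always runs can
      be sketched as follows. The two exception classes are local
      stand-ins named after the ones in the commit message, and
      create_with_cleanup, create_fn and release_fn are hypothetical.

```python
class OpExecError(Exception):
    """Stand-in for the opcode execution error."""


class BlockDeviceError(Exception):
    """Stand-in for the block device error."""


def create_with_cleanup(create_fn, release_fn):
    """Run create_fn; release temporary reservations if it fails."""
    try:
        return create_fn()
    except (OpExecError, BlockDeviceError):
        # Either error type must release the reserved minors,
        # otherwise they linger until the master restarts.
        release_fn()
        raise
```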
  12. 06 Mar, 2009 1 commit
    • Fix serial_no field on instances · 6f285030
      Iustin Pop authored
      The instance objects did not get a serial_no field. This patch adds a
      new constant for the field name and uses it for all three cases
      (cluster, nodes, instances).
      Reviewed-by: imsnah
  13. 05 Mar, 2009 1 commit
    • Update gnt-cluster(8) for be/hyp parameter syntax · 555918b3
      Guido Trotter authored
      Now it displays:
      --hypervisor-parameters hypervisor:hv-param=value [ ,hv-param=value ... ]
      --backend-parameters be-param=value [ ,be-param=value ... ]
      Sorry for the super-long lines :( Is there a better way to insert spaces
      without pushing them to the resulting man page?
      Reviewed-by: iustinp
  14. 04 Mar, 2009 3 commits
    • Complete the cfgupgrade script for 2.0 migrations · ac4d25b6
      Iustin Pop authored
      This patch makes the cfgupgrade script handle:
        - instance changes
        - disk changes
        - further cluster fixes
        - adds configuration checks at the end, in non-dry-run mode
      Reviewed-by: ultrotter
    • First run at cfgupgrade for 2.0 upgrades · a421fdeb
      Iustin Pop authored
      This patch makes cfgupgrade work on an empty cluster (i.e. no instances),
      to the point that the config file can be converted from 1.2 to 2.0.
      This is not yet complete, though.
      Reviewed-by: ultrotter
    • Fix bash completion for cluster copyfile/command · 75615bd3
      Iustin Pop authored
      “copyfile” takes a file argument, so we enable file-completion for it.
      “gnt-cluster command” takes a command, so we enable command completion.
      Reviewed-by: imsnah
  15. 02 Mar, 2009 6 commits
    • Release 2.0rc1 · a2370b24
      Iustin Pop authored
      This patch updates the NEWS file and increases the version to 2.0 rc1.
      Reviewed-by: ultrotter
    • Export tags to cluster verify hooks · 35e994e9
      Iustin Pop authored
      This patch exports the cluster and node tags to the cluster verify hook
      scripts. The tags are exported as a space-separated list, which allows
      easy parsing from the shell (e.g. “for tag in $GANETI_CLUSTER_TAGS; do
      ...”) and therefore requires the previous “Don't allow spaces in tag
      names” patch.
      The patch also fixes a minor line length style problem.
      Reviewed-by: ultrotter
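      Why a space-separated export depends on space-free tags can be
      shown with a small sketch. GANETI_CLUSTER_TAGS is the variable
      named in the commit message; the helper name tags_to_env and its
      prefix parameter are assumptions.

```python
def tags_to_env(prefix, tags):
    """Build the hook environment entry for a collection of tags."""
    for tag in tags:
        if " " in tag:
            # A space inside a tag would split it into two shell words
            raise ValueError("tag %r contains a space" % tag)
    return {prefix + "_TAGS": " ".join(sorted(tags))}
```

      A hook script can then iterate over the value with a plain shell
      "for" loop, as in the example quoted in the commit message.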
    • Don't allow spaces in tag names · 28ab6fed
      Iustin Pop authored
      This patch restricts the use of spaces in tags, as this does not allow
      nice exporting of tags to environment in hooks. One can use underscores
      or dashes instead of spaces.
      Reviewed-by: schreiberal
    • Update the iallocator documentation · 77031881
      Iustin Pop authored
      This updates the iallocator documentation to 2.0, bumps up the
      iallocator version (and moves a constant to lib/constants.py), and
      fixes a style issue in install.rst.
      Reviewed-by: ultrotter
    • Fix a bug in utils.EnsureDirs · 1b2c8f85
      Iustin Pop authored
      This fixes a bug introduced in rev 2562 and also fixes the indentation.
      Reviewed-by: ultrotter
    • A doc update and a small indentation fix · b806661b
      Iustin Pop authored
      This adds a small paragraph about the “master” role of a node, and fixes
      a wrong indentation in the bash completion file.
      Reviewed-by: imsnah
  16. 27 Feb, 2009 3 commits
    • Use EnsureDirs in KVM as well. · 9afb67fe
      Guido Trotter authored
      The KVM hypervisor also has code to ensure that a list of directories exists.
      Substitute it with our new utils function.
      Reviewed-by: iustinp
    • Create runtime dir in bootstrap · 9dae41ad
      Guido Trotter authored
      Some hypervisors (KVM) need RUN_GANETI_DIR to exist even at cluster init
      time. This patch creates it in InitCluster just before hv parameter
      checking. Since the code to create a list of directories is already repeated
      twice in the code, and this would be the third time, we abstract it into
      a utils.EnsureDirs function and call it from ganeti-noded,
      ganeti-masterd and bootstrap.
      Reviewed-by: iustinp
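      A helper of this kind might look like the sketch below. It is an
      illustration under assumptions, not the utils.EnsureDirs source:
      the name ensure_dirs, the (path, mode) tuple format and the
      RuntimeError are all hypothetical.

```python
import errno
import os


def ensure_dirs(dirs):
    """Create each (path, mode) directory unless it already exists."""
    for dir_name, dir_mode in dirs:
        try:
            os.mkdir(dir_name, dir_mode)
        except OSError as err:
            if err.errno != errno.EEXIST:
                raise
        # The path may exist but be a regular file; treat that as an error
        if not os.path.isdir(dir_name):
            raise RuntimeError("%s is not a directory" % dir_name)
```

      Calling it more than once is harmless, which is what makes it
      usable from several daemons and from cluster initialisation.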
    • LUVerifyCluster: Handle the "no volume group" case · cc9e1230
      Guido Trotter authored
      If we're only file based and our volume group is set to "None" there's
      no point in asking nodes for their volume groups, logical volumes, and
      drbd devices, and checking those.
      Reviewed-by: iustinp