1. 09 Feb, 2009 10 commits
    • Add a new instance query flag ‘disk_usage’ · 024e157f
      Iustin Pop authored
      This patch adds a new instance query flag called disk_usage that
      retrieves the overall space used by an instance on each of its nodes.
      This can be used when balancing the cluster or checking N+1 status.
      
      The flag is also exported in RAPI. Note the flag is currently broken for
      file-based instances, as it represents the amount of space in the
      cluster volume group.
      
      Reviewed-by: ultrotter
    • Uniformize some function names in backend.py · 821d1bd1
      Iustin Pop authored
      Currently, the names of the functions in backend.py that are actually
      RPC procedures and are called from ganeti-noded do not correspond to
      the RPC names. This makes it hard to see which functions are
      exported and which functions are internal to backend.

      This patch renames all blockdevice-related functions in backend.py to
      match the name of the RPC call (without the ‘call’ or ‘perspective’
      prefix). This should make it easier to grep for a given function called
      in cmdlib, without having to open ganeti-noded and check which backend
      function it corresponds to.
      
      The patch also does two minor extra cleanups (rename a variable and
      change a logging level).
      
      Reviewed-by: ultrotter
    • bdev: add and use two utility functions · 82463074
      Iustin Pop authored
      This patch adds two utility functions for raising BlockDeviceError
      exceptions and for running functions while ignoring this error. Most of
      the manual “raise errors.BlockDeviceError” cases are converted to
      _ThrowError, as this makes the code clearer.
      
      We also change most of the DRBD error messages to include the minor
      number because with the parallel execution of commands it's no longer
      possible to identify the failed DRBD just from the timestamp, and the
      minor number can be mapped back to the instance more easily.
      
      Reviewed-by: ultrotter
    • rpc.call_blockdev_find: convert to (status, data) · 23829f6f
      Iustin Pop authored
      This patch converts the call_blockdev_find - which searches for block
      devices and returns their status - to the (status, data) format. We also
      modify the backend function name to match the rpc call.
      
      Reviewed-by: ultrotter
    • Export the cpu nodes and sockets from Xen · 0105bad3
      Iustin Pop authored
      This is a hand-picked forward patch of commit 1755 on the 1.2 branch
      (hand-picked since the trees diverged too much since then):
      
          The patch changes the Xen hypervisor to compute the number of CPU
          sockets/nodes and enables the command line and the RAPI to show this
          information (for RAPI it is enabled by default in node details, for
          gnt-node one can use the new “cnodes” and “csockets” fields).
      
          Originally-Reviewed-by: ultrotter
      
      For the KVM and fake hypervisors, the patch just exports 1 for both
      nodes and sockets. This can be fixed by looking at the
      /sys/devices/system/cpu/cpuN/topology directories and computing the
      actual information, but that should be done in a separate patch.
      
      Reviewed-by: imsnah
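
As a rough illustration of the follow-up suggested for KVM, the socket count could be derived from the sysfs topology directories along these lines. The function name, the use of the physical_package_id file and the fallback to 1 are assumptions of this sketch, and it covers sockets only, not NUMA nodes; the demo runs against a fake directory tree so that it does not depend on the host.

```python
import glob
import os
import tempfile


def count_cpu_sockets(sys_cpu_dir="/sys/devices/system/cpu"):
    """Count distinct physical_package_id values across cpuN directories."""
    packages = set()
    pattern = os.path.join(sys_cpu_dir, "cpu[0-9]*", "topology",
                           "physical_package_id")
    for path in glob.glob(pattern):
        try:
            with open(path) as handle:
                packages.add(handle.read().strip())
        except OSError:
            # A CPU may go offline while we scan; skip it
            continue
    # Fall back to 1 when nothing could be read
    return max(len(packages), 1)


# Demo on a fake tree: four CPUs spread over two packages
with tempfile.TemporaryDirectory() as fake_sys:
    for cpu, package in enumerate(["0", "0", "1", "1"]):
        topo = os.path.join(fake_sys, "cpu%d" % cpu, "topology")
        os.makedirs(topo)
        with open(os.path.join(topo, "physical_package_id"), "w") as handle:
            handle.write(package + "\n")
    sockets = count_cpu_sockets(fake_sys)
```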
    • Fix handling OS errors in AddOSToInstance · 1268d6fd
      Iustin Pop authored
      This patch fixes the error handling in the add OS to instance function
      with regard to invalid OSes. Previously, we didn't handle any such
      errors, with the end result that the user would have to look in the node
      daemon log.
      
      The patch also renames the function to match the RPC call name.
      
      Reviewed-by: ultrotter
    • backend.DrbdAttachNet: don't ignore Open() errors · d3da87b8
      Iustin Pop authored
      Currently the return value or errors from the block device Open() method
      are ignored. This patch catches any BlockDeviceErrors and returns a
      well-formatted result.
      
      Reviewed-by: ultrotter
    • cmdlib: simplify some rpc error handling cases · 0959c824
      Iustin Pop authored
      By using the RemoteFailMsg() or the payload field of RpcResult, we can
      simplify a few functions in cmdlib.
      
      Reviewed-by: ultrotter
    • RpcResult: add a new payload field · f2def43a
      Iustin Pop authored
      For results which use the (status, payload) response type, it's easier
      to define a ‘payload’ field on the result holding the payload than to
      extract it using “data[1]” in the caller code.
      
      Reviewed-by: ultrotter
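
A minimal sketch of the idea, assuming a simplified result class: RpcResultSketch and its attributes are hypothetical stand-ins, not the real RpcResult, and serve only to show why a named field reads better than indexing into the data.

```python
class RpcResultSketch:
    """Hypothetical, simplified stand-in for an RPC result object."""

    def __init__(self, data):
        self.data = data
        # Only a (status, payload) pair carries a payload
        if isinstance(data, (tuple, list)) and len(data) == 2:
            self.payload = data[1]
        else:
            self.payload = None


result = RpcResultSketch((True, {"minor": 0}))
# Caller code can now use result.payload instead of result.data[1]
```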
    • LUCreateInstance: only set running flag at the end · 4978db17
      Iustin Pop authored
      In lockless queries, it's better if we see the instance in ADMIN_down
      rather than ERROR_down while it is being installed. As such, we
      change the LU to only mark the instance 'up' once we are ready to
      start it.
      
      Reviewed-by: ultrotter
  2. 07 Feb, 2009 4 commits
    • KVM: don't boot from a virtio cdrom · 9dd363eb
      Guido Trotter authored
      Apparently it's not supported. Also add -boot command line parameters
      to kvm, since they seem to help booting from the right place. Everything
      will still only work when not using a kernel, but well... :)
      
      Reviewed-by: iustinp
    • KVM: don't boot from cdrom with no cdrom · ec91c05d
      Guido Trotter authored
      Reviewed-by: iustinp
    • Support cdrom image and boot order for KVM · 66d5dbef
      Guido Trotter authored
      The cdrom image has the same meaning as in Xen HVM, and so does
      boot_order, even though it has a slightly different syntax, and uses the
      value 'disk' to boot from disk and 'cdrom' to boot from cdrom.
      
      Reviewed-by: iustinp
    • Get rid of constants.HT_HVM_DEFAULT_BOOT_ORDER · 30948aa6
      Guido Trotter authored
      Confusingly, as a leftover from 1.2, there was a
      constants.HT_HVM_DEFAULT_BOOT_ORDER constant, with a value opposite to
      the default HV_BOOT_ORDER hv param, that got enabled only if
      HV_BOOT_ORDER was set to None. Since setting it to None is very
      hard/impossible for the user, and we didn't handle other "empty" values
      (False, ''), we'll just force the parameter to have a valid value (after
      all we have a default, and that's the way we use hvparams) and get rid
      of the old constant altogether.
      
      Reviewed-by: iustinp
  3. 06 Feb, 2009 2 commits
    • QA: switch RAPI to https · 49b1d36e
      Iustin Pop authored
      Since we by default now use SSL for RAPI, we need to switch the QA
      tests to SSL too.
      
      Reviewed-by: amishchenko
    • Fix rapi job listing · ee69c97f
      Iustin Pop authored
      This patch fixes a couple of issues with the job listing:
        - in case of a non-existing job, nicely raise 404 instead of 500
        - in the job detail listing, also list the job log, the job
          timestamps, etc.
        - the opcode migrate instance was missing its description field
      
      Reviewed-by: imsnah
  4. 05 Feb, 2009 6 commits
    • rapi: fix SSL mode and use SSL by default · 2ed6a7d6
      Iustin Pop authored
      This patch fixes the SSL mode (by actually constructing SSL parameters
      from the command line options) and enables SSL by default; the old “-S”
      option which enabled SSL is now changed to “--no-ssl”. The certificate
      and key are by default pointing to the Ganeti auto-generated certificate
      for rapi.
      
      Reviewed-by: imsnah
    • Small improvement to the init.d example file · e10a3aea
      Iustin Pop authored
      The start_action function is changed so that it can be called with
      arguments - this could be used to parse a defaults file, etc.
      
      Reviewed-by: imsnah
    • KVM: add VNC TLS and X509 parameters · 8b2d1013
      Guido Trotter authored
      With these parameters VNC for KVM can be protected by TLS,
      optionally with an x509 certificate, and optionally verifying the
      client as well. Additionally, in this patch we limit the bind address to
      being a directory, rather than a file or a directory, for simplicity, as
      it allows for the same level of control anyway.
      
      Reviewed-by: iustinp
    • KVM: allow binding vnc to a file · 8447f52b
      Guido Trotter authored
      Previously, we forced the VNC_BIND_ADDRESS to be an IP. Now we also
      accept a path, and bind the instance to it, or to a file in it if it's a
      directory.
      
      Reviewed-by: iustinp
    • Fix some issues for lockless queries · 2e7b8369
      Iustin Pop authored
      This patch converts some more jobs with only queries into cheaper luxi
      queries (no job created), and fixes some fallout from the lockless
      queries changes.
      
      Reviewed-by: ultrotter
    • Revive RAPI QA tests for 2.0-style RAPI · a5b9d725
      Iustin Pop authored
      This patch fixes the RAPI QA tests to work with today's RAPI code and
      also does some other minor improvements:
        - QA: only create the cluster if so configured (‘create-cluster’ key),
          this allows running parts of the QA suite against existing clusters
        - export the “hvparams” for instances in RAPI
      
      Reviewed-by: imsnah
  5. 04 Feb, 2009 7 commits
    • rapi: fix 'bulk' processing and add locking option · 3d103742
      Iustin Pop authored
      This patch fixes the 'bulk' parameter (before any non-empty
      specification was considered True, in conflict with the documentation,
      i.e. bulk=0 still did bulk queries).
      
      The patch also adds optional locking on the instance/node listing (does
      not have effect when we only list names).
      
      Reviewed-by: imsnah
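
The 'bulk' problem is the classic truthiness trap: a query-string value is a string, and any non-empty string (including "0") is true. The sketch below shows one way to parse such a flag; parse_bulk is a hypothetical helper, and treating non-numeric values as false is an assumption of this example rather than the behaviour of the RAPI code.

```python
from urllib.parse import parse_qs


def parse_bulk(query_args):
    """Return True only when the 'bulk' argument is a non-zero integer.

    query_args maps argument names to lists of string values, in the
    shape returned by urllib.parse.parse_qs.
    """
    values = query_args.get("bulk")
    if not values:
        return False
    try:
        return int(values[0]) != 0
    except ValueError:
        # Non-numeric values are treated as "not bulk" in this sketch
        return False


args = parse_qs("bulk=0")
naive = bool(args["bulk"][0])   # truthiness of the string "0"
fixed = parse_bulk(args)        # integer interpretation
```

Here `naive` ends up True even though the client asked for bulk=0, while `fixed` is False.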
    • rapi: cleanup and update to latest 2.0 API · 9031ee8e
      Iustin Pop authored
      This patch cleans up and updates the RAPI interface:
        - queries are changed to luxi queries instead of jobs, where possible
        - since we changed the API version, we remove the old-style attributes
          (sda_size, ip, etc.) and replace them with 2.0 style
        - a small optimization in the instance and node list, don't query
          twice the names in bulk output
        - switch the instance and node lists to no locking
      
      Reviewed-by: imsnah
    • Enable lockless node queries · bc8e4a1a
      Iustin Pop authored
      Similar to the instance list, this patch enables lockless node queries.
      “gnt-node list” now accepts the “--sync” flag, which enables locking;
      the default is lockless.
      
      Reviewed-by: imsnah
    • rapi: fix authentication and queries · 85414b69
      Iustin Pop authored
      For queries, we don't want to require authentication. We fix this by adding an
      override GetAuthRealm in the rapi daemon.
      
      We also fix a method name.
      
      Reviewed-by: imsnah
    • Add one new luxi query: cluster info · 66baeccc
      Iustin Pop authored
      This is the last query that RAPI executes via opcodes and is purely
      static (config values only). As such, we can convert it safely to a
      query instead of a job.
      
      Reviewed-by: imsnah
    • ssconf: add some more keys and some fixes · 81a49123
      Iustin Pop authored
      This patch adds the online node list and instance list to the ssconf
      keys. In order to distribute the instance list correctly, we need to
      update the cluster serial number on instance additions and removals.
      
      The patch also changes the permissions on the ssconf files to be 0444:
        - no write for root, in order to signal that these files should not be
          modified
        - read for everyone since the files don't contain sensitive data
          anymore (and permissions can be controlled via the parent directory
          if needed)
      
      The patch also fixes a small typo on gnt-cluster.
      
      Reviewed-by: ultrotter
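
One way to produce files with mode 0444 that can still be updated later is to write a temporary file and rename it over the target. The sketch below illustrates that approach on a POSIX system; write_readonly and the file name are hypothetical, and the commit message does not say how the ssconf writer actually does it.

```python
import os
import stat
import tempfile


def write_readonly(path, value):
    """Atomically write value to path, leaving the file with mode 0444."""
    directory = os.path.dirname(path) or "."
    # Write to a temporary file first: an existing 0444 file cannot be
    # reopened for writing by a non-root user, but it can be replaced.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(value + "\n")
        os.chmod(tmp_path, 0o444)  # r--r--r--
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


with tempfile.TemporaryDirectory() as data_dir:
    # The file name below is made up for the demo
    target = os.path.join(data_dir, "ssconf_online_nodes")
    write_readonly(target, "node1.example.com")
    # Updating works even though the existing file is read-only
    write_readonly(target, "node1.example.com\nnode2.example.com")
    mode = stat.S_IMODE(os.stat(target).st_mode)
    with open(target) as handle:
        content = handle.read()
```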
    • Implement lockless query operations · ec79568d
      Iustin Pop authored
      This patch adds the framework for, and enables lockless OpQueryInstances. This
      means that instances will be shown in ERROR_up or ERROR_down state, even though
      this is not an error (but just an in-progress job).
      
      The framework is implemented as follows:
        - the OpQueryInstances, OpQueryNodes and OpQueryExports opcodes take
          an additional “use_locking” flag which will denote whether to lock
          or not; this patch only implements this for LUQueryInstances
        - the luxi query functions take an additional argument use_locking
          which is passed to the master daemon, and then passed to the above
          opcodes
        - cli.py exports a new SYNC_OPT command line option which implements
          setting this flag to true
        - except for gnt-instance list, which uses this option, and for
          name-only queries (e.g. QueryNodes(fields=["names"])), all other
          callers are setting this flag to True
        - RAPI also sets the flag to True
      
      The patch was tested with a continuous (0.2s sleep in-between)
      gnt-instance list during a burnin, and no problems were observed.
      
      Reviewed-by: ultrotter
  6. 03 Feb, 2009 11 commits
    • KVM: Make GetAllInstancesInfo concurrency-safe · 00ad5362
      Guido Trotter authored
      Or actually more so. If this function gets called while instances get
      shut down, it might try to report information on instances which don't
      exist. Try to fail gracefully if that happens, by just skipping an
      instance which has disappeared in the meantime.
      
      Reviewed-by: iustinp
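
The "skip what disappeared" pattern can be sketched as follows. The function and exception names are hypothetical, and the listing and lookup callables are passed in so that the example runs without a hypervisor; it is not the KVM hypervisor code itself.

```python
class InstanceGone(Exception):
    """Hypothetical error: the instance vanished between list and lookup."""


def get_all_instances_info(list_names, get_info):
    """Collect info for every listed instance, skipping vanished ones."""
    data = []
    for name in list_names():
        try:
            info = get_info(name)
        except InstanceGone:
            # Shut down after the listing was taken; not an error
            continue
        data.append(info)
    return data


# Demo: "inst2" was listed but is gone by the time we query it
running = {"inst1": ("inst1", 512), "inst3": ("inst3", 1024)}


def fake_get_info(name):
    if name not in running:
        raise InstanceGone(name)
    return running[name]


infos = get_all_instances_info(lambda: ["inst1", "inst2", "inst3"],
                               fake_get_info)
```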
    • Correct a typo in ReadPidFile's docstring · 1de62f37
      Guido Trotter authored
      Reviewed-by: iustinp
    • Fix unittest encoding breakage · 7b80424f
      Iustin Pop authored
      Because we now sanitize the output from environment
      scripts, the unittest needs to be adjusted. My bad for not checking it.
      
      Reviewed-by: imsnah
    • Allow gnt-node evacuate to use an iallocator · c4ed32cb
      Iustin Pop authored
      This is a partial implementation of fully automated node evacuation:
      we allow passing an iallocator and all instance replace-disks will be
      executed via that iallocator.
      
      The individual OpReplaceDisks opcodes are submitted in a single job,
      which causes them to be executed serially and thus keeps the iallocator
      runs consistent. This also changes the behaviour so that the first
      reallocation that failed will stop all the reallocations.
      
      Reviewed-by: ultrotter
    • Add gnt-node migrate · 40ef0ed6
      Iustin Pop authored
      This is a (modified) forward-port of commit 1190 on the 1.2 branch:
      
        This is the same as gnt-node failover, and is also a cut&paste of its
        code (almost). It will be really really useful to quickly empty a
        healthy node. I can be persuaded to merge MigrateNode and FailoverNode
        in a common codebase, but could also forget about it and submit it if
        nobody cares.
      
        Reviewed-by: iustinp
      
      The original MigrateNode function has been converted to the 2.0 style
      (cli.JobExecutor). Also commit 2076 has been added that fixes a missing
      opcode parameter.
      
      Original-author: ultrotter
      Reviewed-by: ultrotter
    • An attempt at fixing some encoding issues · 26f15862
      Iustin Pop authored
      This patch unifies the hardcoded re-encoding attempts into a single
      function in utils.py. This function is used to take either a unicode or
      str object and convert it to an ASCII-only str object which can be
      safely displayed and transmitted.

      We then replace the current manual re-encodings with this function. In
      mcpu we stop re-encoding the hooks output and instead we do it right at
      the hook generation in backend.py.
      
      This passes on my 'custom' lvs output with non-ASCII chars. But there
      are probably other places we will need to fix.
      
      Reviewed-by: ultrotter
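
The commit predates Python 3, where the two input types were unicode and str. A present-day sketch of the same idea, accepting bytes or str and returning an ASCII-only str, might look like this; the name safe_encode, the UTF-8 assumption for byte input and the backslash-escape policy are choices made for this example, not necessarily those of utils.py.

```python
def safe_encode(text):
    """Return an ASCII-only str for either bytes or str input."""
    if isinstance(text, bytes):
        # Assume UTF-8; undecodable bytes become U+FFFD and are
        # escaped in the next step like any other non-ASCII character
        text = text.decode("utf-8", errors="replace")
    # Replace each non-ASCII character with a backslash escape
    return text.encode("ascii", errors="backslashreplace").decode("ascii")


escaped = safe_encode("na\u00efve lvs output")
```

The result contains only ASCII characters, with the "ï" rendered as the four-character escape `\xef`, so it can be logged or sent over the wire without further encoding errors.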
    • lvmstrap: allow removable devices too · d1687c6f
      Iustin Pop authored
      For testing or just in case a device is exported by a bad driver with
      the 'removable' flag set, this patch adds a flag to lvmstrap that allows
      it to use these devices too.
      
      Reviewed-by: ultrotter
    • Documentation: update the gnt-os manpage · 216842d7
      Iustin Pop authored
      This patch updates the gnt-os man page and the common footer page for
      ganeti 2.0.
      
      Reviewed-by: ultrotter
    • Small patch for handling errors in node add · bafc1d90
      Iustin Pop authored
      This small patch hopefully fixes the handling of ssh verify errors in
      node add (note: untested).
      
      Reviewed-by: ultrotter
    • ssh: more details on failure · a162cf5b
      Iustin Pop authored
      In case we fail without output from the ssh command, we should at least
      add the exit code or any other failure reason to the error message, and
      log it and the cmdline used to the node daemon log.
      
      Reviewed-by: imsnah
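
A small sketch of the behaviour described here: when a command fails silently, build the error message from the exit code and log it together with the command line. The helper name and message wording are hypothetical, and the demo runs a Python one-liner rather than ssh so that it works anywhere.

```python
import logging
import subprocess
import sys


def run_and_describe(cmd):
    """Run cmd; return None on success, else a non-empty failure text."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode == 0:
        return None
    output = (result.stderr or result.stdout).strip()
    if not output:
        # Nothing was printed: fall back to the exit code
        output = "no output, exit code %d" % result.returncode
    logging.error("Command %r failed: %s", cmd, output)
    return output


# Demo: a command that exits with status 3 without printing anything
message = run_and_describe([sys.executable, "-c", "import sys; sys.exit(3)"])
```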
    • Give a sane permission to the known_host file · a3f9f296
      Guido Trotter authored
      Reviewed-by: iustinp