1. 09 Feb, 2009 3 commits
    • Export the cpu nodes and sockets from Xen · 0105bad3
      Iustin Pop authored
      This is a hand-picked forward patch of commit 1755 on the 1.2 branch
      (hand-picked since the trees diverged too much since then):
      
          The patch changed the xen hypervisor to compute the number of cpu
          sockets/nodes and enabled the command line and the RAPI to show this
          information (for RAPI it is enabled by default in node details; for
          gnt-node one can use the new “cnodes” and “csockets” fields).
      
          Originally-Reviewed-by: ultrotter
      
      For the KVM and fake hypervisors, the patch just exports 1 for both
      nodes and sockets. This can be fixed, by looking at the
      /sys/devices/system/cpu/cpuN/topology directories, and computing the
      actual information, but that should be done in a separate patch.
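
A minimal sketch of the follow-up idea mentioned above, assuming each `cpuN/topology` directory contains a `physical_package_id` file (as on current Linux kernels). The function name `count_cpu_sockets` and its `cpu_root` parameter are hypothetical and not part of the patch; only sockets are counted here, not nodes.

```python
import os
import re


def count_cpu_sockets(cpu_root="/sys/devices/system/cpu"):
    """Count distinct physical packages (sockets) listed under cpu_root."""
    packages = set()
    for entry in os.listdir(cpu_root):
        # Only cpu0, cpu1, ... are per-CPU directories; skip cpufreq etc.
        if not re.fullmatch(r"cpu\d+", entry):
            continue
        path = os.path.join(cpu_root, entry, "topology", "physical_package_id")
        try:
            with open(path) as handle:
                packages.add(handle.read().strip())
        except OSError:
            # Some CPUs may not expose topology data; skip them.
            continue
    # Fall back to 1, the value the patch exports for KVM and fake.
    return len(packages) or 1
```

Taking the root directory as a parameter keeps the function testable against a temporary directory tree instead of the live `/sys`.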
      
      Reviewed-by: imsnah
    • cmdlib: simplify some rpc error handling cases · 0959c824
      Iustin Pop authored
      By using the RemoteFailMsg() or the payload field of RpcResult, we can
      simplify a few functions in cmdlib.
      
      Reviewed-by: ultrotter
    • LUCreateInstance: only set running flag at the end · 4978db17
      Iustin Pop authored
      In lockless queries, it's better if we see the instance in ADMIN_down
      rather than ERROR_down during the time it's installed. As such, we
      change the LU to only mark the instance 'up' at the time we are ready to
      start it.
      
      Reviewed-by: ultrotter
  2. 04 Feb, 2009 2 commits
    • Enable lockless node queries · bc8e4a1a
      Iustin Pop authored
      Similar to the instance list, this patch enables lockless node queries.
      “gnt-node list” now accepts the “--sync” flag, which enables locking; the
      default is lockless.
      
      Reviewed-by: imsnah
    • Implement lockless query operations · ec79568d
      Iustin Pop authored
      This patch adds the framework for, and enables lockless OpQueryInstances. This
      means that instances will be shown in ERROR_up or ERROR_down state, even though
      this is not an error (but just an in-progress job).
      
      The framework is implemented as follows:
        - the OpQueryInstances, OpQueryNodes and OpQueryExports opcodes take
          an additional “use_locking” flag which will denote whether to lock
          or not; this patch only implements this for LUQueryInstances
        - the luxi query functions take an additional argument use_locking
          which is passed to the master daemon, and then passed to the above
          opcodes
        - cli.py exports a new SYNC_OPT command line option which implements
          setting this flag to true
        - except for gnt-instance list, which uses this option, and for
          name-only queries (e.g. QueryNodes(fields=["names"])), all other
          callers are setting this flag to True
        - RAPI also sets the flag to True
      
      The patch was tested with a continuous (0.2s sleep in-between)
      gnt-instance list during a burnin, and no problems were observed.
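
The framework described above can be illustrated with a toy model. This is only a sketch: `QueryInstancesOp` and `InstanceQuery` are hypothetical stand-ins, not the real opcode or LU classes, and a single `threading.Lock` stands in for the locking machinery. The only thing taken from the commit message is the `use_locking` flag deciding whether the query locks.

```python
import threading


class QueryInstancesOp:
    """Hypothetical stand-in for an opcode carrying a use_locking flag."""

    def __init__(self, names, use_locking=False):
        self.names = names
        self.use_locking = use_locking


class InstanceQuery:
    """Toy query executor that locks only when the opcode asks for it."""

    def __init__(self, config):
        self._config = config            # instance name -> status string
        self._lock = threading.Lock()
        self.lock_acquisitions = 0       # counter kept for illustration

    def execute(self, op):
        if op.use_locking:
            # Synchronised path: wait for the lock before reading.
            with self._lock:
                self.lock_acquisitions += 1
                return self._read(op.names)
        # Lockless path: read whatever state is currently visible.
        return self._read(op.names)

    def _read(self, names):
        return [(name, self._config[name]) for name in names]
```

In the lockless path a reader may observe an instance in the middle of a job, which is why the message warns about transient ERROR_up or ERROR_down states.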
      
      Reviewed-by: ultrotter
  3. 03 Feb, 2009 2 commits
    • An attempt at fixing some encoding issues · 26f15862
      Iustin Pop authored
      This patch unifies the hardcoded re-encoding attempts into a single
      function in utils.py. This function is used to take either a unicode or
      str object and convert it to an ASCII-only str object which can be safely
      displayed and transmitted.
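
A rough Python 3 analogue of such a helper, as a sketch only. The name `to_ascii_str` is hypothetical, and the choice of error handlers (`replace` when decoding, `backslashreplace` when encoding) is an assumption rather than what utils.py does. In Python 3 terms, the "unicode or str" inputs correspond to `str` and `bytes`.

```python
def to_ascii_str(text):
    """Return an ASCII-only str for a bytes or str input."""
    if isinstance(text, bytes):
        # Undecodable bytes become U+FFFD instead of raising an error.
        text = text.decode("utf-8", errors="replace")
    # Every non-ASCII character is turned into a backslash escape.
    return text.encode("ascii", errors="backslashreplace").decode("ascii")
```

For example, `to_ascii_str("naïve")` yields the seven-bit string `na\xefve`, which is safe to print or send over a channel that expects ASCII.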
      
      We then replace the current manual re-encodings with this function. In
      mcpu we stop re-encoding the hooks output and instead we do it right at
      the hook generation in backend.py.
      
      This passes on my 'custom' lvs output with non-ASCII chars. But there
      are probably other places we will need to fix.
      
      Reviewed-by: ultrotter
    • Small patch for handling errors in node add · bafc1d90
      Iustin Pop authored
      This small patch hopefully fixes the handling of ssh verify errors in
      node add (note: untested).
      
      Reviewed-by: ultrotter
  4. 02 Feb, 2009 1 commit
    • Return error messages in node add ssh handling · a1b805fb
      Iustin Pop authored
      When the rpc call node_add fails, we don't have any error message. This
      patch changes the call to return (status, data) so that the user can see
      the correct error message.
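
The (status, data) convention can be sketched as follows. Both function names and the `run_ssh_verify` callable are hypothetical; the point is only that a failing call carries its error message back to the caller instead of a bare False.

```python
def node_add(run_ssh_verify):
    """Hypothetical backend call returning a (status, data) pair."""
    try:
        output = run_ssh_verify()
    except RuntimeError as err:
        # On failure, data carries the message that should reach the user.
        return (False, "ssh verify failed: %s" % err)
    return (True, output)


def check_result(result):
    """Unpack a (status, data) pair, raising with the message on failure."""
    status, data = result
    if not status:
        raise RuntimeError(data)
    return data
```

A caller would then write `check_result(node_add(...))` and either get the payload or an exception whose text is the remote error message.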
      
      Reviewed-by: imsnah
  5. 01 Feb, 2009 1 commit
  6. 29 Jan, 2009 4 commits
    • LUAddNode: copy the vnc password file also for KVM · 2928f08d
      Guido Trotter authored
      Before, we used to copy the file if xen-hvm was enabled on the cluster;
      now we'll do that if any enabled hypervisor is in the new HTS_USE_VNC
      group.
      
      Reviewed-by: iustinp
    • GetShellCommand: get hvparams and beparams · 5431b2e4
      Guido Trotter authored
      Sometimes the hypervisor will use the instance hv and/or be parameters
      to determine the best shell command. This is not possible, though,
      currently, as the instance hv/beparams are not filled, so we have to
      pass the filled versions separately.
      
      Reviewed-by: iustinp
    • Implement software release version checks too · e9ce0a64
      Iustin Pop authored
      Currently the LUVerifyCluster only reports the protocol version changes,
      not software ones. This is useful to know/monitor, so we add this too as
      a warning.
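
A sketch of the two kinds of check, under stated assumptions: the function name `check_versions` and the `(protocol, release)` tuple layout are hypothetical, and treating a protocol mismatch as an error is an assumption. The commit message itself only says that the software version difference is reported as a warning.

```python
def check_versions(master, nodes):
    """Compare each node's (protocol, release) pair against the master's.

    Returns (errors, warnings) as two lists of messages.
    """
    errors, warnings = [], []
    master_proto, master_release = master
    for name, (proto, release) in sorted(nodes.items()):
        if proto != master_proto:
            # Assumed to be the pre-existing, more serious check.
            errors.append("%s: protocol %s != %s" % (name, proto, master_proto))
        elif release != master_release:
            # The new check: worth knowing about, but not fatal.
            warnings.append("%s: release %s != %s" % (name, release, master_release))
    return errors, warnings
```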
      
      Reviewed-by: ultrotter
    • LUQueryInstances: keep the given order of names · a7f5dc98
      Iustin Pop authored
      Currently LUQueryInstances keeps the ordering of instances only in some cases,
      and in others it will reorder the list. This patch fixes this by more clearly
      separating the various cases (names passed or not and locking or not locking),
      so that the output list is in the same order as always.
      
      Of course, this disables the sorting when arguments are passed.
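
The ordering rule can be sketched like this. The function `select_instances` is hypothetical, and raising `KeyError` for unknown names is an assumption; the only behaviour taken from the message is that explicitly passed names keep their order while the unfiltered listing stays sorted.

```python
def select_instances(all_names, wanted=None):
    """Return instance names: caller's order if given, sorted otherwise."""
    if not wanted:
        # No arguments passed: list everything in sorted order.
        return sorted(all_names)
    known = set(all_names)
    missing = [name for name in wanted if name not in known]
    if missing:
        raise KeyError("unknown instances: %s" % ", ".join(missing))
    # Arguments passed: preserve exactly the order the caller used.
    return list(wanted)
```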
      
      Reviewed-by: ultrotter
  7. 28 Jan, 2009 1 commit
  8. 27 Jan, 2009 1 commit
    • Fix the mode attribute of newly-created disks · 6ec66eae
      Iustin Pop authored
      Currently, only the LUSetInstanceParams correctly sets up the mode
      attribute via a manual operation. We remove this and instead do the
      correct setting in the generic _GenerateDiskTemplate function, so that
      we set the mode correctly for all disk creations.
      
      Reviewed-by: ultrotter
  9. 23 Jan, 2009 4 commits
    • Fix batcher for 2.0-style disks and nics · 9939547b
      Iustin Pop authored
      This patch fixes the gnt-instance batch-create command, and in doing so
      also slightly changes two other functions:
        - we change utils.ParseUnit so that it accepts integer values also
          (both ParseUnit(5) and ParseUnit("5") return the same value)
        - a bridge 'None' in LUCreateInstance will be converted to the default
          bridge; currently only missing bridges will be accepted to mean the
          default one
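
To illustrate the first point, here is a simplified stand-in for a unit parser that accepts both integers and strings. It is not the real utils.ParseUnit: the name `parse_unit`, the supported suffixes (`m`, `g`, `t`) and the choice of mebibytes as the result unit are all assumptions made for this sketch.

```python
import re

# Multipliers to mebibytes; a bare number is assumed to be in MiB.
_UNITS = {"": 1, "m": 1, "g": 1024, "t": 1024 * 1024}


def parse_unit(value):
    """Parse 5, "5", "5m" or "2g" into an integer number of MiB."""
    # str() lets integer input follow the same path as string input.
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([mgt]?)\s*", str(value).lower())
    if not match:
        raise ValueError("cannot parse size %r" % (value,))
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])
```

Converting the input with `str()` first is the simplest way to make `parse_unit(5)` and `parse_unit("5")` agree.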
      
      The main changes to batcher were the change to variable number of disks
      and NICs.
      
      The patch also adds a batcher-instances.json example file copied from
      the 1.2 branch and properly modified.
      
      Reviewed-by: imsnah, killerfoxi
    • Make iallocator work with offline nodes · 1325da74
      Iustin Pop authored
      This patch changes the iallocator framework to work with and properly
      export to plugins offline nodes. It does this by only exporting the
      static configuration data for those nodes, and not attempting to parse
      the runtime data.
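
The export rule for offline nodes might look like the sketch below. The function `export_node` and the field names used in the example (`total_memory`, `free_memory`, `offline`) are hypothetical; the message only states that offline nodes get their static configuration exported and that runtime data is not parsed for them.

```python
def export_node(static, runtime, offline):
    """Build the per-node dict handed to an allocator plugin."""
    data = dict(static)          # static configuration is always exported
    data["offline"] = offline
    if not offline:
        # Runtime data is only merged in for nodes that can be queried.
        data.update(runtime)
    return data
```

With this shape a plugin can still see that an offline node exists, but cannot mistake stale or missing runtime values for real ones.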
      
      The patch also fixes bugs in iallocator related to the RpcResult
      conversion, changes the should_run to admin_up attribute name (as per
      the internals change), and adds “-I” as a short option for
      “--iallocator” in gnt-instance, gnt-backup and burnin.
      
      Reviewed-by: ultrotter
    • Remove checking of DRBD metadata for validity · 3b559640
      Iustin Pop authored
      Currently the DRBD code checks that the metadata devices are valid
      before creation, initial disk attachment and add children.
      
      However, the process for checking validity requires a free DRBD minor,
      and this conflicts with parallel checking.
      
      There are at least three possible solutions:
        - serialize all checks, which means we reduce parallelism and need
          extra locks
        - don't pass a valid minor number, but one like “/dev/drbd256” (which
          is invalid); this works for current version of DRBD, but since it's
          not guaranteed to remain so it doesn't look nice
        - don't do the checking at all, and rely on “drbdsetup ... disk ...”
          to fail by itself
      
      The reason for checking metadata was that in 1.2, this was much cheaper
      than trying to activate devices (and the subsequent iteration over the
      minors). However, in 2.0, they have the same cost, so we can choose
      option 3: just remove the explicit checking and rely on drbdsetup and
      the kernel to fail.
      
      Since DRBD8._InitMeta still requires a minor number, the two places
      where this is run are handled as follows:
        - Create: we just use our own (unused currently) minor number
        - AddChildren: we keep using FindUnusedMinor, with the caveat that
          this function (used by replace-disks -n ...) cannot yet be
          parallelized
      
      Reviewed-by: ultrotter
    • A couple of small fixes to iallocator · 8901997e
      Iustin Pop authored
      This removes some constraints:
        - only two disks supported, this is no longer true as the underlying
          functions can now compute size for a variable number of disks
        - error when the hypervisor was not being passed
        - typo error
      
      Reviewed-by: imsnah
  10. 21 Jan, 2009 3 commits
    • Automatically release DRBD minors on success · 61cf6b5e
      Iustin Pop authored
      This patch converts the DRBD minors reservation protocol from explicit
      release to automatic release on the success paths. On the error paths,
      a manual release is still needed.
      
      The patch doesn't bring much by itself, but is needed for a future patch
      which enhances the automatic verification of configuration consistency.
      
      Reviewed-by: ultrotter
    • Change the instance status attribute to boolean · 0d68c45d
      Iustin Pop authored
      Due to historic reasons, the “should run or not” attribute of an
      instance was denoted by its “status” attribute having a string value of
      either ‘up’ or ‘down’. Checking this in code was done via hardcoding
      of the strings.
      
      This was long due for a redo, and this patch changes this attribute to
      “admin_up” having a boolean value. The patch is in fact shorter than I
      expected, and passes burnin.
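
The attribute change can be pictured as a conversion from the old string value to the new boolean. This is a sketch only: `upgrade_instance` is a hypothetical helper operating on a plain dict, and the commit message does not say how existing configurations are converted.

```python
def upgrade_instance(instance):
    """Return a copy of an instance dict using admin_up instead of status."""
    upgraded = dict(instance)
    status = upgraded.pop("status")
    if status not in ("up", "down"):
        raise ValueError("unexpected status %r" % (status,))
    # The string comparison happens once here, not at every call site.
    upgraded["admin_up"] = status == "up"
    return upgraded
```

After such a change, callers test `if instance["admin_up"]:` rather than comparing against hardcoded strings.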
      
      The patch also fixes an error in BuildInstanceHookEnvByObject where the
      instance.os was passed as the status value.
      
      Reviewed-by: ultrotter
    • Add calls in the intra-node migration protocol · 6906a9d8
      Guido Trotter authored
      Currently the hypervisor is expected to do all the migration from the
      source side. With this patch we also add the option of passing some
      information to the target side, and starting some operation there.
      
      As a bonus, a function to cleanup any started operation is included.
      
      Reviewed-by: iustinp
  11. 20 Jan, 2009 6 commits
    • Convert RenameInstance to (status, data) · 96841384
      Iustin Pop authored
      This allows the rename failures to show the output of OS scripts.
      
      Reviewed-by: ultrotter
    • Fix adding of disks to an instance · 32388e6d
      Iustin Pop authored
      The ConfigWriter.AllocateDRBDMinor requires the instance name, not the
      instance object. The LUSetInstanceParams wrongly passes the instance
      object, which can cause breakage.
      
      The patch also adds asserts to check for this mismatch in ConfigWriter.
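
An assertion of this kind could look like the sketch below. `MinorAllocator` is a hypothetical toy class, not ConfigWriter, and the lowest-free-minor allocation policy is an assumption; the part that mirrors the message is the type check on the `instance` argument.

```python
class MinorAllocator:
    """Toy per-node minor allocator keyed by instance name."""

    def __init__(self):
        self._minors = {}  # (node, minor) -> instance name

    def allocate(self, nodes, instance):
        # Catch callers that pass an instance object instead of its name.
        assert isinstance(instance, str), "expected instance name, not object"
        result = []
        for node in nodes:
            used = {minor for (owner, minor) in self._minors if owner == node}
            minor = 0
            while minor in used:
                minor += 1
            self._minors[(node, minor)] = instance
            result.append(minor)
        return result
```

Note that Python drops `assert` statements when run with `-O`, so a check like this is a development aid rather than input validation.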
      
      Reviewed-by: ultrotter
    • Make cluster-verify check the drbd minors space · 6d2e83d5
      Iustin Pop authored
      This patch adds support for verification of drbd minors space in cluster
      verify: minors which belong to running instances and should be online
      but are not, and minors which do not belong to any instance but are in
      use.
      
      The patch requires exposing some methods from bdev.DRBD8 and
      config.ConfigWriter which were until now private methods.
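
The two checks amount to comparing what the configuration expects with what a node reports. A minimal sketch, assuming a hypothetical `verify_minors` function that takes a mapping of minor to `(instance, must_be_active)` and the set of minors actually in use on the node:

```python
def verify_minors(expected, in_use):
    """Return a list of problems found in one node's DRBD minor space."""
    problems = []
    for minor, (instance, must_be_active) in sorted(expected.items()):
        # Case 1: a running instance's minor that is not online.
        if must_be_active and minor not in in_use:
            problems.append("minor %d of instance %s is not active"
                            % (minor, instance))
    # Case 2: a minor in use that no instance owns.
    for minor in sorted(set(in_use) - set(expected)):
        problems.append("unallocated minor %d is in use" % minor)
    return problems
```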
      
      Reviewed-by: ultrotter
    • Some small fixes in cmdlib · 1492cca7
      Iustin Pop authored
      Reviewed-by: ultrotter
    • Convert AddOSToInstance to (status, data) · 20e01edd
      Iustin Pop authored
      This allows the install and reinstall instance to return (hopefully)
      relevant log files from the OS create scripts.
      
      Reviewed-by: ultrotter
    • Convert the start instance rpc to (status, data) · dd279568
      Iustin Pop authored
      This will record the failure cause in starting up the instance in the
      job log (and thus to the user).
      
      Reviewed-by: ultrotter
  12. 19 Jan, 2009 7 commits
    • Fix handling of failures in create instance disks · 7d81697f
      Iustin Pop authored
      Commit 2302 only modified _CreateBlockDevOnPrimary to the new style
      result, but _CreateBlockDevOnSecondary was forgotten. After the merger
      of the two functions, _CreateBlockDevOnSecondary was taken as template
      so we checked against old-style values, thus completely breaking error
      handling.
      
      Reviewed-by: imsnah
    • Use instance.all_nodes instead of hand-building it · 6b12959c
      Iustin Pop authored
      This patch replaces a few obvious uses of [instance.primary_node] +
      list(instance.secondary_nodes) (or similar usage) with the new
      instance.all_nodes.
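
As an illustration of the replacement, here is a toy `Instance` class with an `all_nodes` accessor. Whether the real attribute is a property, and whether it returns a list or a tuple, is not stated in the message; both are assumptions in this sketch.

```python
class Instance:
    """Toy instance object with a primary and optional secondary nodes."""

    def __init__(self, primary_node, secondary_nodes=()):
        self.primary_node = primary_node
        self.secondary_nodes = tuple(secondary_nodes)

    @property
    def all_nodes(self):
        # Same value callers previously built by hand at each call site.
        return [self.primary_node] + list(self.secondary_nodes)
```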
      
      Reviewed-by: ultrotter
    • Split the block device creation in two parts · de12473a
      Iustin Pop authored
      Some callers of _CreateBlockDev need recursive behaviour, but not all.
      The replace secondary first creates (manually) new LVs to ensure storage
      is there, and then it creates the new DRBD. At this point, we need a
      non-recursive call so that the LVs are not needlessly re-created.
      
      This patch splits the single device creation into a separate function,
      so that LUReplaceDisks can use it.
      
      Reviewed-by: ultrotter
    • Combine the two _CreateBlockDevOnXXX functions · 428958aa
      Iustin Pop authored
      Since only two boolean parameters differ between these two functions, we
      combine them so as to have less code duplication. This will be needed in
      the future as we will need to split the recursive part off.
      
      Reviewed-by: ultrotter
    • Switch call_blockdev_create call to (status, data) · dab69e97
      Iustin Pop authored
      This allows errors to be visible at the user level instead of just node
      daemon logs.
      
      Reviewed-by: ultrotter
    • Small change in the instance disk creation path · 796cab27
      Iustin Pop authored
      For future propagation of error messages from backend to cmdlib and to
      the job log, just having True/False return from the disk creation
      function is not enough.
      
      This patch converts these functions (_CreateDisks, _CreateBlockDevOnXXX)
      to raise an exception on errors, and otherwise the return value is None.
      
      Reviewed-by: ultrotter
    • Use the same root for both _data and _meta LVs · e6c1ff2f
      Iustin Pop authored
      Currently we use a different UUID for the _data and _meta volumes of a
      DRBD disk. This is confusing as it's hard to associate the two in the
      output of “lvs” or “gnt-node volumes”.
      
      The patch changes this so that they use the same prefix.
      
      Reviewed-by: ultrotter
  13. 16 Jan, 2009 2 commits
    • Fix LUExportInstance · 998c712c
      Iustin Pop authored
      Due to deficiencies in our block device implementation, it is a must to
      call SetDiskID on disks before passing them to remote nodes. Since in
      export/import, we don't touch the disks themselves, this was not needed
      before in this function.
      
      However, since having instance symlinks, the correct ID is needed here
      too, and with static minors it's a "must need". This reflects into
      failed instance starts after migration and/or failover.
      
      Reviewed-by: ultrotter
    • Fix gnt-backup export with short names · aeb83a2b
      Iustin Pop authored
      We need to pass the fully-qualified node to _CheckNodeOnline, not the short
      one.
      
      Reviewed-by: imsnah
  14. 13 Jan, 2009 3 commits
    • Forward port the live migration from 1.2 branch · 53c776b5
      Iustin Pop authored
      This is a forward port via copy (and not individual patches cherry-pick)
      of the latest code on the 1.2 branch related to the migration.
      
      The changes compared to 1.2 are the fact that we don't need the
      IdentifyDisks step anymore (the drbd rpc calls are independent now), and
      the rpc module improvements.
      
      Reviewed-by: ultrotter
    • Port replace disk/change node to the new DRBD RPCs · a2d59d8b
      Iustin Pop authored
      In replace disks to new secondary, since Attach (and therefore
      call_blockdev_find) is not modifying the devices anymore, we need to
      switch this LU to the new call_drbd_disconnect_net and
      call_drbd_attach_net functions.
      
      Due to the authentication needed in 2.0, we need to be more careful with
      the activation order. In 1.2, we have the case that the new node was
      directly activated with networking information, and could connect to the
      primary while it was still connected or WFConnect to the old secondary.
      
      In the new scheme, we:
        - create the new drbd in StandAlone mode
        - shutdown old secondary (primary becomes WFConnection)
        - disconnect primary (and thus it goes into StandAlone)
        - connect both primary and new secondary to network using the
          call_drbd_attach_net rpc
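
The four steps above can be traced with a toy state model. This is a sketch, not the LU code: the function `replace_secondary` and the dict layout are hypothetical, and the final `Connected` state assumes the network handshake succeeds right after the attach call.

```python
STANDALONE = "StandAlone"
WFCONNECTION = "WFConnection"
CONNECTED = "Connected"


def replace_secondary(states):
    """Trace connection states through the four steps listed above.

    states maps "primary" and "old_secondary" to their current state.
    Returns one snapshot (dict) per step.
    """
    states = dict(states)
    trace = []
    # Step 1: create the new drbd device without networking.
    states["new_secondary"] = STANDALONE
    trace.append(dict(states))
    # Step 2: shut down the old secondary; the primary waits for a peer.
    del states["old_secondary"]
    states["primary"] = WFCONNECTION
    trace.append(dict(states))
    # Step 3: disconnect the primary so it stops waiting.
    states["primary"] = STANDALONE
    trace.append(dict(states))
    # Step 4: attach both sides to the network (handshake assumed to succeed).
    states["primary"] = states["new_secondary"] = CONNECTED
    trace.append(dict(states))
    return trace
```

The ordering matters because at no point is the new secondary network-active while the primary is still attached to, or waiting for, the old one.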
      
      This should be safer, and is cleaner. This passes burnin.
      
      Reviewed-by: ultrotter
    • Fix modification of instance memory · ea33068f
      Iustin Pop authored
      ... as found by the QA script - bug was introduced by me in commit 2117.
      
      Reviewed-by: imsnah