1. 27 Jan, 2009 9 commits
    • Xen: use utils.WriteFile for the instance configs · 73cd67f4
      Guido Trotter authored
      Also raise HypervisorError rather than OpExecError.
      Reviewed-by: iustinp
    • Xen: use utils.ReadFile to read the VNC password · 78f66a17
      Guido Trotter authored
      Also raise HypervisorError rather than OpExecError.
      Reviewed-by: iustinp
    • Implement disk verify checks in config verify · 332d0e37
      Iustin Pop authored
      This patch adds a simple check that the 'mode' attribute of top-level disks is
      correct. It does not recurse over children.
      The framework could be extended with other checks in the future.
      Reviewed-by: imsnah
    • Fix the mode attribute of newly-created disks · 6ec66eae
      Iustin Pop authored
      Currently, only the LUSetInstanceParams correctly sets up the mode
      attribute via a manual operation. We remove this and instead do the
      correct setting in the generic _GenerateDiskTemplate function, so that
      we set the mode correctly for all disk creations.
      Reviewed-by: ultrotter
    • Rework the multi-instance gnt commands · 479636a3
      Iustin Pop authored
      This patch changes the multi-instance gnt-* commands (gnt-instance
      start/stop, gnt-node evacuate/failover) such that the individual
      operations are submitted in parallel, ideally improving the speed of the
      The patch does this by abstracting the job set functionality into a new
      class in cli.py, that takes care of the job submit, job poll and error
      Reviewed-by: ultrotter
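The job set idea described in this commit (submit every job first, then poll) can be sketched as follows. This is only an illustration: `FakeClient`, `JobSet`, and their method names are hypothetical and are not taken from cli.py.

```python
class FakeClient:
    """Hypothetical stand-in for a job-queue client (not the real API)."""

    def __init__(self):
        self.submitted = []

    def submit_job(self, ops):
        self.submitted.append(ops)
        return len(self.submitted)  # use the position as the job id

    def poll_job(self, job_id):
        return "job %d done" % job_id


class JobSet:
    """Queue jobs, submit them all at once, then collect the results."""

    def __init__(self, client):
        self._client = client
        self._queue = []

    def queue_job(self, name, ops):
        self._queue.append((name, ops))

    def get_results(self):
        # Submit everything first, so the jobs can run concurrently ...
        job_ids = [(name, self._client.submit_job(ops))
                   for name, ops in self._queue]
        # ... and only then wait for each job in turn.
        results = []
        for name, job_id in job_ids:
            try:
                results.append((name, True, self._client.poll_job(job_id)))
            except Exception as err:
                results.append((name, False, str(err)))
        return results


jobs = JobSet(FakeClient())
jobs.queue_job("instance1", ["startup"])
jobs.queue_job("instance2", ["startup"])
results = jobs.get_results()
```

Submitting before polling is what lets the master process the operations in parallel; a loop that submits and polls one job at a time would serialize them.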
    • Fix single-job archiving (gnt-job archive) · 5278185a
      Iustin Pop authored
      This is simply a typo from the conversion to multi-job archiving.
      Reviewed-by: imsnah
    • KVM and Xen: add the HV_ROOT_PATH parameter · 074ca009
      Guido Trotter authored
      This parameter allows a different path to be passed to the instance
      kernel. The new parameter is mandatory, and by default has the value of
      the old hardcoded value for both kvm and xen.
      Beta1 clusters will need to have this parameter added for their
      instances to be able to boot.
      Reviewed-by: iustinp
    • KVM: implement GetShellCommandForConsole · 637ce7f9
      Guido Trotter authored
      This is a class method, because it calls _InstanceSerial, which is
      another class method. The patch changes it to classmethod for all the
      hypervisor classes.
      Reviewed-by: iustinp
    • KVM: classify _Instance{Monitor,Serial,KVMRuntime} · 0df4d98a
      Guido Trotter authored
      Those methods need nothing from the instantiated class, and just
      manipulate strings, and fetch some class global variables, so they can
      be classmethods.
      Reviewed-by: iustinp
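A minimal sketch of why such helpers fit the classmethod pattern: they only combine a class-level setting with a string argument. The class name, the directory, and the file suffixes below are assumptions made for the illustration, not the values used by the real KVM hypervisor code.

```python
class KVMHypervisorSketch:
    """Illustrative stand-in, not the real Ganeti KVM hypervisor class."""

    # Class-level setting; the directory name is an assumption.
    _CTRL_DIR = "/var/run/kvm-sketch/ctrl"

    @classmethod
    def _InstanceMonitor(cls, instance_name):
        """Return the monitor socket path for an instance."""
        return "%s/%s.monitor" % (cls._CTRL_DIR, instance_name)

    @classmethod
    def _InstanceSerial(cls, instance_name):
        """Return the serial socket path for an instance."""
        return "%s/%s.serial" % (cls._CTRL_DIR, instance_name)


# No hypervisor object is needed to compute the paths.
monitor_path = KVMHypervisorSketch._InstanceMonitor("instance1.example.com")
```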
  2. 26 Jan, 2009 2 commits
    • Release 2.0 beta 1 · e33a0080
      Iustin Pop authored
      Even though alpha started at 0, we release beta 1 first as we did for
      Reviewed-by: imsnah, ultrotter
    • Update the NEWS documents for beta1 · 10f31783
      Iustin Pop authored
      Also import the NEWS entries from the 1.2 branch which were added since
      we created it.
      Reviewed-by: ultrotter
  3. 23 Jan, 2009 10 commits
    • Xen and KVM: correct a typo when checking args · 50cb2e2a
      Guido Trotter authored
      The word 'be' was missing from the error string for both xen and kvm,
      shown when the kernel or initrd path was not absolute.
      Reviewed-by: imsnah
    • Sort the instance names in batcher · 7312b33d
      Iustin Pop authored
      In case we submit multiple instances via batcher, it's nicer to have
      them sorted.
      Reviewed-by: imsnah
    • Fix batcher for 2.0-style disks and nics · 9939547b
      Iustin Pop authored
      This patch fixes the gnt-instance batch-create command, and in doing so
      also slightly changes two other functions:
        - we change utils.ParseUnit so that it accepts integer values also
          (both ParseUnit(5) and ParseUnit("5") return the same value)
        - a bridge 'None' in LUCreateInstance will be converted to the default
          bridge; currently only missing bridges will be accepted to mean the
          default one
      The main changes to batcher were the change to variable number of disks
      and NICs.
      The patch also adds a batcher-instances.json example file copied from
      the 1.2 branch and properly modified.
      Reviewed-by: imsnah, killerfoxi
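The ParseUnit change above (integers accepted alongside strings) can be illustrated with a simplified parser. `parse_unit` below is a hypothetical helper, not the real utils.ParseUnit: it assumes sizes are expressed in mebibytes with optional m/g/t suffixes and does not reproduce any rounding the real function may apply.

```python
import re

_UNIT_RE = re.compile(r"^\s*(\d+)\s*([mgt]?)\s*$", re.IGNORECASE)
# Multipliers relative to one mebibyte (assumed base unit for this sketch).
_SCALE = {"": 1, "m": 1, "g": 1024, "t": 1024 * 1024}


def parse_unit(value):
    """Parse 5, "5", "5m" or "2g" into a number of mebibytes."""
    # Converting to str first is what makes integers acceptable input.
    match = _UNIT_RE.match(str(value))
    if not match:
        raise ValueError("invalid size %r" % (value,))
    number, suffix = match.groups()
    return int(number) * _SCALE[suffix.lower()]
```

With this shape, `parse_unit(5)` and `parse_unit("5")` go through the same code path, so values read from a JSON file need no special casing.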
    • Make iallocator work with offline nodes · 1325da74
      Iustin Pop authored
      This patch changes the iallocator framework to work with offline nodes
      and properly export them to plugins. It does this by only exporting the
      static configuration data for those nodes, and not attempting to parse
      the runtime data.
      The patch also fixes bugs in iallocator related to the RpcResult
      conversion, changes the should_run to admin_up attribute name (as per
      the internals change), and adds “-I” as a short option for
      “--iallocator” in gnt-instance, gnt-backup and burnin.
      Reviewed-by: ultrotter
    • Remove checking of DRBD metadata for validity · 3b559640
      Iustin Pop authored
      Currently the DRBD code checks that the metadata devices are valid
      before creation, initial disk attachment and add children.
      However, the process for checking validity requires a free DRBD minor,
      and this conflicts with parallel checking.
      There are at least three possible solutions:
        - serialize all checks, which means we reduce parallelism and need
          extra locks
        - don't pass a valid minor number, but one like “/dev/drbd256” (which
          is invalid); this works for the current version of DRBD, but since
          it's not guaranteed to remain so it doesn't look nice
        - don't do the checking at all, and rely on “drbdsetup ... disk ...”
          to fail by itself
      The reason for checking metadata was that in 1.2, this was much cheaper
      than trying to activate devices (and the subsequent iteration over the
      minors). However, in 2.0, they have the same cost, so we can choose
      option 3: just remove the explicit checking and rely on drbdsetup and
      the kernel to fail.
      Since DRBD8._InitMeta still requires a minor number, the two places
      where this is run are handled as follows:
        - Create: we just use our own (unused currently) minor number
        - AddChildren: we keep using FindUnusedMinor, with the caveat that
          this function (used by replace-disks -n ...) cannot be yet
      Reviewed-by: ultrotter
    • Rework the execution model in burnin · c723c163
      Iustin Pop authored
      This patch changes (significantly) the execution model in burnin:
        - for all runs, (almost) all instance mods in a single Burn* procedure
          are done as part of a job; so for example add disk, stop, remove
          disk, start are no longer done as separate jobs but as a single job
          consisting of four opcodes
        - for parallel runs, all Burn* procedures except the rename (which
          uses a single target name) run in parallel; before, only the
          creation was done in parallel
        - due to the single-job execution and also parallel execution, the
          logging messages no longer happen synchronously with the
          execution, so they are informational rather than an actual
          execution log
      The end result is that burnin now properly tests multi-opcode jobs and
      also tests all opcodes (except rename) for parallel execution.
      Note: On a test cluster, parallelization reduces burnin time from 23m to
      Reviewed-by: ultrotter
    • Relax the restrictions on temporary DRBD minors · 79b26a7a
      Iustin Pop authored
      Currently the restrictions are too harsh: there is a time interval,
      between an instance getting a new disk and that disk being added to the
      configuration, in which the restriction is not met. We solve this by
      allowing temporary DRBD minors to match existing minors (for the same
      instance), such that parallel creations/minor allocations are OK.
      The change is done by moving the add of temporary minors to the
      minor map after the instance minors are computed, and only considering
      them as duplicate if the instance name doesn't match.
      Reviewed-by: ultrotter
    • Introduce more configuration consistency checks · 4a89c54a
      Iustin Pop authored
      This patch enhances the duplicate DRBD minors checks (currently just a
      few) and adds automatic checks of configuration consistency at
      configuration file writing time.
      In order to do so and show meaningful error messages, the
      _UnlockedComputeDRBDMap function is changed to not raise errors in case
      of duplicates, but instead return both the minors map and the duplicate
      list, and its callers now raise the error. This allows the VerifyConfig
      function to return a complete list of duplicates.
      The new checks required some small updates to the unittests for the
      config module.
      Reviewed-by: ultrotter
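The "return the map and the duplicates, let the caller decide" approach from this commit can be sketched as below. The function names and the input layout (instance name mapped to a list of (node, minor) pairs) are assumptions for the illustration and do not mirror the real _UnlockedComputeDRBDMap signature.

```python
def compute_drbd_map(instances):
    """Build a {node: {minor: instance}} map and collect duplicates.

    `instances` maps an instance name to a list of (node, minor) pairs.
    Nothing is raised here; conflicts are only recorded.
    """
    minors = {}
    duplicates = []
    for name in sorted(instances):
        for node, minor in instances[name]:
            node_map = minors.setdefault(node, {})
            owner = node_map.get(minor)
            if owner is not None and owner != name:
                duplicates.append((node, minor, name, owner))
            else:
                node_map[minor] = name
    return minors, duplicates


def verify_config(instances):
    """Report every duplicate as a message instead of failing on the first."""
    _, duplicates = compute_drbd_map(instances)
    return ["duplicate DRBD minor %d on %s (%s and %s)" % (minor, node, a, b)
            for node, minor, a, b in duplicates]
```

A caller that needs a hard failure can still raise when the duplicate list is non-empty, while a verification routine can list all problems in one pass.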
    • Fill the 'call' attribute of offline rpc results · 84b45587
      Iustin Pop authored
      When creating ‘fake’ results for offline nodes, we currently don't pass
      the call attribute. This complicates debugging, so even though this
      should not matter in practice, it's better to fix it.
      Reviewed-by: imsnah
    • A couple of small fixes to iallocator · 8901997e
      Iustin Pop authored
      This removes some constraints:
        - only two disks supported, this is no longer true as the underlying
          functions can now compute size for a variable number of disks
        - error when the hypervisor was not being passed
        - typo error
      Reviewed-by: imsnah
  4. 22 Jan, 2009 1 commit
    • luxi: close and reopen the socket on errors · 8d5b316c
      Iustin Pop authored
      This is less of an actual issue for regular gnt-* clients, but it's
      easily reproducible with burnin and possible with RAPI (depending on how
      the program uses luxi.Client(s)).
      In case of burnin, if we interrupt the client (^C) while it polls the
      job, it will abort and raise an error. After that, burnin issues a
      remove instance job, and at this point, we send the submit job (remove)
      call but the first thing we read from the socket will be the response to
      the previous poll job request, since that was queued already from the
      To solve this, whenever we detect an error in Transport.Call(), we close
      that transport and create a new one, to start anew. The other
      alternative would be to introduce a sequence number to the protocol, but
      this would be a design-level change, which is not recommended
      at this stage.
      Reviewed-by: imsnah
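The close-and-recreate behaviour described in this commit can be sketched with a small client wrapper. `FlakyTransport` and `Client` are hypothetical classes written for this illustration; they are not the luxi module's actual classes or signatures.

```python
class FlakyTransport:
    """Hypothetical transport: every call on the first instance fails."""

    created = 0

    def __init__(self):
        FlakyTransport.created += 1
        self.serial = FlakyTransport.created
        self.closed = False

    def call(self, request):
        if self.serial == 1:
            raise IOError("interrupted while waiting for a reply")
        return "reply to %s" % request

    def close(self):
        self.closed = True


class Client:
    """Sketch of a client that never reuses a transport after an error."""

    def __init__(self, transport_factory):
        self._factory = transport_factory
        self._transport = None

    def call(self, request):
        if self._transport is None:
            self._transport = self._factory()
        try:
            return self._transport.call(request)
        except Exception:
            # A stale reply may still be queued on this connection, so drop
            # it; the next call starts from a fresh transport.
            self._transport.close()
            self._transport = None
            raise


client = Client(FlakyTransport)
try:
    client.call("poll job")
except IOError:
    pass
result = client.call("submit job")
```

Dropping the connection avoids pairing a new request with an old reply without changing the wire protocol.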
  5. 21 Jan, 2009 18 commits
    • ShutdownInstance: log instance name, not object · ca77edbc
      Guido Trotter authored
      When an instance fails to shut down we currently log its whole object,
      rather than just the instance name.
      Reviewed-by: iustinp
    • KVM live migration: handle failure · c087266c
      Guido Trotter authored
      If the KVM live migration ends up in a 'failed' state it has been
      aborted at the kvm level, and the machine is still running locally.
      We also support the 'cancelled' state, even though there should be no
      way of reaching it without manual intervention.
      Reviewed-by: iustinp
    • KVM: change a few IOError with EnvironmentError · 90c024f6
      Guido Trotter authored
      Reviewed-by: iustinp
    • KVM: instance migration · 30e42c4e
      Guido Trotter authored
      The tcp port used for migrating KVM instances is selectable at
      ./configure time. We use a single port as nodes are locked anyway during
      a migration, so no two migrations can happen at the same time to the
      same node.
      Reviewed-by: iustinp
    • KVM: add the _InstancePidAlive function · 1f8b3a27
      Guido Trotter authored
      Throughout the kvm code we very often look for the instance pidfile
      name, read it, and check if the process is alive. Abstract this into a
      private function and use that one instead.
      This patch also changes RebootInstance to check whether the instance is
      alive before trying to reboot it.
      Reviewed-by: iustinp
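The pattern abstracted by this commit (find the pidfile, read it, check whether the process is alive) can be sketched as below. The helper names and the return shape are assumptions for the illustration, and the liveness check assumes a POSIX system, where signal 0 only probes for the process.

```python
import errno
import os


def read_pid(pidfile):
    """Return the pid stored in pidfile, or 0 if it cannot be read."""
    try:
        with open(pidfile) as handle:
            return int(handle.read().strip())
    except (EnvironmentError, ValueError):
        return 0


def is_process_alive(pid):
    """Check for a live process without disturbing it (POSIX only)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0: existence check, nothing is delivered
    except OSError as err:
        # EPERM means the process exists but belongs to someone else.
        return err.errno == errno.EPERM
    return True


def instance_pid_alive(pidfile):
    """Return (pidfile, pid, alive) for an instance's pidfile."""
    pid = read_pid(pidfile)
    return pidfile, pid, is_process_alive(pid)
```

Returning the pidfile name and the pid along with the liveness flag lets callers such as a reboot routine reuse them without reading the file again.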
    • KVM: fix RebootInstance · f02881e0
      Guido Trotter authored
      RebootInstance was broken, because it just used to call StartInstance
      with wrong parameters. With this patch we still stop the instance, but
      use the saved kvm runtime to start it again.
      Reviewed-by: iustinp
    • KVM: retry the instance shutdown command · 6567aff3
      Guido Trotter authored
      When we ask the instance to shut down, the command sometimes won't work,
      especially if the instance isn't fully booted up. We'll wait for a bit,
      and give it a few chances before giving up.
      Reviewed-by: iustinp
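The "wait a bit and give it a few chances" behaviour can be sketched as a generic retry helper. The function name, the default attempt count, and the delay are assumptions made for the illustration; the commit message does not state the values used.

```python
import time


def retry(action, attempts=5, delay=1.0, sleep=time.sleep):
    """Call action() until it returns True or the attempts run out.

    `sleep` is injectable so that tests do not have to wait.
    """
    for attempt in range(attempts):
        if action():
            return True
        # Do not sleep after the last failed attempt.
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

A shutdown routine could pass a callable that sends the shutdown command and then reports whether the instance process has exited.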
    • Xen: implement auxiliary migration functions · 4390ccff
      Guido Trotter authored
      These are used, for the xen hypervisor, to copy the xen config file to
      the remote node. This breaks migration for instances which have been
      migrated, but not restarted, with the old code, for which the config
      file was just lost.
      Reviewed-by: iustinp
    • Automatically release DRBD minors on success · 61cf6b5e
      Iustin Pop authored
      This patch converts the DRBD minors reservation protocol from explicit
      release to automatic release on the success paths. On the error paths,
      a manual release is still needed.
      The patch doesn't bring much by itself, but is needed for a future patch
      which enhances the automatic verification of configuration consistency.
      Reviewed-by: ultrotter
    • Fix some more pylint errors · c979d253
      Iustin Pop authored
      Two are real errors (invalid names) and one is a style error (overriding
      a name from the outer scope).
      Reviewed-by: ultrotter
    • One more gitignore rule · dc458d00
      Iustin Pop authored
      This was forgotten in the recent “switch to explicit ignore rules”.
      Reviewed-by: imsnah
    • Log the rpc call name in the RPC errors message · 1b8acf70
      Iustin Pop authored
      Currently the rpc module logs the error description and target node in
      rpc calls logging, as such:
        2009-01-21 00:50:01,456:  pid=1051/Thread-21 ERROR RPC error from node
          node1.example.com: Connection failed (111: Connection
      but this doesn't help to understand which call caused this (here it's an
      offline node which should not be contacted at all).
      This patch adds the logging of the call too, so cases like the above can
      be debugged more easily.
      Reviewed-by: imsnah, ultrotter
    • Change the instance status attribute to boolean · 0d68c45d
      Iustin Pop authored
      For historic reasons, the “should run or not” attribute of an
      instance was denoted by its “status” attribute having a string value of
      either ‘up’ or ‘down’. Checking this in code was done via hardcoding
      of the strings.
      This was long due for a redo, and this patch changes this attribute to
      “admin_up” having a boolean value. The patch is in fact shorter than I
      expected, and passes burnin.
      The patch also fixes an error in BuildInstanceHookEnvByObject where the
      instance.os was passed as the status value.
      Reviewed-by: ultrotter
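The attribute change described above can be illustrated with a small conversion helper. `upgrade_status` and the plain-dict instance representation are assumptions for this sketch; the commit message does not describe how existing configurations are converted.

```python
def upgrade_status(instance):
    """Convert a legacy 'status' string into a boolean 'admin_up'."""
    upgraded = dict(instance)  # leave the caller's dict untouched
    status = upgraded.pop("status", None)
    if status is not None:
        if status not in ("up", "down"):
            raise ValueError("unexpected status %r" % (status,))
        upgraded["admin_up"] = (status == "up")
    return upgraded
```

After such a conversion, callers can test `if instance["admin_up"]:` instead of comparing against hardcoded strings.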
    • Implement the new live migration backend functions · cd42d0ad
      Guido Trotter authored
      MigrationInfo, AcceptInstance and AbortMigration are implemented as
      hypervisor specific functions, and by default they do nothing (as
      they're not always necessary).
      This patch also converts hv_base.MigrateInstance docstring to epydoc,
      adds a missing @type to the GetInstanceInfo docstring, and removes an
      unneeded empty line.
      Reviewed-by: iustinp
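The "hypervisor-specific, do nothing by default" arrangement can be sketched as below. The method names come from the commit message, but the class names, the argument lists, and the config-copying subclass are assumptions made for the illustration.

```python
class BaseHypervisorSketch:
    """Illustrative base class; method signatures are assumptions."""

    def MigrationInfo(self, instance_name):
        """Return data the target node needs; nothing by default."""
        return ""

    def AcceptInstance(self, instance_name, info):
        """Prepare the target node for an incoming instance; no-op."""

    def AbortMigration(self, instance_name, info):
        """Clean up on the target node after a failed migration; no-op."""


class ConfigCopyingSketch(BaseHypervisorSketch):
    """Hypothetical hypervisor that ships a config file to the target."""

    def __init__(self):
        self.source_configs = {}
        self.target_configs = {}

    def MigrationInfo(self, instance_name):
        return self.source_configs[instance_name]

    def AcceptInstance(self, instance_name, info):
        self.target_configs[instance_name] = info

    def AbortMigration(self, instance_name, info):
        self.target_configs.pop(instance_name, None)
```

Hypervisors that need no preparation on the target side simply inherit the no-op defaults.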
    • KVM: save and remove the KVM runtime · 38e250ba
      Guido Trotter authored
      At instance startup time we save the kvm runtime, and at stop time we
      delete it. This patch also includes a function to load the kvm runtime,
      which is not used yet.
      Reviewed-by: iustinp
    • KVM: split KVM runtime generation and startup · ee5f20b0
      Guido Trotter authored
      Previously, we generated the kvm command line and then just ran it.
      With this patch we split the generation from the time it is run,
      allowing us to save it and replay it at reboot.
      We must take special care about instance nics:
        - We can't include them in the saved command line, as they point to
          temporary files
        - We can't just generate them at exec time, because we would apply
          those changes, but not all the other ones, to a running instance,
          thus making it inconsistent (for example if an instance had its
          memory increased and one more nic added, in a soft reboot we would
          add the nic, but not the memory)
      So we'll just save the instance nic data at the time the kvm runtime
      data is generated, and transform it into actual parameters at execution
      Reviewed-by: iustinp
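The split described in this commit (generate once, save, and expand NIC data only when executing) can be sketched as follows. The function names, the JSON storage format, and the exact kvm options are assumptions for the illustration, not the format used by the real code.

```python
import json


def generate_runtime(name, memory_mb, nics):
    """Build the command line and keep the NIC data separately."""
    cmd = ["kvm", "-name", name, "-m", str(memory_mb)]
    return cmd, list(nics)


def serialize_runtime(runtime):
    """Turn a (cmd, nics) runtime into a string that can be saved."""
    cmd, nics = runtime
    return json.dumps({"cmd": cmd, "nics": nics})


def load_runtime(data):
    """Inverse of serialize_runtime."""
    loaded = json.loads(data)
    return loaded["cmd"], loaded["nics"]


def build_exec_args(runtime, tap_script):
    """Turn saved NIC data into actual parameters at execution time.

    `tap_script(idx)` stands for whatever creates the temporary per-NIC
    script; it is called only here, never at generation time.
    """
    cmd, nics = runtime
    args = list(cmd)
    for idx, nic in enumerate(nics):
        args += ["-net", "nic,macaddr=%s" % nic["mac"],
                 "-net", "tap,script=%s" % tap_script(idx)]
    return args
```

Because the saved runtime holds NIC data rather than paths to temporary files, a reboot can replay the same memory and NIC settings together, which keeps the running instance consistent.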
    • Add calls in the intra-node migration protocol · 6906a9d8
      Guido Trotter authored
      Currently the hypervisor is expected to do all the migration from the
      source side. With this patch we also add the option of passing some
      information to the target side, and starting some operation there.
      As a bonus, a function to cleanup any started operation is included.
      Reviewed-by: iustinp
    • Update the objects.Disk formatting method · 89f28b76
      Iustin Pop authored
      With the addition of minors, this needs to show them too.
      Reviewed-by: ultrotter