Skip to content
Snippets Groups Projects
  1. Jan 29, 2009
  2. Jan 28, 2009
  3. Jan 27, 2009
    • Guido Trotter's avatar
      Xen: use utils.WriteFile for the instance configs · 73cd67f4
      Guido Trotter authored
      Also raise HypervisorError rather than OpExecError.
      
      Reviewed-by: iustinp
      73cd67f4
    • Guido Trotter's avatar
      Xen: use utils.Readfile to read the VNC password · 78f66a17
      Guido Trotter authored
      Also raise HypervisorError rather than OpExecError.
      
      Reviewed-by: iustinp
      78f66a17
    • Iustin Pop's avatar
      Implement disk verify checks in config verify · 332d0e37
      Iustin Pop authored
      This patch adds a simple check that the 'mode' attribute of top-level disks is
      correct. It does not recurse over children.
      
      The framework could be extended with other checks in the future.
      
      Reviewed-by: imsnah
      332d0e37
    • Iustin Pop's avatar
      Fix the mode attribute of newly-created disks · 6ec66eae
      Iustin Pop authored
      Currently, only the LUSetInstanceParams correctly sets up the mode
      attribute via a manual operation. We remove this and instead do the
      correct setting in the generic _GenerateDiskTemplate function, so that
      we set the mode correctly for all disk creations.
      
      Reviewed-by: ultrotter
      6ec66eae
    • Iustin Pop's avatar
      Rework the multi-instance gnt commands · 479636a3
      Iustin Pop authored
      This patch changes the multi-instance gnt-* commands (gnt-instance
      start/stop, gnt-node evacuate/failover) such that the individual
      operations are submitted in parallel, ideally improving the speed of the
      execution.
      
      The patch does this by abstracting the job set functionality into a new
      class in cli.py, that takes care of the job submit, job poll and error
      handling.
      
      Reviewed-by: ultrotter
      479636a3
    • Iustin Pop's avatar
      Fix single-job archiving (gnt-job archive) · 5278185a
      Iustin Pop authored
      This is a simply typo from the conversion to multi-job archiving.
      
      Reviewed-by: imsnah
      5278185a
    • Guido Trotter's avatar
      KVM and Xen: add the HV_ROOT_PATH parameter · 074ca009
      Guido Trotter authored
      This parameter allows a different path to be passed to the instance
      kernel. The new parameter is mandatory, and by default has the value of
      the old hardcoded value for both kvm and xen.
      
      Beta1 clusters will need to have this parameter added for their
      instances to be able to boot.
      
      Reviewed-by: iustinp
      074ca009
    • Guido Trotter's avatar
      KVM: implement GetShellCommandForConsole · 637ce7f9
      Guido Trotter authored
      This is a class method, because it calls _InstanceSerial, which is
      another class method. The patch changes it to classmethod for all the
      hypervisor classes.
      
      Reviewed-by: iustinp
      637ce7f9
    • Guido Trotter's avatar
      KVM: classify _Instance{Monitor,Serial,KVMRuntime} · 0df4d98a
      Guido Trotter authored
      Those methods need nothing from the instantiated class, and just
      manipulate strings, and fetch some class global variables, so they can
      be classmethods.
      
      Reviewed-by: iustinp
      0df4d98a
  4. Jan 26, 2009
  5. Jan 23, 2009
    • Guido Trotter's avatar
      Xen and KVM: correct a typo when checking args · 50cb2e2a
      Guido Trotter authored
      A missing 'be' was present in the error string for both xen and kvm,
      when the kernel or initrd path was not absolute.
      
      Reviewed-by: imsnah
      50cb2e2a
    • Iustin Pop's avatar
      Sort the instance names in batcher · 7312b33d
      Iustin Pop authored
      In case we submit multiple instances via batcher, it's nicer to have the
      sorted nicely.
      
      Reviewed-by: imsnah
      7312b33d
    • Iustin Pop's avatar
      Fix batcher for 2.0-style disks and nics · 9939547b
      Iustin Pop authored
      This patch fixes the gnt-instance batch-create command, and in doing so
      also slightly changes two other functions:
        - we change utils.ParseUnit so that it accepts integer values also
          (both ParseUnit(5) and ParseUnit("5") return the same value)
        - a bridge 'None' in LUCreateInstance will be converted to the default
          bridge; currently only missing bridges will be accepted to mean the
          default one
      
      The main changes to batcher were the change to variable number of disks
      and NICs.
      
      The patch also adds a batcher-instances.json example file copied from
      the 1.2 branch and properly modified.
      
      Reviewed-by: imsnah, killerfoxi
      9939547b
    • Iustin Pop's avatar
      Make iallocator work with offline nodes · 1325da74
      Iustin Pop authored
      This patch changes the iallocator framework to work with and properly
      export to plugins offline nodes. It does this by only exporting the
      static configuration data for those nodes, and not attempting to parse
      the runtime data.
      
      The patch also fixes bugs in iallocator related to the RpcResult
      conversion, changes the should_run to admin_up attribute name (as per
      the internals change), and adds “-I” as a short option for
      “--iallocator” in gnt-instance, gnt-backup and burnin.
      
      Reviewed-by: ultrotter
      1325da74
    • Iustin Pop's avatar
      Remove checking of DRBD metadata for validity · 3b559640
      Iustin Pop authored
      Currently the DRBD code checks that the metadata devices are valid
      before creation, initial disk attachment and add children.
      
      However, the process for checking validity requires a free DRBD minor,
      and this conflict with parallel checking.
      
      There are at least three possible solutions:
        - serialize all checks, which means we reduce parallelism and need
          extra locks
        - don't pass a valid minor number, but one like “/dev/drbd256” (which
          is invalid); this works for current version of DRBD, but since it's
          not guaranteed to remain so it doesn't look nice
        - don't do the checking at all, and rely on “drbdsetup ... disk ...”
          to fail by itself
      
      The reason for checking metadata was that in 1.2, this was much cheaper
      than trying to activate devices (and the subsequent iteration over the
      minors). However, in 2.0, they have the same cost, so we can choose
      option 3: just remove the explicit checking and rely on drbdsetup and
      the kernel to fail.
      
      Since DRBD8._InitMeta still requires a minor number, the two places
      where this is run are handled as follows:
        - Create: we just use our own (unused currently) minor number
        - AddChildren: we keep using FindUnusedMinor, with the caveat that
          this function (used by replace-disks -n ...) cannot be yet
          parallelized
      
      Reviewed-by: ultrotter
      3b559640
    • Iustin Pop's avatar
      Rework the execution model in burnin · c723c163
      Iustin Pop authored
      This patch changes (significantly) the execution model in burnin:
        - for all runs, (almost) all instance mods in a single Burn* procedure
          are done as part of a job; so for example add disk, stop, remove
          disk, start are no longer done as separate jobs but as a single job
          consisting of four opcodes
        - for parallel runs, all Burn* procedures except the rename (which
          uses a single target name) run in parallel; before, only the
          creation was done in parallel
        - due to the single-job execution and also parallel execution, the
          logging messages are no longer happening synchronously with the
          execution, so they are more informative than an actual execution log
      
      The end result is that burnin now tests properly multi-opcode jobs and
      also tests all opcodes (except rename) for parallel execution.
      
      Note: On a test cluster, parallelization reduces burnin time from 23m to
      15m.
      
      Reviewed-by: ultrotter
      c723c163
    • Iustin Pop's avatar
      Relax the restrictions on temporary DRBD minors · 79b26a7a
      Iustin Pop authored
      Currently the restrictions are too harsh: there is a time interval
      between an instance gets a new disk and before it is added to the
      configuration in which the restriction is not met. We solve this by
      allowing temporary DRBD minors to match existing minors (for the same
      instance), such that parallel creations/minor allocations are OK.
      
      The change is done by moving the add of temporary minors to the
      minor map after the instance minors are computed, and only considering
      them as duplicate if the instance name doesn't match.
      
      Reviewed-by: ultrotter
      79b26a7a
    • Iustin Pop's avatar
      Introduce more configuration consistency checks · 4a89c54a
      Iustin Pop authored
      This patch enhances the duplicate DRBD minors checks (currently just a
      few) and adds automatic checks of configuration consistency at
      configuration file writing time.
      
      In order to do so and show meaningful error messages, the
      _UnlockedComputeDRBDMap function is changed to not raise errors in case
      of duplicates, but instead return both the minors map and the duplicate
      list, and its callers now raise the error. This allows the VerifyConfig
      function to return a complete list of duplicates.
      
      The new checks required some small updates to the unittests for the
      config module.
      
      Reviewed-by: ultrotter
      4a89c54a
    • Iustin Pop's avatar
      Fill the 'call' attribute of offline rpc results · 84b45587
      Iustin Pop authored
      When creating ‘fake’ results for offline nodes, we currently don't pass
      the call attribute. This complicates debugging, so even though this
      should not matter in practice, it's better to fix it.
      
      Reviewed-by: imsnah
      84b45587
    • Iustin Pop's avatar
      A couple of small fixes to iallocator · 8901997e
      Iustin Pop authored
      This removes some constraints:
        - only two disks supported, this is no longer true as the underlying
          functions can now compute size for a variable number of disks
        - error when the hypervisor was not being passed
        - typo error
      
      Reviewed-by: imsnah
      8901997e
  6. Jan 22, 2009
    • Iustin Pop's avatar
      luxi: close and reopen the socket on errors · 8d5b316c
      Iustin Pop authored
      This is less of an actual issue for regular gnt-* clients, but it's
      easily reproducible with burnin and possible with RAPI (depending on how
      the program uses luxi.Client(s)).
      
      In case of burnin, if we interrupt the client (^C) while it polls the
      job, it will abort and raise an error. After that, burnin issues a
      remove instance job, and at this point, we send the submit job (remove)
      call but the first thing we read from the socket will be the response to
      the previous poll job request, since that was queued already from the
      master.
      
      To solve this, whenever we detect an error in Transport.Call(), we close
      that transport and re-create a new one, to start anew. The other
      alternative would be to introduce a sequence to the protocol, but this
      is something that would be design-level change and it's not recommended
      at this stage.
      
      Reviewed-by: imsnah
      8d5b316c
  7. Jan 21, 2009
    • Guido Trotter's avatar
      ShutdownInstance: log instance name, not object · ca77edbc
      Guido Trotter authored
      When an instance fails to shut down we currently log its whole object,
      rather than just the instance name.
      
      Reviewed-by: iustinp
      ca77edbc
    • Guido Trotter's avatar
      KVM live migration: handle failure · c087266c
      Guido Trotter authored
      If the KVM live migration ends up in a 'failed' state it has been
      aborted at the kvm level, and the machine is still running locally.
      We support also the 'cancelled' state even though there should be no way
      of reaching it, without manual intervention.
      
      Reviewed-by: iustinp
      c087266c
    • Guido Trotter's avatar
      KVM: change a few IOError with EnvironmentError · 90c024f6
      Guido Trotter authored
      Reviewed-by: iustinp
      90c024f6
    • Guido Trotter's avatar
      KVM: instance migration · 30e42c4e
      Guido Trotter authored
      The tcp port used for migrating KVM instances is selectable at
      ./configure time. We use a single port as nodes are locked anyway during
      a migration, so no two migrations can happen at the same time to the
      same node.
      
      Reviewed-by: iustinp
      30e42c4e
    • Guido Trotter's avatar
      KVM: add the _InstancePidAlive function · 1f8b3a27
      Guido Trotter authored
      Throughout the kvm code we very often look for the instance pidfile
      name, read it, and check if the process is alive. Abstract this into a
      private function and use that one instead.
      
      This patch also changes RebootInstance to check whether the instance is
      alive before trying to reboot it.
      
      Reviewed-by: iustinp
      1f8b3a27
    • Guido Trotter's avatar
      KVM: fix RebootInstance · f02881e0
      Guido Trotter authored
      RebootInstance was broken, because it just used to call StartInstance
      with wrong parameters. With this patch we still stop the instance, but
      use the saved kvm runtime to start it again.
      
      Reviewed-by: iustinp
      f02881e0
    • Guido Trotter's avatar
      KVM: retry the instance shutdown command · 6567aff3
      Guido Trotter authored
      When we ask the instance to shutdown sometimes the command won't work,
      especially if the instance isn't fully booted up. We'll wait for a bit,
      and give it a few chances before giving up.
      
      Reviewed-by: iustinp
      6567aff3
    • Guido Trotter's avatar
      Xen: implement auxiliary migration functions · 4390ccff
      Guido Trotter authored
      These are used, for the xen hypervisor, to copy the xen config file to
      the remote node. This breaks migration for instances which have been
      migrated, but not restarted, with the old code, for which the config
      file was just lost.
      
      Reviewed-by: iustinp
      4390ccff
    • Iustin Pop's avatar
      Automatically release DRBD minors on success · 61cf6b5e
      Iustin Pop authored
      This patch converts the DRBD minors reservation protocol from explicit
      release to automatic release on the success paths. On the errors paths,
      it's still needed to manual release.
      
      The patch doesn't bring much by itself, but is needed for a future patch
      which enhances the automatic verification of configuration consistency.
      
      Reviewed-by: ultrotter
      61cf6b5e
    • Iustin Pop's avatar
      Fix some more pylint errors · c979d253
      Iustin Pop authored
      Two are real errors (invalid names) and one is style error (overriding
      name from outer scope).
      
      Reviewed-by: ultrotter
      c979d253
    • Iustin Pop's avatar
      One more gitignore rule · dc458d00
      Iustin Pop authored
      This was forgotten in the recent “switch to explicit ignore rules”.
      
      Reviewed-by: imsnah
      dc458d00
    • Iustin Pop's avatar
      Log the rpc call name in the RPC errors message · 1b8acf70
      Iustin Pop authored
      Currently the rpc module logs the error description and target node in
      rpc calls logging, as such:
      
        2009-01-21 00:50:01,456:  pid=1051/Thread-21 ERROR RPC error from node
          node1.example.com: Connection failed (111: Connection
          refused)
      
      but this doesn't help to understand which call caused this (here it's an
      offline node which should not be contacted at all).
      
      This patch adds the logging of the call too, so cases like the above can
      be debugged easier.
      
      Reviewed-by: imsnah, ultrotter
      1b8acf70
Loading