1. 07 Oct, 2008 1 commit
    • Implement job 'waiting' status · e92376d7
      Iustin Pop authored
      Background: when we have more than just a few jobs in the queue,
      many of them (up to the number of worker threads) will be in state
      'running', although many of those may actually be blocked waiting
      for locks. This is not good, as one cannot easily see what is
      really happening.
      
      The patch extends the possible opcode/job statuses with a new one,
      'waiting', which shows that the LU is in the lock-acquisition
      phase. The mechanism is simple: the job queue initializes the
      opcode with OP_STATUS_WAITLOCK, and when the processor is ready to
      give control to the LU's Exec, it calls a notifier back into the
      _JobQueueWorker that sets the opcode status to OP_STATUS_RUNNING
      (with the proper queue locking). Because this mechanism does not
      save the job, all opcodes on disk will be in status WAITLOCK
      rather than RUNNING, so we also change the load sequence to treat
      WAITLOCK as RUNNING.
      
      With the patch applied, creating five instances in parallel (via
      burnin) on a five-node cluster shows that only two are executing
      while three are waiting for locks.
      
      Reviewed-by: imsnah
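      
      A minimal sketch of the notifier idea (class and method names here
      are assumptions for illustration, not the actual jqueue/mcpu
      code): opcodes start out waiting and are flipped to running only
      when the processor is about to enter the LU's Exec.
      
          OP_STATUS_WAITLOCK = "waiting"
          OP_STATUS_RUNNING = "running"
          
          class QueuedOpCode:
              def __init__(self, op):
                  self.op = op
                  # Set at queue time; never saved as RUNNING, which is
                  # why the load sequence treats WAITLOCK as RUNNING.
                  self.status = OP_STATUS_WAITLOCK
          
          class Processor:
              def ExecOpCode(self, lu, notify_running):
                  lu.AcquireLocks()    # opcode still shows 'waiting'
                  notify_running()     # worker sets OP_STATUS_RUNNING
                  return lu.Exec()
      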
  2. 01 Oct, 2008 3 commits
  3. 11 Sep, 2008 2 commits
    • Implement adding/removal of locks by declaration · ca2a79e1
      Guido Trotter authored
      With this patch LUs can declare locks to be added when they start
      and/or removed after they finish. For now, locks can only be added
      in the acquired state and removed if owned; added locks default to
      being removed again at the end, unless the LU takes action to keep
      them.
      
      Reviewed-by: imsnah
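      
      A rough illustration of the declaration style (the attribute names
      here are assumptions based on the description above):
      
          from ganeti import locking
          from ganeti.cmdlib import LogicalUnit
          
          class LUMoveInstance(LogicalUnit):  # hypothetical LU
              def ExpandNames(self):
                  self.needed_locks = {}
                  # Acquired at LU start; by default removed again at
                  # the end unless the LU takes action to keep it.
                  self.add_locks = {
                      locking.LEVEL_INSTANCE: ["new-inst.example.com"]}
                  # Must be owned; dropped from the lock set afterwards.
                  self.remove_locks = {
                      locking.LEVEL_INSTANCE: ["old-inst.example.com"]}
      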
    • Use is_owned to determine whether to unlock · 80ee04a4
      Guido Trotter authored
      Now that is_owned is public, we don't need to play games at the
      end of an LU: if we still own anything, we simply release it.
      
      Reviewed-by: imsnah
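      
      The end-of-LU cleanup can then be as simple as this sketch (the
      glm attribute and release call are assumed from the surrounding
      commits):
      
          from ganeti import locking
          
          # Release whatever is still owned, level by level.
          for level in locking.LEVELS:
              if self.context.glm.is_owned(level):
                  self.context.glm.release(level)
      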
  4. 09 Sep, 2008 1 commit
    • Processor: remove ChainOpCode · b2751b57
      Guido Trotter authored
      This function was incompatible with the new locking system, and
      its usage has been removed from the code. For now, LUs share code
      by calling common module-private functions in cmdlib.py; in the
      future they will use tasklets (once those are implemented).
      
      Reviewed-by: iustinp
  5. 28 Aug, 2008 1 commit
    • Fix issue when acquiring empty lock sets · 6683bba2
      Guido Trotter authored
      By design if an empty list of locks is acquired from a set, no locks are
      acquired, and thus release() cannot be called on the set. On the other
      hand if None is passed instead of the list, the whole set is acquired,
      and must later be released. When acquiring whole empty sets, a release
      must happen too, because the set-lock is acquired.
      
      Since we used to overwrite the required locks (needed_locks) with
      the acquired ones, we couldn't distinguish the two cases: an empty
      list of locks required, versus all locks required but an empty
      list returned because the set itself is empty. Valid solutions
      include:
        (1) forbidding the acquire of empty lists of locks
        (2) skipping the acquire/release on empty lists of locks
        (3) separating the to-acquire and the acquired list
      
      This patch implements the third approach, so LUs will find the
      acquired locks in the acquired_locks dict rather than in
      needed_locks. The LUs which used this feature before have been
      updated. Compared to (1), this avoids forcing LUs to perform
      extra, easily forgotten corner-case checks, and it allows more
      flexibility if we want LUs to release part of their locks (still a
      potentially scary operation, but anyway). It also combines easily
      with (2), should we choose to implement that.
      
      Reviewed-by: imsnah
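      
      Given an LU instance lu, the distinction looks roughly like this
      (a sketch; only needed_locks and acquired_locks are names from the
      patch):
      
          from ganeti import locking
          
          # Empty list requested: nothing acquired, nothing to release.
          lu.needed_locks = {locking.LEVEL_NODE: []}
          
          # None requested: the whole set (plus the set-lock) is
          # acquired, even if the set is empty, and must be released.
          lu.needed_locks = {locking.LEVEL_NODE: None}
          
          # What was actually acquired is now recorded separately, so
          # an empty result no longer shadows the None case:
          lu.acquired_locks[locking.LEVEL_NODE] = []  # or the lock names
      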
  6. 18 Aug, 2008 1 commit
    • Processor: lock all levels even if one is missing · 8a2941c4
      Guido Trotter authored
      If a locking level wasn't specified, locking used to stop there.
      This means that if one, for example, didn't specify anything at
      the LEVEL_INSTANCE level, no locks at the LEVEL_NODE level were
      acquired either. With this patch we force _LockAndExecLU to be
      called for all existing levels, and only break the recursion when
      the level doesn't exist in locking.LEVELS.
      
      Reviewed-by: imsnah
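      
      Schematically, the recursion now looks like this (a sketch with
      stub helpers; _LockAndExecLU, needed_locks and the LEVELS idea are
      from the patch, the rest is assumed):
      
          LEVELS = [0, 1, 2]  # e.g. cluster, instance, node
          
          def acquire(level, names):  # stand-in for the lock manager
              pass
          
          def release(level):
              pass
          
          def _LockAndExecLU(lu, level):
              if level not in LEVELS:
                  # Past the deepest level: actually run the LU.
                  return lu.Exec()
              if level in lu.needed_locks:
                  acquire(level, lu.needed_locks[level])
              try:
                  # Recurse even when the LU declared nothing at this
                  # level, so deeper levels still get their locks.
                  return _LockAndExecLU(lu, level + 1)
              finally:
                  if level in lu.needed_locks:
                      release(level)
      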
  7. 30 Jul, 2008 5 commits
    • ChainOpCode is still BGL-only · 64381ad7
      Guido Trotter authored
      Prevent mistakes with an assert.
      
      Reviewed-by: iustinp
    • Fix pylint-detected issues · 38206f3c
      Iustin Pop authored
      This is mostly:
        - whitespace fixes (trailing spaces in some files, broken
          indentation, etc.)
        - variable names shadowing others (one of these is a real bug)
        - too-long lines
        - cleanup of most unused imports (not all)
      
      Reviewed-by: ultrotter
    • Make sharing locks possible · 3977a4c1
      Guido Trotter authored
      LUs can declare which locks they need by populating the
      self.needed_locks dictionary, but those locks are always acquired
      exclusively. Make it possible to acquire shared locks as well, by
      declaring a particular level as shared in the self.share_locks
      dictionary. By default this dictionary is populated so that all
      locks are acquired exclusively.
      
      Reviewed-by: iustinp
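      
      For example, a read-only LU could declare, in a sketch (the 0/1
      shared-flag convention is an assumption):
      
          from ganeti import locking
          
          def ExpandNames(self):
              # All node locks, but in shared mode: other shared users
              # can run concurrently, exclusive users cannot.
              self.needed_locks = {locking.LEVEL_NODE: None}
              self.share_locks[locking.LEVEL_NODE] = 1
      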
    • Add LogicalUnit.DeclareLocks · fb8dcb62
      Guido Trotter authored
      This additional LogicalUnit function is optional to implement; it
      lets you change your locking needs for one level just before
      locking it, after the previous levels have already been locked. It
      is useful, for example, to calculate which nodes to lock after
      locking an instance.
      
      Reviewed-by: iustinp
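      
      A sketch of the intended usage (method and attribute names partly
      assumed):
      
          from ganeti import locking
          
          def DeclareLocks(self, level):
              if level == locking.LEVEL_NODE:
                  # The instance lock is already held here, so the node
                  # list is stable and safe to read from the config.
                  inst = self.cfg.GetInstanceInfo(self.op.instance_name)
                  self.needed_locks[locking.LEVEL_NODE] = (
                      [inst.primary_node] + list(inst.secondary_nodes))
      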
    • Rework master startup/shutdown/failover · b1b6ea87
      Iustin Pop authored
      This (big) patch reworks the master startup/shutdown and fixes the
      master failover.
      
      What does the patch do?
      
      For master start/stop:
        - remove the old ganeti-master script and its associated man page
        - move the IP start/stop directly into backend.(Start|Stop)Master
        - add start/stop of the master/rapi daemons into these functions,
          selectively based on the start/stop arguments
        - make the master call StartMaster(start_daemons=False) via RPC
          on the local node, so that the master IP is started
        - finally, change the example init.d script to directly start
          and stop all three daemons, since they do the right thing
          (depending on the master/non-master role)
      
      For master failover:
        - move the code from LUMasterFailover into
          bootstrap.MasterFailover, since we need to start/stop the
          master during this operation and thus it can't be executed
          from the master itself
        - remove LUMasterFailover and its associated opcode
      
      Note: Ubuntu's /etc/lsb-base-logging.sh is dumb, so the 'not
      master' messages are not seen during startup on non-master nodes.
      
      Reviewed-by: ultrotter
  8. 23 Jul, 2008 1 commit
    • Invert nodes/instances locking order · 04e1bfaf
      Guido Trotter authored
      An implementation mistake from the original design caused nodes to
      be locked before instances, rather than after. This patch inverts
      the level numbering, also changing the relevant unittests and the
      starting point of the recursive locking function.
      
      Reviewed-by: iustinp
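      
      After the inversion, the numbering reads roughly as follows (the
      values are illustrative):
      
          LEVEL_CLUSTER = 0
          LEVEL_INSTANCE = 1
          LEVEL_NODE = 2
          # The recursive locking walk starts from the lowest number,
          # so instance locks are now taken before node locks.
          LEVELS = [LEVEL_CLUSTER, LEVEL_INSTANCE, LEVEL_NODE]
      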
  9. 14 Jul, 2008 1 commit
    • First version of user feedback fixes · f1048938
      Iustin Pop authored
      This patch contains a first, rough version of the feedback_fn fix.
      
      The new mechanism works as follows:
        - instead of a per-Processor feedback_fn, there is one for each
          ExecOpCode, so that feedback for different opcodes can go via
          different functions
        - each _QueuedOpCode gets a message buffer, a method for adding
          feedback and a method for retrieving (parts of) the feedback
        - the _QueuedJob object gets a new attribute equal to the index
          of the currently executing opcode
        - job queries get an extra parameter called 'ticker' that
          returns the latest message of the currently executing opcode
        - the cli.py job-completion poll shows the new status if it
          differs from the old one
      
      Of course, rapid successions of messages will be lost, as
      currently only the latest one is kept. Changes between opcodes are
      also not represented at all.
      
      Reviewed-by: imsnah
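      
      A sketch of the per-opcode buffer (method names assumed; only
      _QueuedOpCode and the 'ticker' idea come from the patch):
      
          class _QueuedOpCode:
              def __init__(self):
                  self.messages = []
              
              def AddFeedback(self, msg):
                  # Used as the feedback_fn for this opcode.
                  self.messages.append(msg)
              
              def RetrieveFeedback(self, start=0):
                  return self.messages[start:]
          
          # A job query asking for 'ticker' would then return the last
          # message of the opcode at the job's current-opcode index.
      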
  10. 08 Jul, 2008 4 commits
    • Processor: Acquire locks before executing an LU · 68adfdb2
      Guido Trotter authored
      If we're running a "new style" LU, we may need some locks, as
      required by the ExpandNames function, to be able to run. We walk
      up the lock levels present in the needed_locks dictionary and
      acquire them, then run the actual LU. LUs can release some or all
      of the acquired locks before terminating, if they want, provided
      they update their needed_locks dictionary appropriately, so that
      we know not to release a level they have already released.
      
      Reviewed-by: iustinp
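      
      For instance, an LU that finishes its node-level work early might
      do something like this (a sketch; glm and the release call are
      assumptions):
      
          from ganeti import locking
          
          def Exec(self, feedback_fn):
              # ... node-level work is done; drop the node locks early.
              self.context.glm.release(locking.LEVEL_NODE)
              # Tell the processor this level was already released.
              del self.needed_locks[locking.LEVEL_NODE]
      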
    • LogicalUnit: add ExpandNames function · d465bdc8
      Guido Trotter authored
      New concurrent LUs will need to call ExpandNames so that any names
      passed in by the user are canonicalized and can be used by hooks,
      locking and other parts of the code. This was done in CheckPrereq
      before, but it's now split out, as it's needed for locking, which
      CheckPrereq in turn needs. Old LUs can be converted gradually.
      
      Reviewed-by: iustinp
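      
      A hypothetical shape for such an implementation (the ConfigWriter
      call is an assumption):
      
          from ganeti import locking
          
          def ExpandNames(self):
              # Canonicalize the user-supplied name first...
              self.op.instance_name = self.cfg.ExpandInstanceName(
                  self.op.instance_name)
              # ...then declare the locks that locking and hooks use.
              self.needed_locks = {
                  locking.LEVEL_INSTANCE: [self.op.instance_name],
              }
      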
    • Processor: Move LU execution to its own method · 36c381d7
      Guido Trotter authored
      This makes the try...finally code simpler, and helps in adding a
      more complex locking structure before the actual execution. It
      also fixes a concurrency bug caused by the fact that write_count
      was read before acquiring the BGL, so spurious config-update hook
      runs could have been triggered. This doesn't solve the issue of
      running config-update hooks for concurrent LUs.
      
      Reviewed-by: iustinp
    • Pass context to LUs · 77b657a3
      Guido Trotter authored
      Rather than passing a ConfigWriter to the LUs, we'll pass the
      whole context, from which a ConfigWriter can be extracted but
      which also gives access to the GanetiLockManager. This also fixes
      the places where a FakeLU is created.
      
      Reviewed-by: iustinp
  11. 01 Jul, 2008 3 commits
    • Context: s/GLM/glm/ · 984f7c32
      Guido Trotter authored
      Make the GanetiLockManager instance of GanetiContext lowercase.
      
      Reviewed-by: imsnah
    • Processor: acquire the BGL for LUs requiring it · 04864530
      Guido Trotter authored
      If an LU requires the BGL (as all LUs currently do, by default),
      we'll acquire it in the Processor before starting the LU. For LUs
      that don't, we'll still acquire it, but in a shared fashion, so
      that they cannot run together with LUs that do.
      
      We'll also note down whether we own the BGL exclusively, and if we
      don't and we try to chain an LU that does, we'll fail.
      
      More work will need to be done, of course, to convert LUs not to
      require the BGL, but this basic infrastructure should guarantee
      the coexistence of the old and new worlds for the time being.
      
      Reviewed-by: iustinp
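      
      Schematically, on the Processor side (a sketch; REQ_BGL and
      locking.BGL follow the text above, the acquire signature is an
      assumption):
      
          if lu_class.REQ_BGL:
              # Old-style LU: take the Big Ganeti Lock exclusively.
              glm.acquire(locking.LEVEL_CLUSTER, [locking.BGL], shared=0)
          else:
              # New-style LU: shared BGL, so it can run alongside other
              # new-style LUs, but never with an exclusive holder.
              glm.acquire(locking.LEVEL_CLUSTER, [locking.BGL], shared=1)
      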
    • Processor: pass context in and use it. · 1c901d13
      Guido Trotter authored
      The processor used to create a new ConfigWriter when it was
      initialized. We now have one in the context, so we'll just recycle
      it. First of all we'll pass the context in when creating a new
      Processor object; then we'll just use context.cfg, which is
      guaranteed to be initialized, wherever we used self.cfg, and stop
      checking whether the config is already initialized or not.
      
      In the future the Processor will be able to use the context also to
      acquire the BGL for LUs that require it, and to push the context down to
      LUs that don't in order for them to manage their own locking.
      
      Reviewed-by: iustinp
  12. 30 Jun, 2008 1 commit
    • Fix sstore handling in Processor · c6868e1d
      Guido Trotter authored
      - no need to keep the sstore as an object member, so remove it
      - don't skip reinitializing sstore just because self.cfg is set
          This is not an issue, as the Processor is recycled for every
          opcode, but in general we know that (a) we might need a
          different type of sstore for different opcodes and
          (b) initializing them is cheap
      - recreate sstore when chaining opcodes
          Without this fix, chaining an opcode which requires a writable
          sstore to one which doesn't would fail. This doesn't happen
          today, but it's better to fix it anyway
      
      These changes are possible because nowadays all opcodes already require
      a working cluster/configuration.
      
      Reviewed-by: iustinp
  13. 23 Jun, 2008 1 commit
    • Fix gnt-cluster “command” and “copyfile” · b3989551
      Iustin Pop authored
      Since forking was disabled in the master daemon, the two ssh-based
      subcommands have not been working. However, there is no need at
      all for these commands to be run from the master daemon
      (permissions to read the cluster private ssh key notwithstanding);
      they can be run directly from the command line utilities.
      
      The patch removes the two opcodes OpRunClusterCommand and
      OpClusterCopyFile (and their associated LUs) and changes the code
      in ‘gnt-cluster’ to query the list of nodes and run the SshRunner
      directly over it. As such, all forking is done from the
      gnt-cluster script, and the commands work again.
      
      Reviewed-by: imsnah
  14. 17 Jun, 2008 1 commit
    • Implement disk grow at LU level · 8729e0d7
      Iustin Pop authored
      This patch adds a new opcode and LU for growing an instance's disk.
      
      The opcode allows growing only one disk at a time, and will throw
      an error if the operation fails midway (e.g. on the primary node
      after it has been increased on the secondary node). As such, it
      might actually leave differently sized LVs on different nodes, but
      this will not create problems.
      
      Reviewed-by: imsnah
  15. 16 Jun, 2008 1 commit
    • Move SetKey to WritableSimpleStore and use it · 05f86716
      Guido Trotter authored
      Previously, SimpleStore could be updated by simply calling SetKey;
      this ability has now been moved to a separate class which inherits
      from it. This patch also puts the new WritableSimpleStore class to
      use, in the LUs that need it. Rather than making each LU
      instantiate it, we add a new LogicalUnit flag, REQ_WSSTORE, which
      defaults to False; when declared to be True, it asks for the
      LogicalUnit to be initialized with a writable version of the
      SimpleStore. LUMasterFailover and LURenameCluster are changed to
      use it.
      
      InitCluster is also changed to instantiate a WritableSimpleStore
      rather than a normal one.
      
      Reviewed-by: imsnah
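      
      A sketch of how the flag plugs in (the Processor wiring below is
      an assumption):
      
          class LogicalUnit:
              REQ_WSSTORE = False  # override in LUs needing writes
          
          class LURenameCluster(LogicalUnit):
              REQ_WSSTORE = True
          
          # In the Processor, roughly:
          if lu_class.REQ_WSSTORE:
              sstore = ssconf.WritableSimpleStore()
          else:
              sstore = ssconf.SimpleStore()
      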
  16. 12 Jun, 2008 2 commits
  17. 30 Apr, 2008 2 commits
    • Add a LU Hooks notification function · 1fce5219
      Guido Trotter authored
      Previously, LUs could be failed by pre-hooks, while post-hooks
      just had effects by themselves. This patch allows an LU to define
      a HooksCallBack function if it wants to know about its hooks'
      results and alter its own results in response.
      
      The ChainOpCode execution path contains some commented-out hooks
      code, which this patch modifies to run the HooksCallBack function,
      so this is not forgotten if that code ever gets uncommented.
      
      Reviewed-by: iustinp
      
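      
      The hook-aware LU would then look like this sketch (the parameter
      list follows this description, not verified code):
      
          def HooksCallBack(self, phase, hook_results, feedback_fn,
                            lu_result):
              # Inspect the post-hook results and, if needed, replace
              # the LU's own return value.
              return lu_result
      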
    • HooksMaster: Make RunPhase return the rpc output · b07a6922
      Guido Trotter authored
      Right now the hooks output is propagated from the nodes all the
      way up to HooksMaster.RunPhase, which uses it for debugging PRE
      hooks but then silently discards it. We'll now propagate it up to
      the Processor.ExecOpCode function, where it can be handled for
      other purposes (or discarded again, of course). This patch also
      slightly improves the HooksMaster.RunPhase docstring.
      
      Reviewed-by: iustinp
      
  18. 23 Apr, 2008 1 commit
  19. 16 Apr, 2008 1 commit
    • Allocator framework, 1st part: allocator input generation · d61df03e
      Iustin Pop authored
      In preparation for the introduction of the automatic instance
      allocator, this patch adds an allocator simulation opcode that,
      based on the input parameters, will return either the input
      message to the allocator (implemented) or the result of the
      allocator run (not yet implemented).
      
      This allows algorithm tests against simulated allocations and the
      current cluster state.
      
      The patch adds the following:
        - a function that generates the generic cluster information for the
          allocator
        - a function that generates the 'new instance' information
        - a function that generates the 'replace_secondary' information
      
      These three functions will be used by the allocator framework
      later to generate the actual information for the external
      algorithms. Currently we just return the JSON-serialized text.
      
      Reviewed-by: imsnah
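      
      The generated input is, roughly, a JSON document of this shape
      (all field names below are illustrative, not the final format):
      
          import json
          
          request = {
              "type": "allocate",  # or "relocate" for replace_secondary
              "name": "instance1.example.com",
              "memory": 2048,      # MiB
              "disk_space_total": 25600,
              "vcpus": 1,
          }
          cluster = {
              "cluster_name": "cluster.example.com",
              # Per-node totals and free resources, per-instance layout.
              "nodes": {"node1.example.com": {"total_memory": 4096}},
              "instances": {"inst1.example.com":
                            {"nodes": ["node1.example.com"]}},
          }
          print(json.dumps({"cluster": cluster, "request": request}))
      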
  20. 31 Mar, 2008 2 commits
  21. 30 Mar, 2008 1 commit
    • Change the order of config updates in some LUs · fe482621
      Iustin Pop authored
      In the start and stop instance LUs, the configuration update is
      done right at the end. This means that if, for example, the
      instance shutdown succeeds but the drive deactivation fails, the
      next run of the watcher will start the instance again, as it's
      still marked as running. This patch changes these two LUs so that
      the configuration is first updated to the desired state, and only
      then do we proceed with the rest of the operation. This ensures
      that the state saved is the desired state.
      
      Because the config might be updated even though the LU failed,
      this patch also modifies the mcpu.Processor.ExecOpCode method to
      run the RunConfigUpdate hook in a finally: phase, while lu.Exec is
      done in its try phase. This ensures that config-update hooks run
      (or try to run) whenever the config is updated.
      
      Reviewed-by: schreiberal
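      
      The resulting structure in ExecOpCode, schematically (the
      write_count comparison is an assumption):
      
          write_count = self.cfg.write_count
          try:
              result = lu.Exec(feedback_fn)
          finally:
              # Run the config-update hook whenever the config changed,
              # even if Exec raised an exception.
              if self.cfg.write_count != write_count:
                  hooks_master.RunConfigUpdate()
      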
  22. 25 Mar, 2008 1 commit
  23. 05 Mar, 2008 1 commit
  24. 22 Feb, 2008 1 commit
  25. 05 Feb, 2008 1 commit