1. 30 Jul, 2008 8 commits
    • Add LogicalUnit.DeclareLocks · fb8dcb62
      Guido Trotter authored
      This additional LogicalUnit function is optional to implement. It lets
      you change your locking needs for one level just before that level is
      locked, after the previous levels have already been locked. It is useful,
      for example, to calculate which nodes to lock after locking an instance.
      
      Reviewed-by: iustinp
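
      The hook described above can be sketched as follows. This is a toy
      model, not Ganeti's code: only the method name DeclareLocks comes from
      the commit message, while the level constants, the INSTANCE_NODES
      table, the needed_locks/acquired attributes and lock_all_levels are
      assumptions made for illustration.

```python
# Hypothetical level constants; instances are locked before nodes.
LEVEL_INSTANCE = 0
LEVEL_NODE = 1
LEVELS = [LEVEL_INSTANCE, LEVEL_NODE]

# Made-up cluster data: instance name -> nodes it lives on.
INSTANCE_NODES = {"inst1": ["node1", "node2"]}


class LogicalUnit:
    """Minimal stand-in for a logical unit base class."""

    def __init__(self):
        self.needed_locks = {}
        self.acquired = {}

    def ExpandNames(self):
        raise NotImplementedError

    def DeclareLocks(self, level):
        """Optional hook: adjust needed_locks[level] just before locking it."""


class LUExample(LogicalUnit):
    def __init__(self, instance_name):
        super().__init__()
        self.instance_name = instance_name

    def ExpandNames(self):
        # Only the instance lock is known up front.
        self.needed_locks[LEVEL_INSTANCE] = [self.instance_name]
        self.needed_locks[LEVEL_NODE] = []

    def DeclareLocks(self, level):
        if level == LEVEL_NODE:
            # The instance level is already locked, so its node list is stable.
            nodes = []
            for name in self.acquired[LEVEL_INSTANCE]:
                nodes.extend(INSTANCE_NODES[name])
            self.needed_locks[LEVEL_NODE] = nodes


def lock_all_levels(lu):
    """Toy processor: call DeclareLocks, then 'lock' each level in order."""
    lu.ExpandNames()
    for level in LEVELS:
        lu.DeclareLocks(level)
        lu.acquired[level] = list(lu.needed_locks.get(level, []))
    return lu.acquired
```

      Here "locking" only records names; a real processor would acquire the
      locks at each level before moving on to the next one.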
    • LURenameInstance, add/remove relevant locks · 74b5913f
      Guido Trotter authored
      LURenameInstance forgot to remove the old lock name and add the new one,
      making it impossible for parallel LUs to act on the instance (without a
      master daemon restart). This also fixes burnin+rename with the
      parallelization of {Start,Stop}Instance.
      
      Reviewed-by: iustinp
    • Rewrite job queue · 85f03e0d
      Michael Hanselmann authored
      We found several issues in the old job queue implementation. It had race
      conditions, deadlocks and other deficiencies.
      
      Short summary:
      - _QueuedOpCode and _QueuedJob are now more or less data structures with a few
        utility functions. __Setup is gone.
      - DiskJobStorage and JobQueue classes merged into one to reduce code complexity.
      - One lock in JobQueue for almost everything. There's also a lock per opcode
        for log messages.
      
      Reviewed-by: iustinp
    • workerpool: Log when waiting for a thread · c0a8eb9e
      Michael Hanselmann authored
      Reviewed-by: iustinp
    • Rework master startup/shutdown/failover · b1b6ea87
      Iustin Pop authored
      This (big) patch reworks the master startup/shutdown and fixes the
      master failover.
      
      What does the patch do?
      
      For master start/stop:
        - removes the old ganeti-master script and its associated man page
        - moves the ip start/stop directly into the backend.(Start|Stop)Master
        - adds start/stop of the master/rapi daemon into these functions,
          selectively based on the start/stop arguments
        - makes the master call via rpc StartMaster(start_daemons=False) to
          the local node so that the master IP is started
        - and finally changes the example init.d script to directly start and
          stop all three daemons, since they do the right thing (depending on
          master/not master role)
      
      For master failover:
        - moves the code from LUMasterFailover into bootstrap.MasterFailover,
          since we need to start/stop the master during this operation and
          thus it can't be executed from the master
        - removes the LUMasterFailover and its associated opcode
      
      Note: Ubuntu's /etc/lsb-base-logging.sh is dumb, so the 'not master'
      messages are not seen during startup on non-master nodes.
      
      Reviewed-by: ultrotter
    • Expose utils.DaemonPidFileName · 53beffbb
      Iustin Pop authored
      Since we need to compute this from outside utils.py, we change this to a
      public function.
      
      Reviewed-by: ultrotter
    • Implement checking for the master role in rapi · 5675cd1f
      Iustin Pop authored
      This patch moves the CheckMaster function from ganeti-masterd to ssconf
      (the most logical place; it cannot go in utils, since that would create
      recursive imports between ssconf and utils) and changes ganeti-rapi to
      also call this function.
      
      This is needed so that starting ganeti-rapi on a non-master node does
      the right thing.
      
      Reviewed-by: ultrotter
    • Add a new parameter to backend.(Start|Stop)Master · 1c65840b
      Iustin Pop authored
      This patch adds a new, unused for now, parameter to the start and stop
      master operations in backend. The idea behind it is that we need to be
      able to control whether the IP (de)activation is coupled with daemon
      startup/shutdown.
      
      The callers are also modified to pass this parameter (even if unused for
      now).
      
      Reviewed-by: ultrotter
  2. 29 Jul, 2008 6 commits
    • Log thread name when debug output is enabled · 6aff91f6
      Michael Hanselmann authored
      Reviewed-by: iustinp
    • jqueue: Fix error logging · 8090e19f
      Michael Hanselmann authored
      The passed parameters were not correct.
      
      Reviewed-by: iustinp, ultrotter
    • Fix constants typo · bff2ddc5
      Iustin Pop authored
      Reviewed-by: imsnah
    • Use constants for the pid file stems · 99e88451
      Iustin Pop authored
      Reviewed-by: imsnah
    • Add a KillProcess function · b2a1f511
      Iustin Pop authored
      We cannot depend on all environments to have a start-stop-daemon or
      similar tool. We instead implement a KillProcess function that behaves
      similarly to “start-stop-daemon --retry”.
      
      Note that the attached unittest can hang in the foreground if the child
      misbehaves (doesn't write to the internal pipe). Since unittests are
      either run in the foreground or run with a timeout from an automated
      framework, I think this is an acceptable trade-off (compared with using
      hardcoded timeouts in the test).
      
      Reviewed-by: imsnah
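
      A minimal sketch of the "signal, wait, then escalate" idea behind such
      a function, for POSIX systems. The names kill_process and
      is_process_alive, the reap flag and the default timeout are
      assumptions for illustration; the actual KillProcess signature is not
      shown in this log.

```python
import os
import signal
import time


def is_process_alive(pid):
    """Return True if a process with this PID currently exists."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 performs only the existence check
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def kill_process(pid, timeout=30.0, poll_interval=0.1, reap=False):
    """Send SIGTERM, wait up to timeout seconds, then fall back to SIGKILL.

    Pass reap=True when pid is a child of the caller, so that the exited
    child is collected and no longer shows up as a zombie.
    """
    if pid <= 0:
        raise ValueError("refusing to signal pid %d" % pid)

    def gone():
        if reap:
            try:
                if os.waitpid(pid, os.WNOHANG)[0] == pid:
                    return True
            except ChildProcessError:
                pass  # not our child; fall back to the signal check
        return not is_process_alive(pid)

    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return  # already gone

    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if gone():
            return
        time.sleep(poll_interval)

    if not gone():
        try:
            os.kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            pass
```

      Polling with a deadline keeps the helper independent of external
      tools; it does not protect against PID reuse between checks.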
    • Change IsPidFileAlive into ReadPidFile · d9f311d7
      Iustin Pop authored
      We already have a function to test if a PID is alive, so it makes more
      sense to use function composition than to force a combined call (since
      we need to read PIDs from files in other places too). Now IsProcessAlive
      returns False for PIDs <= 0, since this is the error return from
      ReadPidFile.
      
      The patch also adds a unittest for checking that WriteFile raises the
      correct exception, and checks that an invalid or missing file causes
      ReadPidFile to return zero. The unittest tearDown method will try to
      clean up the temp directory too (otherwise it leaves files behind).
      
      Reviewed-by: ultrotter
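
      The composition described above might look like the sketch below. The
      lowercase function names are illustrative stand-ins, not the utils.py
      implementations; the behaviour shown (zero on a missing or invalid
      file, False for PIDs <= 0) follows the commit message.

```python
import os


def read_pid_file(path):
    """Return the PID stored in path, or 0 if it is missing or invalid."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return 0


def is_process_alive(pid):
    """Return False for PIDs <= 0, which is read_pid_file's error value."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 performs only the existence check
    except ProcessLookupError:
        return False
    except PermissionError:
        return True
    return True


def is_pid_file_alive(path):
    """The old combined check, expressed as a composition of the two."""
    return is_process_alive(read_pid_file(path))
```

      Because the error value of the reader is rejected by the liveness
      check, callers can chain the two without an explicit error branch.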
  3. 28 Jul, 2008 5 commits
  4. 25 Jul, 2008 2 commits
  5. 24 Jul, 2008 3 commits
  6. 23 Jul, 2008 12 commits
    • Move code formatting job ID into a base class · ce594241
      Michael Hanselmann authored
      A later patch will add a memory-based job storage class, hence this
      code is going into a separate class. It also changes the number format
      to always use at least 10 digits, allowing up to 9'999'999'999 jobs to
      be sorted without using a custom function.
      
      Reviewed-by: iustinp
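
      To illustrate why a fixed minimum width helps with sorting, here is a
      small sketch. It assumes the extra digits are leading zeros;
      format_job_id is a hypothetical name, not the method added by the
      patch.

```python
def format_job_id(serial):
    """Format a serial number with at least 10 digits (assumed zero-padded)."""
    return "%010d" % serial


serials = [10, 9, 1234567890, 2]

# Plain decimal strings sort lexicographically, which is not numeric order.
unpadded = sorted(str(n) for n in serials)

# Padded strings of equal width sort numerically with the default comparison.
padded = sorted(format_job_id(n) for n in serials)
```

      With this format, plain string sorting stays correct as long as the
      serial fits in 10 digits, which matches the limit quoted above.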
    • Add utils.{Write,Remove}PidFile · b330ac0b
      Guido Trotter authored
      WritePidFile is a helper function that writes the current pid in a
      pidfile within the ganeti run directory. RemovePidFile tries to delete
      it.
      
      Reviewed-by: iustinp
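
      A rough sketch of what such a pair of helpers could look like. The
      run_dir argument, the ".pid" suffix and the function names are
      assumptions for illustration; the real helpers take the run directory
      from Ganeti's own constants.

```python
import os


def pid_file_name(run_dir, name):
    """Build the pidfile path for a daemon name (hypothetical layout)."""
    return os.path.join(run_dir, "%s.pid" % name)


def write_pid_file(run_dir, name):
    """Write the current process ID into the pidfile and return its path."""
    path = pid_file_name(run_dir, name)
    with open(path, "w") as handle:
        handle.write("%d\n" % os.getpid())
    return path


def remove_pid_file(run_dir, name):
    """Try to delete the pidfile; a failure to remove it is ignored."""
    try:
        os.remove(pid_file_name(run_dir, name))
    except OSError:
        pass
```

      Ignoring errors on removal mirrors the "tries to delete it" wording:
      a missing pidfile at shutdown is not treated as a failure.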
      
    • Add utils.IsPidFileAlive function · fee80e90
      Guido Trotter authored
      This helper function reads a pid from a file containing it and checks
      whether it refers to a live process.
      
      Reviewed-by: iustinp
      
    • Invert nodes/instances locking order · 04e1bfaf
      Guido Trotter authored
      An implementation mistake from the original design caused nodes to be
      locked before instances, rather than after. This patch inverts the level
      numbering and also changes the relevant unittests and the starting point
      of the recursive locking function.
      
      Reviewed-by: iustinp
    • Generalization of bulk output mapping · 51ee2f49
      Oleksiy Mishchenko authored
      Reviewed-by: iustinp
    • Rename JobStorage to DiskJobStorage · 21cc1fbd
      Michael Hanselmann authored
      Reviewed-by: iustinp
    • Fix logging with string job IDs · 205d71fd
      Michael Hanselmann authored
      The job ID is now a string, hence logging must use %s instead of %d.
      
      Reviewed-by: iustinp
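
      The difference between the two format specifiers can be shown in a few
      lines; the job ID value and the message text are made up.

```python
job_id = "0000000042"  # job IDs are strings now

# %s accepts any object, including a string.
message = "Job %s finished" % job_id

# %d requires a number, so formatting a string with it raises TypeError.
try:
    "Job %d finished" % job_id
except TypeError:
    d_format_failed = True
else:
    d_format_failed = False
```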
    • Simplify rapi.baserlib.MapFields() · dca1764e
      Iustin Pop authored
      We can use zip for simplifying this function. Actually, at this point
      I'm not sure if it needs to be a separate function at all.
      
      Reviewed-by: imsnah
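
      A zip-based mapping can be as short as the sketch below. The function
      name, its two-argument signature and the length check are assumptions
      for illustration; the actual MapFields body is not shown in this log.

```python
def map_fields(names, data):
    """Pair each field name with the value at the same position."""
    if len(names) != len(data):
        # Extra safety check added for this sketch only.
        raise ValueError("names and data must have the same length")
    return dict(zip(names, data))
```

      For example, map_fields(["name", "os"], ["inst1", "debian"]) returns
      {"name": "inst1", "os": "debian"}. At this size the helper is close
      to a one-liner, which fits the author's doubt about keeping it as a
      separate function.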
    • Make job ID a string · 3be9a705
      Michael Hanselmann authored
      The docstring says that _NewSerialUnlocked returns “a string
      representing the job identifier”. Until now it returned an
      integer; this patch makes it return a string.
      
      Reviewed-by: iustinp
    • Distribute the queue serial file after each update · c3f0a12f
      Iustin Pop authored
      This patch adds distribution of the queue serial file after each write
      to it (but before a new job is created and written with that ID, and
      before a response is returned, so we should be safe from crashes in
      between).
      
      Currently it only logs if a node cannot be contacted; it should abort if
      more than 50% errors are seen.
      
      Reviewed-by: imsnah
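
      The abort rule mentioned as future work could be sketched like this.
      It illustrates the stated intention, not what the patch does (the
      patch only logs); distribute_serial, the send callback and the choice
      of OSError as the failure signal are all hypothetical.

```python
import logging


def distribute_serial(nodes, send):
    """Call send(node) for every node; abort if more than half of them fail.

    Returns the list of nodes that could not be contacted.
    """
    failed = []
    for node in nodes:
        try:
            send(node)
        except OSError as err:
            # Current behaviour per the commit message: log and continue.
            logging.warning("Cannot contact node %s: %s", node, err)
            failed.append(node)

    # Proposed behaviour: more than 50% errors aborts the update.
    if len(failed) * 2 > len(nodes):
        raise RuntimeError("serial file distribution failed on %d of %d nodes"
                           % (len(failed), len(nodes)))
    return failed
```

      Comparing len(failed) * 2 against len(nodes) avoids floating point
      division and treats exactly 50% failures as still acceptable.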
    • Make the job storage init reuse a serial file · c4beba1c
      Iustin Pop authored
      This will be needed for master failover. If we don't have a valid queue
      directory, we need to reinitialize it, but we should keep the existing
      serial number.
      
      As such, we abstract the reading of the serial and if we find a valid
      serial, we do not reset it.
      
      Reviewed-by: imsnah
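
      The "keep a valid serial, otherwise reset" logic might be sketched as
      follows. The file name, the directory layout and both function names
      are assumptions made for illustration.

```python
import os

SERIAL_FILE = "serial"  # hypothetical file name inside the queue directory


def read_serial(path):
    """Return the stored serial number, or None if missing or invalid."""
    try:
        with open(path) as handle:
            serial = int(handle.read().strip())
    except (OSError, ValueError):
        return None
    return serial if serial >= 0 else None


def init_job_storage(queue_dir):
    """Create the queue directory if needed, keeping an existing valid serial."""
    os.makedirs(queue_dir, exist_ok=True)
    path = os.path.join(queue_dir, SERIAL_FILE)
    serial = read_serial(path)
    if serial is None:
        # No valid serial found, so start counting from zero.
        serial = 0
        with open(path, "w") as handle:
            handle.write("%d\n" % serial)
    return serial
```

      Separating the read from the initialisation lets a failover target
      rebuild its queue directory without handing out job IDs that were
      already used.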
    • Move BDEV_CACHE_DIR to RUN_GANETI_DIR/bdev-cache · 42ff3343
      Guido Trotter authored
      This was a TODO for 2.0
      
      Reviewed-by: iustinp
  7. 22 Jul, 2008 4 commits
    • Convert SetInstanceParams to concurrency · 1a5c7281
      Guido Trotter authored
      Grab a lock for the instance we're working on, and update its params.
      
      Reviewed-by: iustinp
    • Use Update in SetInstanceParams · ea94e1cd
      Guido Trotter authored
      When we set the instance params we're not adding a new instance, just
      updating an existing one, so why use AddInstance?
      
      Reviewed-by: iustinp
    • Convert LUConnectConsole to concurrency · 8659b73e
      Guido Trotter authored
      For ConnectConsole we just need to lock the instance we're connecting
      to. We make a few rpcs to its primary node, but node daemons can now
      handle multiple queries and nodes cannot be removed while they have
      instances on them anyway. Note that since we return the ssh command, and
      that's executed outside of the ganeti daemon, without any locks held,
      the instance can then be subject to operations while we're connected to
      it, but that was the previous behavior as well.
      
      Reviewed-by: iustinp
    • Add _ExpandAndLockInstance auxiliary function. · 43905206
      Guido Trotter authored
      LUs that take an instance name as input and need to expand its name and
      lock it can use it to simplify their ExpandNames call. Possibly an
      _ExpandAndLockNode will come as well.
      
      Reviewed-by: iustinp
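
      A toy version of such a helper is sketched below. Only the names
      _ExpandAndLockInstance and ExpandNames come from the commit message;
      the prefix-matching rule, the OpPrereqError class, the LEVEL_INSTANCE
      constant and the class layout are assumptions for illustration.

```python
LEVEL_INSTANCE = "instance"  # hypothetical lock level identifier


class OpPrereqError(Exception):
    """Raised when the requested instance cannot be identified."""


def expand_instance_name(known, name):
    """Return the unique full name matching a short name, or None."""
    if name in known:
        return name
    matches = [full for full in known if full.startswith(name + ".")]
    return matches[0] if len(matches) == 1 else None


class LogicalUnit:
    def __init__(self, known_instances, instance_name):
        self.known_instances = known_instances
        self.instance_name = instance_name
        self.needed_locks = {}

    def _ExpandAndLockInstance(self):
        """Expand the instance name and declare the lock on it."""
        expanded = expand_instance_name(self.known_instances,
                                        self.instance_name)
        if expanded is None:
            raise OpPrereqError("Instance '%s' not known" % self.instance_name)
        self.instance_name = expanded
        self.needed_locks[LEVEL_INSTANCE] = [expanded]


class LUConnectConsoleSketch(LogicalUnit):
    def ExpandNames(self):
        # The whole ExpandNames body reduces to one helper call.
        self._ExpandAndLockInstance()
```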