1. 01 Oct, 2008 3 commits
  2. 09 Sep, 2008 2 commits
    • Never remove job queue lock in node daemon · 1bc59f76
      Michael Hanselmann authored
      Otherwise, corruption could occur in some corner cases. E.g. when
      LeaveNode is running in a child and is in the process of removing
      queue files, the main process is killed, started again, and then
      receives a request to update the queue. This is a rather extreme
      corner case, but we should opt for safety.
      
      Reviewed-by: iustinp
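
      A minimal sketch of the invariant this enforces, with illustrative
      paths and names rather than the node daemon's real code: every queue
      operation serializes on the same on-disk lock, and that lock file is
      never removed.

        import fcntl
        import os

        QUEUE_DIR = "/var/lib/demo/queue"            # illustrative path
        LOCK_FILE = os.path.join(QUEUE_DIR, "lock")  # illustrative lock file

        def locked_queue_op(fn):
            """Run fn() while holding an exclusive lock on the queue.

            The lock file is deliberately never removed, so a process
            purging the queue and a process writing new jobs always
            serialize on the same inode.
            """
            fd = os.open(LOCK_FILE, os.O_CREAT | os.O_RDWR, 0o600)
            try:
                fcntl.flock(fd, fcntl.LOCK_EX)
                return fn()
            finally:
                os.close(fd)  # closing releases the flock; the file stays
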
    • Change backend._GetMasterInfo to return more data · bd1e4562
      Iustin Pop authored
      The _GetMasterInfo() function needs to export the master name too in
      order to be useful in master safety checks. This patch makes it a
      public (no leading underscore) function and adds a third element to
      the return tuple. Its callers are modified too.
      
      Reviewed-by: imsnah
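
      A hedged sketch of the new shape of the call (the real function reads
      these values from the cluster configuration; the order and literals
      below are only assumptions):

        def GetMasterInfo():
            """Return (master_netdev, master_ip, master_node) - sketch only."""
            master_netdev = "eth0"                # placeholder
            master_ip = "192.0.2.1"               # placeholder
            master_node = "node1.example.com"     # the new third element
            return master_netdev, master_ip, master_node

        # A caller doing a master safety check can now compare the node name:
        # _, _, master_name = GetMasterInfo()
        # if master_name != my_hostname: refuse the operation
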
  3. 14 Aug, 2008 1 commit
    • Pass hypervisor type to the OS scripts · 4f0afaf5
      Guido Trotter authored
      It's handy to make the os scripts know which hypervisor the instance is
      going to run under. In order not to change the os API we pass this
      information in the environment, where the os scripts can access it if
      they're hypervisor-aware.
      
      Reviewed-by: imsnah
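
      A minimal sketch of the mechanism, assuming an environment variable
      named HYPERVISOR and illustrative helper names (not the official OS
      API):

        import os
        import subprocess

        def run_os_script(script, instance_name, hypervisor):
            """Run an OS script with the hypervisor type in its environment."""
            env = os.environ.copy()
            env["INSTANCE_NAME"] = instance_name   # assumed variable name
            env["HYPERVISOR"] = hypervisor         # hypervisor-aware scripts read this
            return subprocess.call([script], env=env)
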
  4. 08 Aug, 2008 7 commits
  5. 06 Aug, 2008 1 commit
  6. 31 Jul, 2008 1 commit
  7. 30 Jul, 2008 4 commits
    • Fix pylint-detected issues · 38206f3c
      Iustin Pop authored
      This is mostly:
        - whitespace fixes (spaces at EOL in some but not all files, broken
          indentation, etc.)
        - variable names shadowing others (one of these is a real bug)
        - overly long lines
        - cleanup of most (but not all) unused imports
      
      Reviewed-by: ultrotter
    • Fix some errors detected by pylint · 3b9e6a30
      Iustin Pop authored
      Reviewed-by: imsnah
    • Rework master startup/shutdown/failover · b1b6ea87
      Iustin Pop authored
      This (big) patch reworks the master startup/shutdown and fixes the
      master failover.
      
      What does the patch do?
      
      For master start/stop:
        - remove the old ganeti-master script and its associated man page
        - moves the IP start/stop directly into backend.(Start|Stop)Master
        - adds start/stop of the master/rapi daemon into these functions,
          selectively based on the start/stop arguments
        - makes the master call via rpc StartMaster(start_daemons=False) to
          the local node so that the master IP is started
        - and finally changes the example init.d script to directly start and
          stop all three daemons, since they do the right thing (depending on
          master/not master role)
      
      For master failover:
        - moves the code from LUMasterFailover into bootstrap.MasterFailover,
          since we need to start/stop the master during this operation and
          thus it can't be executed from the master
        - removes the LUMasterFailover and its associated opcode
      
      Note: Ubuntu's /etc/lsb-base-logging.sh is dumb, so the 'not master'
      messages are not seen during startup on non-master nodes.
      
      Reviewed-by: ultrotter
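
      A rough sketch of the resulting split in backend.StartMaster/StopMaster,
      with illustrative commands and addresses; the real code derives the IP
      and network device from the cluster configuration:

        import subprocess

        MASTER_IP = "192.0.2.1"   # placeholder; read from the config in reality
        MASTER_NETDEV = "eth0"    # placeholder

        def StartMaster(start_daemons):
            """Activate the master IP and, optionally, the master daemons."""
            subprocess.call(["ip", "address", "add", MASTER_IP + "/32",
                             "dev", MASTER_NETDEV])
            if start_daemons:
                for daemon in ("ganeti-masterd", "ganeti-rapi"):
                    subprocess.call([daemon])

        def StopMaster(stop_daemons):
            """Remove the master IP and, optionally, stop the daemons."""
            subprocess.call(["ip", "address", "del", MASTER_IP + "/32",
                             "dev", MASTER_NETDEV])
            if stop_daemons:
                for daemon in ("ganeti-masterd", "ganeti-rapi"):
                    subprocess.call(["pkill", "-f", daemon])
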
    • Add a new parameter to backend.(Start|Stop)Master · 1c65840b
      Iustin Pop authored
      This patch adds a new, unused for now, parameter to the start and stop
      master operations in backend. The idea behind it is that we need to be
      able to control whether the IP (de)activation is coupled with daemon
      startup/shutdown.
      
      The callers are also modified to pass this parameter (even if unused for
      now).
      
      Reviewed-by: ultrotter
  8. 23 Jul, 2008 1 commit
    • Distribute the queue serial file after each update · c3f0a12f
      Iustin Pop authored
      This patch adds distribution of the queue serial file after each write
      to it (but before a new job is created and written with that ID, and
      before a response is returned, so we should be safe from crashes in
      between).
      
      Currently it only logs when a node cannot be contacted; it should
      abort if more than 50% of the nodes report errors.
      
      Reviewed-by: imsnah
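
      A minimal sketch of the write-then-distribute ordering, using an
      illustrative path and a stand-in for the real 'upload file' RPC:

        import logging

        SERIAL_FILE = "/var/lib/demo/queue/serial"   # illustrative path

        def bump_and_distribute_serial(serial, nodes, upload_fn):
            """Write the new serial locally, then push it to the other nodes.

            upload_fn(node, path) stands in for the real RPC; failures are
            only logged for now, not treated as fatal.
            """
            with open(SERIAL_FILE, "w") as fh:
                fh.write("%d\n" % serial)
            for node in nodes:
                if not upload_fn(node, SERIAL_FILE):
                    logging.error("Cannot distribute %s to node %s",
                                  SERIAL_FILE, node)
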
  9. 11 Jul, 2008 3 commits
    • Convert backend.py to the logging module · 18682bca
      Iustin Pop authored
      The patch also switches some of the exception logs to use
      logging.exception (and therefore the log message will have a different
      format).
      
      (Note that this might not be a good choice in all cases, though)
      
      Reviewed-by: imsnah
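
      For illustration, the kind of call site this conversion produces (the
      function and path here are made up):

        import logging
        import os

        def remove_export(path):
            """Delete an export; log the full traceback on failure."""
            try:
                os.remove(path)
            except EnvironmentError:
                # logging.exception appends the current traceback, which is
                # why these messages look different from the old logger's.
                logging.exception("Cannot remove export %s", path)
                return False
            return True
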
    • Fix backend.NodeVolumes handling of LVM output · a17a7623
      Iustin Pop authored
      This is the same fix as for GetVolumeList.
      
      I've checked manually and all other places that call lvm commands are
      already checking the output validity in terms of correct number of
      fields.
      
      Reviewed-by: ultrotter
    • Fix backend.GetVolumeList handling of LVM output · df4c2628
      Iustin Pop authored
      Sometimes ‘lvs’ can spit error messages on stdout, even when one wants
      to parse the output:
      ...
      Inconsistent metadata copies found - updating to use version 2776
      ...
      
      So we need to validate the output to guard against such cases.
      
      The patch converts the split on the separator into a regex match that
      extracts the fields via groups. The original separator choice is a bad
      one now :(
      
      Reviewed-by: imsnah
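
      A hedged sketch of the approach: anchor a regex on the expected field
      layout so that stray diagnostic lines simply fail to match (the exact
      separator and field list here are illustrative):

        import re

        # name|size|attributes, as produced by an 'lvs --separator=|' style call
        _LVS_LINE = re.compile(r"^\s*(\S+)\|(\d+\.\d+)\|(\S+)$")

        def parse_lvs_output(output):
            """Map LV name -> (size, attributes), skipping non-matching lines."""
            result = {}
            for line in output.splitlines():
                match = _LVS_LINE.match(line)
                if not match:
                    continue   # e.g. "Inconsistent metadata copies found ..."
                name, size, attr = match.groups()
                result[name] = (float(size), attr)
            return result
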
  10. 27 Jun, 2008 2 commits
  11. 20 Jun, 2008 1 commit
    • Add a rpc call for BlockDev.Close() · d61cbe76
      Iustin Pop authored
      This patch adds rpc layer calls (in rpc.py and the equivalent in
      ganeti-noded) to close a list of block devices, and the wrapper in
      backend.py that takes a list of Disk objects, identifies them and
      returns correctly formatted results.
      
      The reason why this very basic call was missing until now from the rpc
      layer is that we usually don't care about device closes (though we
      should, and will do so in the future) as only drbd has a meaningful
      Close() operation; right now we directly do Shutdown().
      
      The patch is clean enough that it's actually independent of the live
      migration implementation.
      
      Reviewed-by: imsnah
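
      A sketch of the backend wrapper's shape; find_device_fn stands in for
      the code that maps a Disk object to its live device instance, and the
      return format is illustrative:

        def close_blockdevs(disks, find_device_fn):
            """Close each device, collecting per-device error messages."""
            msgs = []
            for disk in disks:
                device = find_device_fn(disk)
                if device is None:
                    msgs.append("Cannot find device %s" % disk)
                    continue
                try:
                    device.Close()
                except Exception as err:
                    msgs.append(str(err))
            if msgs:
                return (False, "Can't close devices: %s" % ", ".join(msgs))
            return (True, "All devices closed")
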
  12. 16 Jun, 2008 2 commits
    • Expose block device grow in backend.py · 594609c0
      Iustin Pop authored
      This patch adds a wrapper over the block device grow operation that
      converts the input and output parameters as needed for the rpc layer.
      
      Reviewed-by: imsnah
    • Add migration support at the rpc layer · 2a10865c
      Iustin Pop authored
      This patch adds the migration rpc call and its implementation in the
      backend. The patch does not deal with the correct activation of disks.
      
      Because of the new RPC, the protocol version is increased.
      
      Reviewed-by: imsnah
  13. 13 May, 2008 2 commits
    • Implement node daemon connectivity tests · 9d4bfc96
      Iustin Pop authored
      This patch adds in gnt-cluster verify checks for inter-node tcp
      communication checks on the node daemon port for both the primary and
      (if defined) secondary networks.
      
      The output looks like (4-node cluster, one with the secondary interface
      down):
      * Verifying node node1.example.com
        - ERROR: tcp communication with node 'node3.example.com': failure using the secondary interface(s)
      * Verifying node node2.example.com
        - ERROR: tcp communication with node 'node3.example.com': failure using the secondary interface(s)
      * Verifying node node3.example.com
        - ERROR: tcp communication with node 'node1.example.com': failure using the secondary interface(s)
        - ERROR: tcp communication with node 'node2.example.com': failure using the secondary interface(s)
        - ERROR: tcp communication with node 'node4.example.com': failure using the secondary interface(s)
      * Verifying node node4.example.com
        - ERROR: tcp communication with node 'node3.example.com': failure using the secondary interface(s)
      
      Reviewed-by: imsnah
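
      The primitive behind such a check could look like this sketch (Ganeti
      has a similar helper; the signature here is an assumption):

        import socket

        def tcp_ping(target_ip, port, timeout=10.0, source_ip=None):
            """Return True if a TCP connection to target_ip:port succeeds."""
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            sock.settimeout(timeout)
            try:
                if source_ip:
                    # Bind to the address of the interface under test
                    # (primary or secondary); the port is chosen by the kernel.
                    sock.bind((source_ip, 0))
                sock.connect((target_ip, port))
                return True
            except socket.error:
                return False
            finally:
                sock.close()
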
    • Reduce chance of ssh failures in verify cluster · b544cfe0
      Iustin Pop authored
      The cluster verify builds a sorted list of nodes and passes that to all
      the nodes (in parallel) for ssh checks. This means that for a cluster
      with N nodes, there will be approximately N simultaneous connections to
      the first node, then to the second node, etc. This, coupled with the
      ssh daemon's “MaxStartups” parameter, can create false alarms about ssh
      connectivity.
      
      This patch randomizes the node list in the backend (therefore, each
      node should have its own order of ssh-ing to the other nodes), and the
      chance of these alarms should be reduced.
      
      Reviewed-by: ultrotter
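
      The core of the change fits in a few lines; a sketch with an
      illustrative function name:

        import random

        def ssh_check_order(node_list):
            """Return a per-node random ordering of the verify targets.

            With every node shuffling its own copy, the N nodes no longer
            all connect to node 1 first, then node 2, and so on, so sshd's
            MaxStartups limit is far less likely to be hit.
            """
            targets = list(node_list)
            random.shuffle(targets)
            return targets
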
  14. 30 Apr, 2008 1 commit
  15. 28 Apr, 2008 1 commit
    • Move iallocator script execution to ganeti-noded · 8d528b7c
      Iustin Pop authored
      Currently the iallocator execution takes place in the master, which is a
      violation of the current architecture, and will create problems with a
      threaded master daemon.
      
      This patch moves the execution to the backend, similar to the hooks
      runner, by:
        - introducing a new class that handles the execution in the backend
          (and could be used also for listing the allocators, etc.)
        - introducing a new rpc call
        - replacing the actual execution in IAllocator.Run() with a rpc call
      
      This passes burnin with the dumb allocator.
      
      Reviewed-by: imsnah
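
      A minimal sketch of such a backend-side runner class (the file-based
      hand-off and method names are assumptions, not the actual interface):

        import subprocess
        import tempfile

        class IAllocatorRunner(object):
            """Run an allocator script on this node and capture its output."""

            def Run(self, allocator_path, request_text):
                # Hand the request over via a temporary file whose name is
                # passed to the script; return (success, output) to the
                # caller, which ships it back to the master over RPC.
                with tempfile.NamedTemporaryFile(mode="w", suffix=".json") as tmp:
                    tmp.write(request_text)
                    tmp.flush()
                    proc = subprocess.Popen([allocator_path, tmp.name],
                                            stdout=subprocess.PIPE,
                                            universal_newlines=True)
                    out, _ = proc.communicate()
                return proc.returncode == 0, out
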
  16. 24 Apr, 2008 1 commit
  17. 10 Apr, 2008 2 commits
    • Move the OS search code into an abstract function · 57c177af
      Iustin Pop authored
      Based on the previous OS search code changes, we can now move the OS
      search code into a generic look-for-file function in utils.py. This
      means that the allocator code can use the same function.
      
      Reviewed-by: ultrotter
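
      A sketch of what such a generic helper can look like; the predicate
      argument is what lets both the OS search (directories) and the
      allocator search (executable files) reuse it:

        import os

        def find_file(name, search_dirs, test=os.path.isdir):
            """Return the full path of 'name' in the first matching search dir."""
            for directory in search_dirs:
                candidate = os.path.join(directory, name)
                if test(candidate):
                    return candidate
            return None

        # e.g. find_file("debootstrap", os_search_path) for OS definitions, or
        # find_file("hail", allocator_search_path, test=os.path.isfile) for
        # allocator scripts (names here are purely illustrative).
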
    • Change backend._OSSearch return values · c34c0cfd
      Iustin Pop authored
      Currently, the function backend._OSSearch() returns the (first) base dir
      in which this OS can be found. Thereafter the full actual path to the OS
      dir is built in the backend.OSFromDisk() function.
      
      This patch changes this so that _OSSearch() always returns the full path
      to the OS directory, and OSFromDisk uses that as returned (it will only
      build it if it gets a base dir in the first place).
      
      This patch is needed before we can abstract _OSSearch into a generic
      'look for file object' functionality that can also be used for the
      allocator plugin search.
      
      Reviewed-by: ultrotter
  18. 05 Apr, 2008 1 commit
    • Backend directory functions for file backend · 778b75bb
      Manuel Franceschini authored
      Add _[Create,Remove,Rename]FileStorageDir functions, which are needed
      for file-based instance management. These functions check whether the
      given directory to operate on is under the cluster-wide defined default
      file storage dir. If this is not the case, they won't do anything and
      will return False. This is to prevent cluster manipulation or damage.
      
      Reviewed-by: ultrotter
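
      A minimal sketch of the safety check, assuming an illustrative base
      directory constant:

        import os

        BASE_FILE_STORAGE_DIR = "/srv/ganeti/file-storage"   # illustrative

        def _is_under_base(path, base=BASE_FILE_STORAGE_DIR):
            """True if 'path' lies inside the cluster's file storage dir."""
            real = os.path.realpath(path)
            return real == base or real.startswith(base + os.sep)

        def _CreateFileStorageDir(path):
            """Refuse anything outside the base dir, as described above."""
            if not _is_under_base(path):
                return False
            os.makedirs(path)
            return True
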
  19. 18 Mar, 2008 1 commit
  20. 05 Mar, 2008 1 commit
  21. 29 Feb, 2008 1 commit
    • Fix master role stop on cluster destroy · c9064964
      Iustin Pop authored
      Currently the cluster destroy doesn't remove the master role, which
      means that the IP address of the cluster remains assigned to the master
      node.
      
      This patch fixes this and also a docstring in backend.StopMaster().
      
      Reviewed-by: imsnah
  22. 22 Feb, 2008 1 commit