1. 10 Apr, 2008 16 commits
    • Verify: add N+1 Memory redundancy verification · 2b3b6ddd
      Guido Trotter authored
      For every node we check that it can host all the instances for which it is
      currently secondary and that share the same primary node. This ensures that
      if a node fails, all its instances can fit on their secondary node. The code
      only works when failover is forced to go to the secondary node and cannot go
      to an arbitrary node in the cluster, which is the case in Ganeti 1.2.
      
      Reviewed-by: iustinp
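
The check described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the function name, the shape of `node_info`, and the `instance_mem` mapping are assumptions made for the example.

```python
def find_n_plus_one_failures(node_info, instance_mem):
    """Return (node, primary) pairs where `node` cannot absorb a failure of `primary`.

    node_info:    {node: {"mfree": int, "sinst-by-pnode": {pnode: [instance, ...]}}}
    instance_mem: {instance: memory needed, same unit as "mfree"}
    Both structures are hypothetical and only mirror the commit description.
    """
    failures = []
    for node, info in sorted(node_info.items()):
        for pnode, instances in sorted(info["sinst-by-pnode"].items()):
            # Memory needed on `node` if `pnode` dies and all of its
            # instances fail over to this secondary.
            needed = sum(instance_mem[name] for name in instances)
            if needed > info["mfree"]:
                failures.append((node, pnode))
    return failures
```

Each primary node is considered separately, which matches the single-node-failure assumption stated in the message: only one primary is expected to fail at a time.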
      
    • Verify: save instance config · 26b6af5e
      Guido Trotter authored
      After querying the instance config, save it in an instance_cfg dict. Any
      function that needs it later can then use it without reloading it from the
      configuration module. It will be used for N+1 memory resilience checking.
      
      Reviewed-by: iustinp
      
    • Verify: add more instance information to node_info · 36e7da50
      Guido Trotter authored
      The sinst-by-pnode field contains all secondary instances of a node, grouped by
      their primary node. This information lets us quickly see whether, when a node
      dies, some of its instances cannot be started on their secondary node.
      
      Reviewed-by: iustinp
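
As a rough sketch of the per-node bookkeeping this series of Verify commits describes (free memory and disk, primary and secondary instances, secondaries grouped by primary, and a list of non-redundant instances), one could write something like the following. The function name, the input shapes, and the dict keys are assumptions for illustration only.

```python
def build_node_info(free_resources, instances):
    """Collect per-node instance information.

    free_resources: {node: (free_memory, free_disk)}
    instances:      {instance: (primary_node, [secondary_node, ...])}
    Returns (node_info, non_redundant_instances).
    """
    node_info = {}
    for node, (mfree, dfree) in free_resources.items():
        node_info[node] = {
            "mfree": mfree,
            "dfree": dfree,
            "pinst": [],           # instances with this node as primary
            "sinst": [],           # instances with this node as secondary
            "sinst-by-pnode": {},  # secondary instances grouped by primary
        }

    non_redundant = []
    for name, (pnode, snodes) in sorted(instances.items()):
        node_info[pnode]["pinst"].append(name)
        if not snodes:
            # No secondary node, so the instance cannot be failed over.
            non_redundant.append(name)
        for snode in snodes:
            node_info[snode]["sinst"].append(name)
            node_info[snode]["sinst-by-pnode"].setdefault(pnode, []).append(name)
    return node_info, non_redundant
```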
      
    • Verify: add instance information to node_info · 93e4c50b
      Guido Trotter authored
      With this patch node_info is changed to store information about which primary
      and secondary instances are configured on a node. This information is useful to
      check memory and disk allocation. A list of non-redundant instances is also
      collected at this stage.
      
      Reviewed-by: iustinp
      
    • Verify: Add and populate node_info dict · 9c9c7d30
      Guido Trotter authored
      During information gathering we collect information from call_node_info, and
      then, when we cycle through the nodes, add it to a node_info dict containing
      each node's free memory and disk. This will be useful later to verify that the
      cluster is N+1 redundant. The disk space is saved as well because it can be
      useful for checks about disk space redundancy.
      
      Reviewed-by: iustinp
      
    • Rework the results of OpDiagnoseOS opcode · 1f9430d6
      Iustin Pop authored
      Currently, the opcode DiagnoseOS is the only opcode that returns a
      structure of objects.OS (which is a custom class, and not a simple
      python object) and furthermore all the processing of OS validity across
      nodes is left to the clients of this opcode.
      
      It would be more logical to have this opcode be similar to list
      instances/nodes, in the sense that:
        - it should return a table of results
        - the fields in the table should be selectable
      
      This patch does the above. The possible fields are:
        - name (os name)
        - valid (bool representing validity across all nodes)
        - node_status, which is a complicated structure required for ‘gnt-os
          diagnose’
      
      With this patch, gnt-os list becomes a very simple iteration over the
      list of results, filtering out non-valid ones. gnt-os diagnose is still
      complicated, but no more than before.
      
      The burnin tool has also been modified to work with the modified
      results, and is simpler because of this (it only needs to know if an OS
      is valid or not, not the per-node details).
      
      Reviewed-by: imsnah
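
To illustrate the "table of results with selectable fields" idea from this message, here is a small sketch. It is not the opcode implementation: `diagnose_os`, `KNOWN_FIELDS`, and the simplified `{node: bool}` node status are assumptions made for the example (the message itself calls the real node_status a complicated structure).

```python
KNOWN_FIELDS = ("name", "valid", "node_status")


def diagnose_os(node_status_by_os, fields):
    """Return one row per OS containing only the requested fields, in order.

    node_status_by_os: {os_name: {node_name: is_valid_on_that_node}}
    """
    unknown = [f for f in fields if f not in KNOWN_FIELDS]
    if unknown:
        raise ValueError("unknown fields: %s" % ", ".join(unknown))

    rows = []
    for name, node_status in sorted(node_status_by_os.items()):
        values = {
            "name": name,
            # An OS is valid only if it is valid on every node.
            "valid": bool(node_status) and all(node_status.values()),
            "node_status": node_status,
        }
        rows.append([values[f] for f in fields])
    return rows
```

With such a table, a listing command reduces to iterating over the rows and keeping those whose `valid` field is true, which is the simplification the message describes for gnt-os list.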
    • Change client protocol to raise exception on failures · b77acb3e
      Iustin Pop authored
      Currently the luxi.client.SubmitJob and Query methods return the unserialized
      result without processing it at all. This patch changes this by adding a
      'RequestException' error that is raised if the query itself or the
      submission of the job failed, and (if not) returning only the 'result'
      field from the message.
      
      The patch also does processing on the result of a query if we queried
      for jobs, as the 'op_list' field in the result has serialized opcodes
      and we need them de-serialized.
      
      Reviewed-by: ultrotter
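
The behaviour described here (raise on failure, otherwise return only the 'result' field) might look roughly like the sketch below. The JSON encoding, the `success` key, and the `parse_response` helper are assumptions for illustration; only the 'RequestException' name and the 'result' field come from the message.

```python
import json


class RequestException(Exception):
    """Raised when the query itself or the job submission failed."""


def parse_response(raw):
    """Decode a serialized reply and return only its 'result' field.

    Assumes a reply of the form {"success": bool, "result": ...}.
    """
    msg = json.loads(raw)
    if not msg.get("success"):
        raise RequestException(msg.get("result"))
    return msg["result"]
```

Callers then no longer need to inspect the raw message: they either get the result or handle the exception.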
    • Add per-opcode results to job processing · 35049ff2
      Iustin Pop authored
      This patch changes the definition of a job and introduces per-opcode
      results.
      
      First, the result and status fields of a job are condensed into a single
      'status' attribute. Then, we introduce an opcode status and one result
      list, which allow jobs to return values.
      
      The gnt-job script is also modified to allow these new fields to be
      queried.
      
      Note that the patch changes the opcode field to op_list, and it changes
      its return value from string to a list of (serialized) opcodes.
      
      Reviewed-by: ultrotter
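
One way to picture a job with per-opcode status and results is the sketch below. The class layout, the status strings, and the `run` method are assumptions for the example; the message only states that a job has a single 'status', an op_list, and per-opcode status and results.

```python
class Job:
    """A job is an ordered list of opcodes with per-opcode status and result."""

    def __init__(self, ops):
        self.op_list = list(ops)
        self.op_status = ["queued"] * len(self.op_list)
        self.op_result = [None] * len(self.op_list)

    @property
    def status(self):
        """Single job status derived from the per-opcode statuses."""
        if "error" in self.op_status:
            return "error"
        if all(s == "success" for s in self.op_status):
            return "success"
        if all(s == "queued" for s in self.op_status):
            return "queued"
        return "running"

    def run(self, execute):
        """Run the opcodes in order, stopping at the first failure."""
        for idx, op in enumerate(self.op_list):
            self.op_status[idx] = "running"
            try:
                self.op_result[idx] = execute(op)
            except Exception as err:
                self.op_status[idx] = "error"
                self.op_result[idx] = str(err)
                break
            self.op_status[idx] = "success"
```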
    • Move the OS search code into an abstract function · 57c177af
      Iustin Pop authored
      Based on the previous OS search code changes, we can now move the OS
      search code into a generic look-for-file function in utils.py. This
      means that the allocator code can use the same function.
      
      Reviewed-by: ultrotter
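
A generic look-for-file helper of the kind mentioned here could be sketched as follows. The name `find_file` and its signature are assumptions for the example, not necessarily what utils.py provides.

```python
import os


def find_file(name, search_path, test=os.path.exists):
    """Return the full path of the first match for `name`, or None.

    search_path: list of base directories, searched in order
    test:        predicate applied to each candidate path
    """
    for dir_name in search_path:
        candidate = os.path.join(dir_name, name)
        if test(candidate):
            return candidate
    return None
```

Passing `os.path.isdir` as the predicate would fit an OS directory search, while a check for an executable file would fit a search for allocator plugins.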
    • Change backend._OSSearch return values · c34c0cfd
      Iustin Pop authored
      Currently, the function backend._OSSearch() returns the (first) base dir
      in which this OS can be found. Thereafter the full actual path to the OS
      dir is built in the backend.OSFromDisk() function.
      
      This patch changes this so that _OSSearch() always returns the full path
      to the OS directory, and OSFromDisk uses that as returned (it will only
      build it if it gets a base dir in the first place).
      
      This patch is needed before we can abstract the _OSSearch into a generic
      'look for file object' functionality that can be used for allocator
      plugins search too.
      
      Reviewed-by: ultrotter
    • Verify: remove useless check in _VerifyInstance · ceb76b36
      Guido Trotter authored
      The list of instances passed to _VerifyInstance is the one coming from
      self.cfg.GetInstanceList(). So there's no point, inside that function, in
      checking whether the current instance is a member of that list. Moreover
      orphaned instance verification is already done in a separate step.
      
      Reviewed-by: imsnah
      
    • Verify: instance verification cleanup · c5705f58
      Guido Trotter authored
      The instance configuration is grabbed both in the _VerifyInstance function and
      in the loop that calls it. Clean this up by passing the configuration as a
      parameter.
      
      Reviewed-by: imsnah
      
    • Verify: fix crash when a node is down · a872dae6
      Guido Trotter authored
      Currently if ganeti-noded doesn't respond on a node gnt-cluster verify will die
      when verifying primary instances for that node. Fix this by just emitting an
      error message if no information about running instances is returned from the
      node.
      
      Reviewed-by: iustinp
      
    • Verify: fix ERROR message indentation · c840ae6f
      Guido Trotter authored
      All ERROR messages in cluster verify are indented by four spaces; this one is
      indented by two. Fixing this skew.
      
      Reviewed-by: imsnah, iustinp
      
    • Fix spelling mistake in constants.py · 2f6eebee
      Guido Trotter authored
      Of course instance creation doesn't have any modem; the comment was just
      talking about modes. Sorry to everybody expecting whistles.
      
      Reviewed-by: imsnah
      
    • Small code style fix · 16687b98
      Manuel Franceschini authored
      Reviewed-by: imsnah
  2. 09 Apr, 2008 1 commit
  3. 08 Apr, 2008 6 commits
    • Two small code style fixes · 1c6e3627
      Manuel Franceschini authored
      Reviewed-by: imsnah
    • Add file_storage_dir,file_driver to OpCreateInstance · dc936b49
      Manuel Franceschini authored
      Reviewed-by: ultrotter, iustinp
    • Modify LURenameInstance to support file backend · b23c4333
      Manuel Franceschini authored
      This patch does two things:
      - Modify LURenameInstance.Exec to rename directory
        when a file-based instance is renamed
      - Modify config.RenameInstance() to replace the directory name in
        config.data for file devices
      
      Reviewed-by: iustinp
    • Modify LUCreateInstance to support file backend · 0f1a06e3
      Manuel Franceschini authored
      - Modify _GenerateDiskTemplate to support file-based disk template
      - Modify _CreateDisks to create directory needed for file-based
        instances before creating the actual files
      - Modify _RemoveDisks to delete directory for file-based instances
        after deleting their VBDs
      - Add Prereq-check to check if given file-driver is valid
      - Add Prereq-check to check if given file-storage-dir path is relative
      
      Reviewed-by: iustinp
    • Provide more flexible version numbers to the code · d5fd92ed
      Michael Hanselmann authored
      Having the individual parts in the code allows us to build version
      numbers like "1.2" while leaving "3" out in a clean fashion, that is
      without regular expressions or the like. This might be used together
      with configuration format versions.
      
      Why m4 code? AM_INIT_AUTOMAKE, which could take a shell variable, is
      considered deprecated[1] and should be replaced by AC_INIT. Unfortunately,
      AC_INIT is expanded at build time, so one has to use m4 to build
      composite values like this version number[2].
      
      [1] http://www.gnu.org/software/libtool/manual/automake/Public-macros.html
      [2] http://www.mail-archive.com/autoconf@gnu.org/msg16720.html
      
      Reviewed-by: iustinp
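
On the Python side, having the individual parts available means composite versions can be built without any string parsing. A tiny sketch, where the constant names and the helper are assumptions for illustration:

```python
# Hypothetical constants; the real names are defined by the build system.
VERSION_MAJOR = 1
VERSION_MINOR = 2
VERSION_REVISION = 3


def format_version(*parts):
    """Join numeric version parts with dots."""
    return ".".join(str(part) for part in parts)


RELEASE_VERSION = format_version(VERSION_MAJOR, VERSION_MINOR, VERSION_REVISION)
SHORT_VERSION = format_version(VERSION_MAJOR, VERSION_MINOR)
```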
    • Modify hypervisor to support file backend · e994fcba
      Manuel Franceschini authored
      The driver in the xen config file needs to be changed when dealing with
      files rather than bdevs.
      
      This patch does two things:
      - Add _GetConfigFileDiskData to XenHypervisor, which returns the correct
        xen config disk line. This function checks the logical disk type of
        every given block device, so that hybrid setups (e.g. mixed drbd and
        file VBDs) are also possible
      - Make Xen[Pvm|Hvm]Hypervisor._WriteConfigFile() a classmethod to be
        able to call the helper function _GetConfigFileDiskData() in their
        parent XenHypervisor
      
      Reviewed-by: iustinp
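
The per-disk driver selection can be pictured with the sketch below. It only assumes the common Xen convention of a `phy:` prefix for block devices and a `file:` prefix for file-backed disks; the function names, the disk tuples, and the `"file"` type tag are hypothetical and do not reproduce _GetConfigFileDiskData.

```python
FILE_BACKED = "file"  # hypothetical tag for file-based logical disks


def disk_config_line(dev_type, path, target, mode="w"):
    """Build one disk entry, picking the driver from the logical disk type."""
    driver = "file" if dev_type == FILE_BACKED else "phy"
    return "%s:%s,%s,%s" % (driver, path, target, mode)


def disk_config(disks):
    """Build entries for a list of (dev_type, path, target) tuples.

    Each disk is checked individually, so mixed setups are possible.
    """
    return [disk_config_line(*disk) for disk in disks]
```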
  4. 07 Apr, 2008 3 commits
    • Implement selective job query · 283439c9
      Iustin Pop authored
      This patch implements querying of only selected jobs instead of all.
      
      Reviewed-by: ultrotter
    • Move some checks from cli.py to luxi.py · a14a17fc
      Iustin Pop authored
      The idea of cli.py and luxi.py is that all protocol checks should be in
      luxi, and cli.py should just offer some helpful shortcuts for the
      command line scripts.
      
      This patch removes the result checks from cli and adds some other checks
      to luxi. It no longer checks for success/failure, since it's not yet
      clear how that should be handled - probably with exceptions.
      
      Reviewed-by: ultrotter
    • A small capitalization change (OpCode.LoadOpcode) · 00abdc96
      Iustin Pop authored
      This small patch fixes the opcodes.OpCode.LoadOpcode capitalization to
      what it was intended to be (as the comment says): LoadOpCode.
      
      Reviewed-by: ultrotter
  5. 05 Apr, 2008 4 commits
    • Implement forking/master role checking in masterd · c1f2901b
      Iustin Pop authored
      This patch adds checks for the master role and daemonize support to
      ganeti-masterd.
      
      The patch modifies the startup/shutdown of the server because:
        - we want bind()/listen() to the master socket to occur before forking
          so that we can return a correct exit code and write messages to
          stderr
        - but we want thread startup to occur after fork(), otherwise python
          threading gets confused
      
      The patch also has some small cleanups:
        - remove the unix socket after closing it, so we don't need to remove
          it manually
        - instead of just telling the threads to terminate via the new_queue,
          we also join() them so that the logs show which thread is clinging to
          life
        - the daemon logs to its own logfile now
        - there is command line parameter support :)
      
      Reviewed-by: imsnah
    • Add FileStorage class · 6f695a2e
      Manuel Franceschini authored
      This is the representation of file VBDs on the backend. It's the first
      implementation and supports only raw files.
      
      Reviewed-by: iustinp
    • rpc directory functions for file backend · 5e04ed8b
      Manuel Franceschini authored
      Reviewed-by: ultrotter
    • Backend directory functions for file backend · 778b75bb
      Manuel Franceschini authored
      Add _[Create,Remove,Rename]FileStorageDir functions, which are needed for
      file-based instance management. These functions check whether the given
      directory to operate on is under the cluster-wide default file storage
      dir. If this is not the case, they won't do anything and return
      False. This is to prevent cluster manipulation or damage.
      
      Reviewed-by: ultrotter
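
The containment check described above could be sketched like this. The helper names, the injectable `mkdir` parameter, and the example paths are assumptions for illustration. Note that this version compares normalized paths only and does not resolve symlinks.

```python
import os


def is_under_base_dir(path, base_dir):
    """Return True if `path` lies at or below `base_dir` after normalization."""
    if not (os.path.isabs(path) and os.path.isabs(base_dir)):
        return False
    base = os.path.normpath(base_dir)
    target = os.path.normpath(path)
    return os.path.commonpath([base, target]) == base


def create_file_storage_dir(path, base_dir, mkdir=os.makedirs):
    """Create `path` only if it is under `base_dir`; return False otherwise."""
    if not is_under_base_dir(path, base_dir):
        return False  # refuse to touch anything outside the storage dir
    mkdir(path)
    return True
```

Comparing whole path components rather than string prefixes matters here: a plain `startswith` test would wrongly accept a sibling directory whose name merely begins with the base directory's name.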
  6. 04 Apr, 2008 4 commits
    • Fix SetVGName() to access object not dict · 2d4011cd
      Manuel Franceschini authored
      Reviewed-by: imsnah
    • Allow utils.Daemonize() to not close some fds · 8ff612c2
      Iustin Pop authored
      This patch implements an optional parameter to utils.Daemonize() which
      allows that function to not close some file descriptors.
      
      This will allow the master daemon to open the listening socket before
      fork - in order to be able to notify errors and return a meaningful exit
      code, and then when we fork we don't close that fd.
      
      Reviewed-by: imsnah
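
A generic double-fork daemonization with an optional list of descriptors to keep open might look like the sketch below. This is not the utils.Daemonize() code: the function names, the `noclose_fds` and `max_fd` parameters, and the log file mode are assumptions, and the sketch assumes the kept descriptors are not 0, 1, or 2.

```python
import os


def fds_to_close(max_fd, noclose_fds=None):
    """List the descriptors below max_fd that should be closed."""
    keep = set(noclose_fds or ())
    return [fd for fd in range(max_fd) if fd not in keep]


def daemonize(logfile, noclose_fds=None, max_fd=1024):
    """Detach from the terminal, keeping only the descriptors in noclose_fds."""
    if os.fork() != 0:
        os._exit(0)  # first parent exits
    os.setsid()      # become session leader, drop the controlling terminal
    if os.fork() != 0:
        os._exit(0)  # second parent exits, the grandchild carries on

    for fd in fds_to_close(max_fd, noclose_fds):
        try:
            os.close(fd)
        except OSError:
            pass  # descriptor was not open

    # The lowest free descriptors are reused in order: 0, then 1.
    os.open(os.devnull, os.O_RDONLY)
    os.open(logfile, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    os.dup2(1, 2)  # stderr goes to the log file as well
```

A daemon that binds its listening socket before forking would pass that socket's file descriptor in `noclose_fds`, so that the socket survives daemonization.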
    • Add a simple gnt-job script · 7a1ecaed
      Iustin Pop authored
      This patch adds a very basic gnt-job script that allows job querying.
      This goes on top of the previous master daemon patches.
      
      Currently, because of the not-changed cmd lock, you can't query the jobs
      as long as a job is running - you have to rm the cmd lock and then you
      can query the jobs.
      
      Reviewed-by: imsnah
    • Move the daemonize function to utils.py · 8f765069
      Iustin Pop authored
      Currently, in ganeti-noded we have the createDaemon function. Since
      we'll need the same in other daemons, we move this function to utils.py
      
      With the move, a few changes were also done:
        - change the name to Daemonize()
        - add a parameter, logfile, as different daemons will want to log to
          different files
        - remove the try.. except.. around the fork calls, since they were
          only re-raising the OS exception with less data; unless we want to
          actually handle fork error (not just re-raising), these try blocks
          are not useful
        - change the return style at the end of the function
      
      Reviewed-by: imsnah
  7. 02 Apr, 2008 6 commits
    • Improve disk consistency error message again · aa9d0c32
      Guido Trotter authored
      This new version includes all the possible failure options.
      
      Reviewed-by: iustinp
      
    • Fix misleading error message when checking disks · ad6d3f7d
      Guido Trotter authored
      _CheckDiskConsistency outputs "Can't get any data from node NODE" when no drbd
      is found on the target node. This causes a misleading error message to be
      output for example on failover (when the primary node is down, or the instance
      is not running), stating that no data could be got from the secondary node,
      which scares the user and misleads him. Changing this to "Disk degraded or not
      found on node %s" still reports that something is missing, but on the other
      hand doesn't make the user think the node is down, or has no data at all...
      
      Reviewed-by: imsnah
      
    • Handle better failing over non-running instances · a0aaa0d0
      Guido Trotter authored
      Right now if you try to failover an instance which is not marked as up the
      operation will fail unless you pass the --ignore-consistency flag because the
      disks won't be considered to be consistent. Allow them to be if we know the
      instance shouldn't be up.
      
      Reviewed-by: imsnah
      
    • Improve export and fix export-on-norun bug · fb300fb7
      Guido Trotter authored
      Currently gnt-backup export chains the ShutdownInstance and StartupInstance
      opcodes to itself. This works but (a) it's suboptimal, because there's no need
      to deactivate the instance's disks as we are about to restart it anyway, and
      (b) doesn't take care of instances which are already down (and should be). This
      patch takes care of this by just calling the shutdown rpc function instead of
      the whole opcode, and just starting up the instance if it's configured as up in
      the first place.
      
      Reviewed-by: imsnah
      
    • Forcibly convert export data to str object · 2d3e73c4
      Michael Hanselmann authored
      ConfigParser.SafeConfigParser doesn't support unicode string objects.
      Unicode string objects are returned by simplejson.
      
      Reviewed-by: iustinp
    • failover: only start instance if we should · 12a0cfbe
      Guido Trotter authored
      gnt-instance failover on an instance marked as down will mistakenly bring it
      up. The watcher will then shut it down again, but it's a lot better (and safer)
      not to start it at all.
      
      Reviewed-by: imsnah
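
The intended behaviour can be illustrated with a short sketch. The instance dict, its `status` values, and the `start_instance` callback are assumptions for the example, not the actual failover code.

```python
def failover(instance, start_instance):
    """Swap primary and secondary; start the instance only if configured as up.

    Returns True if the instance was started on its new primary node.
    """
    instance["primary"], instance["secondary"] = (
        instance["secondary"],
        instance["primary"],
    )
    if instance["status"] == "up":
        start_instance(instance["name"], instance["primary"])
        return True
    return False  # marked as down: leave it stopped
```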