  1. Dec 05, 2008
    • Cleanup the config file on demotion from candidate · 56aa9fd5
      Iustin Pop authored
      This patch adds a simple rpc which makes a backup of the config file and
      then removes it. This is done so that cluster verify doesn't complain
      immediately after demoting a node.
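A minimal sketch of the "back up, then remove" step described above. The function name, the `.backup` suffix, and the path handling are assumptions for illustration, not the code from this patch.

```python
import os
import shutil


def backup_and_remove(config_path):
    """Copy config_path to a sibling backup file, then delete the original.

    Returns the backup path, or None if there was no file to clean up.
    The ".backup" suffix is a hypothetical naming scheme.
    """
    if not os.path.exists(config_path):
        return None
    backup_path = config_path + ".backup"
    # copy2 also preserves the modification time of the original file
    shutil.copy2(config_path, backup_path)
    os.remove(config_path)
    return backup_path
```

Calling it a second time on the same path returns None, so a repeated demotion would be harmless in this sketch.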
      
      Reviewed-by: imsnah
    • watcher: handle offline nodes better · cbfc4681
      Iustin Pop authored
      This patch changes the LUQueryInstances to show a different state for
      offline nodes and also modifies the watcher to understand the offline
      state in its checks.
      
      Reviewed-by: ultrotter
  2. Dec 04, 2008
  3. Dec 02, 2008
    • Fix master failover · bbe19c17
      Iustin Pop authored
      The ssconf files were not updated by the master failover. We need to
      push them, and since we already have RPC initialized, we can use the
      standard ConfigWriter to do so - this will take care of both the config
      file and the ssconf files.
      
      Reviewed-by: imsnah
  4. Nov 26, 2008
  5. Nov 25, 2008
  6. Nov 21, 2008
  7. Nov 11, 2008
    • Abstract runtime creation of dirs into a function · 8adbffaa
      Iustin Pop authored
      Currently the dir creation in ganeti-noded is in the main function. This
      is not nice: we move it into a separate function and also add creation
      of the OS_LOG_DIR (with different permissions, but in the same way).
      This will permit cleanup of the creation of the OS_LOG_DIR from the
      backend module (it's currently done in multiple places).
      
      Reviewed-by: imsnah
  8. Oct 24, 2008
  9. Oct 23, 2008
    • Export the disk index in the import/export scripts · 74c47259
      Iustin Pop authored
      We want to export the disk index as some OSes will only want to export
      the first disk (or the second one, etc.), even if we have multiple
      disks.
      
      The patch also updates the backend.ExportSnapshot docstring.
      
      Reviewed-by: ultrotter
  10. Oct 22, 2008
    • Convert ImportOSIntoInstance to OS API 10 · 6c0af70e
      Guido Trotter authored
      - Change ImportOSIntoInstance not to get any "os_disk" and "swap_disk"
        arguments but to accept multiple target images to import, and to
        return a list of booleans with the result of each import
      - Change the relevant rpc call and the only caller to conform
      - Pass arguments to the import script through the environment
      - Run one import os script for each disk image, passing an IMPORT_DEVICE
      
      Reviewed-by: iustinp
  11. Oct 21, 2008
  12. Oct 20, 2008
    • Convert the job queue rpcs to address-based · 99aabbed
      Iustin Pop authored
      The two main multi-node job queue RPC calls (jobqueue_update,
      jobqueue_rename) are converted to address-based calls, in order to speed
      up queue changes. For this, we need to change the _nodes attribute on
      the jobqueue to be a dict {name: ip}, instead of a set.
      
      Reviewed-by: imsnah
    • Remove the logger.py module · 82d9caef
      Iustin Pop authored
      Since now we use only one function from the logger module
      (SetupLogging), we move it to utils.py (which is already imported by all
      users of this function), and we remove the module.
      
      Reviewed-by: imsnah
  13. Oct 17, 2008
  14. Oct 16, 2008
    • rapi: Convert to new HTTP server class · 16a8967d
      Michael Hanselmann authored
      Requests are no longer logged to a separate file.
      
      Reviewed-by: amishchenko
    • Improvements to the master startup checks · d7cdb55d
      Iustin Pop authored
      In order to account for future improvements to master failover, we move
      the actual data gathering capabilities from ganeti-masterd into
      bootstrap.py, and we leave only the verification into masterd.
      
      The verification procedure is then changed to retry multiple times (up
      to one minute) in case most nodes do not respond, and also the algorithm
      is changed to require at least half (but not half+1) votes, since our
      vote also should count (and we vote for ourselves).
      
      Example for consistent (config-wise) cluster:
        - 5 node cluster, 2 nodes down: still start
        - 4 node cluster, 2 nodes down: retry for one minute, abort
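One reading of the voting rule that is consistent with both examples above: our own vote plus the votes of the other responding nodes must form a strict majority of the cluster. This is a sketch under that assumption; the function name and signature are hypothetical, and the one-minute retry loop is left out.

```python
def can_start_master(total_nodes, other_votes):
    """Return True if the master may start under the assumed rule.

    total_nodes: cluster size, including this node.
    other_votes: number of *other* nodes that answered and agree with us.
    """
    own_vote = 1  # we always vote for ourselves
    # Strict majority, written without division to avoid rounding issues.
    return 2 * (own_vote + other_votes) > total_nodes
```

With 5 nodes and 2 down, two other nodes answer, so 3 of 5 votes is enough. With 4 nodes and 2 down, only one other node answers, and 2 of 4 votes is not.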
      
      Reviewed-by: ultrotter
    • Add an interface for the drain flag changes/query · 3ccafd0e
      Iustin Pop authored
      This adds the set/reset in the jqueue and luxi modules, and a way to
      query it in OpQueryConfigValues, and also the command line interface for
      it:
      $ gnt-cluster queue info
      The drain flag is unset
      $ gnt-cluster queue drain
      $ gnt-cluster queue info
      The drain flag is set
      $ gnt-cluster queue undrain
      $ gnt-cluster queue info
      The drain flag is unset
      
      The flag is set via luxi rather than via an opcode because opcodes
      can't be executed while the queue is drained. It is not queried via
      luxi, since in the future it might become a cluster property as
      opposed to a node one.
      
      Reviewed-by: imsnah
  15. Oct 15, 2008
  16. Oct 14, 2008
    • Export the hypervisor.ValidateParameters over RPC · 6217e295
      Iustin Pop authored
      The newly-added node-specific ValidateParams hypervisor method is
      exported over RPC, using the semi-standard (success, message) return
      value. This is a multi-node call, so that we can call both the primary
      and the secondary node at once.
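A small sketch of the (success, message) convention applied to a multi-node call. All names here (`validate_on_nodes`, `summarize`, the `validate_fn` callback) are hypothetical stand-ins, not the actual rpc or hypervisor API.

```python
def validate_on_nodes(nodes, validate_fn, hvparams):
    """Run validate_fn(node, hvparams) per node, collecting (success, message).

    validate_fn is assumed to raise ValueError on invalid parameters.
    """
    results = {}
    for node in nodes:
        try:
            validate_fn(node, hvparams)
            results[node] = (True, "")
        except ValueError as err:
            results[node] = (False, str(err))
    return results


def summarize(results):
    """Collapse per-node results into a single (success, message) pair."""
    failures = [
        f"{node}: {msg}"
        for node, (ok, msg) in sorted(results.items())
        if not ok
    ]
    return (not failures, "; ".join(failures))
```

Returning a message alongside the boolean lets the caller report which node rejected the parameters and why, instead of a bare failure.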
      
      Reviewed-by: ultrotter
  17. Oct 13, 2008
    • Fix a few rpc-related errors · 16ad1a83
      Iustin Pop authored
      This fixes:
        - whitespace change, double lines between methods
        - duplication of call_upload_file, introduced by mistake in rev 1795
          and which went undetected because of the many changes in that rev
          (only diff -b shows it clearly)
        - call_instance_info didn't pass the hypervisor name parameter, but
          the backend requires it
      
      Reviewed-by: ultrotter
  18. Oct 12, 2008
    • Abstract checking own address into a function · caad16e2
      Iustin Pop authored
      Currently, we check if we have a given ip address (i.e. it's alive on
      one of our interfaces) by manually calling TcpPing(source=localhost).
      This works, but having it spread all over the code makes it hard to
      change the implementation.
      
      The patch abstracts this into a separate utils.OwnIpAddress(addr)
      function. We add a rpc call for it, which we use instead of the
      (single-use of) call_node_tcp_ping. We leave node_tcp_ping in for now,
      as it seems useful; eventually it should be removed in a separate patch.
      
      Reviewed-by: imsnah
  19. Oct 10, 2008
    • Convert ganeti-noded to new HTTP server class · cc28af80
      Michael Hanselmann authored
      Reviewed-by: iustinp
    • Convert rpc module to RpcRunner · 72737a7f
      Iustin Pop authored
      This big patch changes the call model used in internode-rpc from
      standalone function calls in the rpc module to calls via a RpcRunner
      class that holds all the methods. This can be used in the future to enable
      smarter processing in the RPC layer itself (some quick examples are not
      setting the DiskID from cmdlib code, but only once in each rpc call,
      etc.).
      
      There are a few RPC calls that are made outside of the LU code, and
      these calls are left as staticmethods, so they can be used without a
      class instance (which requires a ConfigWriter instance).
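The split between instance methods and staticmethods could look roughly like the sketch below. The method names, the dict used in place of a ConfigWriter, and the `_send` placeholder are illustrative assumptions; the real class talks to the node daemon over the network.

```python
class RpcRunner:
    """Sketch of a runner that needs config access for most calls."""

    def __init__(self, cfg):
        # cfg stands in for a ConfigWriter; here it is a {node_name: address} dict
        self._cfg = cfg

    def call_instance_info(self, node, instance, hypervisor):
        # Instance method: resolves the node through the config first.
        address = self._cfg[node]
        return self._send(address, "instance_info", [instance, hypervisor])

    @staticmethod
    def call_version(address):
        # Staticmethod: usable without a config, e.g. outside of LU code.
        return RpcRunner._send(address, "version", [])

    @staticmethod
    def _send(address, procedure, args):
        # Transport placeholder: returns the request instead of sending it.
        return {"address": address, "procedure": procedure, "args": args}
```

Callers inside an LU would use an instance, `RpcRunner(cfg).call_instance_info(...)`, while early-startup code could call `RpcRunner.call_version(address)` directly.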
      
      Reviewed-by: imsnah
  20. Oct 08, 2008
    • Move the hypervisor attribute to the instances · e69d05fd
      Iustin Pop authored
      This (big) patch moves the hypervisor type from the cluster to the
      instance level; the cluster attribute remains as the default hypervisor,
      and will be renamed accordingly in a next patch. The cluster also gains
      the ‘enable_hypervisors’ attribute, and instances can be created with
      any of the enabled ones (no provision yet for changing that attribute).
      
      The many changes in the rpc/backend layer are due to the fact that
      all backend code read the hypervisor from the local copy of the config,
      and now we have to send it (either in the instance object, or as a
      separate parameter) for each function.
      
      The node list by default will list the node free/total memory for the
      default hypervisor; a new flag should be added to select another
      hypervisor. Instance list has a new field, hypervisor, that shows the
      instance hypervisor. Cluster verify runs for all enabled hypervisor
      types.
      
      The new FIXMEs are related to IAllocator, since now the node
      total/free/used memory counts are wrong (we can't reliably compute the
      free memory).
      
      Reviewed-by: imsnah
  21. Oct 07, 2008
    • rpc.call_instance_migrate: pass the whole instance · 9f0e6b37
      Iustin Pop authored
      Currently the call_instance_migrate call only passes the instance name;
      we need to pass the whole object for the hypervisor_type changes (all
      the other individual instance rpc calls already pass the instance
      object).
      
      Reviewed-by: imsnah
    • Implement job 'waiting' status · e92376d7
      Iustin Pop authored
      Background: when we have multiple jobs in the queue (more than just a
      few), many of the jobs (up to the number of threads) will be in state
      'running', although many of them could be actually blocked, waiting for
      some locks. This is not good, as one cannot easily see what is
      happening.
      
      The patch extends the opcode/job possible statuses with another one,
      waiting, which shows that the LU is in the acquire locks phase. The
      mechanism for doing so is simple: we initialize (in the job queue) the
      opcode with OP_STATUS_WAITLOCK, and when the processor is ready to give
      control to the LU's Exec, it will call a notifier back into the
      _JobQueueWorker that sets the opcode status to OP_STATUS_RUNNING (with
      the proper queue locking). Because this mechanism does not save the job,
      all opcodes on disk will be in status WAITLOCK and not RUNNING anymore,
      so we also change the load sequence to consider WAITLOCK as RUNNING.
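The status handover described above can be sketched as follows. Only the constant names OP_STATUS_WAITLOCK and OP_STATUS_RUNNING come from the commit message; their string values, the `Opcode` and `Worker` classes, and the callback signatures are assumptions made for illustration.

```python
import threading

OP_STATUS_WAITLOCK = "waiting"   # assumed string value
OP_STATUS_RUNNING = "running"    # assumed string value


class Opcode:
    def __init__(self):
        self.status = None


class Worker:
    """Hypothetical stand-in for the job queue worker."""

    def __init__(self):
        self._queue_lock = threading.Lock()

    def process(self, op, acquire_locks, exec_fn):
        # The queue marks the opcode as waiting before any locks are taken.
        with self._queue_lock:
            op.status = OP_STATUS_WAITLOCK

        def notify_start():
            # Called by the processor right before handing control to Exec.
            with self._queue_lock:
                op.status = OP_STATUS_RUNNING

        acquire_locks()   # may block for a long time
        notify_start()
        return exec_fn()


def status_on_load(stored_status):
    """On load from disk, treat a saved WAITLOCK as RUNNING."""
    if stored_status == OP_STATUS_WAITLOCK:
        return OP_STATUS_RUNNING
    return stored_status
```

Because the notifier does not write the job to disk, the load-time mapping in `status_on_load` is what keeps restarted jobs from appearing as merely waiting.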
      
      With the patch applied, creating in parallel (via burnin) five instances
      on a five node cluster shows that only two are executing, while three
      are waiting for locks.
      
      Reviewed-by: imsnah
  22. Oct 06, 2008
    • Implement job auto-archiving · 07cd723a
      Iustin Pop authored
      This patch adds a new luxi call that implements auto-archiving of jobs
      older than a certain age (or -1 for all completed jobs), and the gnt-job
      command that makes use of this (with 'all' for -1).
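A sketch of the selection rule, assuming each job is represented as a (job_id, finished, end_timestamp) tuple and that ages are in seconds. The function names and the tuple layout are hypothetical; the luxi call and the gnt-job command themselves are not reproduced here.

```python
def parse_age(arg):
    """Map the command-line argument to an age; 'all' becomes -1."""
    return -1 if arg == "all" else int(arg)


def select_jobs_to_archive(jobs, age, now):
    """Return ids of finished jobs older than `age` seconds.

    An age of -1 selects every finished job regardless of its end time.
    Unfinished jobs are never selected.
    """
    selected = []
    for job_id, finished, end_timestamp in jobs:
        if not finished:
            continue
        if age == -1 or now - end_timestamp > age:
            selected.append(job_id)
    return selected
```

Passing `now` explicitly, rather than reading the clock inside the function, keeps the rule easy to check in isolation.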
      
      Reviewed-by: imsnah