  1. Oct 17, 2008
  2. Oct 16, 2008
    • rapi: Convert to new HTTP server class · 16a8967d
      Michael Hanselmann authored
      Requests are no longer logged to a separate file.
      
      Reviewed-by: amishchenko
    • Improvements to the master startup checks · d7cdb55d
      Iustin Pop authored
      In order to account for future improvements to master failover, we move
      the actual data gathering capabilities from ganeti-masterd into
      bootstrap.py, and we leave only the verification in masterd.
      
      The verification procedure is then changed to retry multiple times (up
      to one minute) if most nodes do not respond, and the algorithm is
      changed to require at least half of the votes (rather than half+1),
      since our own vote should also count (we vote for ourselves).
      
      Example for consistent (config-wise) cluster:
        - 5 node cluster, 2 nodes down: still start
        - 4 node cluster, 2 nodes down: retry for one minute, abort
      
      Reviewed-by: ultrotter
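
      A minimal sketch of the retry-and-quorum logic described above; the
      function names and the gather_votes() return format are illustrative
      assumptions, not Ganeti's actual bootstrap.py API:

        import time

        def check_master_quorum(gather_votes, num_nodes, retries=6, wait=10.0):
            # gather_votes() is assumed to return (votes_for_us, responding).
            for _ in range(retries):
                votes_for_us, responding = gather_votes()
                if responding * 2 > num_nodes:
                    # Enough answers: require at least half of the votes
                    # (not half + 1), since our own vote counts as well.
                    return votes_for_us * 2 >= num_nodes
                # Most nodes did not respond: wait and retry, for up to
                # roughly one minute in total.
                time.sleep(wait)
            return False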
    • Add an interface for the drain flag changes/query · 3ccafd0e
      Iustin Pop authored
      This adds the set/reset operations in the jqueue and luxi modules, a
      way to query the flag via OpQueryConfigValues, and the command line
      interface for it:
      $ gnt-cluster queue info
      The drain flag is unset
      $ gnt-cluster queue drain
      $ gnt-cluster queue info
      The drain flag is set
      $ gnt-cluster queue undrain
      $ gnt-cluster queue info
      The drain flag is unset
      
      The flag is set via luxi rather than via an opcode because opcodes
      can't be executed while the queue is drained; querying, however, is
      not done via luxi, since in the future the flag might become a cluster
      property as opposed to a node one.
      
      Reviewed-by: imsnah
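
      A rough sketch of how such a drain flag can gate job submission; the
      class and method names below are illustrative, not the actual jqueue
      or luxi code:

        class QueueDrainedError(Exception):
            """Raised when a job is submitted while the queue is drained."""

        class SimpleJobQueue:
            def __init__(self):
                self._drained = False
                self._jobs = []

            def SetDrainFlag(self, drain_flag):
                # Toggled directly via the management protocol (luxi in
                # Ganeti), not via an opcode: opcodes would not run while
                # the queue is drained.
                self._drained = bool(drain_flag)

            def SubmitJob(self, ops):
                if self._drained:
                    raise QueueDrainedError("queue is drained, not accepting jobs")
                self._jobs.append(ops)
                return len(self._jobs) - 1  # toy job id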
  3. Oct 15, 2008
  4. Oct 14, 2008
    • Export the hypervisor.ValidateParameters over RPC · 6217e295
      Iustin Pop authored
      The newly-added node-specific ValidateParams hypervisor method is
      exported over RPC, using the semi-standard (success, message) return
      value. It is a multi-node call, so that we can call both the primary
      and the secondary node at once.
      
      Reviewed-by: ultrotter
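
      The (success, message) convention might look like this on the calling
      side; the helper below only illustrates the pattern and is not the
      real rpc module code:

        def validate_on_nodes(validate_fn, node_names, hvname, hvparams):
            # validate_fn(node, hvname, hvparams) is assumed to return a
            # (success, message) tuple, mirroring the RPC convention.
            errors = []
            for node in node_names:
                success, message = validate_fn(node, hvname, hvparams)
                if not success:
                    errors.append("%s: %s" % (node, message))
            return (not errors, "; ".join(errors))

        # Example: checking primary and secondary node in one go:
        # ok, msg = validate_on_nodes(my_rpc_validate, ["node1", "node2"],
        #                             "xen-pvm", {})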
  5. Oct 13, 2008
    • Fix a few rpc-related errors · 16ad1a83
      Iustin Pop authored
      This fixes:
        - whitespace changes (double blank lines between methods)
        - duplication of call_upload_file, introduced by mistake in rev 1795,
          which went undetected because of the many changes in that revision
          (only diff -b shows it clearly)
        - call_instance_info didn't pass the hypervisor name parameter, but
          the backend requires it
      
      Reviewed-by: ultrotter
  6. Oct 12, 2008
    • Abstract checking own address into a function · caad16e2
      Iustin Pop authored
      Currently, we check whether we have a given IP address (i.e. it is
      alive on one of our interfaces) by manually calling TcpPing(source=localhost).
      This works, but having it spread all over the code makes it hard to
      change the implementation.
      
      The patch abstracts this into a separate utils.OwnIpAddress(addr)
      function. We add an RPC call for it, which we use instead of the
      (single) use of call_node_tcp_ping. We leave node_tcp_ping in, as it
      still seems useful; eventually it should be removed in a separate
      patch.
      
      Reviewed-by: imsnah
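
      One simple way to implement such an "is this address one of ours"
      check is to try binding to it; this is only an illustrative stand-in
      for the TcpPing-based utils.OwnIpAddress described above:

        import socket

        def own_ip_address(addr):
            # Binding only succeeds if the address is configured on one of
            # the local interfaces, which is what we want to know here.
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            try:
                sock.bind((addr, 0))  # port 0 means "any free port"
                return True
            except socket.error:
                return False
            finally:
                sock.close()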
  7. Oct 10, 2008
    • Convert ganeti-noded to new HTTP server class · cc28af80
      Michael Hanselmann authored
      Reviewed-by: iustinp
    • Convert rpc module to RpcRunner · 72737a7f
      Iustin Pop authored
      This big patch changes the call model used in inter-node RPC from
      standalone function calls in the rpc module to an RpcRunner class that
      holds all the methods. This can be used in the future to enable
      smarter processing in the RPC layer itself (some quick examples are
      not setting the DiskID from cmdlib code, but only once in each rpc
      call, etc.).
      
      There are a few RPC calls that are made outside of the LU code, and
      these calls are left as staticmethods, so they can be used without a
      class instance (which requires a ConfigWriter instance).
      
      Reviewed-by: imsnah
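
      In rough outline the class-based call model looks like this; the
      method bodies and the transport are schematic, not the actual rpc.py:

        class RpcRunner:
            """Schematic holder for the inter-node RPC calls."""

            def __init__(self, cfg):
                # Keeping the ConfigWriter here lets instance methods do
                # shared preprocessing (e.g. setting disk IDs) once per call.
                self._cfg = cfg

            def call_instance_info(self, node, instance, hvname):
                # Instance method: may consult self._cfg before dispatching.
                return self._dispatch(node, "instance_info", [instance, hvname])

            @staticmethod
            def call_example_outside_lu(node):
                # Static method: usable from code that has no ConfigWriter,
                # i.e. the few calls made outside of the LU code.
                return _static_dispatch(node, "example", [])

            def _dispatch(self, node, procedure, args):
                raise NotImplementedError("transport omitted in this sketch")

        def _static_dispatch(node, procedure, args):
            raise NotImplementedError("transport omitted in this sketch")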
  8. Oct 08, 2008
    • Move the hypervisor attribute to the instances · e69d05fd
      Iustin Pop authored
      This (big) patch moves the hypervisor type from the cluster to the
      instance level; the cluster attribute remains as the default hypervisor,
      and will be renamed accordingly in a later patch. The cluster also gains
      the ‘enable_hypervisors’ attribute, and instances can be created with
      any of the enabled ones (no provision yet for changing that attribute).
      
      The many changes in the rpc/backend layer are due to the fact that
      all backend code used to read the hypervisor from the local copy of
      the config, and now we have to send it (either in the instance object,
      or as a separate parameter) for each function.
      
      By default, the node list will show the free/total node memory for the
      default hypervisor; a new flag should be added to select another
      hypervisor. The instance list has a new field, hypervisor, that shows
      the instance's hypervisor. Cluster verify runs for all enabled
      hypervisor types.
      
      The new FIXMEs are related to IAllocator, since now the node
      total/free/used memory counts are wrong (we can't reliably compute the
      free memory).
      
      Reviewed-by: imsnah
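
      The lookup this implies could be sketched as follows; the attribute
      and function names are illustrative, not the actual objects.py code:

        def instance_hypervisor(cluster, instance):
            # Prefer the per-instance setting, fall back to the cluster-wide
            # default, and refuse anything not in the enabled list.
            hv = getattr(instance, "hypervisor", None) or cluster.default_hypervisor
            if hv not in cluster.enabled_hypervisors:
                raise ValueError("hypervisor %s is not enabled" % hv)
            return hv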
  9. Oct 07, 2008
    • rpc.call_instance_migrate: pass the whole instance · 9f0e6b37
      Iustin Pop authored
      Currently the call_instance_migrate call only passes the instance name;
      we need to pass the whole object for the hypervisor_type changes (all
      the other individual instance rpc calls already pass the instance
      object).
      
      Reviewed-by: imsnah
    • Implement job 'waiting' status · e92376d7
      Iustin Pop authored
      Background: when we have multiple jobs in the queue (more than just a
      few), many of the jobs (up to the number of threads) will be in state
      'running', although many of them may actually be blocked, waiting for
      some locks. This is not good, as one cannot easily see what is
      happening.
      
      The patch adds a new possible opcode/job status, waiting, which shows
      that the LU is in the lock-acquisition phase. The mechanism is simple:
      we initialize (in the job queue) the opcode with OP_STATUS_WAITLOCK,
      and when the processor is ready to give control to the LU's Exec, it
      calls a notifier back into the _JobQueueWorker that sets the opcode
      status to OP_STATUS_RUNNING (with the proper queue locking). Because
      this mechanism does not save the job, all opcodes on disk will be in
      status WAITLOCK and not RUNNING anymore, so we also change the load
      sequence to consider WAITLOCK as RUNNING.
      
      With the patch applied, creating in parallel (via burnin) five instances
      on a five node cluster shows that only two are executing, while three
      are waiting for locks.
      
      Reviewed-by: imsnah
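
      The notifier mechanism reduced to its core; the status constant names
      are taken from the message above, the rest is a sketch:

        OP_STATUS_WAITLOCK = "waiting"
        OP_STATUS_RUNNING = "running"

        class Opcode:
            def __init__(self):
                # Newly queued opcodes start in the lock-waiting state.
                self.status = OP_STATUS_WAITLOCK

        def run_opcode(op, exec_fn):
            # The processor invokes the notifier just before handing control
            # to the LU's Exec; the real code also takes the queue lock here.
            def notify_start():
                op.status = OP_STATUS_RUNNING
            return exec_fn(notify_start)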
  10. Oct 06, 2008
    • Implement job auto-archiving · 07cd723a
      Iustin Pop authored
      This patch adds a new luxi call that implements auto-archiving of jobs
      older than a certain age (or -1 for all completed jobs), and the gnt-job
      command that makes use of this (with 'all' for -1).
      
      Reviewed-by: imsnah
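
      The age-based selection amounts to a filter like this (a sketch; the
      real luxi call and job fields differ):

        import time

        def jobs_to_archive(jobs, age):
            # age is in seconds; -1 means "archive all completed jobs".
            now = time.time()
            done = [j for j in jobs
                    if j["status"] in ("success", "error", "canceled")]
            if age == -1:
                return done
            return [j for j in done if now - j["end_timestamp"] > age]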
    • backend.py change to get cluster name from master · 62c9ec92
      Iustin Pop authored
      Currently there are three functions in backend that need the cluster
      name in order to instantiate an SshRunner. The patch changes these to
      get the cluster name from the master in the RPC call; once the
      multi-hypervisor change is implemented, very few places that need the
      SCR will remain in the backend.
      
      Reviewed-by: killerfoxi, imsnah
  11. Oct 01, 2008
    • Convert ganeti-master · a42872ff
      Michael Hanselmann authored
      Use simpleconfig instead of ssconf.
      
      Reviewed-by: iustinp
    • Convert ganeti-watcher · 2859b87b
      Michael Hanselmann authored
      Use RPC calls instead of ssconf.
      
      Reviewed-by: iustinp
    • Convert ganeti-noded · 8594f271
      Michael Hanselmann authored
      Replace ssconf with utility functions.
      
      Reviewed-by: iustinp
    • Add new query to get cluster config values · ae5849b5
      Michael Hanselmann authored
      This can be used to retrieve certain cluster config values from
      within clients.
      
      OpDumpClusterConfig was not used anywhere, hence I'm just reusing
      it. The way ConfigWriter.DumpConfig returned the configuration
      was not thread-safe, anyway (no deepcopy).
      
      Reviewed-by: iustinp
    • Fix the watcher with down nodes · 37b77b18
      Iustin Pop authored
      The watcher didn't handle down nodes; fix this by ignoring (in the
      secondary node reboot checks) any node that doesn't return a boot id.
      
      Reviewed-by: imsnah
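
      In essence the fix is a filter like the following (sketch only; the
      watcher's real data structures differ):

        def nodes_with_bootid(bootid_by_node):
            # Down nodes return no boot id; skip them in the secondary-node
            # reboot checks instead of tripping over them.
            return dict((name, bootid)
                        for name, bootid in bootid_by_node.items()
                        if bootid is not None)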
    • Fix the watcher not restarting instance bug · b7309a0d
      Iustin Pop authored
      The watcher was using conflicting attributes of the instance:
        - it queried the admin_/oper_state, which are booleans
        - but it compared those to the status (which is a text field)
      
      The code was changed to query the aggregated 'status' field, as that
      also gives an indication of node problems, and we can use this single
      field for all decisions. We still ask for the admin_state field, as
      that is needed for the activate-disks check (in secondary node
      restart).
      
      The patch also touches the watcher in some other parts:
        - log exceptions more nicely
        - convert a method to @staticmethod
        - remove unused imports
      
      Reviewed-by: imsnah
    • Remove last use of utils.RunCmd from the watcher · 5188ab37
      Iustin Pop authored
      The watcher has one last use of ganeti commands as opposed to sending
      requests via luxi. The patch changes this to use the cli functions.
      
      The patch also has two other changes:
        - fix the docstring for OpVerifyDisks (found out while converting
          this)
        - enable stderr logging on the watcher when “-d” is passed
      
      Reviewed-by: imsnah
  12. Sep 09, 2008
  13. Sep 05, 2008
  14. Aug 29, 2008
    • Make WaitForJobChanges deal with long jobs · 5c735209
      Iustin Pop authored
      This patch alters the WaitForJobChanges luxi-RPC call to have a
      configurable timeout, so that the call behaves nicely with long jobs
      that have no update.
      
      We do this by adding a timeout parameter to the RPC call, and
      returning a special constant when the timeout is reached without an
      update. The luxi client will repeatedly call WaitForJobChanges until
      it gets a real change. The timeout is hardcoded as half the RWTO
      value.
      
      The patch also removes an unused variable (new_state) from the
      WaitForJobChanges method.
      
      Reviewed-by: imsnah,ultrotter
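
      The client-side loop implied by this change looks roughly like the
      following; the constant name and the call signature are illustrative:

        WFJC_TIMEOUT_MARKER = "timeout"  # stand-in for the special constant

        def wait_for_real_change(call_wfjc, job_id, prev_info, timeout):
            # Keep re-issuing the call until the server reports an actual
            # change rather than the timeout marker.
            while True:
                result = call_wfjc(job_id, prev_info, timeout)
                if result != WFJC_TIMEOUT_MARKER:
                    return result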
  15. Aug 27, 2008
    • Make sure that client programs get all messages · 6c5a7090
      Michael Hanselmann authored
      This is a large patch, but I can't figure out how to split it without
      breaking stuff. The old way of getting messages by always getting the
      last one didn't bring all messages to the client if they were added
      too fast, thereby making commands like “gnt-cluster verify” less than
      useful. These changes introduce a serial number per log entry to keep
      track of which messages a client has already received. They also
      remove the per-opcode log lock to make reading log entries
      thread-safe.
      
      Reviewed-by: ultrotter
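
      A sketch of the per-entry serial number idea (hypothetical names; the
      real queue stores the log entries per opcode):

        class LogBuffer:
            def __init__(self):
                self._entries = []  # list of (serial, message)
                self._serial = 0

            def add(self, message):
                self._serial += 1
                self._entries.append((self._serial, message))

            def entries_after(self, last_seen_serial):
                # The client remembers the highest serial it has seen and
                # asks only for newer entries, so fast producers can no
                # longer outrun it.
                return [e for e in self._entries if e[0] > last_seen_serial]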
  16. Aug 18, 2008
    • Use Linux-specific way to name master socket · 9894ece7
      Michael Hanselmann authored
      By using this Linux-specific way we don't have to care about removing the
      socket file when quitting or starting (after an unclean shutdown). For a
      more detailed description, see the comment in the patch.
      
      Reviewed-by: schreiberal
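
      The "Linux-specific way" is presumably the abstract socket namespace,
      where names start with a NUL byte and no filesystem entry is created;
      a minimal illustration (the socket name is made up):

        import socket

        # A leading NUL byte puts the name in Linux's abstract namespace:
        # there is no file on disk, so nothing needs to be removed after an
        # unclean shutdown.
        ADDRESS = "\0ganeti-master-example"

        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        server.bind(ADDRESS)
        server.listen(1)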
  17. Aug 11, 2008
  18. Aug 08, 2008