  1. 08 Oct, 2008 1 commit
    • Move the hypervisor attribute to the instances · e69d05fd
      Iustin Pop authored
      This (big) patch moves the hypervisor type from the cluster to the
      instance level; the cluster attribute remains as the default hypervisor,
      and will be renamed accordingly in a later patch. The cluster also gains
      the ‘enable_hypervisors’ attribute, and instances can be created with
      any of the enabled ones (no provision yet for changing that attribute).
      
      The many changes in the rpc/backend layer are due to the fact that
      all backend code read the hypervisor from the local copy of the config,
      and now we have to send it (either in the instance object, or as a
      separate parameter) for each function.
      
      By default, the node list shows the node free/total memory for the
      default hypervisor; a new flag should be added to select another
      hypervisor. The instance list has a new field, hypervisor, that shows
      the instance's hypervisor. Cluster verify runs for all enabled
      hypervisor types.
      
      The new FIXMEs are related to IAllocator, since now the node
      total/free/used memory counts are wrong (we can't reliably compute the
      free memory).
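
      As a rough illustration of the model described above (a cluster-wide
      default plus a list of enabled hypervisors, with each instance carrying
      its own hypervisor), here is a minimal sketch. The class and function
      names and the hypervisor names are hypothetical and are not taken from
      the patch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Cluster:
    # Hypothetical stand-in for the cluster configuration object.
    default_hypervisor: str
    enabled_hypervisors: List[str] = field(default_factory=list)


@dataclass
class Instance:
    # The hypervisor is now stored per instance, not only per cluster.
    name: str
    hypervisor: str


def create_instance(cluster: Cluster, name: str,
                    hypervisor: Optional[str] = None) -> Instance:
    """Pick the instance hypervisor, falling back to the cluster default."""
    chosen = cluster.default_hypervisor if hypervisor is None else hypervisor
    if chosen not in cluster.enabled_hypervisors:
        raise ValueError("hypervisor %r is not enabled on this cluster" % chosen)
    return Instance(name=name, hypervisor=chosen)
```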
      
      Reviewed-by: imsnah
  2. 07 Oct, 2008 2 commits
    • rpc.call_instance_migrate: pass the whole instance · 9f0e6b37
      Iustin Pop authored
      Currently the call_instance_migrate call only passes the instance name;
      we need to pass the whole object for the hypervisor_type changes (all
      the other individual instance rpc calls already pass the instance
      object).
      
      Reviewed-by: imsnah
    • Implement job 'waiting' status · e92376d7
      Iustin Pop authored
      Background: when we have multiple jobs in the queue (more than just a
      few), many of the jobs (up to the number of threads) will be in state
      'running', although many of them could actually be blocked, waiting for
      some locks. This is not good, as one cannot easily see what is
      happening.
      
      The patch extends the opcode/job possible statuses with another one,
      waiting, which shows that the LU is in the acquire locks phase. The
      mechanism for doing so is simple: we initialize (in the job queue) the
      opcode with OP_STATUS_WAITLOCK, and when the processor is ready to give
      control to the LU's Exec, it will call a notifier back into the
      _JobQueueWorker that sets the opcode status to OP_STATUS_RUNNING (with
      the proper queue locking). Because this mechanism does not save the job,
      all opcodes on disk will be in status WAITLOCK and not RUNNING anymore,
      so we also change the load sequence to consider WAITLOCK as RUNNING.
      
      With the patch applied, creating in parallel (via burnin) five instances
      on a five node cluster shows that only two are executing, while three
      are waiting for locks.
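
      The status handling described above can be sketched roughly as follows.
      Only the names OP_STATUS_WAITLOCK and OP_STATUS_RUNNING come from the
      commit message; their string values, the QueueWorker class (a stand-in
      for _JobQueueWorker) and the helper functions are assumptions made for
      illustration.

```python
import threading

OP_STATUS_WAITLOCK = "waiting"   # string value is an assumption
OP_STATUS_RUNNING = "running"    # string value is an assumption


class OpCode:
    def __init__(self, name):
        self.name = name
        self.status = None


class QueueWorker:
    """Hypothetical stand-in for _JobQueueWorker."""

    def __init__(self):
        self._queue_lock = threading.Lock()

    def run_op(self, op, acquire_locks, exec_fn):
        # The opcode starts out as "waiting for locks".
        with self._queue_lock:
            op.status = OP_STATUS_WAITLOCK

        def notify_start():
            # Called right before control is handed to the LU's Exec.
            with self._queue_lock:
                op.status = OP_STATUS_RUNNING

        acquire_locks()   # may block for a long time
        notify_start()
        return exec_fn()


def status_on_load(saved_status):
    """The status flip is not saved, so WAITLOCK on disk is read as RUNNING."""
    if saved_status == OP_STATUS_WAITLOCK:
        return OP_STATUS_RUNNING
    return saved_status
```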
      
      Reviewed-by: imsnah
  3. 06 Oct, 2008 2 commits
    • Implement job auto-archiving · 07cd723a
      Iustin Pop authored
      This patch adds a new luxi call that implements auto-archiving of jobs
      older than a certain age (or -1 for all completed jobs), and the gnt-job
      command that makes use of this (with 'all' for -1).
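
      A minimal sketch of the age-based selection described above, assuming
      each job is represented as a (job_id, finished, end_time) tuple. The
      function names and the job representation are hypothetical; only the
      meaning of -1 and of 'all' comes from the commit message.

```python
import time


def parse_age(arg):
    """Map the command-line argument to an age in seconds ('all' means -1)."""
    return -1 if arg == "all" else int(arg)


def select_jobs_to_archive(jobs, age, now=None):
    """Return the ids of finished jobs older than `age` seconds.

    `jobs` is an iterable of (job_id, finished, end_time) tuples; an age of
    -1 selects every finished job regardless of when it ended.
    """
    if now is None:
        now = time.time()
    selected = []
    for job_id, finished, end_time in jobs:
        if not finished:
            continue  # never archive jobs that are still queued or running
        if age == -1 or now - end_time > age:
            selected.append(job_id)
    return selected
```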
      
      Reviewed-by: imsnah
    • backend.py change to get cluster name from master · 62c9ec92
      Iustin Pop authored
      Currently there are three functions in backend that need the cluster
      name in order to instantiate an SshRunner. The patch changes these to
      get the cluster name from the master in the rpc call; once the
      multi-hypervisor change is implemented, very few places that need the
      SCR will remain in the backend.
      
      Reviewed-by: killerfoxi, imsnah
  4. 01 Oct, 2008 7 commits
    • Convert ganeti-master · a42872ff
      Michael Hanselmann authored
      Use simpleconfig instead of ssconf.
      
      Reviewed-by: iustinp
    • Convert ganeti-watcher · 2859b87b
      Michael Hanselmann authored
      Use RPC calls instead of ssconf.
      
      Reviewed-by: iustinp
    • Convert ganeti-noded · 8594f271
      Michael Hanselmann authored
      Replace ssconf with utility functions.
      
      Reviewed-by: iustinp
    • Add new query to get cluster config values · ae5849b5
      Michael Hanselmann authored
      This can be used to retrieve certain cluster config values from
      within clients.
      
      OpDumpClusterConfig was not used anywhere, hence I'm just reusing
      it. The way ConfigWriter.DumpConfig returned the configuration
      was not thread-safe, anyway (no deepcopy).
      
      Reviewed-by: iustinp
    • Fix the watcher with down nodes · 37b77b18
      Iustin Pop authored
      The watcher didn't handle down nodes; fix this by ignoring (in
      secondary node reboot checks) any node that doesn't return a boot id.
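
      The check described above might look roughly like this, assuming boot
      ids are kept in plain dictionaries keyed by node name and that a down
      node reports an empty or missing boot id. The function name and data
      layout are hypothetical.

```python
def nodes_with_new_boot_id(previous, current):
    """Return nodes whose boot id changed, skipping nodes that report none."""
    changed = []
    for node, boot_id in sorted(current.items()):
        if not boot_id:
            # The node did not return a boot id (most likely it is down):
            # ignore it instead of treating it as rebooted.
            continue
        if node in previous and previous[node] != boot_id:
            changed.append(node)
    return changed
```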
      
      Reviewed-by: imsnah
    • Fix the watcher not restarting instance bug · b7309a0d
      Iustin Pop authored
      The watcher was using conflicting attributes of the instance:
        - it queried the admin_/oper_state, which are booleans
        - but it compared those to the status (which is a text field)
      
      The code was changed to query the aggregated 'status' field, as that
      also indicates node problems, so this single field can be used for all
      decisions. We still ask for the admin_state field, as that is needed
      for the activate disks check (in secondary node restart).
      
      The patch also touches the watcher in some other parts:
        - log exceptions more cleanly
        - convert a method to @staticmethod
        - remove unused imports
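
      To illustrate the bug fixed here: comparing a boolean field with a
      status string can never match, so the restart branch was never taken.
      The sketch below contrasts that with a decision based on the text
      'status' field. The status strings and function names are assumptions
      for illustration, not quoted from the patch.

```python
# Assumed status values; the real set of strings may differ.
RESTARTABLE_STATES = ("ERROR_down",)
NODE_PROBLEM_STATES = ("ERROR_nodedown",)


def buggy_should_restart(oper_state):
    # oper_state is a boolean, so this comparison is always False.
    return oper_state == "ERROR_down"


def should_restart(status):
    """Decide from the aggregated text status only."""
    if status in NODE_PROBLEM_STATES:
        return False  # restarting cannot help while the node is down
    return status in RESTARTABLE_STATES
```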
      
      Reviewed-by: imsnah
    • Remove last use of utils.RunCmd from the watcher · 5188ab37
      Iustin Pop authored
      The watcher has one last use of ganeti commands as opposed to sending
      requests via luxi. The patch changes this to use the cli functions.
      
      The patch also has two other changes:
        - fix the docstring for OpVerifyDisks (found out while converting
          this)
        - enable stderr logging on the watcher when “-d” is passed
      
      Reviewed-by: imsnah
  5. 09 Sep, 2008 4 commits
  6. 05 Sep, 2008 1 commit
  7. 29 Aug, 2008 1 commit
    • Make WaitForJobChanges deal with long jobs · 5c735209
      Iustin Pop authored
      This patch alters the WaitForJobChanges luxi-RPC call to have a
      configurable timeout, so that the call behaves nicely with long jobs
      that have no update.
      
      We do this by adding a timeout parameter in the RPC call, and returning
      a special constant when the timeout is reached without an update. The
      luxi client will repeatedly call the WaitForJobChanges until it gets a
      real change. The timeout is hardcoded as half the RWTO value.
      
      The patch also removes an unused variable (new_state) from the
      WaitForJobChanges method.
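
      The client-side retry loop described above could be sketched as
      follows. The sentinel name and value, the RWTO value and the call_wait
      callable are hypothetical; only the idea (retry until the server
      returns something other than the "no change" constant, with a timeout
      of half the RWTO) comes from the commit message.

```python
JOB_NOTCHANGED = "nochange"   # hypothetical sentinel returned on timeout
RWTO = 60                     # hypothetical read/write timeout, in seconds
WAIT_TIMEOUT = RWTO / 2       # the patch hardcodes half the RWTO value


def wait_for_job_change(call_wait, timeout=WAIT_TIMEOUT):
    """Call the server repeatedly until it reports a real change.

    `call_wait(timeout)` stands in for one WaitForJobChanges RPC; it returns
    either JOB_NOTCHANGED (the timeout expired) or the changed job data.
    """
    while True:
        result = call_wait(timeout)
        if result != JOB_NOTCHANGED:
            return result
```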
      
      Reviewed-by: imsnah,ultrotter
  8. 27 Aug, 2008 1 commit
    • Make sure that client programs get all messages · 6c5a7090
      Michael Hanselmann authored
      This is a large patch, but I can't figure out how to split it without
      breaking stuff. The old way of getting messages by always getting the
      last one didn't bring all messages to the client if they were added
      too fast, thereby making commands like “gnt-cluster verify” less than
      useful. These changes now introduce some sort of serial number per
      log entry to keep track of which messages a client has already
      received. They
      also remove the log lock per opcode to make reading log entries thread
      safe.
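
      A minimal sketch of the serial-number idea, assuming an in-memory log
      where each entry is a (serial, message) pair and the client remembers
      the last serial it has seen. The class and method names are
      hypothetical.

```python
class OpLog:
    """Append-only log whose entries carry an increasing serial number."""

    def __init__(self):
        self._entries = []
        self._serial = 0

    def append(self, message):
        self._serial += 1
        self._entries.append((self._serial, message))

    def newer_than(self, last_serial):
        """Return every entry the client has not received yet."""
        return [entry for entry in self._entries if entry[0] > last_serial]
```

      A client that polls with the highest serial it has received gets all
      entries added since then, even if several were appended between polls.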
      
      Reviewed-by: ultrotter
  9. 18 Aug, 2008 1 commit
    • Use Linux-specific way to name master socket · 9894ece7
      Michael Hanselmann authored
      By using this Linux-specific way we don't have to care about removing the
      socket file when quitting or starting (after an unclean shutdown). For a
      more detailed description, see the comment in the patch.
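
      The commit message does not name the mechanism; the sketch below
      assumes it is the Linux abstract socket namespace, where the address
      starts with a null byte and no file is created on disk. The socket
      name and function name are hypothetical, and the code only works on
      Linux.

```python
import socket


def bind_abstract(name):
    """Bind a listening UNIX socket in the Linux abstract namespace.

    The leading null byte selects the abstract namespace, so there is no
    socket file to remove on shutdown or after an unclean exit.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind("\0" + name)
    sock.listen(1)
    return sock
```

      The name is released as soon as the socket is closed, so a restarted
      daemon can bind it again without any cleanup.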
      
      Reviewed-by: schreiberal
  10. 11 Aug, 2008 1 commit
  11. 08 Aug, 2008 6 commits
  12. 07 Aug, 2008 1 commit
  13. 06 Aug, 2008 6 commits
  14. 31 Jul, 2008 1 commit
  15. 30 Jul, 2008 4 commits
    • Unify SetupDaemon/SetupLogging · 59f187eb
      Iustin Pop authored
      The 'old-style' info, error, debug logs do not make much sense. This
      patch unifies the SetupLogging and SetupDaemon functions. As a result,
      all the commands log to a 'commands.log' file.
      
      The patch also changes the log setup to keep going if there's an error
      in setting up the file logging but we're logging to stderr.
      
      Also, burnin now logs to its own file (burnin.log).
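
      The "keep going if file logging fails but stderr logging is on"
      behaviour can be sketched with the standard logging module as below.
      The function signature is hypothetical and is not the actual
      SetupLogging interface.

```python
import logging
import sys


def setup_logging(logfile, stderr_logging=False, logger=None):
    """Attach a file handler and, optionally, a stderr handler."""
    logger = logger if logger is not None else logging.getLogger()
    logger.setLevel(logging.INFO)
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")

    if stderr_logging:
        stderr_handler = logging.StreamHandler(sys.stderr)
        stderr_handler.setFormatter(formatter)
        logger.addHandler(stderr_handler)

    try:
        file_handler = logging.FileHandler(logfile)
    except OSError:
        # Without stderr there would be no log output at all, so fail hard;
        # with stderr enabled, report the problem there and keep going.
        if not stderr_logging:
            raise
        logger.warning("cannot open log file %s, logging to stderr only",
                       logfile)
    else:
        file_handler.setFormatter(formatter)
        logger.addHandler(file_handler)
    return logger
```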
      
      Reviewed-by: ultrotter
    • Rework master startup/shutdown/failover · b1b6ea87
      Iustin Pop authored
      This (big) patch reworks the master startup/shutdown and fixes the
      master failover.
      
      What does the patch do?
      
      For master start/stop:
        - remove the old ganeti-master script and its associated man page
        - moves the ip start/stop directly into the backend.(Start|Stop)Master
        - adds start/stop of the master/rapi daemon into these functions,
          selectively based on the start/stop arguments
        - makes the master call via rpc StartMaster(start_daemons=False) to
          the local node so that the master IP is started
        - and finally changes the example init.d script to directly start and
          stop all three daemons, since they do the right thing (depending on
          master/not master role)
      
      For master failover:
        - moves the code from LUMasterFailover into bootstrap.MasterFailover,
          since we need to start/stop the master during this operation and
          thus it can't be executed from the master
        - removes the LUMasterFailover and its associated opcode
      
      Note: Ubuntu's /etc/lsb-base-logging.sh is dumb, so the 'not master'
      messages are not seen during startup on non-master nodes.
      
      Reviewed-by: ultrotter
    • Implement checking for the master role in rapi · 5675cd1f
      Iustin Pop authored
      This patch moves the CheckMaster function from ganeti-masterd to ssconf
      (most logical place, it cannot go in utils since we would have recursive
      imports between ssconf and utils) and changes ganeti-rapi to also call
      this function.
      
      This is needed so that starting ganeti-rapi on a non-master node does
      the right thing.
      
      Reviewed-by: ultrotter
    • Add a new parameter to backend.(Start|Stop)Master · 1c65840b
      Iustin Pop authored
      This patch adds a new, unused for now, parameter to the start and stop
      master operations in backend. The idea behind it is that we need to be
      able to control whether the IP (de)activation is coupled with daemon
      startup/shutdown.
      
      The callers are also modified to pass this parameter (even if unused for
      now).
      
      Reviewed-by: ultrotter
  16. 29 Jul, 2008 1 commit