  1. Jul 19, 2009
    • job queue: fix interrupted job processing · f6424741
      Iustin Pop authored
      
      If a job with more than one opcode is being processed, and the master
      daemon crashes between two opcodes, we have the first N opcodes marked
      successful, and the rest marked as queued. This means that the overall
      job status is queued, and thus on master daemon restart it will be
      resent for completion.
      
      However, the RunTask() function in jqueue.py doesn't deal with
      partially-completed jobs. This patch makes it simply skip the opcodes
      that have already completed successfully.
      
      An alternative option would be to not mark partially-completed jobs as
      QUEUED but instead RUNNING, which would result in aborting of the job at
      restart time.
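
      The skip logic described above can be sketched as follows. This is a
      minimal illustration, not the actual jqueue.py code: the status
      constants, the Op class and run_pending_opcodes() are hypothetical
      names chosen for the example.

```python
# Hypothetical status constants; the real values in Ganeti may differ.
OP_STATUS_QUEUED = "queued"
OP_STATUS_SUCCESS = "success"


class Op:
    """Illustrative stand-in for a queued opcode with a status field."""

    def __init__(self, name, status=OP_STATUS_QUEUED):
        self.name = name
        self.status = status


def run_pending_opcodes(ops, execute):
    """Run opcodes in order, skipping those already marked successful.

    Returns the names of the opcodes that were actually executed.
    """
    executed = []
    for op in ops:
        if op.status == OP_STATUS_SUCCESS:
            # Finished before the master daemon crashed; do not re-run it.
            continue
        execute(op)
        op.status = OP_STATUS_SUCCESS
        executed.append(op.name)
    return executed


# A job whose first opcode completed before the crash.
job_ops = [Op("op1", OP_STATUS_SUCCESS), Op("op2"), Op("op3")]
ran = run_pending_opcodes(job_ops, execute=lambda op: None)
```

      Here only op2 and op3 are executed after the simulated restart, while
      op1 keeps its earlier result.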
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
    • Fix an error path in job queue worker's RunTask · ed21712b
      Iustin Pop authored
      
      In case the job fails, we try to set the job's run_op_idx to -1.
      However, this is the wrong attribute name, which wasn't detected until
      the __slots__ addition. The correct name is run_op_index.
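
      As an illustration of why __slots__ exposes this kind of typo, here is
      a small sketch. The class name QueuedJobSketch is hypothetical; only
      the two attribute names come from the commit message.

```python
class QueuedJobSketch:
    """Illustrative class; only the attribute names come from the commit."""

    __slots__ = ["run_op_index"]

    def __init__(self):
        self.run_op_index = 0


job = QueuedJobSketch()

try:
    # The misspelled name is not in __slots__, so the assignment fails
    # instead of silently creating a new attribute.
    job.run_op_idx = -1
    typo_detected = False
except AttributeError:
    typo_detected = True

# The correct attribute can be assigned as usual.
job.run_op_index = -1
```

      Without __slots__, the misspelled assignment would succeed and leave
      run_op_index unchanged, which is how such a bug can go unnoticed.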
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
  2. Jul 17, 2009
  3. Jul 16, 2009
  4. Jul 14, 2009
  5. Jul 13, 2009
  6. Jul 08, 2009
  7. Jul 07, 2009
  8. Jul 01, 2009
  9. Jun 30, 2009
    • Cleanup config data when draining nodes · dec0d9da
      Iustin Pop authored
      
      Currently, when draining nodes we reset their master candidate flag, but
      we don't instruct them to demote themselves. This leads to “ERROR: file
      '/var/lib/ganeti/config.data' should not exist on non master candidates
      (and the file is outdated)”.
      
      This patch simply adds a call to node_demote_from_mc in this case.
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
    • Fix node readd issues · a8ae3eb5
      Iustin Pop authored
      
      This patch fixes a few node readd issues.
      
      Currently, the node readd consists of two opcodes:
        - OpSetNodeParams, which resets the offline/drained flags
        - OpAddNode (with readd=True), which reconfigures the node
      
      The problem is that between these two, the configuration is inconsistent
      for certain cluster configurations. Thus, this patch removes the first
      opcode and modifies LUAddNode to deal with this case too.
      
      The patch also modifies the computation of the intended master_candidate
      status, and actually sets the readded node to master candidate if
      needed. Previously, we didn't modify the existing node at all.
      
      Finally, the patch modifies the bottom of the Exec() function for this
      LU to:
        - trigger a node update, which in turn redistributes the ssconf files
          to all nodes (and thus the new node too)
        - if the new node is not a master candidate, then call the
          node_demote_from_mc RPC so that old master files are cleared
      
      My testing shows this behaves correctly for various cases.
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
    • backend.DemoteFromMC: don't fail for missing files · 9a5cb537
      Iustin Pop authored
      
      If the config file is missing when the DemoteFromMC() function is
      called, it will raise a ProgrammerError. Instead of changing the
      utils.CreateBackup() function, which is called from multiple places, for
      now we only change DemoteFromMC() to skip the call if the file does not
      exist (we rely on the master to prevent race conditions here).
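
      The guard described above might look roughly like the sketch below.
      It is not the real backend code: backup_and_remove_if_present() is a
      hypothetical helper, shutil.copy2 stands in for utils.CreateBackup(),
      and removing the file after the backup is an assumption made for the
      example.

```python
import os
import shutil
import tempfile


def backup_and_remove_if_present(path):
    """Back up and remove 'path', doing nothing if it does not exist.

    Hypothetical helper: shutil.copy2 stands in for utils.CreateBackup(),
    whose real behaviour is not reproduced here.
    Returns the backup path, or None when there was nothing to do.
    """
    if not os.path.isfile(path):
        return None
    backup_path = path + ".backup"
    shutil.copy2(path, backup_path)
    os.remove(path)
    return backup_path


workdir = tempfile.mkdtemp()
config_path = os.path.join(workdir, "config.data")

# Missing file: no error, nothing created.
missing_result = backup_and_remove_if_present(config_path)

# Existing file: backed up, then removed.
with open(config_path, "w") as handle:
    handle.write("example")
present_result = backup_and_remove_if_present(config_path)
```

      The existence check and the backup are not atomic, which is why the
      commit message notes that the master is relied upon to prevent races.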
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Olivier Tharan <olive@google.com>
    • Allow GetMasterCandidateStats to ignore some nodes · 23f06b2b
      Iustin Pop authored
      
      This patch modifies ConfigWriter.GetMasterCandidateStats to allow it to
      ignore some nodes in the calculation, so that we can use it to predict
      cluster state without some nodes (which we know we will modify, and thus
      we should not rely on their state).
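
      A simplified sketch of the idea follows. The Node record, the
      function name master_candidate_stats() and its 'exceptions' parameter
      are assumptions made for illustration, and the two-value result is a
      simplification rather than the real return value of
      ConfigWriter.GetMasterCandidateStats.

```python
from collections import namedtuple

# Hypothetical node record; the real configuration objects have more fields.
Node = namedtuple("Node", ["name", "master_candidate", "offline", "drained"])


def master_candidate_stats(nodes, exceptions=None):
    """Return (current, possible) master candidate counts.

    Nodes whose names appear in 'exceptions' are left out of both counts.
    """
    ignored = set(exceptions or [])
    current = possible = 0
    for node in nodes:
        if node.name in ignored:
            continue
        if not (node.offline or node.drained):
            possible += 1
        if node.master_candidate:
            current += 1
    return current, possible


nodes = [
    Node("node1", True, False, False),
    Node("node2", True, False, False),
    Node("node3", False, False, True),
]
all_stats = master_candidate_stats(nodes)
without_node2 = master_candidate_stats(nodes, exceptions=["node2"])
```

      Passing the name of a node that is about to be modified lets the
      caller see what the counts would be without relying on that node's
      current flags.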
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Olivier Tharan <olive@google.com>
    • Fix error message for extra files on non MC nodes · e631cb25
      Iustin Pop authored
      
      Currently the message for extraneous files on non master candidates is
      confusing, to say the least. This patch rewords it to be clearer.
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Olivier Tharan <olive@google.com>
  10. Jun 29, 2009
  11. Jun 23, 2009
  12. Jun 17, 2009
    • Fix handling of 'vcpus' in instance list · c1ce76bb
      Iustin Pop authored
      
      Currently running “gnt-instance list -o+vcpus” fails with a cryptic message:
        Unhandled Ganeti error: vcpus
      
      This is due to multiple issues:
        - in some corner cases cmdlib.py raises an errors.ParameterError but
          this is not handled by cli.py
        - LUQueryInstances declares ‘vcpu’ as a supported field, but doesn't handle
          it, so instead of failing with unknown parameter, e.g.:
            Failure: prerequisites not met for this operation:
            Unknown output fields selected: vcpuscd
          it raises the ParameterError message
      
      This patch:
        - adds handling of 'vcpus' to LUQueryInstances
        - adds handling of the ParameterError exception to cli.py
        - changes the 'else: raise errors.ParameterError' in the field handling of
          LUQueryInstances to an assert, since it's a programmer error if we reach
          this step
      
      With this, a future unhandled parameter will show:
        gnt-instance list -o+vcpus
        Unhandled protocol error while talking to the master daemon:
        Caught exception: Declared but unhandled parameter 'vcpus'
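
      The distinction between an unknown field (user error) and a declared
      but unhandled field (programmer error) can be sketched as below. The
      field set, the field_value() function and the use of ValueError are
      hypothetical; the real code lives in LUQueryInstances and uses
      Ganeti's own exception classes.

```python
# Hypothetical set of declared fields and a toy handler.
DECLARED_FIELDS = frozenset(["name", "vcpus", "memory"])


def field_value(instance, field):
    """Return the value of a declared output field for one instance."""
    if field not in DECLARED_FIELDS:
        # User error: the field was never declared as supported.
        raise ValueError("Unknown output fields selected: %s" % field)
    if field == "name":
        return instance["name"]
    if field == "vcpus":
        return instance["vcpus"]
    # Reaching this point means a field was declared but not handled,
    # which is a programming error rather than a user error.
    assert False, "Declared but unhandled parameter '%s'" % field


instance = {"name": "inst1", "vcpus": 2}
vcpus = field_value(instance, "vcpus")

try:
    field_value(instance, "memory")
    unhandled_message = None
except AssertionError as err:
    unhandled_message = str(err)
```

      In this sketch 'memory' plays the role of the declared but unhandled
      field. Note that assert statements are skipped when Python runs with
      the -O option, so this check is a development aid rather than input
      validation.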
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>