  1. May 11, 2012
    • Workaround changed LVM behaviour · 4c5dd3ff
      Iustin Pop authored
      
      The vgreduce command has changed behaviour from when we initially
      wrote the code (2.02.02 versus 2.02.66, 4 years delta):
      
      - if there are LVs which will be impacted, it requires --force
      - without --force it refuses to proceed, yet still returns exit code 0
      
      We handle this by looking to see if it returns "Wrote out consistent
      volume group" (behaviour unchanged), or if it complains about
      "--force"; in the case it didn't complete, we retry the operation.
      
      We also slightly improve the checking of "vgs", as it used to fail
      silently and we didn't detect it.
      
      New tests for this function should test, I believe, all the expected
      variations; at the least we now have data files with the expected
      output.
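
      The output check described above can be sketched roughly as follows.
      This is a minimal Python 3 illustration, not the code from the patch:
      the function names and the "run" callable are hypothetical, only the two
      marker strings come from the message, and it assumes the retry is the
      one that passes --force.

```python
from typing import Callable

# Marker strings taken from the commit message.
SUCCESS_MARKER = "Wrote out consistent volume group"
FORCE_MARKER = "--force"


def classify_vgreduce_output(output: str) -> str:
    """Classify the tool's output, since exit code 0 alone proves nothing."""
    if SUCCESS_MARKER in output:
        return "done"
    if FORCE_MARKER in output:
        return "needs_force"
    return "unknown"


def reduce_with_retry(run: Callable[[bool], str]) -> bool:
    """Run once without force and retry with force if the tool asks for it.

    `run` is a hypothetical callable: run(force) executes the command and
    returns its output as text.
    """
    state = classify_vgreduce_output(run(False))
    if state == "needs_force":
        state = classify_vgreduce_output(run(True))
    return state == "done"
```

      Passing the command runner in as a callable keeps the decision logic
      testable against recorded output files, without a real volume group.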
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
      (cherry picked from commit 048eeb2b)
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
  2. May 09, 2012
    • Fix exception re-raising in Python Luxi clients · 98dfcaff
      Iustin Pop authored
      
      Commit e687ec01 (present in 2.5 since the 2.5 beta 3) did consistency
      fixes across the code-base. Unfortunately this was done without enough
      checks on the actual meaning of one of the fixes, which means error
      re-raising in lib/errors.py is broken.
      
      The problem is that:
      
        raise cls, args
      
      is different from:
      
        raise cls(args)
      
      And our unit-tests didn't catch this (this patch updates the tests).
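
      The difference can be reproduced in Python 3, where the two-argument
      raise no longer exists. In Python 2, "raise cls, args" with a tuple
      uses the tuple as the constructor's argument list, i.e. it behaves like
      cls(*args). The class name below is a hypothetical stand-in, and the
      argument tuple reuses the message from the example in this commit.

```python
class DemoError(Exception):
    """Hypothetical stand-in for an error class re-raised by a client."""


# Arguments as received by the client: (message, error code).
args = ("Instance 'no-such-instance' not known", "unknown_entity")

# Equivalent of Python 2's "raise cls, args": the tuple is unpacked.
unpacked = DemoError(*args)

# "raise cls(args)": the whole tuple becomes a single argument.
wrapped = DemoError(args)

print(unpacked.args)  # two elements: message and error code
print(wrapped.args)   # one element: the original tuple, nested
```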
      
      This breakage is usually trivial, like wrong error messages:
      
        $ gnt-instance remove no-such-instance
        Failure: prerequisites not met for this operation:
        ("Instance 'no-such-instance' not known", 'unknown_entity')
      
      versus:
      
        $ gnt-instance remove no-such-instance
        Failure: prerequisites not met for this operation:
        error type: unknown_entity, error details:
        Instance 'no-such-instance' not known
      
      or:
      
        $ gnt-instance add … no-such-instance
        Failure: prerequisites not met for this operation:
        ('The given name (no-such-instance) does not resolve: Name or service not known', 'resolver_error')
      
      versus:
      
        $ gnt-instance add … no-such-instance
        Failure: prerequisites not met for this operation:
        error type: resolver_error, error details:
        The given name (no-such-instance) does not resolve: Name or service not known
      
      But in some cases where we rely on a certain data representation
      (e.g. HooksAbort), this actually breaks because we try to iterate over
      the wrong type:
      
        File "/usr/lib/python2.6/dist-packages/ganeti/cli.py", line 1907, in FormatError
           for node, script, out in err.args[0]:
        ValueError: need more than 1 value to unpack
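
      A small Python 3 sketch of why the loop in the traceback fails. The
      class, the formatting helper and the sample row are hypothetical; only
      the loop shape "for node, script, out in err.args[0]" is taken from the
      traceback.

```python
class DemoHooksAbort(Exception):
    """Hypothetical error whose first argument is a list of result rows."""


def format_hook_failures(err):
    """Same loop shape as in the traceback: err.args[0] must hold rows."""
    return ["%s: %s: %s" % (node, script, out)
            for node, script, out in err.args[0]]


rows = [("node1", "pre-hook", "failed")]  # made-up sample row
wire_args = (rows,)                       # argument tuple to re-raise with

good = DemoHooksAbort(*wire_args)  # args[0] is the list of rows
bad = DemoHooksAbort(wire_args)    # args[0] is a 1-tuple wrapping the list

print(format_hook_failures(good))
try:
    format_hook_failures(bad)
except ValueError as exc:
    # Iterating the 1-tuple yields the whole list, which cannot be
    # unpacked into (node, script, out).
    print("unpack failed:", exc)
```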
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
  3. Jan 06, 2012
  4. Dec 21, 2011
    • jqueue: Fix deadlock between job queue and dependency manager · 37d76f1e
      Michael Hanselmann authored
      
      When an opcode is about to be processed its dependencies are
      evaluated using “_JobDependencyManager.CheckAndRegister”. Due
      to its nature that function requires a lock on the manager's
      internal structures. All of this happens while the job queue
      lock is held in shared mode (required for the job processor).
      
      When a job has been processed, any jobs waiting on it are re-added
      to the job workerpool. Before this patch that required
      the manager's lock and then, for adding the jobs, the job queue
      lock. Since this is in reverse order it will lead to deadlocks.
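
      One common way to avoid such a lock-order inversion is to collect the
      waiting jobs under the manager's lock, release it, and only then take
      the queue lock. The Python 3 sketch below illustrates that idea under
      simplifying assumptions: all names are hypothetical, a plain
      threading.Lock stands in for the shared-mode queue lock, and it is not
      necessarily how the patch itself is written.

```python
import threading


class DemoQueue:
    """Hypothetical job queue plus dependency table with a fixed lock order."""

    def __init__(self):
        self.queue_lock = threading.Lock()    # stands in for the job queue lock
        self.manager_lock = threading.Lock()  # guards the waiters table
        self.waiters = {}                     # job id -> jobs waiting on it
        self.pool = []                        # stands in for the workerpool

    def check_and_register(self, job, dep_id):
        """Called with queue_lock held: order is queue lock, then manager."""
        with self.manager_lock:
            self.waiters.setdefault(dep_id, []).append(job)

    def notify_waiters(self, finished_id):
        """Collect under the manager lock, release it, then take the queue lock."""
        with self.manager_lock:
            ready = self.waiters.pop(finished_id, [])
        # manager_lock is released here, so the two locks are never
        # acquired in manager-then-queue order.
        with self.queue_lock:
            self.pool.extend(ready)
        return ready
```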
      
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
  5. Nov 30, 2011
  6. Nov 24, 2011
    • LUGroupAssignNodes: Fix node membership corruption · 54c31fd3
      Michael Hanselmann authored
      
      Note: This bug only manifests itself in Ganeti 2.5, but since the
      problematic code also exists in 2.4, I decided to fix it there.
      
      If a node was assigned to a new group using “gnt-group assign-nodes” the
      node object's group would be changed, but not the duplicate member list
      in the group object. The latter is an optimization to require fewer
      locks for other operations. The per-group member list is only kept in
      memory and not written to disk.
      
      Ganeti 2.5 starts to make use of the data kept in the per-group member
      list and consequently fails when it is out of date. The following
      commands can be used to reproduce the issue in 2.5 (in 2.4 the issue was
      confirmed using additional logging):
      
        $ gnt-group add foo
        $ gnt-group assign-nodes foo $(gnt-node list --no-header -o name)
        $ gnt-cluster verify  # Fails with KeyError
      
      This patch moves the code modifying node and group objects into
      “config.ConfigWriter” to do the complete operation under the config
      lock, and also to avoid making use of side-effects of modifying objects
      without calling “ConfigWriter.Update”. A unittest is included.
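
      The idea of updating both views of the membership in one locked
      operation can be sketched as below. This is a Python 3 illustration
      with hypothetical names (DemoConfig, assign_nodes); the real
      “config.ConfigWriter” has a different and much larger interface.

```python
import threading


class DemoConfig:
    """Hypothetical miniature of a config store with a duplicated index."""

    def __init__(self, node_groups):
        self._lock = threading.Lock()
        self.node_group = dict(node_groups)  # node name -> group name
        self.group_members = {}              # group name -> set of node names
        for node, group in self.node_group.items():
            self.group_members.setdefault(group, set()).add(node)

    def assign_nodes(self, group, nodes):
        """Move nodes to a group, updating both views under one lock."""
        with self._lock:
            members = self.group_members.setdefault(group, set())
            for node in nodes:
                old = self.node_group[node]
                self.group_members[old].discard(node)
                self.node_group[node] = group
                members.add(node)
```

      Because the per-node field and the per-group member set change inside
      the same critical section, a reader holding the lock never observes
      one without the other.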
      
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
      (cherry picked from commit 218f4c3d)
    • LUGroupAssignNodes: Fix node membership corruption · 218f4c3d
      Michael Hanselmann authored
      
      Note: This bug only manifests itself in Ganeti 2.5, but since the
      problematic code also exists in 2.4, I decided to fix it there.
      
      If a node was assigned to a new group using “gnt-group assign-nodes” the
      node object's group would be changed, but not the duplicate member list
      in the group object. The latter is an optimization to require fewer
      locks for other operations. The per-group member list is only kept in
      memory and not written to disk.
      
      Ganeti 2.5 starts to make use of the data kept in the per-group member
      list and consequently fails when it is out of date. The following
      commands can be used to reproduce the issue in 2.5 (in 2.4 the issue was
      confirmed using additional logging):
      
        $ gnt-group add foo
        $ gnt-group assign-nodes foo $(gnt-node list --no-header -o name)
        $ gnt-cluster verify  # Fails with KeyError
      
      This patch moves the code modifying node and group objects into
      “config.ConfigWriter” to do the complete operation under the config
      lock, and also to avoid making use of side-effects of modifying objects
      without calling “ConfigWriter.Update”. A unittest is included.
      
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
  7. Nov 08, 2011
  8. Oct 18, 2011
  9. Oct 17, 2011
  10. Oct 04, 2011
  11. Sep 30, 2011
  12. Aug 26, 2011
  13. Aug 23, 2011
  14. Aug 19, 2011
  15. Aug 12, 2011
  16. Aug 11, 2011
  17. Aug 09, 2011
  18. Aug 08, 2011
    • Detect globbing patterns as query arguments · f8638e28
      Michael Hanselmann authored
      
      Short: this patch enables the use of “gnt-instance list '*.site'”.
      
      Detailed description: This patch changes the command line interface code
      to try to deduce the kind of filter from the arguments to a “list”
      command. If it's a list of plain names an old-style name filter is used.
      If filtering is forced or the single argument is potentially a filter,
      it is parsed as a query filter string. Any name looking like a globbing
      pattern (e.g. “*.site” or “web?.example.com”) is treated as such.
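
      A rough Python 3 sketch of the glob detection idea. The helper names
      and the set of characters treated as glob markers are assumptions made
      for illustration; the actual command line code applies more detailed
      rules, including the query filter parsing, which is not shown here.

```python
import fnmatch

# Assumption: these characters mark a name as a globbing pattern.
GLOB_CHARS = frozenset("*?[")


def looks_like_glob(name):
    """Return True if the name contains a glob metacharacter."""
    return any(ch in GLOB_CHARS for ch in name)


def select_names(patterns, known_names):
    """Plain names match exactly; glob-like names match via fnmatch."""
    selected = []
    for name in known_names:
        for pat in patterns:
            if looks_like_glob(pat):
                hit = fnmatch.fnmatchcase(name, pat)
            else:
                hit = (name == pat)
            if hit:
                selected.append(name)
                break
    return selected
```

      With this sketch, select_names(["*.site"], names) keeps every name
      ending in ".site", while a plain name only matches itself.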
      
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
  19. Aug 05, 2011
  20. Aug 04, 2011
  21. Aug 02, 2011
  22. Jul 26, 2011
  23. Jul 25, 2011
  24. Jul 21, 2011
  25. Jul 20, 2011
    • jqueue: Add “writable” flag to memory objects · c0f6d0d8
      Michael Hanselmann authored
      
      Basically only one instance of the job, the one being processed,
      should be serialized to disk and replicated to other nodes. With
      this flag assertions can be added in various places.
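
      As an illustration of the kind of assertion such a flag allows, here is
      a minimal Python 3 sketch. DemoJob, write_job and the dict used as
      storage are hypothetical and only show the pattern, not the jqueue
      code.

```python
class DemoJob:
    """Hypothetical in-memory job object carrying a writable flag."""

    def __init__(self, job_id, writable):
        self.id = job_id
        self.writable = writable

    def serialize(self):
        """Return the data that would be written to disk."""
        return {"id": self.id}


def write_job(job, storage):
    """Persist a job; only the writable instance may reach the disk."""
    assert job.writable, "refusing to write a read-only job copy"
    storage[job.id] = job.serialize()
```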
      
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
    • Implement chained jobs · b95479a5
      Michael Hanselmann authored
      
      An overview is available in the design document for this change,
      doc/design-chained-jobs.rst.
      
      When a job enters the job processor, the current opcode's dependencies
      are evaluated. If a referenced job has not yet reached the desired
      status, the current job is registered as a dependant. The job processor
      will continue to work on other pending tasks. When a job finishes it
      notifies any pending dependants by re-adding them to the workerpool.
      
      A per-job processor lock is necessary for rare cases where the same job
      can be re-added twice.
      
      There is no way to view waiting jobs at the moment, but I plan to
      export this information to “gnt-debug locks”.
      
      A so-called dependency manager takes care of managing waiting jobs and
      keeping track of their status.
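
      The register-and-notify flow described above can be sketched in
      Python 3 as follows. The class, its two callbacks and the status
      strings are hypothetical; locking and the handling of failed or
      cancelled dependencies are left out, so this is only an outline of the
      mechanism, not the dependency manager from the patch.

```python
class DemoDependencyManager:
    """Hypothetical tracker of jobs waiting on other jobs."""

    def __init__(self, get_status, readd):
        self._get_status = get_status  # job id -> current status string
        self._readd = readd            # callback re-adding a job to the pool
        self._waiters = {}             # dependency id -> set of waiting ids

    def check_and_register(self, job_id, dep_id, wanted):
        """Return True if the dependency is met, else register and wait."""
        if self._get_status(dep_id) in wanted:
            return True
        self._waiters.setdefault(dep_id, set()).add(job_id)
        return False

    def notify_waiters(self, finished_id):
        """Re-add every job that was waiting on the finished job."""
        for job_id in sorted(self._waiters.pop(finished_id, ())):
            self._readd(job_id)
```

      A re-added job simply runs check_and_register again when it next
      enters the processor, so the manager does not need to decide on the
      job's behalf whether the dependency is now satisfied.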
      
      Unittests are included.
      
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
  26. Jul 15, 2011
  27. Jul 12, 2011