1. 11 May, 2012 3 commits
  2. 09 May, 2012 4 commits
    • Add a default PATH variable to OS scripts env · 9a6ade06
      Iustin Pop authored
      In commit 896a03f6 I cleaned up the environment for OS scripts,
      however I think that was a bit too extreme - it breaks our own
      instance-debootstrap hooks, because for example dpkg (called from the
      grub script) requires PATH to be set.
      
      Instead of requiring every OS to define a path, let's set a default
      PATH for the OS scripts, which should cover most common uses. A more
      specialised PATH can be set, if needed, in the OS scripts.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
      9a6ade06
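The idea of a default PATH for OS scripts can be sketched as follows. This is a minimal illustration, not the code from the commit: the name `DEFAULT_OS_PATH`, its directory list, the helper `build_os_env` and the sample variable are all assumptions made for the example.

```python
import subprocess
import sys

# Assumed default; the commit message does not list the actual directories.
DEFAULT_OS_PATH = "/sbin:/bin:/usr/sbin:/usr/bin"


def build_os_env(extra=None):
    """Return a cleaned-up environment for an OS script with PATH always set.

    Hypothetical helper: only PATH plus the explicitly passed variables are
    exported, so nothing leaks in from the calling process.
    """
    env = {"PATH": DEFAULT_OS_PATH}
    env.update(extra or {})
    return env


# Illustrative variable; any OS-specific settings would be passed the same way.
env = build_os_env({"INSTANCE_NAME": "instance1.example.com"})

# Run a child process with the cleaned environment and read back its PATH.
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['PATH'])"],
    env=env, capture_output=True, text=True, check=True,
)
child_path = result.stdout.strip()
```

A script that needs a more specialised PATH can still override the variable itself, since the default is only the starting value of its environment.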
    • Move hooks PATH environment variable to constants · aa7b59ac
      Andrea Spadaccini authored
      
      
      Move the contents of the PATH environment variable for hooks to
      constants, and use its value in the code and in the hooks documentation.
      Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
      (cherry picked from commit fe5ca2bb)
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
      aa7b59ac
    • Add note to the install doc about bridge MAC issues · 12f9d75e
      Iustin Pop authored
      
      
      Thanks to Faidon Liambotis for explaining this on the external IRC
      channel.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Faidon Liambotis <paravoid@gmail.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
      12f9d75e
    • Fix exception re-raising in Python Luxi clients · 98dfcaff
      Iustin Pop authored
      Commit e687ec01 (present in 2.5 since the 2.5 beta 3) did consistency
      fixes across the code-base. Unfortunately this was done without enough
      checks on the actual meaning of one of the fixes, which means error
      re-raising in lib/errors.py is broken.
      
      The problem is that:
      
        raise cls, args
      
      is different than:
      
        raise cls(args)
      
      And our unit-tests didn't catch this (this patch updates the tests).
      
      This breakage is usually trivial, like wrong error messages:
      
        $ gnt-instance remove no-such-instance
        Failure: prerequisites not met for this operation:
        ("Instance 'no-such-instance' not known", 'unknown_entity')
      
      versus:
      
        $ gnt-instance remove no-such-instance
        Failure: prerequisites not met for this operation:
        error type: unknown_entity, error details:
        Instance 'no-such-instance' not known
      
      or:
      
        $ gnt-instance add … no-such-instance
        Failure: prerequisites not met for this operation:
        ('The given name (no-such-instance) does not resolve: Name or service not known', 'resolver_error')
      
      versus:
      
        $ gnt-instance add … no-such-instance
        Failure: prerequisites not met for this operation:
        error type: resolver_error, error details:
        The given name (no-such-instance) does not resolve: Name or service not known
      
      But in some cases where we rely on a certain data representation
      (e.g. HooksAbort), this actually breaks because we try to iterate over
      the wrong type:
      
        File "/usr/lib/python2.6/dist-packages/ganeti/cli.py", line 1907, in FormatError
           for node, script, out in err.args[0]:
        ValueError: need more than 1 value to unpack
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
      98dfcaff
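The difference between the two forms can be reproduced in a few lines. In Python 2, `raise cls, args` with a tuple `args` constructs the exception as `cls(*args)`, while `raise cls(args)` passes the whole tuple as a single argument. The sketch below uses Python 3 syntax and stand-in classes; the two formatting helpers are hypothetical and only modeled on the messages and traceback quoted above, they are not the code in lib/errors.py or cli.py.

```python
class OpPrereqError(Exception):
    """Stand-in for an error carrying (message, error_code)."""


class HooksAbort(Exception):
    """Stand-in for an error carrying a list of (node, script, output)."""


args = ("Instance 'no-such-instance' not known", "unknown_entity")

unpacked = OpPrereqError(*args)  # what Python 2's `raise cls, args` builds
wrapped = OpPrereqError(args)    # what `raise cls(args)` builds


def format_prereq_error(err):
    """Hypothetical formatter: expects exactly (message, error_code)."""
    if len(err.args) == 2:
        return "error type: %s, error details:\n%s" % (err.args[1], err.args[0])
    return str(err)


def format_hooks_abort(err):
    """Hypothetical formatter: iterates over the triples in err.args[0]."""
    return ["%s: %s: %s" % (node, script, out)
            for node, script, out in err.args[0]]


print(format_prereq_error(unpacked))  # structured message
print(format_prereq_error(wrapped))   # falls back to the raw tuple repr

hook_args = ([("node1", "10-check", "disk full")],)
print(format_hooks_abort(HooksAbort(*hook_args)))  # works

try:
    format_hooks_abort(HooksAbort(hook_args))       # iterates the wrong type
except ValueError as exc:
    print("ValueError:", exc)
```

With the wrapped form, `err.args[0]` is the original tuple rather than its first element, which explains both the tuple-shaped error text and the failed unpacking in the HooksAbort case.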
  3. 07 May, 2012 1 commit
  4. 11 Apr, 2012 6 commits
  5. 30 Mar, 2012 1 commit
  6. 29 Mar, 2012 1 commit
    • Fix a bug concerning TCP port release · 3b3b1bca
      Dimitris Aragiorgis authored
      Commit f396ad8c returns the TCP port used by a DRBD disk to the
      TCP/UDP port pool using AddTcpUdpPort().
      
      However, AddTcpUdpPort() writes the config on every invocation,
      using _WriteConfig(). This causes two problems:
      
       * it causes critical errors logged by VerifyConfig(), after the DRBD
         disk removal, and until the actual instance removal.
       * if the code following AddTcpUdpPort() fails, the port has already
         been returned to the pool, so the port ends up duplicated
         (inconsistent config).
      
      AddTcpUdpPort() is invoked in three cases:
      
       * during InstanceRemove() through _RemoveDisks().
       * during InstanceSetParams() in case of disk removal.
       * during InstanceSetParams() through _ConvertDrbdToPlain().
      
      This commit fixes the problem by removing the _WriteConfig() call from
      AddTcpUdpPort(), delegating it to Update() via the
      TemporaryReservationManager, and ensuring AddTcpUdpPort() precedes
      Update().
      Signed-off-by: Dimitris Aragiorgis <dimara@grnet.gr>
      [iustin@google.com: small comments adjustments]
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
      3b3b1bca
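The ordering described above (return the port in memory first, persist once in Update()) can be sketched with a toy configuration object. Everything here is an assumption for illustration: `ConfigSketch`, its fields and method names only mirror the roles of AddTcpUdpPort(), Update() and _WriteConfig(), and the TemporaryReservationManager is left out entirely.

```python
import copy


class ConfigSketch:
    """Toy model: in-memory config plus a snapshot standing in for the file."""

    def __init__(self):
        self.data = {"disks": {"disk0": 11000}, "port_pool": set()}
        self.on_disk = copy.deepcopy(self.data)
        self.writes = 0

    def _write_config(self):
        # Persist the in-memory state (here: just take a snapshot).
        self.on_disk = copy.deepcopy(self.data)
        self.writes += 1

    def add_tcp_udp_port(self, port):
        # Only touches memory; no write happens here any more.
        self.data["port_pool"].add(port)

    def update(self):
        # The single place where the config is written.
        self._write_config()

    def remove_disk(self, name):
        port = self.data["disks"].pop(name)
        self.add_tcp_udp_port(port)  # must precede update()
        self.update()


cfg = ConfigSketch()
cfg.remove_disk("disk0")
print(cfg.writes, cfg.on_disk)
```

Because the port is returned and the disk removed before the one write, the persisted snapshot never shows a port that is both in the pool and still attached to a disk.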
  7. 28 Mar, 2012 1 commit
  8. 23 Mar, 2012 2 commits
  9. 22 Mar, 2012 3 commits
  10. 21 Mar, 2012 1 commit
  11. 20 Mar, 2012 1 commit
    • Stop acquiring BGL for LUXI queries · 0fa753ba
      Michael Hanselmann authored
      
      
      Short description: This fixes an issue whereby masterd would become
      unresponsive on the LUXI socket, leading to client timeouts. While made
      worse in 2.5, the underlying issue was already present in 2.4.
      
      Longer description: Until now all LUXI queries would acquire the BGL
      (big Ganeti lock) in shared mode. With the exception of OpNodeAdd and
      OpNodeRemove, this was also the case for all opcodes before version 2.5.
      In 2.5 we split OpClusterVerify into multiple opcodes, one of which
      (OpClusterVerifyConfig) now acquires the BGL in exclusive mode. Whether
      or not doing so is good is a separate discussion: OpNodeAdd and
      OpNodeRemove, as of this writing, still require an exclusive BGL.
      OpClusterVerifyConfig is run more often than OpNodeAdd or OpNodeRemove
      in normal clusters, which is why we only recognized this issue in 2.5.
      
      What would happen is that once OpClusterVerifyConfig tried to acquire
      its exclusive BGL while it was actually held by other opcodes (e.g.
      OpInstanceReplaceDisks), the locking code would not grant shared
      acquires for the BGL, even when the exclusive acquire is removed from
      the queue for a short amount of time after a timeout. This is necessary
      to prevent lock starvation.
      
      In this situation further LUXI queries requiring the BGL in shared mode,
      e.g. OpClusterQuery, would block and the client would eventually time
      out. Over time they would fill the client request workerpool's queue,
      and at that point even requests not requiring the BGL would stop
      working. Once the long-running operation(s) holding the BGL in shared
      mode finished, OpClusterVerifyConfig would get it in exclusive mode and
      everything would return to normal. LUXI would recover very soon too.
      
      I'd like to thank Bernardo Dal Seno for his contribution to this bugfix.
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>
      0fa753ba
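The blocking behaviour described in this commit can be illustrated with a toy, single-threaded model of a shared/exclusive lock. This is only a sketch under simplifying assumptions: `LockModel` and its methods are hypothetical, there are no threads or timeouts, and it is not Ganeti's locking code. It shows one rule only, namely that new shared acquires are refused while an exclusive request is waiting.

```python
class LockModel:
    """Toy shared/exclusive lock that favours a waiting exclusive request."""

    def __init__(self):
        self.shared_holders = 0
        self.exclusive_held = False
        self.exclusive_waiting = False

    def try_shared(self):
        # Refuse new shared acquires while an exclusive request is held or
        # waiting, so that the exclusive request cannot be starved.
        if self.exclusive_held or self.exclusive_waiting:
            return False
        self.shared_holders += 1
        return True

    def release_shared(self):
        self.shared_holders -= 1

    def try_exclusive(self):
        if self.exclusive_held or self.shared_holders > 0:
            self.exclusive_waiting = True
            return False
        self.exclusive_waiting = False
        self.exclusive_held = True
        return True

    def release_exclusive(self):
        self.exclusive_held = False


bgl = LockModel()
steps = [
    bgl.try_shared(),     # long-running opcode holds the lock in shared mode
    bgl.try_exclusive(),  # exclusive request has to wait
    bgl.try_shared(),     # a query needing shared mode is now blocked
]
bgl.release_shared()      # the long-running opcode finishes
steps.append(bgl.try_exclusive())  # exclusive request proceeds
bgl.release_exclusive()
steps.append(bgl.try_shared())     # shared acquires work again
print(steps)
```

In this model the third step is the one that hurts clients; a query that never asks for the lock, as after this commit, would not be affected by the waiting exclusive request.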
  12. 19 Mar, 2012 1 commit
  13. 20 Feb, 2012 1 commit
    • Fix Makefile.am compatibility with automake 1.11.2 · b8fe7ca6
      Iustin Pop authored
      
      
      Automake 1.11.2 made the following change:
      
      * Long-standing bugs:
        - Automake now warns about more primary/directory invalid combinations,
          such as "doc_LIBRARIES" or "pkglib_PROGRAMS".
      
      Unfortunately, this breaks our Makefile.am (issue 216) exactly because
      we were relying on pkglib_SCRIPTS.
      
      This patch works around this by adding a new myexeclibdir variable
      (exec so that it is installed at `install-exec` time, the same as the
      pkglibdir), and switches to that.
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
      b8fe7ca6
  14. 31 Jan, 2012 1 commit
  15. 26 Jan, 2012 3 commits
  16. 25 Jan, 2012 1 commit
    • Fix cluster verification issues on multi-group clusters · 2c2f257d
      Michael Hanselmann authored
      
      
      This patch attempts to fix a number of issues with “gnt-cluster verify”
      in the presence of multiple node groups and DRBD8 instances split over
      nodes in more than one group.
      
      - Look up instances in a group only by their primary node (otherwise
        split instances would be considered when verifying any of their
        nodes' groups)
      - When gathering additional nodes for LV checks, just compare the
        groups of the instance's nodes with the currently verified group
        instead of comparing against the primary node's group
      - Exclude nodes in other groups when calculating N+1 errors and checking
        logical volumes
      
      Not directly related, but a small error text is also clarified.
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
      2c2f257d
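The first two points of this change can be sketched with plain dictionaries. The data layout, the function names and the sample cluster below are assumptions made for illustration and do not reflect the actual cluster verification code.

```python
def instances_in_group(instances, node_group, group):
    """Select instances by the group of their primary node only."""
    return sorted(name for name, inst in instances.items()
                  if node_group[inst["primary"]] == group)


def extra_lv_nodes(instances, node_group, group):
    """Nodes outside `group` used by instances whose primary is in `group`.

    Each node's group is compared with the group being verified, not with
    the group of the instance's primary node.
    """
    extra = set()
    for name in instances_in_group(instances, node_group, group):
        inst = instances[name]
        for node in [inst["primary"]] + inst["secondaries"]:
            if node_group[node] != group:
                extra.add(node)
    return extra


# Hypothetical cluster: inst1 and inst2 are split across two groups.
node_group = {"node1": "g1", "node2": "g2", "node3": "g1"}
instances = {
    "inst1": {"primary": "node1", "secondaries": ["node2"]},
    "inst2": {"primary": "node2", "secondaries": ["node1"]},
    "inst3": {"primary": "node3", "secondaries": ["node1"]},
}

print(instances_in_group(instances, node_group, "g1"))
print(extra_lv_nodes(instances, node_group, "g1"))
```

In this sample, inst2 is verified only with group g2, where its primary node lives, even though its secondary node is in g1.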
  17. 20 Jan, 2012 1 commit
  18. 09 Jan, 2012 2 commits
  19. 06 Jan, 2012 1 commit
  20. 21 Dec, 2011 2 commits
  21. 30 Nov, 2011 2 commits
  22. 24 Nov, 2011 1 commit