  1. Jul 24, 2012
  2. Jun 11, 2012
  3. Jun 06, 2012
    • Fix parallel build failures · a13d6911
      Iustin Pop authored
      
      This is the 2.5 version of the "fix build failures":
      
      - man/%.gen could be left over even in case of failure, due to an
        automake bug
      - the man/%.gen rule runs RUN_IN_TEMPDIR, so let's depend on it, since
        that target has the proper dependencies (it creates the needed dirs)
      - man/%.gen depends on a number of built sources, but the dependency
        was not declared
      
      Furthermore, wraps a long comment.
      
      Tested with -j4/-j16, after `make maintainer-clean'.
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
  4. Jun 05, 2012
  5. May 24, 2012
  6. May 11, 2012
  7. May 09, 2012
    • Add a default PATH variable to OS scripts env · 9a6ade06
      Iustin Pop authored
      
      In commit 896a03f6 I cleaned up the environment for OS scripts,
      however I think that was a bit too extreme - it breaks our own
      instance-debootstrap hooks, because for example dpkg (called from the
      grub script) requires PATH to be set.
      
      Instead of requiring every OS to define a path, let's set a default
      PATH for the OS scripts, which should cover most common uses. A more
      specialised PATH can be set, if needed, in the OS scripts.
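
A minimal sketch of the idea, not the actual Ganeti code: the environment
handed to an OS script is built from scratch and always carries a default
PATH. The names DEFAULT_OS_PATH, build_os_env and run_os_script are
hypothetical, and the PATH value shown is an assumption.

```python
import subprocess

# Assumed default value, for illustration only.
DEFAULT_OS_PATH = "/sbin:/bin:/usr/sbin:/usr/bin"


def build_os_env(os_vars):
    """Return a clean environment for an OS script.

    Nothing is inherited from the calling process; only the given
    variables plus a default PATH are present.
    """
    env = dict(os_vars)
    env["PATH"] = DEFAULT_OS_PATH
    return env


def run_os_script(argv, os_vars):
    """Run a script with the cleaned environment and return its stdout."""
    result = subprocess.run(argv, env=build_os_env(os_vars),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

A script that needs something more specialised can still reassign PATH
itself once it is running.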
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
    • Move hooks PATH environment variable to constants · aa7b59ac
      Andrea Spadaccini authored
      
      Move the contents of the PATH environment variable for hooks to
      constants, and use its value in the code and in the hooks documentation.
      
      Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
      (cherry picked from commit fe5ca2bb)
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
    • Add note to the install doc about bridge MAC issues · 12f9d75e
      Iustin Pop authored
      
      Thanks to Faidon Liambotis for explaining this on the external IRC
      channel.
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Faidon Liambotis <paravoid@gmail.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
    • Fix exception re-raising in Python Luxi clients · 98dfcaff
      Iustin Pop authored
      
      Commit e687ec01 (present in 2.5 since the 2.5 beta 3) did consistency
      fixes across the code-base. Unfortunately this was done without enough
      checks on the actual meaning of one of the fixes, which means error
      re-raising in lib/errors.py is broken.
      
      The problem is that:
      
        raise cls, args
      
      is different from:
      
        raise cls(args)
      
      And our unit-tests didn't catch this (this patch updates the tests).
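
The difference can be reproduced outside Ganeti. In Python 2, `raise cls,
args` with a tuple as second value uses the tuple as the constructor's
argument list, which corresponds to cls(*args); cls(args) instead passes the
whole tuple as one argument. The sketch below uses a stand-in exception
class (not the real one from lib/errors.py) and runs on Python 3, where the
two-value raise form no longer exists:

```python
class OpPrereqError(Exception):
    """Stand-in for an error class carrying (message, error_code)."""


args = ("Instance 'no-such-instance' not known", "unknown_entity")

# Equivalent of Python 2's `raise cls, args`: the tuple is unpacked.
unpacked = OpPrereqError(*args)

# `raise cls(args)`: the tuple becomes the single constructor argument.
wrapped = OpPrereqError(args)

print(len(unpacked.args))  # two separate arguments
print(len(wrapped.args))   # one argument, itself a tuple
```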
      
      This breakage is usually trivial, like wrong error messages:
      
        $ gnt-instance remove no-such-instance
        Failure: prerequisites not met for this operation:
        ("Instance 'no-such-instance' not known", 'unknown_entity')
      
      versus:
      
        $ gnt-instance remove no-such-instance
        Failure: prerequisites not met for this operation:
        error type: unknown_entity, error details:
        Instance 'no-such-instance' not known
      
      or:
      
        $ gnt-instance add … no-such-instance
        Failure: prerequisites not met for this operation:
        ('The given name (no-such-instance) does not resolve: Name or service not known', 'resolver_error')
      
      versus:
      
        $ gnt-instance add … no-such-instance
        Failure: prerequisites not met for this operation:
        error type: resolver_error, error details:
        The given name (no-such-instance) does not resolve: Name or service not known
      
      But in some cases where we rely on a certain data representation
      (e.g. HooksAbort), this actually breaks because we try to iterate over
      the wrong type:
      
        File "/usr/lib/python2.6/dist-packages/ganeti/cli.py", line 1907, in FormatError
           for node, script, out in err.args[0]:
        ValueError: need more than 1 value to unpack
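
The failure above can be sketched with a stand-in class. It assumes, as the
traceback suggests, that args[0] of a HooksAbort is a list of (node, script,
output) triples; the node and script names are made up, and
format_hooks_error is a hypothetical helper mirroring the loop in the
traceback:

```python
class HooksAbort(Exception):
    """Stand-in: args[0] should be a list of (node, script, output)."""


results = [("node1", "10-check", "failed")]
args = (results,)

good = HooksAbort(*args)  # args[0] is the list of triples
bad = HooksAbort(args)    # args[0] is a one-element tuple holding the list


def format_hooks_error(err):
    """Format each hook result, as the CLI formatting code expects to."""
    return ["%s: %s: %s" % (node, script, out)
            for node, script, out in err.args[0]]


print(format_hooks_error(good))
try:
    format_hooks_error(bad)
except ValueError as exc:
    # Iterating yields the inner list, which has one item instead of three.
    print("ValueError:", exc)
```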
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
  8. May 07, 2012
  9. Apr 11, 2012
  10. Mar 30, 2012
  11. Mar 29, 2012
    • Fix a bug concerning TCP port release · 3b3b1bca
      Dimitris Aragiorgis authored
      
      Commit f396ad8c returns the TCP port used by a DRBD disk to the
      TCP/UDP port pool using AddTcpUdpPort().
      
      However, AddTcpUdpPort() writes the config on every invocation,
      using _WriteConfig(). This causes two problems:
      
       * VerifyConfig() logs critical errors between the DRBD disk
         removal and the actual instance removal.
       * if the code following AddTcpUdpPort() fails, the port has already
         been returned to the pool, which leaves duplicate ports
         (inconsistent config).
      
      AddTcpUdpPort() is invoked in three cases:
      
       * during InstanceRemove() through _RemoveDisks().
       * during InstanceSetParams() in case of disk removal.
       * during InstanceSetParams() through _ConvertDrbdToPlain().
      
      This commit fixes the problem by removing the _WriteConfig() call from
      AddTcpUdpPort(), delegating it to Update() via the
      TemporaryReservationManager, and ensuring AddTcpUdpPort() precedes
      Update().
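
The ordering can be pictured with a toy model. This is a sketch under
assumptions, not the real configuration code: ToyConfig and its attributes
are hypothetical, and the pending set only stands in for the role of the
TemporaryReservationManager.

```python
class ToyConfig:
    """Toy model: releasing a port is queued and committed by update()."""

    def __init__(self):
        self.free_ports = set()
        self.writes = 0             # number of simulated config writes
        self._pending_ports = set()

    def add_tcp_udp_port(self, port):
        """Queue a released port without writing the config."""
        self._pending_ports.add(port)

    def update(self):
        """Commit queued ports and write the config once."""
        self.free_ports |= self._pending_ports
        self._pending_ports.clear()
        self.writes += 1


cfg = ToyConfig()
cfg.add_tcp_udp_port(11000)
# If the operation failed here, the pool would be unchanged on disk.
cfg.update()
```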
      
      Signed-off-by: Dimitris Aragiorgis <dimara@grnet.gr>
      [iustin@google.com: small comments adjustments]
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
  12. Mar 28, 2012
  13. Mar 23, 2012
  14. Mar 22, 2012
  15. Mar 21, 2012
  16. Mar 20, 2012
    • Stop acquiring BGL for LUXI queries · 0fa753ba
      Michael Hanselmann authored
      
      Short description: This fixes an issue whereby masterd would become
      unresponsive on the LUXI socket, leading to client timeouts. While made
      worse in 2.5, the underlying issue was already present in 2.4.
      
      Longer description: Until now all LUXI queries would acquire the BGL
      (big Ganeti lock) in shared mode. With the exception of OpNodeAdd and
      OpNodeRemove, this was also the case for all opcodes before version 2.5.
      In 2.5 we split OpClusterVerify into multiple opcodes, one of which
      (OpClusterVerifyConfig) now acquires the BGL in exclusive mode. Whether
      or not doing so is good is a separate discussion: OpNodeAdd and
      OpNodeRemove, as of this writing, still require an exclusive BGL.
      OpClusterVerifyConfig is run more often than OpNodeAdd or OpNodeRemove
      in normal clusters, which is why we only recognized this issue in 2.5.
      
      What would happen is that once OpClusterVerifyConfig tried to acquire
      its exclusive BGL while it was actually held by other opcodes (e.g.
      OpInstanceReplaceDisks), the locking code would not grant shared
      acquires for the BGL, even when the exclusive acquire is removed from
      the queue for a short amount of time after a timeout. This is necessary
      to prevent lock starvation.
      
      In this situation, further LUXI queries requiring the BGL in shared
      mode, e.g. OpClusterQuery, would block and the client would eventually
      time out. Over time they would fill the client request workerpool's
      queue, and at that point even requests not requiring the BGL would stop
      working. Once the long-running operation(s) holding the BGL in shared
      mode finished, OpClusterVerifyConfig would get it in exclusive mode and
      everything would return to normal. LUXI would recover very soon too.
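
The blocking behaviour described above can be illustrated with a
single-threaded toy model. It is only a sketch: ToySharedLock is a
hypothetical class, not Ganeti's locking code, and it simplifies the rule to
"a waiting exclusive acquire blocks new shared acquires".

```python
class ToySharedLock:
    """Toy shared/exclusive lock with anti-starvation for exclusive mode."""

    def __init__(self):
        self.shared_holders = 0
        self.exclusive_held = False
        self.exclusive_waiting = False

    def try_acquire_shared(self):
        # New shared acquires are refused while an exclusive one waits.
        if self.exclusive_held or self.exclusive_waiting:
            return False
        self.shared_holders += 1
        return True

    def release_shared(self):
        self.shared_holders -= 1

    def try_acquire_exclusive(self):
        if self.exclusive_held or self.shared_holders:
            self.exclusive_waiting = True
            return False
        self.exclusive_waiting = False
        self.exclusive_held = True
        return True

    def release_exclusive(self):
        self.exclusive_held = False


bgl = ToySharedLock()
long_job = bgl.try_acquire_shared()      # e.g. a long-running opcode
verify = bgl.try_acquire_exclusive()     # must wait for the shared holder
query = bgl.try_acquire_shared()         # blocked behind the waiter
bgl.release_shared()                     # long-running job finishes
verify_retry = bgl.try_acquire_exclusive()
bgl.release_exclusive()
query_retry = bgl.try_acquire_shared()   # queries work again
```

In this model a query that never takes the lock is unaffected by the waiting
exclusive acquire, which is the effect the commit aims for.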
      
      I'd like to thank Bernardo Dal Seno for his contribution to this bugfix.
      
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Bernardo Dal Seno <bdalseno@google.com>