1. 24 Oct, 2011 3 commits
  2. 13 Oct, 2011 2 commits
  3. 12 Oct, 2011 1 commit
    • rpc: Disable HTTP client pool and reduce memory consumption · 05927995
      Michael Hanselmann authored
      
      We noticed that “ganeti-masterd” can use large amounts of memory,
      especially on large clusters. Measurements showed a single PycURL client
      using about 500 kB of heap memory (the actual usage depends on versions,
      build options and settings).
      
      The RPC client uses a per-thread HTTP client pool with one client per
      node. At this time there are 41 non-main threads (25 for the job queue
      and 16 for client requests), so the pools hold 41 clients per node,
      which adds up to a lot of memory (ca. 200 MB for 10 nodes, ca. 1 GB
      for 50 nodes).
      
      This patch disables the per-thread HTTP client pool. The now-unused code
      is not cleaned up here; that will be done in the master branch only.

      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
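      The memory figures above follow from multiplying threads, nodes and per-client heap: 41 threads × N nodes × ~500 kB. A quick Python check of that arithmetic, assuming the commit's ~500 kB per client (which varies with versions and build options) and 1 MB = 1000 kB:

```python
# Estimate of the memory held by per-thread PycURL client pools.
# Assumptions: ~500 kB heap per client (the commit's measurement; real usage
# varies) and 1 MB = 1000 kB.
CLIENT_HEAP_KB = 500
JOB_QUEUE_THREADS = 25
CLIENT_REQUEST_THREADS = 16
THREADS = JOB_QUEUE_THREADS + CLIENT_REQUEST_THREADS  # 41 non-main threads


def pool_memory_mb(nodes, threads=THREADS, client_kb=CLIENT_HEAP_KB):
    """Return estimated pool memory in MB: one client per node per thread."""
    clients = threads * nodes
    return clients * client_kb / 1000.0


if __name__ == "__main__":
    for nodes in (10, 50):
        print("%2d nodes: %4d clients, ~%.0f MB"
              % (nodes, THREADS * nodes, pool_memory_mb(nodes)))
```

      This gives 410 clients (~205 MB) for 10 nodes and 2050 clients (~1025 MB) for 50 nodes, consistent with the ca. 200 MB and ca. 1 GB quoted above.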
  4. 05 Oct, 2011 1 commit
  5. 30 Sep, 2011 2 commits
  6. 29 Sep, 2011 1 commit
    • Make migration RPC non-blocking · 6a1434d7
      Andrea Spadaccini authored
      
      To add status reporting for the KVM migration, the instance_migrate RPC
      must be non-blocking. Moreover, there must be a way to represent the
      migration status and a way to fetch it.
      
      * constants.py:
        - add constants representing the migration statuses
      
      * objects.py:
        - add the MigrationStatus object
      
      * hypervisor/hv_base.py:
        - change the FinalizeMigration method name to FinalizeMigrationDst
        - add the FinalizeMigrationSource method
        - add the GetMigrationStatus method
      
      * hypervisor/hv_kvm.py:
        - change the implementation of MigrateInstance to be non-blocking
          (i.e. do not poll the status of the migration)
        - implement the new methods defined in BaseHypervisor
      
      * backend.py, server/noded.py, rpc.py:
        - add methods to call the new hypervisor methods
        - fix documentation of the existing methods to reflect the changes
      
      * cmdlib.py:
        - adapt the logic of TLMigrateInstance._ExecMigration to reflect
          the changes

      Signed-off-by: Andrea Spadaccini <spadaccio@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
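      A minimal sketch of the start/poll/finalize flow that a non-blocking migration RPC allows, using a toy in-memory hypervisor. The method names MigrateInstance, GetMigrationStatus, FinalizeMigrationSource and FinalizeMigrationDst come from the commit message, but their signatures, the status strings, the MigrationStatus fields, the finalization order and the migrate() driver are assumptions for illustration, not Ganeti's actual code.

```python
import time
from dataclasses import dataclass

# Hypothetical status values; the real constants in constants.py may differ.
MIGRATION_ACTIVE = "active"
MIGRATION_COMPLETED = "completed"
MIGRATION_FAILED = "failed"


@dataclass
class MigrationStatus:
    """Simplified stand-in for objects.MigrationStatus (fields assumed)."""
    status: str
    transferred_ram: int = 0
    total_ram: int = 0


class FakeHypervisor:
    """Toy hypervisor: the migration ends after a fixed number of polls."""

    def __init__(self, steps=3, fail=False, total_ram=1024):
        self.steps = steps
        self.fail = fail
        self.total_ram = total_ram
        self.polls = 0
        self.finalized = []

    def MigrateInstance(self, instance, target):
        # Non-blocking: only start the migration, do not wait for it.
        self.polls = 0

    def GetMigrationStatus(self, instance):
        self.polls += 1
        if self.polls < self.steps:
            done = self.total_ram * self.polls // self.steps
            return MigrationStatus(MIGRATION_ACTIVE, done, self.total_ram)
        final = MIGRATION_FAILED if self.fail else MIGRATION_COMPLETED
        return MigrationStatus(final, self.total_ram, self.total_ram)

    def FinalizeMigrationSource(self, instance, success):
        self.finalized.append(("source", success))

    def FinalizeMigrationDst(self, instance, success):
        self.finalized.append(("dst", success))


def migrate(hv_src, hv_dst, instance, target, poll_interval=1.0, report=print):
    """Start a migration, poll its status until it ends, finalize both sides."""
    hv_src.MigrateInstance(instance, target)
    while True:
        st = hv_src.GetMigrationStatus(instance)
        if st.status != MIGRATION_ACTIVE:
            break
        # Status reporting is possible because the start call returned early.
        report("migrating %s: %d/%d" % (instance, st.transferred_ram,
                                        st.total_ram))
        time.sleep(poll_interval)
    success = st.status == MIGRATION_COMPLETED
    hv_dst.FinalizeMigrationDst(instance, success)
    hv_src.FinalizeMigrationSource(instance, success)
    return success
```

      The design point is that the caller, not the hypervisor, owns the wait loop, so it can report progress or give up between polls.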
  7. 28 Sep, 2011 2 commits
  8. 27 Sep, 2011 3 commits
  9. 30 Aug, 2011 1 commit
  10. 03 Aug, 2011 1 commit
  11. 08 Jul, 2011 1 commit
  12. 24 May, 2011 1 commit
  13. 10 May, 2011 1 commit
  14. 08 Mar, 2011 1 commit
  15. 04 Mar, 2011 1 commit
  16. 17 Feb, 2011 1 commit
  17. 28 Jan, 2011 1 commit
    • Re-create instance disk symlinks on activate · c417e115
      Iustin Pop authored
      
      This patch re-creates the instance disk symlinks when the
      activate-disks operation is run. Until now, these symlinks could only
      be re-created by stopping and starting or migrating the instance,
      because the RPC call that creates them was only made during instance
      startup and migration.
      
      To do this, the blockdev_assemble RPC call also needs the disk index,
      which is added to the protocol. This is a change from 2.3 and makes
      instance startup incompatible (FYI).

      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
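      Why the assemble call needs the disk index can be sketched as follows: if each disk's symlink is named after the instance and the disk's position, the link name cannot be computed from the device path alone. The "<instance>:<index>" naming, the links directory argument and both helpers are hypothetical illustrations, not Ganeti's backend code.

```python
import os

# Hypothetical naming scheme: "<instance><separator><disk index>".
DISK_SEPARATOR = ":"


def disk_symlink_path(links_dir, instance_name, idx):
    """Return the symlink path for disk number idx of an instance."""
    return os.path.join(links_dir,
                        "%s%s%d" % (instance_name, DISK_SEPARATOR, idx))


def link_block_device(links_dir, instance_name, device_path, idx):
    """(Re-)create the symlink for one disk, replacing a stale link.

    The index is part of the link name, so it must be passed along with
    the device when the disk is assembled (e.g. during activate-disks).
    """
    link = disk_symlink_path(links_dir, instance_name, idx)
    if os.path.islink(link):
        if os.readlink(link) == device_path:
            return link  # already points at the right device
        os.remove(link)  # stale link, e.g. device minor changed
    os.symlink(device_path, link)
    return link
```

      Making the helper idempotent is what allows it to be called on every activation, not only at instance startup.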
  18. 11 Jan, 2011 1 commit
  19. 06 Jan, 2011 1 commit
    • RPC: mark jobqueue functions as URGENT · d2cd6944
      Iustin Pop authored
      
      Recently, we've seen more and more cases of a specific breakage
      pattern in Ganeti: master candidates which are semi-alive (i.e. they
      respond to ping and can complete a TCP/SSL handshake, but their root
      filesystem is broken) cause lots of confusion within masterd.
      
      My analysis shows that waiting up to 5 minutes for a reply from such a
      broken master candidate is too long: the long wait breaks other
      timeouts (e.g. the Luxi timeout) and makes standard recovery from this
      situation very hard. It's much easier to kill the master daemon,
      manually edit the config file to mark the node as regular, and then
      restart the master daemon.
      
      The proposal is therefore to reduce the timeout for the job queue
      functions to TMO_URGENT (1 minute), which should strike a better
      balance between a working but overloaded node and a broken one.

      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
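      The effect of a shorter per-procedure timeout can be sketched with a parallel call that stops waiting for nodes that have not replied in time. The 1-minute and 5-minute values come from the commit message; the procedure names in RPC_TIMEOUTS, the TMO_PREVIOUS name, the helper functions and the use of concurrent.futures are assumptions, not Ganeti's RPC layer.

```python
import concurrent.futures as cf
import time

# Timeouts in seconds. TMO_URGENT matches the commit message (1 minute);
# TMO_PREVIOUS is the 5-minute wait it mentions (name is hypothetical).
TMO_URGENT = 60
TMO_PREVIOUS = 5 * 60

# Hypothetical per-procedure timeout table: job queue calls use TMO_URGENT.
RPC_TIMEOUTS = {
    "jobqueue_update": TMO_URGENT,
    "jobqueue_rename": TMO_URGENT,
    "jobqueue_purge": TMO_URGENT,
}


def timeout_for(procedure, default=TMO_PREVIOUS):
    """Return the timeout for an RPC procedure, falling back to default."""
    return RPC_TIMEOUTS.get(procedure, default)


def call_nodes(nodes, fn, timeout):
    """Run fn(node) for all nodes in parallel, waiting at most timeout s.

    Nodes that have not answered in time (e.g. a semi-alive master
    candidate) map to None instead of blocking the caller.
    """
    pool = cf.ThreadPoolExecutor(max_workers=max(1, len(nodes)))
    futures = {pool.submit(fn, node): node for node in nodes}
    done, not_done = cf.wait(futures, timeout=timeout)
    results = {futures[f]: f.result() for f in done}
    results.update({futures[f]: None for f in not_done})
    # Do not wait for hung calls; their threads finish in the background.
    pool.shutdown(wait=False)
    return results


if __name__ == "__main__":
    def fake_rpc(node):
        if node == "broken-mc":
            time.sleep(2)  # simulate a node that does not answer in time
        return "ok"

    print(call_nodes(["node1", "node2", "broken-mc"], fake_rpc, timeout=0.5))
```

      With the urgent timeout, a single broken candidate delays the caller by at most one minute instead of five, which keeps it well away from higher-level timeouts such as Luxi's.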
  20. 17 Dec, 2010 1 commit
  21. 09 Dec, 2010 1 commit
  22. 01 Dec, 2010 1 commit
  23. 03 Nov, 2010 1 commit
  24. 28 Oct, 2010 1 commit
  25. 26 Oct, 2010 2 commits
  26. 21 Oct, 2010 1 commit
  27. 11 Oct, 2010 1 commit
    • RPC: disable curl's Expect header · 8e29563f
      Iustin Pop authored
      
      This patch fixes the very slow (~8-9 seconds) gnt-instance modify
      behaviour. More generally, it fixes slow RPC calls, but the problem
      was most visible in that LU.
      
      It seems that curl's handling of file uploads (via PUT) and of the
      'Expect' header interacts badly with our HTTP server.
      
      First, our HTTP server doesn't properly handle this header. According
      to RFC 2616:
      
        Requirements for HTTP/1.1 origin servers: Upon receiving a request
        which includes an Expect request-header field with the "100-continue"
        expectation, an origin server MUST either respond with 100 (Continue)
        status and continue to read from the input stream, or respond with a
        final status code.
      
      Our server doesn't do this, and hence it triggers this behaviour in curl
      (from the curl FAQ):
      
        4.16 My HTTP POST or PUT requests are slow!
      
        libcurl makes all POST and PUT requests (except for POST requests with a
        very tiny request body) use the "Expect: 100-continue" header. This header
        allows the server to deny the operation early so that libcurl can bail out
        already before having to send any data. This is useful in authentication
        cases and others.
      
        However, many servers don't implement the Expect: stuff properly and if the
        server doesn't respond (positively) within 1 second libcurl will continue
        and send off the data anyway.
      
        You can disable libcurl's use of the Expect: header the same way you disable
        any header, using -H / CURLOPT_HTTPHEADER, or by forcing it to use HTTP 1.0.
      
      This behaviour was found by watching captured traffic (in non-SSL
      mode): after the initial HTTP headers (ending with the Expect one),
      there was a ~1-2 second pause before curl sent the body.
      Properly RTFM-ing would have saved ~1 day of digging around, but hey…

      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: Michael Hanselmann <hansmi@google.com>
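      On the client side, the remedy the quoted FAQ describes is to pass an empty "Expect:" header through CURLOPT_HTTPHEADER (pycurl.HTTPHEADER in PycURL), which makes libcurl omit "Expect: 100-continue". A minimal PycURL sketch under that assumption; the helper names, the JSON content type and the PUT-via-POSTFIELDS setup are illustrative, not Ganeti's HTTP client code. PycURL is imported optionally so the header logic can be checked without it.

```python
try:
    import pycurl
except ImportError:  # PycURL is optional here; rpc_headers() works without it
    pycurl = None


def rpc_headers(extra=()):
    """Return request headers with libcurl's automatic Expect header disabled.

    A header given with nothing after the colon tells libcurl to drop that
    header instead of sending it, so "Expect:" suppresses the default
    "Expect: 100-continue" on uploads.
    """
    headers = [h for h in extra if not h.lower().startswith("expect:")]
    headers.append("Expect:")
    return headers


def make_put_request(url, body, content_type="application/json"):
    """Return a configured (not yet performed) pycurl.Curl for a PUT upload."""
    if pycurl is None:
        raise RuntimeError("PycURL is not installed")
    curl = pycurl.Curl()
    curl.setopt(pycurl.URL, url)
    curl.setopt(pycurl.CUSTOMREQUEST, "PUT")
    curl.setopt(pycurl.POSTFIELDS, body)
    curl.setopt(pycurl.HTTPHEADER,
                rpc_headers(["Content-Type: " + content_type]))
    return curl
```

      Calling perform() on the returned handle sends the request; without the Expect header, libcurl sends the body right after the headers instead of waiting up to a second for a 100 (Continue) reply.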
  28. 23 Aug, 2010 2 commits
  29. 19 Aug, 2010 2 commits
  30. 18 Aug, 2010 1 commit