  1. Jan 06, 2011
  2. Dec 16, 2010
      ensure-dirs: Speed up when using big queues · 196d70fa
      Michael Hanselmann authored
      
      The “ensure-dirs” script as included in Ganeti 2.3 is very slow when
      working with big queues requiring a change of permissions on many or all
      files.
      
      $ find /var/lib/ganeti/queue/ | wc -l
      52354
      
      Before this change:
      $ time /usr/local/lib/ganeti/ensure-dirs -f
      real    16m4.739s
      
      While not addressed in this patch, I'd like to record the overall
      inefficiency of the “ensure-dirs” script, even after this change:
      
      $ time /usr/local/lib/ganeti/ensure-dirs -f
      real    5m57.362s
      […]
      $ strace -e clone,execve -f -c /usr/local/lib/ganeti/ensure-dirs -f
      % time     seconds  usecs/call     calls    errors syscall
      ------ ----------- ----------- --------- --------- ----------------
       50.08    5.147090          49    104774           clone
       49.92    5.131094          49    104739           execve
      
      More changes will be needed. Just for comparison, a small Python
      snippet changing permissions on all files (“ensure-dirs” changes the
      owner too):
      
      $ time python -c 'import os; from ganeti import utils;
      [os.chmod(i, 0644) for i in
      utils.ListVisibleFiles("/var/lib/ganeti/queue/archive/big")]'
      real    0m0.605s
      […]
      
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
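The strace summary in the message above shows roughly two process spawns (clone plus execve) per queue entry, which is where most of the time goes. A minimal sketch of the in-process alternative that the Python one-liner hints at, assuming a POSIX system: `ensure_tree_permissions` and its parameters are hypothetical names for illustration, not part of “ensure-dirs” or `ganeti.utils`.

```python
import os
import stat


def ensure_tree_permissions(root, file_mode=0o644, dir_mode=0o755, uid=-1, gid=-1):
    """Hypothetical helper: fix modes and ownership below root in-process.

    A uid or gid of -1 leaves that ID unchanged, as with os.chown itself.
    Returns the number of chmod/chown calls that were actually needed.
    """
    changed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        # Every directory shows up exactly once as dirpath.
        entries = [(dirpath, dir_mode)]
        entries += [(os.path.join(dirpath, name), file_mode) for name in filenames]
        for path, mode in entries:
            st = os.lstat(path)
            if stat.S_ISLNK(st.st_mode):
                continue  # never touch symlink targets
            if stat.S_IMODE(st.st_mode) != mode:
                os.chmod(path, mode)
                changed += 1
            wrong_owner = (uid != -1 and st.st_uid != uid) or (gid != -1 and st.st_gid != gid)
            if wrong_owner:
                os.chown(path, uid, gid)
                changed += 1
    return changed
```

Checking the current mode and owner first means a second run over an already-correct tree issues no chmod or chown calls at all, and no child process is spawned at any point.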
  3. Nov 29, 2010
  4. Nov 19, 2010
  5. Nov 16, 2010
  6. Oct 29, 2010
  7. Oct 28, 2010
  8. Oct 26, 2010
  9. Oct 14, 2010
  10. Oct 13, 2010
  11. Oct 07, 2010
  12. Sep 24, 2010
  13. Sep 13, 2010
  14. Sep 10, 2010
  15. Sep 07, 2010
  16. Sep 06, 2010
  17. Sep 02, 2010
  18. Aug 24, 2010
      Add simple lock monitor · 19b9ba9a
      Michael Hanselmann authored
      
      This patch adds an initial implementation of a lock monitor, accessible
      to the user through “gnt-debug locks”. It currently shows all resource
      locks: BGL, nodes and instances. Config and job queue locks could be
      shown too, but wouldn't be of much help. The current owner(s) and mode
      are also shown.
      
      Showing pending acquires will require further changes on the SharedLock
      internals and is not yet implemented.
      
      Example output:
      $ gnt-debug locks -o name,mode,owner
      Name            Mode      Owner
      BGL/BGL         shared    JobQueue19/Job147
      instances/inst1 exclusive JobQueue19/Job147
      instances/inst2 -         -
      instances/inst3 -         -
      instances/inst4 -         -
      nodes/node1     exclusive JobQueue19/Job147
      nodes/node2     exclusive JobQueue19/Job147
      
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Guido Trotter <ultrotter@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
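The column layout of the example output in the message above can be reproduced with a few lines of Python. This is only a sketch of the presentation step, assuming each lock is described by a plain dictionary; `format_lock_table` and the dictionary keys are hypothetical and say nothing about how the lock monitor itself collects its data.

```python
def format_lock_table(rows, fields=("name", "mode", "owner")):
    """Hypothetical formatter: render lock records as aligned text columns.

    Missing or empty values are shown as "-", as in the example output.
    """
    headers = [field.capitalize() for field in fields]
    cells = [[str(row.get(field) or "-") for field in fields] for row in rows]
    table = [headers] + cells
    # Each column is as wide as its longest cell, header included.
    widths = [max(len(line[i]) for line in table) for i in range(len(fields))]
    return [
        " ".join(cell.ljust(width) for cell, width in zip(line, widths)).rstrip()
        for line in table
    ]
```

A caller would print the returned lines one by one; restricting `fields` plays the role of the `-o name,mode,owner` selection.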
  19. Aug 23, 2010
  20. Aug 19, 2010
  21. Aug 18, 2010
      Support for resolving hostnames to IPv6 addresses · b705c7a6
      Manuel Franceschini authored
      
      This patch enables IPv6 name resolution by using socket.getaddrinfo
      instead of socket.gethostbyname_ex.
      
      It renames the HostInfo class to Hostname and unifies its use throughout
      the code. This is achieved by using static calls where no object is
      needed. The patch also removes some obsolete code.
      
      For now, we just resolve to IPv4 addresses, but this will change once it
      is needed.
      
      Signed-off-by: Manuel Franceschini <livewire@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
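For readers unfamiliar with the difference, `socket.getaddrinfo` takes an address family argument, so the same call can be restricted to IPv4 now and widened later. A minimal sketch using only the standard library; `resolve_addresses` is a hypothetical helper and not the Hostname class from the patch.

```python
import socket


def resolve_addresses(hostname, family=socket.AF_UNSPEC):
    """Hypothetical helper: return unique (family, address) pairs for hostname.

    Pass socket.AF_INET to get IPv4 results only, socket.AF_INET6 for IPv6
    only, or leave the default to accept both.
    """
    infos = socket.getaddrinfo(hostname, None, family, socket.SOCK_STREAM)
    result = []
    for fam, _socktype, _proto, _canonname, sockaddr in infos:
        pair = (fam, sockaddr[0])  # sockaddr[0] is the address string
        if pair not in result:
            result.append(pair)
    return result
```

Calling `resolve_addresses(name, socket.AF_INET)` corresponds to the "resolve to IPv4 only for now" behaviour described in the message; a failed lookup raises `socket.gaierror`.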
      Introduce new IPAddress classes · 8b312c1d
      Manuel Franceschini authored
      
      This patch unifies the netutils functions dealing with IP addresses into
      three classes:
      - IPAddress: Common IP address functionality
      - IPv4Address: IPv4-specific functionality
      - IPv6Address: IPv6-specific functionality
      
      Furthermore it adds methods to check whether an address is a loopback
      address, replacing the .startswith("127") for IPv4 and adding IPv6
      support.
      
      It also provides the basis for future IPv6 address handling. Methods to
      convert IP strings to their corresponding integer values will allow
      IPv6 addresses to be canonicalized.
      
      Signed-off-by: Manuel Franceschini <livewire@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
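The three ideas in the message above (a loopback check that works for both families, conversion to an integer, and canonicalization) can be illustrated with Python's standard `ipaddress` module. This is a sketch of the concepts only: the patch introduces its own classes, and the helper names below are hypothetical.

```python
import ipaddress


def is_loopback(address):
    """True for any address in 127.0.0.0/8 and for ::1."""
    return ipaddress.ip_address(address).is_loopback


def to_integer(address):
    """Integer value of an IPv4 or IPv6 address string."""
    return int(ipaddress.ip_address(address))


def canonicalize(address):
    """Shortest standard text form, so equal addresses compare equal as strings."""
    return str(ipaddress.ip_address(address))
```

Two spellings of the same IPv6 address, such as a fully expanded form and its compressed form, map to the same integer, which is why an integer conversion is a natural basis for canonicalization.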
  22. Jul 29, 2010
      workerpool: Change signature of AddTask function to not use *args · b2e8a4d9
      Michael Hanselmann authored
      
      By changing it to a normal parameter, which must be a sequence, we can
      start using keyword parameters.
      
      Before this patch all arguments to “AddTask(self, *args)” were passed as
      arguments to the worker's “RunTask” method. Priorities, which should be
      optional and will be implemented in a future patch, must be passed as a keyword
      parameter. This means “*args” can no longer be used as one can't combine *args
      and keyword parameters in a clean way:
      
      >>> def f(name=None, *args):
      ...   print "%r, %r" % (args, name)
      ...
      >>> f("p1", "p2", "p3", name="thename")
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      TypeError: f() got multiple values for keyword argument 'name'
      
      Signed-off-by: Michael Hanselmann <hansmi@google.com>
      Reviewed-by: Iustin Pop <iustin@google.com>
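The new calling convention described in the message above can be sketched as follows. `SketchWorkerPool` is a hypothetical stand-in for the real worker pool, and the `priority` keyword is shown only to illustrate why a sequence parameter is needed; per the message, priorities are left to a future patch.

```python
class SketchWorkerPool:
    """Hypothetical stand-in for a worker pool; it only records tasks."""

    def __init__(self):
        self._tasks = []

    def AddTask(self, args, priority=None):
        """Queue one task; args is a sequence of arguments for RunTask.

        Because the task arguments are a single parameter, keyword
        parameters such as priority can no longer collide with them.
        """
        self._tasks.append((priority, tuple(args)))

    def RunAll(self, run_task):
        """Run every queued task in insertion order and collect the results."""
        return [run_task(*args) for _priority, args in self._tasks]
```

A call such as `pool.AddTask(("p1", "p2", "p3"), priority=10)` is unambiguous, whereas the `*args` form shown in the traceback is not.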
  23. Jul 26, 2010
      masterd: move the IP activation from Exec to Check · 340f4757
      Iustin Pop authored
      
      Currently, the master IP activation is done in the Exec function. Since
      the original masterd process returns after forking, and Exec is run in
      the (grand)child process, this means that after 'ganeti-masterd' has
      returned there are still initialization tasks running.
      
      Normally this is not a problem, but in cases where one does quick master
      failovers, this creates a race condition which hits the QA scripts
      especially hard.
      
      To solve this, and make the startup process cleaner (the system is in
      steady state after the command has returned, even though masterd startup
      could still fail), we move the IP activation to Check(). This also
      allows error messages about the IP activation to be seen on the console.
      
      With this patch enabled, I can no longer reproduce the double-failover
      errors, which were occurring before in 4/5 cases.
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: René Nussbaumer <rn@google.com>
      Move the UsesRPC decorator from cli to rpc · e0e916fe
      Iustin Pop authored
      
      This is needed because not just the cli scripts need this decorator, but
      the master daemon too (and it already duplicated the code once).
      
      In cli.py we just leave a stub, so that we don't have to modify all the
      scripts to import rpc.py.
      
      We then change the master daemon code to reuse this decorator, instead
      of duplicating it.
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: René Nussbaumer <rn@google.com>
      watcher: smarter handling of instance records · f5116c87
      Iustin Pop authored
      
      This patch implements a few changes to the instance handling. First, old
      instances which no longer exist on the cluster are removed from the
      state file, to keep things clean.
      
      Second, the instance restart counters are reset every 8 hours, since
      some error cases might be transient (e.g. networking issues, or machine
      temporarily down), and if the problem takes more than 5 restarts but is
      not permanent, watcher will not restart the instance. The value of 8
      hours is, I think, both conservative (so as not to hammer the cluster too
      often with restarts) and fast enough to clear semi-transient problems.
      
      And last, if an instance is not restarted due to exhausted retries, a
      warning should be logged; otherwise it's hard to understand why watcher
      doesn't want to restart an ERROR_down instance.
      
      Signed-off-by: Iustin Pop <iustin@google.com>
      Reviewed-by: René Nussbaumer <rn@google.com>
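The three behaviours described in the message above (pruning records of vanished instances, resetting counters after 8 hours, and warning when retries are exhausted) can be sketched as a small state class. The limit of 5 restarts and the 8-hour window come from the message; the class, its method names and the exact boundary conditions are assumptions made for illustration and may differ from the watcher's real state handling.

```python
import logging
import time

MAX_RESTARTS = 5              # restart limit mentioned in the message
RETRY_EXPIRATION = 8 * 3600   # 8 hours, in seconds


class RestartState:
    """Hypothetical in-memory version of the watcher's per-instance records."""

    def __init__(self):
        self._records = {}  # instance name -> (restart count, window start)

    def prune(self, existing_instances):
        """Drop records of instances that no longer exist on the cluster."""
        for name in list(self._records):
            if name not in existing_instances:
                del self._records[name]

    def should_restart(self, name, now=None):
        """Return True and count the attempt, or warn and return False."""
        now = time.time() if now is None else now
        count, since = self._records.get(name, (0, now))
        if now - since >= RETRY_EXPIRATION:
            count, since = 0, now  # the problem may have been transient
        if count >= MAX_RESTARTS:
            logging.warning("Not restarting %s: %d restart attempts exhausted",
                            name, count)
            self._records[name] = (count, since)
            return False
        self._records[name] = (count + 1, since)
        return True
```

Passing `now` explicitly keeps the sketch easy to exercise without waiting; a real caller would leave it unset and rely on the clock.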