1. 30 Jul, 2008 2 commits
    • Rework master startup/shutdown/failover · b1b6ea87
      Iustin Pop authored
      This (big) patch reworks the master startup/shutdown and fixes the
      master failover.
      What does the patch do?
      For master start/stop:
        - removes the old ganeti-master script and its associated man page
        - moves the IP start/stop directly into backend.(Start|Stop)Master
        - adds start/stop of the master and RAPI daemons to these functions,
          selectively, based on the start/stop arguments
        - makes the master call StartMaster(start_daemons=False) via RPC on
          the local node, so that the master IP is brought up
        - and finally changes the example init.d script to directly start and
          stop all three daemons, since they do the right thing (depending on
          master/not master role)
      For master failover:
        - moves the code from LUMasterFailover into bootstrap.MasterFailover,
          since we need to start/stop the master during this operation and
          thus it can't be executed from the master
        - removes LUMasterFailover and its associated opcode
      Note: Ubuntu's /etc/lsb-base-logging.sh is dumb, so the 'not master'
      messages are not shown during startup on non-master nodes.
      Reviewed-by: ultrotter
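A minimal sketch of the IP/daemon split described above, assuming the IP handling and daemon control can be modelled as injected callables. `start_master`, `stop_master`, `MASTER_DAEMONS` and the daemon names are hypothetical stand-ins, not the actual backend.StartMaster/StopMaster code, and the stop ordering (daemons before IP) is an assumption:

```python
from typing import Callable, Iterable, List

# Hypothetical daemon names, for illustration only.
MASTER_DAEMONS = ("ganeti-masterd", "ganeti-rapi")


def start_master(start_daemons: bool,
                 activate_ip: Callable[[], None],
                 start_daemon: Callable[[str], None],
                 daemons: Iterable[str] = MASTER_DAEMONS) -> None:
    """Bring up the master IP and, if requested, the master-only daemons."""
    activate_ip()  # the IP is always handled here
    if start_daemons:
        for name in daemons:
            start_daemon(name)


def stop_master(stop_daemons: bool,
                deactivate_ip: Callable[[], None],
                stop_daemon: Callable[[str], None],
                daemons: Iterable[str] = MASTER_DAEMONS) -> None:
    """Reverse of start_master: optionally stop the daemons, then drop the IP."""
    if stop_daemons:
        for name in daemons:
            stop_daemon(name)
    deactivate_ip()


# An already-running master daemon asks only for the IP (start_daemons=False).
events: List[str] = []
start_master(False, lambda: events.append("ip up"),
             lambda d: events.append("start " + d))
print(events)  # ['ip up']
```

Passing True instead would also start each daemon in `MASTER_DAEMONS` after the IP is up.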
    • Add a new parameter to backend.(Start|Stop)Master · 1c65840b
      Iustin Pop authored
      This patch adds a new parameter, unused for now, to the start and stop
      master operations in backend. The idea behind it is that we need to be
      able to control whether the IP (de)activation is coupled with daemon
      start/stop.
      The callers are also modified to pass this parameter (even if unused for
      now).
      Reviewed-by: ultrotter
  2. 25 Jul, 2008 1 commit
  3. 22 Jul, 2008 6 commits
    • Convert SetInstanceParams to concurrency · 1a5c7281
      Guido Trotter authored
      Grab a lock for the instance we're working on, and update its params.
      Reviewed-by: iustinp
    • Use Update in SetInstanceParams · ea94e1cd
      Guido Trotter authored
      When we set the instance params we're not adding a new instance, but
      just updating an existing one, so why use AddInstance?
      Reviewed-by: iustinp
    • Convert LUConnectConsole to concurrency · 8659b73e
      Guido Trotter authored
      For ConnectConsole we just need to lock the instance we're connecting
      to. We make a few RPCs to its primary node, but node daemons can now
      handle multiple queries, and nodes cannot be removed while they still
      have instances on them anyway. Note that since we return the ssh
      command, which is executed outside the ganeti daemon without any locks
      held, the instance may be subject to other operations while we're
      connected to it, but that was the previous behavior as well.
      Reviewed-by: iustinp
    • Add _ExpandAndLockInstance auxiliary function. · 43905206
      Guido Trotter authored
      LUs that take an instance name as input and need to expand and lock it
      can use this function to simplify their ExpandNames call. Possibly, an
      _ExpandAndLockNode will come as well.
      Reviewed-by: iustinp
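A sketch of what such a helper might look like, assuming a config object whose ExpandInstanceName method returns None for unknown names, and a `needed_locks` dict keyed by locking level. `FakeConfig`, `LEVEL_INSTANCE` and `InstanceLU` are hypothetical stand-ins, not cmdlib code:

```python
from typing import Dict, List, Optional

LEVEL_INSTANCE = 1  # hypothetical stand-in for a locking-level constant


class OpPrereqError(Exception):
    """Raised when an opcode's prerequisites are not met."""


class FakeConfig:
    """Toy config: expands a short name to the unique matching FQDN."""

    def __init__(self, names: List[str]) -> None:
        self._names = names

    def ExpandInstanceName(self, short: str) -> Optional[str]:
        matches = [n for n in self._names
                   if n == short or n.startswith(short + ".")]
        return matches[0] if len(matches) == 1 else None


class InstanceLU:
    def __init__(self, cfg: FakeConfig, instance_name: str) -> None:
        self.cfg = cfg
        self.instance_name = instance_name
        self.needed_locks: Dict[int, str] = {}

    def _ExpandAndLockInstance(self) -> None:
        """Canonicalize the instance name and declare a lock on it."""
        expanded = self.cfg.ExpandInstanceName(self.instance_name)
        if expanded is None:
            raise OpPrereqError("Instance '%s' not known" % self.instance_name)
        self.instance_name = expanded
        self.needed_locks[LEVEL_INSTANCE] = expanded

    def ExpandNames(self) -> None:
        self._ExpandAndLockInstance()  # the whole ExpandNames is one call


lu = InstanceLU(FakeConfig(["inst1.example.com", "inst2.example.com"]), "inst1")
lu.ExpandNames()
print(lu.needed_locks)  # {1: 'inst1.example.com'}
```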
    • Convert two (simple) LUs to be concurrent · 642339cf
      Guido Trotter authored
      LUQueryClusterInfo and LUDumpClusterConfig can be made concurrent and
      don't need to acquire any locks. In fact they don't interact with the
      cluster at all, but just with its configuration, which is thread-safe by
      Reviewed-by: iustinp
    • Add missing empty line · 0eed6e61
      Guido Trotter authored
      Two top-level definitions were separated by only one empty line.
      Fixing this.
      Reviewed-by: imsnah
  4. 09 Jul, 2008 1 commit
  5. 08 Jul, 2008 7 commits
    • Convert LUTestDelay to concurrent usage · fbe9022f
      Guido Trotter authored
      In order to do so:
        - We set REQ_BGL to False
        - We implement ExpandNames
      That's it, really.
      Reviewed-by: iustinp
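The two steps above might look roughly like this in a sketch where `LogicalUnit` is a stripped-down hypothetical base class (REQ_BGL defaulting to True) and LUTestDelay is assumed, for illustration, to declare no locks and take a single `duration` parameter:

```python
import time
from typing import Dict, List


class LogicalUnit:
    """Stripped-down stand-in for a logical unit base class."""
    REQ_BGL = True  # default: the LU needs the Big Ganeti Lock exclusively

    def __init__(self) -> None:
        self.needed_locks: Dict[int, List[str]] = {}

    def ExpandNames(self) -> None:
        raise NotImplementedError

    def Exec(self) -> object:
        raise NotImplementedError


class LUTestDelay(LogicalUnit):
    REQ_BGL = False  # step 1: opt out of exclusive BGL usage

    def __init__(self, duration: float) -> None:
        super().__init__()
        self.duration = duration

    def ExpandNames(self) -> None:
        # step 2: no names to canonicalize and, in this sketch, no locks
        self.needed_locks = {}

    def Exec(self) -> float:
        time.sleep(self.duration)
        return self.duration


lu = LUTestDelay(0.01)
lu.ExpandNames()
print(lu.REQ_BGL, lu.needed_locks, lu.Exec())  # False {} 0.01
```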
    • LogicalUnit: add ExpandNames function · d465bdc8
      Guido Trotter authored
      New concurrent LUs will need to call ExpandNames so that any names
      passed in by the user are canonicalized and can be used by hooks,
      locking and other parts of the code. This was done in CheckPrereq
      before, but it's now split out, as it's needed for locking, which
      CheckPrereq in turn depends on. Old LUs can be converted gradually.
      Reviewed-by: iustinp
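The resulting ordering (expand names, take locks, then check prerequisites) can be sketched as below. `run_lu`, `DemoLU` and the numeric locking levels are hypothetical; the real processor's lock acquisition is only represented by the `acquire` callback:

```python
from typing import Callable, Dict, List

# Hypothetical locking levels, acquired in ascending order.
LEVEL_INSTANCE = 1
LEVEL_NODE = 2


def run_lu(lu, acquire: Callable[[int, List[str]], None]):
    """Drive one LU: expand names, lock, check prerequisites, execute."""
    lu.ExpandNames()  # canonical names are needed before locking
    for level in sorted(lu.needed_locks):
        acquire(level, lu.needed_locks[level])
    lu.CheckPrereq()  # runs only once the needed locks are held
    return lu.Exec()


class DemoLU:
    def __init__(self, log: List[str]) -> None:
        self.log = log
        self.needed_locks: Dict[int, List[str]] = {}

    def ExpandNames(self) -> None:
        self.log.append("expand")
        self.needed_locks = {LEVEL_NODE: ["node1.example.com"],
                             LEVEL_INSTANCE: ["inst1.example.com"]}

    def CheckPrereq(self) -> None:
        self.log.append("check")

    def Exec(self) -> str:
        self.log.append("exec")
        return "done"


log: List[str] = []
result = run_lu(DemoLU(log), lambda level, names: log.append("lock %d" % level))
print(log)  # ['expand', 'lock 1', 'lock 2', 'check', 'exec']
```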
    • Add a missing import to cmdlib · 6048c986
      Guido Trotter authored
      cmdlib uses some constants from locking (i.e. locking levels) but doesn't
      import it. This patch fixes the issue.
      Reviewed-by: iustinp
    • Fix an error accessing the cfg · f64c9de6
      Guido Trotter authored
      Since the context is passed to LogicalUnit, rather than the cfg, we can
      only access the cfg as self.cfg, self.context.cfg, or context.cfg (in
      the constructor). A bare cfg is not valid anymore.
      Reviewed-by: iustinp
    • Add and remove instance/node locks · a2fd9afc
      Guido Trotter authored
      Whenever we add an instance or node to the cluster (i.e. to the config),
      and whenever we remove one, we should add or remove its lock as well. In
      the future we may want to optimize this so that the ConfigWriter does
      it, or it's handled at the context level, but while we're adding and
      removing instances and nodes with the BGL held it doesn't matter too much.
      Reviewed-by: iustinp
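A sketch of keeping config entries and locks in step, assuming the caller already holds the BGL so the two steps need no further coordination. `LockSet`, `add_instance` and `remove_instance` are hypothetical simplifications, not the GanetiLockManager API:

```python
import threading
from typing import Dict, List


class LockSet:
    """Toy named-lock set whose membership can change at runtime."""

    def __init__(self) -> None:
        self._guard = threading.Lock()  # protects the dict itself
        self._locks: Dict[str, threading.Lock] = {}

    def add(self, name: str) -> None:
        with self._guard:
            if name in self._locks:
                raise ValueError("lock %r already exists" % name)
            self._locks[name] = threading.Lock()

    def remove(self, name: str) -> None:
        with self._guard:
            del self._locks[name]

    def names(self) -> List[str]:
        with self._guard:
            return sorted(self._locks)


def add_instance(config: Dict[str, dict], locks: LockSet, name: str) -> None:
    config[name] = {}  # register the instance in the configuration...
    locks.add(name)    # ...and create its lock in the same operation


def remove_instance(config: Dict[str, dict], locks: LockSet, name: str) -> None:
    del config[name]
    locks.remove(name)
```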
    • Pass context to LUs · 77b657a3
      Guido Trotter authored
      Rather than passing a ConfigWriter to the LUs, we now pass the whole
      context, from which a ConfigWriter can be extracted and through which
      the GanetiLockManager can also be accessed. This also fixes the places
      where a FakeLU is created.
      Reviewed-by: iustinp
    • Fix a typo in LUTestDelay docstring · 0b097284
      Guido Trotter authored
      Reviewed-by: iustinp
  6. 01 Jul, 2008 1 commit
    • Add REQ_BGL LogicalUnit run requirement · 7e55040e
      Guido Trotter authored
      When logical units have REQ_BGL set (currently the default), they must
      be the only ganeti operation running on the cluster, and we'll
      guarantee this at the master daemon level. Currently only one thread
      runs at a time, so this requirement is never broken.
      Reviewed-by: iustinp
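One way such a guarantee could be enforced is to treat the BGL as a shared/exclusive lock: REQ_BGL LUs take it exclusively, the others share it. This is a sketch under that assumption; `SharedLock`, `BGL` and `run_with_bgl` are illustrative, not the master daemon's code, and the lock has no fairness guarantees:

```python
import threading


class SharedLock:
    """Minimal shared/exclusive lock (no fairness or upgrade support)."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire(self, shared: bool) -> None:
        with self._cond:
            if shared:
                while self._writer:
                    self._cond.wait()
                self._readers += 1
            else:
                while self._writer or self._readers:
                    self._cond.wait()
                self._writer = True

    def release(self, shared: bool) -> None:
        with self._cond:
            if shared:
                self._readers -= 1
            else:
                self._writer = False
            self._cond.notify_all()


BGL = SharedLock()


def run_with_bgl(lu_class, body):
    """REQ_BGL LUs take the BGL exclusively; the others share it."""
    shared = not lu_class.REQ_BGL
    BGL.acquire(shared)
    try:
        return body()
    finally:
        BGL.release(shared)


class LegacyLU:
    REQ_BGL = True


class ConcurrentLU:
    REQ_BGL = False
```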
  7. 27 Jun, 2008 5 commits
    • AddNode: move the initial setup to bootstrap · 827f753e
      Guido Trotter authored
      From the master node we can't start ssh and connect to the remote node,
      nor can we do it from ganeti-noded, as this ssh session may ask for key
      confirmation and a password. So the code to copy the ganeti-noded
      password and SSL key has been moved to bootstrap.py, and it's called by
      gnt-node before the AddNode opcode.
      Reviewed-by: iustinp
    • LUAddNode: use node-verify to check node hostname · 5c0527ed
      Guido Trotter authored
      As we can't use ssh.VerifyNodeHostname directly, we'll set up a mini
      node-verify to do the checking between the master and the new node. In
      the future, networking checks or more nodes can be added as well.
      Reviewed-by: iustinp
    • LUAddNode: use self.sstore, not a local ss · 3d1e7706
      Guido Trotter authored
      Since we're inside a LU we have access to self.sstore.
      No need to use ss, whose separate instantiation will disappear in a few
      patches! ;)
      Reviewed-by: iustinp
    • LUAddNode: upload files via rpc, not scp · b5602d15
      Guido Trotter authored
      We used to scp all the ssconf files and the VNC password file to the
      new node. With this patch we use the upload_file RPC, specifying just
      the new node as the destination. All the files previously copied by scp
      are already allowed by the backend.
      Reviewed-by: iustinp
    • Change fping to TcpPing in two LUs · 937f983d
      Guido Trotter authored
      Two LUs use RunCmd to call fping in order to check whether an IP is
      present on the network. Replacing fping with TcpPing removes the
      external command, so these checks no longer break in the new world
      order, where the master cannot fork.
      Reviewed-by: iustinp
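A TCP-based presence check needs no external process, which is why it suits a master that cannot fork. The sketch below is written from scratch with the standard socket module; `tcp_ping` and its `live_port_needed` flag are illustrative and its details may differ from Ganeti's own TcpPing. The assumption is that a refused connection still proves the host answered, so it counts as present unless a listening port is required:

```python
import errno
import socket


def tcp_ping(host: str, port: int, timeout: float = 10.0,
             live_port_needed: bool = False) -> bool:
    """Return True if `host` appears to be present on the network."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        err = sock.connect_ex((host, port))
    except OSError:  # e.g. name resolution failure
        return False
    finally:
        sock.close()
    if err == 0:
        return True  # something is listening on the port
    # A refusal means the host answered, so it exists; a timeout or an
    # unreachable network does not.
    return err == errno.ECONNREFUSED and not live_port_needed
```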
  8. 26 Jun, 2008 3 commits
    • When removing a node don't ssh to it · d489ca4f
      Guido Trotter authored
      Even in 1.2 this behaviour is broken, as the rpc call will remove the
      ssh keys before we get a chance to log in. Now the rpc takes care of
      shutting down the node daemon as well, so we definitely can avoid this.
      This makes the LURemoveNode operation work again with the threaded
      master daemon.
      Reviewed-by: iustinp
    • Remove spurious check during LUAddNode · 49abbd3e
      Guido Trotter authored
      There is no point in checking whether the cluster VNC password file
      exists as a prerequisite for AddNode, considering the check happens on
      the master node, not the target one. Removing this check.
      Reviewed-by: iustinp
    • Improve LURemoveNode BuildHooksEnv docstring · d08869ee
      Guido Trotter authored
      Reviewed-by: iustinp
  9. 25 Jun, 2008 1 commit
    • Cleanup old DRBD 0.7.x code · 00fb8246
      Michael Hanselmann authored
      Apparently there were still some leftovers. While removing an instance,
      I got the message "unhandled exception 'module' object has no attribute
      Reviewed-by: iustinp
  10. 23 Jun, 2008 1 commit
    • Fix gnt-cluster “command” and “copyfile” · b3989551
      Iustin Pop authored
      Since forking was disabled in the master daemon, the two ssh-based
      subcommands no longer worked. However, there is no need at all for
      these commands to be run from the master daemon (permissions to read
      the cluster private ssh key notwithstanding); they can be run directly
      from the command line utilities.
      The patch removes the two opcodes OpRunClusterCommand and
      OpClusterCopyFile (and their associated LUs) and changes the code in
      ‘gnt-cluster’ to query the list of nodes and run the SshRunner directly
      over the list. As such, all forking is done from the gnt-cluster script,
      and the commands work again.
      Reviewed-by: imsnah
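A sketch of that client-side loop, assuming plain OpenSSH invoked through subprocess rather than Ganeti's SshRunner class. `build_ssh_argv` and `run_on_nodes` are hypothetical names, and the node list is passed in by the caller instead of being queried from the master:

```python
import subprocess
from typing import Callable, Dict, List, Sequence, Tuple


def build_ssh_argv(node: str, command: str, user: str = "root") -> List[str]:
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    return ["ssh", "-oBatchMode=yes", "%s@%s" % (user, node), command]


def run_on_nodes(nodes: Sequence[str], command: str,
                 run: Callable[..., subprocess.CompletedProcess] = subprocess.run
                 ) -> Dict[str, Tuple[int, str]]:
    """Run `command` on each node in turn; all forking happens in this process."""
    results: Dict[str, Tuple[int, str]] = {}
    for node in nodes:
        proc = run(build_ssh_argv(node, command),
                   capture_output=True, text=True)
        results[node] = (proc.returncode, proc.stdout)
    return results
```

The `run` parameter exists only so the loop can be exercised without real ssh access.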
  11. 22 Jun, 2008 1 commit
    • Add a ‘tags’ field to instance and node listing · 130a6a6f
      Iustin Pop authored
      Currently there isn't any easy way to list all nodes or instances and
      their tags; you have to query each node in turn, or list all the tags
      via something like “gnt-cluster search-tags '.*'”. Of course, this is
      not optimal.
      The patch adds a new field to “gnt-instance list” and “gnt-node list”
      called ‘tags’, which lists the tags of the object in comma-separated
      form. This field will be empty if there are no tags (when using a
      separator, this output can still be parsed by other scripts).
      At opcode level, there is a new field called ‘tags’ that returns a
      (python) list of the object tags.
      Reviewed-by: ultrotter
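A sketch of how a list-valued ‘tags’ field could be flattened for output, showing why an empty field stays parseable with a separator. `format_row` is hypothetical, not the gnt-* output code, and column alignment is ignored:

```python
from typing import List, Optional, Sequence, Union

Field = Union[str, List[str]]


def format_row(fields: Sequence[Field], separator: Optional[str] = None) -> str:
    """Render one listing row; list-valued fields (like 'tags') are comma-joined."""
    cells = [",".join(f) if isinstance(f, list) else f for f in fields]
    return (separator if separator is not None else " ").join(cells)


# At opcode level 'tags' is a plain Python list; the CLI flattens it.
print(format_row(["node1.example.com", ["rack1", "ssd"]], separator="|"))
print(format_row(["node2.example.com", []], separator="|"))
```

The second row prints as `node2.example.com|`, so a script splitting on `|` still sees two columns.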
  12. 17 Jun, 2008 2 commits
    • Fix an error-handling case · c7cdfc90
      Iustin Pop authored
      There is a mistake in handling grow-disk for an invalid disk. This patch
      fixes it.
      Reviewed-by: imsnah
    • Implement disk grow at LU level · 8729e0d7
      Iustin Pop authored
      This patch adds a new opcode and LU for growing an instance's disk.
      The opcode allows growing only one disk at a time, and will throw an error
      if the operation fails midway (e.g. on the primary node after it has
      been increased on the secondary node). As such, it might actually leave
      differently sized LVs on different nodes, but this will not create
      Reviewed-by: imsnah
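A sketch of the per-node sequence, assuming secondaries are grown before the primary (matching the failure example in the text) and that nothing is rolled back on error. `grow_disk` and the `grow_on_node` callback are hypothetical stand-ins for the LU and its RPC:

```python
from typing import Callable, Sequence


class OpExecError(Exception):
    """Raised when an operation fails while executing."""


def grow_disk(primary: str, secondaries: Sequence[str], amount_mb: int,
              grow_on_node: Callable[[str, int], bool]) -> None:
    """Grow one disk on every node holding it, stopping at the first failure."""
    for node in list(secondaries) + [primary]:  # secondaries first, primary last
        if not grow_on_node(node, amount_mb):
            # Nodes already processed keep the larger LV; nothing is undone.
            raise OpExecError("Grow request failed on node %s" % node)
```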
  13. 16 Jun, 2008 1 commit
    • Move SetKey to WritableSimpleStore and use it · 05f86716
      Guido Trotter authored
      Previously we could update SimpleStore just by calling SetKey; this
      feature has now moved to a separate class, which inherits from it. In
      this patch the new WritableSimpleStore class is also put to use, in the
      LUs that need it. Rather than making each LU instantiate it, we have a
      new LogicalUnit flag, REQ_WSSTORE, which defaults to False; when set to
      True, it asks for the LogicalUnit to be initialized with a writable
      version of the SimpleStore.
      LUMasterFailover and LURenameCluster are then changed to use it.
      InitCluster is also changed to instantiate a WritableSimpleStore, rather
      than a normal one.
      Reviewed-by: imsnah
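A sketch of the read-only/writable split and the opt-in flag. The stores here are in-memory dicts rather than ssconf files, and the LU picks its store class in its own constructor for brevity; both are simplifying assumptions:

```python
from typing import Dict, Optional


class SimpleStore:
    """Read-only view of cluster-wide simple settings (in-memory stand-in)."""

    def __init__(self, data: Optional[Dict[str, str]] = None) -> None:
        self._data = dict(data or {})

    def GetKey(self, key: str) -> str:
        return self._data[key]


class WritableSimpleStore(SimpleStore):
    """Same reads as SimpleStore, plus SetKey."""

    def SetKey(self, key: str, value: str) -> None:
        self._data[key] = value


class LogicalUnit:
    REQ_WSSTORE = False  # most LUs only read the simple store

    def __init__(self, data: Dict[str, str]) -> None:
        store_cls = WritableSimpleStore if self.REQ_WSSTORE else SimpleStore
        self.sstore = store_cls(data)


class LURenameCluster(LogicalUnit):
    REQ_WSSTORE = True  # opts in to a writable store
```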
  14. 15 Jun, 2008 3 commits
    • Activate down instances' disks on replace-disks · 22985314
      Guido Trotter authored
      When replacing disks or evacuating nodes with instances that are
      administratively down, ganeti fails because the instance disks are not
      active. This patch activates them, performs the replacement, and shuts
      them down again. This change also fixes the same issue in gnt-node
      evacuate.
      Reviewed-by: iustinp
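The activate/replace/deactivate sequence can be sketched as follows. The `admin_state` key and the three callbacks are hypothetical, and using `try`/`finally` so the disks are shut down even if the replacement fails is a design choice of this sketch, not something the text states:

```python
from typing import Callable, Dict


def replace_disks(instance: Dict[str, str],
                  activate: Callable[[Dict[str, str]], None],
                  deactivate: Callable[[Dict[str, str]], None],
                  do_replace: Callable[[Dict[str, str]], None]) -> None:
    """Replace an instance's disks, activating them first if it is down."""
    was_down = instance["admin_state"] == "down"
    if was_down:
        activate(instance)  # disks must be active for the replacement
    try:
        do_replace(instance)
    finally:
        if was_down:
            deactivate(instance)  # restore the previous (inactive) state
```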
    • FailoverInstance: change AddInstance with Update · b6102dab
      Guido Trotter authored
      We're not adding a new instance, just making configuration changes to
      the one we're working on.
      Reviewed-by: imsnah
    • Fix an error message in instance add · 3e91897b
      Iustin Pop authored
      There is a mistake in the error message generated when we can't reach a
      node to check for available disk space. Without this fix, the error
      message is:
      Failure: prerequisites not met for this operation:
      Cannot get current information from node '{u'gnte2.lab.k1024.org':
      {'cpu_total': 1, 'memory_free': 480, 'vg_size': 131068, 'memory_total':
      504, 'bootid': '2176dd3b-2f96-42f0-8b6e-2873ecaf5f9c', 'memory_dom0':
      134, 'vg_free': 130172}, u'gnte1.lab.k1024.org': False}'
      instead of the expected:
      Failure: prerequisites not met for this operation:
      Cannot get current information from node 'gnte2.lab.k1024.org'
      Reviewed-by: imsnah
  15. 12 Jun, 2008 4 commits
  16. 31 May, 2008 1 commit
    • Add check for node memory in instance creation · 49ce1563
      Iustin Pop authored
      Currently the check for enough memory is done only by the instance start
      and failover commands. But we also start an instance during instance
      creation, so we need to check there too, instead of failing to start in
      the hypervisor phase.
      The patch adds a check for node memory for the case where the creation
      command specifies that the instance should be started. The memory is
      allowed to be less than needed if the instance will not be started, in
      order to allow migration and other such cases.
      Reviewed-by: imsnah
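A sketch of the conditional check, assuming free memory per node (in MiB) has already been gathered, with None meaning the node could not be queried. `check_memory_for_create` and its parameters are hypothetical:

```python
from typing import Dict, Optional


class OpPrereqError(Exception):
    """Raised when an opcode's prerequisites are not met."""


def check_memory_for_create(start: bool, needed_mb: int, node: str,
                            free_memory: Dict[str, Optional[int]]) -> None:
    """Fail early if an instance that will be started cannot fit on `node`."""
    if not start:
        return  # not started now, so it may exceed the node's free memory
    free = free_memory.get(node)
    if free is None:
        raise OpPrereqError("Cannot get current information from node '%s'"
                            % node)
    if needed_mb > free:
        raise OpPrereqError("Not enough memory on node %s: needed %d MiB,"
                            " free %d MiB" % (node, needed_mb, free))
```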