- Jun 22, 2011
-
-
Apollon Oikonomopoulos authored
A new ip keyword is added ("pool") to signify that LUCreateInstance should get the instance's IP from an IP pool (rather than manually or by DNS resolution). IP and link checks are re-ordered so that a NIC's link is available at the time of IP address validation. Signed-off-by:
Apollon Oikonomopoulos <apollon@noc.grnet.gr>
-
Apollon Oikonomopoulos authored
This patch introduces the following changes to lib.config and lib.objects to facilitate IP pool management. - Add a "networks" key to objects.cluster to hold link -> subnet relations. Each link is assumed to be associated with at most one IPv4 and at most one IPv6 subnet. Currently only IPv4 subnets are used. - Add a TemporaryReservationManager for IP address reservations. - Add config.{_UnlockedCommitIp,CommitIp,_UnlockedReleaseIp,ReleaseIp} to write IP pool changes to the config file. - Add config.{ReserveIp,GenerateIp} for use with the _temporary_ips TemporaryReservationManager. - Commit succesful reservations/releases to the IP pools in config.AddInstance and config.RemoveInstance Signed-off-by:
Apollon Oikonomopoulos <apollon@noc.grnet.gr>
-
Apollon Oikonomopoulos authored
This patch introduces primitive structures for IP pool management. The core of the IP Pool management framework is the ippool.IPv4Network class, which encodes an IPv4 subnet configuration with its IP address pool. The pool functionality depends on python-bitarray and python-ipaddr is used for address manipulation. ippool.IPv4Network has 3 key attributes: - net: The IPv4 subnet definition (CIDR notation) - gateway: The default gateway used, if any - _pool: a bitarray.bitarray storing the reservation status of individual IPs The purpose of including ippool.IPv4Network in ganeti is two-fold: - It will allow including subnet information useful for network configuration (e.g. by being available during an OS provider invocation) - It will be used for automatic IP address allocation during instance creation for both, routed and bridged modes. Signed-off-by:
Apollon Oikonomopoulos <apollon@noc.grnet.gr>
-
Apollon Oikonomopoulos authored
Commit 5d9bfd87 moved tap interface handling from KVM to Ganeti, partly to also solve the problem of routed interfaces getting configured too early during live migrations, causing network anomalies. In that direction, configuration of NICs of incoming instances was deferred to FinalizeMigration time. However, this causes minor issues with bridged interfaces; KVM sends out an ARP-like packet upon migration finish, which is lost because the tap interface is not yet configured. As a consequence, intermediate network equipment (i.e. switches) does not get notified about the topology change, until the instance transmits another packet after the bridge has been configured, or the switch's ARP cache expires. The proper solution to that is to support different phases in network configuration (pre/post migration), which also requires separate ifup scripts. Until then we fall back to configuring bridged interfaces on incoming instances at migration start, instead of finish. Signed-off-by:
Apollon Oikonomopoulos <apollon@noc.grnet.gr>
-
- Jun 01, 2011
-
-
Apollon Oikonomopoulos authored
Introduced during merge, because of fba7f911. Remove the second definition, as it broke KVM HTTP CDROM boot. Signed-off-by:
Apollon Oikonomopoulos <apollon@noc.grnet.gr>
-
- May 30, 2011
-
-
Apollon Oikonomopoulos authored
ConfigWriter.ReserveLV() and Configwriter.ReserveMAC() called TemporaryReservationManager.Reserve() with the ec_id and resource arguments swapped. As a result, two reservation attempts for the same resource type within the same LU would fail, even if the resources requested were different, e.g.: $ gnt-instance add -t sharedfile -o debootstrap+default \ --net 0:mac=00:01:02:03:04:00 \ --net 1:mac=00:01:02:03:04:ff \ --disk 0:size=2g test_instance Failure: prerequisites not met for this operation: error type: resource_not_unique, error details: MAC address 00:01:02:03:04:ff already in use in cluster This patch fixes the argument order in the call to Reserve(). Signed-off-by:
Apollon Oikonomopoulos <apollon@noc.grnet.gr>
-
- May 13, 2011
-
-
Apollon Oikonomopoulos authored
Check that the instance is not being migrated to its current primary node during CheckPrereq. Otherwise migration is aborted because the instance is already running and cleaned-up, which causes the running instance to be killed. Signed-off-by:
Apollon Oikonomopoulos <apollon@noc.grnet.gr>
-
- May 12, 2011
-
-
Iustin Pop authored
The opcode parameter ignore_consistency was used in the LU, but not actually declared in the OpCode. The patch adds it in the opcode and the command line client. ObQuote — Please, please, can I have static typing? Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Iustin Pop authored
Two opcodes already use it and we need it for a third, time to add a constant for it. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
Iustin Pop authored
This encoding, part of the standard Python installation, is used by the pickle module (in turn used by subprocess when handling failures in program execution). Preloading it means that Python will cache it in memory so that even if the disk goes away or just the module, we're not going to fail in reporting errors. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
- May 11, 2011
-
-
Iustin Pop authored
There are multiple bugs with the code checking for N+1 failures in the instance memory changes which needs significant changes, in the meantime we can at least: - change the warning message into an error (--force will skip checks) - only make checks when we increase the memory Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- May 10, 2011
-
-
Marco Casavecchia authored
Hi all, this patch will add 3 new KVM parameters and a new option. New Parameters: - floppy_image_path = "" -> Specify the floppy image to load as floppy disk. - cdrom2_image_path = "" -> Specify a second cdrom image to load on the system (note: this in not intended to be used as a boot device. To boot the system from cdrom you must use the "cdrom_image_path" parameter as always). - cdrom_disk_type = "" -> it can be one of the kvm supported types as "ide,scsi,paravirtual,ecc". I introduced this optional parameter to make possible to specify a different virtual device for cdroms. It is useful if you want to install a windows system New option for "boot_device" parameter: - "floppy": with this value you should be able to boot a KVM instance from floppy image. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Iustin Pop <iustin@google.com> (cherry picked from commit cc130cc7) Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Apollon Oikonomopoulos authored
Comply with changes introduced in f1ea1bef, as we forgot to completely remove self._migrater.target_node from TLMigrateInstance. Signed-off-by:
Apollon Oikonomopoulos <apollon@noc.grnet.gr>
-
René Nussbaumer authored
This will allow us an easy migration to pv-grub, because a set root_path confused pv-grub. Signed-off-by:
René Nussbaumer <rn@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- May 09, 2011
-
-
Marco Casavecchia authored
Add INSTANCE_PRIMARY_NODE and INSTANCE_SECONDARY_NODES. These new values are useful for OS scripts that needs to know the nodes where the instance lives.. or has lived. Signed-off-by:
Iustin Pop <iustin@google.com> [iustin@google.com: fixed small issue with SECONDARY_NODES] Reviewed-by:
Iustin Pop <iustin@google.com>
-
Iustin Pop authored
Currently, when converting an instance from plain to DRBD, the instance is blocked during the entire resync period. This patch adds the --no-wait-for-sync so that the operation finishes as soon as the DRBD sync has started, without waiting for the entire sync. This makes the instance available much faster. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This patch introduces the option of changing an instance's nodes when doing the disk recreation. The rationale is that currently if an instance lives on a node that has gone down and is marked offline, it's not possible to re-create the disks and reinstall the instance on a different node without hacking the config file. Additionally, the LU now locks the instance's nodes (which was not done before), as we most likely allocate new resources on them. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
René Nussbaumer <rn@google.com>
-
- May 06, 2011
-
-
Iustin Pop authored
It makes not sense to show messages like: Fri May 6 02:04:01 2011 - INFO: Resolved given name 'instance18' to 'instance18' So we'll skip the message if the resolved name is identical to the requested one. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Michael Hanselmann authored
The original code would get all node information and their groups without before acquiring the necessary locks. With this patch the node information is only retrieved once all locks have been acquired. Groups are locked optimistically and verified after acquiring the node locks. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
- May 05, 2011
-
-
Apollon Oikonomopoulos authored
Commit b9187ba2 erroneously incorporated parts of the code of TLMigrateInstance.CheckPrereq into TLMigrateInstance._RunAllocator. As a result, all migrations performed without the use of an iallocator would end-up running non-live. This patch restores the original behaviour. Signed-off-by:
Apollon Oikonomopoulos <apollon@noc.grnet.gr>
-
- May 04, 2011
-
-
Apollon Oikonomopoulos authored
Perform a simple urllib2 check on ISO images specified as URL before instance start, so as to work around qemu bug #597575 [1]. [1] https://bugs.launchpad.net/qemu/+bug/597575 Signed-off-by:
Apollon Oikonomopoulos <apollon@noc.grnet.gr>
-
- May 03, 2011
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Iustin Pop authored
This removes (count of instances + count of nodes) lock acquires/releases. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- May 02, 2011
-
-
Iustin Pop authored
At least one generates an epydoc error :) Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
With the current code, it's possible to mistake a ^C for a protocol error: node1# gnt-job info 221691 [press ^C] Unhandled protocol error while talking to the master daemon: Error while deserializing response: (and note empty error message). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This handles EPIPE errors in two places: ToStream (to catch logging done in GenericMain itself) and in GenericMain (to cover also plain print statements). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Currently cluster verify doesn't check for bridge information; the only checks are done at instance create and failover/migrate time. This means a cluster that seems healthy will fail creation jobs. This patch implements a simple verification that all nodes (in the entire cluster, so doesn't work well for multi-group) have all the required bridges: the default one plus any instance bridge. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Apr 29, 2011
-
-
Michael Hanselmann authored
Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
If an iallocator is used, “gnt-instance replace-disks” would acquire the locks of all nodes (only the allocator will decide which node to use). Unfortunately the unneeded locks were not released during the operation, causing unnecessary delays for other jobs. This patch changes the LU to release unneeded locks and adds assertions. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
This is analog to “is_owned” and will be used for assertions. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Michael Hanselmann authored
The iallocator parameter is “-I”, not “-i”. Signed-off-by:
Michael Hanselmann <hansmi@google.com> Reviewed-by:
Iustin Pop <iustin@google.com>
-
Iustin Pop authored
This allows noded to continue instead of blowing up if the libc major number changes. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Apr 28, 2011
-
-
Iustin Pop authored
This is a simple change to allow specifying a different VG for the meta device during the creation of instances and addition of disks via gnt-instance modify. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This is a small change to make this function take a list of VG names, instead of a single one. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
Unicode is fun, indeed: >>> len(buffer("abc")) 3 >>> len(buffer(u"abc")) 12 So we can't pass unicode data to buffer(), as the result will be to write the in-memory (usually UTF-32) representation to disk. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
- Apr 27, 2011
-
-
Iustin Pop authored
This patch enhances the multi-VG support in replace disks, by keeping the meta device in the same VG, as opposed to moving it to the data device VG (note that we don't have a way to create the meta in a different VG in the first place, but at least we correctly handle a custom config). Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Doug Dumitru authored
Converting an instance from 'plain' to 'drbd'. The old code would create the drbd volumes in the default VG and then the renames would fail. This fix pulls the plain VG names from the existing volumes and places it into the new disk template. Running 'replace-disks' has a similar issue with the new disks going into the wrong VG and then the rename failing. Their might be a similar issue with 'recreate-disks', but I actually have no idea what recreate-disks does, so did not look into it. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
os.write can do incomplete writes, as long as at least some bytes have been written (like write(2)): >>> os.write(fd, " " * 1300) 1300 >>> os.write(fd, " " * 1300) 1300 >>> os.write(fd, " " * 1300) 1300 >>> os.write(fd, " " * 1300) 980 >>> os.write(fd, " " * 1300) Traceback (most recent call last): File "<stdin>", line 1, in ? OSError: [Errno 28] No space left on device Note that incomplete write that only wrote 980 bytes, before the exception. To workaround this, we simply iterate until all data is written. Unittests could be written by using a parameter instead of hardcoding os.write and checking for incomplete writes. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
A few issues in the clarity of the error messages are fixed: - "ERROR: node node3: OS API version lenny-image": no preposition between the parameter type and the OS name, changed to "for lenny-image" - "API version lenny-image differs from reference node node1: 10, 5 vs. 10, 20, 5, 15": parameters not sorted in display - "OS variants list lenny-image differs from reference node node1: vs. default, i386": empty sets are not clearly delimited, changed to add [] around the sets: "node node1: [] vs. [default, i386]" - "OS parameters lenny-image differs from reference node node1: vs. (u'dhcp', u'Whether to enable (yes) or disable (dhcp)')": ugly formatting in the OS parameters list, as we used to just "%s" the tuple; now it is "reference node node1: [] vs. [dhcp: Whether to enable (yes) or disable (dhcp)]" Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-
Iustin Pop authored
This breaks Ganeti in multiple ways. If we don't make the check in gnt-node itself, then bootstrap.SetupNodeDaemon will restart the master daemon, making the operation fail: node1# gnt-node add --readd node1 Cannot communicate with the master daemon. Is it running and listening for connections? The check in cmdlib is more of a safety check, as we shouldn't reach it. If we do (via a bad client), then it will prevent breakage in the job queue/config handling. Signed-off-by:
Iustin Pop <iustin@google.com> Reviewed-by:
Michael Hanselmann <hansmi@google.com>
-