diff --git a/Makefile.am b/Makefile.am index 08b55ce00d8c6a2e75353f96e4670e30b40e629c..f8f836b244dbe5d597cc57f55e3bae48a9cc6dac 100644 --- a/Makefile.am +++ b/Makefile.am @@ -285,6 +285,7 @@ docrst = \ doc/design-impexp2.rst \ doc/design-lu-generated-jobs.rst \ doc/design-multi-reloc.rst \ + doc/design-network.rst \ doc/cluster-merge.rst \ doc/design-shared-storage.rst \ doc/devnotes.rst \ diff --git a/doc/design-network.rst b/doc/design-network.rst new file mode 100644 index 0000000000000000000000000000000000000000..a7318d4ce9e516e36654325f91512b17a1e503f1 --- /dev/null +++ b/doc/design-network.rst @@ -0,0 +1,313 @@ +================== +Network management +================== + +.. contents:: :depth: 4 + +This is a design document detailing the implementation of network resource +management in Ganeti. + +Current state and shortcomings +============================== + +Currently Ganeti supports two configuration modes for instance NICs: +routed and bridged mode. The ``ip`` NIC parameter, which is mandatory +for routed NICs and optional for bridged ones, holds the given NIC's IP +address and may be filled either manually, or via a DNS lookup for the +instance's hostname. + +This approach presents some shortcomings: + +a) It relies on external systems to perform network resource + management. Although large organizations may already have IP pool + management software in place, this is not usually the case with + stand-alone deployments. For smaller installations it makes sense to + allocate a pool of IP addresses to Ganeti and let it transparently + assign these IPs to instances as appropriate. + +b) The NIC network information is incomplete, lacking netmask and + gateway. Operating system providers could for example use the + complete network information to fully configure an instance's + network parameters upon its creation. + + Furthermore, having full network configuration information would + enable Ganeti nodes to become more self-contained and be able to + infer system configuration (e.g. /etc/network/interfaces content) + from Ganeti configuration. This should make configuration of + newly-added nodes a lot easier and less dependant on external + tools/procedures. + +c) Instance placement must explicitly take network availability in + different node groups into account; the same ``link`` is implicitly + expected to connect to the same network across the whole cluster, + which may not always be the case with large clusters with multiple + node groups. + + +Proposed changes +---------------- + +In order to deal with the above shortcomings, we propose to extend +Ganeti with high-level network management logic, which consists of a new +NIC mode called ``managed``, a new "Network" configuration object and +logic to perform IP address pool management, i.e. maintain a set of +available and occupied IP addresses. + +Configuration changes ++++++++++++++++++++++ + +We propose the introduction of a new high-level Network object, +containing (at least) the following data: + +- Symbolic name +- UUID +- Network in CIDR notation (IPv4 + IPv6) +- Default gateway, if one exists (IPv4 + IPv6) +- IP pool management data (reservations) +- Default NIC connectivity mode (bridged, routed). This is the + functional equivalent of the current NIC ``mode``. +- Default host interface (e.g. br0). This is the functional equivalent + of the current NIC ``link``. +- Tags + +Each network will be connected to any number of node groups, possibly +overriding connectivity mode and host interface for each node group. +This is achieved by adding a ``networks`` slot to the NodeGroup object +and using the networks' UUIDs as keys. + +IP pool management +++++++++++++++++++ + +A new helper library is introduced, wrapping around Network objects to +give IP pool management capabilities. A network's pool is defined by two +bitfields, the length of the network size each: + +``reservations`` + This field holds all IP addresses reserved by Ganeti instances, as + well as cluster IP addresses (node addresses + cluster master) + +``external reservations`` + This field holds all IP addresses that are manually reserved by the + administrator, because some other equipment is using them outside the + scope of Ganeti. + +The bitfields are implemented using the python-bitarray package for +space efficiency and their binary value stored base64-encoded for JSON +compatibility. This approach gives relatively compact representations +even for large IPv4 networks (e.g. /20). + +Ganeti-owned IP addresses (node + master IPs) are reserved automatically +if the cluster's data network itself is placed under pool management. + +Helper ConfigWriter methods provide free IP address generation and +reservation, using a TemporaryReservationManager. + +It should be noted that IP pool management is performed only for IPv4 +networks, as they are expected to be densely populated. IPv6 networks +can use different approaches, e.g. sequential address asignment or +EUI-64 addresses. + +Managed NIC mode +++++++++++++++++ + +In order to be able to use the new network facility while maintaining +compatibility with the current networking model, a new network mode is +introduced, called ``managed`` to reflect the fact that the given NICs +network configuration is managed by Ganeti itself. A managed mode NIC +accepts the network it is connected to in its ``link`` argument. +Userspace tools can refer to networks using their symbolic names, +however internally, the link argument stores the network's UUID. + +We also introduce a new ``ip`` address value, ``constants.NIC_IP_POOL``, +that specifies that a given NIC's IP address should be obtained using +the IP address pool of the specified network. This value is only valid +for managed-mode NICs, where it is also used as a default instead of +``constants.VALUE_AUTO``. A managed-mode NIC's IP address can also be +specified manually, as long as it is compatible with the network the NIC +is connected to. + + +Hooks ++++++ + +``OP_NETWORK_ADD`` + Add a network to Ganeti + + :directory: network-add + :pre-execution: master node + :post-execution: master node + +``OP_NETWORK_CONNECT`` + Connect a network to a node group. This hook can be used to e.g. + configure network interfaces on the group's nodes. + + :directory: network-connect + :pre-execution: master node, all nodes in the connected group + :post-execution: master node, all nodes in the connected group + +``OP_NETWORK_DISCONNECT`` + Disconnect a network to a node group. This hook can be used to e.g. + deconfigure network interfaces on the group's nodes. + + :directory: network-disconnect + :pre-execution: master node, all nodes in the connected group + :post-execution: master node, all nodes in the connected group + +``OP_NETWORK_REMOVE`` + Remove a network from Ganeti + + :directory: network-add + :pre-execution: master node, all nodes + :post-execution: master node, all nodes + +Hook variables +^^^^^^^^^^^^^^ + +``INSTANCE_NICn_MANAGED`` + Non-zero if NIC n is a managed-mode NIC + +``INSTANCE_NICn_NETWORK`` + The friendly name of the network + +``INSTANCE_NICn_NETWORK_UUID`` + The network's UUID + +``INSTANCE_NICn_NETWORK_TAGS`` + The network's tags + +``INSTANCE_NICn_NETWORK_IPV4_CIDR``, ``INSTANCE_NICn_NETWORK_IPV6_CIDR`` + The subnet in CIDR notation + +``INSTANCE_NICn_NETWORK_IPV4_GATEWAY``, ``INSTANCE_NICn_NETWORK_IPV6_GATEWAY`` + The subnet's default gateway + + +Backend changes ++++++++++++++++ + +In order to keep the hypervisor-visible changes to a minimum, and +maintain compatibility with the existing network configuration scripts, +the instance's hypervisor configuration will have host-level link and +mode replaced by the *connectivity mode* and *host interface* of the +given network on the current node group. + +The managed mode can be detected by the presence of new environment +variables in network configuration scripts: + +Network configuration script variables +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +``MANAGED`` + Non-zero if NIC is a managed-mode NIC + +``NETWORK`` + The friendly name of the network + +``NETWORK_UUID`` + The network's UUID + +``NETWORK_TAGS`` + The network's tags + +``NETWORK_IPv4_CIDR``, ``NETWORK_IPv6_CIDR`` + The subnet in CIDR notation + +``NETWORK_IPV4_GATEWAY``, ``NETWORK_IPV6_GATEWAY`` + The subnet's default gateway + +Userland interface +++++++++++++++++++ + +A new client script is introduced, ``gnt-network``, which handles +network-related configuration in Ganeti. + +Network addition/deletion +^^^^^^^^^^^^^^^^^^^^^^^^^ +:: + + gnt-network add --cidr=192.0.2.0/24 --gateway=192.0.2.1 \ + --cidr6=2001:db8:2ffc::/64 --gateway6=2001:db8:2ffc::1 \ + --nic_connectivity=bridged --host_interface=br0 public + gnt-network remove public (only allowed if no instances are using the network) + +Manual IP address reservation +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +:: + + gnt-network reserve-ips public 192.0.2.2 192.0.2.10-192.0.2.20 + gnt-network release-ips public 192.0.2.3 + + +Network modification +^^^^^^^^^^^^^^^^^^^^ +:: + + gnt-network modify --cidr=192.0.2.0/25 public (only allowed if all current reservations fit in the new network) + gnt-network modify --gateway=192.0.2.126 public + gnt-network modify --host_interface=test --nic_connectivity=routed public (issues warning about instances that need to be rebooted) + gnt-network rename public public2 + + +Assignment to node groups +^^^^^^^^^^^^^^^^^^^^^^^^^ +:: + + gnt-network connect public nodegroup1 + gnt-network connect --host_interface=br1 public nodegroup2 + gnt-network disconnect public nodegroup1 (only permitted if no instances are currently using this network in the group) + +Tagging +^^^^^^^ +:: + + gnt-network add-tags public foo bar:baz + +Network listing +^^^^^^^^^^^^^^^ +:: + + gnt-network list + Name IPv4 Network IPv4 Gateway IPv6 Network IPv6 Gateway Connected to + public 192.0.2.0/24 192.0.2.1 2001:db8:dead:beef::/64 2001:db8:dead:beef::1 nodegroup1:br0 + private 10.0.1.0/24 - - - + +Network information +^^^^^^^^^^^^^^^^^^^ +:: + + gnt-network info public + Name: public + IPv4 Network: 192.0.2.0/24 + IPv4 Gateway: 192.0.2.1 + IPv6 Network: 2001:db8:dead:beef::/64 + IPv6 Gateway: 2001:db8:dead:beef::1 + Total IPv4 count: 256 + Free address count: 201 (80% free) + IPv4 pool status: XXX.........XXXXXXXXXXXXXX...XX............. + XXX..........XXX...........................X + ....XXX..........XXX.....................XXX + X: occupied .: free + Externally reserved IPv4 addresses: + 192.0.2.3, 192.0.2.22 + Connected to node groups: + default (link br0), other_group(link br1) + Used by 22 instances: + inst1 + inst2 + inst32 + .. + + +IAllocator changes +++++++++++++++++++ + +The IAllocator protocol can be made network-aware, i.e. also consider +network availability for node group selection. Networks, as well as +future shared storage pools, can be seen as constraints used to rule out +the placement on certain node groups. + +.. vim: set textwidth=72 : +.. Local Variables: +.. mode: rst +.. fill-column: 72 +.. End: diff --git a/doc/index.rst b/doc/index.rst index 75d9fd63d87178b453ec0ae69da25e2ff3e4a102..34e57bb6c90b90134ef1278a2b22f1ff0c03d674 100644 --- a/doc/index.rst +++ b/doc/index.rst @@ -22,6 +22,7 @@ Contents: design-htools-2.3.rst design-2.4.rst design-draft.rst + design-network.rst cluster-merge.rst design-shared-storage.rst locking.rst