bonding

A network bond is formed by two or more network interfaces (NICs) acting as a single NIC. Bonds are used for two reasons:

• increased bandwidth

• redundancy

Creating a bond of two 1 Gbps NICs can deliver an aggregate bandwidth of up to 2 Gbps, depending on the bonding mode. If one NIC fails, the bond and the IPs it serves remain available, but at half the bandwidth. These two advantages make bonding and teaming (see next section) very common on production servers.

We can create network bonds in 4 ways:

• configuration files in /etc/sysconfig/network-scripts plus command line

• Network Manager textual user interface or nmtui

• Network Manager command line interface or nmcli

• Network Manager graphical user interface

On laptops or desktops we can choose whichever method we prefer, but on servers that might not be the case. Sometimes we cannot get an X11 session to the server (e.g. we do not have an X11 client or the X11 server is not running), which rules out the GUI option. Furthermore, in many environments Network Manager is disabled (or even uninstalled) to avoid configuration conflicts and mistakes. That leaves the configuration files and CLI as the only choice.

Let’s configure a bond with the configuration files and CLI as an example. We start by creating the ifcfg file for the bonded or master device (see “network configuration basics” for a detailed explanation of the ifcfg parameters):

# cat /etc/sysconfig/network-scripts/ifcfg-bond0
NAME=bond0
DEVICE=bond0
TYPE=Bond
BONDING_MASTER=yes
IPADDR=192.168.122.30
PREFIX=24
ONBOOT=yes
BOOTPROTO=none
BONDING_OPTS="miimon=100 mode=balance-rr"

And now let’s view the configuration files for the slave devices (just the relevant parameters):

# cat /etc/sysconfig/network-scripts/ifcfg-ens1
NAME=bond0-slave
DEVICE=ens1
TYPE=Ethernet
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes

# cat /etc/sysconfig/network-scripts/ifcfg-ens2
NAME=bond0-slave
DEVICE=ens2
TYPE=Ethernet
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes

In simple terms, we just added the MASTER and SLAVE parameters, changed the NAME (for clarity’s sake) and removed any IP settings such as IPADDR, PREFIX, GATEWAY, etc. That’s it. With the configuration files ready, we can bring up the bonded channel:

# cd /etc/sysconfig/network-scripts/
# ifdown ens1
# ifdown ens2
# ifup ens1
# ifup ens2
# ifup bond0
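
Once the interfaces are up, it is worth confirming that both NICs were actually enslaved. The bonding driver reports its state in /proc/net/bonding/bond0. The sketch below assumes the usual "Slave Interface:" / "MII Status:" layout of that file and prints one line per slave; bond_slaves is a helper name made up for this example, not part of any tool.

```shell
#!/bin/sh
# bond_slaves FILE: print "<slave> <MII status>" for each slave found in a
# bonding status file such as /proc/net/bonding/bond0.
# (bond_slaves is a hypothetical helper name used only in this example.)
bond_slaves() {
    awk '
        # Remember the name of the slave section we are in.
        /^Slave Interface:/ { slave = $3 }
        # The first "MII Status" line after it belongs to that slave;
        # the bond-wide status line appears before any slave and is skipped.
        /^MII Status:/ && slave != "" { print slave, $3; slave = "" }
    ' "$1"
}

# Only query the bond if it exists on this machine.
if [ -r /proc/net/bonding/bond0 ]; then
    bond_slaves /proc/net/bonding/bond0
fi
```

If a slave is missing from the output or shows a status other than "up", check its ifcfg file and cabling before relying on the bond.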

We can create more bonded channels if necessary simply by changing, in a new set of files, the NAME and DEVICE of the bond (bond0 → bond1), the MASTER (bond0 → bond1) and NAME (bond0-slave → bond1-slave) of the slaves, and the IPADDR.
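
As a rough illustration of how a second bond could be prepared, the following sketch generates the same set of files for bond1 in a scratch directory so they can be reviewed before being copied to /etc/sysconfig/network-scripts. The helper name write_bond_files, the NIC names ens3 and ens4 and the address 192.168.122.31 are assumptions made for the example.

```shell
#!/bin/sh
# write_bond_files DIR BOND IPADDR SLAVE...
# Writes ifcfg-BOND plus one ifcfg-SLAVE per NIC into DIR, mirroring the
# bond0 files shown earlier. (write_bond_files is a hypothetical helper.)
write_bond_files() {
    dir=$1 bond=$2 ip=$3
    shift 3

    # Master device file.
    cat > "$dir/ifcfg-$bond" <<EOF
NAME=$bond
DEVICE=$bond
TYPE=Bond
BONDING_MASTER=yes
IPADDR=$ip
PREFIX=24
ONBOOT=yes
BOOTPROTO=none
BONDING_OPTS="miimon=100 mode=balance-rr"
EOF

    # One slave file per remaining argument, with no IP settings.
    for nic in "$@"; do
        cat > "$dir/ifcfg-$nic" <<EOF
NAME=$bond-slave
DEVICE=$nic
TYPE=Ethernet
BOOTPROTO=none
ONBOOT=yes
MASTER=$bond
SLAVE=yes
EOF
    done
}

# Generate the files in a scratch directory (assumed NICs and address).
out=$(mktemp -d)
write_bond_files "$out" bond1 192.168.122.31 ens3 ens4
ls "$out"
```

Writing to a temporary directory first keeps the running network configuration untouched until the files have been checked.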

Important: the kernel does not load the bonding module by default at startup, and specifying bonding parameters in /etc/modprobe.d/*.conf will not work. For everything to work as expected, specify the bonding parameters in BONDING_OPTS. Even if you are OK with all the default options, specify at least BONDING_OPTS="miimon=1000".

Let’s have a look now at the bonding options available for us to use:

mode → specifies the bonding policy and can be one of:

balance-rr or 0: transmissions are received and sent sequentially in a round-robin fashion through each bonded slave.

active-backup or 1: an active slave is chosen and only changed in case of failure.

balance-xor or 2: the slave that handles a transmission is determined by a hash of source & destination MAC addresses. This mode is only advisable when most or all traffic is from the local network.

broadcast or 3: transmissions are sent from all slaves in parallel. Used only in very specific situations!

802.3ad or 4: uses an IEEE 802.3ad dynamic link aggregation policy that takes into account speed & duplex settings. It requires an 802.3ad-compliant switch to work.

balance-tlb or 5: uses a transmit load balancing (TLB) policy that distributes outgoing traffic according to the load of each slave. The incoming traffic is handled by one slave that doesn’t change unless a failure occurs. This policy is only suitable for local addresses known to the kernel bonding module, so it won’t work behind bridges or in virtual environments.

balance-alb or 6: uses an adaptive load balancing (ALB) policy that applies to both incoming and outgoing traffic. The incoming load balancing is achieved through ARP negotiation and that restricts this policy to local networks as with balance-tlb.
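
Since every mode can be written either as a name or as a number, a small lookup can help when reading someone else's BONDING_OPTS. This is only a sketch of the table above; bond_mode_name is a hypothetical helper name.

```shell
#!/bin/sh
# bond_mode_name N: translate a numeric bonding mode into its name,
# following the list of modes above. (Hypothetical helper.)
bond_mode_name() {
    case $1 in
        0) echo balance-rr ;;
        1) echo active-backup ;;
        2) echo balance-xor ;;
        3) echo broadcast ;;
        4) echo 802.3ad ;;
        5) echo balance-tlb ;;
        6) echo balance-alb ;;
        *) echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

bond_mode_name 4
```

On a running system, the file /sys/class/net/bond0/bonding/mode should show both the name and the number of the mode in use.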

primary → specifies the slave to use as the primary device: it is the active slave whenever it is available, and another slave takes over only when it fails. The kernel documentation lists this option as valid only for the active-backup, balance-tlb and balance-alb modes. It makes sense to specify a primary interface when, for example, one of them offers much more bandwidth than the rest.

primary_reselect → specifies the reselection policy for the primary slave and it can have one of three values:

a) always or 0: the primary slave becomes the active one as soon as it is back up.

b) better or 1: the primary becomes the active one when it comes back up, provided it offers better speed and duplex than the current active slave.

c) failure or 2: the primary becomes the active one only upon failure of the current active slave.
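
To make the interaction between primary and primary_reselect concrete, here is a possible BONDING_OPTS line for an active-backup bond. It assumes ens1 is the preferred NIC and that we do not want traffic to move back automatically once ens1 recovers; the values are illustrative, not a recommendation.

```shell
# Fragment of /etc/sysconfig/network-scripts/ifcfg-bond0 (assumed values).
# ens1 is preferred; with primary_reselect=failure it only takes over
# again when the currently active slave fails.
BONDING_OPTS="mode=active-backup miimon=100 primary=ens1 primary_reselect=failure"
```

After bringing the bond up, /sys/class/net/bond0/bonding/active_slave should name the slave that is currently carrying traffic.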

ad_select → specifies the 802.3ad aggregation logic to use (obviously it can only be used with “mode=802.3ad”) and can be set to:

a) stable or 0: (default) the active aggregator is chosen by the largest aggregate bandwidth and reselection only occurs when all slaves of the active aggregator are down or it has no slaves.

b) bandwidth or 1: chosen also by aggregate bandwidth but reselection occurs if

1. a slave is added or removed from the bond
2. any slave’s link state changes
3. any slave’s 802.3ad association state changes
4. bond is brought up

c) count or 2: chosen by the largest number of slaves; reselection occurs in the same cases as for bandwidth.

miimon → specifies in milliseconds how often MII link monitoring occurs. It is necessary to ensure high availability as MII verifies the NICs are active. To use MII on bonds every NIC’s driver must support it. We can determine if that’s the case by running…

# ethtool eth0 | grep "Link detected:"        → replace “eth0” with your NIC’s name

If the result is “Link detected: yes”, then MII is supported. If we do not know what value to set this option to, then set it to 100 so that monitoring occurs 10 times per second. Setting it to 0 effectively disables MII monitoring. If we use the miimon option, then we can skip the “arp_*” options.

downdelay → specifies in milliseconds how long to wait after failure detection before disabling a NIC. This value must be a multiple of that used in miimon. By default it is set to 0 and thus disabled.
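
Because the delay has to be a multiple of miimon, a quick check before editing BONDING_OPTS can save a surprise. The sketch below uses a made-up helper, check_delay, and plain shell arithmetic; the same check applies to updelay.

```shell
#!/bin/sh
# check_delay MIIMON DELAY: succeed if DELAY (ms) is a multiple of MIIMON (ms).
# (check_delay is a hypothetical helper written for this example.)
check_delay() {
    miimon=$1 delay=$2
    # miimon=0 disables MII monitoring; it would also divide by zero below.
    if [ "$miimon" -le 0 ]; then
        echo "miimon must be greater than 0" >&2
        return 1
    fi
    if [ $((delay % miimon)) -ne 0 ]; then
        echo "delay $delay is not a multiple of miimon $miimon" >&2
        return 1
    fi
    echo "ok: $delay ms = $((delay / miimon)) monitoring intervals"
}

check_delay 100 200
```

With miimon=100, a downdelay of 200 means the link has to be seen down on two consecutive checks before the NIC is disabled.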

fail_over_mac → specifies whether active-backup mode should set all slaves to the same MAC address at enslavement time. Possible values are:

a) none or 0: (default) sets all slaves to the same MAC address at enslavement time.

b) active or 1: the bond’s MAC address is set to that of the active slave. The backup slaves’ MAC addresses are not changed. Upon failover, the MAC of the bond is changed to that of the new active slave. This setting is useful (or necessary) in scenarios where the MAC addresses of NICs cannot be changed or incoming broadcasts from a device’s own MAC are refused. When we encounter such a scenario and are forced to use this setting, we should bear in mind that upon link failure every device on the network will have to be updated via gratuitous ARP and, if the ARP update is lost, traffic will be disrupted.

c) follow or 2: the bond acquires the MAC of the active slave and the backup slaves’ MACs are left untouched. Upon failover, the new and former active slaves switch MACs so that the bond’s address doesn’t need changing. This setting is useful in scenarios where having multiple devices with the same MAC address causes problems.
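
A simple way to see which fail_over_mac policy is in effect is to compare the MAC address of the bond with those of its slaves. The sketch below reads them from /sys/class/net; print_macs and the NET_ROOT override are names invented for this example, and the device names are the ones used earlier in this section.

```shell
#!/bin/sh
# print_macs DEV...: print "<device> <MAC>" for each device that exists.
# NET_ROOT is a hypothetical override so the helper can be pointed at a
# test directory; by default it reads the real /sys/class/net.
print_macs() {
    root=${NET_ROOT:-/sys/class/net}
    for dev in "$@"; do
        # Skip devices that are not present on this machine.
        if [ -r "$root/$dev/address" ]; then
            printf '%s %s\n' "$dev" "$(cat "$root/$dev/address")"
        fi
    done
}

print_macs bond0 ens1 ens2
```

With the default policy all three lines should show the same address; with active or follow the backup slave should keep a different one.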

lacp_rate → specifies the rate at which link partners should transmit LACPDU packets in 802.3ad mode. It can be set to:

a) slow or 0: (default) LACPDUs are sent every 30 seconds.

b) fast or 1: they’re sent once per second.
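
As an example of how the 802.3ad options fit together, a bond attached to an LACP-capable switch might use a line like the one below. The combination of values is an assumption for illustration; the switch ports must be configured for link aggregation as well.

```shell
# Fragment of /etc/sysconfig/network-scripts/ifcfg-bond0 (assumed values).
# Request LACPDUs every second and reselect the aggregator by bandwidth.
BONDING_OPTS="mode=802.3ad miimon=100 lacp_rate=fast ad_select=bandwidth"
```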

resend_igmp → specifies the number of IGMP membership reports to be issued upon a failover event. One report is sent immediately after a failover and subsequent ones are sent at 200ms intervals. The valid range of values is 0 to 255 and the default is 1. A value of 0 prevents IGMP membership reports from being sent.
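This option is useful when the mode is one of balance-rr, active-backup, balance-tlb or balance-alb, where a failover can move IGMP traffic from one slave to another and a fresh report is needed so that the switch forwards incoming IGMP traffic to the new slave.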

updelay → specifies in milliseconds how long to wait before enabling a link after its recovery has been detected. As with downdelay, it must be set to a multiple of the miimon value. The default is 0, which means the link is enabled without delay.

use_carrier → specifies whether “MII/ETHTOOL ioctls” or “netif_carrier_ok()” should be used to determine the link state. Most device drivers support the latter but the former can be used when that is not the case. If your link shows as UP when it is not, maybe you should change the value of this option from 1 (default use of netif_carrier_ok()) to 0 (use of MII/ETHTOOL ioctls).

xmit_hash_policy → selects the transmit hash policy used for slave selection in balance-xor and 802.3ad modes. It can be set to:

a) layer2 or 0: (default) uses XOR of hardware MACs to generate a hash, so all traffic to a given network peer is handled by the same slave. This setting is 802.3ad compliant.

b) layer3+4 or 1: uses upper layer protocol information to generate the hash, so traffic to a particular peer can be spread over multiple slaves. This setting is not fully 802.3ad compliant.

c) layer2+3 or 2: uses a combination of layer 2 and 3 protocol information to generate the hash. This setting is 802.3ad compliant.
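
To see why a MAC-based hash pins a peer to one slave, consider this deliberately simplified model of the layer2 idea: XOR the last byte of the source and destination MACs and take the result modulo the number of slaves. It is a sketch for intuition only, not the kernel's exact code; l2_slave_index and the MAC addresses are made up for the example.

```shell
#!/bin/sh
# l2_slave_index SRC_MAC DST_MAC SLAVE_COUNT
# Simplified layer2-style hash: XOR the last byte of both MACs, then take
# the result modulo the number of slaves. (Hypothetical helper; a model
# of the idea, not the kernel implementation.)
l2_slave_index() {
    src_last=${1##*:}    # last byte of the source MAC, e.g. "01"
    dst_last=${2##*:}    # last byte of the destination MAC
    echo $(( (0x$src_last ^ 0x$dst_last) % $3 ))
}

# The inputs never change for a given peer, so neither does the slave index.
l2_slave_index 52:54:00:aa:bb:01 52:54:00:cc:dd:04 2
l2_slave_index 52:54:00:aa:bb:01 52:54:00:cc:dd:07 2
```

Hashing on layer 3 and 4 fields adds IP addresses and ports to the inputs, which is why different connections to the same peer can land on different slaves.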

The options listed above are some of the most commonly used. For the full list of options available at the time of reading, see:

https://www.kernel.org/doc/Documentation/networking/bonding.txt
