Layer-3 Forwarding « ipSpace.net blog


The layer-2 forwarding and flooding in an MLAG cluster are intricate but still reasonably easy to understand. Layer-3 gets more interesting; its quirks depend heavily on the layer-2 implementation. While most MLAG implementations exhibit similar bridging behavior, expect interesting differences in routing behavior.

We’ll have to expand the by-now familiar network topology to cover layer-3 edge cases. We’ll still work with two switches in an MLAG cluster, but we’ll have an external router attached to both of them. The hosts connected to the switches belong to two subnets (red and blue).

Layer-3 MLAG topology


Forwarding Requirements

Before going into the details, let’s figure out what we should expect in a well-designed MLAG cluster providing layer-3 forwarding:

  • One of the main reasons to deal with MLAG complexity is redundancy – the traffic should keep flowing even if one of the MLAG cluster members crashes. That’s easy to do within a layer-2 segment; to keep inter-subnet traffic flowing, the MLAG cluster members have to share the IP and MAC address of the first-hop gateway.
  • We want active/active layer-3 forwarding across the MLAG cluster. For example, when A sends an IP packet to B, it might use the A-S1 or the A-S2 link. It would make no sense to send that packet over the S1-S2 peer link just to be routed by the other switch. The first-hop IP and MAC address must therefore be active on all MLAG cluster members.
  • MLAG cluster members must deal with misdirected traffic. In most designs, S1 and S2 advertise whole subnets (red and blue) to the external router. While it doesn’t matter whether the external router sends traffic for A or B to S1 or S2, S1 and S2 have to deal with traffic for X or Y arriving on the wrong switch.

First-Hop IP and MAC Address

I’m positive the “shared first-hop IP- and MAC address” requirement immediately triggered the “first-hop redundancy protocols (FHRP)” knee-jerk reaction, but that doesn’t have to be the case. Arista’s Virtual ARP (VARP) or Cumulus Linux Virtual Router Redundancy (VRR) – statically configured shared IP- and MAC address – are more than good enough, and are quite resilient against configuration errors.

Many other vendors insist on running HSRP or VRRP between MLAG cluster members, and Arista and Cumulus offer both options – what could be better than two ways of configuring the same thing?

Active/Active Forwarding

Many vendors took historic FHRP implementations that supported a single active forwarder and made them part of their MLAG solutions. It took years before they realized it’s perfectly fine to have all switches listen to the same MAC address. After all, if the MAC table on the ingress switch forwards a layer-2 packet with the FHRP destination MAC address to the layer-3 forwarding table, that same packet is not flooded to any other host (or the peer link), and there’s no danger of traffic duplication.
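Here is a minimal Python sketch of that ingress decision. It is a toy model, not any vendor’s implementation: the names `Frame`, `ingress_decision`, and `GATEWAY_MAC` are hypothetical, and the MAC address value is just an example in the VRRP virtual-MAC format.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple

# Hypothetical shared first-hop MAC address (example value only)
GATEWAY_MAC = "00:00:5e:00:01:01"


@dataclass(frozen=True)
class Frame:
    src_mac: str
    dst_mac: str
    in_port: str


def ingress_decision(
    frame: Frame, router_macs: Set[str], mac_table: Dict[str, str]
) -> Tuple[str, Optional[str]]:
    """Decide what the ingress switch does with a frame.

    Returns ("route", None) when the destination MAC belongs to the switch,
    ("bridge", port) for a known unicast MAC, and ("flood", None) otherwise.
    """
    if frame.dst_mac in router_macs:
        # Handed to the layer-3 forwarding table; never bridged or flooded
        return ("route", None)
    port = mac_table.get(frame.dst_mac)
    if port is not None:
        return ("bridge", port)
    return ("flood", None)


# Both MLAG members listen to the same gateway MAC address,
# so the frame is routed locally no matter which link host A picks.
for switch, port in (("S1", "A-S1"), ("S2", "A-S2")):
    frame = Frame("02:00:00:00:00:0a", GATEWAY_MAC, port)
    print(switch, ingress_decision(frame, {GATEWAY_MAC}, {}))
```

The point of the sketch is the first `if`: once the destination MAC matches, the frame leaves the bridging path, so two switches owning the same MAC address cannot produce duplicate copies.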

If you want to use active-active forwarding, turn off ICMP redirects on routed VLANs in an MLAG cluster. More details coming in another blog post.

Fast forward to 2022. Active/active forwarding is now a table-stakes MLAG feature, and there’s a good reason for that. Remember the “use outbound ACL to limit layer-2 flooding” trick? Here’s what happens when you try to use a single forwarder with it:

  • Let’s assume S1 is our dedicated forwarder, and S2 is just a layer-2 switch.
  • Assume A wants to send a packet to B (which is in a different subnet) and happens to send the packet to the VRRP MAC address (owned by S1) over the A-S2 link.
  • S2 forwards the packet (based on destination MAC address) to S1.
  • S1 routes the packet, and tries to send it to B.
  • The outbound ACL on the S1-B link drops the packet because S1 received it over the peer link.

You could solve that challenge with a more specific ACL (drop the packet if it came from the peer link and if the source MAC is not the router MAC), or you could do the right thing and implement active/active forwarding.
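The difference between the two ACL variants can be sketched in a few lines of Python. This is a toy model under stated assumptions: `EgressContext`, `simple_acl_permits`, `specific_acl_permits`, and the MAC address values are hypothetical, and the sketch assumes S1 rewrites the source MAC address to its router MAC when it routes a packet.

```python
from dataclasses import dataclass

PEER_LINK = "peer-link"
ROUTER_MAC = "00:00:5e:00:01:01"   # hypothetical router MAC used by S1
HOST_A_MAC = "02:00:00:00:00:0a"   # hypothetical MAC address of host A


@dataclass(frozen=True)
class EgressContext:
    ingress_port: str  # port on which S1 received the original frame
    src_mac: str       # source MAC of the frame as it leaves S1


def simple_acl_permits(ctx: EgressContext) -> bool:
    """Flooding-control ACL: drop everything that arrived over the peer link."""
    return ctx.ingress_port != PEER_LINK


def specific_acl_permits(ctx: EgressContext) -> bool:
    """Drop only peer-link traffic that was bridged (source MAC is not the router MAC)."""
    return not (ctx.ingress_port == PEER_LINK and ctx.src_mac != ROUTER_MAC)


# A -> S2 -> peer link -> S1, routed by S1 toward B (source MAC rewritten)
routed = EgressContext(ingress_port=PEER_LINK, src_mac=ROUTER_MAC)
# A frame S1 merely bridged from the peer link keeps the host source MAC
bridged = EgressContext(ingress_port=PEER_LINK, src_mac=HOST_A_MAC)

print("simple ACL, routed packet:   ", simple_acl_permits(routed))
print("specific ACL, routed packet: ", specific_acl_permits(routed))
print("specific ACL, bridged packet:", specific_acl_permits(bridged))
```

The simple ACL drops the routed packet from the single-forwarder scenario; the more specific one lets it through while still blocking frames bridged from the peer link.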

Finally, since VXLAN and EVPN became all the rage, many switches support anycast gateway that extends the shared IP- and MAC addresses across the whole VLAN. Isn’t it ironic that it took so long for everyone to eventually converge toward the most straightforward solution that has been known for ages?

ARP Handling

I already mentioned misdirected traffic: an external source might send an IP packet to a switch that’s not yet ready to forward it to a directly-connected host due to a missing ARP entry.

Many MLAG implementations use a control-plane protocol that synchronizes the ARP tables between MLAG cluster members to deal with that challenge; I still can’t figure out why they have to do it. After all, the worst that can happen is another ARP request and a dropped packet (or a few).
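To illustrate how small that worst case is, here is a toy Python model of an MLAG member without ARP synchronization. The class `ToySwitch`, its methods, and the addresses are hypothetical, and the sketch assumes a packet hitting an ARP miss is dropped rather than queued.

```python
from typing import Dict, Optional


class ToySwitch:
    """Hypothetical MLAG member with its own, unsynchronized ARP table."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.arp_table: Dict[str, str] = {}
        self.arp_requests = 0
        self.dropped = 0

    def route_to_host(self, dst_ip: str) -> Optional[str]:
        """Return the destination MAC address, or None if the packet is dropped."""
        mac = self.arp_table.get(dst_ip)
        if mac is None:
            # Assumption: an ARP miss costs one dropped packet and one ARP request
            self.dropped += 1
            self.arp_requests += 1
            return None
        return mac

    def receive_arp_reply(self, ip: str, mac: str) -> None:
        self.arp_table[ip] = mac


s2 = ToySwitch("S2")
first = s2.route_to_host("10.0.0.10")            # misdirected packet, no ARP entry yet
s2.receive_arp_reply("10.0.0.10", "02:00:00:00:00:0a")
second = s2.route_to_host("10.0.0.10")           # subsequent packets are forwarded
print(first, second, s2.dropped, s2.arp_requests)
```

In this model the cost of skipping ARP synchronization is one lost packet and one extra ARP request per host and switch.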

One might think (based on the active-active forwarding discussion) that you have to synchronize ARP entries due to misdirected ARP replies – S1 sends an ARP request to A, but A replies over the A-S2 link – but that doesn’t make sense.

An ARP reply is a unicast layer-2 frame, and is forwarded to the control plane ASIC port once it reaches the target switch, so it’s not hitting the LAG member outbound ACL. Furthermore, should that be a problem, we’d have a 50% chance of getting ARP to work in the first place, and probably a few unhappy unlucky customers.

It looks like most vendors decided that it doesn’t cost much to have ARP synchronization if they already implemented MAC synchronization, and just went with the flow.

Gateway Source MAC Address

MLAG implementations must deal with another glitch. Some devices (most notably storage devices, supposedly also some load balancers) build forwarding cache entries from the source IP- and MAC addresses of the incoming IP packets – a clear layering violation, probably also an RFC violation.

Let’s assume B is such a device. When A sends a packet to B over the A-S1 link, S1 routes the packet and forwards it to B with the S1 source MAC address. A sane IP host would ignore the source MAC address and send the return packet to the default gateway MAC address (after all, A is in a different subnet). B – an aggressively over-optimizing device – would build a forwarding entry from that packet saying “to send packets to A, use S1 MAC address”.

Now imagine that:

  • B happens to send the packet to A with the destination MAC address of S1 over the B-S2 link.
  • S2 would forward (bridge) the packet to the S2-S1 peer link.
  • S1 would receive and route the packet and forward it toward A over the direct link.
  • Depending on the MLAG implementation, the packet might be dropped due to the outbound ACL on the S1-A LAG member link (see the active/active forwarding section).

There are two ways to solve this conundrum:

  • The academically correct way: use switch-specific MAC addresses for local traffic, and send all forwarded traffic with the shared source MAC address. Whatever the crazy hosts decide to do can’t break active-active forwarding.
  • The usual kludge: make both switches process traffic sent to the shared MAC address and the local MAC addresses of both switches. This approach ensures no control-plane protocol (including ARP) will ever work out of the box, and requires tons of other kludges to make simple things like ARP work; the proof is left as an exercise for the reader.
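Both fixes can be compared with a small Python sketch. It is a toy model with hypothetical names (`cached_next_hop`, `crosses_peer_link`) and made-up MAC addresses; it only checks whether the return packet B sends over the B-S2 link has to cross the peer link.

```python
from typing import Set

SHARED_MAC = "00:00:5e:00:01:01"  # hypothetical shared gateway MAC
S1_MAC = "02:00:00:00:00:01"      # hypothetical switch-specific MACs
S2_MAC = "02:00:00:00:00:02"


def cached_next_hop(routed_src_mac: str) -> str:
    """An over-optimizing host caches the source MAC of the routed packet it received."""
    return routed_src_mac


def crosses_peer_link(dst_mac: str, local_router_macs: Set[str]) -> bool:
    """True if the receiving switch has to bridge the frame to its peer instead of routing it."""
    return dst_mac not in local_router_macs


# Problem case: S1 forwards with its own MAC, S2 routes only its own and the shared MAC
problem = crosses_peer_link(cached_next_hop(S1_MAC), {SHARED_MAC, S2_MAC})

# Academically correct way: forwarded traffic always carries the shared source MAC
shared_source = crosses_peer_link(cached_next_hop(SHARED_MAC), {SHARED_MAC, S2_MAC})

# Usual kludge: S2 also routes traffic sent to the S1 MAC address
kludge = crosses_peer_link(cached_next_hop(S1_MAC), {SHARED_MAC, S1_MAC, S2_MAC})

print(problem, shared_source, kludge)
```

Either fix keeps the return packet off the peer link in this model; the sketch says nothing about the control-plane side effects of the kludge described above.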

Why would anyone implement such a crazy kludge? Wouldn’t it be easier to send forwarded packets and locally-originated packets with different source MAC addresses? After all, some network operating systems (like Cisco IOS) frequently used different forwarding paths for local and forwarded traffic.

It looks like most data center switch vendors found it easier to push the locally-originated traffic to the switching ASIC and let it deal with the whole forwarding process (including outbound ACL). When using such an approach, the switch ASIC uses the same forwarding path for local traffic and traffic received on external ports, resulting in the same source MAC address after the layer-3 forwarding process rewrites the MAC header.

Thank You

There were many things that made little sense when I wrote the first draft of the blog post. A long chat with Dinesh Dutt cleared the MLAG fog, and suddenly everything made (some weird) sense. Dinesh, thanks a million for your time and the patience to help me figure it all out.

It goes without saying that all the errors left in the blog post are mine 😉

What’s Next?

We covered the basics of layer-2 and layer-3 forwarding in an MLAG cluster. Time for more interesting topics, starting with “how do we integrate MLAG with VXLAN?” – the topic of the next blog post in the MLAG Deep Dive series.

