DMVPN DualHub EIGRP Traffic Engineering

With the advance of VDSL, fiber, cable Internet and the appropriate SLAs, business Internet connections have become increasingly reliable. By choosing the local ISPs carefully, it becomes much more attractive for a company to replace its MPLS connections with an overlay network based on redundant Internet connections. As a result, businesses quite often obtain a higher-speed connection at a much lower price. One of the business cases I made in 2006/2007 showed a 70% decrease in annual costs, including high availability, compared to the company’s European WAN based on an MPLS service provider.
These kinds of overlay networks are quite often based on a hub-and-spoke technology: each spoke registers itself with the hub router, so the configuration of a spoke is relatively easy and adaptable for each site while the hub config doesn’t change. This concept scales very well. With certain technologies spoke-to-spoke traffic is handled automatically. The Cisco technology for this is called DMVPN (Dynamic Multipoint Virtual Private Network) and is used intensively in their IWAN solution. The most common routing protocol for these topologies is EIGRP, because of its dynamic routing and fast convergence.

Things get interesting when you start to use a dual-hub topology, so that a branch office can use two separate Internet connections. In these situations it can happen that WAN traffic flows over a flapping or unreliable Internet connection, or that traffic goes over the backup line, which is much slower than the primary connection. In this post I will explain why EIGRP does this and how you can change that behaviour within EIGRP.

Network Topology

The diagram below is a very common topology for a DMVPN based on a dual-hub / dual ISP solution. Each Hub is connected to a separate ISP (redundancy so that an ISP failure will not result in extra connectivity problems) and a branch office also has two ISP connections for the same redundancy.
Traffic from the WAN connections is terminated on a DMZ interface of a firewall, so that it can be inspected before it gets into the datacenter.
With this setup, if one ISP fails, the WAN traffic will flow via the other ISP connection.

Problem

If the two Internet connections from the spoke are identical (with regard to bandwidth and speed) then there is not really a problem; both DMVPN tunnels act in an active/active topology where traffic (ingress and egress) is load balanced. This is quite clear from the output of the command “show ip route eigrp” on the spoke router: the main office (10.0.0.0/24) is reachable via both hub routers.

 spoke1#sh ip route eigrp
  Codes: L - local, C - connected, S - static, R - RIP, M - mobile, B - BGP
  D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
  N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
  E1 - OSPF external type 1, E2 - OSPF external type 2
  i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
  ia - IS-IS inter area, * - candidate default, U - per-user static route
  o - ODR, P - periodic downloaded static route, H - NHRP, l - LISP
  a - application route
  + - replicated route, % - next hop override, p - overrides from PfR
  Gateway of last resort is 3.2.2.1 to network 0.0.0.0
  10.0.0.0/8 is variably subnetted, 8 subnets, 2 masks
  D        10.0.0.0/24 [90/26880512] via 10.255.2.1, 00:01:33, Tunnel2
                       [90/26880512] via 10.255.1.1, 00:01:33, Tunnel1
  D        10.255.0.0/24 [90/26880256] via 10.255.2.1, 00:03:49, Tunnel2
                         [90/26880256] via 10.255.1.1, 00:03:49, Tunnel1
  spoke1#

On the ASA, it is also visible that the spoke is reachable via two next hops. Bear in mind that the ASA load-balances per flow (i.e. per connection), not per packet.

asa-hq# sh route eigrp
Codes: L - local, C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       o - ODR, P - periodic downloaded static route, + - replicated route
Gateway of last resort is 2.2.8.1 to network 0.0.0.0
D        10.10.1.0 255.255.255.0 [90/1306112] via 10.255.0.24, 00:00:03, dmz
                                 [90/1306112] via 10.255.0.23, 00:00:03, dmz
D        10.255.1.0 255.255.255.0 [90/1305856] via 10.255.0.23, 00:00:03, dmz
D        10.255.2.0 255.255.255.0 [90/1305856] via 10.255.0.24, 00:00:03, dmz
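
The difference between per-flow and per-packet balancing over the two equal-cost next hops can be sketched in a few lines of Python. This is only an illustration of the idea: the hash, the function names and the example flow (source host 10.0.0.50, destination 10.10.1.20) are assumptions for the sketch and not the algorithm the ASA actually uses.

```python
import hashlib

# The two equal-cost next hops the ASA sees on its dmz interface.
NEXT_HOPS = ["10.255.0.23", "10.255.0.24"]

def pick_per_flow(src_ip, dst_ip, proto, src_port, dst_port, next_hops=NEXT_HOPS):
    """Pick a next hop from a hash of the 5-tuple (illustrative hash, not the ASA's)."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return next_hops[int.from_bytes(digest[:4], "big") % len(next_hops)]

def pick_per_packet(packet_index, next_hops=NEXT_HOPS):
    """Round-robin: consecutive packets alternate between the next hops."""
    return next_hops[packet_index % len(next_hops)]

# Hypothetical flow from a host at HQ to a host in the branch office.
flow = ("10.0.0.50", "10.10.1.20", "tcp", 51000, 443)

per_flow = {pick_per_flow(*flow) for _ in range(10)}
per_packet = {pick_per_packet(i) for i in range(10)}
print(len(per_flow), len(per_packet))  # prints "1 2"
```

All packets of one connection stay on one hub, which is exactly why a single bad path hurts: every flow that happens to land on it is affected for its whole lifetime.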

However, a problem will occur when the second Internet connection on the spoke is a backup connection with a lower speed (for example a 4G connection). Let’s change the bandwidth of tunnel2 (backup) to a lower value, so that traffic from the branch office will go via the primary connection (tunnel1):

configure terminal
interface tunnel2
bandwidth 4096
!
end
clear ip eigrp neighb

And now traffic is only flowing via Tunnel1, as the output demonstrates:

spoke1#sh ip route eigrp
 Codes: L - local, C - connected, S - static, R - RIP, M - mobile, B - BGP
 D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
 N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
 E1 - OSPF external type 1, E2 - OSPF external type 2
 i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
 ia - IS-IS inter area, * - candidate default, U - per-user static route
 o - ODR, P - periodic downloaded static route, H - NHRP, l - LISP
 a - application route
 + - replicated route, % - next hop override, p - overrides from PfR
 Gateway of last resort is 3.2.2.1 to network 0.0.0.0
 10.0.0.0/8 is variably subnetted, 8 subnets, 2 masks
 D        10.0.0.0/24 [90/1536512] via 10.255.1.1, 00:00:04, Tunnel1
 D        10.255.0.0/24 [90/1536256] via 10.255.1.1, 00:00:04, Tunnel1
 spoke1#
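
The metrics in the spoke outputs can be checked with the classic EIGRP composite metric, assuming default K-values (only bandwidth and delay count). The sketch below rests on inferred values that are not shown in the post: a total path delay of 50020 microseconds, a tunnel bandwidth of 100 Kbit for the first output, and a bandwidth of 10000 Kbit on Tunnel1 for the second output. These are simply the values for which the formula reproduces the metrics shown above.

```python
def classic_metric(min_bandwidth_kbit, total_delay_usec):
    """Classic EIGRP composite metric with default K-values (K1 = K3 = 1)."""
    bandwidth_term = 10**7 // min_bandwidth_kbit   # scaled inverse of the slowest link
    delay_term = total_delay_usec // 10            # delay in tens of microseconds
    return 256 * (bandwidth_term + delay_term)

# Assumption: path delay inferred from the metrics, not shown in the post.
PATH_DELAY_USEC = 50020

first_output = classic_metric(100, PATH_DELAY_USEC)      # both tunnels, first output
via_tunnel1 = classic_metric(10000, PATH_DELAY_USEC)     # Tunnel1, inferred bandwidth
via_tunnel2 = classic_metric(4096, PATH_DELAY_USEC)      # Tunnel2 after "bandwidth 4096"

print(first_output, via_tunnel1, via_tunnel2)
print("preferred:", "Tunnel1" if via_tunnel1 < via_tunnel2 else "Tunnel2")
```

Under these assumptions the path via Tunnel2 gets a higher metric than the installed route via Tunnel1, so only Tunnel1 ends up in the routing table of the spoke.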

However, the problem is not at the branch, it is at the hub. At the ASA-HQ traffic for the branch (10.10.1.0/24) is still being directed to both hubs!

asa-hq# show route eigrp
 Codes: L - local, C - connected, S - static, R - RIP, M - mobile, B - BGP
 D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
 N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
 E1 - OSPF external type 1, E2 - OSPF external type 2
 i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
 ia - IS-IS inter area, * - candidate default, U - per-user static route
 o - ODR, P - periodic downloaded static route, + - replicated route
 Gateway of last resort is 2.2.8.1 to network 0.0.0.0
 D        10.10.1.0 255.255.255.0 [90/1306112] via 10.255.0.24, 00:08:25, dmz
                                  [90/1306112] via 10.255.0.23, 00:08:25, dmz
 D        10.255.1.0 255.255.255.0 [90/1305856] via 10.255.0.23, 00:09:13, dmz
 D        10.255.2.0 255.255.255.0 [90/1305856] via 10.255.0.24, 00:09:13, dmz

So, traffic from HQ to the branch is still sent over the slower link or, in case of a problem on the Internet, via an unreliable path. As a result the WAN connection has become unreliable, even though we have redundancy.

The cause of this problem lies within EIGRP itself. EIGRP is a very nice (and now open) protocol: an advanced distance-vector protocol that is often called a hybrid because it borrows some link-state features. The bandwidth of an interface is one of the metrics used to determine the best path to a destination. The bandwidth configured on the spoke’s tunnel2 only influences the routes the spoke itself learns over that tunnel; it is not part of the metric that the hubs calculate for the branch network. The result is, in fact, quite logical: from the perspective of the ASA-HQ the bandwidth towards the branch office is the same via both hubs, because hub1 and hub2 are connected with the same bandwidth. The cost for 10.10.1.0/24 is therefore the same via both paths. In summary, the ASA doesn’t “know” that tunnel2 for that specific branch is slower. The following output shows that the EIGRP metrics are the same for both hubs.

asa-hq# show eigrp top
EIGRP-IPv4 Topology Table for AS(1906)/ID(10.255.0.1)
Codes: P - Passive, A - Active, U - Update, Q - Query, R - Reply,
       r - reply Status, s - sia Status

P 10.255.2.0 255.255.255.0, 1 successors, FD is 1305856
        via 10.255.0.24 (1305856/1305600), dmz
P 10.0.0.0 255.255.255.0, 1 successors, FD is 2816
        via Connected, inside
P 10.255.1.0 255.255.255.0, 1 successors, FD is 1305856
        via 10.255.0.23 (1305856/1305600), dmz
P 10.255.0.0 255.255.255.0, 1 successors, FD is 2816
        via Connected, dmz
P 10.10.1.0 255.255.255.0, 2 successors, FD is 1306112
        via 10.255.0.23 (1306112/1305856), dmz
        via 10.255.0.24 (1306112/1305856), dmz
asa-hq#
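
Why both paths end up with the same metric can be made visible by following the route for the branch network hop by hop. The sketch below is a simplified model under stated assumptions: the classic metric with default K-values, and interface values (1 Gbit and 10 microseconds for the LAN-type interfaces, 100000 Kbit and 50000 microseconds for the hub tunnel) that are inferred from the metrics in the outputs rather than taken from any configuration. Each router folds in the interface on which it receives the route.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interface:
    name: str
    bandwidth_kbit: int
    delay_usec: int

def classic_metric(min_bandwidth_kbit, total_delay_usec):
    """Classic EIGRP composite metric with default K-values (K1 = K3 = 1)."""
    return 256 * (10**7 // min_bandwidth_kbit + total_delay_usec // 10)

def receive(path, interface):
    """Fold the receiving interface into the (min bandwidth, total delay) of a path."""
    min_bw, delay = path
    return (min(min_bw, interface.bandwidth_kbit), delay + interface.delay_usec)

# Assumed interface values, inferred from the metrics shown in the outputs.
spoke_lan = Interface("spoke LAN (10.10.1.0/24)", 1_000_000, 10)
hub_tunnel = Interface("hub tunnel", 100_000, 50_000)
asa_dmz = Interface("ASA dmz", 1_000_000, 10)

at_spoke = (spoke_lan.bandwidth_kbit, spoke_lan.delay_usec)  # connected on the spoke
at_hub = receive(at_spoke, hub_tunnel)                       # hub learns it over its tunnel
at_asa = receive(at_hub, asa_dmz)                            # ASA learns it on the dmz

for name, path in [("spoke", at_spoke), ("hub", at_hub), ("asa", at_asa)]:
    print(name, classic_metric(*path))
```

The bandwidth of the spoke’s own tunnel interface never enters this calculation, so lowering it on the spoke changes nothing in what the hubs and the ASA compute for 10.10.1.0/24.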

It is of course possible to lower the bandwidth on hub2, so that traffic is preferred over hub1. But that only works as long as no spoke has a problem on its connection to hub1 and needs to be directed via hub2, and it becomes impractical when there are hundreds of spokes connected to hub2.

Solution

It is still possible to inform the ASA that the traffic for a specific branch office should be directed via a specific hub router. As the bandwidth of the hub router cannot be changed, another attribute needs to be changed for specific destinations to enforce a different preferred path while keeping the flexibility and redundancy within EIGRP.

For this we will use the administrative distance in combination with an access-list. Cisco assigns an administrative distance to each routing protocol to decide which routes are put in the RIB; the route with the lowest distance wins. For EIGRP the administrative distance is 90 for internal routes and 170 for external routes (routes that are injected into EIGRP).

In this topology all routes to the spokes are internal and have a distance of 90.

In Cisco IOS you can use the distance command to change the administrative distance of routes learned from neighbors in a specific IP subnet (here the tunnel network on which the spokes connect). By attaching a standard access-list to this command, only the networks that match the access-list and are learned from that subnet get the new distance.

The configuration on IOS would then be:

access-list 99 permit 127.0.0.1
router eigrp 1906
  network 10.255.0.0 0.0.0.255
  network 10.255.1.0 0.0.0.255
  network 10.255.2.0 0.0.0.255
  passive-interface default
  no passive-interface Tunnel1
  no passive-interface GigabitEthernet0/2
  distance 95 10.255.1.0 0.0.0.255 99
!

The standard access-list 99 is used to match the networks (route prefixes) whose distance should change. The entry for 127.0.0.1 is a placeholder: it ensures that access-list 99 always remains in the configuration, even when we don’t want to do any traffic engineering.
Within the EIGRP configuration, the command distance 95 10.255.1.0 0.0.0.255 99 is added. This tells the router that for all networks that match access-list 99 and are learned from a neighbor in IP network 10.255.1.0/24, the distance needs to be set to 95. On the hub whose spokes connect on 10.255.2.0/24, the command references that subnet instead.
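
The matching logic of this distance command can be modelled in Python as a rough sketch. It assumes the spoke network 10.10.1.0/24 has already been added to access-list 99 (as is done further down), and it uses a hypothetical spoke neighbor address 10.255.1.11; the function names are made up for the illustration and this is not how IOS implements it internally.

```python
import ipaddress

def wildcard_match(address, base, wildcard):
    """Cisco-style wildcard match: bits set in the wildcard are ignored."""
    a = int(ipaddress.IPv4Address(address))
    b = int(ipaddress.IPv4Address(base))
    care = ~int(ipaddress.IPv4Address(wildcard)) & 0xFFFFFFFF
    return (a & care) == (b & care)

# access-list 99: the placeholder entry plus the spoke network (assumed already added).
ACL_99 = [("127.0.0.1", "0.0.0.0"), ("10.10.1.0", "0.0.0.255")]

def administrative_distance(route_network, learned_from):
    """Model of: distance 95 10.255.1.0 0.0.0.255 99 (EIGRP internal default is 90)."""
    from_tunnel = wildcard_match(learned_from, "10.255.1.0", "0.0.0.255")
    in_acl = any(wildcard_match(route_network, base, wc) for base, wc in ACL_99)
    return 95 if (from_tunnel and in_acl) else 90

# Spoke network learned over the tunnel from a (hypothetical) spoke neighbor.
print(administrative_distance("10.10.1.0", "10.255.1.11"))  # prints 95
# Same network learned from the other hub on the DMZ segment.
print(administrative_distance("10.10.1.0", "10.255.0.24"))  # prints 90
# A network that is not in access-list 99, learned over the tunnel.
print(administrative_distance("10.20.1.0", "10.255.1.11"))  # prints 90
```

Both conditions have to hold, which is what keeps the change limited to one branch: other spokes on the same tunnel keep distance 90.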

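When you now dynamically add the network of a spoke to the access-list on the hub that needs to become the backup router and clear that specific neighbor, the distance of that route on the hub changes to 95. The administrative distance itself is local to the hub and is not advertised, but the hub now prefers the path via the other hub (distance 90) and therefore stops offering its own tunnel path to the ASA-HQ. As a result the routing on the ASA changes as well.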
In this example I want to route traffic over hub1, so I need to change hub2 with the following config:

hub2#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
hub2(config)#access-list 99 permit 10.10.1.0 0.0.0.255
hub2(config)#exit
*Jan 26 14:39:36.008: %SYS-5-CONFIG_I: Configured from console by admin on c                             
hub2#
hub2#clear ip eigrp neighb 10.255.2.11
hub2#

And when checking the EIGRP routes, traffic to the branch is now preferred via hub1, even on hub2 itself!

hub2#sh ip route eigrp
Codes: L - local, C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       o - ODR, P - periodic downloaded static route, H - NHRP, l - LISP
       a - application route
       + - replicated route, % - next hop override, p - overrides from PfR
 
Gateway of last resort is 4.4.11.1 to network 0.0.0.0
 
      10.0.0.0/8 is variably subnetted, 7 subnets, 2 masks
D        10.0.0.0/24 [90/3072] via 10.255.0.1, 00:03:15, GigabitEthernet0/2
D        10.10.1.0/24
           [90/1306112] via 10.255.0.23, 00:03:15, GigabitEthernet0/2
D        10.255.1.0/24
           [90/1305856] via 10.255.0.23, 00:03:15, GigabitEthernet0/2
hub2#

The route via the DMVPN tunnel is of course still valid and still present in the EIGRP topology table, but because of its higher administrative distance (95 instead of 90) it is not installed in the RIB.

hub2#show ip eigrp top 10.10.1.0/24
EIGRP-IPv4 Topology Entry for AS(1906)/ID(10.255.2.1) for 10.10.1.0/24
  State is Passive, Query origin flag is 1, 1 Successor(s), FD is 1306112
  Descriptor Blocks:
  10.255.0.23 (GigabitEthernet0/2), from 10.255.0.23, Send flag is 0x0
      Composite metric is (1306112/1305856), route is Internal
      Vector metric:
        Minimum bandwidth is 100000 Kbit
        Total delay is 50020 microseconds
        Reliability is 255/255
        Load is 1/255
        Minimum MTU is 1460
        Hop count is 2
        Originating router is 192.168.2.1
  10.255.2.11 (Tunnel2), from 10.255.2.11, Send flag is 0x0
      Composite metric is (1305856/2816), route is Internal
      Vector metric:
        Minimum bandwidth is 100000 Kbit
        Total delay is 50010 microseconds
        Reliability is 255/255
        Load is 1/255
        Minimum MTU is 1460
        Hop count is 1
        Originating router is 192.168.2.1
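
The topology entry shows that the tunnel path actually has the lower metric, yet it loses. A minimal model of that selection, assuming the router compares administrative distance first and only then the metric (a simplification of what IOS does between EIGRP and the RIB), looks like this. The distances 90 and 95 come from the configuration above; the class and function names are made up for the sketch.

```python
from typing import NamedTuple

class Candidate(NamedTuple):
    next_hop: str
    interface: str
    distance: int   # administrative distance
    metric: int     # EIGRP composite metric

def best_route(candidates):
    """Lowest administrative distance wins; the metric only breaks ties."""
    return min(candidates, key=lambda c: (c.distance, c.metric))

# The two paths hub2 has for 10.10.1.0/24, taken from the topology entry.
candidates = [
    Candidate("10.255.0.23", "GigabitEthernet0/2", 90, 1306112),  # via hub1
    Candidate("10.255.2.11", "Tunnel2", 95, 1305856),             # via its own tunnel
]

winner = best_route(candidates)
print(winner.next_hop, winner.interface)  # prints "10.255.0.23 GigabitEthernet0/2"
```

Removing the spoke network from access-list 99 puts both candidates back at distance 90, after which the lower metric of the tunnel path decides again.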

And the ASA will also route traffic for the branch office (10.10.1.0/24) via hub1:

asa-hq# sh route
 
Codes: L - local, C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       o - ODR, P - periodic downloaded static route, + - replicated route
Gateway of last resort is 2.2.8.1 to network 0.0.0.0
 
S*    0.0.0.0 0.0.0.0 [1/0] via 2.2.8.1, outside
C        2.2.8.0 255.255.255.248 is directly connected, outside
L        2.2.8.3 255.255.255.255 is directly connected, outside
C        10.0.0.0 255.255.255.0 is directly connected, inside
L        10.0.0.1 255.255.255.255 is directly connected, inside
D        10.10.1.0 255.255.255.0 [90/1306112] via 10.255.0.23, 00:01:41, dmz
C        10.255.0.0 255.255.255.0 is directly connected, dmz
L        10.255.0.1 255.255.255.255 is directly connected, dmz
D        10.255.1.0 255.255.255.0 [90/1305856] via 10.255.0.23, 00:03:15, dmz
D        10.255.2.0 255.255.255.0 [90/1305856] via 10.255.0.24, 00:02:40, dmz
 
asa-hq#

With this, a potentially complex situation is easily fixed. If you have the feeling that a specific WAN connection for a specific branch office is acting up, you can force the traffic onto a specific path just by adding the network to, or removing it from, the access-list.
I’ve used this method quite regularly to determine whether a provider is having packet loss, latency, out-of-order packets, or other network-related problems.
