IPsecHub+. High Availability and Dynamic Routing

15:45
27.05.2025
Gidral88

Hello everyone! Nikolay Edomsky here, Head of the Network Engineering Group at EDINOM TSUPIS.

This is the fifth article in the "IPsecHub+" series.

In the previous articles, we explored various functions of our IPsec concentrator. One question that is crucial for any serious topology was left out, however: fault tolerance. In this article, we will look at ways to make our concentrator fault tolerant.

Dynamic Routing

What kind of fault tolerance would be complete without dynamic routing, you might ask. And you would be right. All the previous examples used only static routes. In an enterprise setting such a configuration is impractical, so I will not elaborate on why dynamic routing is needed.

Implementing any dynamic routing protocol on our IPsec concentrator is straightforward, because the topology is built on full-fledged logical interfaces: GRE, VTI, veth, and VLAN. I suggest using BGP on the concentrator, since it offers the most flexible configuration.

All we need to do is assign an autonomous system to each of our VRFs and create a BGP connection between all participating interfaces.

Here’s how it will look:

Fig. 1. Prefix distribution points via BGP.

The diagram shows that we have connected all the nodes along the traffic path with BGP peering. The general concept of dynamic routing is as follows:

ipsecFROM sends the DC routes to the target VRFs of the branches.
ipsecTO receives branch routes from their target VRFs.
The DC routes reach the branches via GRE from the target VRF.
The firewall still acts as an inter-VRF router.

The arrows in the diagram show the direction of prefix distribution. Each VRF should also be assigned its own AS. This avoids many problems with AS-path loop detection and with administrative distance (AD), since iBGP and eBGP routes have different AD values.
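To illustrate why a separate AS per VRF matters, here is a minimal Python sketch of eBGP loop detection: a speaker drops any route whose AS path already contains its own AS. The AS numbers and the prefix are hypothetical and do not come from the diagram.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: Tuple[int, ...]  # leftmost AS is the most recent hop


def advertise(route: Route, sender_as: int) -> Route:
    """eBGP export: the sender prepends its own AS to the AS path."""
    return Route(route.prefix, (sender_as,) + route.as_path)


def accept(route: Route, local_as: int) -> bool:
    """eBGP import: reject the route if our own AS is already in the path."""
    return local_as not in route.as_path


# Hypothetical private AS numbers, one per VRF.
BRANCH_VRF_AS = 65010
IPSEC_TO_AS = 65001

# A branch prefix (assumed) exported by the branch VRF.
from_branch_vrf = advertise(Route("10.128.1.0/24", ()), BRANCH_VRF_AS)

# Distinct AS per VRF: the ipsecTO VRF accepts the route.
print(accept(from_branch_vrf, IPSEC_TO_AS))    # True

# If ipsecTO reused the branch VRF's AS, the same route would be rejected.
print(accept(from_branch_vrf, BRANCH_VRF_AS))  # False
```

With shared AS numbers you would have to work around this check on every session; unique AS numbers keep every session plain eBGP.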

Most importantly, always remember the key rules of dynamic routing in an escalator topology.

Escalators to the branches only push traffic to the specific branch and nowhere else.
Escalators in the DC only lift traffic to the firewall and nowhere else.

It is crucial to be very strict about which prefixes you send and receive on ipsecTO and ipsecFROM. The entire scheme could break if, for example, you send branch routes on ipsecFROM. The direction of distribution should be strictly as shown in the diagram below. We send the branch prefixes via ipsecTO, and the DC prefixes via ipsecFROM.
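The filtering rule can be sketched as a simple policy check. This is only an illustration in Python, not a router configuration: the two supernets are a hypothetical address plan, and `permitted` is a name invented for this example.

```python
from ipaddress import ip_network

# Hypothetical address plan: one supernet for the DC, one for all branches.
DC_SUPERNET = ip_network("10.0.0.0/16")
BRANCH_SUPERNET = ip_network("10.128.0.0/9")

# What each transit VRF is allowed to carry.
ALLOWED = {
    "ipsecTO": BRANCH_SUPERNET,  # branch prefixes only
    "ipsecFROM": DC_SUPERNET,    # DC prefixes only
}


def permitted(vrf: str, prefix: str) -> bool:
    """Return True if `prefix` may be sent or received in `vrf`."""
    return ip_network(prefix).subnet_of(ALLOWED[vrf])


print(permitted("ipsecTO", "10.128.1.0/24"))    # True: branch prefix on ipsecTO
print(permitted("ipsecFROM", "10.0.5.0/24"))    # True: DC prefix on ipsecFROM
print(permitted("ipsecFROM", "10.128.1.0/24"))  # False: branch prefix must not appear here
```

On a real concentrator the same idea would be expressed with prefix lists or route maps applied to the BGP sessions in each VRF.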

The situation shown in the image below is unacceptable. A route from the green branch VRF leaks through the ipsecFROM VRF into the red branch VRF. Traffic between the branches along that path is never returned to the firewall, so the firewall sees asymmetric traffic. This would ruin the scheme.

Fault Tolerance

Now that BGP is running on our concentrator, we can discuss the basic fault tolerance of our solution. It will consist mainly of hardware redundancy for the concentrators.

We will discuss redundancy at the branches a little later.

What are the main tasks of redundancy?

We need to provide redundancy for the point that accepts traffic from the firewall. Since the firewall operates only with static routing, we need to present it with a next hop that is always able to forward traffic to the branches. In the simplest case, this could be a VRRP domain (floating IP address).
We need to provide redundancy for the point that accepts traffic from the branches. For this, we add a second GRE tunnel on the branch router, leading to the second IPsec concentrator in the data center.

Let’s place another IPsec concentrator and integrate it into our topology. It will be enough to clone the configuration of the already existing environment. The only changes needed will be the external address of the concentrator where the IPsec tunnel terminates, and the endpoints of the GRE tunnel with the corresponding GRE interface addresses. The veth interface addresses can remain unchanged, since they are not involved in interacting with hosts outside the internal loop of the concentrator.

Let’s see what this scheme looks like. We are now dealing with quite complex scenarios, and their diagrams are unavoidably cumbersome, so it is worth opening the image at full size.

Fig. 3. Fault tolerance of concentrators.

We see that in the ipsecTO VRF we have organized a simple VRRP domain with a floating address. The static route to branch networks on the firewall will point to this address.
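As a reminder of how the floating address moves, here is a simplified Python model of VRRP master election: among the routers that are alive, the highest priority wins, and a tie is broken by the higher interface address. Preemption settings and timers are ignored, and the names, addresses, and priorities are assumptions made up for the example.

```python
from dataclasses import dataclass
from ipaddress import ip_address
from typing import List, Optional


@dataclass(frozen=True)
class VrrpRouter:
    name: str
    address: str   # interface address in the ipsecTO VRF (assumed)
    priority: int  # higher priority wins the election
    alive: bool = True


def elect_master(routers: List[VrrpRouter]) -> Optional[VrrpRouter]:
    """Pick the live router with the highest priority; break ties by higher address."""
    alive = [r for r in routers if r.alive]
    if not alive:
        return None
    return max(alive, key=lambda r: (r.priority, ip_address(r.address)))


hub1 = VrrpRouter("hub1", "192.0.2.2", priority=200)
hub2 = VrrpRouter("hub2", "192.0.2.3", priority=100)
print(elect_master([hub1, hub2]).name)  # hub1

# hub1 fails: the floating address moves to hub2.
hub1_down = VrrpRouter("hub1", "192.0.2.2", priority=200, alive=False)
print(elect_master([hub1_down, hub2]).name)  # hub2
```

The firewall's static route keeps pointing at the floating address in both cases, so nothing changes on the firewall side.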

BGP peering in ipsecTO

There is a very important point in such a configuration. Note that we also established BGP peering between the interfaces of the IPsec concentrators in the ipsecTO VRF. Why was this done?

If the IPsec concentrator loses the IPsec tunnel to a branch while being the VRRP master, it will not be able to route packets going to the branch, since it will no longer receive routes through the GRE tunnel. To prevent this, we established BGP peering between both IPsec concentrators. If at least one tunnel to a branch is operational, packets will be routed correctly. The VRRP master will receive the route to the branch through the neighboring IPsec concentrator if its own tunnel fails for any reason.
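The fallback logic can be modeled in a few lines of Python. This is a deliberately simplified sketch: the `preference` field is a hypothetical stand-in for whatever BGP attributes make the concentrator's own tunnel the preferred path, and the path names are invented for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Path:
    via: str
    preference: int  # stand-in for BGP best-path attributes; higher wins
    up: bool = True


def best_path(paths: List[Path]) -> Optional[Path]:
    """Return the most preferred path that is still available, if any."""
    candidates = [p for p in paths if p.up]
    return max(candidates, key=lambda p: p.preference, default=None)


# Routes to the branch as seen by the VRRP master.
own_tunnel = Path("own GRE tunnel", preference=200)
via_peer = Path("neighboring concentrator", preference=100)

print(best_path([own_tunnel, via_peer]).via)  # own GRE tunnel

# The master's own tunnel fails; the route learned from the peer takes over.
failed = Path("own GRE tunnel", preference=200, up=False)
print(best_path([failed, via_peer]).via)  # neighboring concentrator
```

Without the peering between the concentrators, the second case would leave the master with no route at all while it still owns the floating address.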

Fault tolerance and NAT

The topology becomes much more complex if we are providing redundancy for schemes involving NAT. The main issue is that when there is an alternate route to the data center and to the branches, there is a risk of requests going through one IPsec concentrator and responses coming back through another. An example of such a request can be seen in the diagram below.

Fig. 4. The problem of asymmetry with NAT.

Since the NAT translation entry is created only on the hub that receives the request, the response must also pass through that hub. This can be achieved in different ways, but the most reliable approach is as follows.

Terminate requests on one IPsec hub with one network, and on the other IPsec hub with another.
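The idea can be sketched in Python as follows, assuming a 1:1 prefix translation. The real DC network and the first hub's surrogate network (100.64.0.0/24) are assumptions for illustration; only 100.65.0.0/24 is taken from this article.

```python
from ipaddress import ip_address, ip_network

# Hypothetical real DC network.
DC_NET = ip_network("10.0.0.0/24")

# One surrogate network per hub.
SURROGATE = {
    "hub1": ip_network("100.64.0.0/24"),  # assumed
    "hub2": ip_network("100.65.0.0/24"),  # surrogate network of the second hub
}


def translate(hub: str, src: str) -> str:
    """Map a DC source address 1:1 into the surrogate network of `hub`."""
    offset = int(ip_address(src)) - int(DC_NET.network_address)
    if not 0 <= offset < DC_NET.num_addresses:
        raise ValueError(f"{src} is not in {DC_NET}")
    return str(SURROGATE[hub].network_address + offset)


# The same DC host looks different to the branch depending on the hub used.
print(translate("hub1", "10.0.0.25"))  # 100.64.0.25
print(translate("hub2", "10.0.0.25"))  # 100.65.0.25
```

Because the translated address already encodes the hub, the branch can send the reply back through the same hub that holds the NAT state.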

We add a new surrogate network, 100.65.0.0/24, which will be used to terminate connections from the data center through the second hub. We configure the corresponding NAT rules on the second IPsec hub:

Fig. 5. Closing the right data center with a separate surrogate subnet.

The configuration for routes in the overall scheme will be as follows. The main thing is to ensure the correct traffic direction to the surrogate replacement prefixes.

The dynamic routing configuration for the NAT scheme will look like this. We will need to send the corresponding covering networks to the branch from each hub.
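From the branch's point of view, the result can be modeled as an ordinary longest-prefix-match lookup. In this Python sketch, the first hub's prefix (100.64.0.0/24) is an assumption, and the function names are invented for illustration.

```python
from ipaddress import ip_address, ip_network
from typing import Dict, List, Optional, Tuple

# Covering prefixes each hub advertises to the branch.
ADVERTISED: Dict[str, List[str]] = {
    "hub1": ["100.64.0.0/24"],  # assumed surrogate network of the first hub
    "hub2": ["100.65.0.0/24"],  # surrogate network of the second hub
}


def build_rib(advertised: Dict[str, List[str]]) -> List[Tuple]:
    """Flatten per-hub advertisements into (network, hub) routes on the branch."""
    return [(ip_network(p), hub) for hub, prefixes in advertised.items() for p in prefixes]


def lookup(rib: List[Tuple], dst: str) -> Optional[str]:
    """Longest-prefix match: return the hub whose tunnel carries traffic to `dst`."""
    matches = [(net, hub) for net, hub in rib if ip_address(dst) in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


rib = build_rib(ADVERTISED)
print(lookup(rib, "100.64.0.25"))  # hub1
print(lookup(rib, "100.65.0.25"))  # hub2
```

A reply addressed to a surrogate address therefore always returns to the hub that created the NAT entry, regardless of which tunnel the branch would otherwise prefer.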

Summary

Let’s sum up this iteration of building our scheme. Which requirements for our topology have we met this time?

~~Traffic between branches must pass through a centralized firewall.~~
~~Encrypted traffic between the data center and branches must pass through a centralized firewall.~~
~~Tunnels must terminate on one IPsec hub.~~
~~The solution should be technically flexible and allow for different tunnel types in various configurations (VTI, GRE…).~~
~~The solution should be flexibly managed.~~
~~The solution must support dynamic routing.~~

The requirement is met. We implemented dynamic routing in our scheme using the BGP protocol.

~~The solution must not involve the firewall in dynamic routing.~~

The requirement is met. We implemented dynamic routing without involving the firewall in the BGP process.

~~The solution must be fault tolerant.~~

This one’s covered too. Our solution can now be considered fault tolerant. We have provided a hardware backup node for the IPsec hub.

The solution must not be proprietary.
The solution should be scalable.

We only have two points left. We’ll cover them in the next article in the series.

Thank you for your attention, and see you next time!
