Is external SSL offloading necessary if Intel has already integrated it directly into the CPU?

18:36
17.06.2025
DSol

HTTPS has become the standard — and with it, the load on infrastructure has increased. Traffic decryption puts a serious strain on the CPU, which is why many still use external SSL accelerators with ASIC chips. But the release of 4th-generation Intel Xeon server processors has changed everything.

Today, HTTPS is not a recommendation but a de facto standard. It’s a secure version of HTTP, built on top of the TLS protocol. It protects users from data interception, man-in-the-middle attacks, and a myriad of other threats.

This is a huge step forward for internet security — but it comes at a price. Infrastructure costs have risen: backend servers are finding it increasingly difficult to cope with the volume and complexity of encrypted traffic. The main reasons are:

Explosive growth of HTTPS. TLS used to be applied only to login or payment pages; now virtually everything is encrypted. According to the Google Transparency Report, the share of HTTPS exceeded 90% in most browsers and regions in 2024.
Increasingly complex cryptography. While TLS 1.0 and 1.1 were relatively light on the CPU, TLS 1.2 and especially TLS 1.3 brought more resource-intensive mechanisms: mandatory ephemeral key exchange via ECDHE (for example, over the X25519 curve), the abandonment of static RSA key exchange in favor of algorithms that provide forward secrecy, and widespread use of AEAD ciphers (for example, AES-GCM and ChaCha20-Poly1305), which encrypt data and verify its integrity in a single operation.
More connections per user. Modern web applications constantly exchange data with the server: background updates, push notifications, WebSocket channels. This increases the number of TLS sessions and HTTPS requests from a single client.
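To see which of these mechanisms a given TLS stack actually offers, the cipher list can be inspected directly. The following is a minimal sketch using Python's standard `ssl` module; the helper name `describe_ciphers` is ours, and the output depends on the local OpenSSL build, so no particular list should be expected.

```python
import ssl


def describe_ciphers(context):
    """Return (protocol, cipher name, key exchange, is_aead) for each enabled cipher."""
    rows = []
    for cipher in context.get_ciphers():
        rows.append((
            cipher["protocol"],          # e.g. "TLSv1.2" or "TLSv1.3"
            cipher["name"],              # OpenSSL cipher suite name
            cipher.get("kea", "?"),      # key exchange; TLS 1.3 suites do not pin one
            cipher.get("aead", False),   # True for AES-GCM, ChaCha20-Poly1305, etc.
        ))
    return rows


# The default client context reflects what the local OpenSSL build enables.
ctx = ssl.create_default_context()
for protocol, name, kea, aead in describe_ciphers(ctx):
    print(f"{protocol:8} {name:40} {kea:10} AEAD={aead}")
```

On current OpenSSL builds every TLS 1.3 suite in this list is an AEAD cipher, which matches the trend described above.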

What does this mean in practice?

Switching from HTTP to HTTPS can increase backend server CPU load by 40–80% — especially with thousands of simultaneous connections, short sessions, and the use of RSA certificates. This load scales with traffic growth and can become a bottleneck.
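A rough way to reason about this load is to convert the handshake rate into CPU cores. The sketch below is plain arithmetic; the function name and all input numbers are hypothetical placeholders, not measurements, and real per-handshake cost depends on key type, key size, and hardware.

```python
import math


def cores_for_handshakes(handshakes_per_sec, cpu_ms_per_handshake, target_utilization=0.7):
    """Estimate the cores needed to absorb TLS handshakes at a given rate.

    handshakes_per_sec   -- new TLS connections per second (assumed input)
    cpu_ms_per_handshake -- CPU time one handshake costs, in milliseconds (assumed input)
    target_utilization   -- fraction of each core we are willing to keep busy
    """
    busy_cores = handshakes_per_sec * cpu_ms_per_handshake / 1000.0
    return math.ceil(busy_cores / target_utilization)


# Hypothetical inputs: 20,000 new connections per second.
print(cores_for_handshakes(20_000, 1.0))    # at 1.0 ms of CPU per handshake
print(cores_for_handshakes(20_000, 0.25))   # if the per-handshake cost drops to 0.25 ms
```

With these placeholder inputs the estimate falls from 29 cores to 8, which illustrates why the per-handshake cost matters more than anything else in this calculation.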

The answer: SSL offloading to external devices

To relieve backend servers, cryptographic processing began to be moved to external nodes such as L7 load balancers. This is where TLS termination happens: HTTPS traffic is decrypted, and backend services receive it as standard HTTP.
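The idea of TLS termination can be illustrated with a deliberately simplified proxy: it completes the TLS handshake with the client and relays the decrypted bytes to a plaintext backend. This is only a sketch built on Python's standard library, not how a production load balancer is implemented; the backend address, the port, and the function names are assumptions made for illustration.

```python
import socket
import ssl
import threading

BACKEND_ADDR = ("127.0.0.1", 8080)  # hypothetical plaintext HTTP backend


def make_tls_context(certfile, keyfile):
    """Server-side context: the proxy, not the backend, holds the private key."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.load_cert_chain(certfile, keyfile)
    return ctx


def pump(src, dst):
    """Copy bytes from src to dst until src reaches end of stream."""
    while True:
        data = src.recv(65536)
        if not data:
            break
        dst.sendall(data)


def handle(tls_client):
    """Relay one already-decrypted client connection to the backend."""
    try:
        with tls_client, socket.create_connection(BACKEND_ADDR) as backend:
            reply = threading.Thread(target=pump, args=(backend, tls_client), daemon=True)
            reply.start()
            pump(tls_client, backend)         # decrypted request bytes go to the backend
            backend.shutdown(socket.SHUT_WR)  # signal that the request side is finished
            reply.join()
    except OSError:
        pass  # a dropped connection ends this relay; the sketch does not retry


def serve(certfile, keyfile, listen_addr=("0.0.0.0", 8443)):
    """Accept TCP connections, terminate TLS, and hand each one to a relay thread."""
    ctx = make_tls_context(certfile, keyfile)
    with socket.create_server(listen_addr) as listener:
        while True:
            raw, _addr = listener.accept()
            try:
                # The handshake (the CPU-heavy part) runs here, inside the accept loop.
                tls_client = ctx.wrap_socket(raw, server_side=True)
            except (ssl.SSLError, OSError):
                raw.close()
                continue
            threading.Thread(target=handle, args=(tls_client,), daemon=True).start()
```

Calling `serve("cert.pem", "key.pem")` with real certificate files would start the proxy. All cryptographic work happens in `wrap_socket` and in the reads and writes on `tls_client`, which is exactly the work that offloading moves away from the backend.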

Technically, the correct term is TLS offloading — modern protocols have long moved on from the outdated SSL. However, the term “SSL offloading” has become entrenched in the industry and is still used out of habit.

Hardware L7 load balancers, such as F5 BIG-IP, Citrix NetScaler, or Kemp LoadMaster, use specialized ASIC accelerators to process HTTPS traffic. These chips are separate from the CPU and are designed specifically for fast cryptographic processing, which reduces the load on the main processor.

With advances in Intel technology, processing HTTPS traffic has become efficient directly on the CPU

For a long time, it was believed that encrypted traffic could only be efficiently processed with specialized external devices. However, with the development of server processors, this notion is becoming outdated.

With the release of 4th-generation Intel Xeon processors (Sapphire Rapids), cryptographic acceleration became available inside the CPU itself. Encrypted traffic can now be processed in hardware directly on the processor, without a separate accelerator device.

Intel QAT: from external accelerator to a built-in CPU feature

Intel QuickAssist Technology (Intel QAT) is a hardware accelerator designed to perform resource-intensive cryptographic operations: from processing HTTPS and IPsec to VPNs and other secure connections. It reduces the load on CPU cores and significantly increases system throughput.

QAT’s history began back in 2007 with separate PCIe cards (such as the Intel 8950), which were used as external SSL accelerators in high-load systems. Later, the technology moved into chipsets (for example, Intel C62x), and with the advent of 4th-generation Xeon processors, it was integrated into the CPU itself.

Advantages of Intel QAT

Broad compatibility. Supports all major OSes (Linux, Windows Server, FreeBSD), hypervisors (KVM, VMware, Hyper-V), and popular software: OpenSSL, NGINX, HAProxy, and others.
Seamless integration. QAT plugs in through standard OpenSSL interfaces (the Intel QAT Engine for OpenSSL), so application code does not require significant changes.
Modern cryptography. Supports current algorithms: AES-GCM, ChaCha20, RSA, ECC, ECDSA, and others. The full list is available in the official documentation on GitHub.

Examples of performance improvement with Intel QAT in practice

The effectiveness of Intel QAT in real-world scenarios has been confirmed by official tests and industry benchmarks. The full list is available on the Intel Performance Benchmarks page.

One of the most illustrative cases is the performance increase of the NGINX web server when handling TLS handshakes. This is the stage of establishing a secure connection between the client and the server, during which key exchange, cipher selection, and authentication occur. All these operations are resource-intensive and put a noticeable load on the CPU.

Test configuration:

Processor: Intel® Xeon® Platinum 8470N (Sapphire Rapids, 4th generation Xeon)
Ciphersuite: TLS 1.3, ECDHE-X25519, RSA-2048

For reference: A ciphersuite is a set of algorithms used in HTTPS for key exchange, encryption, signatures, and data integrity verification.

The test used a modern and widely used suite based on TLS 1.3:

ECDHE-X25519 is a key exchange algorithm based on elliptic curves. It provides forward secrecy: even if the server’s private key is later compromised, traffic from previous sessions remains protected.
RSA-2048 was used for digital signatures, confirming the server’s authenticity during connection establishment.
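For a rough feel of what a handshake costs and which parameters were negotiated, the connection can be timed from the client side. The sketch below uses only Python's standard library; `timed_handshake` and `summarize` are illustrative names of our own, and client-side timing includes network round trips, so it is not comparable to the server-side benchmark described here.

```python
import socket
import ssl
import time
from statistics import mean


def timed_handshake(host, port=443, timeout=5.0):
    """Open one TLS connection and report the negotiated parameters and handshake time."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        start = time.perf_counter()
        # wrap_socket performs the full handshake before it returns.
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            elapsed = time.perf_counter() - start
            return {
                "version": tls.version(),    # e.g. "TLSv1.3"
                "cipher": tls.cipher()[0],   # negotiated cipher suite name
                "seconds": elapsed,
            }


def summarize(samples):
    """Aggregate handshake timings collected by timed_handshake()."""
    seconds = [sample["seconds"] for sample in samples]
    return {
        "count": len(seconds),
        "mean_ms": 1000 * mean(seconds),
        "max_ms": 1000 * max(seconds),
    }
```

A typical use would be `summarize([timed_handshake("example.com") for _ in range(20)])`, where the host name is a placeholder for a server you are allowed to test.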

The results with Intel QAT enabled are detailed in the Intel QuickAssist Technology (QAT) – NGINX Performance white paper.

Comparison with classic offloaders

Intel also conducted joint tests with the L7 load balancers F5 BIG-IP and Citrix NetScaler. The results showed that even the software versions of these solutions, when using Intel QAT, reach a performance level comparable to that of the hardware accelerators built into specialized devices.

Conclusion

Tests show that tasks that previously required separate hardware offloaders can now be performed efficiently on the CPU itself, with only a slight increase in load. Intel QAT is changing the approach to processing HTTPS traffic.

In architectures with L7 load balancers, the need for expensive ASIC accelerators is eliminated — SSL offloading can be performed at the CPU level, reducing costs and simplifying infrastructure.

In architectures with L4 load balancers, the transition to modern server processors makes the distributed TLS termination model relevant again: HTTPS traffic is decrypted directly on the application server. This approach provides several important advantages:

Compliance with Zero Trust principles. Traffic is transmitted in an encrypted form from the client to the application, which eliminates centralized trust points and minimizes interception risks.
Flexible security configuration. The server receives complete information about the client's TLS parameters, including the cipher suite and TLS options, so encryption policies can be adapted to specific requests.
Improved fault tolerance. TLS sessions are established directly between the client and the server, so when switching to a backup load balancer, the connection is maintained — without interruptions.
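When TLS terminates on the application server, the negotiated parameters are available to the application itself, which is what makes per-request policies possible. The sketch below shows one hypothetical policy in Python; the path prefixes, the thresholds, and the function names are assumptions for illustration, not recommendations.

```python
import ssl

SENSITIVE_PREFIXES = ("/payments", "/admin")  # hypothetical paths with stricter rules


def allowed(path, tls_version, secret_bits):
    """Decide whether a request may proceed under the negotiated TLS parameters."""
    if secret_bits < 128:
        return False                      # reject weak ciphers everywhere
    if path.startswith(SENSITIVE_PREFIXES):
        return tls_version == "TLSv1.3"   # sensitive paths require TLS 1.3
    return tls_version in ("TLSv1.2", "TLSv1.3")


def check_connection(conn: ssl.SSLSocket, path):
    """Apply the policy to a live connection accepted by the application server."""
    cipher = conn.cipher()                # (name, protocol, secret_bits), or None
    if cipher is None:
        return False
    _name, _protocol, secret_bits = cipher
    return allowed(path, conn.version(), secret_bits)
```

Behind a terminating L7 proxy the application would see plain HTTP, so a check like this would have to rely on headers that the proxy chooses to forward.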

Thus, Intel QAT makes the distributed model not only possible but also effective: now the choice of architecture depends not on technical limitations, but solely on business needs.

What about you?

Share in the comments how you handle encrypted traffic in your infrastructure. Do you use external devices, centralize TLS termination, or distribute it closer to applications?

If you are interested in high-performance load balancing and want more practical insights, join our Telegram channel (t.me/dsproxima). There we share experience, architectural solutions, and real-world cases of working with L4 load balancers.