Unveiling the Evolution of NVIDIA NVLink Technology

By Network Switches, IT Hardware Experts (https://network-switch.com/pages/about-us)

For more than a decade, NVIDIA has been pushing the limits of GPU-to-GPU communication. As deep learning, high-performance computing (HPC), and large-scale simulations demand ever-greater parallelism, the bandwidth bottleneck of PCIe became a critical roadblock.

To address this, NVIDIA introduced NVLink, a proprietary high-speed interconnect designed to deliver low latency, high bandwidth, and efficient GPU scaling.

This article explores the evolution of NVLink from its first release in 2014 to its fourth generation, compares it with PCIe, explains the role of NVSwitch, and analyzes its role in AI, HPC, and the future of optical interconnects.


The Bandwidth Bottleneck of PCIe

Peripheral Component Interconnect Express (PCIe) has been the standard bus connecting GPUs and CPUs for decades. While PCIe has advanced from Gen 1 (2.5 Gbps per lane) to Gen 6 (64 Gbps per lane, PAM4), its scaling could not keep pace with the exponential growth of GPU performance.

For instance, PCIe 3.0 x16 delivers ~16 GB/s bandwidth, but modern GPUs can consume data at hundreds of GB/s. This mismatch created a bottleneck in multi-GPU systems, where inter-GPU communication became a limiting factor.
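
For readers who want to check these figures, the PCIe numbers can be re-derived from the per-lane signaling rate and encoding overhead. The sketch below is a rough calculation, not a benchmark; the file name is illustrative, and the 0.95 efficiency factor for PCIe 6.0 FLIT mode is an assumption, since FLIT overhead varies with packet mix.

    // pcie_bw.cpp : approximate usable PCIe x16 bandwidth per direction.
    // Gen 3-5 use 128b/130b encoding; the Gen 6 (PAM4, FLIT-mode) efficiency
    // factor below is a rough assumption, not an exact figure.
    #include <cstdio>

    int main() {
        struct Gen { const char* name; double gts_per_lane; double efficiency; };
        const Gen gens[] = {
            {"PCIe 3.0", 8.0,  128.0 / 130.0},
            {"PCIe 4.0", 16.0, 128.0 / 130.0},
            {"PCIe 5.0", 32.0, 128.0 / 130.0},
            {"PCIe 6.0", 64.0, 0.95},           // assumed FLIT overhead
        };
        const int lanes = 16;
        for (const Gen& g : gens) {
            // GT/s * lanes * encoding efficiency / 8 bits = GB/s per direction
            double gbs = g.gts_per_lane * lanes * g.efficiency / 8.0;
            std::printf("%s x16: ~%.1f GB/s per direction\n", g.name, gbs);
        }
        return 0;
    }

Compiled with any C++ compiler, this prints roughly the PCIe column used in the comparison table later in this article (about 15.8, 31.5, 63.0, and 121 GB/s per direction).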

NVLink is NVIDIA’s proprietary point-to-point interconnect that bypasses PCIe switches and CPU scheduling, allowing GPUs to communicate directly with bidirectional bandwidth of up to 900 GB/s (NVLink 4.0).

Key design goals of NVLink:

  • High bandwidth: Multiplying effective GPU interconnect capacity.
  • Low latency: Shorter data paths compared to PCIe switching.
  • Scalability: Enabling efficient GPU meshes in DGX systems.
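
In practice, applications reach this direct path through CUDA peer-to-peer (P2P) memory access. The sketch below is a minimal illustration (error handling omitted, file name illustrative); it assumes at least two GPUs, and whether the copy actually travels over NVLink or falls back to PCIe depends on how the GPUs are wired.

    // p2p_copy.cu : copy a buffer directly from GPU 0 to GPU 1.
    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        const size_t bytes = 256 << 20;   // 256 MiB test buffer
        int can01 = 0, can10 = 0;
        cudaDeviceCanAccessPeer(&can01, 0, 1);
        cudaDeviceCanAccessPeer(&can10, 1, 0);
        if (!can01 || !can10) {
            std::printf("P2P not supported between GPU 0 and GPU 1\n");
            return 1;
        }

        float *src = nullptr, *dst = nullptr;
        cudaSetDevice(0);
        cudaMalloc(&src, bytes);
        cudaDeviceEnablePeerAccess(1, 0);   // let GPU 0 access GPU 1's memory
        cudaSetDevice(1);
        cudaMalloc(&dst, bytes);
        cudaDeviceEnablePeerAccess(0, 0);   // let GPU 1 access GPU 0's memory

        // Direct device-to-device copy; no staging through host memory.
        cudaMemcpyPeer(dst, 1, src, 0, bytes);
        cudaDeviceSynchronize();
        std::printf("copied %zu MiB GPU0 -> GPU1\n", bytes >> 20);

        cudaFree(dst);
        cudaSetDevice(0);
        cudaFree(src);
        return 0;
    }

On most systems, the command nvidia-smi topo -m shows the path between each GPU pair; entries such as NV1 or NV2 indicate the number of NVLink connections, while PIX/PHB indicate PCIe paths.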

NVIDIA has released four generations of NVLink, each doubling (or nearly doubling) bandwidth.

NVLink Generations and Performance Improvements

NVLink 1.0 (2014)

  • Implemented on Pascal GPUs (P100).
  • 4 links per GPU, each with 8 lanes.
  • Lane speed: 20 Gbps.
  • Total bidirectional bandwidth: 160 GB/s, ~5× faster than PCIe 3.0 x16.
  • Supported DGX-1, with up to 8 GPUs in a cube mesh topology (not fully connected).

NVLink 2.0 (2017)

  • Released with Volta GPUs (V100).
  • 6 links per GPU, each with 8 lanes at 25 Gbps.
  • Total bidirectional bandwidth: 300 GB/s.
  • Introduction of NVSwitch 1.0 (18 ports, 50 GB/s per port).
  • Enabled DGX-2, with 16 fully interconnected GPUs.

NVLink 3.0 (2020)

  • Introduced with Ampere GPUs (A100).
  • 12 links per GPU, each link with 4 lanes at 50 Gbps.
  • Total bidirectional bandwidth: 600 GB/s.
  • NVSwitch upgraded to 36 ports.
  • Powered DGX A100, connecting 8 GPUs with 6 NVSwitches.

NVLink 4.0 (2022)

  • Implemented on Hopper GPUs (H100).
  • 18 links per GPU, 2 lanes each, at 100 Gbps PAM4.
  • Total bidirectional bandwidth: 900 GB/s.
  • NVSwitch 3.0 with 64 ports.
  • DGX H100 architecture: 8 GPUs, 4 NVSwitches, plus integration with 800G optical modules for non-blocking fabric.
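
Each total above follows directly from links × lanes × lane rate, counted in both directions. The short sketch below (file name illustrative) re-derives the numbers from the bullets in this section.

    // nvlink_bw.cpp : re-derive aggregate NVLink bandwidth per generation.
    #include <cstdio>

    int main() {
        struct Gen { const char* name; int links; int lanes; double gbps_per_lane; };
        const Gen gens[] = {
            {"NVLink 1.0 (P100)",  4, 8,  20.0},
            {"NVLink 2.0 (V100)",  6, 8,  25.0},
            {"NVLink 3.0 (A100)", 12, 4,  50.0},
            {"NVLink 4.0 (H100)", 18, 2, 100.0},
        };
        for (const Gen& g : gens) {
            // links * lanes * Gbps / 8 = GB/s per direction; double it for bidirectional
            double per_dir = g.links * g.lanes * g.gbps_per_lane / 8.0;
            std::printf("%s: %.0f GB/s per direction, %.0f GB/s bidirectional\n",
                        g.name, per_dir, 2.0 * per_dir);
        }
        return 0;
    }

The output reproduces the 160, 300, 600, and 900 GB/s bidirectional figures quoted above.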

From generation to generation, NVLink consistently outpaces PCIe in both per-lane throughput and aggregate bandwidth.

Bandwidth Per Generation

Generation | NVLink (GB/s, bidirectional) | PCIe x16 (GB/s, per direction)
NVLink 1.0 (2014) | 160 | PCIe 3.0: 15.8
NVLink 2.0 (2017) | 300 | PCIe 4.0: 31.5
NVLink 3.0 (2020) | 600 | PCIe 5.0: 63.0
NVLink 4.0 (2022) | 900 | PCIe 6.0: 121

Latency and Scalability

  • PCIe relies on switch hierarchies and CPU scheduling, which adds latency.
  • NVLink provides direct GPU-to-GPU lanes and, with NVSwitch, achieves full connectivity for 8–16 GPUs without CPU involvement.

In practice, these gains show up across workloads (see the measurement sketch after this list):

  • AI training: faster gradient synchronization in large LLMs.
  • HPC: tight GPU coupling in scientific workloads.
  • Simulation & rendering: higher throughput in multi-GPU rendering engines.
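
One rough way to see the difference on real hardware is to time a peer-to-peer copy with CUDA events. The sketch below (file name illustrative) assumes two GPUs with peer access available and is not a calibrated benchmark; NVIDIA's p2pBandwidthLatencyTest sample does this more carefully.

    // p2p_time.cu : time a GPU0 -> GPU1 copy and report the effective rate.
    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        int n = 0;
        cudaGetDeviceCount(&n);
        if (n < 2) { std::printf("need at least 2 GPUs\n"); return 1; }

        const size_t bytes = 1ull << 30;   // 1 GiB
        float *src = nullptr, *dst = nullptr;
        cudaSetDevice(0); cudaMalloc(&src, bytes); cudaDeviceEnablePeerAccess(1, 0);
        cudaSetDevice(1); cudaMalloc(&dst, bytes); cudaDeviceEnablePeerAccess(0, 0);

        cudaSetDevice(0);
        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        cudaMemcpyPeer(dst, 1, src, 0, bytes);   // warm-up copy
        cudaEventRecord(start);
        cudaMemcpyPeer(dst, 1, src, 0, bytes);   // timed copy
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        double gbs = (bytes / 1e9) / (ms / 1e3); // GB/s, one direction
        std::printf("GPU0 -> GPU1: %.1f GB/s\n", gbs);
        return 0;
    }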

How NVSwitch Works

NVSwitch acts as a switching fabric for GPUs, ensuring every GPU in a system can communicate with every other GPU at full NVLink speed.

  • NVSwitch 1.0: 18 ports, 50 GB/s each.
  • NVSwitch 2.0: 36 ports, 50 GB/s each.
  • NVSwitch 3.0: 64 ports, supporting 800G optical interconnects.

DGX Systems Overview

System | GPU Model | GPUs per System | NVSwitch Version | Bandwidth per GPU
DGX-1 | P100 | 8 | None | 160 GB/s
DGX-2 | V100 | 16 | NVSwitch 1.0 | 300 GB/s
DGX A100 | A100 | 8 | NVSwitch 2.0 | 600 GB/s
DGX H100 | H100 | 8 | NVSwitch 3.0 | 900 GB/s
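
On a DGX-class machine, a quick way to confirm the all-to-all connectivity that NVSwitch provides is to print the CUDA peer-access matrix; with a working fabric, every off-diagonal entry should be 1. The sketch below (file name illustrative) checks reachability only, not link speed.

    // p2p_matrix.cu : print which GPU pairs support direct peer access.
    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        int n = 0;
        cudaGetDeviceCount(&n);
        std::printf("peer-access matrix for %d GPUs (1 = direct access possible)\n", n);
        for (int i = 0; i < n; ++i) {
            for (int j = 0; j < n; ++j) {
                int ok = 0;
                if (i != j) cudaDeviceCanAccessPeer(&ok, i, j);
                std::printf("%d ", (i == j) ? 1 : ok);
            }
            std::printf("\n");
        }
        return 0;
    }
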
The Role of NVSwitch in Scaling NVLink

NVSwitch provides all-to-all connectivity inside a single server; to scale NVLink beyond that, NVIDIA has also been exploring optical interconnects:

  • Embedding silicon photonics next to GPUs.
  • Connecting GPUs via optical fibers for long-distance, high-bandwidth scaling.
  • Potential for AI superclusters beyond 256 GPUs.

Integration with InfiniBand and SHARP

Since acquiring Mellanox, NVIDIA has been combining NVLink and InfiniBand technologies:

  • External NVSwitch chips with SHARP (Scalable Hierarchical Aggregation and Reduction Protocol) support in-network reductions.
  • This enables network-level GPU collectives, reducing bottlenecks in AI and HPC clusters (a minimal NCCL sketch follows this list).
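
These network-level collectives are what deep-learning frameworks exercise through NCCL: a single all-reduce call is routed over NVLink/NVSwitch inside a server, and over InfiniBand (with SHARP where available) across servers, without the application managing individual links. The sketch below is a minimal single-process example; the file name is illustrative, NCCL is assumed to be installed, and the buffers are left uninitialized since only the call pattern matters here.

    // allreduce.cu : sum one float buffer across all visible GPUs with NCCL.
    #include <nccl.h>
    #include <cuda_runtime.h>
    #include <vector>
    #include <cstdio>

    int main() {
        int n = 0;
        cudaGetDeviceCount(&n);
        const size_t count = 1 << 24;                // 16M floats per GPU

        std::vector<ncclComm_t> comms(n);
        std::vector<float*> buf(n);
        std::vector<cudaStream_t> streams(n);
        for (int i = 0; i < n; ++i) {
            cudaSetDevice(i);
            cudaMalloc(&buf[i], count * sizeof(float));
            cudaStreamCreate(&streams[i]);
        }
        ncclCommInitAll(comms.data(), n, nullptr);   // one communicator per GPU

        // Every GPU contributes and receives the element-wise sum; NCCL picks
        // the transport (NVLink/NVSwitch in-node, InfiniBand across nodes).
        ncclGroupStart();
        for (int i = 0; i < n; ++i)
            ncclAllReduce(buf[i], buf[i], count, ncclFloat, ncclSum, comms[i], streams[i]);
        ncclGroupEnd();

        for (int i = 0; i < n; ++i) {
            cudaSetDevice(i);
            cudaStreamSynchronize(streams[i]);
            ncclCommDestroy(comms[i]);
            cudaFree(buf[i]);
        }
        std::printf("all-reduce complete on %d GPUs\n", n);
        return 0;
    }

A typical build is nvcc allreduce.cu -lnccl; in practice, frameworks such as PyTorch and TensorFlow issue equivalent calls internally during gradient synchronization.
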
Real-World Applications

  • AI Training at Scale: LLMs like GPT, BERT, and diffusion models require thousands of GPUs; NVLink minimizes communication overhead.
  • High-Performance Computing: Weather prediction, molecular dynamics, and quantum simulations benefit from lower latency inter-GPU transfers.
  • Cloud and Data Centers: Multi-tenant AI workloads rely on NVSwitch-based fabrics for GPU virtualization.
  • Financial Services: Faster GPU analytics in real-time trading systems.

Frequently Asked Questions (FAQ)

Q1: What is NVLink and how does it work?
A: NVLink is a point-to-point interconnect that links GPUs (and CPUs) with much higher bandwidth and lower latency than PCIe.

Q2: How is NVLink different from PCIe?
A: PCIe is a general-purpose bus with lower bandwidth and higher latency. NVLink is specialized for GPU scaling, offering up to 900 GB/s bandwidth in NVLink 4.0.

Q3: Which NVIDIA GPUs support NVLink?
A: Pascal (P100), Volta (V100), Ampere (A100), and Hopper (H100) all feature NVLink support.

Q4: What is NVSwitch and why is it important?
A: NVSwitch is a switch fabric that allows every GPU in a system to be fully connected at NVLink speeds, enabling scalable DGX systems.

Q5: What is the future of NVLink in AI computing?
A: Future NVLink generations may incorporate optical interconnects, supporting massive AI clusters with tens of thousands of GPUs.

Conclusion

Over four generations, NVIDIA NVLink has redefined GPU interconnects, consistently outpacing PCIe in both bandwidth and scalability. NVSwitch has enabled fully connected GPU meshes in DGX systems, while future advancements in optical NVLink and InfiniBand integration may extend scalability to entire AI superclusters.

For enterprises building AI and HPC infrastructure, NVLink is not just an NVIDIA innovation—it is the backbone of modern GPU computing.

