For more than a decade, NVIDIA has been pushing the limits of GPU-to-GPU communication. As deep learning, high-performance computing (HPC), and large-scale simulations demanded ever-greater parallelism, the bandwidth bottleneck of PCIe became a critical roadblock.
To address this, NVIDIA introduced NVLink, a proprietary high-speed interconnect designed to deliver low latency, high bandwidth, and efficient GPU scaling.
This article traces the evolution of NVLink from its first generation (Pascal, 2016) to its fourth (Hopper, 2022), compares it with PCIe, explains the role of NVSwitch, and looks at where NVLink fits in AI, HPC, and the shift toward optical interconnects.

What Is NVLink and Why Does It Matter?
The Bandwidth Bottleneck of PCIe
Peripheral Component Interconnect Express (PCIe) has been the standard bus connecting GPUs and CPUs for decades. While PCIe has advanced from Gen 1 (2.5 Gbps per lane) to Gen 6 (64 Gbps per lane, PAM4), its scaling could not keep pace with the exponential growth of GPU performance.
For instance, PCIe 3.0 x16 delivers ~16 GB/s per direction, but modern GPUs can consume data at hundreds of GB/s. This mismatch created a bottleneck in multi-GPU systems, where inter-GPU communication became a limiting factor.
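To put numbers on that mismatch, the short C++ sketch below estimates PCIe x16 throughput per direction from the per-lane signaling rate and line encoding. It is a back-of-envelope calculation only: packet, protocol, and (for Gen 6) FLIT/FEC overheads are ignored, so real-world figures come in slightly lower.

```cpp
// Back-of-envelope PCIe x16 bandwidth per direction, per generation.
// Illustrative only: protocol/packet overhead is ignored.
#include <cstdio>

int main() {
    struct PcieGen { const char* name; double gtps; double encoding; };
    const PcieGen gens[] = {
        {"PCIe 3.0", 8.0,  128.0 / 130.0},  // 128b/130b line code
        {"PCIe 4.0", 16.0, 128.0 / 130.0},
        {"PCIe 5.0", 32.0, 128.0 / 130.0},
        {"PCIe 6.0", 64.0, 1.0},            // PAM4 + FLIT; FEC overhead not modeled
    };
    for (const auto& g : gens) {
        // GT/s per lane * encoding efficiency / 8 bits per byte * 16 lanes
        double gbps_x16 = g.gtps * g.encoding / 8.0 * 16.0;
        std::printf("%s x16: ~%.1f GB/s per direction\n", g.name, gbps_x16);
    }
    return 0;
}
```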
Introduction to NVLink
NVLink is NVIDIA’s proprietary point-to-point interconnect that bypasses PCIe switches and CPU scheduling, allowing GPUs to communicate directly with bidirectional bandwidths up to 900 GB/s (NVLink 4.0).
Key design goals of NVLink:
- High bandwidth: Multiplying effective GPU interconnect capacity.
- Low latency: Shorter data paths compared to PCIe switching.
- Scalability: Enabling efficient GPU meshes in DGX systems.
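These goals surface directly in the CUDA programming model: once two GPUs can reach each other over NVLink, peer-to-peer access lets one GPU copy or dereference the other's memory without staging through the host. The following is a minimal sketch using standard CUDA runtime calls (error handling omitted); whether the traffic actually rides NVLink rather than PCIe depends on the machine's topology, which `nvidia-smi topo -m` reports.

```cpp
// Minimal sketch: probe and enable direct GPU-to-GPU (peer-to-peer) access.
// On NVLink-connected GPUs, peer copies and peer memory accesses bypass PCIe.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int n = 0;
    cudaGetDeviceCount(&n);
    for (int src = 0; src < n; ++src) {
        for (int dst = 0; dst < n; ++dst) {
            if (src == dst) continue;
            int canAccess = 0;
            cudaDeviceCanAccessPeer(&canAccess, src, dst);
            std::printf("GPU %d -> GPU %d: peer access %s\n",
                        src, dst, canAccess ? "supported" : "not supported");
            if (canAccess) {
                cudaSetDevice(src);
                // GPU `src` may now dereference memory allocated on `dst`,
                // and cudaMemcpyPeer* transfers take the direct path.
                cudaDeviceEnablePeerAccess(dst, 0);
            }
        }
    }
    return 0;
}
```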
NVLink Generations and Performance Improvements
NVIDIA has released four generations of NVLink, growing aggregate per-GPU bandwidth from 160 GB/s to 900 GB/s; the arithmetic behind these figures is sketched after the generation notes below.

NVLink 1.0 (2016, Pascal P100)
- Implemented on Pascal GPUs (P100).
- 4 links per GPU, each with 8 lanes.
- Lane speed: 20 Gbps.
- Total bidirectional bandwidth: 160 GB/s, ~5× faster than PCIe 3.0 x16.
- Powered DGX-1, connecting up to 8 GPUs in a hybrid cube-mesh topology (not fully connected).
NVLink 2.0 (2017, Volta V100)
- Released with Volta GPUs (V100).
- 6 links per GPU, each with 8 lanes at 25 Gbps.
- Total bidirectional bandwidth: 300 GB/s.
- Introduction of NVSwitch 1.0 (18 ports, 50 GB/s per port).
- Enabled DGX-2, with 16 fully interconnected GPUs.
NVLink 3.0 (2020, Ampere A100)
- Introduced with Ampere GPUs (A100).
- 12 links per GPU, each link with 4 lanes at 50 Gbps.
- Total bidirectional bandwidth: 600 GB/s.
- NVSwitch upgraded to 36 ports.
- Powered DGX A100, connecting 8 GPUs with 6 NVSwitches.
NVLink 4.0 (2022, Hopper H100)
- Implemented on Hopper GPUs (H100).
- 18 links per GPU, 2 lanes each, at 100 Gbps PAM4.
- Total bidirectional bandwidth: 900 GB/s.
- NVSwitch 3.0 with 64 ports.
- DGX H100 architecture: 8 GPUs, 4 NVSwitches, plus integration with 800G optical modules for non-blocking fabric.
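All of the aggregate figures above follow from the same arithmetic: links per GPU × lanes per link × per-lane rate gives the one-way signaling rate, which is converted to bytes and doubled for the bidirectional total. The sketch below simply reproduces the published numbers from the per-generation lane counts and rates.

```cpp
// Reproduce the per-generation NVLink aggregate bandwidth figures.
#include <cstdio>

int main() {
    struct Gen { const char* name; int links; int lanes; double lane_gbps; };
    const Gen gens[] = {
        {"NVLink 1.0 (P100)",  4, 8,  20.0},
        {"NVLink 2.0 (V100)",  6, 8,  25.0},
        {"NVLink 3.0 (A100)", 12, 4,  50.0},
        {"NVLink 4.0 (H100)", 18, 2, 100.0},
    };
    for (const auto& g : gens) {
        double gbps_one_way = g.links * g.lanes * g.lane_gbps;  // Gbps, one direction
        double gb_per_s_bidir = gbps_one_way / 8.0 * 2.0;       // GB/s, both directions
        std::printf("%s: %.0f GB/s bidirectional\n", g.name, gb_per_s_bidir);
    }
    return 0;
}
```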
NVLink vs PCIe - A Detailed Comparison
From generation to generation, NVLink consistently outpaces PCIe in both per-lane throughput and aggregate bandwidth.
Bandwidth Per Generation
| Generation | NVLink (GB/s, bidirectional) | PCIe x16 (GB/s, per direction) |
| --- | --- | --- |
| NVLink 1.0 (2016) | 160 | PCIe 3.0: 15.8 |
| NVLink 2.0 (2017) | 300 | PCIe 4.0: 31.5 |
| NVLink 3.0 (2020) | 600 | PCIe 5.0: 63.0 |
| NVLink 4.0 (2022) | 900 | PCIe 6.0: 121 |
Latency and Scalability
- PCIe relies on switch hierarchies and CPU-mediated transfers, which adds latency.
- NVLink provides direct GPU-to-GPU lanes and, with NVSwitch, full connectivity for 8–16 GPUs without CPU involvement; the peer-copy sketch below shows that path being exercised.
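A minimal microbenchmark sketch makes the difference tangible, assuming a machine with at least two peer-capable GPUs (devices 0 and 1): it times one 1 GiB device-to-device copy with CUDA events. Error checking and warm-up iterations are omitted; on an NVLink-connected pair the measured rate should sit well above anything a PCIe path delivers.

```cpp
// Time a single GPU0 -> GPU1 copy and report effective bandwidth.
// Assumes devices 0 and 1 support peer access (e.g. an NVLink-connected pair).
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 1ull << 30;  // 1 GiB payload
    void *src = nullptr, *dst = nullptr;

    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);
    cudaMalloc(&src, bytes);

    cudaSetDevice(1);
    cudaDeviceEnablePeerAccess(0, 0);
    cudaMalloc(&dst, bytes);

    cudaSetDevice(0);
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    cudaMemcpyPeer(dst, 1, src, 0, bytes);  // direct device-to-device copy
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    std::printf("GPU0 -> GPU1: %.1f GB/s\n", (bytes / 1e9) / (ms / 1e3));

    cudaFree(src);
    cudaSetDevice(1);
    cudaFree(dst);
    return 0;
}
```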
Use Cases Where NVLink Outperforms PCIe
- AI training: Faster gradient synchronization in large LLMs (see the NCCL sketch after this list).
- HPC: Tight GPU coupling in scientific workloads.
- Simulation & rendering: Higher throughput in multi-GPU rendering engines.
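For the AI-training case in particular, the dominant communication pattern is the gradient all-reduce. The sketch below is a single-process NCCL all-reduce across all local GPUs; NCCL routes the collective over NVLink/NVSwitch automatically when those paths exist. The buffer size is arbitrary and error handling is omitted for brevity.

```cpp
// Single-process gradient all-reduce across all local GPUs with NCCL.
// NCCL picks NVLink/NVSwitch paths when available; otherwise it falls back to PCIe.
#include <vector>
#include <cuda_runtime.h>
#include <nccl.h>

int main() {
    int ndev = 0;
    cudaGetDeviceCount(&ndev);

    const size_t count = 1 << 24;  // gradient elements per GPU (arbitrary size)
    std::vector<int> devs(ndev);
    std::vector<float*> grads(ndev);
    std::vector<cudaStream_t> streams(ndev);
    std::vector<ncclComm_t> comms(ndev);

    for (int i = 0; i < ndev; ++i) {
        devs[i] = i;
        cudaSetDevice(i);
        cudaMalloc(reinterpret_cast<void**>(&grads[i]), count * sizeof(float));
        cudaStreamCreate(&streams[i]);
    }
    ncclCommInitAll(comms.data(), ndev, devs.data());

    // Sum the gradient buffers in place across all GPUs.
    ncclGroupStart();
    for (int i = 0; i < ndev; ++i) {
        ncclAllReduce(grads[i], grads[i], count, ncclFloat, ncclSum,
                      comms[i], streams[i]);
    }
    ncclGroupEnd();

    for (int i = 0; i < ndev; ++i) {
        cudaSetDevice(i);
        cudaStreamSynchronize(streams[i]);
        ncclCommDestroy(comms[i]);
        cudaFree(grads[i]);
    }
    return 0;
}
```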
The Role of NVSwitch in Scaling NVLink
How NVSwitch Works
NVSwitch acts as a switching fabric for GPUs, ensuring every GPU in a system can communicate with every other GPU at full NVLink speed.
- NVSwitch 1.0: 18 ports, 50 GB/s each.
- NVSwitch 2.0: 36 ports, 50 GB/s each.
- NVSwitch 3.0: 64 ports, 50 GB/s each, supporting 800G optical interconnects.
DGX Systems Overview
| System | GPU Model | GPUs per System | NVSwitch Version | NVLink Bandwidth per GPU (bidirectional) |
| --- | --- | --- | --- | --- |
| DGX-1 | P100 | 8 | None | 160 GB/s |
| DGX-2 | V100 | 16 | NVSwitch 1.0 | 300 GB/s |
| DGX A100 | A100 | 8 | NVSwitch 2.0 | 600 GB/s |
| DGX H100 | H100 | 8 | NVSwitch 3.0 | 900 GB/s |

Future of NVLink and Emerging Technologies
Optical NVLink (Silicon Photonics)
NVIDIA has been exploring optical interconnects:
- Embedding silicon photonics next to GPUs.
- Connecting GPUs via optical fibers for long-distance, high-bandwidth scaling.
- Potential for AI superclusters beyond 256 GPUs.
Integration with InfiniBand and SHARP
Since acquiring Mellanox, NVIDIA has been combining NVLink and InfiniBand technologies:
- External NVSwitch chips with SHARP (Scalable Hierarchical Aggregation and Reduction Protocol).
- Enables network-level GPU collectives, reducing bottlenecks in AI and HPC clusters.
Practical Applications of NVLink
- AI Training at Scale: LLMs like GPT, BERT, and diffusion models require thousands of GPUs. NVLink minimizes communication overhead.
- High-Performance Computing: Weather prediction, molecular dynamics, and quantum simulations benefit from lower latency inter-GPU transfers.
- Cloud and Data Centers: Multi-tenant AI workloads rely on NVSwitch-based fabrics for GPU virtualization.
- Financial Services: Faster GPU analytics in real-time trading systems.
Frequently Asked Questions (FAQ)
Q1: What is NVLink and how does it work?
A: NVLink is a point-to-point interconnect that links GPUs (and CPUs) with much higher bandwidth and lower latency than PCIe.
Q2: How is NVLink different from PCIe?
A: PCIe is a general-purpose bus with lower bandwidth and higher latency. NVLink is specialized for GPU scaling, offering up to 900 GB/s bandwidth in NVLink 4.0.
Q3: Which NVIDIA GPUs support NVLink?
A: Pascal (P100), Volta (V100), Ampere (A100), and Hopper (H100) all feature NVLink support.
Q4: What is NVSwitch and why is it important?
A: NVSwitch is a switch fabric that allows every GPU in a system to be fully connected at NVLink speeds, enabling scalable DGX systems.
Q5: What is the future of NVLink in AI computing?
A: Future NVLink generations may incorporate optical interconnects, supporting massive AI clusters with tens of thousands of GPUs.
Conclusion
Over four generations, NVIDIA NVLink has redefined GPU interconnects, consistently outpacing PCIe in both bandwidth and scalability. NVSwitch has enabled fully connected GPU meshes in DGX systems, while future advancements in optical NVLink and InfiniBand integration may extend scalability to entire AI superclusters.
For enterprises building AI and HPC infrastructure, NVLink is not just an NVIDIA innovation—it is the backbone of modern GPU computing.