Unveiling the Evolution of NVIDIA NVLink Technology

By Network Switches, IT Hardware Experts (https://network-switch.com/pages/about-us)

For more than a decade, NVIDIA has been pushing the limits of GPU-to-GPU communication. As deep learning, high-performance computing (HPC), and large-scale simulations demand ever-greater parallelism, the bandwidth bottleneck of PCIe became a critical roadblock.

To address this, NVIDIA introduced NVLink, a proprietary high-speed interconnect designed to deliver low latency, high bandwidth, and efficient GPU scaling.

This article explores the evolution of NVLink from its first release in 2014 to its fourth generation, compares it with PCIe, explains the role of NVSwitch, and analyzes its role in AI, HPC, and the future of optical interconnects.


The Bandwidth Bottleneck of PCIe

Peripheral Component Interconnect Express (PCIe) has been the standard bus connecting GPUs and CPUs for decades. While PCIe has advanced from Gen 1 (2.5 Gbps per lane) to Gen 6 (64 Gbps per lane, PAM4), its scaling could not keep pace with the exponential growth of GPU performance.

For instance, PCIe 3.0 x16 delivers ~16 GB/s bandwidth, but modern GPUs can consume data at hundreds of GB/s. This mismatch created a bottleneck in multi-GPU systems, where inter-GPU communication became a limiting factor.
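
For readers who want to check these figures, the PCIe numbers can be re-derived from the per-lane signaling rate and encoding overhead. The sketch below is a rough calculation, not a benchmark; the file name is illustrative, and the 0.95 efficiency factor for PCIe 6.0 FLIT mode is an assumption, since FLIT overhead varies with packet mix.

    // pcie_bw.cpp : approximate usable PCIe x16 bandwidth per direction.
    // Gen 3-5 use 128b/130b encoding; the Gen 6 (PAM4, FLIT-mode) efficiency
    // factor below is a rough assumption, not an exact figure.
    #include <cstdio>

    int main() {
        struct Gen { const char* name; double gts_per_lane; double efficiency; };
        const Gen gens[] = {
            {"PCIe 3.0", 8.0,  128.0 / 130.0},
            {"PCIe 4.0", 16.0, 128.0 / 130.0},
            {"PCIe 5.0", 32.0, 128.0 / 130.0},
            {"PCIe 6.0", 64.0, 0.95},           // assumed FLIT overhead
        };
        const int lanes = 16;
        for (const Gen& g : gens) {
            // GT/s * lanes * encoding efficiency / 8 bits = GB/s per direction
            double gbs = g.gts_per_lane * lanes * g.efficiency / 8.0;
            std::printf("%s x16: ~%.1f GB/s per direction\n", g.name, gbs);
        }
        return 0;
    }

Compiled with any C++ compiler, this prints roughly the PCIe column used in the comparison table later in this article (about 15.8, 31.5, 63.0, and 121 GB/s per direction).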

NVLink is NVIDIA’s proprietary point-to-point interconnect that bypasses PCIe switches and CPU scheduling, allowing GPUs to communicate directly with bidirectional bandwidth of up to 900 GB/s (NVLink 4.0).

Key design goals of NVLink:

  • High bandwidth: Multiplying effective GPU interconnect capacity.
  • Low latency: Shorter data paths compared to PCIe switching.
  • Scalability: Enabling efficient GPU meshes in DGX systems.
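
In practice, applications reach this direct path through CUDA peer-to-peer (P2P) memory access. The sketch below is a minimal illustration (error handling omitted, file name illustrative); it assumes at least two GPUs, and whether the copy actually travels over NVLink or falls back to PCIe depends on how the GPUs are wired.

    // p2p_copy.cu : copy a buffer directly from GPU 0 to GPU 1.
    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        const size_t bytes = 256 << 20;   // 256 MiB test buffer
        int can01 = 0, can10 = 0;
        cudaDeviceCanAccessPeer(&can01, 0, 1);
        cudaDeviceCanAccessPeer(&can10, 1, 0);
        if (!can01 || !can10) {
            std::printf("P2P not supported between GPU 0 and GPU 1\n");
            return 1;
        }

        float *src = nullptr, *dst = nullptr;
        cudaSetDevice(0);
        cudaMalloc(&src, bytes);
        cudaDeviceEnablePeerAccess(1, 0);   // let GPU 0 access GPU 1's memory
        cudaSetDevice(1);
        cudaMalloc(&dst, bytes);
        cudaDeviceEnablePeerAccess(0, 0);   // let GPU 1 access GPU 0's memory

        // Direct device-to-device copy; no staging through host memory.
        cudaMemcpyPeer(dst, 1, src, 0, bytes);
        cudaDeviceSynchronize();
        std::printf("copied %zu MiB GPU0 -> GPU1\n", bytes >> 20);

        cudaFree(dst);
        cudaSetDevice(0);
        cudaFree(src);
        return 0;
    }

On most systems, the command nvidia-smi topo -m shows the path between each GPU pair; entries such as NV1 or NV2 indicate the number of NVLink connections, while PIX/PHB indicate PCIe paths.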

NVIDIA has released four generations of NVLink, each doubling (or nearly doubling) bandwidth.

NVLink Generations and Performance Improvements

NVLink 1.0 (2014)

  • Implemented on Pascal GPUs (P100).
  • 4 links per GPU, each with 8 lanes.
  • Lane speed: 20 Gbps.
  • Total bidirectional bandwidth: 160 GB/s, ~5× faster than PCIe 3.0 x16.
  • Supported DGX-1, with up to 8 GPUs in a cube mesh topology (not fully connected).

NVLink 2.0 (2017)

  • Released with Volta GPUs (V100).
  • 6 links per GPU, each with 8 lanes at 25 Gbps.
  • Total bidirectional bandwidth: 300 GB/s.
  • Introduction of NVSwitch 1.0 (18 ports, 50 GB/s per port).
  • Enabled DGX-2, with 16 fully interconnected GPUs.

NVLink 3.0 (2020)

  • Introduced with Ampere GPUs (A100).
  • 12 links per GPU, each link with 4 lanes at 50 Gbps.
  • Total bidirectional bandwidth: 600 GB/s.
  • NVSwitch upgraded to 36 ports.
  • Powered DGX A100, connecting 8 GPUs with 6 NVSwitches.

NVLink 4.0 (2022)

  • Implemented on Hopper GPUs (H100).
  • 18 links per GPU, 2 lanes each, at 100 Gbps PAM4.
  • Total bidirectional bandwidth: 900 GB/s.
  • NVSwitch 3.0 with 64 ports.
  • DGX H100 architecture: 8 GPUs, 4 NVSwitches, plus integration with 800G optical modules for non-blocking fabric.
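
Each total above follows directly from links × lanes × lane rate, counted in both directions. The short sketch below (file name illustrative) re-derives the numbers from the bullets in this section.

    // nvlink_bw.cpp : re-derive aggregate NVLink bandwidth per generation.
    #include <cstdio>

    int main() {
        struct Gen { const char* name; int links; int lanes; double gbps_per_lane; };
        const Gen gens[] = {
            {"NVLink 1.0 (P100)",  4, 8,  20.0},
            {"NVLink 2.0 (V100)",  6, 8,  25.0},
            {"NVLink 3.0 (A100)", 12, 4,  50.0},
            {"NVLink 4.0 (H100)", 18, 2, 100.0},
        };
        for (const Gen& g : gens) {
            // links * lanes * Gbps / 8 = GB/s per direction; double it for bidirectional
            double per_dir = g.links * g.lanes * g.gbps_per_lane / 8.0;
            std::printf("%s: %.0f GB/s per direction, %.0f GB/s bidirectional\n",
                        g.name, per_dir, 2.0 * per_dir);
        }
        return 0;
    }

The output reproduces the 160, 300, 600, and 900 GB/s bidirectional figures quoted above.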

From generation to generation, NVLink consistently outpaces PCIe in both per-lane throughput and aggregate bandwidth.

Bandwidth Per Generation

Generation | NVLink (GB/s, bidirectional) | PCIe x16 (GB/s, per direction)
NVLink 1.0 (2014) | 160 | PCIe 3.0: 15.8
NVLink 2.0 (2017) | 300 | PCIe 4.0: 31.5
NVLink 3.0 (2020) | 600 | PCIe 5.0: 63.0
NVLink 4.0 (2022) | 900 | PCIe 6.0: 121

Latency and Scalability

  • PCIe relies on switch hierarchies and CPU scheduling, which adds latency.
  • NVLink provides direct GPU-to-GPU lanes and, with NVSwitch, achieves full connectivity for 8–16 GPUs without CPU involvement.

In practice, these gains show up across workloads (see the measurement sketch after this list):

  • AI training: faster gradient synchronization in large LLMs.
  • HPC: tight GPU coupling in scientific workloads.
  • Simulation & rendering: higher throughput in multi-GPU rendering engines.
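
One rough way to see the difference on real hardware is to time a peer-to-peer copy with CUDA events. The sketch below (file name illustrative) assumes two GPUs with peer access available and is not a calibrated benchmark; NVIDIA's p2pBandwidthLatencyTest sample does this more carefully.

    // p2p_time.cu : time a GPU0 -> GPU1 copy and report the effective rate.
    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        int n = 0;
        cudaGetDeviceCount(&n);
        if (n < 2) { std::printf("need at least 2 GPUs\n"); return 1; }

        const size_t bytes = 1ull << 30;   // 1 GiB
        float *src = nullptr, *dst = nullptr;
        cudaSetDevice(0); cudaMalloc(&src, bytes); cudaDeviceEnablePeerAccess(1, 0);
        cudaSetDevice(1); cudaMalloc(&dst, bytes); cudaDeviceEnablePeerAccess(0, 0);

        cudaSetDevice(0);
        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        cudaMemcpyPeer(dst, 1, src, 0, bytes);   // warm-up copy
        cudaEventRecord(start);
        cudaMemcpyPeer(dst, 1, src, 0, bytes);   // timed copy
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        double gbs = (bytes / 1e9) / (ms / 1e3); // GB/s, one direction
        std::printf("GPU0 -> GPU1: %.1f GB/s\n", gbs);
        return 0;
    }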

How NVSwitch Works

NVSwitch acts as a switching fabric for GPUs, ensuring every GPU in a system can communicate with every other GPU at full NVLink speed.

  • NVSwitch 1.0: 18 ports, 50 GB/s each.
  • NVSwitch 2.0: 36 ports, 50 GB/s each.
  • NVSwitch 3.0: 64 ports, supporting 800G optical interconnects.

DGX Systems Overview

System | GPU Model | GPUs per System | NVSwitch Version | Bandwidth per GPU
DGX-1 | P100 | 8 | None | 160 GB/s
DGX-2 | V100 | 16 | NVSwitch 1.0 | 300 GB/s
DGX A100 | A100 | 8 | NVSwitch 2.0 | 600 GB/s
DGX H100 | H100 | 8 | NVSwitch 3.0 | 900 GB/s
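
On a DGX-class machine, a quick way to confirm the all-to-all connectivity that NVSwitch provides is to print the CUDA peer-access matrix; with a working fabric, every off-diagonal entry should be 1. The sketch below (file name illustrative) checks reachability only, not link speed.

    // p2p_matrix.cu : print which GPU pairs support direct peer access.
    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        int n = 0;
        cudaGetDeviceCount(&n);
        std::printf("peer-access matrix for %d GPUs (1 = direct access possible)\n", n);
        for (int i = 0; i < n; ++i) {
            for (int j = 0; j < n; ++j) {
                int ok = 0;
                if (i != j) cudaDeviceCanAccessPeer(&ok, i, j);
                std::printf("%d ", (i == j) ? 1 : ok);
            }
            std::printf("\n");
        }
        return 0;
    }
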
The Role of NVSwitch in Scaling NVLink

NVSwitch provides all-to-all connectivity inside a single server; to scale NVLink beyond that, NVIDIA has also been exploring optical interconnects:

  • Embedding silicon photonics next to GPUs.
  • Connecting GPUs via optical fibers for long-distance, high-bandwidth scaling.
  • Potential for AI superclusters beyond 256 GPUs.

Integration with InfiniBand and SHARP

Since acquiring Mellanox, NVIDIA has been combining NVLink and InfiniBand technologies:

  • External NVSwitch chips with SHARP (Scalable Hierarchical Aggregation and Reduction Protocol) support in-network reductions.
  • This enables network-level GPU collectives, reducing bottlenecks in AI and HPC clusters (a minimal NCCL sketch follows this list).
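
These network-level collectives are what deep-learning frameworks exercise through NCCL: a single all-reduce call is routed over NVLink/NVSwitch inside a server, and over InfiniBand (with SHARP where available) across servers, without the application managing individual links. The sketch below is a minimal single-process example; the file name is illustrative, NCCL is assumed to be installed, and the buffers are left uninitialized since only the call pattern matters here.

    // allreduce.cu : sum one float buffer across all visible GPUs with NCCL.
    #include <nccl.h>
    #include <cuda_runtime.h>
    #include <vector>
    #include <cstdio>

    int main() {
        int n = 0;
        cudaGetDeviceCount(&n);
        const size_t count = 1 << 24;                // 16M floats per GPU

        std::vector<ncclComm_t> comms(n);
        std::vector<float*> buf(n);
        std::vector<cudaStream_t> streams(n);
        for (int i = 0; i < n; ++i) {
            cudaSetDevice(i);
            cudaMalloc(&buf[i], count * sizeof(float));
            cudaStreamCreate(&streams[i]);
        }
        ncclCommInitAll(comms.data(), n, nullptr);   // one communicator per GPU

        // Every GPU contributes and receives the element-wise sum; NCCL picks
        // the transport (NVLink/NVSwitch in-node, InfiniBand across nodes).
        ncclGroupStart();
        for (int i = 0; i < n; ++i)
            ncclAllReduce(buf[i], buf[i], count, ncclFloat, ncclSum, comms[i], streams[i]);
        ncclGroupEnd();

        for (int i = 0; i < n; ++i) {
            cudaSetDevice(i);
            cudaStreamSynchronize(streams[i]);
            ncclCommDestroy(comms[i]);
            cudaFree(buf[i]);
        }
        std::printf("all-reduce complete on %d GPUs\n", n);
        return 0;
    }

A typical build is nvcc allreduce.cu -lnccl; in practice, frameworks such as PyTorch and TensorFlow issue equivalent calls internally during gradient synchronization.
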
Real-World Applications

  • AI Training at Scale: LLMs like GPT, BERT, and diffusion models require thousands of GPUs; NVLink minimizes communication overhead.
  • High-Performance Computing: Weather prediction, molecular dynamics, and quantum simulations benefit from lower latency inter-GPU transfers.
  • Cloud and Data Centers: Multi-tenant AI workloads rely on NVSwitch-based fabrics for GPU virtualization.
  • Financial Services: Faster GPU analytics in real-time trading systems.

Frequently Asked Questions (FAQ)

Q1: What is NVLink and how does it work?
A: NVLink is a point-to-point interconnect that links GPUs (and CPUs) with much higher bandwidth and lower latency than PCIe.

Q2: How is NVLink different from PCIe?
A: PCIe is a general-purpose bus with lower bandwidth and higher latency. NVLink is specialized for GPU scaling, offering up to 900 GB/s bandwidth in NVLink 4.0.

Q3: Which NVIDIA GPUs support NVLink?
A: Pascal (P100), Volta (V100), Ampere (A100), and Hopper (H100) all feature NVLink support.

Q4: What is NVSwitch and why is it important?
A: NVSwitch is a switch fabric that allows every GPU in a system to be fully connected at NVLink speeds, enabling scalable DGX systems.

Q5: What is the future of NVLink in AI computing?
A: Future NVLink generations may incorporate optical interconnects, supporting massive AI clusters with tens of thousands of GPUs.

Conclusion

Over four generations, NVIDIA NVLink has redefined GPU interconnects, consistently outpacing PCIe in both bandwidth and scalability. NVSwitch has enabled fully connected GPU meshes in DGX systems, while future advancements in optical NVLink and InfiniBand integration may extend scalability to entire AI superclusters.

For enterprises building AI and HPC infrastructure, NVLink is not just an NVIDIA innovation—it is the backbone of modern GPU computing.

