What is NVSwitch? NVIDIA GPU Switch Fabric Technology

May 14, 2026 · Technical Deep Dives
Reviewed by NTS AI Infrastructure Engineer · Technical accuracy verified for enterprise & federal deployment
NVIDIA Ampere A100 80 GB PCIe 4.0 Graphic Card – Dual Slot Passive Cooling

Quick Summary

  • Definition: NVIDIA switch fabric for connecting multiple GPUs
  • NVSwitch v3: 64 NVLink 4 ports per chip; 900 GB/s of NVLink bandwidth per GPU in DGX H100
  • Topology: All-to-all non-blocking mesh for maximum bandwidth
  • Scaling: Enables a single NVLink domain of up to 256 GPUs
  • Impact: Essential for tensor parallelism in models >70B parameters


NVSwitch is NVIDIA's high-bandwidth switch fabric technology that enables full GPU-to-GPU communication at NVLink speeds. While NVLink connects GPUs point to point within a single server, NVSwitch extends that connectivity into a fabric spanning an entire GPU cluster, creating a single NVLink domain of up to 256 GPUs, each with 900 GB/s of NVLink bandwidth.

How NVSwitch Works

NVSwitch is a physical switch ASIC; the third-generation chip provides 64 NVLink 4 ports at 50 GB/s each, for 3.2 TB/s of total bidirectional bandwidth. In the NVIDIA DGX H100 system, four NVSwitch chips form a flat, fully connected topology: each of the 8 H100 GPUs spreads its 18 NVLink links across the four switches, giving every GPU its full 900 GB/s to any peer. In larger configurations, external NVLink Switch systems connect multiple DGX nodes into a single domain of up to 256 GPUs with uniform high-bandwidth connectivity.
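The per-GPU figure quoted throughout this article follows directly from the link math. A quick sketch, using the public H100 NVLink 4 numbers (constant names here are illustrative, not an NVIDIA API):

```python
# Back-of-the-envelope NVLink/NVSwitch bandwidth math for a DGX H100.
NVLINK4_GB_S_PER_LINK = 50   # bidirectional bandwidth per NVLink 4 link
LINKS_PER_H100 = 18          # NVLink 4 links per H100 GPU

# Per-GPU NVLink bandwidth: 18 links x 50 GB/s
per_gpu_gb_s = NVLINK4_GB_S_PER_LINK * LINKS_PER_H100
print(per_gpu_gb_s)          # 900 GB/s, the figure cited above

# Aggregate GPU-side NVLink bandwidth inside one 8-GPU DGX H100
gpus = 8
aggregate_tb_s = per_gpu_gb_s * gpus / 1000
print(aggregate_tb_s)        # 7.2 TB/s across the node
```

Because the four NVSwitch chips are non-blocking, any GPU can drive its full 900 GB/s toward any single peer, not just in aggregate.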

NVSwitch Performance Impact

For tensor parallelism, where individual layers are split across GPUs and every matrix multiplication is followed by a collective operation (an all-reduce or all-gather), NVSwitch removes the communication bottleneck that limits scaling over InfiniBand or Ethernet. Training throughput scales nearly linearly up to 256 GPUs within a single NVLink domain, compared with the sub-linear scaling typically seen on InfiniBand-connected clusters beyond 64 GPUs.

NVSwitch for Enterprise AI

NVSwitch configurations are essential for training models exceeding 100B parameters, where tensor parallelism across 8-32 GPUs is required. The uniform, high-bandwidth connectivity simplifies model parallelism strategies and reduces the engineering effort required for distributed training. NTS provides NVSwitch-enabled configurations for enterprise and government AI deployments.

Frequently Asked Questions

How does NVSwitch differ from InfiniBand?

NVSwitch operates at GPU memory-bus-class speeds (900 GB/s per GPU) versus InfiniBand's network speeds (50 GB/s for NDR 400). NVSwitch handles GPU-to-GPU communication within a single NVLink domain; InfiniBand connects between pods and to storage systems.

Do I need NVSwitch for AI training?

NVSwitch is essential for tensor parallelism with models >70B parameters. For smaller models using data parallelism only, NVSwitch provides minimal benefit. NTS recommends NVSwitch configurations for large-scale training infrastructure and InfiniBand-only fabrics for inference and smaller-scale training.