GPU Memory Bandwidth: Complete Guide to HBM, GDDR, and LPDDR
Quick Summary
- HBM3/HBM3e: 3.35 TB/s (H100, HBM3) to 8.0 TB/s (B200, HBM3e), stack-based, highest bandwidth
- GDDR6: 864 GB/s (L40S), cost-effective, lower bandwidth than HBM
- GDDR7: roughly 1.5-1.8 TB/s expected, next-gen graphics memory
- Bandwidth Impact: Directly determines LLM training tokens/second
- Guideline: 1 GB/s bandwidth per 1B model parameters for efficient training
GPU memory bandwidth, the rate at which data can be read from or written to GPU memory, is often the single most important performance determinant for AI training workloads. Unlike CPU workloads, which benefit from deep caching hierarchies, AI training is dominated by streaming memory access patterns that depend directly on raw memory bandwidth. Understanding memory bandwidth characteristics is therefore essential for GPU selection, workload optimization, and infrastructure planning.
Memory Bandwidth Comparison by GPU
| GPU | Memory Type | Bandwidth | Capacity | Bandwidth per Watt |
|---|---|---|---|---|
| NVIDIA A100 | HBM2e | 1.6/2.0 TB/s | 40/80 GB | 5.0 GB/s/W |
| NVIDIA H100 SXM | HBM3 | 3.35 TB/s | 80 GB | 4.8 GB/s/W |
| NVIDIA H200 SXM | HBM3e | 4.8 TB/s | 141 GB | 6.9 GB/s/W |
| NVIDIA B200 | HBM3e | 8.0 TB/s | 192 GB | 8.0 GB/s/W |
| AMD MI300X | HBM3 | 5.3 TB/s | 192 GB | 7.1 GB/s/W |
| NVIDIA L40S | GDDR6 | 864 GB/s | 48 GB | 2.5 GB/s/W |
| NVIDIA L4 | GDDR6 | 300 GB/s | 24 GB | 4.2 GB/s/W |
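The last column of the table can be reproduced from published specs. A minimal sketch in Python; the board-power figures below are typical TDP values assumed for illustration, not part of the table:

```python
# Derive "Bandwidth per Watt" from bandwidth and board power.
# TDP values are assumed typical figures for illustration.
GPUS = {
    # name: (bandwidth in GB/s, board power in W)
    "H100 SXM": (3350, 700),
    "B200": (8000, 1000),
    "L40S": (864, 350),
}

def bandwidth_per_watt(gb_per_s: float, watts: float) -> float:
    """GB/s delivered per watt of board power."""
    return gb_per_s / watts

for name, (bw, tdp) in GPUS.items():
    print(f"{name}: {bandwidth_per_watt(bw, tdp):.1f} GB/s/W")
```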
Bandwidth Impact on Training Performance
LLM training throughput scales nearly linearly with memory bandwidth up to the point where compute capacity becomes the bottleneck. For transformer models with moderate batch sizes, each 1% increase in memory bandwidth translates to approximately 0.8-1% increase in training throughput. The relationship is strongest for memory-bound workloads like long-sequence transformers and attention-heavy architectures.
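A simple roofline model captures this behavior: step time is the slower of the compute time and the memory-traffic time, so memory-bound steps speed up almost one-for-one with bandwidth. A sketch with illustrative numbers, not measurements:

```python
def step_time(flops: float, bytes_moved: float,
              peak_flops: float, mem_bw: float) -> float:
    """Roofline model: a step is bounded by the slower of
    compute time and memory-traffic time."""
    return max(flops / peak_flops, bytes_moved / mem_bw)

# A memory-bound step (low arithmetic intensity): doubling memory
# bandwidth nearly halves the step time.
t1 = step_time(flops=1e12, bytes_moved=1e12, peak_flops=1e15, mem_bw=3.35e12)
t2 = step_time(flops=1e12, bytes_moved=1e12, peak_flops=1e15, mem_bw=6.7e12)
```

Once arithmetic intensity is high enough that the compute term dominates, extra bandwidth stops helping, which is why the near-linear scaling holds only up to the compute ceiling.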
Practical Guidelines
For AI infrastructure planning, a practical rule of thumb is that each 1 billion model parameters requires approximately 1 GB/s of memory bandwidth for efficient training. A 70B parameter model benefits from 70 GB/s of aggregate bandwidth across the GPU configuration. Eight H100 GPUs provide 26.8 TB/s of aggregate bandwidth (3.35 TB/s × 8), leaving substantial headroom for 70B model training.
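The rule of thumb above can be sketched as a small calculator; the constants come from this section, and the function names are illustrative:

```python
def required_bandwidth_gbps(params_billion: float) -> float:
    """Rule of thumb from this guide: ~1 GB/s per 1B parameters."""
    return params_billion * 1.0

def aggregate_bandwidth_tbps(per_gpu_tbps: float, gpu_count: int) -> float:
    """Total memory bandwidth across a multi-GPU configuration."""
    return per_gpu_tbps * gpu_count

need = required_bandwidth_gbps(70)       # 70B model -> 70 GB/s
agg = aggregate_bandwidth_tbps(3.35, 8)  # 8x H100 SXM -> 26.8 TB/s
```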
Frequently Asked Questions
Does memory bandwidth affect inference as much as training?
Inference is more memory bandwidth-bound than training because batch sizes are smaller (often batch size 1 for interactive applications) and compute utilization is lower. For inference, memory bandwidth is often the primary bottleneck, making high-bandwidth GPUs particularly valuable for serving applications.
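One way to see why: during batch-1 decoding, each generated token streams essentially all model weights from memory once, so memory bandwidth divided by model size gives an upper bound on tokens per second. A back-of-envelope sketch assuming FP16 weights and ignoring KV-cache traffic:

```python
def decode_tokens_per_sec(mem_bw_gbps: float,
                          params_billion: float,
                          bytes_per_param: float = 2.0) -> float:
    """Upper bound on batch-1 decode speed: every token reads all
    weights once, so rate <= bandwidth / model size in bytes.
    bytes_per_param=2.0 assumes FP16 weights."""
    model_gb = params_billion * bytes_per_param
    return mem_bw_gbps / model_gb

# 70B model in FP16 on one H100 SXM (3350 GB/s): ~24 tokens/s ceiling.
est = decode_tokens_per_sec(3350, 70)
```

Real systems land below this ceiling, but the estimate explains why higher-bandwidth parts are disproportionately valuable for interactive serving.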
How does memory bandwidth scale across multiple GPUs?
Effective memory bandwidth scales linearly with GPU count for independent workloads. For distributed training with model parallelism, effective bandwidth is limited by the slowest interconnect path: NVLink within a node, InfiniBand between nodes. Overall system bandwidth is therefore roughly min(GPU count × per-GPU memory bandwidth, interconnect bandwidth).
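That relationship can be sketched as follows; the function name and units are illustrative, not a standard API:

```python
def effective_bandwidth_tbps(gpu_count: int,
                             mem_bw_per_gpu_tbps: float,
                             interconnect_tbps: float) -> float:
    """Effective bandwidth for communication-coupled workloads:
    aggregate local bandwidth, capped by the interconnect."""
    return min(gpu_count * mem_bw_per_gpu_tbps, interconnect_tbps)

# 8x H100 within one NVLink domain: aggregate bandwidth dominates.
in_node = effective_bandwidth_tbps(8, 3.35, interconnect_tbps=100.0)
# Across a slow inter-node link, the interconnect becomes the cap.
cross_node = effective_bandwidth_tbps(8, 3.35, interconnect_tbps=0.4)
```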