Blackwell Microbenchmarks

A collection of microbenchmarks for NVIDIA Blackwell (SM 100) GPUs, covering memory throughput, latency, tensor core (UMMA) performance, and HBM-resident elementwise throughput.

https://newsletter.semianalysis.com/p/dissecting-nvidia-blackwell-tensor

Benchmarks

| Path | Purpose |
| --- | --- |
| ldgsts_throughput/ | LDGSTS HBM throughput |
| tma2d_throughput/ | TMA 2D HBM throughput |
| ldgsts_latency/ | LDGSTS latency |
| tma2d_latency/ | TMA 2D latency |
| umma_throughput/ | UMMA tensor-core throughput |
| umma_latency/ | UMMA tensor-core latency |
| elementwise_throughput/ | fp32 HBM-resident activation/elementwise throughput |
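
As a rough guide to reading throughput numbers like those the benchmarks above target, the sketch below turns a byte or FLOP count and an elapsed time into GB/s or TFLOP/s. It is a minimal illustration, not code from this repository: the function names, the one-read-plus-one-write assumption for elementwise kernels, and the 2·M·N·K FLOP count for a matrix multiply are assumptions made here.

```python
# Hypothetical helpers (not part of this repository) for converting raw
# benchmark measurements into the usual reporting units.

def throughput_gb_per_s(bytes_moved: int, elapsed_s: float) -> float:
    """Memory throughput in GB/s (1 GB = 1e9 bytes)."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return bytes_moved / elapsed_s / 1e9


def elementwise_bytes(num_elements: int, bytes_per_element: int = 4) -> int:
    """Bytes moved by an elementwise kernel.

    Assumes each element is read once and written once, e.g. an fp32
    activation applied out of place; other access patterns differ.
    """
    return 2 * num_elements * bytes_per_element


def matmul_tflops(m: int, n: int, k: int, iterations: int, elapsed_s: float) -> float:
    """Tensor-core throughput in TFLOP/s, counting 2*M*N*K FLOPs per matmul."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return 2 * m * n * k * iterations / elapsed_s / 1e12


if __name__ == "__main__":
    # 1e9 fp32 elements, read and written once, in 2 ms.
    moved = elementwise_bytes(1_000_000_000)
    print(f"{throughput_gb_per_s(moved, 2e-3):.1f} GB/s")
    # 1000 iterations of a 1024x1024x1024 matmul in 1 ms.
    print(f"{matmul_tflops(1024, 1024, 1024, 1000, 1e-3):.1f} TFLOP/s")
```

The elapsed time would normally come from GPU-side timing (for example CUDA events) averaged over many iterations after a warm-up run.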

Acknowledgements

Compute for this project is generously sponsored by Nebius and Verda.
