Feather GEMM: Toward cuBLAS performance

Introduction

This is a work log and self-study journal on optimizing single-precision general matrix multiplication (SGEMM) on an RTX 4060 GPU. Along the way I explain some basic CUDA concepts and profiler tricks used during kernel optimization. I am not yet satisfied with the results, despite the effort I have put in.

Build

mkdir build
cd build
cmake ..
cmake --build . --config Release

Thanks

How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance: a Worklog

Beating cuBLAS in Single-Precision General Matrix Multiplication

NVIDIA CUDA C++ Programming Guide

NVIDIA Nsight Compute (ncu) documentation

Professional CUDA C Programming

Triton documentation

Efficient GEMM

GPU MODE

TODO

Kernel 8 performs worse than kernel 7; I have not yet worked out why.
Kernel 9's double buffering is now correct, but I am not satisfied with its performance.
For now, my kernels only handle square matrices with no tile quantization, that is, sizes that divide evenly into tiles.
Maybe integrate PTX into the CUDA code with asm().
Warp-level matmul and Tensor Cores.
Hopper features: TMA and asynchrony; see Hopper for details.
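
The tile quantization limitation in the TODO list can be made concrete with a small host-side sketch. It is not taken from this repository: the helper names (`ceil_div`, `tile_coverage`, `in_bounds`) and the tile parameters `BM` and `BN` are hypothetical, chosen only for illustration. The idea is that when a matrix dimension is not a multiple of the tile size, the launch grid has to be rounded up, and the kernel then needs a bounds check so the partially filled edge tiles do not read or write out of range.

```cpp
#include <cassert>

// Hypothetical helpers for illustration; none of these names come from the repository.

// Number of tiles of size `tile` needed to cover `n` elements (rounds up).
constexpr int ceil_div(int n, int tile) { return (n + tile - 1) / tile; }

struct TileCoverage {
    int grid_x;          // thread blocks along N (columns of C)
    int grid_y;          // thread blocks along M (rows of C)
    double utilization;  // useful output elements / elements covered by the grid
};

// Launch grid for an M x N output matrix tiled with BM x BN block tiles.
inline TileCoverage tile_coverage(int M, int N, int BM, int BN) {
    assert(M > 0 && N > 0 && BM > 0 && BN > 0);
    const int gy = ceil_div(M, BM);
    const int gx = ceil_div(N, BN);
    const double covered = static_cast<double>(gy) * BM * static_cast<double>(gx) * BN;
    return {gx, gy, static_cast<double>(M) * N / covered};
}

// Guard a kernel would apply before loading or storing element (row, col),
// so that threads in edge tiles skip positions outside the matrix.
inline bool in_bounds(int row, int col, int M, int N) {
    return row < M && col < N;
}
```

With 128 x 128 tiles, a 4096 x 4096 output is covered exactly by a 32 x 32 grid, so utilization is 1. A 4100 x 4100 output needs a 33 x 33 grid, and roughly 6% of the covered area falls outside the matrix, which is the wasted work the current kernels avoid by restricting the input sizes.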

TIPS

Run chcp.com 65001 to avoid garbled output in Windows cmd.
ncu does not work well on cloud GPUs.

