Triton-based 3D Sparse Convolution

A high-performance, hardware-agnostic 3D Sparse Convolution library implemented purely in Triton. This project aims to provide a torch.nn-compatible interface that runs on any device supported by the Triton compiler (NVIDIA, AMD, etc.), eliminating the dependency on proprietary CUDA/C++ extensions.

🌟 Key Objectives

Vendor Agnostic: Zero CUDA C++ code. Fully compatible with NVIDIA and AMD (ROCm) via Triton.
Memory Efficiency: Utilizes sparse data structures to handle large-scale 3D point clouds or medical volumes.
Performance: Highly optimized kernels for Gather-GEMM-Scatter operations.

🚧 Work in Progress

This project is currently under active development. While core functionalities are implemented, some features may be unstable, undocumented, or subject to change. Performance optimizations and comprehensive testing are ongoing. Your contributions and feedback are highly welcome!

🚀 Features

Submanifold Sparse Convolution: Preserves input sparsity patterns for deep architectures.
Standard Sparse Convolution: Supports stride and padding for downsampling.
GPU-based Rule Generation: Fast index mapping and rulebook generation using Triton-based hashing.
Autograd Support: Backward pass implementation for end-to-end training (the weight gradient is currently incorrect; see Known Issues).
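
The difference between the two convolution types is easiest to see in which output sites become active. The snippet below is a minimal pure-Python sketch of that idea, not this library's API: the function names are hypothetical, it assumes the usual dense-convolution relation `input = output * stride - padding + k`, and it ignores the upper bound of the spatial grid.

```python
from itertools import product

def kernel_offsets(kernel_size):
    """All (dz, dy, dx) taps of a cubic kernel, e.g. 27 taps for size 3."""
    return list(product(range(kernel_size), repeat=3))

def submanifold_output_coords(coords):
    """Submanifold convolution: the output active set equals the input active set."""
    return sorted(set(coords))

def standard_output_coords(coords, kernel_size=3, stride=2, padding=1):
    """Standard sparse convolution: an output site is active if at least one
    active input voxel falls inside its receptive field."""
    out = set()
    for c in coords:
        for k in kernel_offsets(kernel_size):
            # Invert input = output * stride - padding + k for the output site.
            shifted = [ci + padding - ki for ci, ki in zip(c, k)]
            if all(s >= 0 and s % stride == 0 for s in shifted):
                out.add(tuple(s // stride for s in shifted))
    return sorted(out)

coords = [(1, 1, 1), (4, 4, 4)]
sub = submanifold_output_coords(coords)
down = standard_output_coords(coords, kernel_size=3, stride=2, padding=1)
print(len(sub), len(down))
```

With these inputs the submanifold variant keeps the two original sites, while the strided variant produces a coarser grid in which a single input voxel can activate several output sites.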

🛠 Architecture

Coordinate Hashing: Map 3D coordinates to linear indices using a parallel hash table.
Rulebook Generation: Identify active neighbor pairs for each kernel offset.
Gather-GEMM-Scatter:
- Gather: Collect features based on the rulebook.
- GEMM: Perform matrix multiplication using fused Triton kernels.
- Scatter: Distribute results back to the output sparse tensor.
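
The three stages above can be sketched on the CPU with NumPy. This is an illustrative reference under stated assumptions, not the Triton implementation: a Python dict stands in for the parallel hash table, the helper names `build_rulebook` and `gather_gemm_scatter` are hypothetical, and only the stride-1 (submanifold) case with a cubic kernel is covered.

```python
import numpy as np
from itertools import product

def build_rulebook(in_coords, out_coords, kernel_size=3):
    """Return {kernel_offset_index: (in_indices, out_indices)} for stride 1."""
    # Coordinate hashing: map each active coordinate to its feature row.
    table = {tuple(c): i for i, c in enumerate(in_coords)}
    half = kernel_size // 2
    offsets = list(product(range(-half, half + 1), repeat=3))
    rulebook = {}
    for k, off in enumerate(offsets):
        in_idx, out_idx = [], []
        for j, o in enumerate(out_coords):
            neighbor = tuple(oc + d for oc, d in zip(o, off))
            i = table.get(neighbor)
            if i is not None:  # keep active neighbor pairs only
                in_idx.append(i)
                out_idx.append(j)
        if in_idx:
            rulebook[k] = (np.array(in_idx), np.array(out_idx))
    return rulebook

def gather_gemm_scatter(features, weights, rulebook, num_out):
    """features: (N_in, C_in); weights: (K, C_in, C_out)."""
    out = np.zeros((num_out, weights.shape[2]), dtype=features.dtype)
    for k, (in_idx, out_idx) in rulebook.items():
        gathered = features[in_idx]       # Gather: rows listed in the rulebook
        partial = gathered @ weights[k]   # GEMM: one weight slice per offset
        np.add.at(out, out_idx, partial)  # Scatter: accumulate into outputs
    return out

coords = [(0, 0, 0), (0, 0, 1), (5, 5, 5)]
features = np.arange(6, dtype=np.float32).reshape(3, 2)
weights = np.ones((27, 2, 4), dtype=np.float32)
rulebook = build_rulebook(coords, coords)  # submanifold: outputs = inputs
out = gather_gemm_scatter(features, weights, rulebook, num_out=len(coords))
print(out.shape)
```

The isolated voxel at (5, 5, 5) only sees itself, while the two adjacent voxels each accumulate a contribution from their neighbor. On a GPU, the accumulating scatter is the step that typically needs atomic adds or a per-offset ordering, which is one reason this kind of reference is useful for checking kernel output.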

📋 TODO

Known Issues

Backward pass: weight gradient is incorrect (99.5% mismatch); see DEBUG_SUMMARY.md
Transposed convolution forward fails for stride=2 cases
Pooling operations fail on CPU (atomic operations not supported)

CPU Support

CPU implementation or fallback (many tests skipped on CPU)

Performance Optimization

Autotuner config tuning (C_in=1 and C_in=64 show issues)
Memory efficiency for large-scale point clouds

Features

More activation functions
Batch normalization for CPU
Documentation & API reference
