This package includes the PAROAttention code with quantization and sparsity implementations. We build PAROAttention on top of the SageAttention codebase and further integrate tailored designs for PAROAttention's block-wise sparsity.
Requirements:

- `python >= 3.9`
- `torch >= 2.3.0`
- `CUDA >= 11.8`
- `flash-attn` for benchmarking

Compile from source with

```shell
python setup.py develop
```

or

```shell
pip install -e .
```

Note that this version is only verified on Ampere (sm80) GPUs, such as A100 and A800.
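Before compiling, you can confirm that the visible GPU is in fact sm80. A minimal sanity check using standard PyTorch calls:

```python
import torch

# The kernels in this package are only verified on Ampere sm80 GPUs
# (e.g., A100, A800), so check the compute capability before building.
major, minor = torch.cuda.get_device_capability()
print(f"Detected compute capability: sm{major}{minor}")
assert (major, minor) == (8, 0), (
    f"sm{major}{minor} is untested; this release is only verified on sm80."
)
```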
We provide the following benchmarks:

- Attention acceleration under varying sparsity (0.2, 0.3, 0.5 density):

  ```shell
  cd bench
  python bench.py
  ```

- RoPE kernel with and without permutation:

  ```shell
  cd bench
  python overhead.py
  ```

- Baseline implementations such as FlashAttention V2:

  ```shell
  cd bench
  python bench_baseline.py --method fa2
  ```

- Speed comparison between FlashAttention V2, SageAttention, SpargeAttention, SparseVideoGen, and PAROAttention (ours), together with an analysis of overhead. Calibration data is needed for profiling, since SpargeAttention is data-dependent; see the sketch after this list for preparing it:

  ```shell
  cd bench
  python bench_all.py --q_path your/path/to/q --k_path your/path/to/k --v_path your/path/to/v --permute_plan_path your/path/to/permute_plan
  ```
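As a starting point for the calibration data, here is a minimal sketch that saves q/k/v tensors in `.pth` form. The tensor shape and the one-tensor-per-file layout are illustrative assumptions, not a documented format; in practice you would capture the tensors from a real model forward pass, and the permute plan comes from the PAROAttention calibration rather than from this snippet:

```python
import os
import torch

# Illustrative only: the (batch, heads, seq_len, head_dim) shape and the
# one-tensor-per-file layout are assumptions. Replace the random tensors
# with q/k/v captured from your own model's attention layers.
os.makedirs("calib", exist_ok=True)
q = torch.randn(1, 24, 4096, 64, dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)
torch.save(q, "calib/q.pth")
torch.save(k, "calib/k.pth")
torch.save(v, "calib/v.pth")
# Then: python bench_all.py --q_path calib/q.pth --k_path calib/k.pth --v_path calib/v.pth ...
```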
We provide an example of the PAROAttention CogVideoX pipeline in ./example/pipeline_cogvideox.py, which adopts the PARO_CogVideoXAttnProcessor from paroattention/cogvideox.py. You need to specify the paths to permute_plan.pth and sparse_plan.pth.
For comparison with FA2 to measure the speedup, you can uncomment the F.scaled_dot_product_attention call in the code to adopt the FA2 implementation, and compare the Attention Time of the two approaches.
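For orientation, a condensed sketch of how such a processor can be plugged into the diffusers CogVideoX pipeline. The PARO_CogVideoXAttnProcessor constructor arguments and the model id shown here are assumptions based on the description above; refer to ./example/pipeline_cogvideox.py for the exact usage:

```python
import torch
from diffusers import CogVideoXPipeline

from paroattention.cogvideox import PARO_CogVideoXAttnProcessor

# Example model id; the example script may target a different checkpoint.
pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-2b", torch_dtype=torch.float16
).to("cuda")

# Hypothetical constructor arguments: the exact signature lives in
# paroattention/cogvideox.py. permute_plan.pth and sparse_plan.pth are the
# plan files you must provide, as noted above.
processor = PARO_CogVideoXAttnProcessor(
    permute_plan_path="your/path/to/permute_plan.pth",
    sparse_plan_path="your/path/to/sparse_plan.pth",
)
pipe.transformer.set_attn_processor(processor)

video = pipe(prompt="a panda playing guitar", num_inference_steps=50).frames[0]
```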
This project is licensed under the Apache License 2.0.
See the LICENSE file for more information.