PAROAttention (Hardware code)

This package provides the PAROAttention kernels with quantization and sparsity support. PAROAttention is built on top of the SageAttention codebase, with additional tailored designs for PAROAttention's block-wise sparsity.
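
To make the block-wise sparsity idea concrete, here is a toy, framework-level illustration (not the actual CUDA kernel): a boolean block mask over (query-block, key-block) tiles decides which tiles of the attention score matrix are kept. The real kernel gains its speed by skipping the masked tiles entirely rather than materializing the full score matrix as this sketch does.

    import torch

    def blockwise_sparse_attention(q, k, v, block_mask, block=64):
        # q, k, v: (seq, dim); block_mask: (seq//block, seq//block) booleans
        # marking which (query-block, key-block) tiles are kept.
        scale = q.shape[-1] ** -0.5
        scores = (q @ k.T) * scale
        mask = block_mask.repeat_interleave(block, 0).repeat_interleave(block, 1)
        scores = scores.masked_fill(~mask, float("-inf"))
        return torch.softmax(scores, dim=-1) @ v

    seq, dim, block = 256, 64, 64
    q, k, v = (torch.randn(seq, dim) for _ in range(3))
    nb = seq // block
    block_mask = torch.rand(nb, nb) < 0.5          # ~0.5 density plan
    block_mask |= torch.eye(nb, dtype=torch.bool)  # keep diagonal blocks
    out = blockwise_sparse_attention(q, k, v, block_mask)
    print(out.shape)  # torch.Size([256, 64])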

Installation

  • Requirements: Python >= 3.9, torch >= 2.3.0, CUDA >= 11.8
  • flash-attn (required for benchmarking)
  • Compile from source with python setup.py develop or pip install -e .
  • Note that this version has only been verified on Ampere (sm80) GPUs, such as the A100 and A800; a quick post-install check is sketched below.
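
A minimal sanity check after compiling (the paroattention package name follows the repo layout; no specific entry-point function is assumed):

    import torch
    import paroattention  # should import cleanly after pip install -e .

    # This release is only verified on Ampere (sm80) GPUs:
    major, minor = torch.cuda.get_device_capability()
    print(f"Compute capability: sm{major}{minor}")  # expect sm80 on A100/A800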

Operator-level Benchmarking

  • Attention acceleration under varying sparsity (0.2, 0.3, 0.5 density)

    cd bench
    python bench.py
    
  • RoPE kernel with and without permutation

    cd bench
    python overhead.py
    
  • Benchmark baseline implementations such as FlashAttention V2

    cd bench
    python bench_baseline.py --method fa2
    
  • Speed comparison between FlashAttention V2, SageAttention, SpargeAttention, SparseVideoGen, and PAROAttention (ours), with a per-method overhead analysis (calibration data is required for profiling, since SpargeAttention's sparsity is data-dependent)

    cd bench
    python bench_all.py --q_path your/path/to/q --k_path your/path/to/k --v_path your/path/to/v --permute_plan_path your/path/to/permute_plan
    

End-to-end Demo

CogVideoX-5b

We provide an example PAROAttention CogVideoX pipeline in ./example/pipeline_cogvideox.py, which adopts the PARO_CogVideoXAttnProcessor from paroattention/cogvideox.py.

You need to specify the paths to permute_plan.pth and sparse_plan.pth.
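
A minimal sketch of how the processor might be wired into the diffusers CogVideoX pipeline. The PARO_CogVideoXAttnProcessor constructor arguments shown here are assumptions for illustration only; see ./example/pipeline_cogvideox.py and paroattention/cogvideox.py for the actual interface.

    import torch
    from diffusers import CogVideoXPipeline
    from paroattention.cogvideox import PARO_CogVideoXAttnProcessor

    pipe = CogVideoXPipeline.from_pretrained(
        "THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16
    ).to("cuda")

    # The plan-path keyword names below are assumptions, not the verified API.
    processor = PARO_CogVideoXAttnProcessor(
        permute_plan_path="your/path/to/permute_plan.pth",
        sparse_plan_path="your/path/to/sparse_plan.pth",
    )
    pipe.transformer.set_attn_processor(processor)

    video = pipe(prompt="a panda playing the guitar", num_frames=49).frames[0]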

For comparison with FA2 to measure the speedup, you can uncomment the F.scaled_dot_product_attention call in the code to switch to the FA2 implementation, and compare the reported Attention Time of the two approaches.
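
If you want to time the FA2 path in isolation, CUDA events give a reliable measurement. The tensor shapes below are illustrative, not the exact CogVideoX-5b dimensions:

    import torch
    import torch.nn.functional as F

    # Illustrative (batch, heads, seq_len, head_dim) tensors on an sm80 GPU.
    q, k, v = (
        torch.randn(1, 48, 16384, 64, device="cuda", dtype=torch.bfloat16)
        for _ in range(3)
    )

    start, end = (torch.cuda.Event(enable_timing=True) for _ in range(2))
    for _ in range(3):  # warmup to exclude one-time kernel setup costs
        F.scaled_dot_product_attention(q, k, v)
    torch.cuda.synchronize()
    start.record()
    out = F.scaled_dot_product_attention(q, k, v)
    end.record()
    torch.cuda.synchronize()
    print(f"Attention Time: {start.elapsed_time(end):.2f} ms")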

License

This project is licensed under the Apache License 2.0.
See the LICENSE file for more information.
