This is a fast implementation of Frequency-Aware Transformer for Learned Image Compression (FTIC, ICLR 2024). The original implementation relies on range-coder (slower than rANS) and computes the CDF per pixel, which is inefficient. This implementation uses rANS and batched CDF computation for faster evaluation.
Original FTIC weights can be loaded into this implementation. Coding is much faster (under 1 s/img) than the original, while the rate is slightly higher (approx. +0.001 bpp) due to the entropy-coder difference.
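The speedup comes from how CDFs are produced for the entropy coder. Below is a minimal sketch of the idea, not the repo's actual code: helper names are hypothetical, and it assumes a CompressAI-style Gaussian conditional with a fixed, sorted scale table.

```python
# Illustrative sketch only: build one quantized CDF per entry of a fixed scale
# table instead of one CDF per latent element, then hand all symbols to the
# rANS coder in a single batched call. Helper names are hypothetical.
import torch

def build_cdf_table(scale_table: torch.Tensor, max_abs: int = 64, precision: int = 16) -> torch.Tensor:
    """Quantized CDFs, shape (num_scales, 2 * max_abs + 2)."""
    grid = torch.arange(-max_abs, max_abs + 1, dtype=torch.float32)
    normal = torch.distributions.Normal(0.0, scale_table.view(-1, 1))
    pmf = normal.cdf(grid + 0.5) - normal.cdf(grid - 0.5)        # integer-bin PMF per scale
    cdf = torch.cumsum(pmf, dim=-1)
    cdf = torch.cat([torch.zeros_like(cdf[:, :1]), cdf], dim=-1)
    # Real coders also bump zero-probability bins so every symbol stays codable.
    return (cdf * (1 << precision)).round().to(torch.int32)

def symbols_and_indexes(y_hat, means, scales, scale_table):
    """Flatten every latent into (symbol, CDF-table index) pairs."""
    symbols = (y_hat - means).round().to(torch.int32).flatten()
    # scale_table is assumed 1-D and sorted ascending.
    indexes = torch.bucketize(scales.flatten().contiguous(), scale_table)
    return symbols, indexes
```

Under this setup the (symbols, indexes, CDF table) triple can be handed to the C++ rANS coder in one batched call, rather than building a CDF inside a per-pixel Python loop.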
The RD results are below:
| Checkpoint | Forward (bpp / PSNR) | Compress (official RD) (bpp / PSNR) | Compress (This repo) (bpp / PSNR) |
|---|---|---|---|
| ckpt_0018.pth | 0.131 / 29.641 | 0.1294 / 29.640 | 0.132 / 29.641 |
| ckpt_0035.pth | 0.198 / 31.143 | 0.2003 / 31.132 | 0.200 / 31.143 |
| ckpt_0067.pth | 0.298 / 32.690 | 0.2993 / 32.702 | 0.300 / 32.690 |
| ckpt_0130.pth | 0.437 / 34.413 | 0.4372 / 34.420 | 0.439 / 34.413 |
| ckpt_0250.pth | 0.614 / 36.156 | 0.6158 / 36.170 | 0.616 / 36.156 |
| ckpt_0483.pth | 0.839 / 37.904 | 0.8420 / 37.918 | 0.843 / 37.904 |
Runtime is evaluated on a single RTX 5060 Ti.
| Checkpoint | Forward avg. time (s) | Compress (This repo) avg. time (s) |
|---|---|---|
| ckpt_0018.pth | 0.237 | 0.655 |
| ckpt_0035.pth | 0.176 | 0.606 |
| ckpt_0067.pth | 0.219 | 0.612 |
| ckpt_0130.pth | 0.217 | 0.759 |
| ckpt_0250.pth | 0.22 | 0.806 |
| ckpt_0483.pth | 0.211 | 1.072 |
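The timings above are per-image averages on the GPU. For anyone re-measuring them, the device has to be synchronized around the timed region or the numbers under-report the coding time. A minimal timing helper is sketched below; `fn` (the codec step being timed) and `images` (a list of preloaded tensors) are assumptions, and the repo's eval.py may measure differently.

```python
import time
import torch

def average_time_s(fn, images, warmup: int = 2) -> float:
    """Average wall-clock seconds of fn(x) over images, with CUDA sync."""
    for x in images[:warmup]:          # warm up kernels / allocator
        fn(x)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    for x in images:
        fn(x)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / len(images)
```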
This implementation requires a custom rANS coder written in C++. Follow the instructions below to build it:
```bash
git clone https://github.com/tokkiwa/Fast-FTIC
cd Fast-FTIC
python3 -m pip install -e ./cpp --no-build-isolation
```

The model `models/flic.py` has newly added `compress_fast` and `decompress_fast` functions that use the custom rANS coder for faster compression and decompression. The other parts are mostly unchanged.
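For orientation, a usage sketch of these entry points is below. The model class name, checkpoint layout, and the exact return structure of `compress_fast` / `decompress_fast` are assumptions; only the two function names come from `models/flic.py`.

```python
# Hypothetical end-to-end use of the fast coder; names flagged below are guesses.
import math

import torch
from PIL import Image
from torchvision.transforms.functional import to_tensor

from models.flic import FrequencyAwareTransFormer  # assumed class name

device = "cuda" if torch.cuda.is_available() else "cpu"
model = FrequencyAwareTransFormer().to(device).eval()

ckpt = torch.load("ckpt_0483.pth", map_location=device)
model.load_state_dict(ckpt.get("state_dict", ckpt))  # checkpoint may wrap the weights
model.update(force=True)  # rebuild entropy-coder tables, as in CompressAI-style models

x = to_tensor(Image.open("kodim01.png")).unsqueeze(0).to(device)

with torch.no_grad():
    enc = model.compress_fast(x)                               # rANS-based encoding
    dec = model.decompress_fast(enc["strings"], enc["shape"])  # assumed return layout

num_pixels = x.shape[0] * x.shape[2] * x.shape[3]
bpp = sum(len(s[0]) for s in enc["strings"]) * 8.0 / num_pixels  # bytes -> bits per pixel
mse = torch.mean((x - dec["x_hat"].clamp(0, 1)) ** 2).item()
psnr = 10.0 * math.log10(1.0 / mse)                              # images are in [0, 1]
print(f"bpp={bpp:.4f}  PSNR={psnr:.3f} dB")
```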
Run the evaluation as follows:
```bash
python3 eval.py \
--checkpoint [path of the pretrained checkpoint] \
--data [path of testing dataset] --cuda --real --fast
```

Below is the original README from FTIC.
🔥 Our paper *AuxT: On Disentangled Training for Nonlinear Transform in Learned Image Compression* has been accepted as a Spotlight at ICLR 2025.
This repository is the official PyTorch implementation of FTIC: Frequency-Aware Transformer for Learned Image Compression (ICLR 2024).
Abstract: Learned image compression (LIC) has gained traction as an effective solution for image storage and transmission in recent years. However, existing LIC methods are redundant in latent representation due to limitations in capturing anisotropic frequency components and preserving directional details. To overcome these challenges, we propose a novel frequency-aware transformer (FAT) block that for the first time achieves multiscale directional analysis for LIC. The FAT block comprises frequency-decomposition window attention (FDWA) modules to capture multiscale and directional frequency components of natural images. Additionally, we introduce frequency-modulation feed-forward network (FMFFN) to adaptively modulate different frequency components, improving rate-distortion performance. Furthermore, we present a transformer-based channel-wise autoregressive (T-CA) model that effectively exploits channel dependencies. Experiments show that our method achieves state-of-the-art rate-distortion performance compared to existing LIC methods, and evidently outperforms the latest standardized codec VTM-12.1 by 14.5%, 15.1%, 13.0% in BD-rate on the Kodak, Tecnick, and CLIC datasets.
Figure: The overall framework of FLIC.
Figure: RD curves on Kodak.
Requirements:
- python==3.8.17
- PyTorch==1.12.1
- torchvision==0.16.1
- compressai==1.2.4
- range-coder==1.1
- einops
- timm
Run training as follows:

```bash
CUDA_VISIBLE_DEVICES='0' python -u ./train.py -d [path of training dataset] \
--cuda --lambda 0.0483 --epochs 50 \
--save_path [path for checkpoint] --save \
--checkpoint [path of the pretrained checkpoint]
```
Evaluate a checkpoint as follows:

```bash
python eval.py --checkpoint [path of the pretrained checkpoint] --data [path of testing dataset] --cuda
```
Pretrained models:
| Lambda | Metric | Link |
|---|---|---|
| 0.0483 | MSE | ckpt_mse_0483.pth |
| 0.0250 | MSE | ckpt_mse_0250.pth |
| 0.0130 | MSE | ckpt_mse_0130.pth |
| 0.0067 | MSE | ckpt_mse_0067.pth |
| 0.0035 | MSE | ckpt_mse_0035.pth |
| 0.0018 | MSE | ckpt_mse_0018.pth |
| 60.50 | MS-SSIM | ckpt_msssim_6050.pth |
| 31.73 | MS-SSIM | ckpt_msssim_3173.pth |
| 16.64 | MS-SSIM | ckpt_msssim_1664.pth |
| 8.73 | MS-SSIM | ckpt_msssim_0873.pth |
| 4.58 | MS-SSIM | ckpt_msssim_0458.pth |
| 2.40 | MS-SSIM | ckpt_msssim_0240.pth |
RD data:

| bpp | PSNR |
|---|---|
| 0.1294 | 29.640 |
| 0.2003 | 31.132 |
| 0.2993 | 32.702 |
| 0.4372 | 34.420 |
| 0.6158 | 36.170 |
| 0.842 | 37.918 |

| bpp | MS-SSIM |
|---|---|
| 0.1209 | 13.8585 |
| 0.1719 | 15.4219 |
| 0.2407 | 16.9093 |
| 0.3262 | 18.4375 |
| 0.4443 | 20.0413 |
| 0.6089 | 21.6489 |

| bpp | PSNR |
|---|---|
| 0.105 | 31.38 |
| 0.155 | 32.83 |
| 0.225 | 34.23 |
| 0.322 | 35.69 |
| 0.451 | 37.15 |
| 0.627 | 38.64 |

| bpp | PSNR |
|---|---|
| 0.115 | 31.64 |
| 0.161 | 33.10 |
| 0.222 | 34.49 |
| 0.307 | 35.91 |
| 0.42 | 37.30 |
| 0.574 | 38.68 |
Part of our code is borrowed from the following repositories.
Citation:

```bibtex
@inproceedings{li2024frequencyaware,
  title={Frequency-Aware Transformer for Learned Image Compression},
  author={Han Li and Shaohui Li and Wenrui Dai and Chenglin Li and Junni Zou and Hongkai Xiong},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024},
  url={https://openreview.net/forum?id=HKGQDDTuvZ}
}
```


