NJUVISION/Lisa

Lisa: Lightweight Yet Superb Neural Speech Coding

ICASSP 2026 Oral

Paper Demo

📖 Introduction

Neural speech coding has recently made remarkable progress at low and ultra-low bitrates, but coding efficiency is still limited by how well a codec learns compact representations. To address this challenge, we introduce Lisa, a lightweight neural speech codec that improves both feature representation and quantization.

Lisa uses a causal frequency-domain encoder-decoder with Inception Residual Blocks (IRB) to better capture multi-scale correlations. It also introduces Regulated Residual Vector Quantization (R-RVQ), which modulates residuals into quantization-friendly forms for more compact multi-stage representation. Experiments show that Lisa achieves stronger coding efficiency than existing neural speech codecs while keeping low complexity for real-time speech communication and streaming.

🔧 Installation

Create the environment and install dependencies:

conda create -n lisa python=3.8 -y
conda activate lisa
pip install -r requirements.txt

Before running the code, update the hard-coded project path in runs/lisa/train.py, runs/lisa/test.py, and runs/lisa/model.py. Replace /root/hjk/lisa_release_root/ with the absolute path of this repository on your machine, for example:

/path/to/lisa-main/
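
The manual edit above can also be scripted. The following is a minimal sketch, not part of the released code: the helper name `replace_project_path` is hypothetical, and it assumes the old prefix appears as a plain string literal in the three files named above.

```python
from pathlib import Path

# Hard-coded prefix and file list exactly as named in this README.
OLD_ROOT = "/root/hjk/lisa_release_root/"
FILES = ("runs/lisa/train.py", "runs/lisa/test.py", "runs/lisa/model.py")


def replace_project_path(repo_root, old_root=OLD_ROOT, files=FILES):
    """Rewrite old_root to the absolute path of repo_root in each listed file.

    Returns a dict mapping each relative file name to the number of
    occurrences that were replaced (hypothetical helper, for illustration).
    """
    repo = Path(repo_root).resolve()
    new_root = repo.as_posix() + "/"  # keep the trailing slash of the old prefix
    counts = {}
    for rel in files:
        path = repo / rel
        text = path.read_text(encoding="utf-8")
        counts[rel] = text.count(old_root)
        if counts[rel]:
            path.write_text(text.replace(old_root, new_root), encoding="utf-8")
    return counts
```

Calling `replace_project_path(".")` from the repository root rewrites the three files in place; a count of 0 for a file means the prefix was not found there and the file was left untouched.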

📂 Datasets

Download the LibriTTS dataset and prepare the following subsets:

  • train-clean-100 and train-clean-360 for training
  • test-clean for evaluation

Resample all audio files to 16 kHz before training or evaluation.
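
LibriTTS is distributed at 24 kHz, so resampling to 16 kHz is a 2/3 rate change. The sketch below shows one possible way to do it with polyphase filtering; it is not the authors' preprocessing script. The function names are hypothetical, and it assumes the `soundfile` and `scipy` packages are installed, which this README does not confirm.

```python
from fractions import Fraction
from pathlib import Path

TARGET_SR = 16000  # sample rate required by this README


def resample_factors(src_sr, dst_sr=TARGET_SR):
    """Return the smallest integer (up, down) pair with src_sr * up / down == dst_sr."""
    ratio = Fraction(dst_sr, src_sr)
    return ratio.numerator, ratio.denominator


def resample_tree(src_dir, dst_dir, dst_sr=TARGET_SR):
    """Resample every .wav under src_dir to the same relative path under dst_dir."""
    # Third-party imports (assumed installed) are kept local so that
    # resample_factors stays usable without them.
    import soundfile as sf
    from scipy.signal import resample_poly

    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    for wav in sorted(src_dir.rglob("*.wav")):
        audio, sr = sf.read(str(wav))
        if sr != dst_sr:
            up, down = resample_factors(sr, dst_sr)
            # Polyphase resampling along the time axis (axis 0).
            audio = resample_poly(audio, up, down, axis=0)
        out = dst_dir / wav.relative_to(src_dir)
        out.parent.mkdir(parents=True, exist_ok=True)
        sf.write(str(out), audio, dst_sr)
```

For example, `resample_tree("LibriTTS/test-clean", "LibriTTS_16k/test-clean")` mirrors the directory layout into a new tree; both directory names here are placeholders.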

📦 Pretrained Models

The source code does not include pretrained model files. Download the released checkpoints from the NJU cloud drive.

Put the downloaded checkpoint folders under saves/.
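
To check that a downloaded folder landed in the right place, a small helper can look for the checkpoint file. This is a sketch under assumptions: the function name is hypothetical, and the `<run>/ckpt/iter_<N>.model` layout is inferred only from the example path `saves/lisa_1500/ckpt/iter_1200000.model` used in the commands in this README.

```python
import re
from pathlib import Path

# File name pattern inferred from the example checkpoint path in this README.
_ITER = re.compile(r"iter_(\d+)\.model$")


def latest_checkpoint(run_dir):
    """Return the iter_<N>.model file with the largest N under run_dir/ckpt, or None."""
    best, best_iter = None, -1
    for path in (Path(run_dir) / "ckpt").glob("iter_*.model"):
        match = _ITER.search(path.name)
        if match and int(match.group(1)) > best_iter:
            best, best_iter = path, int(match.group(1))
    return best
```

`latest_checkpoint("saves/lisa_1500")` returns `None` when the folder is missing or empty, which usually means the download was unpacked one directory level too deep or too shallow.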

💻 Inference

Run inference with a pretrained model:

python runs/lisa/test.py \
  --root_dir /path/to/test_audio \
  --bandwidth 1500 \
  --pretrain saves/lisa_1500/ckpt/iter_1200000.model \
  --test_from forward \
  --device cuda:0

Reconstructed audio will be saved under:

saves/lisa/output/

To run objective metric evaluation instead, pass the following in place of --test_from forward:

--test_from model

The evaluation code computes metrics including ViSQOL, STOI, and PESQ.
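
Running both modes back to back can be scripted. The sketch below only assembles the command line shown above and switches `--test_from`; the helper names are hypothetical, and it assumes the script is launched from the repository root with the flags exactly as listed in this README.

```python
import subprocess
import sys


def build_test_command(root_dir, bandwidth, pretrain,
                       test_from="forward", device="cuda:0"):
    """Assemble the runs/lisa/test.py command from this README as an argument list."""
    return [
        sys.executable, "runs/lisa/test.py",
        "--root_dir", str(root_dir),
        "--bandwidth", str(bandwidth),
        "--pretrain", str(pretrain),
        "--test_from", test_from,
        "--device", device,
    ]


def run_inference_and_metrics(root_dir, bandwidth, pretrain, device="cuda:0"):
    """Run reconstruction ("forward") and then metric evaluation ("model")."""
    for mode in ("forward", "model"):
        cmd = build_test_command(root_dir, bandwidth, pretrain, mode, device)
        subprocess.run(cmd, check=True)  # raises if the script exits with an error
```

Passing the arguments as a list avoids shell quoting problems when the audio path contains spaces.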

🚀 Training

Option A: train from scratch

python runs/lisa/train.py \
  --root_dir /path/to/train_audio \
  --bandwidth 1500 \
  --batch_size 16 \
  --learning_rate 1e-4

Option B: load a pretrained model

python runs/lisa/train.py \
  --root_dir /path/to/train_audio \
  --bandwidth 1500 \
  --pretrain saves/lisa_1500/ckpt/iter_1200000.model

📝 BibTeX

If you find this project useful, please cite:

@inproceedings{huang2026lisa,
  title={Lisa: Lightweight Yet Superb Neural Speech Coding},
  author={Huang, Jiankai and Zhang, Junteng and Lu, Ming and Cao, Xun and Ma, Zhan},
  booktitle={ICASSP 2026-2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={14457--14461},
  year={2026},
  organization={IEEE}
}

👥 Authors

These files are provided by Nanjing University Vision Lab. Please contact us (jiankaihuang@smail.nju.edu.cn and zhangjunteng@smail.nju.edu.cn) if you have any questions.
