ICASSP 2026 Oral
Neural speech coding has recently achieved remarkable progress at low and ultra-low bitrates, but its efficiency is still limited by the ability to learn compact representations. To address this challenge, we introduce Lisa, a lightweight neural speech codec that improves both feature representation and quantization.
Lisa uses a causal frequency-domain encoder-decoder with Inception Residual Blocks (IRB) to better capture multi-scale correlations. It also introduces Regulated Residual Vector Quantization (R-RVQ), which modulates residuals into quantization-friendly forms for more compact multi-stage representation. Experiments show that Lisa achieves stronger coding efficiency than existing neural speech codecs while keeping low complexity for real-time speech communication and streaming.
Create the environment and install dependencies:
conda create -n lisa python=3.8 -y
conda activate lisa
pip install -r requirements.txt

Before running the code, update the hard-coded project path in runs/lisa/train.py, runs/lisa/test.py, and runs/lisa/model.py. Replace /root/hjk/lisa_release_root/ with the absolute path of this repository on your machine, for example:
/path/to/lisa-main/

Download the LibriTTS dataset and prepare the following subsets:
train-clean-100 and train-clean-360 for training
test-clean for evaluation
Resample all audio files to 16 kHz before training or evaluation.
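LibriTTS is distributed at 24 kHz, so every file needs converting. The sketch below is one possible way to do it. It assumes `ffmpeg` is installed and on your PATH (the repository does not prescribe a resampling tool), and the function names and example paths are hypothetical. It mirrors the input directory tree so the subset layout is preserved.

```python
import subprocess
from pathlib import Path

TARGET_SR = 16000  # sample rate required before training or evaluation


def build_ffmpeg_command(src, dst, sample_rate=TARGET_SR):
    """Return the ffmpeg argument list that resamples src into dst."""
    return [
        "ffmpeg", "-y",            # overwrite dst if it already exists
        "-loglevel", "error",      # print only errors
        "-i", str(src),
        "-ar", str(sample_rate),   # output audio sample rate in Hz
        str(dst),
    ]


def plan_resampling(in_root, out_root):
    """Pair every .wav under in_root with a mirrored path under out_root."""
    in_root, out_root = Path(in_root), Path(out_root)
    return [
        (src, out_root / src.relative_to(in_root))
        for src in sorted(in_root.rglob("*.wav"))
    ]


def resample_tree(in_root, out_root):
    """Resample every .wav under in_root, writing results under out_root."""
    for src, dst in plan_resampling(in_root, out_root):
        dst.parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(build_ffmpeg_command(src, dst), check=True)


# Example call (hypothetical paths):
# resample_tree("/path/to/LibriTTS/test-clean", "/path/to/LibriTTS_16k/test-clean")
```

Writing to a separate output tree keeps the original 24 kHz files untouched. Pass the resampled directory as --root_dir afterwards.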
The source code does not include pretrained model files. Download the released checkpoints from the NJU cloud drive.
Put the downloaded checkpoint folders under saves/.
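With data and checkpoints in place, the remaining setup step is the hard-coded project path mentioned earlier. It can be replaced by hand, or with a small script such as the sketch below. The script assumes the old path appears in the three files literally, with its trailing slash, and `patch_project_root` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Hard-coded path and file list as named in this README.
OLD_ROOT = "/root/hjk/lisa_release_root/"
FILES = ("runs/lisa/train.py", "runs/lisa/test.py", "runs/lisa/model.py")


def patch_project_root(repo_root, new_root=None):
    """Replace OLD_ROOT in the three scripts.

    Returns a dict mapping each relative file path to the number of
    occurrences that were replaced. new_root defaults to repo_root
    with a trailing slash.
    """
    repo_root = Path(repo_root).resolve()
    if new_root is None:
        new_root = repo_root.as_posix() + "/"
    counts = {}
    for rel in FILES:
        path = repo_root / rel
        text = path.read_text(encoding="utf-8")
        counts[rel] = text.count(OLD_ROOT)
        if counts[rel]:
            path.write_text(text.replace(OLD_ROOT, new_root), encoding="utf-8")
    return counts


# Example call from the repository root:
# print(patch_project_root("."))
```

A count of 0 for a file means the literal string was not found there. In that case, check the file manually, since the path may be written differently (for example without the trailing slash).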
Run inference with a pretrained model:
python runs/lisa/test.py \
--root_dir /path/to/test_audio \
--bandwidth 1500 \
--pretrain saves/lisa_1500/ckpt/iter_1200000.model \
--test_from forward \
--device cuda:0

Reconstructed audio will be saved under:
saves/lisa/output/
To run objective metric evaluation, use:
--test_from model

The evaluation code computes metrics including ViSQOL, STOI, and PESQ.
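To reconstruct audio and then compute metrics in one go, the two invocations can be chained from Python. This is a minimal sketch: it uses only the flags shown in this README, assumes that metric evaluation takes the same arguments as inference apart from --test_from, and must be run from the repository root. The helper names are hypothetical.

```python
import subprocess
import sys


def build_test_command(root_dir, checkpoint, test_from,
                       bandwidth=1500, device="cuda:0"):
    """Assemble the runs/lisa/test.py command line from the README flags."""
    return [
        sys.executable, "runs/lisa/test.py",
        "--root_dir", str(root_dir),
        "--bandwidth", str(bandwidth),
        "--pretrain", str(checkpoint),
        "--test_from", test_from,
        "--device", device,
    ]


def run_inference_and_metrics(root_dir, checkpoint, **kwargs):
    """Run the 'forward' mode (saves audio), then the 'model' mode (metrics)."""
    for mode in ("forward", "model"):
        cmd = build_test_command(root_dir, checkpoint, mode, **kwargs)
        subprocess.run(cmd, check=True)  # stop at the first failing run


# Example call:
# run_inference_and_metrics("/path/to/test_audio",
#                           "saves/lisa_1500/ckpt/iter_1200000.model")
```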
Train a model:
python runs/lisa/train.py \
--root_dir /path/to/train_audio \
--bandwidth 1500 \
--batch_size 16 \
--learning_rate 1e-4

To start training from a pretrained checkpoint, pass --pretrain:
python runs/lisa/train.py \
--root_dir /path/to/train_audio \
--bandwidth 1500 \
--pretrain saves/lisa_1500/ckpt/iter_1200000.model

If you find this project useful, please cite:
@inproceedings{huang2026lisa,
title={Lisa: Lightweight Yet Superb Neural Speech Coding},
author={Huang, Jiankai and Zhang, Junteng and Lu, Ming and Cao, Xun and Ma, Zhan},
booktitle={ICASSP 2026-2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={14457--14461},
year={2026},
organization={IEEE}
}

These files are provided by Nanjing University Vision Lab. If you have any questions, please contact us at jiankaihuang@smail.nju.edu.cn or zhangjunteng@smail.nju.edu.cn.