AutoLab-SAI-SJTU/AAD-1

AAD-1: Asymmetric Adversarial Distillation for One-Step Autoregressive Video Generation

Haobo Li (1,2) · Yanhong Zeng (2,3,✉) · Yunhong Lu (4,2) · Jiapeng Zhu (2) · Hao Ouyang (2) · Qiuyu Wang (2) · Ka Leong Cheng (2) · Yujun Shen (2) · Zhipeng Zhang (1,5,✉)

(1) AutoLab, SAI, SJTU · (2) Ant Group · (3) Department of Automation, Tsinghua University · (4) Zhejiang University · (5) Anyverse Dynamics

Causal | 1-Step | One-Step | Autoregressive | Video World Model

Keywords: causal video generation, 1-step video generation, one-step autoregressive generation, autoregressive video world model, video-world-model rollouts, causal Wan backbone, long-horizon video generation.

We present AAD-1, a causal, one-step autoregressive video world model built with Asymmetric Adversarial Distillation. Given a reference image and a text prompt, AAD-1 generates long-horizon rollouts with a single sampling step per causal chunk. It addresses motion collapse and training instability by combining an asymmetric generator-discriminator design with phased training: the generator stays causal for autoregressive sampling, while a bidirectional video-level discriminator scores full spatiotemporal sequences to detect global temporal failures and long-range drift. A distribution-matching warmup first bootstraps a stable one-step generator before adversarial distillation; with this recipe, AAD-1 reports state-of-the-art one-step autoregressive video generation on VBench.
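As a rough illustration of "one sampling step per causal chunk", the toy rollout below generates chunks one after another, conditioning each on the chunks before it, and counts generator calls. It is a sketch under stated assumptions, not the AAD-1 sampler: `toy_denoiser`, the `Chunk` type, the chunk length, and the two-entry timestep list used in the comparison are all hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Chunk = List[float]  # toy stand-in for one chunk of latent frames (hypothetical)


def toy_denoiser(noisy: Chunk, timestep: int, context: List[Chunk]) -> Chunk:
    """Hypothetical generator: pulls the chunk toward the mean of the latest context chunk."""
    last = context[-1]
    target = sum(last) / len(last)
    return [0.5 * x + 0.5 * target for x in noisy]


def rollout(
    denoise: Callable[[Chunk, int, List[Chunk]], Chunk],
    first_chunk: Chunk,
    num_chunks: int,
    timesteps: Sequence[int],
    chunk_len: int = 3,
    seed: int = 0,
) -> Tuple[List[Chunk], int]:
    """Autoregressive rollout; returns the generated chunks and the number of generator calls."""
    rng = random.Random(seed)
    context = [first_chunk]  # conditioning, e.g. an encoded reference image
    calls = 0
    for _ in range(num_chunks):
        chunk = [rng.gauss(0.0, 1.0) for _ in range(chunk_len)]  # each chunk starts from noise
        for t in timesteps:  # a single entry means one sampling step per chunk
            chunk = denoise(chunk, t, context)  # only past chunks are visible (causal)
            calls += 1
        context.append(chunk)
    return context[1:], calls


if __name__ == "__main__":
    _, one_step_calls = rollout(toy_denoiser, [0.0, 0.0, 0.0], num_chunks=4, timesteps=[1000])
    _, two_step_calls = rollout(toy_denoiser, [0.0, 0.0, 0.0], num_chunks=4, timesteps=[1000, 500])
    print(one_step_calls, two_step_calls)
```

The point of the sketch is the cost model: the number of generator evaluations is the number of chunks times the length of the timestep list, so a one-entry list gives exactly one evaluation per chunk.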

AAD-1 training pipeline

AAD-1 trains a one-step autoregressive generator in three stages. Stage I adapts a pretrained bidirectional video model into a causal generator with ODE initialization. Stage II performs a one-step distribution matching distillation (DMD) warmup under self-rollout training. Stage III applies asymmetric adversarial refinement: the generator remains causal, while a bidirectional video-level discriminator observes full-video context to penalize temporal drift and motion collapse.
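The asymmetry in Stage III can be pictured as two attention patterns over the same frames. The minimal sketch below contrasts a chunk-causal mask (generator side) with a full mask (discriminator side). It is illustrative only: the function names, the frame count, and the chunk size are hypothetical, and the masks actually used by AAD-1 may differ.

```python
from typing import List

Mask = List[List[bool]]  # mask[q][k] is True when query frame q may attend to key frame k


def chunk_causal_mask(num_frames: int, chunk_size: int) -> Mask:
    """Frames attend within their own chunk and to every earlier chunk, never to later ones."""
    return [
        [k // chunk_size <= q // chunk_size for k in range(num_frames)]
        for q in range(num_frames)
    ]


def bidirectional_mask(num_frames: int) -> Mask:
    """Every frame attends to every other frame, so the whole video is visible at once."""
    return [[True] * num_frames for _ in range(num_frames)]


def render(mask: Mask) -> str:
    """Text view of a mask: 'x' marks an allowed query/key pair, '.' a blocked one."""
    return "\n".join("".join("x" if allowed else "." for allowed in row) for row in mask)


if __name__ == "__main__":
    print("generator (chunk-causal):")
    print(render(chunk_causal_mask(num_frames=6, chunk_size=2)))
    print("discriminator (bidirectional):")
    print(render(bidirectional_mask(num_frames=6)))
```

Under this picture, the generator can be sampled chunk by chunk because no frame depends on the future, while the discriminator sees late frames alongside early ones and can therefore flag drift that only shows up across the full sequence.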

Progress

  • 📝 Technical Report / Paper
  • 🌐 Project Homepage
  • 💻 Inference Code
  • 🤗 Pretrained Checkpoints

Setup

Clone the repository:

git clone https://github.com/AutoLab-SAI-SJTU/AAD-1.git
cd AAD-1

Install with uv:

uv venv --python 3.10
source .venv/bin/activate
uv pip install -r requirements.txt
uv pip install flash-attn --no-build-isolation
uv pip install -e .

Alternatively, use conda:

conda create -n AAD-1 python=3.10 -y
conda activate AAD-1
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
pip install -e .
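After either install path, a quick sanity check can confirm that the interpreter and key modules are visible. The sketch below is a hypothetical helper, not part of the repository. It assumes that `requirements.txt` provides `torch` (an assumption, since the file's contents are not shown here) and relies on the `flash-attn` package being importable as `flash_attn`.

```python
import importlib.util
import sys
from typing import Dict, Sequence


def check_environment(required: Sequence[str] = ("torch", "flash_attn")) -> Dict[str, bool]:
    """Report whether Python matches the README's 3.10 and whether each module can be found."""
    report = {"python_ok": sys.version_info[:2] == (3, 10)}
    for name in required:
        # find_spec returns None for a top-level module that is not installed
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(f"{key}: {'ok' if ok else 'MISSING'}")
```

A `False` entry only means the module could not be located in the active environment. It does not verify that a GPU build of any package works.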

Checkpoints

The public release needs only two downloads:

  1. 🤗 Official shared Wan model: Wan2.1-T2V-14B
  2. 🤗 Released AAD-1 sharded generator checkpoint

Download the shared Wan components:

huggingface-cli download \
  Wan-AI/Wan2.1-T2V-14B \
  --local-dir-use-symlinks False \
  --local-dir wan_models/Wan2.1-T2V-14B

If you use a custom shared Wan path, pass it explicitly with --wan_model_dir.

Download the AAD-1 sharded generator checkpoint:

huggingface-cli download \
  Watay/AAD-1 \
  --include "14b_i2v_1step_transformer/*" \
  --local-dir-use-symlinks False \
  --local-dir checkpoints

Optional 2-step checkpoint:

huggingface-cli download \
  Watay/AAD-1 \
  --include "14b_i2v_2step_transformer/*" \
  --local-dir-use-symlinks False \
  --local-dir checkpoints
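Before running inference, it may help to confirm that the downloads landed where the Quick Start commands expect them. The sketch below is a hypothetical preflight helper. The two paths are copied from the commands in this README, and the check only tests for existence, not for completeness of the shards.

```python
from pathlib import Path
from typing import List, Sequence

# Paths as used by the Quick Start commands, relative to the repository root.
EXPECTED = (
    "wan_models/Wan2.1-T2V-14B",
    "checkpoints/14b_i2v_1step_transformer/self_forcing_generator_bf16.index.json",
)


def missing_paths(root: str = ".", expected: Sequence[str] = EXPECTED) -> List[str]:
    """Return the expected paths that do not exist under root (empty list means all present)."""
    base = Path(root)
    return [p for p in expected if not (base / p).exists()]


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing downloads:")
        for p in missing:
            print("  " + p)
    else:
        print("All expected paths are present.")
```

If the Wan components live elsewhere, pass a different `expected` tuple here and the matching directory to `--wan_model_dir`.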

Quick Start

1-step example (5 s, 81 frames):

python aad1/inference.py \
  --prompt "a couple of horses are running in the dirt" \
  --image_path assets/examples/horses_running_dirt.jpg \
  --output_path outputs/aad1_horse_1step_5s.mp4 \
  --checkpoint_path checkpoints/14b_i2v_1step_transformer/self_forcing_generator_bf16.index.json \
  --wan_model_dir wan_models/Wan2.1-T2V-14B \
  --num_frames 81 \
  --seed 1000 \
  --sp_size 1 \
  --denoising_timestep_list 1000

1-step example (20 s, 321 frames):

python aad1/inference.py \
  --prompt "two people scuba diving in the ocean" \
  --image_path assets/examples/scuba_diving_ocean.jpg \
  --output_path outputs/aad1_scuba_1step_20s.mp4 \
  --checkpoint_path checkpoints/14b_i2v_1step_transformer/self_forcing_generator_bf16.index.json \
  --wan_model_dir wan_models/Wan2.1-T2V-14B \
  --num_frames 321 \
  --seed 1000 \
  --sp_size 1 \
  --denoising_timestep_list 1000
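The two examples above pair 5 s with 81 frames and 20 s with 321 frames, which is consistent with 16 frames per second plus one initial frame. The sketch below turns that into a small helper and assembles the same command line for a chosen duration. The 16 fps rule is inferred from those two examples rather than documented, `num_frames_for` and `build_command` are hypothetical names, and whether other durations are supported is not stated in this README.

```python
import shlex
from typing import List

CHECKPOINT = "checkpoints/14b_i2v_1step_transformer/self_forcing_generator_bf16.index.json"
WAN_DIR = "wan_models/Wan2.1-T2V-14B"


def num_frames_for(seconds: float, fps: int = 16) -> int:
    """Frame count for a duration, assuming fps frames per second plus one initial frame."""
    return int(round(seconds * fps)) + 1


def build_command(prompt: str, image_path: str, output_path: str, seconds: float,
                  seed: int = 1000) -> List[str]:
    """Assemble the 1-step inference command using only the flags shown in the Quick Start."""
    return [
        "python", "aad1/inference.py",
        "--prompt", prompt,
        "--image_path", image_path,
        "--output_path", output_path,
        "--checkpoint_path", CHECKPOINT,
        "--wan_model_dir", WAN_DIR,
        "--num_frames", str(num_frames_for(seconds)),
        "--seed", str(seed),
        "--sp_size", "1",
        "--denoising_timestep_list", "1000",
    ]


if __name__ == "__main__":
    cmd = build_command(
        prompt="two people scuba diving in the ocean",
        image_path="assets/examples/scuba_diving_ocean.jpg",
        output_path="outputs/aad1_scuba_1step_20s.mp4",
        seconds=20,
    )
    print(shlex.join(cmd))  # copy-pasteable shell command
```

Returning an argument list rather than a string keeps the prompt intact when it contains spaces or quotes. `shlex.join` is used only for display.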

More examples are in docs/inference-examples.md.

Acknowledgements

We thank the authors and contributors of Wan2.1, CausVid, Self Forcing, and FastVideo for their open research and codebases. AAD-1 builds on these foundations for causal video generation, distillation, and efficient inference.

Citation

@article{li2026aad1,
  title={AAD-1: Asymmetric Adversarial Distillation for One-Step Autoregressive Video Generation},
  author={Li, Haobo and Zeng, Yanhong and Lu, Yunhong and Zhu, Jiapeng and Ouyang, Hao and Wang, Qiuyu and Cheng, Ka Leong and Shen, Yujun and Zhang, Zhipeng},
  journal={arXiv preprint arXiv:2606.03972},
  year={2026}
}

About

[ICML 2026] AAD-1: Asymmetric Adversarial Distillation for One-Step Autoregressive Video Generation
