AAD-1: Asymmetric Adversarial Distillation for One-Step Autoregressive Video Generation

Haobo Li¹,² · Yanhong Zeng²,³,✉ · Yunhong Lu⁴,² · Jiapeng Zhu² · Hao Ouyang² · Qiuyu Wang² · Ka Leong Cheng² · Yujun Shen² · Zhipeng Zhang¹,⁵,✉

¹AutoLab, SAI, SJTU ²Ant Group ³Department of Automation, Tsinghua University ⁴Zhejiang University ⁵Anyverse Dynamics

📄 Paper | 🌐 Website | 🤗 Models

Causal | 1-Step | One-Step | Autoregressive | Video World Model

Keywords: causal video generation, 1-step video generation, one-step autoregressive generation, autoregressive video world model, video-world-model rollouts, causal Wan backbone, long-horizon video generation.

We present AAD-1, a causal, one-step autoregressive video world model built with Asymmetric Adversarial Distillation. Given a reference image and a text prompt, AAD-1 generates long-horizon rollouts with one sampling step per causal chunk. It addresses motion collapse and training instability by combining an asymmetric generator-discriminator design with phased training: the generator remains causal for autoregressive sampling, while a bidirectional video-level discriminator scores full spatiotemporal sequences to detect global temporal failures and long-range drift. A distribution-matching warmup first bootstraps a stable one-step generator before adversarial distillation. Together, these enable state-of-the-art one-step autoregressive video generation on VBench.

AAD-1 trains a one-step autoregressive generator in three stages. Stage I adapts a pretrained bidirectional video model into a causal generator with ODE initialization. Stage II performs a one-step distribution-matching distillation (DMD) warmup under self-rollout training. Stage III applies asymmetric adversarial refinement: the generator remains causal, while a bidirectional video-level discriminator observes full-video context to penalize temporal drift and motion collapse.
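
The sampling pattern described above, one generator call per causal chunk with each chunk conditioned only on earlier chunks, can be illustrated with a toy loop. This is a minimal sketch and not the repository's pipeline: `toy_generator`, `rollout`, and the list-of-floats chunk representation are hypothetical stand-ins, chosen so the example runs with the standard library only.

```python
import random
from typing import Callable, List, Sequence

Chunk = List[float]  # hypothetical stand-in for one chunk of latent frames


def toy_generator(noise: Chunk, context: Sequence[Chunk]) -> Chunk:
    """Hypothetical one-step generator: maps noise to a chunk in a single call.

    It reads only `context` (previously generated chunks), which is what
    keeps the rollout causal. A real generator would be a neural network.
    """
    anchor = context[-1] if context else [0.0] * len(noise)
    return [0.9 * a + 0.1 * n for a, n in zip(anchor, noise)]


def rollout(
    generator: Callable[[Chunk, Sequence[Chunk]], Chunk],
    first_chunk: Chunk,
    num_chunks: int,
    seed: int = 0,
) -> List[Chunk]:
    """Generate chunks autoregressively with one generator call per new chunk."""
    rng = random.Random(seed)
    chunks: List[Chunk] = [list(first_chunk)]  # e.g. derived from the reference image
    while len(chunks) < num_chunks:
        noise = [rng.gauss(0.0, 1.0) for _ in first_chunk]
        # Single sampling step: there is no iterative denoising loop per chunk.
        chunks.append(generator(noise, chunks))
    return chunks


video = rollout(toy_generator, first_chunk=[1.0, 2.0, 3.0], num_chunks=5)
print(len(video), len(video[0]))  # prints: 5 3
```

In the Stage III setup described above, the discriminator would score the whole `video` sequence at once rather than chunk by chunk; that part is omitted from this sketch.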

Progress

📝 Technical Report / Paper
🌐 Project Homepage
💻 Inference Code
🤗 Pretrained Checkpoints

Setup

Clone the repository:

git clone https://github.com/AutoLab-SAI-SJTU/AAD-1.git
cd AAD-1

Install with uv:

uv venv --python 3.10
source .venv/bin/activate
uv pip install -r requirements.txt
uv pip install flash-attn --no-build-isolation
uv pip install -e .

Alternatively, use conda:

conda create -n AAD-1 python=3.10 -y
conda activate AAD-1
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
python setup.py develop

Checkpoints

The public release path only needs two downloads: the shared Wan components and the AAD-1 generator checkpoint.

Download the shared Wan components:

huggingface-cli download \
  Wan-AI/Wan2.1-T2V-14B \
  --local-dir-use-symlinks False \
  --local-dir wan_models/Wan2.1-T2V-14B

If you use a custom shared Wan path, pass it explicitly with --wan_model_dir.

Download the AAD-1 sharded generator checkpoint:

huggingface-cli download \
  Watay/AAD-1 \
  --include "14b_i2v_1step_transformer/*" \
  --local-dir-use-symlinks False \
  --local-dir checkpoints

Optional 2-step checkpoint:

huggingface-cli download \
  Watay/AAD-1 \
  --include "14b_i2v_2step_transformer/*" \
  --local-dir-use-symlinks False \
  --local-dir checkpoints
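
Before running inference, it can help to confirm that the downloads landed where the README's commands expect them. The following is a minimal sketch: the helper name `missing_paths` is hypothetical, and the two paths are copied from the download and Quick Start commands in this README, assuming the script is run from the repository root.

```python
from pathlib import Path
from typing import Iterable, List


def missing_paths(paths: Iterable[str]) -> List[str]:
    """Return the subset of `paths` that does not exist on disk."""
    return [p for p in paths if not Path(p).exists()]


# Paths used by the commands in this README (relative to the repository root).
expected = [
    "wan_models/Wan2.1-T2V-14B",
    "checkpoints/14b_i2v_1step_transformer/self_forcing_generator_bf16.index.json",
]

for path in missing_paths(expected):
    print(f"missing: {path}")
```

If nothing is printed, both locations exist. The check does not validate the contents of the checkpoint shards.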

Quick Start

Example: 1-step, 5 s, 81 frames:

python aad1/inference.py \
  --prompt "a couple of horses are running in the dirt" \
  --image_path assets/examples/horses_running_dirt.jpg \
  --output_path outputs/aad1_horse_1step_5s.mp4 \
  --checkpoint_path checkpoints/14b_i2v_1step_transformer/self_forcing_generator_bf16.index.json \
  --wan_model_dir wan_models/Wan2.1-T2V-14B \
  --num_frames 81 \
  --seed 1000 \
  --sp_size 1 \
  --denoising_timestep_list 1000

Example: 1-step, 20 s, 321 frames:

python aad1/inference.py \
  --prompt "two people scuba diving in the ocean" \
  --image_path assets/examples/scuba_diving_ocean.jpg \
  --output_path outputs/aad1_scuba_1step_20s.mp4 \
  --checkpoint_path checkpoints/14b_i2v_1step_transformer/self_forcing_generator_bf16.index.json \
  --wan_model_dir wan_models/Wan2.1-T2V-14B \
  --num_frames 321 \
  --seed 1000 \
  --sp_size 1 \
  --denoising_timestep_list 1000

More examples are in docs/inference-examples.md.
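
The two examples above pair 5 s with 81 frames and 20 s with 321 frames, which is consistent with 16 frames per second plus one initial frame. The sketch below turns that pattern into a small helper for choosing `--num_frames`. The 16 fps value and the `+ 1` are inferred from those two examples rather than stated in this README, and `frames_for_duration` is a hypothetical name, so treat other durations as untested.

```python
def frames_for_duration(seconds: int, fps: int = 16) -> int:
    """Frame count for a clip of `seconds` seconds: fps * seconds + 1.

    The default fps=16 and the extra frame are inferred from the README
    examples (5 s -> 81 frames, 20 s -> 321 frames), not confirmed.
    """
    if seconds <= 0 or fps <= 0:
        raise ValueError("seconds and fps must be positive")
    return seconds * fps + 1


for seconds in (5, 10, 20):
    print(seconds, frames_for_duration(seconds))
```

For 5 and 20 seconds this reproduces the 81 and 321 used in the commands above.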

Acknowledgements

We thank the authors and contributors of Wan2.1, CausVid, Self Forcing, and FastVideo for their open research and codebases. AAD-1 builds on these foundations for causal video generation, distillation, and efficient inference.

Citation

@article{li2026aad1,
  title={AAD-1: Asymmetric Adversarial Distillation for One-Step Autoregressive Video Generation},
  author={Li, Haobo and Zeng, Yanhong and Lu, Yunhong and Zhu, Jiapeng and Ouyang, Hao and Wang, Qiuyu and Cheng, Ka Leong and Shen, Yujun and Zhang, Zhipeng},
  journal={arXiv preprint arXiv:2606.03972},
  year={2026}
}
