Haobo Li1,2 · Yanhong Zeng2,3,✉ · Yunhong Lu4,2 · Jiapeng Zhu2 · Hao Ouyang2 · Qiuyu Wang2 · Ka Leong Cheng2 · Yujun Shen2 · Zhipeng Zhang1,5,✉
1AutoLab, SAI, SJTU 2Ant Group 3Department of Automation, Tsinghua University 4Zhejiang University 5Anyverse Dynamics
Causal | 1-Step | One-Step | Autoregressive | Video World Model
Keywords: causal video generation, 1-step video generation, one-step autoregressive generation, autoregressive video world model, video-world-model rollouts, causal Wan backbone, long-horizon video generation.
We present AAD-1, a causal, one-step autoregressive video world model built with Asymmetric Adversarial Distillation. Given a reference image and a text prompt, AAD-1 generates long-horizon video-world-model rollouts with one sampling step per causal chunk. AAD-1 addresses motion collapse and training instability by combining an asymmetric generator-discriminator design with phased training: the generator remains causal for autoregressive sampling, while a bidirectional video-level discriminator scores full spatiotemporal sequences to detect global temporal failures and long-range drift. A distribution-matching warmup first bootstraps a stable one-step generator before adversarial distillation; with this recipe, AAD-1 reports state-of-the-art one-step autoregressive video generation on VBench.
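The phrase "one sampling step per causal chunk" can be pictured with a toy rollout loop. This is a minimal sketch and not the AAD-1 sampler: `rollout`, `toy_generator`, and the list-of-floats chunk type are hypothetical stand-ins, chosen only to show that each new chunk costs exactly one generator call and is conditioned on everything generated so far.

```python
from typing import Callable, List, Sequence, Tuple

Chunk = List[float]  # hypothetical stand-in for a chunk of video latents


def rollout(
    generate_chunk: Callable[[Sequence[Chunk]], Chunk],
    reference: Chunk,
    num_chunks: int,
) -> Tuple[List[Chunk], int]:
    """Autoregressive rollout with one generator call per chunk."""
    context: List[Chunk] = [reference]  # the reference image seeds the context
    calls = 0
    for _ in range(num_chunks):
        # One sampling step: the generator sees only past chunks (causal).
        new_chunk = generate_chunk(context)
        calls += 1
        context.append(new_chunk)
    return context[1:], calls


def toy_generator(context: Sequence[Chunk]) -> Chunk:
    # Stand-in for the real model: shift the most recent chunk by 1.0.
    return [value + 1.0 for value in context[-1]]


chunks, calls = rollout(toy_generator, reference=[0.0, 0.0], num_chunks=4)
print(len(chunks), calls)
```

In this sketch the number of generator calls equals the number of chunks, which is the cost model the one-step setting implies; a multi-step sampler would multiply that count by its number of denoising steps.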
AAD-1 trains a one-step autoregressive generator in three stages. Stage I adapts a pretrained bidirectional video model into a causal generator with ODE initialization. Stage II performs one-step DMD warmup under self-rollout training. Stage III applies asymmetric adversarial refinement: the generator remains causal, while a bidirectional video-level discriminator observes full-video context to penalize temporal drift and motion collapse.
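The asymmetry in Stage III can be illustrated with attention masks. The sketch below assumes a block-causal pattern for the generator (frames attend to their own chunk and to earlier chunks) and a full mask for the discriminator; the chunk size of 3 and both function names are hypothetical and are not taken from the AAD-1 code.

```python
from typing import List

Mask = List[List[bool]]  # mask[q][k] is True when query frame q may attend to key frame k


def block_causal_mask(num_frames: int, chunk_size: int) -> Mask:
    """Generator-side mask: a frame sees its own chunk and all earlier chunks."""
    mask: Mask = []
    for q in range(num_frames):
        q_chunk = q // chunk_size
        mask.append([(k // chunk_size) <= q_chunk for k in range(num_frames)])
    return mask


def bidirectional_mask(num_frames: int) -> Mask:
    """Discriminator-side mask: every frame sees the whole video."""
    return [[True] * num_frames for _ in range(num_frames)]


gen_mask = block_causal_mask(num_frames=6, chunk_size=3)   # hypothetical sizes
disc_mask = bidirectional_mask(num_frames=6)

for row in gen_mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Because the discriminator's mask has no zeros, it can compare early and late frames directly, which is what lets it penalize drift that a purely causal critic would only see from one side.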
- 📝 Technical Report / Paper
- 🌐 Project Homepage
- 💻 Inference Code
- 🤗 Pretrained Checkpoints
Clone the repository:
git clone https://github.com/AutoLab-SAI-SJTU/AAD-1.git
cd AAD-1
Install with uv:
uv venv --python 3.10
source .venv/bin/activate
uv pip install -r requirements.txt
uv pip install flash-attn --no-build-isolation
uv pip install -e .
Alternatively, use conda:
conda create -n AAD-1 python=3.10 -y
conda activate AAD-1
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
python setup.py develop
The public release path only needs:
Download the shared Wan components:
huggingface-cli download \
Wan-AI/Wan2.1-T2V-14B \
--local-dir-use-symlinks False \
--local-dir wan_models/Wan2.1-T2V-14B
If you use a custom shared Wan path, pass it explicitly with --wan_model_dir.
Download the AAD-1 sharded generator checkpoint:
huggingface-cli download \
Watay/AAD-1 \
--include "14b_i2v_1step_transformer/*" \
--local-dir-use-symlinks False \
--local-dir checkpoints
Optional 2-step checkpoint:
huggingface-cli download \
Watay/AAD-1 \
--include "14b_i2v_2step_transformer/*" \
--local-dir-use-symlinks False \
--local-dir checkpoints
Example 1step, 5s, 81 frames:
python aad1/inference.py \
--prompt "a couple of horses are running in the dirt" \
--image_path assets/examples/horses_running_dirt.jpg \
--output_path outputs/aad1_horse_1step_5s.mp4 \
--checkpoint_path checkpoints/14b_i2v_1step_transformer/self_forcing_generator_bf16.index.json \
--wan_model_dir wan_models/Wan2.1-T2V-14B \
--num_frames 81 \
--seed 1000 \
--sp_size 1 \
--denoising_timestep_list 1000
Example 1step, 20s, 321 frames:
python aad1/inference.py \
--prompt "two people scuba diving in the ocean" \
--image_path assets/examples/scuba_diving_ocean.jpg \
--output_path outputs/aad1_scuba_1step_20s.mp4 \
--checkpoint_path checkpoints/14b_i2v_1step_transformer/self_forcing_generator_bf16.index.json \
--wan_model_dir wan_models/Wan2.1-T2V-14B \
--num_frames 321 \
--seed 1000 \
--sp_size 1 \
--denoising_timestep_list 1000
More examples are in docs/inference-examples.md.
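The two examples pair 5 s with 81 frames and 20 s with 321 frames, which is consistent with 16 frames per second plus one initial frame. The sketch below turns that into a small helper and assembles the same command line as an argument list. The 16 fps figure is inferred from those two examples, and `num_frames_for` and `build_command` are hypothetical helpers, not part of the repository; whether `aad1/inference.py` accepts other durations is not stated here, so treat other values as untested.

```python
from typing import List

FPS = 16  # assumption: inferred from 81 frames = 5 s and 321 frames = 20 s


def num_frames_for(seconds: int, fps: int = FPS) -> int:
    """Frame count for a clip length: fps frames per second plus the first frame."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return seconds * fps + 1


def build_command(prompt: str, image_path: str, output_path: str, seconds: int) -> List[str]:
    """Assemble the 1-step inference command using only flags shown in the examples."""
    return [
        "python", "aad1/inference.py",
        "--prompt", prompt,
        "--image_path", image_path,
        "--output_path", output_path,
        "--checkpoint_path",
        "checkpoints/14b_i2v_1step_transformer/self_forcing_generator_bf16.index.json",
        "--wan_model_dir", "wan_models/Wan2.1-T2V-14B",
        "--num_frames", str(num_frames_for(seconds)),
        "--seed", "1000",
        "--sp_size", "1",
        "--denoising_timestep_list", "1000",  # a single timestep, as in the 1-step examples
    ]


cmd = build_command(
    prompt="a couple of horses are running in the dirt",
    image_path="assets/examples/horses_running_dirt.jpg",
    output_path="outputs/aad1_horse_1step_5s.mp4",
    seconds=5,
)
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` from the repository root; keeping it as a list avoids shell-quoting problems with prompts that contain spaces.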
We thank the authors and contributors of Wan2.1, CausVid, Self Forcing, and FastVideo for their open research and codebases. AAD-1 builds on these foundations for causal video generation, distillation, and efficient inference.
@article{li2026aad1,
title={AAD-1: Asymmetric Adversarial Distillation for One-Step Autoregressive Video Generation},
author={Li, Haobo and Zeng, Yanhong and Lu, Yunhong and Zhu, Jiapeng and Ouyang, Hao and Wang, Qiuyu and Cheng, Ka Leong and Shen, Yujun and Zhang, Zhipeng},
journal={arXiv preprint arXiv:2606.03972},
year={2026}
}