AAD-1: Asymmetric Adversarial Distillation for One-Step Autoregressive Video Generation
Haobo Li1,2 · Yanhong Zeng2,3,† · Yunhong Lu4,2 · Jiapeng Zhu2 · Hao Ouyang2 · Qiuyu Wang2 · Ka Leong Cheng2 · Yujun Shen2 · Zhipeng Zhang1,5,†
1AutoLab, SAI, SJTU 2Ant Group 3Department of Automation, Tsinghua University 4Zhejiang University 5Anyverse Dynamics
Paper | Website | Models
We present AAD-1, an Asymmetric Adversarial Distillation framework for one-step autoregressive video world model generation. AAD-1 addresses motion collapse and training instability by combining an asymmetric generator-discriminator design with phased training. The generator remains causal for autoregressive sampling, while a bidirectional video-level discriminator scores full spatiotemporal sequences to detect global temporal failures and long-range drift. A distribution-matching warmup first bootstraps a stable one-step generator before adversarial distillation. With this recipe, AAD-1 achieves state-of-the-art one-step autoregressive video generation on VBench.
AAD-1 trains a one-step autoregressive generator in three stages. Stage I adapts a pretrained bidirectional video model into a causal generator with ODE initialization. Stage II performs a one-step distribution matching distillation (DMD) warmup under self-rollout training. Stage III applies asymmetric adversarial refinement: the generator remains causal, while a bidirectional video-level discriminator observes full-video context to penalize temporal drift and motion collapse.
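The asymmetry in Stage III can be pictured as two different attention masks. The sketch below is only an illustration under simplifying assumptions, not the repository's implementation: it assumes attention at the granularity of whole frames, and the function names and the four-frame clip length are hypothetical.

```python
def causal_mask(num_frames):
    """Generator side: frame i may attend to frames 0..i only."""
    return [[j <= i for j in range(num_frames)] for i in range(num_frames)]


def bidirectional_mask(num_frames):
    """Discriminator side: every frame may attend to every frame."""
    return [[True] * num_frames for _ in range(num_frames)]


def visible_frames(mask, frame_index):
    """Return the indices of the frames that `frame_index` is allowed to see."""
    return [j for j, allowed in enumerate(mask[frame_index]) if allowed]


if __name__ == "__main__":
    n = 4  # hypothetical clip length, chosen only for illustration
    gen, disc = causal_mask(n), bidirectional_mask(n)
    print("generator, frame 1 sees:", visible_frames(gen, 1))       # [0, 1]
    print("discriminator, frame 1 sees:", visible_frames(disc, 1))  # [0, 1, 2, 3]
```

Because the discriminator sees later frames as well as earlier ones, it can in principle flag drift that only becomes apparent over the full clip, while the generator keeps the causal structure it needs for autoregressive sampling.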
Progress
- Technical Report / Paper
- Project Homepage
- Inference Code
- Pretrained Checkpoints
Setup
Clone the repository:
git clone https://github.com/AutoLab-SAI-SJTU/AAD-1.git
cd AAD-1
Install with uv:
uv venv --python 3.10
source .venv/bin/activate
uv pip install -r requirements.txt
uv pip install flash-attn --no-build-isolation
uv pip install -e .
Alternatively, use conda:
conda create -n AAD-1 python=3.10 -y
conda activate AAD-1
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
pip install -e .
Checkpoints
The public release needs only two downloads: the shared Wan components and the AAD-1 generator checkpoint.
Download the shared Wan components:
huggingface-cli download \
Wan-AI/Wan2.1-T2V-14B \
--local-dir-use-symlinks False \
--local-dir wan_models/Wan2.1-T2V-14B
If you use a custom shared Wan path, pass it explicitly with --wan_model_dir.
Download the AAD-1 sharded generator checkpoint:
huggingface-cli download \
Watay/AAD-1 \
--include "14b_i2v_1step_transformer/*" \
--local-dir-use-symlinks False \
--local-dir checkpoints
Optional 2-step checkpoint:
huggingface-cli download \
Watay/AAD-1 \
--include "14b_i2v_2step_transformer/*" \
--local-dir-use-symlinks False \
--local-dir checkpoints
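The same downloads can also be scripted from Python. The following is a minimal sketch, assuming the `huggingface_hub` package is installed; `download_plan` and `run_downloads` are hypothetical helper names, and `allow_patterns` plays the role of the CLI's `--include`. The sketch omits the `--local-dir-use-symlinks` setting, which may matter on older `huggingface_hub` versions.

```python
def download_plan(include_two_step=False):
    """Return one dict of snapshot_download keyword arguments per download."""
    plan = [
        {
            # Shared Wan components
            "repo_id": "Wan-AI/Wan2.1-T2V-14B",
            "local_dir": "wan_models/Wan2.1-T2V-14B",
        },
        {
            # AAD-1 sharded 1-step generator checkpoint
            "repo_id": "Watay/AAD-1",
            "allow_patterns": ["14b_i2v_1step_transformer/*"],
            "local_dir": "checkpoints",
        },
    ]
    if include_two_step:
        plan.append(
            {
                # Optional 2-step checkpoint
                "repo_id": "Watay/AAD-1",
                "allow_patterns": ["14b_i2v_2step_transformer/*"],
                "local_dir": "checkpoints",
            }
        )
    return plan


def run_downloads(plan):
    """Fetch every entry in the plan (requires network access and disk space)."""
    # Imported lazily so the plan can be inspected without huggingface_hub.
    from huggingface_hub import snapshot_download

    for kwargs in plan:
        snapshot_download(**kwargs)


if __name__ == "__main__":
    # Only print the plan here; call run_downloads(plan) to start downloading.
    for entry in download_plan(include_two_step=True):
        print(entry)
```

Calling `run_downloads(download_plan())` from the repository root should leave the files in the same locations as the CLI commands above.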
Quick Start
Run from the repository root. This command generates an 81-frame video from an input image with the 1-step checkpoint on a single GPU.
python aad1/inference.py \
--prompt "two people scuba diving in the ocean" \
--image_path assets/examples/scuba_diving_ocean.jpg \
--output_path outputs/aad1_scuba_1step.mp4 \
--checkpoint_path checkpoints/14b_i2v_1step_transformer/self_forcing_generator_bf16.index.json \
--wan_model_dir wan_models/Wan2.1-T2V-14B \
--num_frames 81 \
--seed 1000 \
--denoising_timestep_list 1000
Example 2-step command:
python aad1/inference.py \
--prompt "two people scuba diving in the ocean" \
--image_path assets/examples/scuba_diving_ocean.jpg \
--output_path outputs/aad1_scuba_2step.mp4 \
--checkpoint_path checkpoints/14b_i2v_2step_transformer/self_forcing_generator_bf16.index.json \
--wan_model_dir wan_models/Wan2.1-T2V-14B \
--num_frames 81 \
--seed 1000 \
--denoising_timestep_list 1000,500
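The two commands above differ only in the checkpoint directory, the output path, and the timestep list. A small helper can make that explicit. This is a sketch rather than part of the released code: `build_inference_command` is a hypothetical name, and it uses only the flags and values shown in the commands above.

```python
import shlex

# Timestep lists copied from the 1-step and 2-step commands above.
TIMESTEPS = {1: "1000", 2: "1000,500"}


def build_inference_command(prompt, image_path, output_path, num_steps=1,
                            wan_model_dir="wan_models/Wan2.1-T2V-14B",
                            num_frames=81, seed=1000):
    """Build the argument list for aad1/inference.py (1-step or 2-step)."""
    if num_steps not in TIMESTEPS:
        raise ValueError(f"num_steps must be 1 or 2, got {num_steps}")
    checkpoint = (
        f"checkpoints/14b_i2v_{num_steps}step_transformer/"
        "self_forcing_generator_bf16.index.json"
    )
    return [
        "python", "aad1/inference.py",
        "--prompt", prompt,
        "--image_path", image_path,
        "--output_path", output_path,
        "--checkpoint_path", checkpoint,
        "--wan_model_dir", wan_model_dir,
        "--num_frames", str(num_frames),
        "--seed", str(seed),
        "--denoising_timestep_list", TIMESTEPS[num_steps],
    ]


if __name__ == "__main__":
    cmd = build_inference_command(
        prompt="two people scuba diving in the ocean",
        image_path="assets/examples/scuba_diving_ocean.jpg",
        output_path="outputs/aad1_scuba_2step.mp4",
        num_steps=2,
    )
    # Print a copy-pasteable shell command instead of launching the job.
    print(shlex.join(cmd))
```

To launch the job instead of printing it, the list can be passed to `subprocess.run(cmd, check=True)` from the repository root.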
More examples, including 20s generation, are in docs/inference-examples.md.
Acknowledgements
We thank the authors and contributors of Wan2.1, CausVid, Self Forcing, and FastVideo for their open research and codebases. AAD-1 builds on these foundations for causal video generation, distillation, and efficient inference.