---
license: cc-by-nc-sa-4.0
library_name: pytorch
pipeline_tag: image-to-video
tags:
- image-to-video
- video-generation
- autoregressive-video-generation
- one-step-generation
- adversarial-distillation
- wan
base_model:
- Wan-AI/Wan2.1-T2V-14B
---
# AAD-1: Asymmetric Adversarial Distillation for One-Step Autoregressive Video Generation
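Haobo Li<sup>1,2</sup> · Yanhong Zeng<sup>2,3,✉</sup> · Yunhong Lu<sup>4,2</sup> · Jiapeng Zhu<sup>2</sup> · Hao Ouyang<sup>2</sup> · Qiuyu Wang<sup>2</sup> · Ka Leong Cheng<sup>2</sup> · Yujun Shen<sup>2</sup> · Zhipeng Zhang<sup>1,5,✉</sup>

<sup>1</sup>AutoLab, SAI, SJTU · <sup>2</sup>Ant Group · <sup>3</sup>Department of Automation, Tsinghua University · <sup>4</sup>Zhejiang University · <sup>5</sup>Anyverse Dynamics
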
We present **AAD-1**, an Asymmetric Adversarial Distillation framework for one-step autoregressive video world-model generation. AAD-1 addresses motion collapse and training instability by combining an asymmetric generator-discriminator design with phased training. The generator remains causal for autoregressive sampling, while a bidirectional video-level discriminator scores full spatiotemporal sequences to detect global temporal failures and long-range drift. A distribution-matching warmup first bootstraps a stable one-step generator before adversarial distillation; with this recipe, we report state-of-the-art one-step autoregressive video generation results on VBench.

AAD-1 trains a one-step autoregressive generator in three stages. Stage I adapts a pretrained bidirectional video model into a causal generator with ODE initialization. Stage II performs a one-step distribution matching distillation (DMD) warmup under self-rollout training. Stage III applies asymmetric adversarial refinement: the generator remains causal, while a bidirectional video-level discriminator observes full-video context to penalize temporal drift and motion collapse.
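
The asymmetry between the two networks can be pictured as a difference in attention masks. The following is a minimal sketch, not the AAD-1 implementation: it assumes frame-level (block) causality for the generator, and the helper names `frame_causal_mask` and `bidirectional_mask` are hypothetical, introduced only for illustration.

```python
def frame_causal_mask(num_frames, tokens_per_frame):
    """Return mask[q][k] == True where query token q may attend to key token k.

    Assumption for illustration: a token in frame i sees every token in
    frames 0..i, and nothing from later frames (block-causal attention).
    """
    n = num_frames * tokens_per_frame
    return [
        [(k // tokens_per_frame) <= (q // tokens_per_frame) for k in range(n)]
        for q in range(n)
    ]


def bidirectional_mask(num_frames, tokens_per_frame):
    """Return a mask in which every token attends to every token in the video."""
    n = num_frames * tokens_per_frame
    return [[True] * n for _ in range(n)]


# Print both masks for a toy video of 3 frames with 2 tokens each.
for name, mask in (
    ("causal generator", frame_causal_mask(3, 2)),
    ("bidirectional discriminator", bidirectional_mask(3, 2)),
):
    print(name)
    for row in mask:
        print("  " + "".join("x" if allowed else "." for allowed in row))
```

In the causal mask, tokens of an early frame never see later frames, which is what allows frame-by-frame sampling. The full mask gives the discriminator access to the whole clip, so a failure that only shows up across many frames is visible to it.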
## Progress
- [x] 📝 Technical Report / Paper
- [x] 🌐 Project Homepage
- [x] 💻 Inference Code
- [x] 🤗 Pretrained Checkpoints
## Setup
Clone the repository:
```bash
git clone https://github.com/AutoLab-SAI-SJTU/AAD-1.git
cd AAD-1
```
Install with `uv`:
```bash
uv venv --python 3.10
source .venv/bin/activate
uv pip install -r requirements.txt
uv pip install flash-attn --no-build-isolation
uv pip install -e .
```
Alternatively, use `conda`:
```bash
conda create -n AAD-1 python=3.10 -y
conda activate AAD-1
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
pip install -e .
```
## Checkpoints
Inference with the public release requires only two downloads:
1. 🤗 [Official shared Wan model: Wan2.1-T2V-14B](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B)
2. 🤗 [Released AAD-1 sharded generator checkpoint](https://huggingface.co/Watay/AAD-1)
Download the shared Wan components:
```bash
huggingface-cli download \
Wan-AI/Wan2.1-T2V-14B \
--local-dir-use-symlinks False \
--local-dir wan_models/Wan2.1-T2V-14B
```
If you use a custom shared Wan path, pass it explicitly with `--wan_model_dir`.
Download the AAD-1 sharded generator checkpoint:
```bash
huggingface-cli download \
Watay/AAD-1 \
--include "14b_i2v_1step_transformer/*" \
--local-dir-use-symlinks False \
--local-dir checkpoints
```
Optional 2-step checkpoint:
```bash
huggingface-cli download \
Watay/AAD-1 \
--include "14b_i2v_2step_transformer/*" \
--local-dir-use-symlinks False \
--local-dir checkpoints
```
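
Because the generator checkpoint is sharded, an interrupted download can leave some shard files missing. The sketch below checks for that before inference. It assumes the `.index.json` file follows the common Hugging Face layout with a `weight_map` field that maps parameter names to shard filenames; that layout is an assumption here, and `missing_shards` is a hypothetical helper, not part of the AAD-1 codebase.

```python
import json
from pathlib import Path


def missing_shards(index_path):
    """List shard files named in a sharded-checkpoint index but absent on disk.

    Assumes a Hugging Face-style index: {"weight_map": {param_name: shard_file}}.
    """
    index_path = Path(index_path)
    with index_path.open() as f:
        index = json.load(f)
    if "weight_map" not in index:
        raise ValueError(f"{index_path} has no 'weight_map' field")
    shard_names = sorted(set(index["weight_map"].values()))
    # Shards are expected to sit next to the index file.
    return [name for name in shard_names if not (index_path.parent / name).is_file()]


index = Path("checkpoints/14b_i2v_1step_transformer/self_forcing_generator_bf16.index.json")
if index.is_file():
    missing = missing_shards(index)
    print("missing shards:", missing if missing else "none")
else:
    print(f"index not found: {index}")
```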
## Quick Start
Run from the repository root. This command generates an 81-frame video from an input image with the `1step` checkpoint on a single GPU.
```bash
python aad1/inference.py \
--prompt "two people scuba diving in the ocean" \
--image_path assets/examples/scuba_diving_ocean.jpg \
--output_path outputs/aad1_scuba_1step.mp4 \
--checkpoint_path checkpoints/14b_i2v_1step_transformer/self_forcing_generator_bf16.index.json \
--wan_model_dir wan_models/Wan2.1-T2V-14B \
--num_frames 81 \
--seed 1000 \
--denoising_timestep_list 1000
```
Example `2step` command:
```bash
python aad1/inference.py \
--prompt "two people scuba diving in the ocean" \
--image_path assets/examples/scuba_diving_ocean.jpg \
--output_path outputs/aad1_scuba_2step.mp4 \
--checkpoint_path checkpoints/14b_i2v_2step_transformer/self_forcing_generator_bf16.index.json \
--wan_model_dir wan_models/Wan2.1-T2V-14B \
--num_frames 81 \
--seed 1000 \
--denoising_timestep_list 1000,500
```
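
The two commands above differ only in the checkpoint directory, the output name, and the value of `--denoising_timestep_list`. A small wrapper can make that explicit when scripting several runs. This is a sketch that uses only the flags shown in this README; `build_inference_command` is a hypothetical helper and does not exist in the repository.

```python
import shlex


def build_inference_command(prompt, image_path, output_path, checkpoint_path,
                            timesteps, wan_model_dir="wan_models/Wan2.1-T2V-14B",
                            num_frames=81, seed=1000):
    """Build the argument list for aad1/inference.py from the README's flags."""
    return [
        "python", "aad1/inference.py",
        "--prompt", prompt,
        "--image_path", image_path,
        "--output_path", output_path,
        "--checkpoint_path", checkpoint_path,
        "--wan_model_dir", wan_model_dir,
        "--num_frames", str(num_frames),
        "--seed", str(seed),
        # One timestep per denoising step, comma-separated as in the README.
        "--denoising_timestep_list", ",".join(str(t) for t in timesteps),
    ]


for tag, timesteps in (("1step", [1000]), ("2step", [1000, 500])):
    cmd = build_inference_command(
        prompt="two people scuba diving in the ocean",
        image_path="assets/examples/scuba_diving_ocean.jpg",
        output_path=f"outputs/aad1_scuba_{tag}.mp4",
        checkpoint_path=f"checkpoints/14b_i2v_{tag}_transformer/self_forcing_generator_bf16.index.json",
        timesteps=timesteps,
    )
    # Print the shell-quoted command; to execute it instead, pass `cmd` to
    # subprocess.run(cmd, check=True) from the repository root.
    print(shlex.join(cmd))
```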
More examples, including 20s generation, are in [docs/inference-examples.md](docs/inference-examples.md).
## Acknowledgements
We thank the authors and contributors of [Wan2.1](https://github.com/Wan-Video/Wan2.1), [CausVid](https://github.com/tianweiy/CausVid), [Self Forcing](https://github.com/guandeh17/Self-Forcing), and [FastVideo](https://github.com/hao-ai-lab/FastVideo) for their open research and codebases. AAD-1 builds on these foundations for causal video generation, distillation, and efficient inference.