| --- |
| license: cc-by-nc-sa-4.0 |
| library_name: pytorch |
| pipeline_tag: image-to-video |
| tags: |
| - image-to-video |
| - video-generation |
| - autoregressive-video-generation |
| - one-step-generation |
| - adversarial-distillation |
| - wan |
| base_model: |
| - Wan-AI/Wan2.1-T2V-14B |
| --- |
| |
| # AAD-1: Asymmetric Adversarial Distillation for One-Step Autoregressive Video Generation |
|
|
| <p align="center"> |
| <a href="https://github.com/HaobroLi">Haobo Li</a><sup>1,2</sup> · |
| <a href="https://zengyh1900.github.io/">Yanhong Zeng</a><sup>2,3,✉</sup> · |
| <a href="https://github.com/JaydenLyh">Yunhong Lu</a><sup>4,2</sup> · |
| <a href="https://github.com/zhujiapeng">Jiapeng Zhu</a><sup>2</sup> · |
| <a href="https://ken-ouyang.github.io/">Hao Ouyang</a><sup>2</sup> · |
| <a href="https://github.com/qiuyu96">Qiuyu Wang</a><sup>2</sup> · |
| <a href="https://felixcheng97.github.io/">Ka Leong Cheng</a><sup>2</sup> · |
| <a href="https://shenyujun.github.io/">Yujun Shen</a><sup>2</sup> · |
| <a href="https://zhipengzhang.cn/">Zhipeng Zhang</a><sup>1,5,✉</sup> |
| </p> |
|
|
| <p align="center"> |
| <sup>1</sup>AutoLab, SAI, SJTU |
| <sup>2</sup>Ant Group |
| <sup>3</sup>Department of Automation, Tsinghua University |
| <sup>4</sup>Zhejiang University |
| <sup>5</sup>Anyverse Dynamics |
| </p> |
|
|
| <h2 align="center"> |
| <a href="https://arxiv.org/abs/2606.03972">📄 Paper</a> | |
| <a href="https://aad-1.github.io/">🌐 Website</a> | |
| <a href="https://huggingface.co/Watay/AAD-1">🤗 Models</a> |
| </h2> |
|
|
| We present **AAD-1**, an Asymmetric Adversarial Distillation framework for one-step autoregressive video world model generation. AAD-1 addresses motion collapse and training instability by combining an asymmetric generator-discriminator design with phased training: the generator remains causal for autoregressive sampling, while a bidirectional video-level discriminator scores full spatiotemporal sequences to detect global temporal failures and long-range drift. A distribution-matching warmup first bootstraps a stable one-step generator before adversarial distillation, enabling state-of-the-art one-step autoregressive video generation on VBench. |
|
|
|  |
|
|
| AAD-1 trains a one-step autoregressive generator in three stages. Stage I adapts a pretrained bidirectional video model into a causal generator with ODE initialization. Stage II performs one-step DMD warmup under self-rollout training. Stage III applies asymmetric adversarial refinement: the generator remains causal, while a bidirectional video-level discriminator observes full-video context to penalize temporal drift and motion collapse. |
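
The asymmetry in Stage III can be pictured as two different attention patterns over frames. The sketch below is only an illustration under simplifying assumptions (one token per frame, frame-level causality, plain Python lists); the function names are hypothetical and this is not the AAD-1 implementation.

```python
def causal_mask(num_frames):
    """Generator side: frame i may attend to frames 0..i only."""
    return [[j <= i for j in range(num_frames)] for i in range(num_frames)]


def bidirectional_mask(num_frames):
    """Discriminator side: every frame may attend to every frame."""
    return [[True] * num_frames for _ in range(num_frames)]


def visible_frames(mask, i):
    """Indices of the frames that frame i can see under the given mask."""
    return [j for j, allowed in enumerate(mask[i]) if allowed]


if __name__ == "__main__":
    generator = causal_mask(4)
    discriminator = bidirectional_mask(4)
    print(visible_frames(generator, 1))      # [0, 1]
    print(visible_frames(discriminator, 1))  # [0, 1, 2, 3]
```

Under the causal mask a frame never sees the future, which is what allows frame-by-frame sampling. Under the bidirectional mask each frame sees the whole clip, which is the full-video context the discriminator is described as using to catch drift.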
|
|
| ## Progress |
|
|
| - [x] 📝 Technical Report / Paper |
| - [x] 🌐 Project Homepage |
| - [x] 💻 Inference Code |
| - [x] 🤗 Pretrained Checkpoints |
|
|
| ## Setup |
|
|
| Clone the repository: |
|
|
| ```bash |
| git clone https://github.com/AutoLab-SAI-SJTU/AAD-1.git |
| cd AAD-1 |
| ``` |
|
|
| Install with `uv`: |
|
|
| ```bash |
| uv venv --python 3.10 |
| source .venv/bin/activate |
| uv pip install -r requirements.txt |
| uv pip install flash-attn --no-build-isolation |
| uv pip install -e . |
| ``` |
|
|
| Alternatively, use `conda`: |
|
|
| ```bash |
| conda create -n AAD-1 python=3.10 -y |
| conda activate AAD-1 |
| pip install -r requirements.txt |
| pip install flash-attn --no-build-isolation |
| pip install -e . |
| ``` |
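
After either install path, a quick sanity check can confirm that the interpreter and the key packages are visible. This is a minimal sketch, not part of the repository: `check_environment` is a hypothetical helper, and it assumes the packages are importable as `torch` and `flash_attn`.

```python
import importlib.util
import sys


def check_environment(modules=("torch", "flash_attn")):
    """Report the Python version and whether each module can be located."""
    report = {"python_3_10": sys.version_info[:2] == (3, 10)}
    for name in modules:
        # find_spec returns None when a top-level module is not installed
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for item, ok in check_environment().items():
        print(f"{item}: {'ok' if ok else 'MISSING'}")
```

The check only locates the modules and does not import them, so it will not catch a `flash-attn` build that was compiled against a different CUDA or PyTorch version.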
|
|
| ## Checkpoints |
|
|
| Inference with the public release needs only two downloads: |
|
|
| 1. 🤗 [Official shared Wan model: Wan2.1-T2V-14B](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B) |
| 2. 🤗 [Released AAD-1 sharded generator checkpoint](https://huggingface.co/Watay/AAD-1) |
|
|
| Download the shared Wan components: |
|
|
| ```bash |
| huggingface-cli download \ |
| Wan-AI/Wan2.1-T2V-14B \ |
| --local-dir-use-symlinks False \ |
| --local-dir wan_models/Wan2.1-T2V-14B |
| ``` |
|
|
| If you download the shared Wan components to a different directory, pass that directory to the inference script with `--wan_model_dir`. |
|
|
| Download the AAD-1 sharded generator checkpoint: |
|
|
| ```bash |
| huggingface-cli download \ |
| Watay/AAD-1 \ |
| --include "14b_i2v_1step_transformer/*" \ |
| --local-dir-use-symlinks False \ |
| --local-dir checkpoints |
| ``` |
|
|
| Optional 2-step checkpoint: |
|
|
| ```bash |
| huggingface-cli download \ |
| Watay/AAD-1 \ |
| --include "14b_i2v_2step_transformer/*" \ |
| --local-dir-use-symlinks False \ |
| --local-dir checkpoints |
| ``` |
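
Because the generator checkpoint is sharded, an interrupted download can leave some shard files missing. The sketch below lists shards that the index names but that are absent on disk. It assumes the `*.index.json` file follows the common Hugging Face sharded-checkpoint layout (a top-level `weight_map` mapping tensor names to shard file names stored next to the index); that layout is an assumption here, and `missing_shards` is a hypothetical helper.

```python
import json
from pathlib import Path


def missing_shards(index_path):
    """List shard files named in the index that are absent on disk.

    Assumes a top-level "weight_map" dict of tensor name -> shard file name,
    with the shard files stored in the same directory as the index.
    """
    index_path = Path(index_path)
    with index_path.open() as f:
        index = json.load(f)
    shard_names = sorted(set(index.get("weight_map", {}).values()))
    return [name for name in shard_names
            if not (index_path.parent / name).is_file()]


if __name__ == "__main__":
    index = ("checkpoints/14b_i2v_1step_transformer/"
             "self_forcing_generator_bf16.index.json")
    if Path(index).is_file():
        missing = missing_shards(index)
        print("all shards present" if not missing else f"missing: {missing}")
    else:
        print(f"index not found: {index}")
```

If the index uses a different schema, the function returns an empty list, so an empty result confirms completeness only when the layout assumption holds.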
|
|
| ## Quick Start |
|
|
| Run from the repository root. This command generates an 81-frame video from an input image with the `1step` checkpoint on a single GPU. |
|
|
| ```bash |
| python aad1/inference.py \ |
| --prompt "two people scuba diving in the ocean" \ |
| --image_path assets/examples/scuba_diving_ocean.jpg \ |
| --output_path outputs/aad1_scuba_1step.mp4 \ |
| --checkpoint_path checkpoints/14b_i2v_1step_transformer/self_forcing_generator_bf16.index.json \ |
| --wan_model_dir wan_models/Wan2.1-T2V-14B \ |
| --num_frames 81 \ |
| --seed 1000 \ |
| --denoising_timestep_list 1000 |
| ``` |
|
|
| Example `2step` command: |
|
|
| ```bash |
| python aad1/inference.py \ |
| --prompt "two people scuba diving in the ocean" \ |
| --image_path assets/examples/scuba_diving_ocean.jpg \ |
| --output_path outputs/aad1_scuba_2step.mp4 \ |
| --checkpoint_path checkpoints/14b_i2v_2step_transformer/self_forcing_generator_bf16.index.json \ |
| --wan_model_dir wan_models/Wan2.1-T2V-14B \ |
| --num_frames 81 \ |
| --seed 1000 \ |
| --denoising_timestep_list 1000,500 |
| ``` |
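
The two commands above differ only in the checkpoint directory and the denoising timestep list. When scripting several runs, it can help to assemble the argument list in one place. The helper below is a hypothetical convenience wrapper, not part of the repository; it uses only the flags, paths, and timestep values shown in the commands above.

```python
import shlex


def build_command(prompt, image_path, output_path, steps=1,
                  wan_model_dir="wan_models/Wan2.1-T2V-14B",
                  num_frames=81, seed=1000):
    """Assemble the argv list for aad1/inference.py (1-step or 2-step)."""
    timesteps = {1: "1000", 2: "1000,500"}  # values taken from the commands above
    if steps not in timesteps:
        raise ValueError("only the 1-step and 2-step checkpoints are covered")
    checkpoint = (f"checkpoints/14b_i2v_{steps}step_transformer/"
                  "self_forcing_generator_bf16.index.json")
    return [
        "python", "aad1/inference.py",
        "--prompt", prompt,
        "--image_path", image_path,
        "--output_path", output_path,
        "--checkpoint_path", checkpoint,
        "--wan_model_dir", wan_model_dir,
        "--num_frames", str(num_frames),
        "--seed", str(seed),
        "--denoising_timestep_list", timesteps[steps],
    ]


if __name__ == "__main__":
    cmd = build_command(
        "two people scuba diving in the ocean",
        "assets/examples/scuba_diving_ocean.jpg",
        "outputs/aad1_scuba_2step.mp4",
        steps=2,
    )
    # Print a shell-safe command line; to execute it, pass `cmd` to
    # subprocess.run(cmd, check=True) from the repository root.
    print(shlex.join(cmd))
```

Returning a list rather than a single string avoids quoting problems when the prompt contains spaces or punctuation.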
|
|
| More examples, including 20-second generation, are in [docs/inference-examples.md](docs/inference-examples.md). |
|
|
| ## Acknowledgements |
|
|
| We thank the authors and contributors of [Wan2.1](https://github.com/Wan-Video/Wan2.1), [CausVid](https://github.com/tianweiy/CausVid), [Self Forcing](https://github.com/guandeh17/Self-Forcing), and [FastVideo](https://github.com/hao-ai-lab/FastVideo) for their open research and codebases. AAD-1 builds on these foundations for causal video generation, distillation, and efficient inference. |
|
|