 ---
+license: cc-by-nc-sa-4.0
+library_name: pytorch
+pipeline_tag: image-to-video
+tags:
+  - image-to-video
+  - video-generation
+  - autoregressive-video-generation
+  - one-step-generation
+  - adversarial-distillation
+  - wan
+base_model:
+  - Wan-AI/Wan2.1-T2V-14B
 ---
+# AAD-1: Asymmetric Adversarial Distillation for One-Step Autoregressive Video Generation
+<p align="center">
+  <a href="https://github.com/AutoLab-SAI-SJTU/AAD-1">Code</a> ·
+  <a href="https://aad-1.github.io/">Project Page</a> ·
+  <a href="https://huggingface.co/Wan-AI/Wan2.1-T2V-14B">Wan2.1-T2V-14B</a>
+</p>
+AAD-1 is an Asymmetric Adversarial Distillation framework for one-step autoregressive image-to-video generation. It addresses motion collapse and training instability by combining an asymmetric generator-discriminator design with phased training: the generator remains causal for autoregressive sampling, while a bidirectional video-level discriminator scores full spatiotemporal sequences to detect global temporal failures and long-range drift. A distribution-matching warmup first bootstraps a stable one-step generator before adversarial distillation.
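To make the asymmetry concrete, the sketch below contrasts a frame-level causal attention mask (each frame attends only to itself and earlier frames, as autoregressive sampling requires) with a fully bidirectional mask (every frame sees every other frame, as a video-level discriminator would). This is an illustration of the general idea only: the function names are hypothetical, and the released model's actual attention layout is not reproduced here.

```python
def causal_frame_mask(num_frames: int) -> list[list[bool]]:
    """mask[q][k] is True when query frame q may attend to key frame k."""
    return [[k <= q for k in range(num_frames)] for q in range(num_frames)]


def bidirectional_frame_mask(num_frames: int) -> list[list[bool]]:
    """Every frame may attend to every other frame, past or future."""
    return [[True] * num_frames for _ in range(num_frames)]


def render(mask: list[list[bool]]) -> str:
    """Draw a mask as text: 'x' means attention is allowed, '.' means blocked."""
    return "\n".join("".join("x" if allowed else "." for allowed in row) for row in mask)


if __name__ == "__main__":
    print("causal (generator-style):")
    print(render(causal_frame_mask(4)))
    print("bidirectional (discriminator-style):")
    print(render(bidirectional_frame_mask(4)))
```

The lower-triangular pattern is what lets a generator emit frames one after another, while the full pattern lets a critic compare the first and last frames of a clip directly.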
+This repository hosts the released AAD-1 generator checkpoints. Inference code is available at [AutoLab-SAI-SJTU/AAD-1](https://github.com/AutoLab-SAI-SJTU/AAD-1).
+## Model Files
+The public checkpoint is released in sharded native Self-Forcing format:
+```text
+14b_i2v_1step_transformer/
+├── self_forcing_generator_bf16.index.json
+├── self_forcing_generator_bf16-00001-of-xxxxx.pt
+└── ...
+```
+An optional 2-step checkpoint may also be available:
+```text
+14b_i2v_2step_transformer/
+├── self_forcing_generator_bf16.index.json
+├── self_forcing_generator_bf16-00001-of-xxxxx.pt
+└── ...
+```
+Pass the path of the `.index.json` file, not an individual shard, as `--checkpoint_path` in the inference command.
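Before running inference, it can help to confirm that every shard named in the index was actually downloaded. The sketch below assumes that the index stores shard filenames as string values ending in `.pt`, relative to the directory containing the index; the real schema is not documented here, so the helper walks the whole JSON tree instead of relying on specific keys. Both function names are hypothetical.

```python
import json
from pathlib import Path


def referenced_shards(node) -> set[str]:
    """Recursively collect every string value in the index that ends in '.pt'."""
    if isinstance(node, str):
        return {node} if node.endswith(".pt") else set()
    if isinstance(node, dict):
        node = list(node.values())  # only values are inspected, not keys
    if isinstance(node, list):
        found: set[str] = set()
        for child in node:
            found |= referenced_shards(child)
        return found
    return set()  # numbers, booleans, null


def missing_shards(index_path: str) -> list[str]:
    """Return the referenced shard names that are absent next to the index file."""
    index_file = Path(index_path)
    index = json.loads(index_file.read_text())
    return sorted(
        name for name in referenced_shards(index)
        if not (index_file.parent / name).is_file()
    )


if __name__ == "__main__":
    path = "checkpoints/14b_i2v_1step_transformer/self_forcing_generator_bf16.index.json"
    print(missing_shards(path) or "all referenced shards are present")
```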
+## Requirements
+AAD-1 inference requires:
+1. The AAD-1 sharded generator checkpoint from this repository.
+2. The official shared Wan model components from [Wan-AI/Wan2.1-T2V-14B](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B).
+3. The inference code from [AutoLab-SAI-SJTU/AAD-1](https://github.com/AutoLab-SAI-SJTU/AAD-1).
+## Installation
+```bash
+git clone https://github.com/AutoLab-SAI-SJTU/AAD-1.git
+cd AAD-1
+uv venv --python 3.10
+source .venv/bin/activate
+uv pip install -r requirements.txt
+uv pip install flash-attn --no-build-isolation
+uv pip install -e .
+```
+Alternatively, use conda:
+```bash
+conda create -n self_forcing python=3.10 -y
+conda activate self_forcing
+pip install -r requirements.txt
+pip install flash-attn --no-build-isolation
+pip install -e .
+```
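After either installation route, a quick check can confirm that the interpreter and the key packages are visible from the active environment. The sketch below is a hypothetical helper, not part of the AAD-1 codebase. It assumes `requirements.txt` installs PyTorch under the import name `torch`; `flash_attn` is the import name of the `flash-attn` package.

```python
import importlib.util
import sys


def environment_report(modules=("torch", "flash_attn")) -> dict[str, bool]:
    """Report whether Python is 3.10 and whether each module can be located."""
    report = {"python==3.10": sys.version_info[:2] == (3, 10)}
    for name in modules:
        # find_spec locates a top-level module without importing it
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for item, ok in environment_report().items():
        print(f"{item}: {'ok' if ok else 'MISSING'}")
```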
+## Download Checkpoints
+Download the official shared Wan components:
+```bash
+python -m huggingface_hub.commands.huggingface_cli download \
+  Wan-AI/Wan2.1-T2V-14B \
+  --local-dir-use-symlinks False \
+  --local-dir wan_models/Wan2.1-T2V-14B
+```
+Download the AAD-1 1-step checkpoint:
+```bash
+python -m huggingface_hub.commands.huggingface_cli download \
+  Watay/AAD-1 \
+  --include "14b_i2v_1step_transformer/*" \
+  --local-dir-use-symlinks False \
+  --local-dir checkpoints
+```
+Optional 2-step checkpoint:
+```bash
+python -m huggingface_hub.commands.huggingface_cli download \
+  Watay/AAD-1 \
+  --include "14b_i2v_2step_transformer/*" \
+  --local-dir-use-symlinks False \
+  --local-dir checkpoints
+```
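The same checkpoint downloads can also be scripted through the `huggingface_hub` Python API. The sketch below uses `snapshot_download` with `allow_patterns`, which plays the role of the CLI's `--include`. The two helper names are hypothetical, and the repository ID, folder names, and target directory are copied from the commands above.

```python
def download_kwargs(steps: int = 1) -> dict:
    """Build snapshot_download arguments for the 1-step or 2-step checkpoint."""
    if steps not in (1, 2):
        raise ValueError("only the 1-step and 2-step checkpoints are described here")
    return {
        "repo_id": "Watay/AAD-1",
        "allow_patterns": [f"14b_i2v_{steps}step_transformer/*"],
        "local_dir": "checkpoints",
    }


def download(steps: int = 1) -> str:
    """Download the selected checkpoint folder and return the local directory."""
    # Imported lazily so the helper above works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**download_kwargs(steps))


if __name__ == "__main__":
    print(download(steps=1))
```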
+## Quick Start
+```bash
+TORCH_COMPILE_DISABLE=1 TORCHDYNAMO_DISABLE=1 \
+CUDA_VISIBLE_DEVICES=0 \
+python aad1/inference.py \
+  --prompt "two people scuba diving in the ocean" \
+  --image_path "assets/examples/two people scuba diving in the ocean.jpg" \
+  --output_path outputs/aad1_scuba_1step.mp4 \
+  --checkpoint_path checkpoints/14b_i2v_1step_transformer/self_forcing_generator_bf16.index.json \
+  --wan_model_dir wan_models/Wan2.1-T2V-14B \
+  --num_frames 81 \
+  --height 480 \
+  --width 832 \
+  --seed 1000 \
+  --local_attn_size 9 \
+  --sink_size 1 \
+  --denoising_timestep_list 1000
+```
+For the optional 2-step checkpoint, replace the corresponding two arguments of the command above with:
+```bash
+--checkpoint_path checkpoints/14b_i2v_2step_transformer/self_forcing_generator_bf16.index.json \
+--denoising_timestep_list 1000,500
+```
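To switch between the two variants without editing the shell command by hand, a small wrapper can assemble the argument list. The sketch below is a hypothetical convenience script, not part of the AAD-1 codebase. Every flag and default value is copied from the Quick Start command, except the 2-step output filename, which is an assumption made for illustration.

```python
import os
import subprocess
import sys

# Timestep lists taken from the 1-step and 2-step commands above.
TIMESTEPS = {1: "1000", 2: "1000,500"}


def build_command(
    steps: int = 1,
    prompt: str = "two people scuba diving in the ocean",
    image_path: str = "assets/examples/two people scuba diving in the ocean.jpg",
) -> list[str]:
    """Return the inference command as an argument list (no shell quoting needed)."""
    checkpoint = (
        f"checkpoints/14b_i2v_{steps}step_transformer/"
        "self_forcing_generator_bf16.index.json"
    )
    return [
        sys.executable, "aad1/inference.py",
        "--prompt", prompt,
        "--image_path", image_path,
        # Output name for steps=2 is an assumed pattern, not from the model card.
        "--output_path", f"outputs/aad1_scuba_{steps}step.mp4",
        "--checkpoint_path", checkpoint,
        "--wan_model_dir", "wan_models/Wan2.1-T2V-14B",
        "--num_frames", "81",
        "--height", "480",
        "--width", "832",
        "--seed", "1000",
        "--local_attn_size", "9",
        "--sink_size", "1",
        "--denoising_timestep_list", TIMESTEPS[steps],
    ]


def run(steps: int = 1) -> None:
    """Run inference with the same environment variables as the Quick Start command."""
    env = dict(
        os.environ,
        TORCH_COMPILE_DISABLE="1",
        TORCHDYNAMO_DISABLE="1",
        CUDA_VISIBLE_DEVICES="0",
    )
    subprocess.run(build_command(steps), env=env, check=True)


if __name__ == "__main__":
    run(steps=1)
```

Passing the arguments as a list avoids shell quoting problems with the spaces in the example image path.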
+## Intended Use
+AAD-1 is intended for research and non-commercial experimentation with image-to-video generation, long-horizon autoregressive video rollout, and one-step video generation. Users provide a reference image and text prompt, and the model generates a video conditioned on both inputs.
+## Limitations
+- Generated videos may contain visual artifacts, temporal inconsistencies, identity drift, incorrect physical interactions, or prompt-following errors.
+- The model may reflect biases or unsafe associations inherited from training data and upstream models.
+- This release is for inference; training scripts and training data are not part of this checkpoint release.
+- Users are responsible for complying with the licenses and usage terms of AAD-1 and its upstream dependencies, including Wan2.1.
+## License
+This model is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.
+## Acknowledgements
+We thank the authors and contributors of [Wan2.1](https://github.com/Wan-Video/Wan2.1), [CausVid](https://github.com/tianweiy/CausVid), [Self Forcing](https://github.com/guandeh17/Self-Forcing), and [FastVideo](https://github.com/hao-ai-lab/FastVideo) for their open research and codebases. AAD-1 builds on these foundations for causal video generation, distillation, and efficient inference.
+## Citation
+```bibtex
+@inproceedings{li2026aad1,
+  title={AAD-1: Asymmetric Adversarial Distillation for One-Step Autoregressive Video Generation},
+  author={Haobo Li and Yanhong Zeng and Yunhong Lu and Jiapeng Zhu and Hao Ouyang and Qiuyu Wang and Ka Leong Cheng and Yujun Shen and Zhipeng Zhang},
+  booktitle={Proceedings of the 43rd International Conference on Machine Learning},
+  year={2026},
+  note={To appear}
+}
+```