Watay committed
Commit b6816fd · verified · 1 Parent(s): 9108d69

Update model card

  1. README.md +166 -1
README.md CHANGED
@@ -1,3 +1,168 @@
  ---
- license: mit
+ license: cc-by-nc-sa-4.0
+ library_name: pytorch
+ pipeline_tag: image-to-video
+ tags:
+ - image-to-video
+ - video-generation
+ - autoregressive-video-generation
+ - one-step-generation
+ - adversarial-distillation
+ - wan
+ base_model:
+ - Wan-AI/Wan2.1-T2V-14B
  ---
+
+ # AAD-1: Asymmetric Adversarial Distillation for One-Step Autoregressive Video Generation
+
+ <p align="center">
+ <a href="https://github.com/AutoLab-SAI-SJTU/AAD-1">Code</a> ·
+ <a href="https://aad-1.github.io/">Project Page</a> ·
+ <a href="https://huggingface.co/Wan-AI/Wan2.1-T2V-14B">Wan2.1-T2V-14B</a>
+ </p>
+
+ AAD-1 is an Asymmetric Adversarial Distillation framework for one-step autoregressive image-to-video generation. It addresses motion collapse and training instability by combining an asymmetric generator-discriminator design with phased training: the generator remains causal for autoregressive sampling, while a bidirectional video-level discriminator scores full spatiotemporal sequences to detect global temporal failures and long-range drift. A distribution-matching warmup first bootstraps a stable one-step generator before adversarial distillation.
+
+ This repository hosts the released AAD-1 generator checkpoints. Inference code is available at [AutoLab-SAI-SJTU/AAD-1](https://github.com/AutoLab-SAI-SJTU/AAD-1).
+
+ ## Model Files
+
+ The public checkpoint is released in sharded native Self-Forcing format:
+
+ ```text
+ 14b_i2v_1step_transformer/
+ ├── self_forcing_generator_bf16.index.json
+ ├── self_forcing_generator_bf16-00001-of-xxxxx.pt
+ └── ...
+ ```
+
+ An optional 2-step checkpoint may also be available:
+
+ ```text
+ 14b_i2v_2step_transformer/
+ ├── self_forcing_generator_bf16.index.json
+ ├── self_forcing_generator_bf16-00001-of-xxxxx.pt
+ └── ...
+ ```
+
+ Use the `.index.json` file as `--checkpoint_path` in the inference command.
+
+ ## Requirements
+
+ AAD-1 inference requires:
+
+ 1. The AAD-1 sharded generator checkpoint from this repository.
+ 2. The official shared Wan model components from [Wan-AI/Wan2.1-T2V-14B](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B).
+ 3. The inference code from [AutoLab-SAI-SJTU/AAD-1](https://github.com/AutoLab-SAI-SJTU/AAD-1).
+
+ ## Installation
+
+ ```bash
+ git clone https://github.com/AutoLab-SAI-SJTU/AAD-1.git
+ cd AAD-1
+
+ uv venv --python 3.10
+ source .venv/bin/activate
+ uv pip install -r requirements.txt
+ uv pip install flash-attn --no-build-isolation
+ uv pip install -e .
+ ```
+
+ Alternatively, use conda:
+
+ ```bash
+ conda create -n self_forcing python=3.10 -y
+ conda activate self_forcing
+ pip install -r requirements.txt
+ pip install flash-attn --no-build-isolation
+ python setup.py develop
+ ```
+
+ ## Download Checkpoints
+
+ Download the official shared Wan components:
+
+ ```bash
+ python -m huggingface_hub.commands.huggingface_cli download \
+ Wan-AI/Wan2.1-T2V-14B \
+ --local-dir-use-symlinks False \
+ --local-dir wan_models/Wan2.1-T2V-14B
+ ```
+
+ Download the AAD-1 1-step checkpoint:
+
+ ```bash
+ python -m huggingface_hub.commands.huggingface_cli download \
+ Watay/AAD-1 \
+ --include "14b_i2v_1step_transformer/*" \
+ --local-dir-use-symlinks False \
+ --local-dir checkpoints
+ ```
+
+ Optional 2-step checkpoint:
+
+ ```bash
+ python -m huggingface_hub.commands.huggingface_cli download \
+ Watay/AAD-1 \
+ --include "14b_i2v_2step_transformer/*" \
+ --local-dir-use-symlinks False \
+ --local-dir checkpoints
+ ```
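
The download commands above could also be scripted from Python. The following is a minimal sketch, not part of the AAD-1 repository: it assumes the `huggingface_hub` package is installed and relies on its `snapshot_download` function, with `allow_patterns` playing the role of `--include`. The helper names `download_plan` and `fetch` are hypothetical.

```python
"""Sketch: fetch the Wan components and one AAD-1 checkpoint from Python."""


def download_plan(steps=1):
    """Return snapshot_download keyword arguments mirroring the CLI commands."""
    if steps not in (1, 2):
        raise ValueError("only the 1-step and 2-step checkpoints are described")
    return [
        # Shared Wan components, downloaded in full.
        {
            "repo_id": "Wan-AI/Wan2.1-T2V-14B",
            "local_dir": "wan_models/Wan2.1-T2V-14B",
        },
        # Only the requested AAD-1 generator folder, like --include in the CLI.
        {
            "repo_id": "Watay/AAD-1",
            "allow_patterns": [f"14b_i2v_{steps}step_transformer/*"],
            "local_dir": "checkpoints",
        },
    ]


def fetch(steps=1):
    """Run both downloads and return the local folder paths."""
    # Imported lazily so that building the plan needs neither the package
    # nor network access (assumption: huggingface_hub is installed).
    from huggingface_hub import snapshot_download

    return [snapshot_download(**kwargs) for kwargs in download_plan(steps)]


if __name__ == "__main__":
    # Print the plan without downloading; call fetch(1) to actually download.
    for kwargs in download_plan(steps=1):
        print(kwargs)
```

Keeping the plan separate from the download call makes it easy to inspect which repositories and file patterns will be requested before any large transfer starts.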
+
+ ## Quick Start
+
+ ```bash
+ TORCH_COMPILE_DISABLE=1 TORCHDYNAMO_DISABLE=1 \
+ CUDA_VISIBLE_DEVICES=0 \
+ python aad1/inference.py \
+ --prompt "two people scuba diving in the ocean" \
+ --image_path "assets/examples/two people scuba diving in the ocean.jpg" \
+ --output_path outputs/aad1_scuba_1step.mp4 \
+ --checkpoint_path checkpoints/14b_i2v_1step_transformer/self_forcing_generator_bf16.index.json \
+ --wan_model_dir wan_models/Wan2.1-T2V-14B \
+ --num_frames 81 \
+ --height 480 \
+ --width 832 \
+ --seed 1000 \
+ --local_attn_size 9 \
+ --sink_size 1 \
+ --denoising_timestep_list 1000
+ ```
+
+ For the optional 2-step checkpoint, use:
+
+ ```bash
+ --checkpoint_path checkpoints/14b_i2v_2step_transformer/self_forcing_generator_bf16.index.json \
+ --denoising_timestep_list 1000,500
+ ```
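
The two invocations above differ only in `--checkpoint_path` and `--denoising_timestep_list`. A small sketch of how the full argument list might be assembled for either variant follows. The helper `build_command` is hypothetical, and reusing the remaining 1-step flags unchanged for the 2-step checkpoint is an assumption, since the README shows only the two changed flags. The environment variables from the Quick Start command are not set here.

```python
"""Sketch: assemble inference arguments for the 1-step or 2-step checkpoint."""
import shlex

# Timestep lists copied from the commands above; no other values are documented.
TIMESTEPS = {1: "1000", 2: "1000,500"}


def build_command(steps, prompt, image_path, output_path):
    """Return the argument list for aad1/inference.py (hypothetical helper)."""
    if steps not in TIMESTEPS:
        raise ValueError(f"no documented timestep list for {steps} steps")
    checkpoint = (
        f"checkpoints/14b_i2v_{steps}step_transformer/"
        "self_forcing_generator_bf16.index.json"
    )
    return [
        "python", "aad1/inference.py",
        "--prompt", prompt,
        "--image_path", image_path,
        "--output_path", output_path,
        "--checkpoint_path", checkpoint,
        "--wan_model_dir", "wan_models/Wan2.1-T2V-14B",
        # Assumption: these values carry over unchanged from the 1-step command.
        "--num_frames", "81",
        "--height", "480",
        "--width", "832",
        "--seed", "1000",
        "--local_attn_size", "9",
        "--sink_size", "1",
        "--denoising_timestep_list", TIMESTEPS[steps],
    ]


if __name__ == "__main__":
    cmd = build_command(
        2,
        "two people scuba diving in the ocean",
        "assets/examples/two people scuba diving in the ocean.jpg",
        "outputs/aad1_scuba_2step.mp4",
    )
    # shlex.join quotes the arguments that contain spaces.
    print(shlex.join(cmd))
```

Returning a list rather than a single string lets the same arguments be passed to `subprocess.run` without any shell quoting concerns.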
+
+ ## Intended Use
+
+ AAD-1 is intended for research and non-commercial experimentation with image-to-video generation, long-horizon autoregressive video rollout, and one-step video generation. Users provide a reference image and text prompt, and the model generates a video conditioned on both inputs.
+
+ ## Limitations
+
+ - Generated videos may contain visual artifacts, temporal inconsistencies, identity drift, incorrect physical interactions, or prompt-following errors.
+ - The model may reflect biases or unsafe associations inherited from training data and upstream models.
+ - This release is for inference; training scripts and training data are not part of this checkpoint release.
+ - Users are responsible for complying with the licenses and usage terms of AAD-1 and its upstream dependencies, including Wan2.1.
+
+ ## License
+
+ This model is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.
+
+ ## Acknowledgements
+
+ We thank the authors and contributors of [Wan2.1](https://github.com/Wan-Video/Wan2.1), [CausVid](https://github.com/tianweiy/CausVid), [Self Forcing](https://github.com/guandeh17/Self-Forcing), and [FastVideo](https://github.com/hao-ai-lab/FastVideo) for their open research and codebases. AAD-1 builds on these foundations for causal video generation, distillation, and efficient inference.
+
+ ## Citation
+
+ ```bibtex
+ @inproceedings{li2026aad1,
+ title={AAD-1: Asymmetric Adversarial Distillation for One-Step Autoregressive Video Generation},
+ author={Haobo Li and Yanhong Zeng and Yunhong Lu and Jiapeng Zhu and Hao Ouyang and Qiuyu Wang and Ka Leong Cheng and Yujun Shen and Zhipeng Zhang},
+ booktitle={Proceedings of the 43rd International Conference on Machine Learning},
+ year={2026},
+ note={To appear}
+ }
+ ```