AAD-1: Asymmetric Adversarial Distillation for One-Step Autoregressive Video Generation

Haobo Li1,2 · Yanhong Zeng2,3,✉ · Yunhong Lu4,2 · Jiapeng Zhu2 · Hao Ouyang2 · Qiuyu Wang2 · Ka Leong Cheng2 · Yujun Shen2 · Zhipeng Zhang1,5,✉

1AutoLab, SAI, SJTU 2Ant Group 3Department of Automation, Tsinghua University 4Zhejiang University 5Anyverse Dynamics

📄 Paper | 🌐 Website | 🤗 Models

We present AAD-1, an Asymmetric Adversarial Distillation framework for one-step autoregressive video world model generation. AAD-1 addresses motion collapse and training instability by combining an asymmetric generator-discriminator design with phased training: the generator remains causal for autoregressive sampling, while a bidirectional video-level discriminator scores full spatiotemporal sequences to detect global temporal failures and long-range drift. A distribution-matching warmup first bootstraps a stable one-step generator before adversarial distillation, enabling state-of-the-art one-step autoregressive video generation on VBench.

AAD-1 training pipeline

AAD-1 trains a one-step autoregressive generator in three stages. Stage I adapts a pretrained bidirectional video model into a causal generator with ODE initialization. Stage II performs one-step DMD warmup under self-rollout training. Stage III applies asymmetric adversarial refinement: the generator remains causal, while a bidirectional video-level discriminator observes full-video context to penalize temporal drift and motion collapse.
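The asymmetry in Stage III can be pictured with attention masks. The following is a minimal sketch, not the AAD-1 implementation: it assumes attention at the granularity of whole frames, and the function names `causal_mask`, `bidirectional_mask`, and `render` are hypothetical. On the generator side, frame i attends only to frames 0..i; on the discriminator side, every frame attends to every frame, which is what gives it full-video context.

```python
def causal_mask(num_frames: int) -> list[list[bool]]:
    """Generator side: frame i may attend to frame j only when j <= i."""
    return [[j <= i for j in range(num_frames)] for i in range(num_frames)]


def bidirectional_mask(num_frames: int) -> list[list[bool]]:
    """Discriminator side: every frame may attend to every frame."""
    return [[True] * num_frames for _ in range(num_frames)]


def render(mask: list[list[bool]]) -> str:
    """Draw a mask as text: 'x' where attention is allowed, '.' where it is not."""
    return "\n".join("".join("x" if allowed else "." for allowed in row) for row in mask)


if __name__ == "__main__":
    # Illustrative only: a 4-frame sequence (hypothetical size).
    print(render(causal_mask(4)))
    print()
    print(render(bidirectional_mask(4)))
```

The lower-triangular mask is what keeps autoregressive sampling possible, since no frame depends on a later one. The full mask is only usable at training time, when the whole rollout is available to the discriminator.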

Progress

  • 📝 Technical Report / Paper
  • 🌐 Project Homepage
  • 💻 Inference Code
  • 🤗 Pretrained Checkpoints

Setup

Clone the repository:

git clone https://github.com/AutoLab-SAI-SJTU/AAD-1.git
cd AAD-1

Install with uv:

uv venv --python 3.10
source .venv/bin/activate
uv pip install -r requirements.txt
uv pip install flash-attn --no-build-isolation
uv pip install -e .

Alternatively, use conda:

conda create -n AAD-1 python=3.10 -y
conda activate AAD-1
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
pip install -e .

Checkpoints

Inference with the public release needs only two downloads:

  1. 🤗 Official shared Wan model: Wan2.1-T2V-14B
  2. 🤗 Released AAD-1 sharded generator checkpoint

Download the shared Wan components:

huggingface-cli download \
  Wan-AI/Wan2.1-T2V-14B \
  --local-dir-use-symlinks False \
  --local-dir wan_models/Wan2.1-T2V-14B

If you use a custom shared Wan path, pass it explicitly with --wan_model_dir.

Download the AAD-1 sharded generator checkpoint:

huggingface-cli download \
  Watay/AAD-1 \
  --include "14b_i2v_1step_transformer/*" \
  --local-dir-use-symlinks False \
  --local-dir checkpoints
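Because the generator checkpoint is sharded, an interrupted download can leave some shards missing. The sketch below is one way to check for that before running inference. It rests on an assumption that is not confirmed by this README: that the `.index.json` file follows the common sharded-checkpoint layout, with a `weight_map` object mapping parameter names to shard file names stored next to the index. The helper name `missing_shards` is hypothetical.

```python
import json
from pathlib import Path


def missing_shards(index_path: str) -> list[str]:
    """Return shard file names listed in the index but absent on disk.

    ASSUMPTION: the index is JSON with a "weight_map" object whose values
    are shard file names located in the same directory as the index.
    """
    index_file = Path(index_path)
    with index_file.open("r", encoding="utf-8") as f:
        index = json.load(f)
    # Many parameters share a shard, so deduplicate before checking.
    shard_names = sorted(set(index.get("weight_map", {}).values()))
    return [name for name in shard_names if not (index_file.parent / name).is_file()]
```

Calling `missing_shards("checkpoints/14b_i2v_1step_transformer/self_forcing_generator_bf16.index.json")` should return an empty list once the download is complete, provided the layout assumption holds. If the index uses a different schema, the function returns an empty list as well, so it is worth inspecting the file once by hand.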

Optional 2-step checkpoint:

huggingface-cli download \
  Watay/AAD-1 \
  --include "14b_i2v_2step_transformer/*" \
  --local-dir-use-symlinks False \
  --local-dir checkpoints
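The same downloads can be scripted from Python. This is a sketch assuming the `huggingface_hub` package is installed. `snapshot_download` with `repo_id`, `allow_patterns`, and `local_dir` is part of that library, while the wrapper names `download_kwargs` and `download` are hypothetical.

```python
def download_kwargs(steps: int = 1) -> dict:
    """Build arguments for huggingface_hub.snapshot_download that mirror the CLI calls."""
    if steps not in (1, 2):
        raise ValueError("this README only mentions 1-step and 2-step checkpoints")
    return {
        "repo_id": "Watay/AAD-1",
        # Same filter as the CLI's --include pattern.
        "allow_patterns": [f"14b_i2v_{steps}step_transformer/*"],
        "local_dir": "checkpoints",
    }


def download(steps: int = 1) -> str:
    """Fetch the selected checkpoint folder and return the local directory path."""
    # Imported lazily so download_kwargs works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**download_kwargs(steps))
```

`download(steps=2)` would fetch the optional 2-step checkpoint into the same `checkpoints` directory used by the commands in this README.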

Quick Start

Run from the repository root. This command generates an 81-frame video from an input image with the 1-step checkpoint on a single GPU.

python aad1/inference.py \
  --prompt "two people scuba diving in the ocean" \
  --image_path assets/examples/scuba_diving_ocean.jpg \
  --output_path outputs/aad1_scuba_1step.mp4 \
  --checkpoint_path checkpoints/14b_i2v_1step_transformer/self_forcing_generator_bf16.index.json \
  --wan_model_dir wan_models/Wan2.1-T2V-14B \
  --num_frames 81 \
  --seed 1000 \
  --denoising_timestep_list 1000

Example 2-step command:

python aad1/inference.py \
  --prompt "two people scuba diving in the ocean" \
  --image_path assets/examples/scuba_diving_ocean.jpg \
  --output_path outputs/aad1_scuba_2step.mp4 \
  --checkpoint_path checkpoints/14b_i2v_2step_transformer/self_forcing_generator_bf16.index.json \
  --wan_model_dir wan_models/Wan2.1-T2V-14B \
  --num_frames 81 \
  --seed 1000 \
  --denoising_timestep_list 1000,500
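The two commands differ only in the checkpoint folder and the timestep list. For batch runs, they can be assembled programmatically, as in the sketch below. It uses only the flags shown in this README. The helper names `parse_timesteps` and `build_command` are hypothetical, and pairing the number of timesteps with the matching checkpoint folder is an assumption inferred from the two examples.

```python
def parse_timesteps(value: str) -> list[int]:
    """Split a comma-separated list such as "1000,500" into integers."""
    steps = [int(part) for part in value.split(",") if part.strip()]
    if not steps:
        raise ValueError("expected at least one timestep")
    return steps


def build_command(
    timesteps: str,
    prompt: str,
    image_path: str,
    output_path: str,
    wan_model_dir: str = "wan_models/Wan2.1-T2V-14B",
    num_frames: int = 81,
    seed: int = 1000,
) -> list[str]:
    """Assemble the inference command as an argument list for subprocess."""
    steps = parse_timesteps(timesteps)
    if len(steps) not in (1, 2):
        raise ValueError("only 1-step and 2-step checkpoints are listed in this README")
    # ASSUMPTION: one timestep pairs with the 1step folder, two with the 2step folder.
    checkpoint = (
        f"checkpoints/14b_i2v_{len(steps)}step_transformer/"
        "self_forcing_generator_bf16.index.json"
    )
    return [
        "python", "aad1/inference.py",
        "--prompt", prompt,
        "--image_path", image_path,
        "--output_path", output_path,
        "--checkpoint_path", checkpoint,
        "--wan_model_dir", wan_model_dir,
        "--num_frames", str(num_frames),
        "--seed", str(seed),
        "--denoising_timestep_list", ",".join(str(s) for s in steps),
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root. Passing the arguments as a list avoids shell-quoting problems with prompts that contain spaces or quotes.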

More examples, including 20s generation, are in docs/inference-examples.md.

Acknowledgements

We thank the authors and contributors of Wan2.1, CausVid, Self Forcing, and FastVideo for their open research and codebases. AAD-1 builds on these foundations for causal video generation, distillation, and efficient inference.
