File size: 6,289 Bytes
447465c 4caf803 447465c 4caf803 0fded54 145fd52 0fded54 4caf803 0fded54 4caf803 0fded54 4caf803 0fded54 4caf803 2cf921f ae793b6 447465c |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 |
---
base_model:
- Wan-AI/Wan2.1-T2V-1.3B
license: apache-2.0
pipeline_tag: text-to-video
---
<div align="center">
# Causal Forcing
### Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation
<p align="center">
<p align="center">
<div>
<a href="https://zhuhz22.github.io/" target="_blank">Hongzhou Zhu*</a><sup></sup>,
<a href="https://gracezhao1997.github.io/" target="_blank">Min Zhao*</a><sup></sup> ,
<a href="https://guandehe.github.io/" target="_blank">Guande He</a><sup></sup>,
<a href="https://scholar.google.com/citations?user=dxN1_X0AAAAJ&hl=en" target="_blank">Hang Su</a><sup></sup>,
<a href="https://zhenxuan00.github.io/" target="_blank">Chongxuan Li</a><sup></sup> ,
<a href="https://ml.cs.tsinghua.edu.cn/~jun/index.shtml" target="_blank">Jun Zhu</a><sup></sup>
</div>
<div>
<sup></sup>Tsinghua University & Shengshu & UT Austin
</div>
</div>
</p>
<h3 align="center"><a href="https://arxiv.org/abs/2602.02214">Paper</a> | <a href="https://thu-ml.github.io/CausalForcing.github.io">Website</a> | <a href="https://github.com/thu-ml/Causal-Forcing">Code</a> | <a href="https://huggingface.co/zhuhz22/Causal-Forcing/tree/main">Models</a></h3>
</p>
-----
Causal Forcing significantly outperforms Self Forcing in **both visual quality and motion dynamics**, while keeping **the same training budget and inference efficiency**—enabling real-time, streaming video generation on a single RTX 4090.
-----
## Abstract
To achieve real-time interactive video generation, current methods distill pretrained bidirectional video diffusion models into few-step autoregressive (AR) models, facing an architectural gap when full attention is replaced by causal attention. We propose **Causal Forcing** that uses an AR teacher for ODE initialization, thereby bridging the architectural gap. Empirical results show that our method outperforms all baselines across all metrics, surpassing the SOTA Self Forcing by 19.3% in Dynamic Degree, 8.7% in VisionReward, and 16.7% in Instruction Following.
## Quick Start
> The inference environment is identical to Self Forcing, so you can migrate directly using our configs and model.
### Installation
```bash
conda create -n causal_forcing python=3.10 -y
conda activate causal_forcing
pip install -r requirements.txt
pip install git+https://github.com/openai/CLIP.git
pip install flash-attn --no-build-isolation
python setup.py develop
```
### Download checkpoints
```bash
hf download Wan-AI/Wan2.1-T2V-1.3B --local-dir wan_models/Wan2.1-T2V-1.3B
hf download Wan-AI/Wan2.1-T2V-14B --local-dir wan_models/Wan2.1-T2V-14B
hf download zhuhz22/Causal-Forcing chunkwise/causal_forcing.pt --local-dir checkpoints
hf download zhuhz22/Causal-Forcing framewise/causal_forcing.pt --local-dir checkpoints
```
### CLI Inference
> We open-source both the frame-wise and chunk-wise models; the former is a setting that Self Forcing has chosen not to release.
Frame-wise model (**higher dynamic degree and more expressive**):
```bash
python inference.py \
--config_path configs/causal_forcing_dmd_framewise.yaml \
--output_folder output/framewise \
--checkpoint_path checkpoints/framewise/causal_forcing.pt \
--data_path prompts/demos.txt \
--use_ema
# Note: this frame-wise config not in Self Forcing; if using its framework, migrate this config too.
```
Chunk-wise model (**more stable**):
```bash
python inference.py \
--config_path configs/causal_forcing_dmd_chunkwise.yaml \
--output_folder output/chunkwise \
--checkpoint_path checkpoints/chunkwise/causal_forcing.pt \
--data_path prompts/demos.txt
```
## Training
<details>
<summary> Stage 1: Autoregressive Diffusion Training (Can skip by using our pretrained checkpoints. Click to expand.)</summary>
First download the dataset (we provide a 6K toy dataset here):
```bash
hf download zhuhz22/Causal-Forcing-data --local-dir dataset
python utils/merge_and_get_clean.py
```
Then train the AR-diffusion model:
- Framewise:
```bash
torchrun --nnodes=8 --nproc_per_node=8 --rdzv_id=5235 \
--rdzv_backend=c10d \
--rdzv_endpoint $MASTER_ADDR \
train.py \
--config_path configs/ar_diffusion_tf_framewise.yaml \
--logdir logs/ar_diffusion_framewise
```
- Chunkwise:
```bash
torchrun --nnodes=8 --nproc_per_node=8 --rdzv_id=5235 \
--rdzv_backend=c10d \
--rdzv_endpoint $MASTER_ADDR \
train.py \
--config_path configs/ar_diffusion_tf_chunkwise.yaml \
--logdir logs/ar_diffusion_chunkwise
```
</details>
<details>
<summary> Stage 2: Causal ODE Initialization (Can skip by using our pretrained checkpoints. Click to expand.)</summary>
If you have skipped Stage 1, you need to download the pretrained models:
```bash
hf download zhuhz22/Causal-Forcing framewise/ar_diffusion.pt --local-dir checkpoints
hf download zhuhz22/Causal-Forcing chunkwise/ar_diffusion.pt --local-dir checkpoints
```
In this stage, train ODE initialization models:
- Frame-wise:
```bash
torchrun --nnodes=8 --nproc_per_node=8 --rdzv_id=5235 \
--rdzv_backend=c10d \
--rdzv_endpoint $MASTER_ADDR \
train.py \
--config_path configs/causal_ode_framewise.yaml \
--logdir logs/causal_ode_framewise
```
</details>
### Stage 3: DMD
> This stage is compatible with Self Forcing training, so you can migrate seamlessly by using our configs and checkpoints.
- Frame-wise model:
```bash
torchrun --nnodes=8 --nproc_per_node=8 --rdzv_id=5235 \
--rdzv_backend=c10d \
--rdzv_endpoint $MASTER_ADDR \
train.py \
--config_path configs/causal_forcing_dmd_framewise.yaml \
--logdir logs/causal_forcing_dmd_framewise
```
## Acknowledgements
This codebase is built on top of the open-source implementation of [CausVid](https://github.com/tianweiy/CausVid), [Self Forcing](https://github.com/guandeh17/Self-Forcing) and the [Wan2.1](https://github.com/Wan-Video/Wan2.1) repo.
## References
```bibtex
@article{zhu2026causal,
title={Causal Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation},
author={Zhu, Hongzhou and Zhao, Min and He, Guande and Su, Hang and Li, Chongxuan and Zhu, Jun},
journal={arXiv preprint arXiv:2602.02214},
year={2026}
}
``` |