Add model card and metadata
#1
by nielsr HF Staff - opened
README.md
ADDED
|
@@ -0,0 +1,64 @@
|
| 1 |
+
---
|
| 2 |
+
license: apache-2.0
|
| 3 |
+
pipeline_tag: text-to-video
|
| 4 |
+
tags:
|
| 5 |
+
- video-generation
|
| 6 |
+
- sparse-attention
|
| 7 |
+
- autoregressive
|
| 8 |
+
---
|
| 9 |
+
|
| 10 |
+
# Light Forcing: Accelerating Autoregressive Video Diffusion via Sparse Attention
|
| 11 |
+
|
| 12 |
+
This repository contains the model checkpoints for **Light Forcing**, which its authors describe as the first sparse attention solution tailored to autoregressive (AR) video generation models.
|
| 13 |
+
|
| 14 |
+
[**Paper**](https://huggingface.co/papers/2602.04789) | [**GitHub Code**](https://github.com/chengtao-lv/LightForcing)
|
| 15 |
+
|
| 16 |
+
## Introduction
|
| 17 |
+
Advanced autoregressive video generation models often suffer from the quadratic complexity of attention. Light Forcing addresses this bottleneck with two key innovations:
|
| 18 |
+
1. **Chunk-Aware Growth:** A mechanism that quantitatively estimates the contribution of each chunk and uses that estimate to determine the chunk's sparsity allocation.
|
| 19 |
+
2. **Hierarchical Sparse Attention:** A strategy to capture historical and local context in a coarse-to-fine manner.
|
| 20 |
+
|
| 21 |
+
The authors report a **1.2x–1.3x** end-to-end speedup while maintaining high visual quality. Combined with FP8 quantization and LightVAE, the speedup reaches up to **3.0x** on hardware such as the RTX 5090.
|
| 22 |
+
|
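The two components listed in the Introduction can be illustrated with a small NumPy sketch. It is not the Light Forcing implementation: the function names, the mean-pooled chunk scoring, and the `keep_per_chunk` budget are assumptions made for illustration only. The sketch gives each query chunk its own budget of past chunks (a stand-in for a chunk-aware sparsity allocation), picks those chunks with a coarse score, and then runs ordinary attention over the tokens of the selected chunks plus the local chunk.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def dense_chunk_causal_attention(q, k, v, chunk_size):
    """Reference: every token attends to all tokens in its own and earlier chunks."""
    n, d = q.shape
    chunk_id = np.arange(n) // chunk_size
    mask = chunk_id[:, None] >= chunk_id[None, :]
    scores = np.where(mask, q @ k.T / np.sqrt(d), -np.inf)
    return softmax(scores) @ v


def coarse_to_fine_attention(q, k, v, chunk_size, keep_per_chunk):
    """Hypothetical coarse-to-fine sparse attention over chunks.

    q, k, v: arrays of shape (num_tokens, dim).
    keep_per_chunk[i]: how many past chunks query chunk i may attend to
    (an assumed per-chunk budget, not the paper's allocation rule).
    """
    n, d = q.shape
    assert n % chunk_size == 0, "num_tokens must be a multiple of chunk_size"
    num_chunks = n // chunk_size
    # Coarse representation: one mean-pooled key per chunk.
    k_pooled = k.reshape(num_chunks, chunk_size, d).mean(axis=1)
    out = np.zeros_like(v)
    for i in range(num_chunks):
        rows = slice(i * chunk_size, (i + 1) * chunk_size)
        q_i = q[rows]
        selected = [i]  # the local chunk is always kept
        if i > 0 and keep_per_chunk[i] > 0:
            # Coarse stage: rank past chunks by pooled-query / pooled-key score.
            coarse = q_i.mean(axis=0) @ k_pooled[:i].T
            top = np.argsort(coarse)[::-1][: keep_per_chunk[i]]
            selected += top.tolist()
        # Fine stage: token-level attention restricted to the selected chunks.
        idx = np.concatenate(
            [np.arange(c * chunk_size, (c + 1) * chunk_size) for c in sorted(selected)]
        )
        scores = q_i @ k[idx].T / np.sqrt(d)
        out[rows] = softmax(scores) @ v[idx]
    return out
```

With a budget that covers every past chunk (for three chunks, `keep_per_chunk=[0, 1, 2]`), the sketch reproduces the dense chunk-causal result; smaller budgets skip the lowest-scoring past chunks and reduce the number of key/value tokens each chunk touches.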
| 23 |
+
## Usage
|
| 24 |
+
|
| 25 |
+
For the full environment setup, the authors recommend using the provided Docker image:
|
| 26 |
+
```bash
|
| 27 |
+
docker pull lvchengtao/light_forcing:v1
|
| 28 |
+
```
|
| 29 |
+
|
| 30 |
+
### Fast Inference
|
| 31 |
+
|
| 32 |
+
After setting up the environment and downloading the necessary checkpoints (e.g., Wan2.1-T2V-1.3B), you can run inference with the `inference.py` script provided in the GitHub repository.
|
| 33 |
+
|
| 34 |
+
**For short-video generation (e.g., 5s):**
|
| 35 |
+
```bash
|
| 36 |
+
python inference.py \
|
| 37 |
+
--config_path configs/light_forcing_short.yaml \
|
| 38 |
+
--output_folder videos/light_forcing_short \
|
| 39 |
+
--checkpoint_path path/to/short_video_gen.pt \
|
| 40 |
+
--data_path prompts/MovieGenVideoBench_extended.txt \
|
| 41 |
+
--use_ema
|
| 42 |
+
```
|
| 43 |
+
|
| 44 |
+
**For long-video generation (e.g., 15s):**
|
| 45 |
+
```bash
|
| 46 |
+
python inference.py \
|
| 47 |
+
--config_path configs/light_forcing_long.yaml \
|
| 48 |
+
--output_folder videos/light_forcing_long \
|
| 49 |
+
--checkpoint_path path/to/long_video_gen.pt \
|
| 50 |
+
--data_path prompts/MovieGenVideoBench_extended.txt \
|
| 51 |
+
--use_ema \
|
| 52 |
+
--num_output_frames 63
|
| 53 |
+
```
|
| 54 |
+
|
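The relation between `--num_output_frames 63` and a roughly 15-second clip can be checked with a little arithmetic. The sketch below rests on assumptions that this model card does not state: that the flag counts latent frames, that the video VAE compresses time by a factor of 4 while keeping the first frame, and that playback runs at 16 fps. The helper names are hypothetical; verify the values against the repository configs before relying on them.

```python
def latent_to_pixel_frames(num_latent_frames, temporal_stride=4):
    """Pixel frames decoded from latent frames (assumed 4x temporal compression)."""
    return (num_latent_frames - 1) * temporal_stride + 1


def duration_seconds(num_latent_frames, fps=16, temporal_stride=4):
    """Clip length in seconds under the assumed frame rate and stride."""
    return latent_to_pixel_frames(num_latent_frames, temporal_stride) / fps


if __name__ == "__main__":
    for latent in (21, 63):
        frames = latent_to_pixel_frames(latent)
        print(f"{latent} latent frames -> {frames} frames, {duration_seconds(latent):.2f} s")
```

Under these assumptions, 63 latent frames decode to 249 frames, or about 15.6 seconds, which is consistent with the "15s" label on the long-video command.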
| 55 |
+
## Citation
|
| 56 |
+
If you find this work or the code useful, please cite:
|
| 57 |
+
```bibtex
|
| 58 |
+
@article{lv2026light,
|
| 59 |
+
title={Light Forcing: Accelerating Autoregressive Video Diffusion via Sparse Attention},
|
| 60 |
+
author={Lv, Chengtao and Shi, Yumeng and Huang, Yushi and Gong, Ruihao and Ren, Shen and Wang, Wenya},
|
| 61 |
+
journal={arXiv preprint arXiv:2602.04789},
|
| 62 |
+
year={2026}
|
| 63 |
+
}
|
| 64 |
+
```
|