Update model card to reflect Light Forcing paper and code

#14
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +72 -305
README.md CHANGED
@@ -1,21 +1,61 @@
1
  ---
2
- license: apache-2.0
3
- tags:
4
- - diffusion-single-file
5
- - comfyui
6
- - distillation
7
- - LoRA
8
- - video
9
- - video genration
10
  base_model:
11
- - Wan-AI/Wan2.2-I2V-A14B
12
- - Wan-AI/Wan2.2-TI2V-5B
13
- - Wan-AI/Wan2.1-I2V-14B-720P
14
- pipeline_tags:
15
- - image-to-video
16
- - text-to-video
17
  library_name: diffusers
18
  ---
 
19
  # 🎨 LightVAE
20
 
21
  ## ⚡ Efficient Video Autoencoder (VAE) Model Collection
@@ -110,215 +150,27 @@ For VAE, the LightX2V team has conducted a series of deep optimizations, derivin
110
 
111
  ---
112
 
 
113
 
114
- ## 📊 Wan2.1 Series Performance Comparison
115
- - **Precision**: BF16
116
- - **Test Hardware**: NVIDIA H100
117
-
118
- ### Video Reconstruction (5s 81-frame video)
119
 
120
- |Speed | Wan2.1_VAE | taew2_1 | lighttaew2_1 | lightvaew2_1 |
121
  |:-----|:--------------|:------------|:---------------------|:-------------|
122
- | **Encode Speed** | 4.1721 s | 0.3956 s | 0.3956 s |1.5014s |
123
  | **Decode Speed** | 5.4649 s | 0.2463 s | 0.2463 s | 2.0697s |
124
 
125
- |GPU Memory | Wan2.1_VAE | taew2_1 | lighttaew2_1 | lightvaew2_1 |
126
  |:-----|:--------------|:------------|:---------------------|:-------------|
127
  | **Encode Memory** | 8.4954 GB | 0.00858 GB | 0.00858 GB | 4.7631 GB |
128
  | **Decode Memory** | 10.1287 GB | 0.41199 GB | 0.41199 GB | 5.5673 GB |
129
 
130
- ### Video Generation
131
-
132
- Task: s2v(speech to video)
133
- Model: seko-talk
134
-
135
- <table>
136
- <tr>
137
- <td width="25%" align="center">
138
- <strong>Wan2.1_VAE</strong><br>
139
- <video controls autoplay muted width="100%" src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/6l-P-3Hr9JKL3xgUyJXWJ.mp4"></video>
140
- </td>
141
- <td width="25%" align="center">
142
- <strong>taew2_1</strong><br>
143
- <video controls autoplay muted width="100%" src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/rcVHrCKB4nRAs2VSjJd2d.mp4"></video>
144
- </td>
145
- <td width="25%" align="center">
146
- <strong>lighttaew2_1</strong><br>
147
- <video controls autoplay muted width="100%" src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/Wq9p9Z7NDYwaKw4SqVbYT.mp4"></video>
148
- </td>
149
- <td width="25%" align="center">
150
- <strong>lightvaew2_1</strong><br>
151
- <video controls autoplay muted width="100%" src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/NpKOzFcvsHzSFfFACzUKP.mp4"></video>
152
- </td>
153
- </tr>
154
- </table>
155
-
156
- ## 📊 Wan2.2 Series Performance Comparison
157
- - **Precision**: BF16
158
- - **Test Hardware**: NVIDIA H100
159
-
160
- ### Video Reconstruction
161
- | Speed | Wan2.2_VAE | taew2_2 | lighttaew2_2 |
162
- |:-----|:--------------|:------------|:---------------------|
163
- | **Encode Speed** | 1.1369s | 0.3499 s | 0.3499 s |
164
- | **Decode Speed** | 3.1268 s | 0.0891 s | 0.0891 s|
165
-
166
- | GPU Memory | Wan2.2_VAE | taew2_2 | lighttaew2_2 |
167
- |:-----|:--------------|:------------|:---------------------|
168
- | **Encode Memory** | 6.1991 GB | 0.0064 GB | 0.0064 GB |
169
- | **Decode Memory** | 12.3487 GB | 0.4120 GB | 0.4120 GB |
170
-
171
-
172
- ### Video Generation
173
-
174
- Task: t2v(text to video)
175
- Model: [Wan2.2-TI2V-5B](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B)
176
 
177
- <table>
178
- <tr>
179
- <td width="33%" align="center">
180
- <strong>Wan2.2_VAE</strong><br>
181
- <video controls autoplay width="95%" src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/KUY7Ifz9gFJqDjWga6A53.mp4"></video>
182
- </td>
183
- <td width="33%" align="center">
184
- <strong>taew2_2</strong><br>
185
- <video controls autoplay width="95%" src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/OYA8VfNlCv_hBkj_n_OMl.mp4"></video>
186
- </td>
187
- <td width="33%" align="center">
188
- <strong>lighttaew2_2</strong><br>
189
- <video controls autoplay width="95%" src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/gaHRr6uuAF0NlH4YlMbHO.mp4"></video>
190
- </td>
191
- </tr>
192
- </table>
193
-
194
-
195
-
196
- ## 🎯 Model Selection Recommendations
197
-
198
- ### Selection by Use Case
199
-
200
- <table>
201
- <tr>
202
- <td width="33%">
203
-
204
- #### 🏆 Pursuing Best Quality
205
- **Recommended**: `Wan2.1_VAE` / `Wan2.2_VAE`
206
-
207
- - ✅ Official model, quality ceiling
208
- - ✅ Highest reconstruction accuracy
209
- - ✅ Suitable for final product output
210
- - ⚠️ **Large memory usage** (~8-12 GB)
211
- - ⚠️ **Slow inference speed**
212
-
213
- </td>
214
- <td width="33%">
215
-
216
- #### ⚖️ **Best Balance** 🏆
217
- **Recommended**: **`lightvaew2_1`**
218
-
219
- - ✅ **Uses Causal 3D Conv** (same as official)
220
- - ✅ **Excellent quality**, close to official
221
- - ✅ Memory reduced by **~50%** (~4-5 GB)
222
- - ✅ Speed increased by **2-3x**
223
- - ✅ **Close to official quality** ⭐⭐⭐⭐
224
-
225
- **Use Cases**: Daily production, strongly recommended ⭐
226
-
227
- </td>
228
- <td width="33%">
229
-
230
- #### ⚡ **Speed + Quality Balance** ✨
231
- **Recommended**: **`lighttaew2_1`** / **`lighttaew2_2`**
232
-
233
- - ✅ Extremely low memory usage (~0.4 GB)
234
- - ✅ Extremely fast inference
235
- - ✅ **Quality significantly surpasses open source TAE**
236
- - ✅ **Close to official quality** ⭐⭐⭐⭐
237
-
238
- **Use Cases**: Development testing, rapid iteration
239
-
240
- </td>
241
- </tr>
242
- </table>
243
-
244
-
245
- ### 🔥 Our Optimization Results Comparison
246
-
247
- | Comparison | Open Source TAE | **LightTAE (Ours)** | Official VAE | **LightVAE (Ours)** |
248
- |:------|:--------|:---------------------|:---------|:---------------------|
249
- | **Architecture** | Conv2D | Conv2D | Causal Conv3D | Causal Conv3D |
250
- | **Memory Usage** | Minimal (~0.4 GB) | Minimal (~0.4 GB) | Large (~8-12 GB) | Medium (~4-5 GB) |
251
- | **Inference Speed** | Extremely Fast ⚡⚡⚡⚡⚡ | Extremely Fast ⚡⚡⚡⚡⚡ | Slow ⚡⚡ | Fast ⚡⚡⚡⚡ |
252
- | **Generation Quality** | Average ⭐⭐⭐ | **Close to Official** ⭐⭐⭐⭐ | Highest ⭐⭐⭐⭐⭐ | **Close to Official** ⭐⭐⭐⭐ |
253
-
254
- ## 📑 Todo List
255
- - [x] LightX2V integration
256
- - [x] ComfyUI integration
257
- - [ ] Training & Distillation Code
258
-
259
- ## 🚀 Usage
260
-
261
- ### Download VAE Models
262
 
263
  ```bash
264
- # Download Wan2.1 official VAE
265
- huggingface-cli download lightx2v/Autoencoders \
266
- --local-dir ./models/vae/
267
- ```
268
-
269
- ### 🧪 Video Reconstruction Test
270
-
271
- We provide a standalone script `vid_recon.py` to test VAE models independently. This script reads a video, encodes it through VAE, then decodes it back to verify the reconstruction quality.
272
-
273
- **Script Location**: `LightX2V/lightx2v/models/video_encoders/hf/vid_recon.py`
274
-
275
- ```bash
276
- git clone https://github.com/ModelTC/LightX2V.git
277
- cd LightX2V
278
- ```
279
-
280
- **1. Test Official VAE (Wan2.1)**
281
- ```bash
282
- python -m lightx2v.models.video_encoders.hf.vid_recon \
283
- input_video.mp4 \
284
- --checkpoint ./models/vae/Wan2.1_VAE.pth \
285
- --model_type vaew2_1 \
286
- --device cuda \
287
- --dtype bfloat16
288
- ```
289
-
290
- **2. Test Official VAE (Wan2.2)**
291
- ```bash
292
- python -m lightx2v.models.video_encoders.hf.vid_recon \
293
- input_video.mp4 \
294
- --checkpoint ./models/vae/Wan2.2_VAE.pth \
295
- --model_type vaew2_2 \
296
- --device cuda \
297
- --dtype bfloat16
298
- ```
299
-
300
- **3. Test LightTAE (Wan2.1)**
301
- ```bash
302
- python -m lightx2v.models.video_encoders.hf.vid_recon \
303
- input_video.mp4 \
304
- --checkpoint ./models/vae/lighttaew2_1.pth \
305
- --model_type taew2_1 \
306
- --device cuda \
307
- --dtype bfloat16
308
- ```
309
-
310
- **4. Test LightTAE (Wan2.2)**
311
- ```bash
312
- python -m lightx2v.models.video_encoders.hf.vid_recon \
313
- input_video.mp4 \
314
- --checkpoint ./models/vae/lighttaew2_2.pth \
315
- --model_type taew2_2 \
316
- --device cuda \
317
- --dtype bfloat16
318
- ```
319
-
320
- **5. Test LightVAE (Wan2.1)**
321
- ```bash
322
  python -m lightx2v.models.video_encoders.hf.vid_recon \
323
  input_video.mp4 \
324
  --checkpoint ./models/vae/lightvaew2_1.pth \
@@ -328,103 +180,18 @@ python -m lightx2v.models.video_encoders.hf.vid_recon \
328
  --use_lightvae
329
  ```
330
 
 
331
 
332
- **6. Test TAE (Wan2.1)**
333
- ```bash
334
- python -m lightx2v.models.video_encoders.hf.vid_recon \
335
- input_video.mp4 \
336
- --checkpoint ./models/vae/taew2_1.pth \
337
- --model_type taew2_1 \
338
- --device cuda \
339
- --dtype bfloat16
340
- ```
341
-
342
- **7. Test TAE (Wan2.2)**
343
- ```bash
344
- python -m lightx2v.models.video_encoders.hf.vid_recon \
345
- input_video.mp4 \
346
- --checkpoint ./models/vae/taew2_2.pth \
347
- --model_type taew2_1 \
348
- --device cuda \
349
- --dtype bfloat16
350
- ```
351
-
352
- ### Use in LightX2V
353
-
354
- Specify the VAE path in the configuration file:
355
-
356
-
357
- **Using Official VAE Series:**
358
- ```json
359
- {
360
-
361
- "vae_path": "./models/vae/Wan2.1_VAE.pth"
362
- }
363
- ```
364
-
365
- **Using LightVAE Series:**
366
- ```json
367
- {
368
- "use_lightvae": true,
369
- "vae_path": "./models/vae/lightvaew2_1.pth"
370
- }
371
- ```
372
-
373
-
374
- **Using LightTAE Series:**
375
- ```json
376
- {
377
- "use_tae": true,
378
- "need_scaled": true,
379
- "tae_path": "./models/vae/lighttaew2_1.pth"
380
  }
381
  ```
382
 
383
-
384
- **Using TAE Series:**
385
- ```json
386
- {
387
- "use_tae": true,
388
- "tae_path": "./models/vae/taew2_1.pth"
389
- }
390
- ```
391
-
392
- Then run the inference script:
393
-
394
- ```bash
395
- cd LightX2V/scripts
396
- bash wan/run_wan_i2v.sh # or other inference scripts
397
- ```
398
-
399
- ### Use in ComfyUI
400
-
401
- please refer to https://github.com/ModelTC/ComfyUI-LightVAE
402
-
403
- ## ⚠️ Important Notes
404
-
405
- ### 1. Compatibility
406
- - Wan2.1 series VAE only works with Wan2.1 backbone models
407
- - Wan2.2 series VAE only works with Wan2.2 backbone models
408
- - Do not mix different versions of VAE and backbone models
409
-
410
- ## 📚 Related Resources
411
-
412
- ### Documentation Links
413
- - **LightX2V Quick Start**: [Quick Start Documentation](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/getting_started/quickstart.html)
414
- - **Model Structure Description**: [Model Structure Documentation](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/getting_started/model_structure.html)
415
- - **taeHV Project**: [GitHub - madebyollin/taeHV](https://github.com/madebyollin/taeHV)
416
-
417
- ### Related Models
418
- - **Wan2.1 Backbone Models**: [Wan-AI Model Collection](https://huggingface.co/Wan-AI)
419
- - **Wan2.2 Backbone Models**: [Wan-AI/Wan2.2-TI2V-5B](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B)
420
- - **LightX2V Optimized Models**: [lightx2v Model Collection](https://huggingface.co/lightx2v)
421
-
422
- ---
423
-
424
  ## 🤝 Community & Support
425
 
426
- - **GitHub Issues**: https://github.com/ModelTC/LightX2V/issues
427
- - **HuggingFace**: https://huggingface.co/lightx2v
428
- - **LightX2V Homepage**: https://github.com/ModelTC/LightX2V
429
-
430
- If you find this project helpful, please give us a ⭐ on [GitHub](https://github.com/ModelTC/LightX2V)
 
1
  ---
2
  base_model:
3
+ - Wan-AI/Wan2.2-I2V-A14B
4
+ - Wan-AI/Wan2.2-TI2V-5B
5
+ - Wan-AI/Wan2.1-I2V-14B-720P
6
  library_name: diffusers
7
+ license: apache-2.0
8
+ tags:
9
+ - diffusion-single-file
10
+ - comfyui
11
+ - distillation
12
+ - LoRA
13
+ - video
14
+ - video generation
15
+ - sparse-attention
16
+ pipeline_tag: text-to-video
17
+ ---
18
+
19
+ # Light Forcing: Accelerating Autoregressive Video Diffusion via Sparse Attention
20
+
21
+ This repository contains the weights and artifacts for **Light Forcing**, the first sparse attention solution tailored for autoregressive (AR) video generation models.
22
+
23
+ [![arXiv](https://img.shields.io/badge/arXiv-2602.04789-b31b1b)](https://huggingface.co/papers/2602.04789)
24
+ [![GitHub](https://img.shields.io/badge/GitHub-LightForcing-blue?logo=github)](https://github.com/chengtao-lv/LightForcing)
25
+
26
+ Light Forcing introduces a *Chunk-Aware Growth* mechanism and *Hierarchical Sparse Attention* to capture informative historical and local context. It enables significant end-to-end speedups (e.g., up to 3.0× on an RTX 5090) for models like Wan2.1 and Wan2.2 while maintaining high visual quality.
27
+
28
+ ## 🚀 Quick Start
29
+
30
+ ### Fast Inference
31
+
32
+ To use Light Forcing for video generation, please refer to the official [GitHub repository](https://github.com/chengtao-lv/LightForcing) for environment setup and model weights.
33
+
34
+ **For short-video generation (e.g., 5s):**
35
+
36
+ ```shell
37
+ python inference.py \
38
+ --config_path configs/light_forcing_short.yaml \
39
+ --output_folder videos/light_forcing_short \
40
+ --checkpoint_path path/to/short_video_gen.pt \
41
+ --data_path prompts/MovieGenVideoBench_extended.txt \
42
+ --use_ema
43
+ ```
44
+
45
+ **For long-video generation (e.g., 15s):**
46
+
47
+ ```shell
48
+ python inference.py \
49
+ --config_path configs/light_forcing_long.yaml \
50
+ --output_folder videos/light_forcing_long \
51
+ --checkpoint_path path/to/long_video_gen.pt \
52
+ --data_path prompts/MovieGenVideoBench_extended.txt \
53
+ --use_ema \
54
+ --num_output_frames 63
55
+ ```
56
+
57
  ---
58
+
59
  # 🎨 LightVAE
60
 
61
  ## ⚡ Efficient Video Autoencoder (VAE) Model Collection
 
150
 
151
  ---
152
 
153
+ ## 📊 Performance Comparison
154
 
155
+ ### Video Reconstruction (Wan2.1 Series, 5s 81-frame video)
156
+ - **Precision**: BF16 | **Hardware**: NVIDIA H100
157
 
158
+ | Speed | Wan2.1_VAE | taew2_1 | lighttaew2_1 | lightvaew2_1 |
159
  |:-----|:--------------|:------------|:---------------------|:-------------|
160
+ | **Encode Speed** | 4.1721 s | 0.3956 s | 0.3956 s | 1.5014s |
161
  | **Decode Speed** | 5.4649 s | 0.2463 s | 0.2463 s | 2.0697s |
162
 
163
+ | GPU Memory | Wan2.1_VAE | taew2_1 | lighttaew2_1 | lightvaew2_1 |
164
  |:-----|:--------------|:------------|:---------------------|:-------------|
165
  | **Encode Memory** | 8.4954 GB | 0.00858 GB | 0.00858 GB | 4.7631 GB |
166
  | **Decode Memory** | 10.1287 GB | 0.41199 GB | 0.41199 GB | 5.5673 GB |
167
 
168
+ ## 🧪 VAE Reconstruction Test
169
 
170
+ You can test the VAE models independently using the standalone script provided in the repository:
171
 
172
  ```bash
173
+ # Test LightVAE (Wan2.1)
174
  python -m lightx2v.models.video_encoders.hf.vid_recon \
175
  input_video.mp4 \
176
  --checkpoint ./models/vae/lightvaew2_1.pth \
 
180
  --use_lightvae
181
  ```
182
 
183
+ ## 📑 Citation
184
 
185
+ ```bibtex
186
+ @article{lv2026light,
187
+ title={Light Forcing: Accelerating Autoregressive Video Diffusion via Sparse Attention},
188
+ author={Lv, Chengtao and Shi, Yumeng and Huang, Yushi and Gong, Ruihao and Ren, Shen and Wang, Wenya},
189
+ journal={arXiv preprint arXiv:2602.04789},
190
+ year={2026}
191
  }
192
  ```
193
 
194
  ## 🤝 Community & Support
195
 
196
+ - **GitHub Issues**: [ModelTC/LightX2V](https://github.com/ModelTC/LightX2V/issues)
197
+ - **LightX2V Homepage**: [https://github.com/ModelTC/LightX2V](https://github.com/ModelTC/LightX2V)
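
Review note: this change removes the "Model Selection Recommendations" section, which summarized `lightvaew2_1` as "2-3x" faster with "~50%" less memory than the official VAE. Those ratios can still be derived from the Wan2.1 reconstruction tables that the diff keeps. The sketch below is a minimal illustration of that arithmetic; the numbers are copied from the tables above, and the helper names `speedup` and `memory_saving` are hypothetical, not part of LightX2V.

```python
# Figures copied from the Wan2.1 reconstruction tables in this diff
# (BF16, NVIDIA H100, 5 s / 81-frame video).
# The helper names below are illustrative only; they are not LightX2V APIs.
BASELINE = "Wan2.1_VAE"

ENCODE_S = {"Wan2.1_VAE": 4.1721, "taew2_1": 0.3956,
            "lighttaew2_1": 0.3956, "lightvaew2_1": 1.5014}
DECODE_S = {"Wan2.1_VAE": 5.4649, "taew2_1": 0.2463,
            "lighttaew2_1": 0.2463, "lightvaew2_1": 2.0697}
ENCODE_GB = {"Wan2.1_VAE": 8.4954, "taew2_1": 0.00858,
             "lighttaew2_1": 0.00858, "lightvaew2_1": 4.7631}
DECODE_GB = {"Wan2.1_VAE": 10.1287, "taew2_1": 0.41199,
             "lighttaew2_1": 0.41199, "lightvaew2_1": 5.5673}


def speedup(seconds, model, baseline=BASELINE):
    """Return how many times faster `model` is than the baseline."""
    return seconds[baseline] / seconds[model]


def memory_saving(gigabytes, model, baseline=BASELINE):
    """Return the fraction of the baseline's memory that `model` avoids."""
    return 1.0 - gigabytes[model] / gigabytes[baseline]


if __name__ == "__main__":
    for model in ENCODE_S:
        if model == BASELINE:
            continue
        print(
            f"{model}: "
            f"encode {speedup(ENCODE_S, model):.2f}x, "
            f"decode {speedup(DECODE_S, model):.2f}x, "
            f"encode memory -{memory_saving(ENCODE_GB, model):.1%}, "
            f"decode memory -{memory_saving(DECODE_GB, model):.1%}"
        )
```

For `lightvaew2_1` this works out to roughly 2.8x faster encoding and 2.6x faster decoding, with about 44-45% less memory, which is consistent with the summary that the old card gave.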