lightx2v/Autoencoders

@@ -1,21 +1,61 @@
 ---
-license: apache-2.0
-tags:
-  - diffusion-single-file
-  - comfyui
-  - distillation
-  - LoRA
-  - video
-  - video genration
 base_model:
-  - Wan-AI/Wan2.2-I2V-A14B
-  - Wan-AI/Wan2.2-TI2V-5B
-  - Wan-AI/Wan2.1-I2V-14B-720P
-pipeline_tags:
-  - image-to-video
-  - text-to-video
 library_name: diffusers
 ---
 # 🎨 LightVAE
 ## ⚡ Efficient Video Autoencoder (VAE) Model Collection
@@ -110,215 +150,27 @@ For VAE, the LightX2V team has conducted a series of deep optimizations, derivin
 ---
-##  📊 Wan2.1 Series Performance Comparison
-- **Precision**: BF16
-- **Test Hardware**: NVIDIA H100
-### Video Reconstruction (5s 81-frame video)
-|Speed | Wan2.1_VAE | taew2_1 | lighttaew2_1 | lightvaew2_1 |
 |:-----|:--------------|:------------|:---------------------|:-------------|
-| **Encode Speed** | 4.1721 s | 0.3956 s | 0.3956 s |1.5014s |
 | **Decode Speed** | 5.4649 s | 0.2463 s | 0.2463 s | 2.0697s |
-|GPU Memory | Wan2.1_VAE | taew2_1 | lighttaew2_1 | lightvaew2_1 |
 |:-----|:--------------|:------------|:---------------------|:-------------|
 | **Encode Memory** | 8.4954 GB | 0.00858 GB | 0.00858 GB | 4.7631 GB |
 | **Decode Memory** | 10.1287 GB | 0.41199 GB | 0.41199 GB | 5.5673 GB |
-### Video Generation
-Task: s2v(speech to video)
-Model: seko-talk
-<table>
-<tr>
-<td width="25%" align="center">
-<strong>Wan2.1_VAE</strong><br>
-<video controls autoplay muted width="100%" src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/6l-P-3Hr9JKL3xgUyJXWJ.mp4"></video>
-</td>
-<td width="25%" align="center">
-<strong>taew2_1</strong><br>
-<video controls autoplay muted width="100%" src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/rcVHrCKB4nRAs2VSjJd2d.mp4"></video>
-</td>
-<td width="25%" align="center">
-<strong>lighttaew2_1</strong><br>
-<video controls autoplay muted width="100%" src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/Wq9p9Z7NDYwaKw4SqVbYT.mp4"></video>
-</td>
-<td width="25%" align="center">
-<strong>lightvaew2_1</strong><br>
-<video controls autoplay muted width="100%" src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/NpKOzFcvsHzSFfFACzUKP.mp4"></video>
-</td>
-</tr>
-</table>
-##  📊 Wan2.2 Series Performance Comparison
-- **Precision**: BF16
-- **Test Hardware**: NVIDIA H100
-### Video Reconstruction
-| Speed | Wan2.2_VAE | taew2_2 | lighttaew2_2 |
-|:-----|:--------------|:------------|:---------------------|
-| **Encode Speed** | 1.1369s | 0.3499 s | 0.3499 s |
-| **Decode Speed** | 3.1268 s | 0.0891 s | 0.0891 s|
-| GPU Memory | Wan2.2_VAE | taew2_2 | lighttaew2_2 |
-|:-----|:--------------|:------------|:---------------------|
-| **Encode Memory** | 6.1991 GB | 0.0064 GB | 0.0064 GB |
-| **Decode Memory** | 12.3487 GB | 0.4120 GB | 0.4120 GB |
-### Video Generation
-Task: t2v(text to video)
-Model: [Wan2.2-TI2V-5B](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B)
-<table>
-<tr>
-<td width="33%" align="center">
-<strong>Wan2.2_VAE</strong><br>
-<video controls autoplay width="95%" src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/KUY7Ifz9gFJqDjWga6A53.mp4"></video>
-</td>
-<td width="33%" align="center">
-<strong>taew2_2</strong><br>
-<video controls autoplay width="95%" src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/OYA8VfNlCv_hBkj_n_OMl.mp4"></video>
-</td>
-<td width="33%" align="center">
-<strong>lighttaew2_2</strong><br>
-<video controls autoplay width="95%" src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/gaHRr6uuAF0NlH4YlMbHO.mp4"></video>
-</td>
-</tr>
-</table>
-## 🎯 Model Selection Recommendations
-### Selection by Use Case
-<table>
-<tr>
-<td width="33%">
-#### 🏆 Pursuing Best Quality
-**Recommended**: `Wan2.1_VAE` / `Wan2.2_VAE`
-- ✅ Official model, quality ceiling
-- ✅ Highest reconstruction accuracy
-- ✅ Suitable for final product output
-- ⚠️ **Large memory usage** (~8-12 GB)
-- ⚠️ **Slow inference speed**
-</td>
-<td width="33%">
-#### ⚖️ **Best Balance** 🏆
-**Recommended**: **`lightvaew2_1`**
-- ✅ **Uses Causal 3D Conv** (same as official)
-- ✅ **Excellent quality**, close to official
-- ✅ Memory reduced by **~50%** (~4-5 GB)
-- ✅ Speed increased by **2-3x**
-- ✅ **Close to official quality** ⭐⭐⭐⭐
-**Use Cases**: Daily production, strongly recommended ⭐
-</td>
-<td width="33%">
-#### ⚡ **Speed + Quality Balance** ✨
-**Recommended**: **`lighttaew2_1`** / **`lighttaew2_2`**
-- ✅ Extremely low memory usage (~0.4 GB)
-- ✅ Extremely fast inference
-- ✅ **Quality significantly surpasses open source TAE**
-- ✅ **Close to official quality** ⭐⭐⭐⭐
-**Use Cases**: Development testing, rapid iteration
-</td>
-</tr>
-</table>
-### 🔥 Our Optimization Results Comparison
-| Comparison | Open Source TAE | **LightTAE (Ours)** | Official VAE | **LightVAE (Ours)** |
-|:------|:--------|:---------------------|:---------|:---------------------|
-| **Architecture** | Conv2D | Conv2D | Causal Conv3D | Causal Conv3D |
-| **Memory Usage** | Minimal (~0.4 GB) | Minimal (~0.4 GB) | Large (~8-12 GB) | Medium (~4-5 GB) |
-| **Inference Speed** | Extremely Fast ⚡⚡⚡⚡⚡ | Extremely Fast ⚡⚡⚡⚡⚡ | Slow ⚡⚡ | Fast ⚡⚡⚡⚡ |
-| **Generation Quality** | Average ⭐⭐⭐ | **Close to Official** ⭐⭐⭐⭐ | Highest ⭐⭐⭐⭐⭐ |  **Close to Official** ⭐⭐⭐⭐  |
-## 📑 Todo List
-  - [x] LightX2V integration
-  - [x] ComfyUI integration
-  - [ ] Training & Distillation Code
-## 🚀 Usage
-### Download VAE Models
 ```bash
-# Download Wan2.1 official VAE
-huggingface-cli download lightx2v/Autoencoders \
-    --local-dir ./models/vae/
-```
-### 🧪  Video Reconstruction Test
-We provide a standalone script `vid_recon.py` to test VAE models independently. This script reads a video, encodes it through VAE, then decodes it back to verify the reconstruction quality.
-**Script Location**: `LightX2V/lightx2v/models/video_encoders/hf/vid_recon.py`
-```bash
-git clone https://github.com/ModelTC/LightX2V.git
-cd LightX2V
-```
-**1. Test Official VAE (Wan2.1)**
-```bash
-python -m lightx2v.models.video_encoders.hf.vid_recon \
-    input_video.mp4 \
-    --checkpoint ./models/vae/Wan2.1_VAE.pth \
-    --model_type vaew2_1 \
-    --device cuda \
-    --dtype bfloat16
-```
-**2. Test Official VAE (Wan2.2)**
-```bash
-python -m lightx2v.models.video_encoders.hf.vid_recon \
-    input_video.mp4 \
-    --checkpoint ./models/vae/Wan2.2_VAE.pth \
-    --model_type vaew2_2 \
-    --device cuda \
-    --dtype bfloat16
-```
-**3. Test LightTAE (Wan2.1)**
-```bash
-python -m lightx2v.models.video_encoders.hf.vid_recon \
-    input_video.mp4 \
-    --checkpoint ./models/vae/lighttaew2_1.pth \
-    --model_type taew2_1 \
-    --device cuda \
-    --dtype bfloat16
-```
-**4. Test LightTAE (Wan2.2)**
-```bash
-python -m lightx2v.models.video_encoders.hf.vid_recon \
-    input_video.mp4 \
-    --checkpoint ./models/vae/lighttaew2_2.pth \
-    --model_type taew2_2 \
-    --device cuda \
-    --dtype bfloat16
-```
-**5. Test LightVAE (Wan2.1)**
-```bash
 python -m lightx2v.models.video_encoders.hf.vid_recon \
     input_video.mp4 \
     --checkpoint ./models/vae/lightvaew2_1.pth \
@@ -328,103 +180,18 @@ python -m lightx2v.models.video_encoders.hf.vid_recon \
     --use_lightvae
 ```
-**6. Test TAE (Wan2.1)**
-```bash
-python -m lightx2v.models.video_encoders.hf.vid_recon \
-    input_video.mp4 \
-    --checkpoint ./models/vae/taew2_1.pth \
-    --model_type taew2_1 \
-    --device cuda \
-    --dtype bfloat16
-```
-**7. Test TAE (Wan2.2)**
-```bash
-python -m lightx2v.models.video_encoders.hf.vid_recon \
-    input_video.mp4 \
-    --checkpoint ./models/vae/taew2_2.pth \
-    --model_type taew2_1 \
-    --device cuda \
-    --dtype bfloat16
-```
-### Use in LightX2V
-Specify the VAE path in the configuration file:
-**Using Official VAE Series:**
-```json
-{
-    "vae_path": "./models/vae/Wan2.1_VAE.pth"
-}
-```
-**Using LightVAE Series:**
-```json
-{
-    "use_lightvae": true,
-    "vae_path": "./models/vae/lightvaew2_1.pth"
-}
-```
-**Using LightTAE Series:**
-```json
-{
-    "use_tae": true,
-    "need_scaled": true,
-    "tae_path": "./models/vae/lighttaew2_1.pth"
 }
 ```
-**Using TAE Series:**
-```json
-{
-    "use_tae": true,
-    "tae_path": "./models/vae/taew2_1.pth"
-}
-```
-Then run the inference script:
-```bash
-cd LightX2V/scripts
-bash wan/run_wan_i2v.sh  # or other inference scripts
-```
-### Use in ComfyUI
-please refer to  https://github.com/ModelTC/ComfyUI-LightVAE
-## ⚠️ Important Notes
-### 1. Compatibility
-- Wan2.1 series VAE only works with Wan2.1 backbone models
-- Wan2.2 series VAE only works with Wan2.2 backbone models
-- Do not mix different versions of VAE and backbone models
-## 📚 Related Resources
-### Documentation Links
-- **LightX2V Quick Start**: [Quick Start Documentation](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/getting_started/quickstart.html)
-- **Model Structure Description**: [Model Structure Documentation](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/getting_started/model_structure.html)
-- **taeHV Project**: [GitHub - madebyollin/taeHV](https://github.com/madebyollin/taeHV)
-### Related Models
-- **Wan2.1 Backbone Models**: [Wan-AI Model Collection](https://huggingface.co/Wan-AI)
-- **Wan2.2 Backbone Models**: [Wan-AI/Wan2.2-TI2V-5B](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B)
-- **LightX2V Optimized Models**: [lightx2v Model Collection](https://huggingface.co/lightx2v)
----
 ## 🤝 Community & Support
-- **GitHub Issues**: https://github.com/ModelTC/LightX2V/issues
-- **HuggingFace**: https://huggingface.co/lightx2v
-- **LightX2V Homepage**: https://github.com/ModelTC/LightX2V
-If you find this project helpful, please give us a ⭐ on [GitHub](https://github.com/ModelTC/LightX2V)
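
For reference, the removed "Use in LightX2V" section listed one JSON config fragment per VAE family, and the removed "Compatibility" note required the VAE and backbone to come from the same Wan series. A minimal sketch of both rules follows. The keys (`vae_path`, `use_lightvae`, `use_tae`, `need_scaled`, `tae_path`) are taken from the removed snippets; the function names `vae_config` and `is_compatible` and the `kind` labels are hypothetical helpers, not part of LightX2V.

```python
import json


def vae_config(kind: str, checkpoint: str) -> dict:
    """Build the config fragment the removed README text showed for one VAE family.

    `kind` is a hypothetical label used only in this sketch:
    "official", "lightvae", "lighttae", or "tae".
    """
    if kind == "official":
        return {"vae_path": checkpoint}
    if kind == "lightvae":
        return {"use_lightvae": True, "vae_path": checkpoint}
    if kind == "lighttae":
        # The LightTAE fragment additionally set "need_scaled".
        return {"use_tae": True, "need_scaled": True, "tae_path": checkpoint}
    if kind == "tae":
        return {"use_tae": True, "tae_path": checkpoint}
    raise ValueError(f"unknown VAE kind: {kind!r}")


def is_compatible(vae_series: str, backbone_series: str) -> bool:
    """Wan2.1 VAEs pair only with Wan2.1 backbones, Wan2.2 only with Wan2.2."""
    return vae_series == backbone_series


# Print the fragment for the LightVAE checkpoint path used in the README.
print(json.dumps(vae_config("lightvae", "./models/vae/lightvaew2_1.pth"), indent=4))
print(is_compatible("Wan2.1", "Wan2.2"))
```

The printed fragment would still need to be merged into a full LightX2V configuration file before running an inference script.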

 ---
 base_model:
+- Wan-AI/Wan2.2-I2V-A14B
+- Wan-AI/Wan2.2-TI2V-5B
+- Wan-AI/Wan2.1-I2V-14B-720P
 library_name: diffusers
+license: apache-2.0
+tags:
+- diffusion-single-file
+- comfyui
+- distillation
+- LoRA
+- video
+- video generation
+- sparse-attention
+pipeline_tag: text-to-video
+---
+# Light Forcing: Accelerating Autoregressive Video Diffusion via Sparse Attention
+This repository contains the weights and artifacts for **Light Forcing**, the first sparse attention solution tailored for autoregressive (AR) video generation models.
+[![arXiv](https://img.shields.io/badge/arXiv-2602.04789-b31b1b)](https://huggingface.co/papers/2602.04789)
+[![GitHub](https://img.shields.io/badge/GitHub-LightForcing-blue?logo=github)](https://github.com/chengtao-lv/LightForcing)
+Light Forcing introduces a *Chunk-Aware Growth* mechanism and *Hierarchical Sparse Attention* to capture informative historical and local context. It enables significant end-to-end speedups (e.g., up to 3.0× on an RTX 5090) for models like Wan2.1 and Wan2.2 while maintaining high visual quality.
+## 🚀 Quick Start
+### Fast Inference
+To use Light Forcing for video generation, please refer to the official [GitHub repository](https://github.com/chengtao-lv/LightForcing) for environment setup and model weights.
+**For short-video generation (e.g., 5s):**
+```shell
+python inference.py \
+  --config_path configs/light_forcing_short.yaml \
+  --output_folder videos/light_forcing_short \
+  --checkpoint_path path/to/short_video_gen.pt \
+  --data_path prompts/MovieGenVideoBench_extended.txt \
+  --use_ema
+```
+**For long-video generation (e.g., 15s):**
+```shell
+python inference.py \
+  --config_path configs/light_forcing_long.yaml \
+  --output_folder videos/light_forcing_long \
+  --checkpoint_path path/to/long_video_gen.pt \
+  --data_path prompts/MovieGenVideoBench_extended.txt \
+  --use_ema \
+  --num_output_frames 63
+```
 ---
 # 🎨 LightVAE
 ## ⚡ Efficient Video Autoencoder (VAE) Model Collection
 ---
+## 📊 Performance Comparison
+### Video Reconstruction (Wan2.1 Series, 5s 81-frame video)
+- **Precision**: BF16 | **Hardware**: NVIDIA H100
+| Speed | Wan2.1_VAE | taew2_1 | lighttaew2_1 | lightvaew2_1 |
 |:-----|:--------------|:------------|:---------------------|:-------------|
+| **Encode Speed** | 4.1721 s | 0.3956 s | 0.3956 s | 1.5014s |
 | **Decode Speed** | 5.4649 s | 0.2463 s | 0.2463 s | 2.0697s |
+| GPU Memory | Wan2.1_VAE | taew2_1 | lighttaew2_1 | lightvaew2_1 |
 |:-----|:--------------|:------------|:---------------------|:-------------|
 | **Encode Memory** | 8.4954 GB | 0.00858 GB | 0.00858 GB | 4.7631 GB |
 | **Decode Memory** | 10.1287 GB | 0.41199 GB | 0.41199 GB | 5.5673 GB |
+## 🧪 VAE Reconstruction Test
+You can test the VAE models independently using the standalone script provided in the repository:
 ```bash
+# Test LightVAE (Wan2.1)
 python -m lightx2v.models.video_encoders.hf.vid_recon \
     input_video.mp4 \
     --checkpoint ./models/vae/lightvaew2_1.pth \
     --use_lightvae
 ```
+## 📑 Citation
+```bibtex
+@article{lv2026light,
+  title={Light Forcing: Accelerating Autoregressive Video Diffusion via Sparse Attention},
+  author={Lv, Chengtao and Shi, Yumeng and Huang, Yushi and Gong, Ruihao and Ren, Shen and Wang, Wenya},
+  journal={arXiv preprint arXiv:2602.04789},
+  year={2026}
 }
 ```
 ## 🤝 Community & Support
+- **GitHub Issues**: [ModelTC/LightX2V](https://github.com/ModelTC/LightX2V/issues)
+- **LightX2V Homepage**: [https://github.com/ModelTC/LightX2V](https://github.com/ModelTC/LightX2V)
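
As a quick sanity check on the Wan2.1 reconstruction tables kept in this diff, the sketch below derives speedup and memory-reduction ratios relative to `Wan2.1_VAE`. The numbers are copied from the tables (BF16, NVIDIA H100, 5s 81-frame video); the dictionary layout and the helper names `speedup` and `memory_reduction` are illustrative assumptions, not part of LightX2V.

```python
# Timings (seconds) and GPU memory (GB) copied from the Wan2.1 tables in this diff.
BASELINE = "Wan2.1_VAE"
RESULTS = {
    "Wan2.1_VAE": {"encode_s": 4.1721, "decode_s": 5.4649,
                   "encode_gb": 8.4954, "decode_gb": 10.1287},
    "lighttaew2_1": {"encode_s": 0.3956, "decode_s": 0.2463,
                     "encode_gb": 0.00858, "decode_gb": 0.41199},
    "lightvaew2_1": {"encode_s": 1.5014, "decode_s": 2.0697,
                     "encode_gb": 4.7631, "decode_gb": 5.5673},
}


def speedup(model: str, stage: str) -> float:
    """Baseline time divided by model time for stage 'encode' or 'decode'."""
    return RESULTS[BASELINE][f"{stage}_s"] / RESULTS[model][f"{stage}_s"]


def memory_reduction(model: str, stage: str) -> float:
    """Fraction of baseline GPU memory saved by the model for the given stage."""
    return 1.0 - RESULTS[model][f"{stage}_gb"] / RESULTS[BASELINE][f"{stage}_gb"]


for model in ("lighttaew2_1", "lightvaew2_1"):
    for stage in ("encode", "decode"):
        print(f"{model} {stage}: "
              f"{speedup(model, stage):.2f}x faster, "
              f"{memory_reduction(model, stage):.1%} less memory")
```

Under these figures, `lightvaew2_1` works out to roughly 2.8× faster encoding and 2.6× faster decoding than `Wan2.1_VAE`, with about 44-45% less GPU memory. That is consistent with the "2-3x" speed and "~50%" memory claims in the removed model selection section.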