+---
+license: apache-2.0
+tags:
+  - diffusion-single-file
+  - comfyui
+  - distillation
+  - LoRA
+  - video
+  - video generation
+base_model:
+  - Wan-AI/Wan2.2-I2V-A14B
+  - Wan-AI/Wan2.2-TI2V-5B
+  - Wan-AI/Wan2.1-I2V-14B-720P
+pipeline_tags:
+  - image-to-video
+  - text-to-video
+library_name: diffusers
+---
+# 🎨 LightVAE
+## ⚡ Efficient Video Autoencoder (VAE) Model Collection
+*From Official Models to LightX2V Distilled and Optimized Versions: Balancing Quality, Speed, and Memory*
+---
+[![🤗 HuggingFace](https://img.shields.io/badge/🤗-HuggingFace-yellow)](https://huggingface.co/lightx2v)
+[![GitHub](https://img.shields.io/badge/GitHub-LightX2V-blue?logo=github)](https://github.com/ModelTC/LightX2V)
+[![License](https://img.shields.io/badge/License-Apache%202.0-green.svg)](LICENSE)
+---
+The LightX2V team has applied a series of optimizations to the video VAE, producing two model series: **LightVAE** and **LightTAE**. Both significantly reduce memory consumption and improve inference speed while maintaining high quality.
+## 💡 Core Advantages
+<table>
+<tr>
+<td width="50%">
+### 📊 Official VAE
+**Features**: Highest Quality ⭐⭐⭐⭐⭐
+✅ Best reconstruction accuracy
+✅ Complete detail preservation
+❌ Large memory usage (~8-12 GB)
+❌ Slow inference speed
+</td>
+<td width="50%">
+### 🚀 Open Source TAE Series
+**Features**: Fastest Speed ⚡⚡⚡⚡⚡
+✅ Minimal memory usage (~0.4 GB)
+✅ Extremely fast inference
+❌ Average quality ⭐⭐⭐
+❌ Potential detail loss
+</td>
+</tr>
+<tr>
+<td width="50%">
+### 🎯 **LightVAE Series** (Our Optimization)
+**Features**: Best Balanced Solution ⚖️
+✅ Uses **Causal 3D Conv** (same as official)
+✅ **High accuracy ceiling** ⭐⭐⭐⭐⭐
+✅ Memory reduced by **~50%** (~4-5 GB)
+✅ Speed increased by **2-3x**
+✅ Balances quality, speed, and memory 🏆
+</td>
+<td width="50%">
+### ⚡ **LightTAE Series** (Our Optimization)
+**Features**: Fast Speed + Good Quality 🏆
+✅ Minimal memory usage (~0.4 GB)
+✅ Extremely fast inference
+✅ **Quality close to official** ⭐⭐⭐⭐
+✅ **Significantly surpasses open source TAE**
+</td>
+</tr>
+</table>
+---
+## 📦 Available Models
+### 🎯 Wan2.1 Series VAE
+| Model Name | Type | Architecture | Description |
+|:--------|:-----|:-----|:-----|
+| `Wan2.1_VAE` | Official VAE | Causal Conv3D | Wan2.1 official video VAE model<br>**Highest quality, large memory, slow speed** |
+| `taew2_1` | Open Source Small AE | Conv2D | Open source model based on [taeHV](https://github.com/madebyollin/taeHV)<br>**Small memory, fast speed, average quality** |
+| **`lighttaew2_1`** | **LightTAE Series** | Conv2D | **Our distilled and optimized version of `taew2_1`**<br>**Small memory, fast speed, quality close to official** ✨ |
+| **`lightvaew2_1`** | **LightVAE Series** | Causal Conv3D | **Our version of the WanVAE2.1 architecture, pruned by 75%, then trained and distilled**<br>**Best balance: high quality + low memory + fast speed** 🏆 |
+### 🎯 Wan2.2 Series VAE
+| Model Name | Type | Architecture | Description |
+|:--------|:-----|:-----|:-----|
+| `Wan2.2_VAE` | Official VAE | Causal Conv3D | Wan2.2 official video VAE model<br>**Highest quality, large memory, slow speed** |
+| `taew2_2` | Open Source Small AE | Conv2D | Open source model based on [taeHV](https://github.com/madebyollin/taeHV)<br>**Small memory, fast speed, average quality** |
+| **`lighttaew2_2`** | **LightTAE Series** | Conv2D | **Our distilled and optimized version of `taew2_2`**<br>**Small memory, fast speed, quality close to official** ✨ |
+---
+## 📊 Wan2.1 Series Performance Comparison
+- **Precision**: BF16
+- **Test Hardware**: NVIDIA H100
+### Video Reconstruction (5 s, 81-frame video)
+| Time (lower is faster) | Wan2.1_VAE | taew2_1 | lighttaew2_1 | lightvaew2_1 |
+|:-----|:--------------|:------------|:---------------------|:-------------|
+| **Encode Time** | 4.1721 s | 0.3956 s | 0.3956 s | 1.5014 s |
+| **Decode Time** | 5.4649 s | 0.2463 s | 0.2463 s | 2.0697 s |
+
+| GPU Memory | Wan2.1_VAE | taew2_1 | lighttaew2_1 | lightvaew2_1 |
+|:-----|:--------------|:------------|:---------------------|:-------------|
+| **Encode Memory** | 8.4954 GB | 0.00858 GB | 0.00858 GB | 4.7631 GB |
+| **Decode Memory** | 10.1287 GB | 0.41199 GB | 0.41199 GB | 5.5673 GB |
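The headline figures quoted for `lightvaew2_1` (2-3x faster, roughly half the memory) can be checked against the measurements above. The following is a minimal sketch: the numbers are copied from the Wan2.1 tables, while the dictionary layout and the `ratios` helper are illustrative names, not part of LightX2V.

```python
# Measurements copied from the Wan2.1 tables (H100, BF16, 81-frame clip).
# Times are in seconds, peak memory in GB.
OFFICIAL = {"encode_s": 4.1721, "decode_s": 5.4649, "encode_gb": 8.4954, "decode_gb": 10.1287}
LIGHTVAE = {"encode_s": 1.5014, "decode_s": 2.0697, "encode_gb": 4.7631, "decode_gb": 5.5673}


def ratios(baseline, candidate):
    """Return speedup factors and fractional memory savings of candidate vs. baseline."""
    return {
        "encode_speedup": baseline["encode_s"] / candidate["encode_s"],
        "decode_speedup": baseline["decode_s"] / candidate["decode_s"],
        "encode_mem_saving": 1 - candidate["encode_gb"] / baseline["encode_gb"],
        "decode_mem_saving": 1 - candidate["decode_gb"] / baseline["decode_gb"],
    }


if __name__ == "__main__":
    for name, value in ratios(OFFICIAL, LIGHTVAE).items():
        print(f"{name}: {value:.2f}")
```

On these measurements the speedup works out to roughly 2.8x for encoding and 2.6x for decoding, and the memory saving to roughly 44-45%, which is consistent with the rounded "2-3x" and "~50%" figures.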
+### Video Generation
+- **Task**: s2v (speech-to-video)
+- **Model**: seko-talk
+| Wan2.1_VAE | taew2_1 | lighttaew2_1 | lightvaew2_1 |
+|:--------------|:------------|:---------------------|:-------------|
+|  https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/6l-P-3Hr9JKL3xgUyJXWJ.mp4| https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/rcVHrCKB4nRAs2VSjJd2d.mp4|https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/Wq9p9Z7NDYwaKw4SqVbYT.mp4| https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/NpKOzFcvsHzSFfFACzUKP.mp4|
+## 📊 Wan2.2 Series Performance Comparison
+- **Precision**: BF16
+- **Test Hardware**: NVIDIA H100
+### Video Reconstruction
+| Time (lower is faster) | Wan2.2_VAE | taew2_2 | lighttaew2_2 |
+|:-----|:--------------|:------------|:---------------------|
+| **Encode Time** | 1.1369 s | 0.3499 s | 0.3499 s |
+| **Decode Time** | 3.1268 s | 0.0891 s | 0.0891 s |
+
+| GPU Memory | Wan2.2_VAE | taew2_2 | lighttaew2_2 |
+|:-----|:--------------|:------------|:---------------------|
+| **Encode Memory** | 6.1991 GB | 0.0064 GB | 0.0064 GB |
+| **Decode Memory** | 12.3487 GB | 0.4120 GB | 0.4120 GB |
+### Video Generation
+- **Task**: t2v (text-to-video)
+- **Model**: [Wan-AI/Wan2.1-T2V-A14B](https://huggingface.co/Wan-AI/Wan2.1-T2V-A14B)
+| Wan2.2_VAE | taew2_2 | lighttaew2_2 |
+|:--------------|:------------|:---------------------|
+| https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/KUY7Ifz9gFJqDjWga6A53.mp4| https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/OYA8VfNlCv_hBkj_n_OMl.mp4| https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/gaHRr6uuAF0NlH4YlMbHO.mp4|
+## 🎯 Model Selection Recommendations
+### Selection by Use Case
+<table>
+<tr>
+<td width="33%">
+#### 🏆 Pursuing Best Quality
+**Recommended**: `Wan2.1_VAE` / `Wan2.2_VAE`
+- ✅ Official model, quality ceiling
+- ✅ Highest reconstruction accuracy
+- ✅ Suitable for final product output
+- ⚠️ **Large memory usage** (~8-12 GB)
+- ⚠️ **Slow inference speed**
+</td>
+<td width="33%">
+#### ⚖️ **Best Balance** 🏆
+**Recommended**: **`lightvaew2_1`**
+- ✅ **Uses Causal 3D Conv** (same as official)
+- ✅ **Excellent quality**, close to official
+- ✅ Memory reduced by **~50%** (~4-5 GB)
+- ✅ Speed increased by **2-3x**
+- ✅ **High accuracy ceiling**
+**Use Cases**: Daily production, strongly recommended ⭐
+</td>
+<td width="33%">
+#### ⚡ **Speed + Quality Balance** ✨
+**Recommended**: **`lighttaew2_1`** / **`lighttaew2_2`**
+- ✅ Extremely low memory usage (~0.4 GB)
+- ✅ Extremely fast inference
+- ✅ **Quality significantly surpasses open source TAE**
+- ✅ **Close to official quality** ⭐⭐⭐⭐
+**Use Cases**: Development testing, rapid iteration
+</td>
+</tr>
+</table>
+### 🔥 Our Optimization Results Comparison
+| Comparison | Open Source TAE | **LightTAE (Ours)** | Official VAE | **LightVAE (Ours)** |
+|:------|:--------|:---------------------|:---------|:---------------------|
+| **Architecture** | Conv2D | Conv2D | Causal Conv3D | Causal Conv3D |
+| **Memory Usage** | Minimal (~0.4 GB) | Minimal (~0.4 GB) | Large (~8-12 GB) | Medium (~4-5 GB) |
+| **Inference Speed** | Extremely Fast ⚡⚡⚡⚡⚡ | Extremely Fast ⚡⚡⚡⚡⚡ | Slow ⚡⚡ | Fast ⚡⚡⚡⚡ |
+| **Generation Quality** | Average ⭐⭐⭐ | **Close to Official** ⭐⭐⭐⭐ | Highest ⭐⭐⭐⭐⭐ | Excellent ⭐⭐⭐⭐⭐ |
+| **Accuracy Ceiling** | Medium | Medium | Highest | **High** |
+## 🚀 Usage
+### Download VAE Models
+```bash
+# Download all VAE checkpoints in the repository to ./models/vae/
+huggingface-cli download lightx2v/Autoencoders-Lightx2v \
+    --local-dir ./models/vae/
+```
+### Use in LightX2V
+Specify the VAE path in the configuration file:
+**Using Official VAE Series:**
+```json
+{
+    "vae_pth": "./models/vae/Wan2.1_VAE.pth"
+}
+```
+**Using LightVAE Series:**
+```json
+{
+    "use_lightvae": true,
+    "vae_pth": "./models/vae/lightvaew2_1.pth"
+}
+```
+**Using LightTAE Series:**
+```json
+{
+    "use_tiny_vae": true,
+    "need_scaled": true,
+    "tiny_vae_path": "./models/vae/lighttaew2_1.pth"
+}
+```
+**Using TAE Series:**
+```json
+{
+    "use_tiny_vae": true,
+    "tiny_vae_path": "./models/vae/taew2_1.pth"
+}
+```
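The four fragments above differ only in a few keys, so they can be kept in one place and merged into a larger configuration. The sketch below assumes the configuration is an ordinary JSON object; the key names and paths are copied from the fragments above, while `VAE_FRAGMENTS` and `with_vae` are hypothetical helpers, not LightX2V APIs.

```python
import json

# Key names and checkpoint paths are copied from the JSON fragments above.
VAE_FRAGMENTS = {
    "official": {"vae_pth": "./models/vae/Wan2.1_VAE.pth"},
    "lightvae": {"use_lightvae": True, "vae_pth": "./models/vae/lightvaew2_1.pth"},
    "lighttae": {
        "use_tiny_vae": True,
        "need_scaled": True,
        "tiny_vae_path": "./models/vae/lighttaew2_1.pth",
    },
    "tae": {"use_tiny_vae": True, "tiny_vae_path": "./models/vae/taew2_1.pth"},
}


def with_vae(base_config, kind):
    """Return a copy of base_config with the VAE keys for `kind` merged in."""
    if kind not in VAE_FRAGMENTS:
        raise ValueError(f"unknown VAE kind: {kind!r}")
    merged = dict(base_config)  # leave the caller's dict untouched
    merged.update(VAE_FRAGMENTS[kind])
    return merged


if __name__ == "__main__":
    # Print the fragment for the LightTAE series as JSON.
    print(json.dumps(with_vae({}, "lighttae"), indent=4))
```

Keeping the fragments in a single table makes it harder to forget `need_scaled`, which appears only in the LightTAE fragment.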
+Then run the inference script:
+```bash
+cd LightX2V/scripts
+bash wan/run_wan_i2v.sh  # or other inference scripts
+```
+## ⚠️ Important Notes
+### 1. Compatibility
+- Wan2.1 series VAE only works with Wan2.1 backbone models
+- Wan2.2 series VAE only works with Wan2.2 backbone models
+- Do not mix different versions of VAE and backbone models
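A mismatch between VAE and backbone can be caught before loading any weights. The sketch below encodes only the rule stated above and assumes the series can be read from the name (`2.1`/`2_1` or `2.2`/`2_2`), as in the model names on this page; `wan_series` and `check_compatible` are hypothetical helpers, not LightX2V functions.

```python
import re


def wan_series(name):
    """Return "2.1" or "2.2" if the name contains 2.1/2_1 or 2.2/2_2, else None."""
    match = re.search(r"2[._]([12])", name)
    return f"2.{match.group(1)}" if match else None


def check_compatible(vae_name, backbone_name):
    """Raise ValueError if the VAE and backbone belong to different Wan series."""
    vae, backbone = wan_series(vae_name), wan_series(backbone_name)
    if vae is None or backbone is None:
        raise ValueError("could not determine the Wan series from the given names")
    if vae != backbone:
        raise ValueError(
            f"{vae_name} (Wan{vae}) does not match {backbone_name} (Wan{backbone})"
        )


if __name__ == "__main__":
    check_compatible("lightvaew2_1", "Wan-AI/Wan2.1-I2V-14B-720P")  # passes silently
    try:
        check_compatible("lighttaew2_2", "Wan-AI/Wan2.1-I2V-14B-720P")
    except ValueError as err:
        print(f"incompatible: {err}")
```

The name-based check is a heuristic: it takes the first version-like token in each name, so names that do not follow the patterns used on this page need an explicit mapping instead.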
+## 📚 Related Resources
+### Documentation Links
+- **LightX2V Quick Start**: [Quick Start Documentation](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/getting_started/quickstart.html)
+- **Model Structure Description**: [Model Structure Documentation](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/getting_started/model_structure.html)
+- **taeHV Project**: [GitHub - madebyollin/taeHV](https://github.com/madebyollin/taeHV)
+### Related Models
+- **Wan2.1 Backbone Models**: [Wan-AI Model Collection](https://huggingface.co/Wan-AI)
+- **Wan2.2 Backbone Models**: [Wan-AI/Wan2.2-TI2V-5B](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B)
+- **LightX2V Optimized Models**: [lightx2v Model Collection](https://huggingface.co/lightx2v)
+---
+## 🤝 Community & Support
+- **GitHub Issues**: https://github.com/ModelTC/LightX2V/issues
+- **HuggingFace**: https://huggingface.co/lightx2v
+- **LightX2V Homepage**: https://github.com/ModelTC/LightX2V
+If you find this project helpful, please give us a ⭐ on [GitHub](https://github.com/ModelTC/LightX2V).