---
license: apache-2.0
tags:
  - diffusion-single-file
  - comfyui
  - distillation
  - LoRA
  - video
  - video generation
base_model:
  - Wan-AI/Wan2.2-I2V-A14B
  - Wan-AI/Wan2.2-TI2V-5B
  - Wan-AI/Wan2.1-I2V-14B-720P
pipeline_tags: 
  - image-to-video
  - text-to-video
library_name: diffusers
---
# 🎨 LightVAE

## ⚡ Efficient Video Autoencoder (VAE) Model Collection

*From Official Models to LightX2V Distilled and Optimized Versions: Balancing Quality, Speed, and Memory*
![img_lightx2v](https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/tTnp8-ARpj3wGxfo5P55c.png)

---

[![🤗 HuggingFace](https://img.shields.io/badge/🤗-HuggingFace-yellow)](https://huggingface.co/lightx2v)
[![GitHub](https://img.shields.io/badge/GitHub-LightX2V-blue?logo=github)](https://github.com/ModelTC/LightX2V)
[![License](https://img.shields.io/badge/License-Apache%202.0-green.svg)](LICENSE)

---

The LightX2V team has optimized the VAE in depth, producing two model series, **LightVAE** and **LightTAE**, that significantly reduce memory consumption and speed up inference while maintaining high quality.

## 💡 Core Advantages

<table>
<tr>
<td width="50%">

### 📊 Official VAE
**Features**: Highest Quality ⭐⭐⭐⭐⭐

✅ Best reconstruction accuracy  
✅ Complete detail preservation  
❌ Large memory usage (~8-12 GB)  
❌ Slow inference speed

</td>
<td width="50%">

### 🚀 Open Source TAE Series
**Features**: Fastest Speed ⚡⚡⚡⚡⚡

✅ Minimal memory usage (~0.4 GB)  
✅ Extremely fast inference  
❌ Average quality ⭐⭐⭐  
❌ Potential detail loss

</td>
</tr>
<tr>
<td width="50%">

### 🎯 **LightVAE Series** (Our Optimization)
**Features**: Best Balanced Solution ⚖️

✅ Uses **Causal 3D Conv** (same as official)  
✅ **Quality close to official** ⭐⭐⭐⭐  
✅ Memory reduced by **~50%** (~4-5 GB)  
✅ Speed increased by **2-3x**  
✅ Balances quality, speed, and memory 🏆

</td>
<td width="50%">

### ⚡ **LightTAE Series** (Our Optimization)
**Features**: Fast Speed + Good Quality 🏆

✅ Minimal memory usage (~0.4 GB)  
✅ Extremely fast inference  
✅ **Quality close to official** ⭐⭐⭐⭐  
✅ **Significantly surpasses open source TAE**

</td>
</tr>
</table>

---

## 📦 Available Models

### 🎯 Wan2.1 Series VAE

| Model Name | Type | Architecture | Description | 
|:--------|:-----|:-----|:-----|
| `Wan2.1_VAE` | Official VAE | Causal Conv3D | Wan2.1 official video VAE model<br>**Highest quality, large memory, slow speed** |
| `taew2_1` | Open Source Small AE | Conv2D | Open source model based on [taeHV](https://github.com/madebyollin/taeHV)<br>**Small memory, fast speed, average quality** |
| **`lighttaew2_1`** | **LightTAE Series** | Conv2D | **Our distilled and optimized version of `taew2_1`**<br>**Small memory, fast speed, quality close to official** ✨ |
| **`lightvaew2_1`** | **LightVAE Series** | Causal Conv3D | **Our version: the WanVAE2.1 architecture pruned by 75%, then trained and distilled**<br>**Best balance: high quality + low memory + fast speed** 🏆 |

### 🎯 Wan2.2 Series VAE

| Model Name | Type | Architecture | Description | 
|:--------|:-----|:-----|:-----|
| `Wan2.2_VAE` | Official VAE | Causal Conv3D | Wan2.2 official video VAE model<br>**Highest quality, large memory, slow speed** |
| `taew2_2` | Open Source Small AE | Conv2D | Open source model based on [taeHV](https://github.com/madebyollin/taeHV)<br>**Small memory, fast speed, average quality** |
| **`lighttaew2_2`** | **LightTAE Series** | Conv2D | **Our distilled and optimized version of `taew2_2`**<br>**Small memory, fast speed, quality close to official** ✨ |

---


## 📊 Wan2.1 Series Performance Comparison
- **Precision**: BF16
- **Test Hardware**: NVIDIA H100

### Video Reconstruction (5s 81-frame video)

| Time | Wan2.1_VAE | taew2_1 | lighttaew2_1 | lightvaew2_1 |
|:-----|:--------------|:------------|:---------------------|:-------------|
| **Encode Time** | 4.1721 s | 0.3956 s | 0.3956 s | 1.5014 s |
| **Decode Time** | 5.4649 s | 0.2463 s | 0.2463 s | 2.0697 s |

|GPU Memory | Wan2.1_VAE | taew2_1 | lighttaew2_1 | lightvaew2_1 |
|:-----|:--------------|:------------|:---------------------|:-------------|
| **Encode Memory** | 8.4954 GB | 0.00858 GB | 0.00858 GB | 4.7631 GB |
| **Decode Memory** | 10.1287 GB | 0.41199 GB | 0.41199 GB | 5.5673 GB |
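
As a quick check on the headline figures, the speedup and memory saving of `lightvaew2_1` relative to `Wan2.1_VAE` can be recomputed from the two tables above. This is only an illustrative sketch: the dictionary keys and the helper names `speedup` and `memory_saving` are made up for the example, and only the numbers come from the tables.

```python
# Figures copied from the Wan2.1 reconstruction tables (BF16, NVIDIA H100).
official = {"encode_s": 4.1721, "decode_s": 5.4649,
            "encode_gb": 8.4954, "decode_gb": 10.1287}
lightvae = {"encode_s": 1.5014, "decode_s": 2.0697,
            "encode_gb": 4.7631, "decode_gb": 5.5673}


def speedup(baseline_s: float, candidate_s: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_s / candidate_s


def memory_saving(baseline_gb: float, candidate_gb: float) -> float:
    """Fraction of the baseline's peak memory that the candidate saves."""
    return 1.0 - candidate_gb / baseline_gb


for stage in ("encode", "decode"):
    s = speedup(official[f"{stage}_s"], lightvae[f"{stage}_s"])
    m = memory_saving(official[f"{stage}_gb"], lightvae[f"{stage}_gb"])
    print(f"{stage}: {s:.2f}x faster, {m:.0%} less peak memory")
```

With these measurements, both stages come out at roughly 2.6-2.8x faster with about 44-45% less peak memory, in line with the 2-3x and ~50% summary figures.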

### Video Generation

Task: s2v (speech to video)  
Model: seko-talk

<table>
<tr>
<td width="25%" align="center">
<strong>Wan2.1_VAE</strong><br>
<video controls autoplay muted width="100%" src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/6l-P-3Hr9JKL3xgUyJXWJ.mp4"></video>
</td>
<td width="25%" align="center">
<strong>taew2_1</strong><br>
<video controls autoplay muted width="100%" src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/rcVHrCKB4nRAs2VSjJd2d.mp4"></video>
</td>
<td width="25%" align="center">
<strong>lighttaew2_1</strong><br>
<video controls autoplay muted width="100%" src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/Wq9p9Z7NDYwaKw4SqVbYT.mp4"></video>
</td>
<td width="25%" align="center">
<strong>lightvaew2_1</strong><br>
<video controls autoplay muted width="100%" src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/NpKOzFcvsHzSFfFACzUKP.mp4"></video>
</td>
</tr>
</table>

## 📊 Wan2.2 Series Performance Comparison
- **Precision**: BF16
- **Test Hardware**: NVIDIA H100

### Video Reconstruction
| Time | Wan2.2_VAE | taew2_2 | lighttaew2_2 |
|:-----|:--------------|:------------|:---------------------|
| **Encode Time** | 1.1369 s | 0.3499 s | 0.3499 s |
| **Decode Time** | 3.1268 s | 0.0891 s | 0.0891 s |

| GPU Memory | Wan2.2_VAE | taew2_2 | lighttaew2_2 |
|:-----|:--------------|:------------|:---------------------|
| **Encode Memory** | 6.1991 GB | 0.0064 GB | 0.0064 GB |
| **Decode Memory** | 12.3487 GB | 0.4120 GB | 0.4120 GB |


### Video Generation

Task: t2v (text to video)  
Model: [Wan2.2-TI2V-5B](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B)

<table>
<tr>
<td width="33%" align="center">
<strong>Wan2.2_VAE</strong><br>
<video controls autoplay width="95%" src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/KUY7Ifz9gFJqDjWga6A53.mp4"></video>
</td>
<td width="33%" align="center">
<strong>taew2_2</strong><br>
<video controls autoplay width="95%" src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/OYA8VfNlCv_hBkj_n_OMl.mp4"></video>
</td>
<td width="33%" align="center">
<strong>lighttaew2_2</strong><br>
<video controls autoplay width="95%" src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/gaHRr6uuAF0NlH4YlMbHO.mp4"></video>
</td>
</tr>
</table>



## 🎯 Model Selection Recommendations

### Selection by Use Case

<table>
<tr>
<td width="33%">

#### 🏆 Pursuing Best Quality
**Recommended**: `Wan2.1_VAE` / `Wan2.2_VAE`

- ✅ Official model, quality ceiling
- ✅ Highest reconstruction accuracy
- ✅ Suitable for final product output
- ⚠️ **Large memory usage** (~8-12 GB)
- ⚠️ **Slow inference speed**

</td>
<td width="33%">

#### ⚖️ **Best Balance** 🏆
**Recommended**: **`lightvaew2_1`** 

- ✅ **Uses Causal 3D Conv** (same as official)
- ✅ **Excellent quality**, close to official ⭐⭐⭐⭐
- ✅ Memory reduced by **~50%** (~4-5 GB)
- ✅ Speed increased by **2-3x**

**Use Cases**: Daily production, strongly recommended ⭐

</td>
<td width="33%">

#### ⚡ **Speed + Quality Balance** ✨
**Recommended**: **`lighttaew2_1`** / **`lighttaew2_2`**

- ✅ Extremely low memory usage (~0.4 GB)
- ✅ Extremely fast inference
- ✅ **Quality significantly surpasses open source TAE**
- ✅ **Close to official quality** ⭐⭐⭐⭐

**Use Cases**: Development testing, rapid iteration

</td>
</tr>
</table>


### 🔥 Our Optimization Results Comparison

| Comparison | Open Source TAE | **LightTAE (Ours)** | Official VAE | **LightVAE (Ours)** |
|:------|:--------|:---------------------|:---------|:---------------------|
| **Architecture** | Conv2D | Conv2D | Causal Conv3D | Causal Conv3D |
| **Memory Usage** | Minimal (~0.4 GB) | Minimal (~0.4 GB) | Large (~8-12 GB) | Medium (~4-5 GB) |
| **Inference Speed** | Extremely Fast ⚡⚡⚡⚡⚡ | Extremely Fast ⚡⚡⚡⚡⚡ | Slow ⚡⚡ | Fast ⚡⚡⚡⚡ |
| **Generation Quality** | Average ⭐⭐⭐ | **Close to Official** ⭐⭐⭐⭐ | Highest ⭐⭐⭐⭐⭐ |  **Close to Official** ⭐⭐⭐⭐  |

## 📑 Todo List
  - [x] LightX2V integration
  - [x] ComfyUI integration
  - [ ] Training & Distillation Code
  
## 🚀 Usage

### Download VAE Models

```bash
# Download all checkpoints in the lightx2v/Autoencoders repository
huggingface-cli download lightx2v/Autoencoders \
    --local-dir ./models/vae/
```
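
If only some of the checkpoints are needed, the same download can be scripted from Python. The sketch below assumes that the `huggingface_hub` package is installed and that each checkpoint is stored at the repository root as `<model name>.pth`, which is what the paths used elsewhere in this README suggest. `checkpoint_patterns` and `download_checkpoints` are hypothetical helper names.

```python
from typing import Iterable, List


def checkpoint_patterns(names: Iterable[str]) -> List[str]:
    """Map model names such as 'lightvaew2_1' to filename patterns.

    Assumption: checkpoints are stored as '<name>.pth' at the repo root.
    """
    return [f"{name}.pth" for name in names]


def download_checkpoints(names: Iterable[str],
                         local_dir: str = "./models/vae/") -> str:
    """Download only the requested checkpoints from lightx2v/Autoencoders."""
    # Imported lazily so checkpoint_patterns works without the dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="lightx2v/Autoencoders",
        local_dir=local_dir,
        allow_patterns=checkpoint_patterns(names),
    )
```

For example, `download_checkpoints(["lightvaew2_1", "lighttaew2_1"])` would fetch just the two Wan2.1 Light checkpoints into `./models/vae/`, provided the file-naming assumption holds.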

### 🧪  Video Reconstruction Test

We provide a standalone script, `vid_recon.py`, for testing VAE models independently. It reads a video, encodes it with the VAE, and decodes it back so that you can check the reconstruction quality.

**Script Location**: `LightX2V/lightx2v/models/video_encoders/hf/vid_recon.py`

```bash
git clone https://github.com/ModelTC/LightX2V.git
cd LightX2V
```

**1. Test Official VAE (Wan2.1)**
```bash
python -m lightx2v.models.video_encoders.hf.vid_recon \
    input_video.mp4 \
    --checkpoint ./models/vae/Wan2.1_VAE.pth \
    --model_type vaew2_1 \
    --device cuda \
    --dtype bfloat16
```

**2. Test Official VAE (Wan2.2)**
```bash
python -m lightx2v.models.video_encoders.hf.vid_recon \
    input_video.mp4 \
    --checkpoint ./models/vae/Wan2.2_VAE.pth \
    --model_type vaew2_2 \
    --device cuda \
    --dtype bfloat16
```

**3. Test LightTAE (Wan2.1)**
```bash
python -m lightx2v.models.video_encoders.hf.vid_recon \
    input_video.mp4 \
    --checkpoint ./models/vae/lighttaew2_1.pth \
    --model_type taew2_1 \
    --device cuda \
    --dtype bfloat16
```

**4. Test LightTAE (Wan2.2)**
```bash
python -m lightx2v.models.video_encoders.hf.vid_recon \
    input_video.mp4 \
    --checkpoint ./models/vae/lighttaew2_2.pth \
    --model_type taew2_2 \
    --device cuda \
    --dtype bfloat16
```

**5. Test LightVAE (Wan2.1)**
```bash
python -m lightx2v.models.video_encoders.hf.vid_recon \
    input_video.mp4 \
    --checkpoint ./models/vae/lightvaew2_1.pth \
    --model_type vaew2_1 \
    --device cuda \
    --dtype bfloat16 \
    --use_lightvae
```


**6. Test TAE (Wan2.1)**
```bash
python -m lightx2v.models.video_encoders.hf.vid_recon \
    input_video.mp4 \
    --checkpoint ./models/vae/taew2_1.pth \
    --model_type taew2_1 \
    --device cuda \
    --dtype bfloat16
```

**7. Test TAE (Wan2.2)**
```bash
python -m lightx2v.models.video_encoders.hf.vid_recon \
    input_video.mp4 \
    --checkpoint ./models/vae/taew2_2.pth \
    --model_type taew2_2 \
    --device cuda \
    --dtype bfloat16
```
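
The seven commands above differ only in the checkpoint file, the `--model_type` value, and the `--use_lightvae` flag, so they can be generated from a small table. The following is a sketch rather than part of LightX2V: `CHECKPOINTS` and `recon_command` are hypothetical names, and the `taew2_2` model type for `taew2_2.pth` is an assumption made by analogy with the LightTAE (Wan2.2) command.

```python
import shlex

# Checkpoint file -> (--model_type value, whether --use_lightvae is passed).
CHECKPOINTS = {
    "Wan2.1_VAE.pth": ("vaew2_1", False),
    "Wan2.2_VAE.pth": ("vaew2_2", False),
    "lighttaew2_1.pth": ("taew2_1", False),
    "lighttaew2_2.pth": ("taew2_2", False),
    "lightvaew2_1.pth": ("vaew2_1", True),
    "taew2_1.pth": ("taew2_1", False),
    "taew2_2.pth": ("taew2_2", False),  # assumption, see the note above
}


def recon_command(checkpoint, video="input_video.mp4", model_dir="./models/vae"):
    """Build the argv list for one reconstruction test."""
    model_type, use_lightvae = CHECKPOINTS[checkpoint]
    argv = [
        "python", "-m", "lightx2v.models.video_encoders.hf.vid_recon",
        video,
        "--checkpoint", f"{model_dir}/{checkpoint}",
        "--model_type", model_type,
        "--device", "cuda",
        "--dtype", "bfloat16",
    ]
    if use_lightvae:
        argv.append("--use_lightvae")
    return argv


# Dry run: print each command instead of executing it.
for name in CHECKPOINTS:
    print(shlex.join(recon_command(name)))
```

To actually run a test, pass the list returned by `recon_command` to `subprocess.run` from the root of the `LightX2V` checkout.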

### Use in LightX2V

Specify the VAE path in the configuration file:


**Using Official VAE Series:**
```json
{
    "vae_path": "./models/vae/Wan2.1_VAE.pth"
}
```

**Using LightVAE Series:**
```json
{
    "use_lightvae": true,
    "vae_path": "./models/vae/lightvaew2_1.pth"
}
```


**Using LightTAE Series:**
```json
{
    "use_tae": true,
    "need_scaled": true,
    "tae_path": "./models/vae/lighttaew2_1.pth"
}
```


**Using TAE Series:**
```json
{
    "use_tae": true,
    "tae_path": "./models/vae/taew2_1.pth"
}
```

Then run the inference script:

```bash
cd LightX2V/scripts
bash wan/run_wan_i2v.sh  # or other inference scripts
```
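
The four configuration fragments shown above differ only in a few keys, so they can be derived from the checkpoint file name. This is a minimal sketch, assuming that the keys shown in this README (`vae_path`, `tae_path`, `use_lightvae`, `use_tae`, `need_scaled`) are all that needs to change; `vae_config` is a hypothetical helper, not a LightX2V function.

```python
import json


def vae_config(checkpoint: str, model_dir: str = "./models/vae") -> dict:
    """Return the config fragment for a checkpoint, keyed off its file name."""
    path = f"{model_dir}/{checkpoint}"
    if checkpoint.startswith("lightvae"):
        # LightVAE series: pruned Causal Conv3D model.
        return {"use_lightvae": True, "vae_path": path}
    if checkpoint.startswith("lighttae"):
        # LightTAE series: the README sets need_scaled for these checkpoints.
        return {"use_tae": True, "need_scaled": True, "tae_path": path}
    if checkpoint.startswith("tae"):
        # Open source TAE series.
        return {"use_tae": True, "tae_path": path}
    # Official VAE: only the path is given.
    return {"vae_path": path}


print(json.dumps(vae_config("lighttaew2_1.pth"), indent=4))
```

The returned dictionary can be merged into an existing configuration with `config.update(vae_config(...))` before the file is written back to disk.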

### Use in ComfyUI

Please refer to the [ComfyUI-LightVAE](https://github.com/ModelTC/ComfyUI-LightVAE) repository.

## ⚠️ Important Notes

### 1. Compatibility
- Wan2.1 series VAE only works with Wan2.1 backbone models
- Wan2.2 series VAE only works with Wan2.2 backbone models
- Do not mix different versions of VAE and backbone models

## 📚 Related Resources

### Documentation Links
- **LightX2V Quick Start**: [Quick Start Documentation](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/getting_started/quickstart.html)
- **Model Structure Description**: [Model Structure Documentation](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/getting_started/model_structure.html)
- **taeHV Project**: [GitHub - madebyollin/taeHV](https://github.com/madebyollin/taeHV)

### Related Models
- **Wan2.1 Backbone Models**: [Wan-AI Model Collection](https://huggingface.co/Wan-AI)
- **Wan2.2 Backbone Models**: [Wan-AI/Wan2.2-TI2V-5B](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B)
- **LightX2V Optimized Models**: [lightx2v Model Collection](https://huggingface.co/lightx2v)

---

## 🤝 Community & Support

- **GitHub Issues**: https://github.com/ModelTC/LightX2V/issues
- **HuggingFace**: https://huggingface.co/lightx2v
- **LightX2V Homepage**: https://github.com/ModelTC/LightX2V

If you find this project helpful, please give us a ⭐ on [GitHub](https://github.com/ModelTC/LightX2V)