lightx2v committed
Commit 2a9908f · verified
1 Parent(s): 285836c

Create README.md

Files changed (1)
  1. README.md +309 -0
README.md ADDED
@@ -0,0 +1,309 @@
+ ---
+ license: apache-2.0
+ tags:
+ - diffusion-single-file
+ - comfyui
+ - distillation
+ - LoRA
+ - video
+ - video generation
+ base_model:
+ - Wan-AI/Wan2.2-I2V-A14B
+ - Wan-AI/Wan2.2-TI2V-5B
+ - Wan-AI/Wan2.1-I2V-14B-720P
+ pipeline_tags:
+ - image-to-video
+ - text-to-video
+ library_name: diffusers
+ ---
+ # 🎨 LightVAE
+
+ ## ⚡ Efficient Video Autoencoder (VAE) Model Collection
+
+ *From Official Models to LightX2V's Distilled and Optimized Versions - Balancing Quality, Speed, and Memory*
+
+ ---
+
+ [![🤗 HuggingFace](https://img.shields.io/badge/🤗-HuggingFace-yellow)](https://huggingface.co/lightx2v)
+ [![GitHub](https://img.shields.io/badge/GitHub-LightX2V-blue?logo=github)](https://github.com/ModelTC/LightX2V)
+ [![License](https://img.shields.io/badge/License-Apache%202.0-green.svg)](LICENSE)
+
+ ---
+
+ The LightX2V team has optimized the video VAE in depth, producing two model series, **LightVAE** and **LightTAE**, that significantly reduce memory consumption and speed up inference while maintaining high quality.
+
+ ## 💡 Core Advantages
+
+ <table>
+ <tr>
+ <td width="50%">
+
+ ### 📊 Official VAE
+ **Features**: Highest Quality ⭐⭐⭐⭐⭐
+
+ - ✅ Best reconstruction accuracy
+ - ✅ Complete detail preservation
+ - ❌ Large memory usage (~8-12 GB)
+ - ❌ Slow inference speed
+
+ </td>
+ <td width="50%">
+
+ ### 🚀 Open Source TAE Series
+ **Features**: Fastest Speed ⚡⚡⚡⚡⚡
+
+ - ✅ Minimal memory usage (~0.4 GB)
+ - ✅ Extremely fast inference
+ - ❌ Average quality ⭐⭐⭐
+ - ❌ Potential detail loss
+
+ </td>
+ </tr>
+ <tr>
+ <td width="50%">
+
+ ### 🎯 **LightVAE Series** (Our Optimization)
+ **Features**: Best Balanced Solution ⚖️
+
+ - ✅ Uses **Causal 3D Conv** (same as official)
+ - ✅ **High accuracy ceiling** ⭐⭐⭐⭐⭐
+ - ✅ Memory reduced by **~50%** (~4-5 GB)
+ - ✅ Speed increased by **2-3x**
+ - ✅ Balances quality, speed, and memory 🏆
+
+ </td>
+ <td width="50%">
+
+ ### ⚡ **LightTAE Series** (Our Optimization)
+ **Features**: Fast Speed + Good Quality 🏆
+
+ - ✅ Minimal memory usage (~0.4 GB)
+ - ✅ Extremely fast inference
+ - ✅ **Quality close to official** ⭐⭐⭐⭐
+ - ✅ **Significantly surpasses open source TAE**
+
+ </td>
+ </tr>
+ </table>
+
+ ---
+
+ ## 📦 Available Models
+
+ ### 🎯 Wan2.1 Series VAE
+
+ | Model Name | Type | Architecture | Description |
+ |:--------|:-----|:-----|:-----|
+ | `Wan2.1_VAE` | Official VAE | Causal Conv3D | Wan2.1 official video VAE model<br>**Highest quality, large memory, slow speed** |
+ | `taew2_1` | Open Source Small AE | Conv2D | Open source model based on [taeHV](https://github.com/madebyollin/taeHV)<br>**Small memory, fast speed, average quality** |
+ | **`lighttaew2_1`** | **LightTAE Series** | Conv2D | **Our distilled, optimized version of `taew2_1`**<br>**Small memory, fast speed, quality close to official** ✨ |
+ | **`lightvaew2_1`** | **LightVAE Series** | Causal Conv3D | **Our version: the WanVAE2.1 architecture pruned by 75%, then trained and distilled**<br>**Best balance: high quality + low memory + fast speed** 🏆 |
+
+ ### 🎯 Wan2.2 Series VAE
+
+ | Model Name | Type | Architecture | Description |
+ |:--------|:-----|:-----|:-----|
+ | `Wan2.2_VAE` | Official VAE | Causal Conv3D | Wan2.2 official video VAE model<br>**Highest quality, large memory, slow speed** |
+ | `taew2_2` | Open Source Small AE | Conv2D | Open source model based on [taeHV](https://github.com/madebyollin/taeHV)<br>**Small memory, fast speed, average quality** |
+ | **`lighttaew2_2`** | **LightTAE Series** | Conv2D | **Our distilled, optimized version of `taew2_2`**<br>**Small memory, fast speed, quality close to official** ✨ |
+
+ ---
+
+ ## 📊 Wan2.1 Series Performance Comparison
+
+ - **Precision**: BF16
+ - **Test Hardware**: NVIDIA H100
+
+ ### Video Reconstruction (5 s, 81-frame video)
+
+ | Time | Wan2.1_VAE | taew2_1 | lighttaew2_1 | lightvaew2_1 |
+ |:-----|:--------------|:------------|:---------------------|:-------------|
+ | **Encode Time** | 4.1721 s | 0.3956 s | 0.3956 s | 1.5014 s |
+ | **Decode Time** | 5.4649 s | 0.2463 s | 0.2463 s | 2.0697 s |
+
+ | GPU Memory | Wan2.1_VAE | taew2_1 | lighttaew2_1 | lightvaew2_1 |
+ |:-----|:--------------|:------------|:---------------------|:-------------|
+ | **Encode Memory** | 8.4954 GB | 0.00858 GB | 0.00858 GB | 4.7631 GB |
+ | **Decode Memory** | 10.1287 GB | 0.41199 GB | 0.41199 GB | 5.5673 GB |
+
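As a sanity check on the headline figures, the ratios can be recomputed from the two Wan2.1 reconstruction tables. The sketch below is illustrative only: the dictionary layout and the helper names `speedup` and `memory_saving` are made up for this example and are not part of LightX2V; the numbers are copied from the tables.

```python
# Encode/decode times (seconds) and peak GPU memory (GB), copied from the
# Wan2.1 reconstruction tables (BF16, NVIDIA H100, 81-frame video).
BASELINE = "Wan2.1_VAE"
MEASUREMENTS = {
    "Wan2.1_VAE":   {"encode_s": 4.1721, "decode_s": 5.4649, "encode_gb": 8.4954,  "decode_gb": 10.1287},
    "taew2_1":      {"encode_s": 0.3956, "decode_s": 0.2463, "encode_gb": 0.00858, "decode_gb": 0.41199},
    "lighttaew2_1": {"encode_s": 0.3956, "decode_s": 0.2463, "encode_gb": 0.00858, "decode_gb": 0.41199},
    "lightvaew2_1": {"encode_s": 1.5014, "decode_s": 2.0697, "encode_gb": 4.7631,  "decode_gb": 5.5673},
}


def speedup(model, key):
    """How many times faster `model` is than the official VAE for a timing key."""
    return MEASUREMENTS[BASELINE][key] / MEASUREMENTS[model][key]


def memory_saving(model, key):
    """Fraction of the official VAE's peak memory that `model` saves (0.0 to 1.0)."""
    return 1.0 - MEASUREMENTS[model][key] / MEASUREMENTS[BASELINE][key]


for name in MEASUREMENTS:
    if name == BASELINE:
        continue
    print(
        f"{name}: encode {speedup(name, 'encode_s'):.2f}x, "
        f"decode {speedup(name, 'decode_s'):.2f}x, "
        f"encode memory -{memory_saving(name, 'encode_gb'):.0%}, "
        f"decode memory -{memory_saving(name, 'decode_gb'):.0%}"
    )
```

For `lightvaew2_1` this works out to roughly 2.6-2.8x faster encoding and decoding and about 44-45% less peak memory than `Wan2.1_VAE`, in line with the "2-3x" and "~50%" figures quoted in this README.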
+ ### Video Generation
+
+ - **Task**: s2v (speech to video)
+ - **Model**: seko-talk
+
+ | Wan2.1_VAE | taew2_1 | lighttaew2_1 | lightvaew2_1 |
+ |:--------------|:------------|:---------------------|:-------------|
+ | https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/6l-P-3Hr9JKL3xgUyJXWJ.mp4| https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/rcVHrCKB4nRAs2VSjJd2d.mp4|https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/Wq9p9Z7NDYwaKw4SqVbYT.mp4| https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/NpKOzFcvsHzSFfFACzUKP.mp4|
+
+ ## 📊 Wan2.2 Series Performance Comparison
+
+ - **Precision**: BF16
+ - **Test Hardware**: NVIDIA H100
+
+ ### Video Reconstruction
+
+ | Time | Wan2.2_VAE | taew2_2 | lighttaew2_2 |
+ |:-----|:--------------|:------------|:---------------------|
+ | **Encode Time** | 1.1369 s | 0.3499 s | 0.3499 s |
+ | **Decode Time** | 3.1268 s | 0.0891 s | 0.0891 s |
+
+ | GPU Memory | Wan2.2_VAE | taew2_2 | lighttaew2_2 |
+ |:-----|:--------------|:------------|:---------------------|
+ | **Encode Memory** | 6.1991 GB | 0.0064 GB | 0.0064 GB |
+ | **Decode Memory** | 12.3487 GB | 0.4120 GB | 0.4120 GB |
+
+ ### Video Generation
+
+ - **Task**: t2v (text to video)
+ - **Model**: [Wan-AI/Wan2.1-T2V-A14B](https://huggingface.co/Wan-AI/Wan2.1-T2V-A14B)
+
+ | Wan2.2_VAE | taew2_2 | lighttaew2_2 |
+ |:--------------|:------------|:---------------------|
+ | https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/KUY7Ifz9gFJqDjWga6A53.mp4| https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/OYA8VfNlCv_hBkj_n_OMl.mp4| https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/gaHRr6uuAF0NlH4YlMbHO.mp4|
+
+ ## 🎯 Model Selection Recommendations
+
+ ### Selection by Use Case
+
+ <table>
+ <tr>
+ <td width="33%">
+
+ #### 🏆 Pursuing Best Quality
+ **Recommended**: `Wan2.1_VAE` / `Wan2.2_VAE`
+
+ - ✅ Official model; sets the quality ceiling
+ - ✅ Highest reconstruction accuracy
+ - ✅ Suitable for final product output
+ - ⚠️ **Large memory usage** (~8-12 GB)
+ - ⚠️ **Slow inference speed**
+
+ </td>
+ <td width="33%">
+
+ #### ⚖️ **Best Balance** 🏆
+ **Recommended**: **`lightvaew2_1`**
+
+ - ✅ **Uses Causal 3D Conv** (same as official)
+ - ✅ **Excellent quality**, close to official
+ - ✅ Memory reduced by **~50%** (~4-5 GB)
+ - ✅ Speed increased by **2-3x**
+ - ✅ **High accuracy ceiling**
+
+ **Use Cases**: Daily production, strongly recommended ⭐
+
+ </td>
+ <td width="33%">
+
+ #### ⚡ **Speed + Quality Balance** ✨
+ **Recommended**: **`lighttaew2_1`** / **`lighttaew2_2`**
+
+ - ✅ Extremely low memory usage (~0.4 GB)
+ - ✅ Extremely fast inference
+ - ✅ **Quality significantly surpasses open source TAE**
+ - ✅ **Close to official quality** ⭐⭐⭐⭐
+
+ **Use Cases**: Development testing, rapid iteration
+
+ </td>
+ </tr>
+ </table>
+
+ ### 🔥 Our Optimization Results Comparison
+
+ | Comparison | Open Source TAE | **LightTAE (Ours)** | Official VAE | **LightVAE (Ours)** |
+ |:------|:--------|:---------------------|:---------|:---------------------|
+ | **Architecture** | Conv2D | Conv2D | Causal Conv3D | Causal Conv3D |
+ | **Memory Usage** | Minimal (~0.4 GB) | Minimal (~0.4 GB) | Large (~8-12 GB) | Medium (~4-5 GB) |
+ | **Inference Speed** | Extremely Fast ⚡⚡⚡⚡⚡ | Extremely Fast ⚡⚡⚡⚡⚡ | Slow ⚡⚡ | Fast ⚡⚡⚡⚡ |
+ | **Generation Quality** | Average ⭐⭐⭐ | **Close to Official** ⭐⭐⭐⭐ | Highest ⭐⭐⭐⭐⭐ | Excellent ⭐⭐⭐⭐⭐ |
+ | **Accuracy Ceiling** | Medium | Medium | Highest | **High** |
+
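The selection advice can also be read as a simple rule: take the highest-quality Wan2.1 autoencoder whose peak memory fits the GPU budget. The following is a rough sketch of that rule, not a LightX2V feature. The function name `pick_vae` is hypothetical, the quality ordering follows the star ratings in this README, and the memory figures are the decode-memory measurements from the Wan2.1 table (BF16, NVIDIA H100, 81-frame video), so other resolutions or clip lengths will likely need different thresholds.

```python
# (model name, peak decode memory in GB), ordered from highest to lowest
# quality as ranked in this README. Decode memory is used because it is the
# larger of the encode and decode figures for every model in the table.
CANDIDATES = [
    ("Wan2.1_VAE", 10.1287),
    ("lightvaew2_1", 5.5673),
    ("lighttaew2_1", 0.41199),
]


def pick_vae(budget_gb):
    """Return the best-quality model whose measured peak memory fits the budget.

    Returns None when even the smallest model exceeds the budget.
    """
    for name, peak_gb in CANDIDATES:
        if peak_gb <= budget_gb:
            return name
    return None


for budget in (24, 8, 1):
    print(f"{budget} GB available for the VAE -> {pick_vae(budget)}")
```

The budget here means the memory left for the autoencoder after the backbone model and activations are accounted for, which this sketch does not estimate.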
+ ## 🚀 Usage
+
+ ### Download VAE Models
+
+ ```bash
+ # Download all autoencoder weights from the repository
+ huggingface-cli download lightx2v/Autoencoders-Lightx2v \
+     --local-dir ./models/vae/
+ ```
+
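When only one or two of the autoencoders are needed, the same repository can be fetched selectively from Python. This is a minimal sketch using `snapshot_download` from the `huggingface_hub` package with its `allow_patterns` filter. It assumes the weight files sit at the repository root and are named after the models (for example `lightvaew2_1.pth`), which is inferred from the config paths in this README rather than checked against the repository; `patterns_for` and `download_vaes` are hypothetical helper names.

```python
REPO_ID = "lightx2v/Autoencoders-Lightx2v"  # repository used by the CLI command


def patterns_for(model_names):
    """Glob patterns matching the weight files of the requested models.

    Assumes each file is named "<model name>.<extension>" at the repo root.
    """
    return [f"{name}.*" for name in model_names]


def download_vaes(model_names, local_dir="./models/vae/"):
    """Download only the listed models into local_dir and return the local path."""
    # Imported lazily so the helpers above work without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=patterns_for(model_names),
        local_dir=local_dir,
    )


# Example (needs network access and `pip install huggingface_hub`):
# download_vaes(["lightvaew2_1", "lighttaew2_1"])
```

If the files turn out to be named differently, adjusting `patterns_for` is enough; the rest of the call stays the same.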
+ ### Use in LightX2V
+
+ Specify the VAE path in the configuration file:
+
+ **Using Official VAE Series:**
+ ```json
+ {
+     "vae_pth": "./models/vae/Wan2.1_VAE.pth"
+ }
+ ```
+
+ **Using LightVAE Series:**
+ ```json
+ {
+     "use_lightvae": true,
+     "vae_pth": "./models/vae/lightvaew2_1.pth"
+ }
+ ```
+
+ **Using LightTAE Series:**
+ ```json
+ {
+     "use_tiny_vae": true,
+     "need_scaled": true,
+     "tiny_vae_path": "./models/vae/lighttaew2_1.pth"
+ }
+ ```
+
+ **Using TAE Series:**
+ ```json
+ {
+     "use_tiny_vae": true,
+     "tiny_vae_path": "./models/vae/taew2_1.pth"
+ }
+ ```
+
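The four fragments above differ only in which keys they set, so they can be generated from the model name. The helper below is a sketch under stated assumptions: `vae_config` is a hypothetical function (not a LightX2V API), the keys and values are exactly those shown in the JSON examples, and the placeholder `example_existing_key` stands in for whatever else a real configuration file contains.

```python
import json


def vae_config(model_name, vae_dir="./models/vae"):
    """Return the VAE-related config keys for one of the models in this README."""
    path = f"{vae_dir}/{model_name}.pth"
    if model_name.startswith("lightvae"):
        # LightVAE series: same path key as the official VAE plus a switch.
        return {"use_lightvae": True, "vae_pth": path}
    if model_name.startswith("lighttae"):
        # LightTAE series: tiny-VAE path plus the extra need_scaled flag.
        return {"use_tiny_vae": True, "need_scaled": True, "tiny_vae_path": path}
    if model_name.startswith("tae"):
        # Open source TAE series: tiny-VAE path only.
        return {"use_tiny_vae": True, "tiny_vae_path": path}
    # Anything else is treated as an official VAE checkpoint.
    return {"vae_pth": path}


# Merge the fragment into an existing configuration (placeholder content).
config = {"example_existing_key": "unchanged"}
config.update(vae_config("lighttaew2_1"))
print(json.dumps(config, indent=4))
```

Note that the only difference between the LightTAE and TAE fragments is `need_scaled`, so switching between `lighttaew2_1` and `taew2_1` by editing the path alone would leave that flag in the wrong state.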
+ Then run the inference script:
+
+ ```bash
+ cd LightX2V/scripts
+ bash wan/run_wan_i2v.sh  # or other inference scripts
+ ```
+
+ ## ⚠️ Important Notes
+
+ ### 1. Compatibility
+ - Wan2.1 series VAE only works with Wan2.1 backbone models
+ - Wan2.2 series VAE only works with Wan2.2 backbone models
+ - Do not mix different versions of VAE and backbone models
+
+ ## 📚 Related Resources
+
+ ### Documentation Links
+ - **LightX2V Quick Start**: [Quick Start Documentation](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/getting_started/quickstart.html)
+ - **Model Structure Description**: [Model Structure Documentation](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/getting_started/model_structure.html)
+ - **taeHV Project**: [GitHub - madebyollin/taeHV](https://github.com/madebyollin/taeHV)
+
+ ### Related Models
+ - **Wan2.1 Backbone Models**: [Wan-AI Model Collection](https://huggingface.co/Wan-AI)
+ - **Wan2.2 Backbone Models**: [Wan-AI/Wan2.2-TI2V-5B](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B)
+ - **LightX2V Optimized Models**: [lightx2v Model Collection](https://huggingface.co/lightx2v)
+
+ ---
+
+ ## 🤝 Community & Support
+
+ - **GitHub Issues**: https://github.com/ModelTC/LightX2V/issues
+ - **HuggingFace**: https://huggingface.co/lightx2v
+ - **LightX2V Homepage**: https://github.com/ModelTC/LightX2V
+
+ If you find this project helpful, please give us a ⭐ on [GitHub](https://github.com/ModelTC/LightX2V).