wangkanai committed · Commit 3045173 · verified · 1 Parent(s): e592b83

Add files using upload-large-folder tool

Files changed (2):
  1. README.md +286 -0
  2. vae/wan/wan21-vae.safetensors +3 -0
README.md ADDED

---
license: other
license_name: wan-license
library_name: diffusers
pipeline_tag: text-to-video
tags:
- video-generation
- vae
- wan
- video-compression
- 3d-causal-vae
- temporal-causality
base_model: Wan-AI/Wan2.1-T2V-1.3B
base_model_relation: adapter
---
<!-- README Version: v1.0 (kept below the YAML front matter so Hugging Face can parse the metadata) -->

# WAN2.1 VAE - 3D Causal Video Variational Autoencoder

WAN2.1 VAE is a 3D causal variational autoencoder designed for high-quality video generation and compression. This repository contains the standalone VAE component used in the WAN ("Open and Advanced Large-Scale Video Generative Models") framework.

## Model Description

The WAN2.1 VAE combines several advances in video compression and reconstruction:

- **3D Causal Architecture**: Maintains temporal causality across video sequences
- **Unlimited Length Support**: Can encode and decode unlimited-length 1080P videos without losing historical temporal information
- **High Compression Efficiency**: Advanced spatio-temporal compression with minimal quality loss
- **Memory Optimized**: Reduced memory footprint compared to traditional video VAEs
- **Temporal Information Preservation**: Ensures consistent temporal dynamics across long sequences

### Key Innovations

1. **Improved Spatio-Temporal Compression**: Enhanced compression ratios while maintaining visual fidelity (the shape arithmetic is sketched below)
2. **Causal Temporal Processing**: Ensures frame-to-frame causality for coherent video generation
3. **Efficient Memory Usage**: Optimized for consumer-grade GPU deployment
4. **High-Resolution Support**: Native support for 1080P video encoding/decoding
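As a concrete illustration of items 1 and 2, here is a minimal sketch of the latent-shape arithmetic, assuming the 4×8×8 compression with 16 latent channels listed under Model Specifications; `latent_shape` is an illustrative helper, not a library function:

```python
def latent_shape(frames: int, height: int, width: int,
                 t_stride: int = 4, s_stride: int = 8, z_channels: int = 16):
    """Latent grid of a causal video VAE: the first frame is encoded on its
    own, then each group of t_stride frames adds one more latent frame."""
    assert (frames - 1) % t_stride == 0, "frame count must be t_stride * k + 1"
    return (z_channels,
            (frames - 1) // t_stride + 1,
            height // s_stride,
            width // s_stride)

print(latent_shape(17, 480, 720))    # (16, 5, 60, 90)
print(latent_shape(81, 720, 1280))   # (16, 21, 90, 160)
```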

## Repository Contents

```
wan21-vae/
└── vae/
    └── wan/
        └── wan21-vae.safetensors (243 MB)
```

### Model Files

| File | Size | Format | Description |
|------|------|--------|-------------|
| `wan21-vae.safetensors` | 243 MB | SafeTensors | WAN2.1 VAE weights |

**Total Repository Size**: 243 MB

## Hardware Requirements

### Minimum Requirements
- **VRAM**: 4 GB (inference only)
- **RAM**: 8 GB system memory
- **Disk Space**: 500 MB (including dependencies)
- **GPU**: CUDA-compatible GPU (NVIDIA GTX 1060 or equivalent)

### Recommended Requirements
- **VRAM**: 8+ GB for optimal performance
- **RAM**: 16 GB system memory
- **Disk Space**: 1 GB
- **GPU**: NVIDIA RTX 3060 or better

### Resolution-Specific Requirements
- **480P Video**: 4-6 GB VRAM
- **720P Video**: 6-8 GB VRAM
- **1080P Video**: 8-12 GB VRAM
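A quick way to place a machine in one of these tiers (a rough heuristic based on the table above, not an official requirement check):

```python
import torch

# Report the GPU and suggest a resolution ceiling from the VRAM tiers above
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / 2**30
    tier = "1080P" if vram_gb >= 8 else "720P" if vram_gb >= 6 else "480P"
    print(f"{props.name}: {vram_gb:.1f} GB VRAM -> suggested ceiling: {tier}")
else:
    print("No CUDA GPU detected; a CUDA-compatible GPU is required.")
```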

## Usage Examples

### Basic VAE Loading

```python
import torch
from diffusers import AutoencoderKLWan  # the plain 2D AutoencoderKL cannot load these 3D weights

# Load the WAN2.1 VAE together with its config from the diffusers-format
# Wan repository (the standalone .safetensors file in this repo ships
# without a config.json, so from_pretrained cannot point at it directly).
vae = AutoencoderKLWan.from_pretrained(
    "Wan-AI/Wan2.1-T2V-1.3B-Diffusers",
    subfolder="vae",
    torch_dtype=torch.float32,  # official diffusers examples keep this VAE in fp32
).to("cuda")

print(f"VAE loaded: {vae.config}")
```
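If your diffusers build exposes single-file loading for `AutoencoderKLWan` (treat this as an assumption and verify against your installed version), the standalone weights in this repository can be loaded directly:

```python
import torch
from diffusers import AutoencoderKLWan

# Assumption: AutoencoderKLWan supports from_single_file in your diffusers
# version; if this raises, fall back to the subfolder loading shown above.
vae = AutoencoderKLWan.from_single_file(
    "vae/wan/wan21-vae.safetensors",
    torch_dtype=torch.float32,
)
```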

### Video Encoding Example

```python
import torch
import numpy as np
from diffusers import AutoencoderKLWan

# Load VAE (see "Basic VAE Loading" above)
vae = AutoencoderKLWan.from_pretrained(
    "Wan-AI/Wan2.1-T2V-1.3B-Diffusers",
    subfolder="vae",
    torch_dtype=torch.float32,
).to("cuda")

# Prepare video frames (dummy data here; real frames should be scaled to [-1, 1]).
# Shape: [batch, channels, frames, height, width]. The causal temporal
# compression expects frame counts of the form 4k + 1, hence 17 rather than 16.
video_frames = torch.randn(1, 3, 17, 480, 720, device="cuda")

# Encode video to latent space
with torch.no_grad():
    latents = vae.encode(video_frames).latent_dist.sample()

print(f"Latent shape: {latents.shape}")  # expected: [1, 16, 5, 60, 90]
print(f"Compression ratio: {np.prod(video_frames.shape) / np.prod(latents.shape):.2f}x")
```
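Real footage usually arrives as uint8 frames. A minimal helper for the conventional diffusers scaling (pixel values mapped to [-1, 1]) might look like this; `frames_to_tensor` is our illustrative name, not a library function:

```python
import numpy as np
import torch

def frames_to_tensor(frames: list) -> torch.Tensor:
    """Stack H x W x 3 uint8 frames into a [1, 3, T, H, W] tensor in [-1, 1]."""
    video = np.stack(frames).astype(np.float32) / 127.5 - 1.0  # [T, H, W, 3]
    return torch.from_numpy(video).permute(3, 0, 1, 2).unsqueeze(0)
```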

### Video Decoding Example

```python
import torch
from diffusers import AutoencoderKLWan

# Load VAE (see "Basic VAE Loading" above)
vae = AutoencoderKLWan.from_pretrained(
    "Wan-AI/Wan2.1-T2V-1.3B-Diffusers",
    subfolder="vae",
    torch_dtype=torch.float32,
).to("cuda")

# Decode latents back to video frames
# (assuming `latents` comes from the encoding step above)
with torch.no_grad():
    reconstructed_video = vae.decode(latents).sample

print(f"Reconstructed video shape: {reconstructed_video.shape}")
```
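A minimal round-trip quality check, reusing `vae` and a `video_frames` tensor scaled to [-1, 1] from the examples above; the `psnr` helper is ours:

```python
import torch

def psnr(original: torch.Tensor, reconstructed: torch.Tensor) -> float:
    """PSNR in dB for signals in [-1, 1] (peak-to-peak range 2, so MAX^2 = 4)."""
    mse = torch.mean((original - reconstructed) ** 2)
    return (10 * torch.log10(4.0 / mse)).item()

with torch.no_grad():
    latents = vae.encode(video_frames).latent_dist.sample()
    recon = vae.decode(latents).sample

print(f"Round-trip PSNR: {psnr(video_frames, recon):.2f} dB")
```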

### Integration with WAN Models

```python
import torch
from diffusers import AutoencoderKLWan, WanPipeline

# Load custom VAE
vae = AutoencoderKLWan.from_pretrained(
    "Wan-AI/Wan2.1-T2V-1.3B-Diffusers",
    subfolder="vae",
    torch_dtype=torch.float32,
)

# Load the WAN text-to-video pipeline with the custom VAE
# (the diffusers-format checkpoint, not the original Wan-AI/Wan2.1-T2V-1.3B)
pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-1.3B-Diffusers",
    vae=vae,
    torch_dtype=torch.bfloat16,
).to("cuda")

# Generate video (frame count of the form 4k + 1)
prompt = "A serene beach at sunset with waves crashing"
video = pipe(prompt, num_frames=17, height=480, width=720).frames[0]

print(f"Generated video: {len(video)} frames")
```
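To write the result to disk, diffusers ships an `export_to_video` utility:

```python
from diffusers.utils import export_to_video

# Save the generated frames as an .mp4 (the fps value is a free choice here)
export_to_video(video, "beach_sunset.mp4", fps=16)
```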

## Model Specifications

### Architecture Details
- **Type**: 3D Causal Variational Autoencoder
- **Architecture**: Causal spatio-temporal convolutions
- **Compression**: 4× temporal and 8× spatial downsampling (4×8×8 overall)
- **Causality**: Temporal causal processing for frame consistency
- **Latent Dimensions**: 16 latent channels; a T-frame clip maps to (T - 1)/4 + 1 latent frames at 1/8 spatial resolution

### Technical Specifications
- **Precision**: FP32 for the VAE in official diffusers examples; FP16/BF16 supported where memory is tight
- **Format**: SafeTensors (secure, efficient loading)
- **Framework**: PyTorch >= 2.4.0
- **Library**: Diffusers (Hugging Face)
- **Temporal Support**: Unlimited frame sequences
- **Resolution Support**: Up to 1080P native

### Supported Operations
- Video encoding (frames → latents)
- Video decoding (latents → frames)
- Temporal compression
- Spatial compression
- Causal frame generation

## Performance Tips and Optimization

### Memory Optimization

Note: CPU offloading and attention slicing are pipeline-level controls in diffusers, not methods on the VAE itself. At the VAE level, batch slicing and spatial tiling (where your diffusers version supports them for this class) are the main levers:

```python
# Encode/decode the batch one clip at a time
vae.enable_slicing()

# Tile large frames spatially to cap peak VRAM
vae.enable_tiling()

# Offloading applies to a full pipeline, not the bare VAE:
pipe.enable_model_cpu_offload()
```
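To see what a given setting actually buys you, a quick peak-memory probe (reusing `vae` and `video_frames` from the encoding example) is:

```python
import torch

# Measure peak VRAM for one encode; numbers vary by GPU, resolution,
# and diffusers version, so treat the hardware table above as a guide.
torch.cuda.reset_peak_memory_stats()
with torch.no_grad():
    _ = vae.encode(video_frames).latent_dist.sample()
print(f"Peak VRAM: {torch.cuda.max_memory_allocated() / 2**30:.2f} GiB")
```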
201
+
202
+ ### Speed Optimization
203
+ ```python
204
+ # Compile model for faster inference (PyTorch 2.0+)
205
+ vae = torch.compile(vae, mode="reduce-overhead")
206
+
207
+ # Use xFormers for efficient attention
208
+ vae.enable_xformers_memory_efficient_attention()
209
+
210
+ # Use half precision for faster inference
211
+ vae = vae.half()
212
+ ```
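When comparing these options, remember that CUDA kernel launches are asynchronous; a fair wall-clock measurement synchronizes around the region being timed (reusing `vae` and `video_frames` from above):

```python
import time
import torch

torch.cuda.synchronize()  # flush pending kernels before starting the clock
start = time.perf_counter()
with torch.no_grad():
    latents = vae.encode(video_frames).latent_dist.sample()
torch.cuda.synchronize()  # wait for the encode to actually finish
print(f"Encode time: {time.perf_counter() - start:.3f} s")
```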
213
+
214
+ ### Batch Processing
215
+ ```python
216
+ # Process multiple video clips efficiently
217
+ batch_size = 4
218
+ video_clips = torch.randn(batch_size, 3, 16, 480, 720).half().to("cuda")
219
+
220
+ with torch.no_grad():
221
+ latents = vae.encode(video_clips).latent_dist.sample()
222
+ ```
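If a full batch does not fit in VRAM, a simple micro-batching loop keeps memory bounded (the chunk size of 2 is an arbitrary example):

```python
import torch

chunks = []
with torch.no_grad():
    for chunk in video_clips.split(2, dim=0):  # encode two clips at a time
        chunks.append(vae.encode(chunk).latent_dist.sample())
latents = torch.cat(chunks, dim=0)  # [batch_size, 16, 5, 60, 90]
```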
223
+
224
+ ### Resolution Guidelines
225
+ - **480P (854×480)**: Best for real-time applications, lowest VRAM
226
+ - **720P (1280×720)**: Balanced quality and performance
227
+ - **1080P (1920×1080)**: Maximum quality, requires high-end GPU
228
+
229
+ ## License
230
+
231
+ This model is released under a custom WAN license. Please refer to the official WAN repository for detailed licensing terms and usage restrictions.
232
+
233
+ **License Type**: Other (Custom WAN License)
234
+
235
+ ### Usage Restrictions
236
+ - Check official WAN-AI repository for commercial usage terms
237
+ - Attribution required for research and non-commercial use
238
+ - Refer to [WAN-AI Organization](https://huggingface.co/Wan-AI) for updates

## Citation

If you use this VAE in your research or applications, please cite the WAN project:

```bibtex
@misc{wan2025,
  title={WAN: Open and Advanced Large-Scale Video Generative Models},
  author={WAN-AI Team},
  year={2025},
  publisher={Hugging Face},
  howpublished={https://huggingface.co/Wan-AI}
}
```

## Related Resources

### Official Links
- **WAN Organization**: https://huggingface.co/Wan-AI
- **WAN2.1 T2V 1.3B Model**: https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B
- **WAN2.1 T2V 14B Model**: https://huggingface.co/Wan-AI/Wan2.1-T2V-14B
- **WAN2.2 Models**: https://huggingface.co/Wan-AI (Latest versions)
- **GitHub Repository**: https://github.com/Wan-Video

### Related Models
- **WAN2.2 VAE**: Latest VAE with 64x compression (4×16×16)
- **WAN2.1 T2V**: Text-to-video generation models
- **WAN2.1 I2V**: Image-to-video generation models
- **WAN2.2 Animate**: Character animation models

### Community & Support
- Hugging Face WAN-AI discussions
- GitHub issues and community forums
- Research papers and technical documentation

## Model Card Contact

For questions, issues, or collaboration inquiries:
- Visit the [WAN-AI Hugging Face Organization](https://huggingface.co/Wan-AI)
- Check the [official GitHub repository](https://github.com/Wan-Video)
- Review model-specific documentation on individual model cards

---

**Version**: v1.0
**Last Updated**: 2025-10-13
**Model Size**: 243 MB
**Format**: SafeTensors
vae/wan/wan21-vae.safetensors ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:2fc39d31359a4b0a64f55876d8ff7fa8d780956ae2cb13463b0223e15148976b
size 253815318