Add files using upload-large-folder tool

Files changed (2)

README.md +286 -0
vae/wan/wan21-vae.safetensors +3 -0

README.md ADDED

	@@ -0,0 +1,286 @@

+---
+license: other
+license_name: wan-license
+library_name: diffusers
+pipeline_tag: text-to-video
+tags:
+  - video-generation
+  - vae
+  - wan
+  - video-compression
+  - 3d-causal-vae
+  - temporal-causality
+base_model: Wan-AI/Wan2.1-T2V-1.3B
+base_model_relation: adapter
+---
+<!-- README Version: v1.0 -->
+# WAN2.1 VAE - 3D Causal Video Variational Autoencoder
+WAN2.1 VAE is a 3D causal Variational Autoencoder designed for video generation and compression. This repository contains the standalone VAE component used by the WAN (Open and Advanced Large-Scale Video Generative Models) framework.
+## Model Description
+The WAN2.1 VAE compresses video into a compact latent space and reconstructs it from that space. Its main features are:
+- **3D Causal Architecture**: Maintains temporal causality across video sequences
+- **Unlimited Length Support**: Can encode and decode unlimited-length 1080P videos without losing historical temporal information
+- **High Compression Efficiency**: Advanced spatio-temporal compression with minimal quality loss
+- **Memory Optimized**: Reduced memory footprint compared to traditional video VAEs
+- **Temporal Information Preservation**: Ensures consistent temporal dynamics across long sequences
+### Key Innovations
+1. **Improved Spatio-Temporal Compression**: Enhanced compression ratios while maintaining visual fidelity
+2. **Causal Temporal Processing**: Ensures frame-to-frame causality for coherent video generation
+3. **Efficient Memory Usage**: Optimized for consumer-grade GPU deployment
+4. **High-Resolution Support**: Native support for 1080P video encoding/decoding
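To make "causal temporal processing" concrete, here is a minimal sketch of a causal convolution along the time axis. It is an illustration only, not the WAN implementation: the function name, the scalar per-frame features, and the kernel values are all assumptions chosen for readability. The point it shows is that output `t` depends only on frames `t` and earlier, which is achieved by padding on the left only.

```python
def causal_temporal_conv(frames, kernel):
    """Apply a 1D causal convolution along time.

    frames: list of per-frame scalar features (a stand-in for feature maps).
    kernel: weights; kernel[-1] multiplies the current frame, earlier
            entries multiply progressively older frames.
    """
    k = len(kernel)
    # Left-pad with zeros so output t only sees frames t-k+1 .. t.
    padded = [0.0] * (k - 1) + list(frames)
    return [
        sum(w * padded[t + i] for i, w in enumerate(kernel))
        for t in range(len(frames))
    ]


frames = [1.0, 2.0, 3.0, 4.0, 5.0]
kernel = [0.25, 0.25, 0.5]  # hypothetical weights

out = causal_temporal_conv(frames, kernel)

# Changing a later frame must not alter any earlier output.
edited = frames[:3] + [100.0, 5.0]
out_edited = causal_temporal_conv(edited, kernel)
print(out[:3] == out_edited[:3])  # True
```

Because nothing from the future leaks backwards, frames can be processed in order as they arrive, which is the property that long-video encoding relies on.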
+## Repository Contents
+```
+E:\huggingface\wan21-vae\
+└── vae/
+    └── wan/
+        └── wan21-vae.safetensors (254 MB)
+```
+### Model Files
+| File | Size | Format | Description |
+|------|------|--------|-------------|
+| `wan21-vae.safetensors` | 254 MB (242 MiB) | SafeTensors | WAN2.1 VAE weights |
+**Total Repository Size**: 254 MB (253,815,318 bytes)
+## Hardware Requirements
+### Minimum Requirements
+- **VRAM**: 4 GB (inference only)
+- **RAM**: 8 GB system memory
+- **Disk Space**: 500 MB (including dependencies)
+- **GPU**: CUDA-compatible GPU (NVIDIA GTX 1060 or equivalent)
+### Recommended Requirements
+- **VRAM**: 8+ GB for optimal performance
+- **RAM**: 16 GB system memory
+- **Disk Space**: 1 GB
+- **GPU**: NVIDIA RTX 3060 or better
+### Resolution-Specific Requirements
+- **480P Video**: 4-6 GB VRAM
+- **720P Video**: 6-8 GB VRAM
+- **1080P Video**: 8-12 GB VRAM
+## Usage Examples
+### Basic VAE Loading
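+```python
+import torch
+from diffusers import AutoencoderKLWan
+# Load the WAN2.1 VAE
+# The repository ships a single .safetensors file and no config.json,
+# so this uses from_single_file instead of from_pretrained.
+# Assumes a diffusers release that provides AutoencoderKLWan.
+vae = AutoencoderKLWan.from_single_file(
+    "E:/huggingface/wan21-vae/vae/wan/wan21-vae.safetensors",
+    torch_dtype=torch.float16
+).to("cuda")
+print(f"VAE loaded: {vae.config}")
+```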
+### Video Encoding Example
+```python
+import torch
+from diffusers import AutoencoderKLWan
+import numpy as np
+# Load VAE from the single-file checkpoint
+# (assumes a diffusers release that provides AutoencoderKLWan)
+vae = AutoencoderKLWan.from_single_file(
+    "E:/huggingface/wan21-vae/vae/wan/wan21-vae.safetensors",
+    torch_dtype=torch.float16
+).to("cuda")
+# Prepare video frames (example with dummy data)
+# Shape: [batch, channels, frames, height, width]
+# The frame count should be 4n + 1 (here 17) to match the 4x temporal compression
+video_frames = torch.randn(1, 3, 17, 480, 720).half().to("cuda")
+# Encode video to latent space
+with torch.no_grad():
+    latents = vae.encode(video_frames).latent_dist.sample()
+print(f"Latent shape: {latents.shape}")
+print(f"Compression ratio: {np.prod(video_frames.shape) / np.prod(latents.shape):.2f}x")
+```
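The latent shape printed by the encoding example can be worked out by hand. The sketch below assumes the WAN2.1 VAE downsamples 4x in time (with the first frame kept on its own) and 8x in each spatial dimension, and produces 16 latent channels. The helper name `latent_shape` and the constants are illustrative, not part of any library.

```python
from math import prod

# Assumed WAN2.1 VAE factors: 4x temporal, 8x spatial, 16 latent channels.
TEMPORAL_FACTOR = 4
SPATIAL_FACTOR = 8
LATENT_CHANNELS = 16


def latent_shape(batch, frames, height, width):
    """Expected latent shape for an input of [batch, 3, frames, height, width]."""
    if (frames - 1) % TEMPORAL_FACTOR != 0:
        raise ValueError("frames should be of the form 4n + 1")
    if height % SPATIAL_FACTOR or width % SPATIAL_FACTOR:
        raise ValueError("height and width should be multiples of 8")
    latent_frames = 1 + (frames - 1) // TEMPORAL_FACTOR
    return (batch, LATENT_CHANNELS, latent_frames,
            height // SPATIAL_FACTOR, width // SPATIAL_FACTOR)


video_shape = (1, 3, 17, 480, 720)
z_shape = latent_shape(1, 17, 480, 720)
ratio = prod(video_shape) / prod(z_shape)
print(z_shape, f"{ratio:.1f}x")  # (1, 16, 5, 60, 90) 40.8x
```

Note that the element-count ratio is lower than 4 × 8 × 8 = 256 because the channel count grows from 3 to 16.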
+### Video Decoding Example
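+```python
+import torch
+from diffusers import AutoencoderKLWan
+# Load VAE from the single-file checkpoint
+# (assumes a diffusers release that provides AutoencoderKLWan)
+vae = AutoencoderKLWan.from_single_file(
+    "E:/huggingface/wan21-vae/vae/wan/wan21-vae.safetensors",
+    torch_dtype=torch.float16
+).to("cuda")
+# Decode latents back to video frames
+# Replace this random stand-in with the latents from the encoding step
+# Shape: [batch, latent_channels, latent_frames, height / 8, width / 8]
+latents = torch.randn(1, 16, 5, 60, 90).half().to("cuda")
+with torch.no_grad():
+    reconstructed_video = vae.decode(latents).sample
+print(f"Reconstructed video shape: {reconstructed_video.shape}")
+```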
+### Integration with WAN Models
+```python
+import torch
+from diffusers import AutoencoderKLWan, WanPipeline
+# Load custom VAE (assumes a diffusers release that provides AutoencoderKLWan)
+vae = AutoencoderKLWan.from_single_file(
+    "E:/huggingface/wan21-vae/vae/wan/wan21-vae.safetensors",
+    torch_dtype=torch.float16
+)
+# Load WAN model with custom VAE
+# The "-Diffusers" repository holds the diffusers-format pipeline weights
+pipe = WanPipeline.from_pretrained(
+    "Wan-AI/Wan2.1-T2V-1.3B-Diffusers",
+    vae=vae,
+    torch_dtype=torch.float16
+).to("cuda")
+# Generate video (num_frames should be of the form 4n + 1)
+prompt = "A serene beach at sunset with waves crashing"
+video = pipe(prompt, num_frames=17, height=480, width=720).frames[0]
+print(f"Generated video: {len(video)} frames")
+```
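Frame counts passed to the pipeline or the VAE are easiest to handle when they have the form 4n + 1, assuming the 4x temporal compression keeps the first frame separately. The helper below is a small sketch under that assumption. The name `next_valid_num_frames` is hypothetical and it rounds up rather than dropping frames.

```python
def next_valid_num_frames(requested, temporal_factor=4):
    """Smallest count >= requested of the form temporal_factor * n + 1."""
    if requested < 1:
        raise ValueError("requested must be at least 1")
    # Ceiling division on (requested - 1) using integer arithmetic.
    n = -(-(requested - 1) // temporal_factor)
    return temporal_factor * n + 1


for requested in (16, 17, 24, 81):
    print(requested, "->", next_valid_num_frames(requested))
```

Rounding up means a clip may need padding (for example by repeating the last frame). Rounding down is an equally reasonable choice when trailing frames can be discarded.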
+## Model Specifications
+### Architecture Details
+- **Type**: 3D Causal Variational Autoencoder
+- **Architecture**: Causal spatio-temporal convolutions
+- **Compression**: 4× temporal and 8×8 spatial downsampling into 16 latent channels
+- **Causality**: Temporal causal processing for frame consistency
+- **Latent Dimensions**: Optimized for video generation tasks
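One way a causal VAE can handle long clips with bounded memory is to walk through time in small chunks, carrying state forward from one chunk to the next. The sketch below shows only the index bookkeeping for such a scheme. It assumes the first frame is processed alone and later frames in groups of four. The function name and chunk size are illustrative assumptions rather than a description of the released code.

```python
def temporal_chunks(num_frames, chunk=4):
    """Split frame indices into [first frame] + consecutive groups of `chunk`."""
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    chunks = [range(0, 1)]
    for start in range(1, num_frames, chunk):
        chunks.append(range(start, min(start + chunk, num_frames)))
    return chunks


plan = temporal_chunks(17)
print([list(c) for c in plan])
print(len(plan))  # 5 chunks, one per latent frame under the 4n + 1 assumption
```

With 17 input frames this yields five chunks, which lines up with the five latent frames expected from 4x temporal compression.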
+### Technical Specifications
+- **Precision**: FP16 (Half precision) recommended
+- **Format**: SafeTensors (secure, efficient loading)
+- **Framework**: PyTorch >= 2.4.0
+- **Library**: Diffusers (Hugging Face)
+- **Temporal Support**: Unlimited frame sequences
+- **Resolution Support**: Up to 1080P native
+### Supported Operations
+- Video encoding (frames → latents)
+- Video decoding (latents → frames)
+- Temporal compression
+- Spatial compression
+- Causal frame generation
+## Performance Tips and Optimization
+### Memory Optimization
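+```python
+# Encode/decode in spatial tiles to cap peak VRAM on large frames
+# (assumes a diffusers release where AutoencoderKLWan exposes these methods)
+vae.enable_tiling()
+# Process batch elements one at a time to reduce peak memory
+vae.enable_slicing()
+# Gradient checkpointing only lowers memory during training.
+# CPU offloading and attention slicing are pipeline-level methods
+# (e.g. pipe.enable_model_cpu_offload()), not VAE methods.
+```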
+### Speed Optimization
+```python
+# Compile model for faster inference (PyTorch 2.0+)
+vae = torch.compile(vae, mode="reduce-overhead")
+# Use xFormers for efficient attention
+vae.enable_xformers_memory_efficient_attention()
+# Use half precision for faster inference
+vae = vae.half()
+```
+### Batch Processing
+```python
+# Process multiple video clips efficiently
+batch_size = 4
+# 17 frames per clip: the frame count should be of the form 4n + 1
+video_clips = torch.randn(batch_size, 3, 17, 480, 720).half().to("cuda")
+with torch.no_grad():
+    latents = vae.encode(video_clips).latent_dist.sample()
+```
+### Resolution Guidelines
+- **480P (854×480)**: Best for real-time applications, lowest VRAM
+- **720P (1280×720)**: Balanced quality and performance
+- **1080P (1920×1080)**: Maximum quality, requires high-end GPU
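As a rough guide to how input size scales with resolution, the sketch below computes the memory taken by the raw FP16 input tensor alone. It does not model activations, weights, or framework overhead, so it is a lower bound and not a substitute for the VRAM figures above. The helper name and the 17-frame clip length are assumptions for illustration.

```python
def tensor_megabytes(shape, bytes_per_element=2):
    """Size in MB (10**6 bytes) of a dense tensor; 2 bytes matches FP16."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element / 1e6


# (height, width) pairs taken from the resolution list above.
RESOLUTIONS = {"480P": (480, 854), "720P": (720, 1280), "1080P": (1080, 1920)}

for name, (height, width) in RESOLUTIONS.items():
    size = tensor_megabytes((1, 3, 17, height, width))
    print(f"{name}: {size:.1f} MB for 17 FP16 frames")
```

The input grows linearly with both pixel count and frame count, so a 1080P clip costs about five times as much as a 480P clip of the same length before any model activations are counted.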
+## License
+This model is released under a custom WAN license. Please refer to the official WAN repository for detailed licensing terms and usage restrictions.
+**License Type**: Other (Custom WAN License)
+### Usage Restrictions
+- Check official WAN-AI repository for commercial usage terms
+- Attribution required for research and non-commercial use
+- Refer to [WAN-AI Organization](https://huggingface.co/Wan-AI) for updates
+## Citation
+If you use this VAE in your research or applications, please cite the WAN project:
+```bibtex
+@misc{wan2025,
+  title={WAN: Open and Advanced Large-Scale Video Generative Models},
+  author={WAN-AI Team},
+  year={2025},
+  publisher={Hugging Face},
+  howpublished={https://huggingface.co/Wan-AI}
+}
+```
+## Related Resources
+### Official Links
+- **WAN Organization**: https://huggingface.co/Wan-AI
+- **WAN2.1 T2V 1.3B Model**: https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B
+- **WAN2.1 T2V 14B Model**: https://huggingface.co/Wan-AI/Wan2.1-T2V-14B
+- **WAN2.2 Models**: https://huggingface.co/Wan-AI (Latest versions)
+- **GitHub Repository**: https://github.com/Wan-Video
+### Related Models
+- **WAN2.2 VAE**: Latest VAE with 4×16×16 (T×H×W) downsampling, for an overall compression rate of 64
+- **WAN2.1 T2V**: Text-to-video generation models
+- **WAN2.1 I2V**: Image-to-video generation models
+- **WAN2.2 Animate**: Character animation models
+### Community & Support
+- Hugging Face WAN-AI discussions
+- GitHub issues and community forums
+- Research papers and technical documentation
+## Model Card Contact
+For questions, issues, or collaboration inquiries:
+- Visit the [WAN-AI Hugging Face Organization](https://huggingface.co/Wan-AI)
+- Check the [official GitHub repository](https://github.com/Wan-Video)
+- Review model-specific documentation on individual model cards
+---
+**Version**: v1.0
+**Last Updated**: 2025-10-13
+**Model Size**: 254 MB
+**Format**: SafeTensors

vae/wan/wan21-vae.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2fc39d31359a4b0a64f55876d8ff7fa8d780956ae2cb13463b0223e15148976b
+size 253815318