wangkanai/wan25-vae

@@ -1,25 +1,18 @@
-<!-- README Version: v1.0 -->
 ---
 license: other
-license_name: wan-license
 library_name: diffusers
 pipeline_tag: text-to-video
 tags:
-  - video-generation
-  - vae
   - wan
-  - autoencoder
-  - latent-space
-  - video-compression
-  - wan2.5
-base_model: Wan-AI/Wan2.5
-base_model_relation: component
 ---
-# WAN25 VAE - Video Autoencoder v1.0
-⚠️ **Repository Status**: This repository is currently a placeholder for WAN 2.5 VAE models. The directory structure is prepared but model files have not yet been downloaded.
 High-performance Variational Autoencoder (VAE) component for the WAN 2.5 (World Anything Now) video generation system. This VAE provides efficient latent space encoding and decoding for video content, enabling high-quality video generation with reduced computational requirements.
@@ -47,29 +40,38 @@ The WAN25-VAE is the next-generation variational autoencoder designed for video
 ### WAN VAE Evolution
-| Version | Compression Ratio | Key Features |
-|---------|------------------|--------------|
-| **WAN 2.1 VAE** | 4×8×8 (temporal×spatial) | Initial 3D causal VAE, efficient 1080P encoding |
-| **WAN 2.2 VAE** | 4×16×16 | Enhanced compression (64x overall), improved quality |
-| **WAN 2.5 VAE** | TBD | Expected: Audio-visual integration, further optimizations |
 ## Repository Contents
 ```
-wan25-vae/
-└── vae/
-    └── wan/
-        └── (Model files pending download)
 ```
 **Current Status**: Directory structure prepared, awaiting model file downloads.
-### Expected File Structure
 | File | Expected Size | Description |
 |------|--------------|-------------|
-| `wan25-vae.safetensors` | ~1.5-2.0 GB | WAN25 VAE model weights in safetensors format |
-| `config.json` | ~1-5 KB | Model configuration and architecture parameters |
 ## Hardware Requirements
@@ -78,28 +80,33 @@ wan25-vae/
 - **System RAM**: 4 GB
 - **Disk Space**: 2.5 GB free space
 - **GPU**: CUDA-compatible GPU (NVIDIA) or compatible accelerator
 ### Recommended Specifications
 - **VRAM**: 6+ GB for comfortable operation with video generation pipeline
 - **System RAM**: 16+ GB
 - **GPU**: NVIDIA RTX 3060 or better, RTX 4060+ recommended
-- **Storage**: SSD for faster model loading
 ### Performance Notes
 - VAE operations are typically memory-bound rather than compute-bound
 - Larger batch sizes require proportionally more VRAM
 - CPU inference is possible but significantly slower (30-50x)
 - WAN 2.5 may include audio processing requiring additional compute
 ## Usage Examples
-### Basic Usage with Diffusers (Placeholder)
 ```python
 import torch
 from diffusers import AutoencoderKL
-# Load the WAN25 VAE (when available)
 vae_path = r"E:\huggingface\wan25-vae\vae\wan"
 vae = AutoencoderKL.from_pretrained(
     vae_path,
@@ -189,6 +196,26 @@ for alpha in np.linspace(0, 1, 24):
 smooth_video = vae.decode(torch.stack(interpolated_latents)).sample
 ```
 ## Model Specifications
 ### Architecture Details (Expected)
@@ -198,6 +225,7 @@ smooth_video = vae.decode(torch.stack(interpolated_latents)).sample
 - **Latent Dimensions**: Compressed spatial resolution with channel expansion
 - **Temporal Processing**: 3D causal convolutions for temporal coherence
 - **Activation Functions**: Mixed (SiLU, tanh for output)
 ### Technical Specifications
 - **Format**: SafeTensors (secure, efficient binary format)
@@ -206,20 +234,23 @@ smooth_video = vae.decode(torch.stack(interpolated_latents)).sample
 - **Parameters**: Estimated ~400-500M parameters (based on WAN 2.2 progression)
 - **Compression Ratio**: Expected improvements over WAN 2.2's 4×16×16
 - **Perceptual Optimization**: Pre-trained perceptual networks for quality preservation
 ### Supported Input Resolutions
 - **Standard**: 480P (854×480), 720P (1280×720), 1080P (1920×1080)
 - **Aspect Ratios**: 16:9, 4:3, 1:1, and custom ratios
 - **Frame Rates**: 24fps, 30fps, 60fps support expected
 ## Performance Tips and Optimization
 ### Memory Optimization
 ```python
 # Enable gradient checkpointing for training (if fine-tuning)
 vae.enable_gradient_checkpointing()
-# Use float16 for inference to reduce VRAM usage
 vae = vae.half()
 # Process frames in batches
@@ -227,9 +258,13 @@ batch_size = 4  # Adjust based on available VRAM
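 # CPU offloading is a pipeline-level feature in Diffusers, not a method on the VAE itself;
 # call it on the pipeline that owns this VAE, e.g. pipe.enable_model_cpu_offload()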
 ```
 ### Speed Optimization
 ```python
 # Compile model with torch.compile (PyTorch 2.0+)
 vae = torch.compile(vae, mode="reduce-overhead")
@@ -237,27 +272,36 @@ vae = torch.compile(vae, mode="reduce-overhead")
 # Use channels_last memory format for better performance
 vae = vae.to(memory_format=torch.channels_last)
-# Enable TF32 on Ampere+ GPUs
 torch.backends.cuda.matmul.allow_tf32 = True
 torch.backends.cudnn.allow_tf32 = True
 # Use xFormers for memory-efficient attention
 vae.enable_xformers_memory_efficient_attention()
 ```
 ### Quality vs Speed Trade-offs
-- **High Quality**: Use FP32 precision, larger batch sizes, disable tiling
-- **Balanced**: FP16 precision, moderate batch sizes (4-8 frames)
-- **Fast Inference**: FP16 precision, smaller batches (1-2 frames), enable tiling
-- **Ultra Fast**: BF16 precision, aggressive tiling, model compilation
 ### Best Practices
 - Always use safetensors format for security and compatibility
-- Monitor VRAM usage with `torch.cuda.memory_allocated()`
 - Clear cache between large operations: `torch.cuda.empty_cache()`
 - Use mixed precision training if fine-tuning the VAE
 - Validate reconstruction quality with perceptual metrics (LPIPS, SSIM, PSNR)
 - Consider using video-specific quality metrics (VMAF, VQM)
 ## Getting Started
@@ -265,30 +309,49 @@ vae.enable_xformers_memory_efficient_attention()
 When WAN 2.5 VAE becomes available, download from Hugging Face:
-```bash
-# Using huggingface_hub
 from huggingface_hub import snapshot_download
 snapshot_download(
-    repo_id="Wan-AI/Wan2.5-VAE",  # Check official repo name
-    local_dir="E:/huggingface/wan25-vae/vae/wan",
-    allow_patterns=["*.safetensors", "*.json"]
 )
 ```
-Or use git-lfs:
 ```bash
-cd E:/huggingface/wan25-vae/vae/wan
 git lfs install
 git clone https://huggingface.co/Wan-AI/Wan2.5-VAE .
 ```
 ### Step 2: Install Dependencies
 ```bash
 pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
-pip install diffusers transformers accelerate xformers safetensors
 ```
 ### Step 3: Verify Installation
@@ -296,14 +359,31 @@ pip install diffusers transformers accelerate xformers safetensors
 ```python
 import torch
 from diffusers import AutoencoderKL
 # Check if model files exist
-import os
 vae_path = r"E:\huggingface\wan25-vae\vae\wan"
-if os.path.exists(os.path.join(vae_path, "config.json")):
-    print("✓ WAN25 VAE model found")
-    vae = AutoencoderKL.from_pretrained(vae_path)
-    print(f"✓ Model loaded successfully with {sum(p.numel() for p in vae.parameters())/1e6:.1f}M parameters")
 else:
     print("✗ WAN25 VAE model not found. Please download first.")
 ```
@@ -321,6 +401,8 @@ For complete license details, refer to the official WAN model repository or lice
 - https://huggingface.co/Wan-AI
 - https://wan.video/
 ## Citation
 If you use this VAE in your research or projects, please cite:
@@ -354,7 +436,7 @@ For the broader WAN 2.5 system:
 - **Hugging Face Organization**: [https://huggingface.co/Wan-AI](https://huggingface.co/Wan-AI)
 - **GitHub Repository**: [https://github.com/Wan-Video](https://github.com/Wan-Video)
 - **Diffusers Documentation**: [https://huggingface.co/docs/diffusers](https://huggingface.co/docs/diffusers)
-- **Model Hub**: [https://huggingface.co/models](https://huggingface.co/models)
 ### Related WAN Models (Local Repository)
 - **WAN 2.1 VAE**: `E:\huggingface\wan21-vae\` - Previous generation VAE
@@ -367,7 +449,7 @@ For the broader WAN 2.5 system:
 - **WAN Community**: Discussions and examples for WAN video generation
 - **Video Generation Papers**: Research on video diffusion and VAE architectures
 - **Optimization Guides**: Tips for efficient video processing with VAEs
-- **ArXiv Paper**: Wan: Open and Advanced Large-Scale Video Generative Models
 ### Compatibility
 - **Required Libraries**: `torch>=2.0.0`, `diffusers>=0.21.0`, `transformers>=4.30.0`
@@ -379,11 +461,11 @@ For the broader WAN 2.5 system:
 For technical issues, questions, or contributions:
-1. **Model Issues**: Report to WAN-AI Hugging Face repository
-2. **Integration Questions**: Consult Diffusers documentation and community
-3. **Performance Optimization**: Check PyTorch performance tuning guides
-4. **Local Setup**: Verify CUDA installation and GPU compatibility
-5. **Community Support**: WAN Discord/Forum (check official website)
 ## Troubleshooting
@@ -391,36 +473,92 @@ For technical issues, questions, or contributions:
 **Model Not Found Error:**
 ```python
-# Ensure model files are downloaded to correct path
 # Expected location: E:\huggingface\wan25-vae\vae\wan\
 ```
 **VRAM Out of Memory:**
 ```python
-# Reduce batch size, enable model CPU offloading
 # pipe.enable_model_cpu_offload()  # pipeline-level method in Diffusers; `pipe` is the pipeline that owns this VAE
-# Use FP16 precision
 vae = vae.half()
 ```
 **Slow Inference Speed:**
 ```python
 # Enable xFormers and model compilation
 vae.enable_xformers_memory_efficient_attention()
-vae = torch.compile(vae)
 ```
 ---
-**Version**: v1.0
-**Last Updated**: 2025-10-13
 **Model Format**: SafeTensors (when available)
 **Repository Status**: Placeholder - Awaiting model download
 **Expected Model Size**: ~1.5-2.0 GB
 ## Changelog
-### v1.0 (Initial Documentation - 2025-10-13)
 - Initial placeholder documentation for WAN25-VAE repository
 - Comprehensive usage examples based on WAN 2.1/2.2 patterns
 - Hardware requirements and optimization guidelines
@@ -435,3 +573,4 @@ vae = torch.compile(vae)
 - Add benchmark results and performance comparisons
 - Include official usage examples from WAN team
 - Document any audio-visual integration features

 ---
 license: other
 library_name: diffusers
 pipeline_tag: text-to-video
 tags:
   - wan
+  - text-to-video
+  - image-generation
 ---
+<!-- README Version: v1.2 -->
+# WAN25 VAE - Video Autoencoder v2.5
+⚠️ **Repository Status**: This repository is currently a placeholder for WAN 2.5 VAE models. The directory structure is prepared (`vae/wan/`) but model files have not yet been downloaded. Total current size: ~18 KB (metadata only).
 High-performance Variational Autoencoder (VAE) component for the WAN 2.5 (World Anything Now) video generation system. This VAE provides efficient latent space encoding and decoding for video content, enabling high-quality video generation with reduced computational requirements.
 ### WAN VAE Evolution
+| Version | Compression Ratio | Key Features | Status |
+|---------|------------------|--------------|--------|
+| **WAN 2.1 VAE** | 4×8×8 (temporal×spatial) | Initial 3D causal VAE, efficient 1080P encoding | Available |
+| **WAN 2.2 VAE** | 4×16×16 | Enhanced compression (64x overall), improved quality | Available |
+| **WAN 2.5 VAE** | TBD | Expected: Audio-visual integration, further optimizations | Pending Release |
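The compression figures in the table can be reproduced with a little arithmetic. The sketch below multiplies the temporal and spatial strides to get the positional compression, then adjusts for channel growth to get an overall ratio. The latent channel counts (16 and 48) are assumptions, not values stated in this README; with them, the 4×16×16 stride works out to the quoted 64x overall figure.

```python
from math import prod

# (temporal, height, width) strides quoted in the table above.
STRIDES = {
    "WAN 2.1 VAE": (4, 8, 8),
    "WAN 2.2 VAE": (4, 16, 16),
}

# Assumed latent channel counts (not stated in this README).
ASSUMED_LATENT_CHANNELS = {
    "WAN 2.1 VAE": 16,
    "WAN 2.2 VAE": 48,
}


def positional_compression(stride):
    """Number of input positions folded into one latent position."""
    return prod(stride)


def overall_compression(stride, latent_channels, input_channels=3):
    """Ratio of input values to latent values, accounting for channel growth."""
    return positional_compression(stride) * input_channels / latent_channels


for name, stride in STRIDES.items():
    ratio = overall_compression(stride, ASSUMED_LATENT_CHANNELS[name])
    print(f"{name}: {positional_compression(stride)}x positions, {ratio:.0f}x overall")
```

The WAN 2.5 row stays TBD here because neither its strides nor its channel count are known yet.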
 ## Repository Contents
+### Current Directory Structure
 ```
+wan25-vae/                                    # Root directory (18 KB)
+├── README.md                                 # This file (~18 KB)
+├── .cache/                                   # Hugging Face upload cache
+│   └── huggingface/
+│       └── upload/
+│           └── README.md.metadata           # Upload metadata
+└── vae/                                      # VAE model directory (empty)
+    └── wan/                                  # WAN model subdirectory (empty - ready for download)
 ```
 **Current Status**: Directory structure prepared, awaiting model file downloads.
+### Expected Files After Download
 | File | Expected Size | Description |
 |------|--------------|-------------|
+| `vae/wan/diffusion_pytorch_model.safetensors` | ~1.5-2.0 GB | WAN25 VAE model weights in safetensors format |
+| `vae/wan/config.json` | ~1-5 KB | Model configuration and architecture parameters |
+| `vae/wan/README.md` | ~5-10 KB | Official model documentation (optional) |
+**Total Repository Size After Download**: ~1.5-2.0 GB
 ## Hardware Requirements
 - **System RAM**: 4 GB
 - **Disk Space**: 2.5 GB free space
 - **GPU**: CUDA-compatible GPU (NVIDIA) or compatible accelerator
+- **CUDA**: Version 11.8+ or 12.1+
+- **Operating System**: Windows 10/11, Linux (Ubuntu 20.04+), macOS (limited GPU support)
 ### Recommended Specifications
 - **VRAM**: 6+ GB for comfortable operation with video generation pipeline
 - **System RAM**: 16+ GB
 - **GPU**: NVIDIA RTX 3060 or better, RTX 4060+ recommended
+- **Storage**: SSD for faster model loading (NVMe preferred)
+- **CPU**: Modern multi-core processor (Intel i5/AMD Ryzen 5 or better)
 ### Performance Notes
 - VAE operations are typically memory-bound rather than compute-bound
 - Larger batch sizes require proportionally more VRAM
 - CPU inference is possible but significantly slower (30-50x)
 - WAN 2.5 may include audio processing requiring additional compute
+- FP16 precision roughly halves VRAM usage relative to FP32, usually with minimal quality loss
+- Batch processing of frames is more efficient than sequential processing
 ## Usage Examples
+### Basic Usage with Diffusers
 ```python
 import torch
 from diffusers import AutoencoderKL
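+# Load the WAN25 VAE from local directory (AutoencoderKL is an assumption until release;
+# recent Diffusers versions load the WAN 2.1 VAE with AutoencoderKLWan instead)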
 vae_path = r"E:\huggingface\wan25-vae\vae\wan"
 vae = AutoencoderKL.from_pretrained(
     vae_path,
 smooth_video = vae.decode(torch.stack(interpolated_latents)).sample
 ```
+### Loading from Absolute Path (Windows)
+```python
+import torch
+from diffusers import AutoencoderKL
+# Explicit absolute path for Windows systems
+vae = AutoencoderKL.from_pretrained(
+    r"E:\huggingface\wan25-vae\vae\wan",
+    torch_dtype=torch.float16,
+    local_files_only=True  # Ensure loading from local directory
+)
+# Alternative: Using forward slashes
+vae = AutoencoderKL.from_pretrained(
+    "E:/huggingface/wan25-vae/vae/wan",
+    torch_dtype=torch.float16
+)
+```
 ## Model Specifications
 ### Architecture Details (Expected)
 - **Latent Dimensions**: Compressed spatial resolution with channel expansion
 - **Temporal Processing**: 3D causal convolutions for temporal coherence
 - **Activation Functions**: Mixed (SiLU, tanh for output)
+- **Normalization**: Group normalization for stable training
 ### Technical Specifications
 - **Format**: SafeTensors (secure, efficient binary format)
 - **Parameters**: Estimated ~400-500M parameters (based on WAN 2.2 progression)
 - **Compression Ratio**: Expected improvements over WAN 2.2's 4×16×16
 - **Perceptual Optimization**: Pre-trained perceptual networks for quality preservation
+- **Model Size**: ~1.5-2.0 GB (FP16 safetensors format)
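As a rough cross-check of the figures above, weight storage can be estimated as parameter count times bytes per parameter. The sketch below assumes the README's ~400-500M parameter estimate and ignores non-weight overhead such as the safetensors header. Under those assumptions, 1.5-2.0 GB lines up with FP32 storage, while FP16 would come to roughly 0.8-1.0 GB, so one of the two estimates will likely need revisiting once the model is released.

```python
# Bytes needed to store one parameter in each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2}


def weight_size_gb(num_params, dtype):
    """Approximate on-disk / in-memory size of the weights in gigabytes (1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Parameter counts taken from the estimate in this README.
for num_params in (400_000_000, 500_000_000):
    for dtype in ("fp32", "fp16"):
        size = weight_size_gb(num_params, dtype)
        print(f"{num_params / 1e6:.0f}M params in {dtype}: {size:.1f} GB")
```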
 ### Supported Input Resolutions
 - **Standard**: 480P (854×480), 720P (1280×720), 1080P (1920×1080)
 - **Aspect Ratios**: 16:9, 4:3, 1:1, and custom ratios
 - **Frame Rates**: 24fps, 30fps, 60fps support expected
+- **Batch Processing**: Supports batch encoding/decoding for efficiency
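Not every listed resolution divides evenly by the spatial stride, so some inputs need padding (or cropping or resizing) before encoding. The sketch below pads up to the next multiple of the stride and reports the resulting latent grid. It assumes a spatial stride of 16, borrowed from WAN 2.2, because the WAN 2.5 value is still TBD; padding is only one possible policy.

```python
# Resolutions listed above, as (width, height).
RESOLUTIONS = {
    "480P": (854, 480),
    "720P": (1280, 720),
    "1080P": (1920, 1080),
}


def round_up(value, multiple):
    """Smallest multiple of `multiple` that is >= value."""
    return -(-value // multiple) * multiple


def padded_size(width, height, spatial_stride):
    """Frame size after padding each side up to a multiple of the stride."""
    return round_up(width, spatial_stride), round_up(height, spatial_stride)


def latent_grid(width, height, spatial_stride):
    """Latent (width, height) for a frame padded to the stride."""
    padded_w, padded_h = padded_size(width, height, spatial_stride)
    return padded_w // spatial_stride, padded_h // spatial_stride


ASSUMED_STRIDE = 16  # assumption: WAN 2.2's spatial stride

for name, (width, height) in RESOLUTIONS.items():
    print(name, padded_size(width, height, ASSUMED_STRIDE), latent_grid(width, height, ASSUMED_STRIDE))
```

With a stride of 16, 854×480 pads to 864×480 and 1920×1080 pads to 1920×1088, while 1280×720 needs no padding.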
 ## Performance Tips and Optimization
 ### Memory Optimization
 ```python
 # Enable gradient checkpointing for training (if fine-tuning)
 vae.enable_gradient_checkpointing()
+# Use float16 for inference to reduce VRAM usage (~50% reduction)
 vae = vae.half()
 # Process frames in batches
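 # CPU offloading is a pipeline-level feature in Diffusers, not a method on the VAE itself.
 # Call it on the pipeline that owns this VAE (`pipe` is a placeholder name):
+# pipe.enable_model_cpu_offload()       # moves whole sub-models to the GPU on demand
+# pipe.enable_sequential_cpu_offload()  # lowest VRAM usage, slowest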
 ```
 ### Speed Optimization
 ```python
 # Compile model with torch.compile (PyTorch 2.0+)
 vae = torch.compile(vae, mode="reduce-overhead")
 # Use channels_last memory format for better performance
 vae = vae.to(memory_format=torch.channels_last)
+# Enable TF32 on Ampere+ GPUs (RTX 30/40 series)
 torch.backends.cuda.matmul.allow_tf32 = True
 torch.backends.cudnn.allow_tf32 = True
 # Use xFormers for memory-efficient attention
 vae.enable_xformers_memory_efficient_attention()
+# Cap this process at 90% of GPU memory (this sets a limit; it does not pre-allocate)
+torch.cuda.set_per_process_memory_fraction(0.9)
 ```
 ### Quality vs Speed Trade-offs
+| Mode | Precision | Batch Size | VRAM Usage (est.) | Speed | Quality |
+|------|-----------|------------|------------|-------|---------|
+| **High Quality** | FP32 | 8-16 frames | ~8-12 GB | Slow | Best |
+| **Balanced** | FP16 | 4-8 frames | ~4-6 GB | Good | Excellent |
+| **Fast Inference** | FP16 | 1-2 frames | ~2-3 GB | Fast | Very Good |
+| **Ultra Fast** | BF16 | 1 frame | ~1.5-2 GB | Very Fast | Good |
 ### Best Practices
 - Always use safetensors format for security and compatibility
+- Monitor VRAM usage with `torch.cuda.memory_allocated()` and `torch.cuda.max_memory_allocated()`
 - Clear cache between large operations: `torch.cuda.empty_cache()`
 - Use mixed precision training if fine-tuning the VAE
 - Validate reconstruction quality with perceptual metrics (LPIPS, SSIM, PSNR)
 - Consider using video-specific quality metrics (VMAF, VQM)
+- Profile code with PyTorch profiler to identify bottlenecks
+- Use `torch.no_grad()` context for all inference operations
 ## Getting Started
 When WAN 2.5 VAE becomes available, download from Hugging Face:
+**Method 1: Using huggingface_hub (Recommended)**
+```python
 from huggingface_hub import snapshot_download
 snapshot_download(
+    repo_id="Wan-AI/Wan2.5-VAE",  # Check official repo name when available
+    local_dir=r"E:\huggingface\wan25-vae\vae\wan",
+    allow_patterns=["*.safetensors", "*.json"],
+    local_dir_use_symlinks=False  # Deprecated and ignored by recent huggingface_hub releases, which write real files to local_dir
 )
 ```
+**Method 2: Using git-lfs**
 ```bash
+cd "E:/huggingface/wan25-vae/vae/wan"
 git lfs install
 git clone https://huggingface.co/Wan-AI/Wan2.5-VAE .
 ```
+**Method 3: Manual Download**
+Visit the Hugging Face repository in your browser and download:
+- `diffusion_pytorch_model.safetensors` (~1.5-2.0 GB)
+- `config.json` (~1-5 KB)
+Place files in: `E:\huggingface\wan25-vae\vae\wan\`
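After a manual download, the weights file can be sanity-checked without loading any tensors. A safetensors file starts with an 8-byte little-endian header length followed by a JSON header that lists each tensor's dtype and shape. The sketch below reads that header with the standard library and counts parameters; the helper names are illustrative, and the path is the one used throughout this README.

```python
import json
import os
import struct
from math import prod


def read_safetensors_header(path):
    """Return the tensor entries from a .safetensors header without loading weights."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # 8-byte little-endian length
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional free-form metadata, not a tensor
    return header


def count_parameters(header):
    """Sum the element counts of all tensors described in the header."""
    return sum(prod(entry["shape"]) for entry in header.values())


# The file is absent until the model has been downloaded.
weights_path = r"E:\huggingface\wan25-vae\vae\wan\diffusion_pytorch_model.safetensors"
if os.path.exists(weights_path):
    header = read_safetensors_header(weights_path)
    print(f"{len(header)} tensors, {count_parameters(header) / 1e6:.1f}M parameters")
else:
    print("Weights not found; nothing to inspect yet.")
```

A header that fails to parse, or a parameter count far from the expected range, suggests an incomplete or wrong download.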
 ### Step 2: Install Dependencies
 ```bash
+# Install PyTorch with CUDA support (Windows/Linux)
 pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
+# Install required libraries
+pip install diffusers transformers accelerate safetensors
+# Optional: Install xFormers for memory-efficient attention
+pip install xformers
+# Optional: Triton, which torch.compile uses to generate GPU kernels (official wheels are Linux-only at the time of writing)
+pip install triton
 ```
 ### Step 3: Verify Installation
 ```python
 import torch
 from diffusers import AutoencoderKL
+import os
 # Check if model files exist
 vae_path = r"E:\huggingface\wan25-vae\vae\wan"
+config_path = os.path.join(vae_path, "config.json")
+model_path = os.path.join(vae_path, "diffusion_pytorch_model.safetensors")
+if os.path.exists(config_path):
+    print("✓ WAN25 VAE config found")
+    if os.path.exists(model_path):
+        print("✓ WAN25 VAE model weights found")
+        vae = AutoencoderKL.from_pretrained(vae_path, torch_dtype=torch.float16)
+        param_count = sum(p.numel() for p in vae.parameters()) / 1e6
+        print(f"✓ Model loaded successfully with {param_count:.1f}M parameters")
+        # Check GPU availability
+        if torch.cuda.is_available():
+            print(f"✓ CUDA available: {torch.cuda.get_device_name(0)}")
+            print(f"✓ CUDA version: {torch.version.cuda}")
+            print(f"✓ Total VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
+        else:
+            print("⚠ CUDA not available - CPU inference will be slow")
+    else:
+        print("✗ Model weights not found. Please download the safetensors file.")
 else:
     print("✗ WAN25 VAE model not found. Please download first.")
 ```
 - https://huggingface.co/Wan-AI
 - https://wan.video/
+**Important**: Always verify the specific license terms for WAN 2.5 VAE when it becomes available, as terms may differ from previous versions.
 ## Citation
 If you use this VAE in your research or projects, please cite:
 - **Hugging Face Organization**: [https://huggingface.co/Wan-AI](https://huggingface.co/Wan-AI)
 - **GitHub Repository**: [https://github.com/Wan-Video](https://github.com/Wan-Video)
 - **Diffusers Documentation**: [https://huggingface.co/docs/diffusers](https://huggingface.co/docs/diffusers)
+- **Model Hub**: [https://huggingface.co/models?pipeline_tag=text-to-video](https://huggingface.co/models?pipeline_tag=text-to-video)
 ### Related WAN Models (Local Repository)
 - **WAN 2.1 VAE**: `E:\huggingface\wan21-vae\` - Previous generation VAE
 - **WAN Community**: Discussions and examples for WAN video generation
 - **Video Generation Papers**: Research on video diffusion and VAE architectures
 - **Optimization Guides**: Tips for efficient video processing with VAEs
+- **ArXiv Paper**: [Wan: Open and Advanced Large-Scale Video Generative Models](https://arxiv.org/search/?query=wan+video+generation)
 ### Compatibility
 - **Required Libraries**: `torch>=2.0.0`, `diffusers>=0.21.0`, `transformers>=4.30.0`
 For technical issues, questions, or contributions:
+1. **Model Issues**: Report to WAN-AI Hugging Face repository issues page
+2. **Integration Questions**: Consult Diffusers documentation and community forums
+3. **Performance Optimization**: Check PyTorch performance tuning guides and profiling tools
+4. **Local Setup**: Verify CUDA installation, GPU compatibility, and driver versions
+5. **Community Support**: WAN Discord/Forum (check official website for links)
 ## Troubleshooting
 **Model Not Found Error:**
 ```python
+# Verify model files are downloaded to correct path
 # Expected location: E:\huggingface\wan25-vae\vae\wan\
+# Required files: config.json, diffusion_pytorch_model.safetensors
+import os
+vae_path = r"E:\huggingface\wan25-vae\vae\wan"
+print("Config exists:", os.path.exists(os.path.join(vae_path, "config.json")))
+print("Model exists:", os.path.exists(os.path.join(vae_path, "diffusion_pytorch_model.safetensors")))
 ```
 **VRAM Out of Memory:**
 ```python
+# Reduce batch size to 1-2 frames
+# Enable model CPU offloading (a pipeline-level method in Diffusers; `pipe` is the pipeline that owns this VAE)
 # pipe.enable_model_cpu_offload()
+# Use FP16 precision (50% VRAM reduction)
 vae = vae.half()
+# Process in smaller chunks
+chunk_size = 2  # Reduce if still OOM
+# Clear CUDA cache before processing
+torch.cuda.empty_cache()
 ```
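The `chunk_size` setting above only helps if the frames are actually split before they reach the VAE. A minimal sketch of that split is shown below; `iter_chunks` is a hypothetical helper, not part of Diffusers, and it only produces index ranges.

```python
def iter_chunks(num_frames, chunk_size):
    """Yield (start, stop) index pairs that cover range(num_frames) in order."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, num_frames, chunk_size):
        yield start, min(start + chunk_size, num_frames)


# Example: 5 frames processed 2 at a time.
for start, stop in iter_chunks(5, 2):
    print(f"process frames[{start}:{stop}]")
```

Each `(start, stop)` pair can be used to slice the frame tensor, encode that slice under `torch.no_grad()`, and concatenate the results. Independent chunks suit a per-frame VAE; for a temporally causal VAE, chunk boundaries may affect the output, so compare against an unchunked run where VRAM allows.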
 **Slow Inference Speed:**
 ```python
 # Enable xFormers and model compilation
 vae.enable_xformers_memory_efficient_attention()
+vae = torch.compile(vae, mode="reduce-overhead")
+# Enable TF32 (Ampere+ GPUs)
+torch.backends.cuda.matmul.allow_tf32 = True
+# Verify GPU utilization with nvidia-smi
+```
+**Import Errors:**
+```bash
+# Verify installations (grep is for bash; on Windows cmd or PowerShell use findstr instead)
+pip list | grep torch
+pip list | grep diffusers
+# Reinstall if needed
+pip install --upgrade torch torchvision diffusers transformers
+```
+**Poor Quality Reconstructions:**
+```python
+# Use higher precision (FP32 instead of FP16)
+vae = vae.float()
+# Verify scaling factor is applied correctly
+latents = latents * vae.config.scaling_factor  # When encoding
+decoded = vae.decode(latents / vae.config.scaling_factor).sample  # When decoding
+# Check input normalization (should be [-1, 1] range)
 ```
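The `[-1, 1]` input range mentioned above corresponds to a simple linear mapping from 8-bit pixel values. The sketch below shows the mapping and its inverse on plain numbers; the function names are illustrative, and the same arithmetic applies element-wise to tensors.

```python
def to_model_range(pixel):
    """Map an 8-bit pixel value (0..255) to the [-1, 1] range."""
    return pixel / 127.5 - 1.0


def to_pixel(value):
    """Map a model-range value back to 0..255, clamping out-of-range outputs."""
    clamped = max(-1.0, min(1.0, value))
    return round((clamped + 1.0) * 127.5)


for pixel in (0, 128, 255):
    normalized = to_model_range(pixel)
    print(pixel, f"{normalized:+.4f}", to_pixel(normalized))
```

Feeding frames in `[0, 1]` or `[0, 255]` instead of `[-1, 1]` is a common cause of washed-out or saturated reconstructions.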
 ---
+**Version**: v1.2
+**Last Updated**: 2025-10-14
 **Model Format**: SafeTensors (when available)
 **Repository Status**: Placeholder - Awaiting model download
 **Expected Model Size**: ~1.5-2.0 GB
+**Current Size**: ~18 KB (metadata only)
 ## Changelog
+### v1.2 (Updated Documentation - 2025-10-14)
+- Updated README version to v1.2 with comprehensive improvements
+- Added actual directory structure analysis (18 KB placeholder repository)
+- Enhanced hardware requirements with detailed specifications
+- Expanded usage examples with Windows absolute path examples
+- Added detailed model specifications table
+- Improved performance optimization section with comparison table
+- Enhanced troubleshooting section with specific solutions
+- Added verification script with detailed system checks
+- Updated repository contents section with current file listing
+- Improved installation instructions with multiple download methods
+- Added quality vs speed trade-offs comparison table
+- Enhanced best practices with profiling and monitoring recommendations
+### v1.1 (Initial Documentation - 2025-10-13)
 - Initial placeholder documentation for WAN25-VAE repository
 - Comprehensive usage examples based on WAN 2.1/2.2 patterns
 - Hardware requirements and optimization guidelines
 - Add benchmark results and performance comparisons
 - Include official usage examples from WAN team
 - Document any audio-visual integration features
+- Add example outputs and quality comparisons with previous VAE versions