---
license: other
license_name: wan-license
library_name: diffusers
pipeline_tag: image-to-video
tags:
- wan
- image-to-video
- video-generation
- wan21
- fp16
- 480p
- diffusion
- 14b-parameters
---

<!-- README Version: v1.2 -->

# WAN 2.1 FP16 480p - Image-to-Video Diffusion Model

High-fidelity 480p image-to-video generation model in full FP16 precision (14 billion parameters). Part of the WAN 2.1 model family for transforming static images into dynamic videos.

## Model Description

WAN 2.1 I2V 480p is a 14-billion-parameter transformer-based diffusion model that generates videos from static images. This FP16 variant preserves the full 16-bit precision of the released weights, making it well suited to research and high-quality video synthesis, while the 480p output resolution balances quality against computational cost.

**Key Capabilities**:
- Image-to-video generation with temporal coherence
- 480p resolution output (balanced quality/performance)
- Full FP16 precision (16-bit floating point)
- Compatible with camera control LoRAs for cinematic effects
- Optimized for research and professional production workflows

## Repository Contents

```
wan21-fp16-480p/
└── diffusion_models/
    └── wan/
        └── wan21-i2v-480p-14b-fp16.safetensors (31.0 GB)
```

**Total Repository Size**: 31.0 GB

### Model Files

| File | Size | Description |
|------|------|-------------|
| `wan21-i2v-480p-14b-fp16.safetensors` | 31.0 GB | WAN 2.1 I2V 480p diffusion model (14B parameters, FP16 precision) |
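
The SafeTensors header can be inspected without loading the 31 GB of weights, which is a quick way to verify the file's tensors and confirm they are stored in FP16. A minimal sketch using the `safetensors` library (the relative path is illustrative):

```python
from safetensors import safe_open

path = "diffusion_models/wan/wan21-i2v-480p-14b-fp16.safetensors"

# safe_open reads only the header, so this is fast even for a 31 GB file
with safe_open(path, framework="pt") as f:
    names = list(f.keys())
    print(f"{len(names)} tensors in checkpoint")
    for name in names[:3]:       # spot-check a few entries
        t = f.get_tensor(name)   # loads just this one tensor
        print(name, t.dtype, tuple(t.shape))
```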

## Hardware Requirements

### Minimum Requirements
- **VRAM**: 32 GB for straightforward inference (24 GB cards can work with the memory optimizations described below)
- **System RAM**: 32 GB
- **Disk Space**: 31 GB for the model file
- **GPU**: NVIDIA GPU with FP16 support and sufficient VRAM (e.g., RTX A6000)

### Recommended Requirements
- **VRAM**: 40 GB+ (for optimal performance and batch processing)
- **System RAM**: 64 GB
- **GPU**: High-end NVIDIA GPU (e.g., A6000 48 GB, A100 40/80 GB)
- **Storage**: SSD for faster model loading

### Performance Notes
- FP16 precision requires roughly twice the VRAM of the quantized FP8 variants
- On 24 GB GPUs, enable memory optimization techniques (CPU offload, attention slicing)
- For production deployment with lower VRAM, consider the FP8 quantized variants
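
Whether the active GPU meets these requirements can be checked directly through PyTorch before loading the model; a minimal sketch:

```python
import torch

if not torch.cuda.is_available():
    raise SystemExit("A CUDA-capable GPU is required")

props = torch.cuda.get_device_properties(0)
vram_gb = props.total_memory / 1024**3
print(f"{props.name}: {vram_gb:.1f} GiB VRAM")

if vram_gb < 32:
    print("Below the 32 GB minimum -- enable the memory optimizations shown below")
```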

## Usage Examples

### Basic Image-to-Video Generation

A transformer-only checkpoint such as this one cannot be turned into a complete pipeline by itself; the sketch below loads the transformer from the single file and takes the remaining components (VAE, text and image encoders) from a Diffusers-format WAN 2.1 repository. The repo id shown is one example of such a repository.

```python
import torch
from diffusers import WanImageToVideoPipeline, WanTransformer3DModel
from diffusers.utils import export_to_video, load_image

# Load the WAN 2.1 I2V 480p FP16 transformer from the single-file checkpoint
transformer = WanTransformer3DModel.from_single_file(
    "E:/huggingface/wan21-fp16-480p/diffusion_models/wan/wan21-i2v-480p-14b-fp16.safetensors",
    torch_dtype=torch.float16,
)

# Assemble the full pipeline around it
pipe = WanImageToVideoPipeline.from_pretrained(
    "Wan-AI/Wan2.1-I2V-14B-480P-Diffusers",
    transformer=transformer,
    torch_dtype=torch.float16,
)
pipe.to("cuda")

# Load input image
input_image = load_image("path/to/your/image.jpg")

# Generate video from image
video = pipe(
    image=input_image,
    prompt="smooth camera movement, cinematic lighting",
    num_frames=24,
    num_inference_steps=50,
    guidance_scale=7.5,
).frames[0]

# Export video
export_to_video(video, "output_video.mp4", fps=8)
```

### With Memory Optimization (for lower VRAM)

```python
import torch
from diffusers import WanImageToVideoPipeline, WanTransformer3DModel
from diffusers.utils import load_image

# Load the model as in the basic example above
transformer = WanTransformer3DModel.from_single_file(
    "E:/huggingface/wan21-fp16-480p/diffusion_models/wan/wan21-i2v-480p-14b-fp16.safetensors",
    torch_dtype=torch.float16,
)
pipe = WanImageToVideoPipeline.from_pretrained(
    "Wan-AI/Wan2.1-I2V-14B-480P-Diffusers",
    transformer=transformer,
    torch_dtype=torch.float16,
)

# Offload idle components to the CPU for much lower VRAM usage.
# Do not call pipe.to("cuda") afterwards -- that would move everything
# back onto the GPU and undo the savings.
pipe.enable_model_cpu_offload()

# Memory-efficient attention and tiled VAE decoding
# (support varies by pipeline and diffusers version)
pipe.enable_attention_slicing()
pipe.vae.enable_tiling()

input_image = load_image("path/to/your/image.jpg")

# Generate video with reduced settings
video = pipe(
    image=input_image,
    prompt="your prompt here",
    num_frames=16,           # fewer frames for lower memory
    num_inference_steps=30,  # fewer steps for faster generation
    guidance_scale=7.5,
).frames[0]
```

### With Camera Control LoRAs

```python
import torch
from diffusers import WanImageToVideoPipeline, WanTransformer3DModel
from diffusers.utils import export_to_video, load_image

# Load the base model as in the basic example above
transformer = WanTransformer3DModel.from_single_file(
    "E:/huggingface/wan21-fp16-480p/diffusion_models/wan/wan21-i2v-480p-14b-fp16.safetensors",
    torch_dtype=torch.float16,
)
pipe = WanImageToVideoPipeline.from_pretrained(
    "Wan-AI/Wan2.1-I2V-14B-480P-Diffusers",
    transformer=transformer,
    torch_dtype=torch.float16,
)
pipe.to("cuda")

# Load a camera control LoRA (requires separate download)
# Examples: rotation, arc shot, or drone camera movements
pipe.load_lora_weights("path/to/wan21-camera-rotation-rank16-v1.safetensors")

input_image = load_image("path/to/your/image.jpg")

# Generate with camera control
video = pipe(
    image=input_image,
    prompt="rotating camera around the subject, cinematic",
    num_frames=24,
    num_inference_steps=50,
    guidance_scale=7.5,
).frames[0]

export_to_video(video, "output_rotating.mp4", fps=8)
```

## Model Specifications

| Specification | Value |
|--------------|-------|
| **Architecture** | Transformer-based image-to-video diffusion model |
| **Parameters** | 14 billion |
| **Precision** | FP16 (16-bit floating point) |
| **Resolution** | 480p (video output) |
| **Format** | SafeTensors |
| **Model Size** | 31.0 GB |
| **Task** | Image-to-video generation |
| **Library** | diffusers |
| **Compatible LoRAs** | WAN 2.1 camera control LoRAs (rotation, arc shot, drone) |

### Technical Details
- **FP16 Format**: 1 sign bit, 5-bit exponent, 10-bit mantissa
- **Numerical Range**: ±65,504 (max value)
- **Precision**: ~3-4 decimal digits
- **Quality**: Full precision without quantization artifacts
- **Compatibility**: All modern PyTorch versions with CUDA support
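
These limits can be confirmed directly from PyTorch's type metadata; a small sketch:

```python
import torch

info = torch.finfo(torch.float16)
print(info.bits)  # 16
print(info.max)   # 65504.0 -- the ±65,504 range quoted above
print(info.eps)   # 0.0009765625 (2**-10) -- roughly 3-4 significant decimal digits
```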

## Installation

```bash
# Install required dependencies
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
pip install diffusers transformers accelerate safetensors pillow

# For video export
pip install opencv-python imageio imageio-ffmpeg
```

### Requirements
- Python 3.8+
- PyTorch 2.0+
- diffusers (a release recent enough to include the WAN pipeline classes)
- transformers
- accelerate
- safetensors
- PIL/Pillow
- CUDA 11.8+ (or compatible version)
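
A quick sanity check that the installed environment satisfies these requirements:

```python
import torch
import diffusers

print("torch:", torch.__version__)
print("diffusers:", diffusers.__version__)
print("CUDA available:", torch.cuda.is_available())
print("CUDA build:", torch.version.cuda)
```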

## Performance Tips

1. **Memory Optimization**
   - Enable attention slicing and tiled VAE decoding for lower VRAM usage
   - Use `enable_model_cpu_offload()` on 24 GB GPUs
   - Reduce `num_frames` and `num_inference_steps` for faster generation

2. **Quality Optimization**
   - Use `guidance_scale` between 7.0 and 9.0 for best results
   - Higher `num_inference_steps` (50-75) improves quality but increases generation time
   - Experiment with different sampling schedulers (DDIM, DPM++, Euler); see the scheduler sketch after this list

3. **Speed Optimization**
   - Use fewer inference steps (25-30) for faster generation
   - Reduce frame count for shorter videos
   - Consider FP8 quantized variants for production deployment

4. **Prompt Engineering**
   - Include motion descriptions: "smooth movement", "slow pan", "camera tracking"
   - Specify lighting: "cinematic lighting", "natural light", "dramatic shadows"
   - Add quality tokens: "high quality", "detailed", "professional"
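
As mentioned in tip 2, schedulers can be swapped without reloading the model weights. A sketch using diffusers' `UniPCMultistepScheduler`; the `flow_shift` value is an assumption commonly suggested for 480p flow-matching models, so treat it as a starting point:

```python
from diffusers import UniPCMultistepScheduler

# Rebuild the scheduler from the pipeline's existing config;
# flow_shift trades stability against detail for flow-matching models
pipe.scheduler = UniPCMultistepScheduler.from_config(
    pipe.scheduler.config, flow_shift=3.0
)
```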

## Version Comparison

### WAN 2.1 Variants

| Variant | Precision | Size | VRAM | Use Case |
|---------|-----------|------|------|----------|
| **FP16 480p** (this) | FP16 | 31 GB | 32 GB+ | Research, archival quality |
| FP16 720p | FP16 | 31 GB | 40 GB+ | Maximum quality output |
| FP8 480p | FP8 | ~16 GB | 18 GB+ | Production, deployment |
| FP8 720p | FP8 | ~16 GB | 24 GB+ | Production, high quality |

### Precision Trade-offs

**FP16 Advantages**:
- Maximum generation quality
- Full numerical precision of the released weights
- No quantization artifacts
- Research standard

**FP16 Disadvantages**:
- Higher VRAM requirements (roughly 2x FP8)
- Larger file size (roughly 2x FP8)
- Slower inference than FP8 on GPUs with FP8 tensor-core support
- Higher deployment costs

### When to Use FP16 480p
- Research and development
- Quality benchmarking
- Archival/professional production
- GPU with 32 GB+ VRAM available
- Maximum quality requirements

### When to Consider Alternatives
- **FP8 variants**: Production deployment, VRAM constraints, batch processing
- **720p variants**: Higher resolution requirements
- **WAN 2.2**: Enhanced camera controls, quality improvements

## Compatibility

### Compatible Components
- **VAE**: WAN 2.1 VAE (separate download required)
- **LoRAs**: WAN 2.1 camera control LoRAs
  - Camera rotation (rank-16)
  - Arc shot (rank-16)
  - Drone shot (rank-16)
- **Frameworks**: diffusers, ComfyUI (with appropriate nodes)

### Camera Control LoRAs
This model is compatible with WAN 2.1 camera control LoRAs for cinematic effects:
- **Rotation**: Orbital camera movements around subjects
- **Arc Shot**: Smooth curved dolly movements
- **Drone**: Aerial and elevated perspectives

*Note: LoRAs are not included and must be downloaded separately.*
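
Once a LoRA is loaded (see the camera control example above), its strength can be tuned through diffusers' adapter API; a sketch, where the adapter name and the 0.8 weight are illustrative choices:

```python
# Name the adapter at load time so it can be addressed later
pipe.load_lora_weights(
    "path/to/wan21-camera-rotation-rank16-v1.safetensors",
    adapter_name="rotation",
)

# Scale the camera-motion effect (1.0 = full strength)
pipe.set_adapters(["rotation"], adapter_weights=[0.8])
```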

## License

This model uses a custom WAN license (`wan-license`). Review the official WAN license terms before use; they may differ from standard open-source licenses and may restrict commercial use, redistribution, or specific applications.

## Citation

If you use this model in your research or projects, please cite:

```bibtex
@software{wan21_i2v_480p_fp16,
  title={WAN 2.1 Image-to-Video 480p FP16},
  year={2024},
  note={14B parameter image-to-video diffusion model in full FP16 precision},
  url={https://huggingface.co/wan21-fp16-480p}
}
```

## Related Resources

### WAN Model Family
- **WAN 2.1 FP16 720p** - Higher resolution variant (31 GB, 40 GB+ VRAM)
- **WAN 2.1 FP8** - Quantized variants for efficient deployment (~50% smaller)
- **WAN 2.2** - Enhanced camera controls and quality improvements
- **WAN LightX2V** - CFG step distillation adapters for faster generation

### Additional Components
- **WAN 2.1 VAE** - Video variational autoencoder (243 MB, separate download)
- **Camera Control LoRAs** - Cinematic camera movement adapters (343 MB each)
- **Enhancement LoRAs** - Lighting, face quality, action improvements (WAN 2.2)

### Documentation
- [WAN Official Documentation](https://huggingface.co/docs/diffusers/api/pipelines/wan)
- [diffusers Library Documentation](https://huggingface.co/docs/diffusers)
- [Camera Control LoRA Guide](https://huggingface.co/wan-models)

## Troubleshooting

### Common Issues

**Out of Memory Errors**:
```python
# Enable memory optimizations (offloading gives the largest savings;
# do not call pipe.to("cuda") after enabling it)
pipe.enable_model_cpu_offload()
pipe.enable_attention_slicing()
pipe.vae.enable_tiling()

# Reduce generation parameters in the pipeline call
video = pipe(
    image=input_image,
    prompt="your prompt here",
    num_frames=16,           # instead of 24
    num_inference_steps=30,  # instead of 50
).frames[0]
```

**Slow Generation**:
- Reduce `num_inference_steps`
- Use fewer frames
- Disable CPU offload if you have sufficient VRAM
- Consider FP8 variants for faster inference

**Quality Issues**:
- Increase `num_inference_steps` (50-75)
- Adjust `guidance_scale` (try 7.0-9.0)
- Improve prompt quality and specificity
- Ensure the input image is high quality

## Best Practices

1. **Image Input**: Use high-quality input images (1024x1024 or higher)
2. **Prompts**: Be specific about motion, lighting, and camera movement
3. **Memory Management**: Monitor VRAM usage and enable optimizations as needed
4. **Experimentation**: Test different schedulers and parameters for your use case
5. **Responsible Use**: Follow ethical AI guidelines and license terms

## Technical Notes

### FP16 Precision Benefits
- **Numerical Accuracy**: Full 16-bit floating-point precision
- **Quality**: No quantization artifacts or edge cases
- **Compatibility**: Broad GPU and software ecosystem support
- **Research Standard**: Industry standard for development and benchmarking

### VRAM Optimization Techniques
```python
# Technique 1: attention slicing (modest VRAM reduction)
pipe.enable_attention_slicing()

# Technique 2: tiled VAE decoding (reduces decode-time VRAM;
# support varies by diffusers version)
pipe.vae.enable_tiling()

# Technique 3: model CPU offload (significant VRAM reduction, slower)
pipe.enable_model_cpu_offload()

# Technique 4: sequential CPU offload (maximum VRAM reduction, slowest).
# Use technique 3 or technique 4, not both.
pipe.enable_sequential_cpu_offload()
```

## Changelog

### v1.0 (Current)
- Initial release of WAN 2.1 I2V 480p FP16 model
- 14 billion parameters
- Full FP16 precision
- 480p resolution output
- Compatible with WAN 2.1 camera control LoRAs

---

**Model Version**: v1.0
**Last Updated**: 2024-08-12
**Maintained By**: WAN Model Team

For questions, issues, or contributions, please refer to the official WAN model repositories and community forums.

---

⚠️ **Important**: This is a high-precision model requiring significant computational resources. Ensure your hardware meets the minimum requirements before attempting to load and run this model. For production deployment or resource-constrained environments, consider the FP8 quantized variants.