---
license: other
license_name: wan-license
library_name: diffusers
pipeline_tag: image-to-video
tags:
- wan
- image-to-video
- video-generation
- wan21
- fp16
- 480p
- diffusion
- 14b-parameters
---
<!-- README Version: v1.2 -->
# WAN 2.1 FP16 480p - Image-to-Video Diffusion Model
High-fidelity 480p image-to-video generation model in full FP16 precision (14 billion parameters). Part of the WAN 2.1 model family for transforming static images into dynamic videos.
## Model Description
WAN 2.1 I2V 480p is a 14-billion parameter transformer-based diffusion model that generates videos from static images. This FP16 variant provides maximum numerical precision and generation quality for research and high-quality video synthesis applications. The 480p resolution offers a balanced approach between quality and computational requirements.
**Key Capabilities**:
- Image-to-video generation with temporal coherence
- 480p resolution output (balanced quality/performance)
- Full FP16 precision (16-bit floating point)
- Compatible with camera control LoRAs for cinematic effects
- Optimized for research and professional production workflows
## Repository Contents
```
wan21-fp16-480p/
└── diffusion_models/
    └── wan/
        └── wan21-i2v-480p-14b-fp16.safetensors (31.0 GB)
```
**Total Repository Size**: 31.0 GB
### Model Files
| File | Size | Description |
|------|------|-------------|
| `wan21-i2v-480p-14b-fp16.safetensors` | 31.0 GB | WAN 2.1 I2V 480p diffusion model (14B parameters, FP16 precision) |
## Hardware Requirements
### Minimum Requirements
- **VRAM**: 32 GB (for basic inference)
- **System RAM**: 32 GB
- **Disk Space**: 31 GB for model file
- **GPU**: NVIDIA GPU with FP16 support (RTX 3090, A6000, or better)
### Recommended Requirements
- **VRAM**: 40 GB+ (for optimal performance and batch processing)
- **System RAM**: 64 GB
- **GPU**: High-end NVIDIA GPU (RTX 4090, A6000, A100)
- **Storage**: SSD for faster model loading
### Performance Notes
- FP16 precision requires more VRAM than quantized variants (FP8)
- Enable memory optimization techniques for 24GB GPUs (gradient checkpointing, attention slicing)
- For production deployment with lower VRAM, consider FP8 quantized variants
## Usage Examples
### Basic Image-to-Video Generation
```python
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video
from PIL import Image
import torch

# Load the WAN 2.1 I2V 480p FP16 model
pipe = DiffusionPipeline.from_single_file(
    "E:/huggingface/wan21-fp16-480p/diffusion_models/wan/wan21-i2v-480p-14b-fp16.safetensors",
    torch_dtype=torch.float16,
    use_safetensors=True
)
pipe.to("cuda")

# Load input image
input_image = Image.open("path/to/your/image.jpg")

# Generate video from image
video = pipe(
    image=input_image,
    prompt="smooth camera movement, cinematic lighting",
    num_frames=24,
    num_inference_steps=50,
    guidance_scale=7.5
).frames[0]

# Export video
export_to_video(video, "output_video.mp4", fps=8)
```
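As a quick sanity check on the parameters above, the clip length is simply `num_frames / fps`:

```python
# With the settings above: 24 frames exported at 8 fps
num_frames, fps = 24, 8
duration_s = num_frames / fps
print(f"clip length: {duration_s:.1f} s")  # 3.0 s
```

Doubling `fps` halves the duration at the same frame count, so frame count and frame rate trade off against playback length.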
### With Memory Optimization (for lower VRAM)
```python
from diffusers import DiffusionPipeline
from PIL import Image
import torch

# Load model with memory optimizations
pipe = DiffusionPipeline.from_single_file(
    "E:/huggingface/wan21-fp16-480p/diffusion_models/wan/wan21-i2v-480p-14b-fp16.safetensors",
    torch_dtype=torch.float16,
    use_safetensors=True
)

# Enable memory-efficient attention and VAE slicing
pipe.enable_attention_slicing()
pipe.enable_vae_slicing()

# For even lower VRAM usage; offload manages device placement itself,
# so do not call pipe.to("cuda") afterwards
pipe.enable_model_cpu_offload()

# Load input image
input_image = Image.open("path/to/your/image.jpg")

# Generate video with optimizations
video = pipe(
    image=input_image,
    prompt="your prompt here",
    num_frames=16,           # Reduce frames for lower memory
    num_inference_steps=30,  # Fewer steps for faster generation
    guidance_scale=7.5
).frames[0]
```
### With Camera Control LoRAs
```python
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video
from PIL import Image
import torch

# Load base model
pipe = DiffusionPipeline.from_single_file(
    "E:/huggingface/wan21-fp16-480p/diffusion_models/wan/wan21-i2v-480p-14b-fp16.safetensors",
    torch_dtype=torch.float16,
    use_safetensors=True
)
pipe.to("cuda")

# Load camera control LoRA (requires separate download)
# Example: rotation, arc shot, or drone camera movements
pipe.load_lora_weights(
    "path/to/wan21-camera-rotation-rank16-v1.safetensors"
)

# Load input image
input_image = Image.open("path/to/your/image.jpg")

# Generate with camera control
video = pipe(
    image=input_image,
    prompt="rotating camera around the subject, cinematic",
    num_frames=24,
    num_inference_steps=50,
    guidance_scale=7.5
).frames[0]

export_to_video(video, "output_rotating.mp4", fps=8)
```
## Model Specifications
| Specification | Value |
|--------------|-------|
| **Architecture** | Transformer-based image-to-video diffusion model |
| **Parameters** | 14 billion |
| **Precision** | FP16 (16-bit floating point) |
| **Resolution** | 480p (video output) |
| **Format** | SafeTensors |
| **Model Size** | 31.0 GB |
| **Task** | Image-to-video generation |
| **Library** | diffusers |
| **Compatible LoRAs** | WAN 2.1 camera control LoRAs (rotation, arc shot, drone) |
### Technical Details
- **FP16 Format**: 1 sign bit, 5-bit exponent, 10-bit mantissa
- **Numerical Range**: ±65,504 (max value)
- **Precision**: ~3-4 decimal digits
- **Quality**: Full precision without quantization artifacts
- **Compatibility**: All modern PyTorch versions with CUDA support
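The FP16 properties listed above can be verified with Python's standard `struct` module, which supports the IEEE 754 half-precision format via the `'e'` format code:

```python
import struct

def fp16_roundtrip(x: float) -> float:
    """Encode a float to IEEE 754 half precision (1 sign, 5 exponent,
    10 mantissa bits) and decode it back to a Python float."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

# The largest finite FP16 value survives the round-trip exactly
print(fp16_roundtrip(65504.0))  # 65504.0

# ~3-4 significant decimal digits: 0.1 is not exactly representable
print(fp16_roundtrip(0.1))      # 0.0999755859375
```

Values above 65,504 raise `OverflowError` when packed, matching the numerical range stated above.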
## Installation
```bash
# Install required dependencies
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
pip install diffusers transformers accelerate safetensors pillow
# For video export
pip install opencv-python imageio imageio-ffmpeg
```
### Requirements
- Python 3.8+
- PyTorch 2.0+
- diffusers >= 0.21.0
- transformers
- accelerate
- safetensors
- PIL/Pillow
- CUDA 11.8+ (or compatible version)
## Performance Tips
1. **Memory Optimization**
- Enable `attention_slicing()` and `vae_slicing()` for lower VRAM usage
- Use `enable_model_cpu_offload()` for 24GB GPUs
- Reduce `num_frames` and `num_inference_steps` for faster generation
2. **Quality Optimization**
- Use `guidance_scale` between 7.0-9.0 for best results
- Higher `num_inference_steps` (50-75) improves quality but increases time
- Experiment with different sampling schedulers (DDIM, DPM++, Euler)
3. **Speed Optimization**
- Use fewer inference steps (25-30) for faster generation
- Reduce frame count for shorter videos
- Consider FP8 quantized variants for production deployment
4. **Prompt Engineering**
- Include motion descriptions: "smooth movement", "slow pan", "camera tracking"
- Specify lighting: "cinematic lighting", "natural light", "dramatic shadows"
- Add quality tokens: "high quality", "detailed", "professional"
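The prompt-engineering guidance above can be wrapped in a small helper. This is a hypothetical convenience function (`build_i2v_prompt` is not part of any WAN or diffusers API) that concatenates the recommended token categories:

```python
def build_i2v_prompt(subject,
                     motion="smooth camera movement",
                     lighting="cinematic lighting",
                     quality=("high quality", "detailed")):
    # Hypothetical helper: joins subject, motion, lighting, and
    # quality tokens into a single comma-separated prompt string
    parts = [subject, motion, lighting, *quality]
    return ", ".join(p for p in parts if p)

print(build_i2v_prompt("a sailboat at sunset"))
# a sailboat at sunset, smooth camera movement, cinematic lighting, high quality, detailed
```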
## Version Comparison
### WAN 2.1 Variants
| Variant | Precision | Size | VRAM | Use Case |
|---------|-----------|------|------|----------|
| **FP16 480p** (this) | FP16 | 31 GB | 32 GB+ | Research, archival quality |
| FP16 720p | FP16 | 31 GB | 40 GB+ | Maximum quality output |
| FP8 480p | FP8 | ~16 GB | 18 GB+ | Production, deployment |
| FP8 720p | FP8 | ~16 GB | 24 GB+ | Production, high quality |
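The sizes in the table follow directly from parameter count times bytes per weight. A back-of-the-envelope estimate (the on-disk file is somewhat larger due to metadata and any layers kept at higher precision):

```python
params = 14e9  # 14 billion parameters
for precision, bytes_per_param in {"fp16": 2, "fp8": 1}.items():
    weights_gb = params * bytes_per_param / 1e9
    print(f"{precision}: ~{weights_gb:.0f} GB of weights")
# fp16: ~28 GB of weights
# fp8: ~14 GB of weights
```

This is also why FP8 roughly halves both file size and weight VRAM relative to FP16.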
### Precision Trade-offs
**FP16 Advantages**:
- Maximum generation quality
- Full numerical precision
- No quantization artifacts
- Research standard
**FP16 Disadvantages**:
- Higher VRAM requirements (~2x vs FP8)
- Larger file size (~2x vs FP8)
- Slower inference than FP8 on GPUs with native FP8 tensor-core support
- Higher deployment costs
### When to Use FP16 480p
- Research and development
- Quality benchmarking
- Archival/professional production
- GPU with 32GB+ VRAM available
- Maximum quality requirements
### When to Consider Alternatives
- **FP8 variants**: Production deployment, VRAM constraints, batch processing
- **720p variants**: Higher resolution requirements
- **WAN 2.2**: Enhanced camera controls, quality improvements
## Compatibility
### Compatible Components
- **VAE**: WAN 2.1 VAE (separate download required)
- **LoRAs**: WAN 2.1 camera control LoRAs
- Camera rotation (rank-16)
- Arc shot (rank-16)
- Drone shot (rank-16)
- **Frameworks**: diffusers, ComfyUI (with appropriate nodes)
### Camera Control LoRAs
This model is compatible with WAN 2.1 camera control LoRAs for cinematic effects:
- **Rotation**: Orbital camera movements around subjects
- **Arc Shot**: Smooth curved dolly movements
- **Drone**: Aerial and elevated perspectives
*Note: LoRAs are not included and must be downloaded separately.*
## License
This model uses a custom WAN license (`wan-license`). Please review the official WAN license terms before use. This may differ from standard open-source licenses and may include restrictions on commercial use, redistribution, or specific applications.
## Citation
If you use this model in your research or projects, please cite:
```bibtex
@software{wan21_i2v_480p_fp16,
  title={WAN 2.1 Image-to-Video 480p FP16},
  year={2024},
  note={14B parameter image-to-video diffusion model in full FP16 precision},
  url={https://huggingface.co/wan21-fp16-480p}
}
```
## Related Resources
### WAN Model Family
- **WAN 2.1 FP16 720p** - Higher resolution variant (31 GB, 40 GB+ VRAM)
- **WAN 2.1 FP8** - Quantized variants for efficient deployment (~50% smaller)
- **WAN 2.2** - Enhanced camera controls and quality improvements
- **WAN LightX2V** - CFG step distillation adapters for faster generation
### Additional Components
- **WAN 2.1 VAE** - Video variational autoencoder (243 MB, separate download)
- **Camera Control LoRAs** - Cinematic camera movement adapters (343 MB each)
- **Enhancement LoRAs** - Lighting, face quality, action improvements (WAN 2.2)
### Documentation
- [WAN Official Documentation](https://huggingface.co/docs/diffusers/api/pipelines/wan)
- [diffusers Library Documentation](https://huggingface.co/docs/diffusers)
- [Camera Control LoRA Guide](https://huggingface.co/wan-models)
## Troubleshooting
### Common Issues
**Out of Memory Errors**:
```python
# Enable all memory optimizations
pipe.enable_attention_slicing()
pipe.enable_vae_slicing()
pipe.enable_model_cpu_offload()

# Reduce generation parameters
video = pipe(
    image=input_image,
    prompt="your prompt here",
    num_frames=16,           # instead of 24
    num_inference_steps=30,  # instead of 50
).frames[0]
```
**Slow Generation**:
- Reduce `num_inference_steps`
- Use fewer frames
- Disable CPU offload if you have sufficient VRAM
- Consider FP8 variants for faster inference
**Quality Issues**:
- Increase `num_inference_steps` (50-75)
- Adjust `guidance_scale` (try 7.0-9.0)
- Improve prompt quality and specificity
- Ensure input image is high quality
## Best Practices
1. **Image Input**: Use high-quality input images (1024x1024 or higher)
2. **Prompts**: Be specific about motion, lighting, and camera movement
3. **Memory Management**: Monitor VRAM usage and enable optimizations as needed
4. **Experimentation**: Test different schedulers and parameters for your use case
5. **Responsible Use**: Follow ethical AI guidelines and license terms
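As an illustration of points 1 and 3, a small hypothetical helper (`target_size` is not part of any WAN or diffusers API) for scaling an input image so its short side matches the 480p output, while snapping dimensions to a multiple of 16, a common constraint for video VAEs:

```python
def target_size(w, h, short_side=480, multiple=16):
    # Scale so the shorter side is ~short_side pixels,
    # rounding each dimension to the nearest multiple of 16
    scale = short_side / min(w, h)
    rw = int(round(w * scale / multiple)) * multiple
    rh = int(round(h * scale / multiple)) * multiple
    return rw, rh

print(target_size(1920, 1080))  # (848, 480)
print(target_size(1024, 1024))  # (480, 480)
```

The resulting size can be passed to `PIL.Image.resize` before handing the image to the pipeline.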
## Technical Notes
### FP16 Precision Benefits
- **Numerical Accuracy**: Full 16-bit floating point precision
- **Quality**: No quantization artifacts or edge cases
- **Compatibility**: Broad GPU and software ecosystem support
- **Research Standard**: Industry standard for development and benchmarking
### VRAM Optimization Techniques
```python
# Technique 1: Attention slicing (5-10% VRAM reduction)
pipe.enable_attention_slicing()
# Technique 2: VAE slicing (additional 5-10% VRAM reduction)
pipe.enable_vae_slicing()
# Technique 3: Model CPU offload (significant VRAM reduction, slower)
pipe.enable_model_cpu_offload()
# Technique 4: Sequential CPU offload (maximum VRAM reduction, slowest);
# use either technique 3 or technique 4, not both
pipe.enable_sequential_cpu_offload()
```
## Changelog
### v1.0 (Current)
- Initial release of WAN 2.1 I2V 480p FP16 model
- 14 billion parameters
- Full FP16 precision
- 480p resolution output
- Compatible with WAN 2.1 camera control LoRAs
---
**Model Version**: v1.0
**Last Updated**: 2024-08-12
**Maintained By**: WAN Model Team
For questions, issues, or contributions, please refer to the official WAN model repositories and community forums.
---
⚠️ **Important**: This is a high-precision model requiring significant computational resources. Ensure your hardware meets the minimum requirements before attempting to load and run this model. For production deployment or resource-constrained environments, consider the FP8 quantized variants.