WAN 2.1 I2V 720p FP8 - High-Resolution Image-to-Video Model
This repository contains the WAN 2.1 Image-to-Video 720p model in FP8 E4M3FN precision for high-resolution video generation from static images. The FP8 quantization roughly halves the model size compared to FP16 while maintaining high-quality video output.
Model Description
WAN 2.1 I2V 720p FP8 is a 14-billion parameter transformer-based image-to-video model optimized for generating 720p resolution videos from input images. The FP8 E4M3FN quantization format reduces model size and VRAM requirements while preserving generation quality, making it suitable for deployment on consumer GPUs with 24GB+ VRAM.
Key Capabilities:
- Generate 720p resolution videos from static images
- Support for camera control LoRAs (rotation, arc shots, drone perspectives)
- FP8 quantization for efficient inference (~40% VRAM savings vs FP16)
- Compatible with diffusers library and standard image-to-video workflows
Repository Contents
Total Repository Size: ~16 GB
Model Files
diffusion_models/wan/
└── wan21-i2v-720p-14b-fp8-e4m3fn.safetensors 16 GB
Diffusion Model:
- File: diffusion_models/wan/wan21-i2v-720p-14b-fp8-e4m3fn.safetensors
- Size: 16 GB
- Precision: FP8 E4M3FN (8-bit floating point)
- Resolution: 720p video generation
- Parameters: 14 billion
- Architecture: Transformer-based image-to-video diffusion model
- Format: SafeTensors (secure, efficient)
Hardware Requirements
- VRAM: 24GB+ recommended for 720p generation
- Minimum: 20GB with optimizations (attention slicing, VAE slicing, CPU offload)
- Recommended: RTX 4090 (24GB), RTX A5000 (24GB), or higher
- Disk Space: 16 GB for model file
- System RAM: 32GB+ recommended for optimal performance
- GPU: NVIDIA GPU with FP8 tensor support preferred (see the capability check after this list)
- Best performance: Ada Lovelace (RTX 40 series) or Hopper architecture
- Compatible: Ampere (RTX 30 series) with automatic FP16 fallback
- Operating System: Windows 10/11, Linux (Ubuntu 20.04+)
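To check which tier a GPU falls into before downloading, you can query its compute capability with plain PyTorch. A minimal sketch; the capability-to-architecture mapping in the comments follows NVIDIA's standard numbering:

import torch

# Compute capability identifies the GPU architecture:
# 8.0/8.6 = Ampere (RTX 30 series), 8.9 = Ada Lovelace (RTX 40 series), 9.0 = Hopper.
major, minor = torch.cuda.get_device_capability(0)
props = torch.cuda.get_device_properties(0)
print(f"GPU: {props.name}")
print(f"Compute capability: {major}.{minor}")
print(f"Total VRAM: {props.total_memory / 2**30:.1f} GiB")

# Native FP8 tensor cores require Ada Lovelace (8.9) or Hopper (9.0) and newer
print(f"Native FP8 support: {(major, minor) >= (8, 9)}")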
FP8 Performance Benefits
- VRAM Usage: ~40% reduction at inference time compared to the FP16 variant
- Inference Speed: 1.5-2x faster on FP8-capable GPUs (RTX 40 series)
- Model Size: 50% smaller than FP16 (16 GB vs 32 GB), since each weight is stored in one byte instead of two (see the sketch below)
- Quality: >95% generation quality preservation vs FP16
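The size figure follows directly from the element widths of the two dtypes; the runtime saving is smaller because activations and upcast buffers still use 16-bit formats. A quick PyTorch illustration (the layer shape is arbitrary, chosen only for demonstration):

import torch

# FP16 stores 2 bytes per weight; FP8 E4M3FN stores 1 byte per weight
w_fp16 = torch.randn(4096, 4096, dtype=torch.float16)
w_fp8 = w_fp16.to(torch.float8_e4m3fn)

print(w_fp16.element_size())  # 2 bytes per element
print(w_fp8.element_size())   # 1 byte per element -> 50% smaller weight storage

# GPUs without native FP8 kernels upcast before compute, which is why
# runtime VRAM savings (~40%) are smaller than the on-disk savings (50%)
x = torch.randn(4096, 8, dtype=torch.bfloat16)
y = w_fp8.to(torch.bfloat16) @ x
print(y.shape)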
Usage
Basic Image-to-Video Generation
from diffusers import WanImageToVideoPipeline, WanTransformer3DModel
from diffusers.utils import export_to_video, load_image
import torch

# Load the FP8 transformer from the single-file checkpoint.
# Layerwise casting keeps weights stored in FP8 E4M3FN and upcasts to
# bfloat16 for compute (requires a recent diffusers release with Wan support).
transformer = WanTransformer3DModel.from_single_file(
    "E:/huggingface/wan21-fp8-720p/diffusion_models/wan/wan21-i2v-720p-14b-fp8-e4m3fn.safetensors",
    torch_dtype=torch.bfloat16,
)
transformer.enable_layerwise_casting(
    storage_dtype=torch.float8_e4m3fn, compute_dtype=torch.bfloat16
)

# The text encoder, image encoder, and VAE come from the base Wan 2.1 720p repository
pipe = WanImageToVideoPipeline.from_pretrained(
    "Wan-AI/Wan2.1-I2V-14B-720P-Diffusers",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

# Load input image
input_image = load_image("path/to/your/image.jpg")

# Generate 720p video from image
video_frames = pipe(
    image=input_image,
    prompt="smooth camera movement, cinematic lighting",
    num_frames=24,
    num_inference_steps=50,
    guidance_scale=7.5,
    height=720,
    width=1280,
).frames[0]

# Save video
export_to_video(video_frames, "output_720p.mp4", fps=8)
Memory-Optimized Generation
# Enable memory optimizations for ~20GB VRAM GPUs
pipe.enable_attention_slicing()
pipe.enable_vae_slicing()
# Optional: offload idle submodules to CPU (slower, but lowers peak VRAM;
# call this instead of pipe.to("cuda"))
# pipe.enable_model_cpu_offload()
# Generate with reduced memory footprint
video_frames = pipe(
image=input_image,
prompt="your prompt here",
num_frames=16, # Reduce frames for lower memory usage
num_inference_steps=30, # Reduce steps for faster generation
guidance_scale=7.5
).frames[0]
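To confirm the optimizations actually lower the footprint on your hardware, PyTorch can report the peak allocation after a run. This sketch reuses pipe and input_image from the snippets above:

# Reset the peak-memory counter, run one generation, then read the high-water mark
torch.cuda.reset_peak_memory_stats()
video_frames = pipe(
    image=input_image,
    prompt="your prompt here",
    num_frames=16,
    num_inference_steps=30,
    guidance_scale=7.5
).frames[0]
print(f"Peak VRAM: {torch.cuda.max_memory_allocated() / 2**30:.1f} GiB")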
Integration with Camera Control LoRAs
This model is compatible with WAN 2.1 camera control LoRAs (distributed separately):
# Load camera control LoRA (example - requires separate LoRA file)
pipe.load_lora_weights(
"path/to/wan21-camera-rotation-rank16-v1.safetensors"
)
# Generate with camera control
video_frames = pipe(
image=input_image,
prompt="rotating camera around the subject, 720p quality",
num_frames=24,
num_inference_steps=50,
guidance_scale=7.5
).frames[0]
Compatible LoRAs:
- wan21-camera-rotation-rank16-v1.safetensors - Orbital camera movements
- wan21-camera-arcshot-rank16-v1.safetensors - Curved dolly movements
- wan21-camera-drone-rank16-v1.safetensors - Aerial perspectives
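To blend a camera LoRA in at reduced strength, the generic diffusers adapter API should apply. A sketch, assuming the pipeline exposes the standard LoRA loader methods and using the rotation LoRA above:

# Register the LoRA under a name so its strength can be adjusted later
pipe.load_lora_weights(
    "path/to/wan21-camera-rotation-rank16-v1.safetensors",
    adapter_name="rotation",
)
# Scale the adapter down for a subtler camera move (1.0 = full strength)
pipe.set_adapters(["rotation"], adapter_weights=[0.8])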
Model Specifications
| Specification | Value |
|---|---|
| Model Name | WAN 2.1 I2V 720p FP8 |
| Parameters | 14 billion |
| Architecture | Transformer-based I2V diffusion model |
| Precision | FP8 E4M3FN (8-bit floating point) |
| Resolution | 720p (1280x720) |
| Format | SafeTensors |
| Task | Image-to-Video Generation |
| Library | diffusers |
| Quantization | Post-training quantization (PTQ) |
| Quality Loss | <5% compared to FP16 |
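The precision claim in the table can be checked against the checkpoint header without loading any weights, using the safetensors library. A sketch (the path follows this repository's layout; a few small tensors such as norms may remain in higher precision):

from safetensors import safe_open

path = "diffusion_models/wan/wan21-i2v-720p-14b-fp8-e4m3fn.safetensors"
with safe_open(path, framework="pt") as f:
    # Inspect one tensor's dtype and shape from the header only;
    # most large weight tensors should report F8_E4M3
    name = next(iter(f.keys()))
    tensor_slice = f.get_slice(name)
    print(name, tensor_slice.get_dtype(), tensor_slice.get_shape())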
Performance Tips
- GPU Selection: Best performance on RTX 4090, RTX 4080, or newer GPUs with native FP8 support
- Memory Optimization: Enable attention slicing and VAE slicing for 20-22GB VRAM GPUs
- Batch Size: Generate single videos sequentially to avoid VRAM exhaustion (see the sketch after this list)
- Frame Count: Start with 16-24 frames, increase if VRAM permits
- Inference Steps: 30-50 steps provide good quality; additional steps yield only marginal gains
- Guidance Scale: 7.0-8.5 works well; adjust based on prompt strength needed
- Mixed Precision: Model automatically falls back to FP16 on non-FP8 GPUs (VRAM usage increases)
- Resolution: This is the 720p variant - use 480p model for lower VRAM requirements
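Following the batch-size tip above, a simple loop generates clips one at a time and returns cached memory to the allocator between runs (the prompts are placeholders; pipe and input_image come from the Usage section):

import gc
import torch
from diffusers.utils import export_to_video

prompts = ["slow pan across the scene", "orbit around the subject"]
for i, prompt in enumerate(prompts):
    frames = pipe(image=input_image, prompt=prompt, num_frames=16).frames[0]
    export_to_video(frames, f"clip_{i}.mp4", fps=8)
    # Drop references and return cached blocks to the allocator between clips
    del frames
    gc.collect()
    torch.cuda.empty_cache()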
Installation
Prerequisites
# Python 3.8 or higher
python --version
# CUDA 11.8 or higher (for NVIDIA GPUs)
nvcc --version
Install Dependencies
# Install PyTorch with CUDA support (adjust CUDA version as needed)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# Install diffusers and dependencies
pip install diffusers transformers accelerate safetensors
# Optional: Install xformers for memory-efficient attention
pip install xformers
Verify Installation
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")
print(f"GPU: {torch.cuda.get_device_name(0)}")
Comparison with Other Variants
WAN 2.1 I2V 480p FP8 vs 720p FP8 (This Model)
| Feature | 480p FP8 | 720p FP8 (This Model) |
|---|---|---|
| Resolution | 854x480 | 1280x720 |
| Model Size | 16 GB | 16 GB |
| VRAM Requirement | 18GB+ | 24GB+ |
| Quality | Good | High |
| Speed | Faster | Moderate |
| Use Case | Fast iteration, previews | Final output, production |
FP8 vs FP16 (720p Models)
| Feature | FP8 (This Model) | FP16 |
|---|---|---|
| Model Size | 16 GB | 32 GB |
| VRAM Usage | ~18-22 GB | ~30-38 GB |
| Inference Speed | Faster (1.5-2x on RTX 40 series) | Baseline |
| Quality | >95% of FP16 | 100% (reference) |
| GPU Compatibility | RTX 40 series best, fallback on older | All NVIDIA GPUs |
Recommendation: Use FP8 for production deployment and efficient inference. Use FP16 only if you have 40GB+ VRAM and need maximum quality for research purposes.
License
This model is released under a custom WAN license. Please review the license terms before use:
- Commercial Use: Check official WAN license documentation
- Research Use: Generally permitted with attribution
- Redistribution: May have restrictions - consult license
- Ethical Use: Follow ethical AI guidelines and avoid generating harmful content
License Name: wan-license
License Type: other (proprietary/custom)
For detailed license information, refer to the official WAN model documentation.
Citation
If you use this model in your research or projects, please cite:
@software{wan21_i2v_720p_fp8,
title={WAN 2.1 Image-to-Video 720p FP8: High-Resolution Video Generation},
author={WAN Development Team},
year={2024},
note={FP8 quantized 14B parameter image-to-video diffusion model for 720p generation},
url={https://huggingface.co/models}
}
Troubleshooting
Out of Memory Errors
# Enable all memory optimizations
pipe.enable_attention_slicing(slice_size=1)
pipe.enable_vae_slicing()
pipe.enable_model_cpu_offload() # Offload to CPU when not in use
# Reduce generation parameters
num_frames = 16 # Instead of 24
num_inference_steps = 30 # Instead of 50
Slow Generation Speed
- Ensure FP8 support: Check GPU architecture (RTX 40 series recommended)
- Update drivers: Latest NVIDIA drivers improve FP8 performance
- Install xformers: pip install xformers for optimized attention
- Check PyTorch version: PyTorch 2.1+ required for FP8 support
Quality Issues
- Increase inference steps: 50+ steps for better quality
- Adjust guidance scale: Try 7.0-8.5 range
- Check input image quality: Higher quality inputs produce better outputs
- Verify model integrity: Ensure the download completed fully (the file should be ~16 GB; see the check below)
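A quick way to catch a truncated download is to compare the on-disk size against the published 16 GB (the path follows this repository's layout):

import os

path = "diffusion_models/wan/wan21-i2v-720p-14b-fp8-e4m3fn.safetensors"
size_gib = os.path.getsize(path) / 2**30
print(f"File size: {size_gib:.2f} GiB")
# A partially downloaded file will come in noticeably under the published size
assert size_gib > 14, "checkpoint appears truncated - re-download the file"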
Related Resources
- WAN 2.1 I2V 480p FP8 - Lower resolution variant for faster generation
- WAN 2.1 I2V FP16 - Full precision models for maximum quality
- WAN 2.2 Models - Next generation with enhanced controls
- WAN Camera Control LoRAs - Additional camera movement capabilities
- Official Documentation - Complete usage guides and API reference
Model Card Contact
For questions, issues, or feedback about the WAN 2.1 I2V 720p FP8 model:
- Official Website: Check WAN model official documentation
- Community Forum: Hugging Face model discussions
- Technical Issues: Report through official channels
- Research Inquiries: Contact WAN development team
Changelog
v1.0 (Current)
- Initial release of WAN 2.1 I2V 720p FP8 model
- 14B parameter transformer architecture
- FP8 E4M3FN quantization for efficiency
- 720p resolution support
- Compatible with camera control LoRAs
- SafeTensors format for security and efficiency
Version: v1.0
Last Updated: 2024-10
Model Type: Image-to-Video Diffusion Model
Resolution: 720p (1280x720)
Precision: FP8 E4M3FN
Size: 16 GB
Note: This is a quantized model optimized for efficient deployment on consumer GPUs. For maximum quality requirements, consider the FP16 variant. Please use responsibly and follow ethical AI guidelines.