WAN 2.1 I2V 720p FP8 - High-Resolution Image-to-Video Model

This repository contains the WAN 2.1 Image-to-Video 720p model in FP8 E4M3FN precision for high-resolution video generation from static images. FP8 quantization halves the model's on-disk size and cuts VRAM usage by roughly 40% compared to FP16 while maintaining high-quality video output.

Model Description

WAN 2.1 I2V 720p FP8 is a 14-billion parameter transformer-based image-to-video model optimized for generating 720p resolution videos from input images. The FP8 E4M3FN quantization format reduces model size and VRAM requirements while preserving generation quality, making it suitable for deployment on consumer GPUs with 24GB+ VRAM.

Key Capabilities:

  • Generate 720p resolution videos from static images
  • Support for camera control LoRAs (rotation, arc shots, drone perspectives)
  • FP8 quantization for efficient inference (~40% VRAM savings vs FP16)
  • Compatible with diffusers library and standard image-to-video workflows

Repository Contents

Total Repository Size: ~16 GB

Model Files

diffusion_models/wan/
└── wan21-i2v-720p-14b-fp8-e4m3fn.safetensors    16 GB

Diffusion Model:

  • File: diffusion_models/wan/wan21-i2v-720p-14b-fp8-e4m3fn.safetensors
  • Size: 16 GB
  • Precision: FP8 E4M3FN (8-bit floating point)
  • Resolution: 720p video generation
  • Parameters: 14 billion
  • Architecture: Transformer-based image-to-video diffusion model
  • Format: SafeTensors (secure, efficient)

Hardware Requirements

  • VRAM: 24GB+ recommended for 720p generation
    • Minimum: 20GB with optimizations (gradient checkpointing, attention slicing)
    • Recommended: RTX 4090 (24GB), RTX A5000 (24GB), or higher
  • Disk Space: 16 GB for model file
  • System RAM: 32GB+ recommended for optimal performance
  • GPU: NVIDIA GPU with FP8 tensor support preferred
    • Best performance: Ada Lovelace (RTX 40 series) or Hopper architecture
    • Compatible: Ampere (RTX 30 series) with automatic FP16 fallback
  • Operating System: Windows 10/11, Linux (Ubuntu 20.04+)

FP8 Performance Benefits

  • VRAM Usage: ~40% reduction compared to FP16 variant
  • Inference Speed: Up to 1.5-2x faster on FP8-capable GPUs (RTX 40 series)
  • Model Size: 50% smaller than FP16 (16GB vs 32GB)
  • Quality: >95% generation quality preservation vs FP16
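The size figures above follow directly from the parameter count. A back-of-the-envelope sketch (pure Python, using the numbers stated on this card, not measured values):

```python
# Approximate model-size estimate from the figures on this card.
PARAMS = 14e9  # 14 billion parameters

def model_size_gb(params: float, bytes_per_param: int) -> float:
    """Approximate raw weight size in GB (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9

fp8_gb = model_size_gb(PARAMS, 1)   # FP8: 1 byte per parameter
fp16_gb = model_size_gb(PARAMS, 2)  # FP16: 2 bytes per parameter

print(f"FP8:  ~{fp8_gb:.0f} GB raw weights")   # ~14 GB (~16 GB shipped with metadata/buffers)
print(f"FP16: ~{fp16_gb:.0f} GB raw weights")  # ~28 GB (~32 GB shipped)
print(f"Size reduction: {1 - fp8_gb / fp16_gb:.0%}")
```

The ~40% VRAM figure is smaller than the 50% weight reduction because activations, the VAE, and the text encoder are not stored in FP8.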

Usage

Basic Image-to-Video Generation

from diffusers import DiffusionPipeline
import torch
from PIL import Image

# Load the WAN 2.1 I2V 720p FP8 model
pipe = DiffusionPipeline.from_single_file(
    "E:/huggingface/wan21-fp8-720p/diffusion_models/wan/wan21-i2v-720p-14b-fp8-e4m3fn.safetensors",
    torch_dtype=torch.float8_e4m3fn,  # FP8 precision (requires PyTorch 2.1+)
    use_safetensors=True
)

pipe.to("cuda")

# Load input image
input_image = Image.open("path/to/your/image.jpg")

# Generate 720p video from image
video_frames = pipe(
    image=input_image,
    prompt="smooth camera movement, cinematic lighting",
    num_frames=24,
    num_inference_steps=50,
    guidance_scale=7.5,
    height=720,
    width=1280
).frames[0]

# Save video
from diffusers.utils import export_to_video
export_to_video(video_frames, "output_720p.mp4", fps=8)

Memory-Optimized Generation

# Enable memory optimizations for 20GB VRAM GPUs
pipe.enable_attention_slicing()
pipe.enable_vae_slicing()

# Optional: Enable gradient checkpointing (slower but uses less memory).
# Note: since this is a transformer-based model, the denoiser attribute is
# likely `transformer` rather than `unet`:
# pipe.transformer.enable_gradient_checkpointing()

# Generate with reduced memory footprint
video_frames = pipe(
    image=input_image,
    prompt="your prompt here",
    num_frames=16,  # Reduce frames for lower memory usage
    num_inference_steps=30,  # Reduce steps for faster generation
    guidance_scale=7.5
).frames[0]

Integration with Camera Control LoRAs

This model is compatible with WAN 2.1 camera control LoRAs (distributed separately):

# Load camera control LoRA (example - requires separate LoRA file)
pipe.load_lora_weights(
    "path/to/wan21-camera-rotation-rank16-v1.safetensors"
)

# Generate with camera control
video_frames = pipe(
    image=input_image,
    prompt="rotating camera around the subject, 720p quality",
    num_frames=24,
    num_inference_steps=50,
    guidance_scale=7.5
).frames[0]

Compatible LoRAs:

  • wan21-camera-rotation-rank16-v1.safetensors - Orbital camera movements
  • wan21-camera-arcshot-rank16-v1.safetensors - Curved dolly movements
  • wan21-camera-drone-rank16-v1.safetensors - Aerial perspectives

Model Specifications

Specification    Value
Model Name       WAN 2.1 I2V 720p FP8
Parameters       14 billion
Architecture     Transformer-based I2V diffusion model
Precision        FP8 E4M3FN (8-bit floating point)
Resolution       720p (1280x720)
Format           SafeTensors
Task             Image-to-Video Generation
Library          diffusers
Quantization     Post-training quantization (PTQ)
Quality Loss     <5% compared to FP16

Performance Tips

  1. GPU Selection: Best performance on RTX 4090, RTX 4080, or newer GPUs with native FP8 support
  2. Memory Optimization: Enable attention slicing and VAE slicing for 20-22GB VRAM GPUs
  3. Batch Size: Generate single videos sequentially to avoid VRAM exhaustion
  4. Frame Count: Start with 16-24 frames, increase if VRAM permits
  5. Inference Steps: 30-50 steps provide good quality; higher steps improve quality marginally
  6. Guidance Scale: 7.0-8.5 works well; adjust based on prompt strength needed
  7. Mixed Precision: Model automatically falls back to FP16 on non-FP8 GPUs (VRAM usage increases)
  8. Resolution: This is the 720p variant - use 480p model for lower VRAM requirements

Installation

Prerequisites

# Python 3.8 or higher
python --version

# CUDA 11.8 or higher (for NVIDIA GPUs)
nvcc --version

Install Dependencies

# Install PyTorch with CUDA support (adjust CUDA version as needed)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install diffusers and dependencies
pip install diffusers transformers accelerate safetensors

# Optional: Install xformers for memory-efficient attention
pip install xformers

Verify Installation

import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")
print(f"GPU: {torch.cuda.get_device_name(0)}")
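To check whether the current GPU can use FP8 natively (rather than the FP16 fallback mentioned under Hardware Requirements), you can inspect the CUDA compute capability: Ada Lovelace reports (8, 9) and Hopper (9, 0), while Ampere reports (8, 6). The helper below is a sketch based on those published capability numbers, not an official API:

```python
def supports_native_fp8(major: int, minor: int) -> bool:
    """True if the compute capability indicates native FP8 tensor support.

    Ada Lovelace (RTX 40 series) reports (8, 9); Hopper reports (9, 0).
    Ampere (RTX 30 series, (8, 6)) lacks FP8 units and falls back to FP16.
    """
    return (major, minor) >= (8, 9)

try:
    import torch
    if torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        if supports_native_fp8(*cap):
            print(f"Compute capability {cap}: native FP8 available")
        else:
            print(f"Compute capability {cap}: expect FP16 fallback")
except ImportError:
    pass  # torch not installed; capability check skipped
```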

Comparison with Other Variants

WAN 2.1 I2V 480p FP8 vs 720p FP8 (This Model)

Feature             480p FP8                   720p FP8 (This Model)
Resolution          854x480                    1280x720
Model Size          16 GB                      16 GB
VRAM Requirement    18GB+                      24GB+
Quality             Good                       High
Speed               Faster                     Moderate
Use Case            Fast iteration, previews   Final output, production
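The VRAM gap between the two variants tracks the pixel count: 1280x720 has roughly 2.25x the pixels of 854x480, so activations and latents grow proportionally even though the weight file is the same 16 GB. A quick check:

```python
# Pixel-count comparison between the two resolution variants.
px_480p = 854 * 480    # pixels per frame at 480p
px_720p = 1280 * 720   # pixels per frame at 720p

ratio = px_720p / px_480p
print(f"720p has {ratio:.2f}x the pixels of 480p")  # ~2.25x
```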

FP8 vs FP16 (720p Models)

Feature             FP8 (This Model)                             FP16
Model Size          16 GB                                        32 GB
VRAM Usage          ~18-22 GB                                    ~30-38 GB
Inference Speed     Faster (1.5-2x on RTX 40 series)             Baseline
Quality             >95% of FP16                                 100% (reference)
GPU Compatibility   RTX 40 series best, FP16 fallback on older   All NVIDIA GPUs

Recommendation: Use FP8 for production deployment and efficient inference. Use FP16 only if you have 40GB+ VRAM and need maximum quality for research purposes.

License

This model is released under a custom WAN license. Please review the license terms before use:

  • Commercial Use: Check official WAN license documentation
  • Research Use: Generally permitted with attribution
  • Redistribution: May have restrictions - consult license
  • Ethical Use: Follow ethical AI guidelines and avoid generating harmful content

License Name: wan-license
License Type: other (proprietary/custom)

For detailed license information, refer to the official WAN model documentation.

Citation

If you use this model in your research or projects, please cite:

@software{wan21_i2v_720p_fp8,
  title={WAN 2.1 Image-to-Video 720p FP8: High-Resolution Video Generation},
  author={WAN Development Team},
  year={2024},
  note={FP8 quantized 14B parameter image-to-video diffusion model for 720p generation},
  url={https://huggingface.co/models}
}

Troubleshooting

Out of Memory Errors

# Enable all memory optimizations
pipe.enable_attention_slicing(slice_size=1)
pipe.enable_vae_slicing()
pipe.enable_model_cpu_offload()  # Offload to CPU when not in use

# Reduce generation parameters
num_frames = 16  # Instead of 24
num_inference_steps = 30  # Instead of 50

Slow Generation Speed

  • Ensure FP8 support: Check GPU architecture (RTX 40 series recommended)
  • Update drivers: Latest NVIDIA drivers improve FP8 performance
  • Install xformers: pip install xformers for optimized attention
  • Check PyTorch version: PyTorch 2.1+ required for FP8 support

Quality Issues

  • Increase inference steps: 50+ steps for better quality
  • Adjust guidance scale: Try 7.0-8.5 range
  • Check input image quality: Higher quality inputs produce better outputs
  • Verify model integrity: Ensure the download completed and the file size matches the ~16 GB listed on the repository page
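A quick integrity check is to compare the downloaded file's size against the expected byte count. This sketch uses an illustrative path and an approximate size; substitute the exact values shown on the file's repository page:

```python
import os

# Illustrative values -- replace with your local path and the exact size
# listed on the repository page for the safetensors file.
MODEL_PATH = "diffusion_models/wan/wan21-i2v-720p-14b-fp8-e4m3fn.safetensors"
EXPECTED_BYTES = 16 * 10**9  # ~16 GB

def download_looks_complete(path: str, expected: int, tolerance: float = 0.01) -> bool:
    """Check that the file exists and its size is within `tolerance` of expected."""
    if not os.path.isfile(path):
        return False
    size = os.path.getsize(path)
    return abs(size - expected) <= expected * tolerance

if __name__ == "__main__":
    ok = download_looks_complete(MODEL_PATH, EXPECTED_BYTES)
    print("Download looks complete" if ok else "File missing or truncated -- re-download")
```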

Related Resources

  • WAN 2.1 I2V 480p FP8 - Lower resolution variant for faster generation
  • WAN 2.1 I2V FP16 - Full precision models for maximum quality
  • WAN 2.2 Models - Next generation with enhanced controls
  • WAN Camera Control LoRAs - Additional camera movement capabilities
  • Official Documentation - Complete usage guides and API reference

Model Card Contact

For questions, issues, or feedback about the WAN 2.1 I2V 720p FP8 model:

  • Official Website: Check WAN model official documentation
  • Community Forum: Hugging Face model discussions
  • Technical Issues: Report through official channels
  • Research Inquiries: Contact WAN development team

Changelog

v1.0 (Current)

  • Initial release of WAN 2.1 I2V 720p FP8 model
  • 14B parameter transformer architecture
  • FP8 E4M3FN quantization for efficiency
  • 720p resolution support
  • Compatible with camera control LoRAs
  • SafeTensors format for security and efficiency

Version: v1.0
Last Updated: 2024-10
Model Type: Image-to-Video Diffusion Model
Resolution: 720p (1280x720)
Precision: FP8 E4M3FN
Size: 16 GB

Note: This is a quantized model optimized for efficient deployment on consumer GPUs. For maximum quality requirements, consider the FP16 variant. Please use responsibly and follow ethical AI guidelines.
