---
license: other
license_name: wan-license
library_name: diffusers
pipeline_tag: image-to-video
tags:
- wan
- image-to-video
- video-generation
- wan21
- fp16
- 480p
- diffusion
- 14b-parameters
---

<!-- README Version: v1.2 -->

# WAN 2.1 FP16 480p - Image-to-Video Diffusion Model

High-fidelity 480p image-to-video generation model in full FP16 precision (14 billion parameters). Part of the WAN 2.1 model family for transforming static images into dynamic videos.

## Model Description

WAN 2.1 I2V 480p is a 14-billion-parameter transformer-based diffusion model that generates videos from static images. This FP16 variant preserves the full 16-bit precision of the released weights, making it well suited to research and high-quality video synthesis, while the 480p output resolution balances quality against computational cost.

**Key Capabilities**:
- Image-to-video generation with temporal coherence
- 480p resolution output (balanced quality/performance)
- Full FP16 precision (16-bit floating point)
- Compatible with camera control LoRAs for cinematic effects
- Optimized for research and professional production workflows

## Repository Contents

```
wan21-fp16-480p/
└── diffusion_models/
    └── wan/
        └── wan21-i2v-480p-14b-fp16.safetensors (31.0 GB)
```

**Total Repository Size**: 31.0 GB

### Model Files

| File | Size | Description |
|------|------|-------------|
| `wan21-i2v-480p-14b-fp16.safetensors` | 31.0 GB | WAN 2.1 I2V 480p diffusion model (14B parameters, FP16 precision) |
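
The SafeTensors header can be inspected without loading the 31 GB of weights, which is a quick way to verify the file's tensors and confirm they are stored in FP16. A minimal sketch using the `safetensors` library (the relative path is illustrative):

```python
from safetensors import safe_open

path = "diffusion_models/wan/wan21-i2v-480p-14b-fp16.safetensors"

# safe_open reads only the header, so this is fast even for a 31 GB file
with safe_open(path, framework="pt") as f:
    names = list(f.keys())
    print(f"{len(names)} tensors in checkpoint")
    for name in names[:3]:       # spot-check a few entries
        t = f.get_tensor(name)   # loads just this one tensor
        print(name, t.dtype, tuple(t.shape))
```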

## Hardware Requirements

### Minimum Requirements
- **VRAM**: 32 GB for straightforward inference (24 GB cards can work with the memory optimizations described below)
- **System RAM**: 32 GB
- **Disk Space**: 31 GB for the model file
- **GPU**: NVIDIA GPU with FP16 support and sufficient VRAM (e.g., RTX A6000)

### Recommended Requirements
- **VRAM**: 40 GB+ (for optimal performance and batch processing)
- **System RAM**: 64 GB
- **GPU**: High-end NVIDIA GPU (e.g., A6000 48 GB, A100 40/80 GB)
- **Storage**: SSD for faster model loading

### Performance Notes
- FP16 precision requires roughly twice the VRAM of the quantized FP8 variants
- On 24 GB GPUs, enable memory optimization techniques (CPU offload, attention slicing)
- For production deployment with lower VRAM, consider the FP8 quantized variants
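
Whether the active GPU meets these requirements can be checked directly through PyTorch before loading the model; a minimal sketch:

```python
import torch

if not torch.cuda.is_available():
    raise SystemExit("A CUDA-capable GPU is required")

props = torch.cuda.get_device_properties(0)
vram_gb = props.total_memory / 1024**3
print(f"{props.name}: {vram_gb:.1f} GiB VRAM")

if vram_gb < 32:
    print("Below the 32 GB minimum -- enable the memory optimizations shown below")
```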

## Usage Examples

### Basic Image-to-Video Generation

A transformer-only checkpoint such as this one cannot be turned into a complete pipeline by itself; the sketch below loads the transformer from the single file and takes the remaining components (VAE, text and image encoders) from a Diffusers-format WAN 2.1 repository. The repo id shown is one example of such a repository.

```python
import torch
from diffusers import WanImageToVideoPipeline, WanTransformer3DModel
from diffusers.utils import export_to_video, load_image

# Load the WAN 2.1 I2V 480p FP16 transformer from the single-file checkpoint
transformer = WanTransformer3DModel.from_single_file(
    "E:/huggingface/wan21-fp16-480p/diffusion_models/wan/wan21-i2v-480p-14b-fp16.safetensors",
    torch_dtype=torch.float16,
)

# Assemble the full pipeline around it
pipe = WanImageToVideoPipeline.from_pretrained(
    "Wan-AI/Wan2.1-I2V-14B-480P-Diffusers",
    transformer=transformer,
    torch_dtype=torch.float16,
)
pipe.to("cuda")

# Load input image
input_image = load_image("path/to/your/image.jpg")

# Generate video from image
video = pipe(
    image=input_image,
    prompt="smooth camera movement, cinematic lighting",
    num_frames=24,
    num_inference_steps=50,
    guidance_scale=7.5,
).frames[0]

# Export video
export_to_video(video, "output_video.mp4", fps=8)
```

### With Memory Optimization (for lower VRAM)

```python
import torch
from diffusers import WanImageToVideoPipeline, WanTransformer3DModel
from diffusers.utils import load_image

# Load the model as in the basic example above
transformer = WanTransformer3DModel.from_single_file(
    "E:/huggingface/wan21-fp16-480p/diffusion_models/wan/wan21-i2v-480p-14b-fp16.safetensors",
    torch_dtype=torch.float16,
)
pipe = WanImageToVideoPipeline.from_pretrained(
    "Wan-AI/Wan2.1-I2V-14B-480P-Diffusers",
    transformer=transformer,
    torch_dtype=torch.float16,
)

# Offload idle components to the CPU for much lower VRAM usage.
# Do not call pipe.to("cuda") afterwards -- that would move everything
# back onto the GPU and undo the savings.
pipe.enable_model_cpu_offload()

# Memory-efficient attention and tiled VAE decoding
# (support varies by pipeline and diffusers version)
pipe.enable_attention_slicing()
pipe.vae.enable_tiling()

input_image = load_image("path/to/your/image.jpg")

# Generate video with reduced settings
video = pipe(
    image=input_image,
    prompt="your prompt here",
    num_frames=16,           # fewer frames for lower memory
    num_inference_steps=30,  # fewer steps for faster generation
    guidance_scale=7.5,
).frames[0]
```

### With Camera Control LoRAs

```python
import torch
from diffusers import WanImageToVideoPipeline, WanTransformer3DModel
from diffusers.utils import export_to_video, load_image

# Load the base model as in the basic example above
transformer = WanTransformer3DModel.from_single_file(
    "E:/huggingface/wan21-fp16-480p/diffusion_models/wan/wan21-i2v-480p-14b-fp16.safetensors",
    torch_dtype=torch.float16,
)
pipe = WanImageToVideoPipeline.from_pretrained(
    "Wan-AI/Wan2.1-I2V-14B-480P-Diffusers",
    transformer=transformer,
    torch_dtype=torch.float16,
)
pipe.to("cuda")

# Load a camera control LoRA (requires separate download)
# Examples: rotation, arc shot, or drone camera movements
pipe.load_lora_weights("path/to/wan21-camera-rotation-rank16-v1.safetensors")

input_image = load_image("path/to/your/image.jpg")

# Generate with camera control
video = pipe(
    image=input_image,
    prompt="rotating camera around the subject, cinematic",
    num_frames=24,
    num_inference_steps=50,
    guidance_scale=7.5,
).frames[0]

export_to_video(video, "output_rotating.mp4", fps=8)
```

## Model Specifications

| Specification | Value |
|--------------|-------|
| **Architecture** | Transformer-based image-to-video diffusion model |
| **Parameters** | 14 billion |
| **Precision** | FP16 (16-bit floating point) |
| **Resolution** | 480p (video output) |
| **Format** | SafeTensors |
| **Model Size** | 31.0 GB |
| **Task** | Image-to-video generation |
| **Library** | diffusers |
| **Compatible LoRAs** | WAN 2.1 camera control LoRAs (rotation, arc shot, drone) |

### Technical Details
- **FP16 Format**: 1 sign bit, 5-bit exponent, 10-bit mantissa
- **Numerical Range**: ±65,504 (max value)
- **Precision**: ~3-4 decimal digits
- **Quality**: Full precision without quantization artifacts
- **Compatibility**: All modern PyTorch versions with CUDA support
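
These limits can be confirmed directly from PyTorch's type metadata; a small sketch:

```python
import torch

info = torch.finfo(torch.float16)
print(info.bits)  # 16
print(info.max)   # 65504.0 -- the ±65,504 range quoted above
print(info.eps)   # 0.0009765625 (2**-10) -- roughly 3-4 significant decimal digits
```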

## Installation

```bash
# Install required dependencies
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
pip install diffusers transformers accelerate safetensors pillow

# For video export
pip install opencv-python imageio imageio-ffmpeg
```

### Requirements
- Python 3.8+
- PyTorch 2.0+
- diffusers (a release recent enough to include the WAN pipeline classes)
- transformers
- accelerate
- safetensors
- PIL/Pillow
- CUDA 11.8+ (or compatible version)
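
A quick sanity check that the installed environment satisfies these requirements:

```python
import torch
import diffusers

print("torch:", torch.__version__)
print("diffusers:", diffusers.__version__)
print("CUDA available:", torch.cuda.is_available())
print("CUDA build:", torch.version.cuda)
```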

## Performance Tips

1. **Memory Optimization**
   - Enable attention slicing and tiled VAE decoding for lower VRAM usage
   - Use `enable_model_cpu_offload()` on 24 GB GPUs
   - Reduce `num_frames` and `num_inference_steps` for faster generation

2. **Quality Optimization**
   - Use `guidance_scale` between 7.0 and 9.0 for best results
   - Higher `num_inference_steps` (50-75) improves quality but increases generation time
   - Experiment with different sampling schedulers (DDIM, DPM++, Euler); see the scheduler sketch after this list

3. **Speed Optimization**
   - Use fewer inference steps (25-30) for faster generation
   - Reduce frame count for shorter videos
   - Consider FP8 quantized variants for production deployment

4. **Prompt Engineering**
   - Include motion descriptions: "smooth movement", "slow pan", "camera tracking"
   - Specify lighting: "cinematic lighting", "natural light", "dramatic shadows"
   - Add quality tokens: "high quality", "detailed", "professional"
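
As mentioned in tip 2, schedulers can be swapped without reloading the model weights. A sketch using diffusers' `UniPCMultistepScheduler`; the `flow_shift` value is an assumption commonly suggested for 480p flow-matching models, so treat it as a starting point:

```python
from diffusers import UniPCMultistepScheduler

# Rebuild the scheduler from the pipeline's existing config;
# flow_shift trades stability against detail for flow-matching models
pipe.scheduler = UniPCMultistepScheduler.from_config(
    pipe.scheduler.config, flow_shift=3.0
)
```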

## Version Comparison

### WAN 2.1 Variants

| Variant | Precision | Size | VRAM | Use Case |
|---------|-----------|------|------|----------|
| **FP16 480p** (this) | FP16 | 31 GB | 32 GB+ | Research, archival quality |
| FP16 720p | FP16 | 31 GB | 40 GB+ | Maximum quality output |
| FP8 480p | FP8 | ~16 GB | 18 GB+ | Production, deployment |
| FP8 720p | FP8 | ~16 GB | 24 GB+ | Production, high quality |

### Precision Trade-offs

**FP16 Advantages**:
- Maximum generation quality
- Full numerical precision of the released weights
- No quantization artifacts
- Research standard

**FP16 Disadvantages**:
- Higher VRAM requirements (roughly 2x FP8)
- Larger file size (roughly 2x FP8)
- Slower inference than FP8 on GPUs with FP8 tensor-core support
- Higher deployment costs

### When to Use FP16 480p
- Research and development
- Quality benchmarking
- Archival/professional production
- GPU with 32 GB+ VRAM available
- Maximum quality requirements

### When to Consider Alternatives
- **FP8 variants**: Production deployment, VRAM constraints, batch processing
- **720p variants**: Higher resolution requirements
- **WAN 2.2**: Enhanced camera controls, quality improvements

## Compatibility

### Compatible Components
- **VAE**: WAN 2.1 VAE (separate download required)
- **LoRAs**: WAN 2.1 camera control LoRAs
  - Camera rotation (rank-16)
  - Arc shot (rank-16)
  - Drone shot (rank-16)
- **Frameworks**: diffusers, ComfyUI (with appropriate nodes)

### Camera Control LoRAs
This model is compatible with WAN 2.1 camera control LoRAs for cinematic effects:
- **Rotation**: Orbital camera movements around subjects
- **Arc Shot**: Smooth curved dolly movements
- **Drone**: Aerial and elevated perspectives

*Note: LoRAs are not included and must be downloaded separately.*
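
Once a LoRA is loaded (see the camera control example above), its strength can be tuned through diffusers' adapter API; a sketch, where the adapter name and the 0.8 weight are illustrative choices:

```python
# Name the adapter at load time so it can be addressed later
pipe.load_lora_weights(
    "path/to/wan21-camera-rotation-rank16-v1.safetensors",
    adapter_name="rotation",
)

# Scale the camera-motion effect (1.0 = full strength)
pipe.set_adapters(["rotation"], adapter_weights=[0.8])
```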

## License

This model uses a custom WAN license (`wan-license`). Review the official WAN license terms before use; they may differ from standard open-source licenses and may restrict commercial use, redistribution, or specific applications.

## Citation

If you use this model in your research or projects, please cite:

```bibtex
@software{wan21_i2v_480p_fp16,
  title={WAN 2.1 Image-to-Video 480p FP16},
  year={2024},
  note={14B parameter image-to-video diffusion model in full FP16 precision},
  url={https://huggingface.co/wan21-fp16-480p}
}
```

## Related Resources

### WAN Model Family
- **WAN 2.1 FP16 720p** - Higher resolution variant (31 GB, 40 GB+ VRAM)
- **WAN 2.1 FP8** - Quantized variants for efficient deployment (~50% smaller)
- **WAN 2.2** - Enhanced camera controls and quality improvements
- **WAN LightX2V** - CFG step distillation adapters for faster generation

### Additional Components
- **WAN 2.1 VAE** - Video variational autoencoder (243 MB, separate download)
- **Camera Control LoRAs** - Cinematic camera movement adapters (343 MB each)
- **Enhancement LoRAs** - Lighting, face quality, action improvements (WAN 2.2)

### Documentation
- [WAN Official Documentation](https://huggingface.co/docs/diffusers/api/pipelines/wan)
- [diffusers Library Documentation](https://huggingface.co/docs/diffusers)
- [Camera Control LoRA Guide](https://huggingface.co/wan-models)

## Troubleshooting

### Common Issues

**Out of Memory Errors**:
```python
# Enable memory optimizations (offloading gives the largest savings;
# do not call pipe.to("cuda") after enabling it)
pipe.enable_model_cpu_offload()
pipe.enable_attention_slicing()
pipe.vae.enable_tiling()

# Reduce generation parameters in the pipeline call
video = pipe(
    image=input_image,
    prompt="your prompt here",
    num_frames=16,           # instead of 24
    num_inference_steps=30,  # instead of 50
).frames[0]
```

**Slow Generation**:
- Reduce `num_inference_steps`
- Use fewer frames
- Disable CPU offload if you have sufficient VRAM
- Consider FP8 variants for faster inference

**Quality Issues**:
- Increase `num_inference_steps` (50-75)
- Adjust `guidance_scale` (try 7.0-9.0)
- Improve prompt quality and specificity
- Ensure the input image is high quality

## Best Practices

1. **Image Input**: Use high-quality input images (1024x1024 or higher)
2. **Prompts**: Be specific about motion, lighting, and camera movement
3. **Memory Management**: Monitor VRAM usage and enable optimizations as needed
4. **Experimentation**: Test different schedulers and parameters for your use case
5. **Responsible Use**: Follow ethical AI guidelines and license terms

## Technical Notes

### FP16 Precision Benefits
- **Numerical Accuracy**: Full 16-bit floating-point precision
- **Quality**: No quantization artifacts or edge cases
- **Compatibility**: Broad GPU and software ecosystem support
- **Research Standard**: Industry standard for development and benchmarking

### VRAM Optimization Techniques
```python
# Technique 1: attention slicing (modest VRAM reduction)
pipe.enable_attention_slicing()

# Technique 2: tiled VAE decoding (reduces decode-time VRAM;
# support varies by diffusers version)
pipe.vae.enable_tiling()

# Technique 3: model CPU offload (significant VRAM reduction, slower)
pipe.enable_model_cpu_offload()

# Technique 4: sequential CPU offload (maximum VRAM reduction, slowest).
# Use technique 3 or technique 4, not both.
pipe.enable_sequential_cpu_offload()
```

## Changelog

### v1.0 (Current)
- Initial release of WAN 2.1 I2V 480p FP16 model
- 14 billion parameters
- Full FP16 precision
- 480p resolution output
- Compatible with WAN 2.1 camera control LoRAs

---

**Model Version**: v1.0
**Last Updated**: 2024-08-12
**Maintained By**: WAN Model Team

For questions, issues, or contributions, please refer to the official WAN model repositories and community forums.

---

⚠️ **Important**: This is a high-precision model requiring significant computational resources. Ensure your hardware meets the minimum requirements before attempting to load and run this model. For production deployment or resource-constrained environments, consider the FP8 quantized variants.