	---
	base_model:
	- Wan-AI/Wan2.1-T2V-14B
	- Wan-AI/Wan2.1-I2V-14B-480P
	- Wan-AI/Wan2.1-I2V-14B-720P
	library_name: diffusers
	license: apache-2.0
	pipeline_tag: text-to-video
	tags:
	- diffusion-single-file
	- comfyui
	- distillation
	- lora
	- video
	- video generation
	---

	<div align="center">

	# 🎬 Wan2.1 Distilled Models

	### ⚡ High-Performance Video Generation with 4-Step Inference

	Distillation-accelerated versions of Wan2.1, based on the paper [SGMD: Score Gradient Matching Distillation for Few-Step Video Diffusion Distillation](https://huggingface.co/papers/2605.30116). They are dramatically faster than the original models while maintaining exceptional quality.

	![image/png](https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/gXhUuWyuJpxOwGf5GQ49r.png)

	---

	[![🤗 HuggingFace](https://img.shields.io/badge/🤗-HuggingFace-yellow)](https://huggingface.co/lightx2v/Wan2.1-Distill-Models)
	[![GitHub](https://img.shields.io/badge/GitHub-LightX2V-blue?logo=github)](https://github.com/ModelTC/LightX2V)
	[![License](https://img.shields.io/badge/License-Apache%202.0-green.svg)](LICENSE)

	</div>

	---

	## 🌟 What's Special?

	<table>
	<tr>
	<td width="50%">

	### ⚡ Ultra-Fast Generation
	- 4-step inference (vs traditional 50+ steps)
	- Up to ~2x faster than ComfyUI when run with LightX2V
	- Real-time video generation capability

	</td>
	<td width="50%">

	### 🎯 Flexible Options
	- Multiple resolutions (480P/720P)
	- Various precision formats (BF16/FP8/INT8)
	- I2V and T2V support

	</td>
	</tr>
	<tr>
	<td width="50%">

	### 💾 Memory Efficient
	- FP8/INT8: ~50% smaller than BF16
	- CPU offload support
	- Optimized for consumer GPUs

	</td>
	<td width="50%">

	### 🔧 Easy Integration
	- Compatible with LightX2V framework
	- ComfyUI support available
	- Simple configuration files

	</td>
	</tr>
	</table>
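
As a back-of-the-envelope illustration of the 4-step claim, the sketch below counts denoiser (DiT) forward passes per video. It assumes every step costs the same and that both samplers run the same number of passes per step (two with classifier-free guidance, one without). Text encoding and VAE decoding are ignored, so the end-to-end speedup is smaller than this ratio.

```python
def dit_forward_passes(steps: int, use_cfg: bool = True) -> int:
    """DiT forward passes needed to sample one video.

    With classifier-free guidance (CFG) each step runs the model twice
    (conditional and unconditional prediction); without CFG, once.
    """
    return steps * (2 if use_cfg else 1)


baseline = dit_forward_passes(50)  # conventional 50-step sampler
distilled = dit_forward_passes(4)  # 4-step distilled model
print(f"{baseline} vs {distilled} passes: {baseline / distilled:.1f}x fewer")
# 100 vs 8 passes: 12.5x fewer
```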

	---

	## 📦 Model Catalog

	### 🎥 Model Types

	<table>
	<tr>
	<td align="center" width="50%">

	#### 🖼️ Image-to-Video (I2V)
	Transform still images into dynamic videos
	- 📺 480P Resolution
	- 🎬 720P Resolution

	</td>
	<td align="center" width="50%">

	#### 📝 Text-to-Video (T2V)
	Generate videos from text descriptions
	- 🚀 14B Parameters
	- 🎨 High-quality synthesis

	</td>
	</tr>
	</table>

	### 🎯 Precision Variants

	| Precision | Model Identifier | Model Size | Framework | Quality vs Speed |
	|:---------:|:-----------------|:----------:|:---------:|:-----------------|
	| 🏆 BF16 | `lightx2v_4step` | ~28-32 GB | LightX2V | ⭐⭐⭐⭐⭐ Highest quality |
	| ⚡ FP8 | `scaled_fp8_e4m3_lightx2v_4step` | ~15-17 GB | LightX2V | ⭐⭐⭐⭐ Excellent balance |
	| 🎯 INT8 | `int8_lightx2v_4step` | ~15-17 GB | LightX2V | ⭐⭐⭐⭐ Fast & efficient |
	| 🔷 FP8 ComfyUI | `scaled_fp8_e4m3_lightx2v_4step_comfyui` | ~15-17 GB | ComfyUI | ⭐⭐⭐ ComfyUI ready |
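
The sizes in the table follow roughly from bytes per parameter: BF16 stores 2 bytes per weight, while FP8 and INT8 store 1. A minimal estimate, assuming about 14 billion DiT parameters and counting 1 GB as 10^9 bytes:

```python
# Bytes per stored weight for each precision variant.
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1, "int8": 1}


def checkpoint_size_gb(num_params: float, precision: str) -> float:
    """Rough checkpoint size in GB (10**9 bytes), ignoring metadata and scales."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


for precision in ("bf16", "fp8", "int8"):
    print(precision, checkpoint_size_gb(14e9, precision))
# bf16 28.0
# fp8 14.0
# int8 14.0
```

The published FP8/INT8 files are somewhat larger than this estimate, presumably because some layers or quantization scales are kept in higher precision; treat the numbers as an approximation only.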

	### 📝 Naming Convention

	```bash
	# Pattern: wan2.1_{task}_{resolution}_{model_identifier}.safetensors
	# (T2V files use the model size, 14b, in place of the resolution)

	# Examples:
	wan2.1_i2v_720p_lightx2v_4step.safetensors # 720P I2V - BF16
	wan2.1_i2v_720p_scaled_fp8_e4m3_lightx2v_4step.safetensors # 720P I2V - FP8
	wan2.1_i2v_480p_int8_lightx2v_4step.safetensors # 480P I2V - INT8
	wan2.1_t2v_14b_scaled_fp8_e4m3_lightx2v_4step_comfyui.safetensors # T2V - FP8 ComfyUI
	```
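
A small helper can build and parse file names that follow this convention. This is a hypothetical sketch: `build_filename` and `parse_filename` are not part of LightX2V, and the regular expression only accepts the model identifiers listed in the precision table.

```python
import re

# Model identifiers from the precision table (the suffix after task and resolution/size).
IDENTIFIERS = (
    "lightx2v_4step",
    "scaled_fp8_e4m3_lightx2v_4step",
    "int8_lightx2v_4step",
    "scaled_fp8_e4m3_lightx2v_4step_comfyui",
)

# T2V files carry the model size ("14b") where I2V files carry a resolution.
_PATTERN = re.compile(
    r"^wan2\.1_(?P<task>i2v|t2v)_(?P<variant>480p|720p|14b)_(?P<identifier>.+)\.safetensors$"
)


def build_filename(task: str, variant: str, identifier: str) -> str:
    """Assemble a checkpoint file name from its three components."""
    if identifier not in IDENTIFIERS:
        raise ValueError(f"unknown model identifier: {identifier}")
    return f"wan2.1_{task}_{variant}_{identifier}.safetensors"


def parse_filename(name: str) -> dict:
    """Split a checkpoint file name into task, variant, and model identifier."""
    match = _PATTERN.match(name)
    if match is None or match["identifier"] not in IDENTIFIERS:
        raise ValueError(f"not a recognized Wan2.1 distill file name: {name}")
    return match.groupdict()
```

For example, `parse_filename("wan2.1_i2v_480p_int8_lightx2v_4step.safetensors")` yields the task `i2v`, the variant `480p`, and the identifier `int8_lightx2v_4step`.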

	> 💡 Explore all models: [Browse Full Model Collection →](https://huggingface.co/lightx2v/Wan2.1-Distill-Models/tree/main)

	## 🚀 Usage

	We recommend LightX2V, a high-performance inference framework optimized for these models. It runs approximately 2x faster than ComfyUI and offers better quantization accuracy.

	### Python Sample Usage

	```python
	from lightx2v import LightX2VPipeline

	# Initialize the pipeline for the Wan2.1 I2V task.
	# The model directory must also provide the T5, CLIP, and VAE components
	# (see Important Notes below).
	pipe = LightX2VPipeline(
	    model_path="lightx2v/Wan2.1-Distill-Models",
	    model_cls="wan2.1",
	    task="i2v",
	)

	# Enable offloading to reduce VRAM usage (suitable for consumer GPUs)
	pipe.enable_offload(
	    cpu_offload=True,
	    offload_granularity="block",  # offload at transformer-block granularity
	    text_encoder_offload=True,
	    image_encoder_offload=False,
	    vae_offload=False,
	)

	# Create a generator for 4-step distilled inference
	pipe.create_generator(
	    attn_mode="sage_attn2",
	    infer_steps=4,  # the models are distilled for 4 sampling steps
	    height=480,
	    width=832,
	    num_frames=81,
	    guidance_scale=5.0,
	    sample_shift=5.0,
	)

	# Generate a video from the input image and prompt
	pipe.generate(
	    seed=42,
	    image_path="path/to/image.jpg",
	    prompt="Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard.",
	    negative_prompt="shaking camera, low quality, static",
	    save_result_path="output.mp4",
	)
	```
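
To illustrate what `infer_steps=4` and `sample_shift=5.0` control, the sketch below computes a shifted flow-matching noise schedule using the timestep-shift formula from Wan2.1's reference samplers, sigma' = shift * sigma / (1 + (shift - 1) * sigma). Whether LightX2V's distilled runner applies exactly this schedule is an assumption; its distill configs may use their own fixed list of denoising steps.

```python
def shifted_sigmas(infer_steps: int, shift: float) -> list[float]:
    """Noise level at the start of each step, from pure noise (1.0) toward clean data."""
    # Evenly spaced sigmas 1.0, 1 - 1/n, ..., 1/n (the final 0.0 is the clean sample).
    sigmas = [1.0 - i / infer_steps for i in range(infer_steps)]
    # Timestep shift: a larger shift spends more of the few steps at high noise levels.
    return [shift * s / (1.0 + (shift - 1.0) * s) for s in sigmas]


print([round(s, 4) for s in shifted_sigmas(4, 5.0)])  # [1.0, 0.9375, 0.8333, 0.625]
print([round(s, 4) for s in shifted_sigmas(4, 1.0)])  # [1.0, 0.75, 0.5, 0.25]
```

With `shift=1.0` the schedule is unchanged; with `shift=5.0` all four steps stay at comparatively high noise levels, which is where most of the video's structure is decided.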

	### Quick Start (CLI)

	1. Download the model (720P I2V FP8 example)
	```bash
	huggingface-cli download lightx2v/Wan2.1-Distill-Models \
	    --local-dir ./models/wan2.1_i2v_720p \
	    --include "wan2.1_i2v_720p_scaled_fp8_e4m3_lightx2v_4step.safetensors"
	```

	2. Clone the LightX2V repository

	```bash
	git clone https://github.com/ModelTC/LightX2V.git
	cd LightX2V
	```

	3. Install dependencies

	```bash
	pip install -r requirements.txt
	```
	Alternatively, see the [Quick Start Documentation](https://github.com/ModelTC/LightX2V/blob/main/docs/EN/source/getting_started/quickstart.md) to run LightX2V with Docker.

	4. Select and modify a configuration file

	Choose the appropriate configuration based on your GPU memory:

	For GPUs with 80 GB+ of memory (A100/H100):
	- I2V: [wan_i2v_distill_4step_cfg.json](https://github.com/ModelTC/LightX2V/blob/main/configs/distill/wan_i2v_distill_4step_cfg.json)
	- T2V: [wan_t2v_distill_4step_cfg.json](https://github.com/ModelTC/LightX2V/blob/main/configs/distill/wan_t2v_distill_4step_cfg.json)

	For GPUs with 24 GB+ of memory (RTX 4090):
	- I2V: [wan_i2v_distill_4step_cfg_4090.json](https://github.com/ModelTC/LightX2V/blob/main/configs/distill/wan_i2v_distill_4step_cfg_4090.json)
	- T2V: [wan_t2v_distill_4step_cfg_4090.json](https://github.com/ModelTC/LightX2V/blob/main/configs/distill/wan_t2v_distill_4step_cfg_4090.json)


	5. Run inference (first make sure the script's paths point to your LightX2V checkout and model directory)
	```bash
	cd scripts
	bash wan/run_wan_i2v_distill_4step_cfg.sh
	```
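
Steps 1 and 4 can also be scripted from Python. A minimal sketch, assuming the `huggingface_hub` package is installed (its `hf_hub_download` function fetches a single file from a repository). The helper names are hypothetical, the memory thresholds mirror the two tiers listed in step 4, and the config paths are relative to the LightX2V repository root.

```python
import json
from pathlib import Path

REPO_ID = "lightx2v/Wan2.1-Distill-Models"
CONFIG_DIR = Path("configs/distill")  # relative to the LightX2V repository root


def download_kwargs(task: str, variant: str, identifier: str, local_dir: str) -> dict:
    """Arguments for huggingface_hub.hf_hub_download, mirroring the CLI command in step 1."""
    filename = f"wan2.1_{task}_{variant}_{identifier}.safetensors"
    return {"repo_id": REPO_ID, "filename": filename, "local_dir": local_dir}


def download_checkpoint(task: str, variant: str, identifier: str, local_dir: str) -> str:
    """Download one distilled DiT checkpoint and return its local path."""
    from huggingface_hub import hf_hub_download  # lazy import: only needed to download

    return hf_hub_download(**download_kwargs(task, variant, identifier, local_dir))


def select_config(task: str, gpu_memory_gb: float) -> Path:
    """Pick the 4-step distill config for a task ("i2v" or "t2v") and GPU memory size."""
    if task not in ("i2v", "t2v"):
        raise ValueError(f"unsupported task: {task}")
    if gpu_memory_gb >= 80:
        name = f"wan_{task}_distill_4step_cfg.json"
    elif gpu_memory_gb >= 24:
        name = f"wan_{task}_distill_4step_cfg_4090.json"
    else:
        raise ValueError("no config is listed for GPUs with less than 24 GB of memory")
    return CONFIG_DIR / name


def write_config_with_overrides(src: Path, dst: Path, overrides: dict) -> dict:
    """Copy a JSON config to dst with some top-level keys replaced; return the result."""
    config = json.loads(Path(src).read_text())
    config.update(overrides)
    Path(dst).write_text(json.dumps(config, indent=4))
    return config
```

For example, `download_checkpoint("i2v", "720p", "scaled_fp8_e4m3_lightx2v_4step", "./models/wan2.1_i2v_720p")` fetches the same file as the CLI command in step 1, and `select_config("i2v", 24)` returns the RTX 4090 I2V config path. Check which keys a config actually contains before passing overrides.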

	#### Documentation
	- Quick Start Guide: [LightX2V Quick Start](https://github.com/ModelTC/LightX2V/blob/main/docs/EN/source/getting_started/quickstart.md)
	- Model Structure Guide: [LightX2V Model Structure Documentation](https://github.com/ModelTC/LightX2V/blob/main/docs/EN/source/getting_started/model_structure.md)
	- Configuration Guide: [Configuration Files](https://github.com/ModelTC/LightX2V/tree/main/configs/distill)
	- Quantization Usage: [Quantization Documentation](https://github.com/ModelTC/LightX2V/blob/main/docs/EN/source/method_tutorials/quantization.md)
	- Parameter Offload: [Offload Documentation](https://github.com/ModelTC/LightX2V/blob/main/docs/EN/source/method_tutorials/offload.md)


	#### Performance Advantages

	- ⚡ Fast: Approximately 2x faster than ComfyUI
	- 🎯 Optimized: Deeply optimized for distilled models
	- 💾 Memory Efficient: Supports CPU offload and other memory optimization techniques
	- 🛠️ Flexible: Supports multiple quantization formats and configuration options


	### Community
	- Issues: https://github.com/ModelTC/LightX2V/issues

	## ⚠️ Important Notes

	1. Additional Components: These files contain only the DiT (Diffusion Transformer) weights. You also need:
	- T5 text encoder
	- CLIP vision encoder (I2V only)
	- VAE encoder/decoder
	- Tokenizers

	Refer to [LightX2V Documentation](https://github.com/ModelTC/LightX2V/blob/main/docs/EN/source/getting_started/model_structure.md) for how to organize the complete model directory.
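
Before running inference, it can help to check that the model directory actually contains these components. A minimal sketch, assuming the file and folder names used by the official Wan2.1 releases (T5 and CLIP checkpoints, the VAE, and two tokenizer folders); LightX2V may expect a different layout, so adapt `COMMON` and `I2V_ONLY` to the structure described in its documentation. The CLIP encoder and its tokenizer are required only for I2V.

```python
from pathlib import Path

# File/folder names from the official Wan2.1 releases (assumed layout).
COMMON = [
    "models_t5_umt5-xxl-enc-bf16.pth",  # T5 text encoder
    "Wan2.1_VAE.pth",                   # VAE encoder/decoder
    "google/umt5-xxl",                  # T5 tokenizer folder
]
I2V_ONLY = [
    "models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth",  # CLIP vision encoder
    "xlm-roberta-large",                                        # CLIP tokenizer folder
]


def missing_components(model_dir: str, task: str) -> list[str]:
    """Return the expected component paths that do not exist under model_dir."""
    required = COMMON + (I2V_ONLY if task == "i2v" else [])
    root = Path(model_dir)
    return [name for name in required if not (root / name).exists()]
```

An empty list means every expected component was found; otherwise the returned names show what still needs to be downloaded.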

	If you find this project helpful, please give us a ⭐ on [GitHub](https://github.com/ModelTC/LightX2V)