---
base_model:
- Wan-AI/Wan2.1-T2V-14B
- Wan-AI/Wan2.1-I2V-14B-480P
- Wan-AI/Wan2.1-I2V-14B-720P
library_name: diffusers
license: apache-2.0
pipeline_tag: text-to-video
tags:
- diffusion-single-file
- comfyui
- distillation
- lora
- video
- video generation
---

<div align="center">

# 🎬 Wan2.1 Distilled Models

### ⚡ High-Performance Video Generation with 4-Step Inference

*Distillation-accelerated versions of Wan2.1, based on the paper [SGMD: Score Gradient Matching Distillation for Few-Step Video Diffusion Distillation](https://huggingface.co/papers/2605.30116). Much faster inference while preserving visual quality.*

![image/png](https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/gXhUuWyuJpxOwGf5GQ49r.png)

---

[![🤗 HuggingFace](https://img.shields.io/badge/🤗-HuggingFace-yellow)](https://huggingface.co/lightx2v/Wan2.1-Distill-Models)
[![GitHub](https://img.shields.io/badge/GitHub-LightX2V-blue?logo=github)](https://github.com/ModelTC/LightX2V)
[![License](https://img.shields.io/badge/License-Apache%202.0-green.svg)](LICENSE)

</div>

---

## 🌟 What's Special?

<table>
<tr>
<td width="50%">

### ⚡ Ultra-Fast Generation
- **4-step inference** (vs traditional 50+ steps)
- Up to **2x faster** than ComfyUI when run with LightX2V
- Real-time video generation capability

</td>
<td width="50%">

### 🎯 Flexible Options
- Multiple resolutions (480P/720P)
- Various precision formats (BF16/FP8/INT8)
- I2V and T2V support

</td>
</tr>
<tr>
<td width="50%">

### 💾 Memory Efficient
- FP8/INT8: **~50% size reduction** compared with BF16
- CPU offload support
- Optimized for consumer GPUs

</td>
<td width="50%">

### 🔧 Easy Integration
- Compatible with LightX2V framework
- ComfyUI support available
- Simple configuration files

</td>
</tr>
</table>

---

## 📦 Model Catalog

### 🎥 Model Types

<table>
<tr>
<td align="center" width="50%">

#### 🖼️ **Image-to-Video (I2V)**
Transform still images into dynamic videos
- 📺 480P Resolution
- 🎬 720P Resolution

</td>
<td align="center" width="50%">

#### 📝 **Text-to-Video (T2V)**
Generate videos from text descriptions
- 🚀 14B Parameters
- 🎨 High-quality synthesis

</td>
</tr>
</table>

### 🎯 Precision Variants

| Precision | Model Identifier | Model Size | Framework | Quality vs Speed |
|:---------:|:-----------------|:----------:|:---------:|:-----------------|
| 🏆 **BF16** | `lightx2v_4step` | ~28-32 GB | LightX2V | ⭐⭐⭐⭐⭐ Highest quality |
| ⚡ **FP8** | `scaled_fp8_e4m3_lightx2v_4step` | ~15-17 GB | LightX2V | ⭐⭐⭐⭐ Excellent balance |
| 🎯 **INT8** | `int8_lightx2v_4step` | ~15-17 GB | LightX2V | ⭐⭐⭐⭐ Fast & efficient |
| 🔷 **FP8 ComfyUI** | `scaled_fp8_e4m3_lightx2v_4step_comfyui` | ~15-17 GB | ComfyUI | ⭐⭐⭐ ComfyUI ready |
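
The sizes in the table follow roughly from parameter count times bytes per weight. The sketch below is a back-of-envelope estimate only: it assumes a nominal 14e9 parameters and ignores quantization scales, layers kept in higher precision, and file metadata, which is why the real files are somewhat larger than the numbers it prints.

```python
# Back-of-envelope estimate of checkpoint size from parameter count.
# Assumption: a nominal 14e9 parameters. Real files also store scales,
# unquantized layers, and metadata, so they come out somewhat larger.
BYTES_PER_WEIGHT = {"bf16": 2, "fp8": 1, "int8": 1}


def estimate_size_gb(num_params: float, precision: str) -> float:
    """Return the approximate weight size in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_WEIGHT[precision] / 1e9


for precision in ("bf16", "fp8", "int8"):
    print(f"{precision}: ~{estimate_size_gb(14e9, precision):.0f} GB")
```

Halving the bytes per weight is what gives FP8 and INT8 roughly half the footprint of BF16.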

### 📝 Naming Convention

```bash
# Pattern: wan2.1_{task}_{resolution or model size}_{model identifier}.safetensors

# Examples:
wan2.1_i2v_720p_lightx2v_4step.safetensors                          # 720P I2V - BF16
wan2.1_i2v_720p_scaled_fp8_e4m3_lightx2v_4step.safetensors         # 720P I2V - FP8
wan2.1_i2v_480p_int8_lightx2v_4step.safetensors                    # 480P I2V - INT8
wan2.1_t2v_14b_scaled_fp8_e4m3_lightx2v_4step_comfyui.safetensors  # T2V - FP8 ComfyUI
```

> 💡 **Explore all models**: [Browse Full Model Collection →](https://huggingface.co/lightx2v/Wan2.1-Distill-Models/tree/main)
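
The naming convention above can be handled programmatically. The following is a minimal sketch: the identifier strings are copied from the Precision Variants table, while the helper names (`build_filename`, `parse_filename`) are illustrative and not part of LightX2V. Not every task, variant, and precision combination necessarily exists in the repository.

```python
# Build and parse checkpoint file names following the documented pattern.
# Identifier strings come from the Precision Variants table; the helper
# names are illustrative only.
SUFFIX = ".safetensors"
IDENTIFIERS = {
    "bf16": "lightx2v_4step",
    "fp8": "scaled_fp8_e4m3_lightx2v_4step",
    "int8": "int8_lightx2v_4step",
    "fp8_comfyui": "scaled_fp8_e4m3_lightx2v_4step_comfyui",
}


def build_filename(task: str, variant: str, precision: str) -> str:
    """variant is a resolution ("480p", "720p") or a size tag ("14b")."""
    return f"wan2.1_{task}_{variant}_{IDENTIFIERS[precision]}{SUFFIX}"


def parse_filename(name: str) -> dict:
    """Split a file name back into task, variant, and precision."""
    if not name.endswith(SUFFIX):
        raise ValueError(f"expected a {SUFFIX} file: {name!r}")
    stem = name[: -len(SUFFIX)]
    # Split only three times so the identifier keeps its own underscores.
    prefix, task, variant, identifier = stem.split("_", 3)
    if prefix != "wan2.1":
        raise ValueError(f"unexpected prefix: {prefix!r}")
    by_identifier = {v: k for k, v in IDENTIFIERS.items()}
    return {"task": task, "variant": variant, "precision": by_identifier[identifier]}


print(build_filename("i2v", "720p", "fp8"))
print(parse_filename("wan2.1_i2v_480p_int8_lightx2v_4step.safetensors"))
```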

## 🚀 Usage

**LightX2V is a high-performance inference framework optimized for these models. It is approximately 2x faster than ComfyUI and offers better quantization accuracy, so it is the recommended way to run them.**

### Python Sample Usage

```python
from lightx2v import LightX2VPipeline

# Initialize pipeline for Wan2.1 I2V task
pipe = LightX2VPipeline(
    model_path="lightx2v/Wan2.1-Distill-Models",
    model_cls="wan2.1",
    task="i2v",
)

# Enable offloading to reduce VRAM usage (suitable for consumer GPUs)
pipe.enable_offload(
    cpu_offload=True,
    offload_granularity="block",
    text_encoder_offload=True,
    image_encoder_offload=False,
    vae_offload=False,
)

# Create generator with 4-step distilled inference
pipe.create_generator(
    attn_mode="sage_attn2",
    infer_steps=4,
    height=480,
    width=832,
    num_frames=81,
    guidance_scale=5.0,
    sample_shift=5.0,
)

# Generate video
pipe.generate(
    seed=42,
    image_path="path/to/image.jpg",
    prompt="Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard.",
    negative_prompt="shaking camera, low quality, static",
    save_result_path="output.mp4",
)
```

#### Quick Start (CLI)

1. Download the model (720P I2V FP8 example)

```bash
huggingface-cli download lightx2v/Wan2.1-Distill-Models \
    --local-dir ./models/wan2.1_i2v_720p \
    --include "wan2.1_i2v_720p_scaled_fp8_e4m3_lightx2v_4step.safetensors"
```

2. Clone the LightX2V repository

```bash
git clone https://github.com/ModelTC/LightX2V.git
cd LightX2V
```

3. Install dependencies

```bash
pip install -r requirements.txt
```
Alternatively, follow the [Quick Start Documentation](https://github.com/ModelTC/LightX2V/blob/main/docs/EN/source/getting_started/quickstart.md) to run LightX2V with Docker.

4. Select and modify a configuration file

Choose the appropriate configuration based on your GPU memory:

**For 80GB+ GPU (A100/H100)**
- I2V: [wan_i2v_distill_4step_cfg.json](https://github.com/ModelTC/LightX2V/blob/main/configs/distill/wan_i2v_distill_4step_cfg.json)
- T2V: [wan_t2v_distill_4step_cfg.json](https://github.com/ModelTC/LightX2V/blob/main/configs/distill/wan_t2v_distill_4step_cfg.json)

**For 24GB+ GPU (RTX 4090)**
- I2V: [wan_i2v_distill_4step_cfg_4090.json](https://github.com/ModelTC/LightX2V/blob/main/configs/distill/wan_i2v_distill_4step_cfg_4090.json)
- T2V: [wan_t2v_distill_4step_cfg_4090.json](https://github.com/ModelTC/LightX2V/blob/main/configs/distill/wan_t2v_distill_4step_cfg_4090.json)
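
The choice above can be expressed as a small helper. This is a minimal sketch: the file names and the 80 GB and 24 GB thresholds come from the lists above, while the function name and the decision to treat every GPU between 24 GB and 80 GB as the "4090" tier are assumptions made for illustration.

```python
# Pick a config file name from the lists above based on GPU memory.
# The 80 GB and 24 GB thresholds come from the headings above; mapping
# everything in between to the "4090" config is an assumption of this sketch.
def select_config(task: str, vram_gb: float) -> str:
    """Return the distill config file name for a task ("i2v" or "t2v")."""
    if task not in ("i2v", "t2v"):
        raise ValueError(f"unknown task: {task!r}")
    if vram_gb >= 80:
        return f"wan_{task}_distill_4step_cfg.json"
    if vram_gb >= 24:
        return f"wan_{task}_distill_4step_cfg_4090.json"
    raise ValueError("no listed config targets GPUs with less than 24 GB")


print(select_config("i2v", 80))
print(select_config("t2v", 24))
```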


5. Run inference

```bash
cd scripts
bash wan/run_wan_i2v_distill_4step_cfg.sh
```

#### Documentation
- **Quick Start Guide**: [LightX2V Quick Start](https://github.com/ModelTC/LightX2V/blob/main/docs/EN/source/getting_started/quickstart.md)
- **Complete Usage Guide**: [LightX2V Model Structure Documentation](https://github.com/ModelTC/LightX2V/blob/main/docs/EN/source/getting_started/model_structure.md)
- **Configuration Guide**: [Configuration Files](https://github.com/ModelTC/LightX2V/tree/main/configs/distill)
- **Quantization Usage**: [Quantization Documentation](https://github.com/ModelTC/LightX2V/blob/main/docs/EN/source/method_tutorials/quantization.md)
- **Parameter Offload**: [Offload Documentation](https://github.com/ModelTC/LightX2V/blob/main/docs/EN/source/method_tutorials/offload.md)


#### Performance Advantages

- ⚡ **Fast**: Approximately **2x faster** than ComfyUI
- 🎯 **Optimized**: Deeply optimized for distilled models
- 💾 **Memory Efficient**: Supports CPU offload and other memory optimization techniques
- 🛠️ **Flexible**: Supports multiple quantization formats and configuration options


### Community
- **Issues**: https://github.com/ModelTC/LightX2V/issues

## ⚠️ Important Notes

1. **Additional Components**: These files contain only the DiT (diffusion transformer) weights. You also need:
   - T5 text encoder
   - CLIP vision encoder (I2V models only)
   - VAE encoder/decoder
   - Tokenizers

   Refer to [LightX2V Documentation](https://github.com/ModelTC/LightX2V/blob/main/docs/EN/source/getting_started/model_structure.md) for how to organize the complete model directory.
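
Before running inference, it can help to verify that every component is in place. The sketch below checks a directory against a mapping you supply. The paths in `EXAMPLE_LAYOUT` are placeholders invented for illustration and are not the layout LightX2V expects; replace them with the paths described in the documentation linked above.

```python
from pathlib import Path


def missing_components(model_dir: str, expected: dict) -> list:
    """Return the names of components whose path is absent under model_dir."""
    root = Path(model_dir)
    return [name for name, rel_path in expected.items()
            if not (root / rel_path).exists()]


# PLACEHOLDER paths, invented for illustration only. Replace them with the
# layout from the LightX2V model structure documentation.
EXAMPLE_LAYOUT = {
    "dit": "wan2.1_i2v_720p_scaled_fp8_e4m3_lightx2v_4step.safetensors",
    "t5_text_encoder": "path/to/t5_encoder",
    "clip_vision_encoder": "path/to/clip_encoder",
    "vae": "path/to/vae",
    "tokenizers": "path/to/tokenizers",
}
```

For example, `missing_components("./models/wan2.1_i2v_720p", EXAMPLE_LAYOUT)` returns the names of any entries that still need to be downloaded or moved.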

If you find this project helpful, please give us a ⭐ on [GitHub](https://github.com/ModelTC/LightX2V).