---
base_model:
- Wan-AI/Wan2.1-T2V-14B
- Wan-AI/Wan2.1-I2V-14B-480P
- Wan-AI/Wan2.1-I2V-14B-720P
library_name: diffusers
license: apache-2.0
pipeline_tag: text-to-video
tags:
- diffusion-single-file
- comfyui
- distillation
- lora
- video
- video generation
---
<div align="center">
# 🎬 Wan2.1 Distilled Models
### ⚡ High-Performance Video Generation with 4-Step Inference
*Distillation-accelerated versions of Wan2.1 based on the paper [SGMD: Score Gradient Matching Distillation for Few-Step Video Diffusion Distillation](https://huggingface.co/papers/2605.30116) - Dramatically faster while maintaining exceptional quality*

---
[Hugging Face](https://huggingface.co/lightx2v/Wan2.1-Distill-Models)
[GitHub](https://github.com/ModelTC/LightX2V)
[License](LICENSE)
</div>
---
## What's Special?
<table>
<tr>
<td width="50%">
### ⚡ Ultra-Fast Generation
- **4-step inference** (vs traditional 50+ steps)
- Up to **2x faster** than ComfyUI
- Real-time video generation capability
</td>
<td width="50%">
### 🎯 Flexible Options
- Multiple resolutions (480P/720P)
- Various precision formats (BF16/FP8/INT8)
- I2V and T2V support
</td>
</tr>
<tr>
<td width="50%">
### 💾 Memory Efficient
- FP8/INT8: **~50% size reduction**
- CPU offload support
- Optimized for consumer GPUs
</td>
<td width="50%">
### 🔧 Easy Integration
- Compatible with LightX2V framework
- ComfyUI support available
- Simple configuration files
</td>
</tr>
</table>
---
## 📦 Model Catalog
### 🎥 Model Types
<table>
<tr>
<td align="center" width="50%">
#### 🖼️ **Image-to-Video (I2V)**
Transform still images into dynamic videos
- 📺 480P Resolution
- 🎬 720P Resolution
</td>
<td align="center" width="50%">
#### **Text-to-Video (T2V)**
Generate videos from text descriptions
- 14B Parameters
- 🎨 High-quality synthesis
</td>
</tr>
</table>
### 🎯 Precision Variants
| Precision | Model Identifier | Model Size | Framework | Quality vs Speed |
|:---------:|:-----------------|:----------:|:---------:|:-----------------|
| **BF16** | `lightx2v_4step` | ~28-32 GB | LightX2V | ⭐⭐⭐⭐⭐ Highest quality |
| ⚡ **FP8** | `scaled_fp8_e4m3_lightx2v_4step` | ~15-17 GB | LightX2V | ⭐⭐⭐⭐ Excellent balance |
| 🎯 **INT8** | `int8_lightx2v_4step` | ~15-17 GB | LightX2V | ⭐⭐⭐⭐ Fast & efficient |
| 🔷 **FP8 ComfyUI** | `scaled_fp8_e4m3_lightx2v_4step_comfyui` | ~15-17 GB | ComfyUI | ⭐⭐⭐ ComfyUI ready |
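The sizes in the table follow roughly from the parameter count. As a back-of-the-envelope sketch, assume about 14 billion transformer parameters (the figure in the base model names), 2 bytes per BF16 weight, and 1 byte per FP8 or INT8 weight. The helper name `estimate_size_gb` is illustrative only, and real files run somewhat larger because of quantization scales, layers kept in higher precision, and metadata.

```python
# Rough checkpoint-size estimate from parameter count and weight width.
# Assumption: ~14e9 parameters; quantization scales and metadata are ignored.
BYTES_PER_WEIGHT = {"bf16": 2, "fp8": 1, "int8": 1}


def estimate_size_gb(num_params: float, precision: str) -> float:
    """Return the approximate weight size in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_WEIGHT[precision] / 1e9


if __name__ == "__main__":
    for precision in ("bf16", "fp8", "int8"):
        print(f"{precision}: ~{estimate_size_gb(14e9, precision):.0f} GB")
```

Under these assumptions the 8-bit formats come out at half the BF16 size, which is consistent with the ~50% reduction quoted earlier.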
### Naming Convention
```bash
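# Pattern: wan2.1_{task}_{resolution}_{precision}.safetensors
#   {precision} is a model identifier from the Precision Variants table.
#   T2V files carry the parameter count (14b) in place of a resolution.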
# Examples:
wan2.1_i2v_720p_lightx2v_4step.safetensors # 720P I2V - BF16
wan2.1_i2v_720p_scaled_fp8_e4m3_lightx2v_4step.safetensors # 720P I2V - FP8
wan2.1_i2v_480p_int8_lightx2v_4step.safetensors # 480P I2V - INT8
wan2.1_t2v_14b_scaled_fp8_e4m3_lightx2v_4step_comfyui.safetensors # T2V - FP8 ComfyUI
```
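If you need to sort or filter checkpoints by name, the convention above can be parsed mechanically. The following is a minimal sketch that covers only the four identifiers listed in the Precision Variants table. `ModelFile` and `parse_model_filename` are hypothetical helper names, not part of LightX2V.

```python
import re
from typing import NamedTuple

# Identifier -> precision label, copied from the Precision Variants table.
PRECISION_BY_IDENTIFIER = {
    "lightx2v_4step": "BF16",
    "scaled_fp8_e4m3_lightx2v_4step": "FP8",
    "int8_lightx2v_4step": "INT8",
    "scaled_fp8_e4m3_lightx2v_4step_comfyui": "FP8 ComfyUI",
}

# wan2.1_{task}_{resolution or size}_{identifier}.safetensors
_PATTERN = re.compile(
    r"^wan2\.1_(?P<task>i2v|t2v)_(?P<variant>[0-9a-z]+)_(?P<identifier>.+)\.safetensors$"
)


class ModelFile(NamedTuple):
    task: str       # "i2v" or "t2v"
    variant: str    # "480p", "720p", or "14b"
    precision: str  # label from PRECISION_BY_IDENTIFIER


def parse_model_filename(filename: str) -> ModelFile:
    """Split a checkpoint file name into task, variant, and precision."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not a recognized Wan2.1 checkpoint name: {filename}")
    identifier = match["identifier"]
    if identifier not in PRECISION_BY_IDENTIFIER:
        raise ValueError(f"unknown model identifier: {identifier}")
    return ModelFile(match["task"], match["variant"], PRECISION_BY_IDENTIFIER[identifier])


if __name__ == "__main__":
    print(parse_model_filename("wan2.1_i2v_480p_int8_lightx2v_4step.safetensors"))
```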
> 💡 **Explore all models**: [Browse Full Model Collection →](https://huggingface.co/lightx2v/Wan2.1-Distill-Models/tree/main)
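A single checkpoint from the collection can also be fetched from Python. This sketch assumes the `huggingface_hub` package is installed and uses its `hf_hub_download` function. The wrapper name `download_model_file` and the default `./models` directory are illustrative choices.

```python
REPO_ID = "lightx2v/Wan2.1-Distill-Models"


def download_model_file(filename: str, local_dir: str = "./models") -> str:
    """Download one checkpoint from the repository and return its local path."""
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=filename, local_dir=local_dir)


# Example call (downloads a file of roughly 15-17 GB, so it is left commented out):
# path = download_model_file(
#     "wan2.1_i2v_720p_scaled_fp8_e4m3_lightx2v_4step.safetensors",
#     local_dir="./models/wan2.1_i2v_720p",
# )
```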
## Usage
**LightX2V is a high-performance inference framework optimized for these models, approximately 2x faster than ComfyUI with better quantization accuracy. Highly recommended!**
### Python Sample Usage
```python
from lightx2v import LightX2VPipeline

# Initialize pipeline for Wan2.1 I2V task
pipe = LightX2VPipeline(
    model_path="lightx2v/Wan2.1-Distill-Models",
    model_cls="wan2.1",
    task="i2v",
)

# Enable offloading to reduce VRAM usage (suitable for consumer GPUs)
pipe.enable_offload(
    cpu_offload=True,
    offload_granularity="block",
    text_encoder_offload=True,
    image_encoder_offload=False,
    vae_offload=False,
)

# Create generator with 4-step distilled inference
pipe.create_generator(
    attn_mode="sage_attn2",
    infer_steps=4,
    height=480,
    width=832,
    num_frames=81,
    guidance_scale=5.0,
    sample_shift=5.0,
)

# Generate video
pipe.generate(
    seed=42,
    image_path="path/to/image.jpg",
    prompt="Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard.",
    negative_prompt="shaking camera, low quality, static",
    save_result_path="output.mp4",
)
```
#### Quick Start (CLI)
1. Download model (720P I2V FP8 example)
```bash
huggingface-cli download lightx2v/Wan2.1-Distill-Models \
    --local-dir ./models/wan2.1_i2v_720p \
    --include "wan2.1_i2v_720p_scaled_fp8_e4m3_lightx2v_4step.safetensors"
```
2. Clone LightX2V repository
```bash
git clone https://github.com/ModelTC/LightX2V.git
cd LightX2V
```
3. Install dependencies
```bash
pip install -r requirements.txt
```
Or refer to the [Quick Start Documentation](https://github.com/ModelTC/LightX2V/blob/main/docs/EN/source/getting_started/quickstart.md) to use Docker.
4. Select and modify configuration file
Choose the appropriate configuration based on your GPU memory:
**For 80GB+ GPU (A100/H100)**
- I2V: [wan_i2v_distill_4step_cfg.json](https://github.com/ModelTC/LightX2V/blob/main/configs/distill/wan_i2v_distill_4step_cfg.json)
- T2V: [wan_t2v_distill_4step_cfg.json](https://github.com/ModelTC/LightX2V/blob/main/configs/distill/wan_t2v_distill_4step_cfg.json)
**For 24GB+ GPU (RTX 4090)**
- I2V: [wan_i2v_distill_4step_cfg_4090.json](https://github.com/ModelTC/LightX2V/blob/main/configs/distill/wan_i2v_distill_4step_cfg_4090.json)
- T2V: [wan_t2v_distill_4step_cfg_4090.json](https://github.com/ModelTC/LightX2V/blob/main/configs/distill/wan_t2v_distill_4step_cfg_4090.json)
5. Run inference
```bash
cd scripts
bash wan/run_wan_i2v_distill_4step_cfg.sh
```
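The choice in step 4 can be written as a small lookup. This is a sketch only: `select_config` is a hypothetical helper, `vram_gb` is the card's nominal memory (80 for an A100/H100, 24 for an RTX 4090), and the returned path assumes the `configs/distill` directory of the cloned LightX2V repository shown in the links above.

```python
def select_config(task: str, vram_gb: float) -> str:
    """Return the repository-relative config path listed in step 4."""
    if task not in ("i2v", "t2v"):
        raise ValueError(f"task must be 'i2v' or 't2v', got {task!r}")
    if vram_gb >= 80:
        suffix = ""        # 80GB+ GPUs: default distill config
    elif vram_gb >= 24:
        suffix = "_4090"   # 24GB+ GPUs: RTX 4090 config
    else:
        raise ValueError("step 4 lists no configuration for GPUs below 24 GB")
    return f"configs/distill/wan_{task}_distill_4step_cfg{suffix}.json"


if __name__ == "__main__":
    print(select_config("i2v", 24))
```

GPUs below 24 GB raise an error here because the steps above list no configuration for them, not because such a setup is known to be impossible.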
#### Documentation
- **Quick Start Guide**: [LightX2V Quick Start](https://github.com/ModelTC/LightX2V/blob/main/docs/EN/source/getting_started/quickstart.md)
- **Complete Usage Guide**: [LightX2V Model Structure Documentation](https://github.com/ModelTC/LightX2V/blob/main/docs/EN/source/getting_started/model_structure.md)
- **Configuration Guide**: [Configuration Files](https://github.com/ModelTC/LightX2V/tree/main/configs/distill)
- **Quantization Usage**: [Quantization Documentation](https://github.com/ModelTC/LightX2V/blob/main/docs/EN/source/method_tutorials/quantization.md)
- **Parameter Offload**: [Offload Documentation](https://github.com/ModelTC/LightX2V/blob/main/docs/EN/source/method_tutorials/offload.md)
#### Performance Advantages
- ⚡ **Fast**: Approximately **2x faster** than ComfyUI
- 🎯 **Optimized**: Deeply optimized for distilled models
- 💾 **Memory Efficient**: Supports CPU offload and other memory optimization techniques
- 🛠️ **Flexible**: Supports multiple quantization formats and configuration options
### Community
- **Issues**: https://github.com/ModelTC/LightX2V/issues
## ⚠️ Important Notes
1. **Additional Components**: These models contain only the DiT (diffusion transformer) weights. You also need:
   - T5 text encoder
   - CLIP vision encoder
   - VAE encoder/decoder
   - Tokenizers
Refer to [LightX2V Documentation](https://github.com/ModelTC/LightX2V/blob/main/docs/EN/source/getting_started/model_structure.md) for how to organize the complete model directory.
If you find this project helpful, please give us a ⭐ on [GitHub](https://github.com/ModelTC/LightX2V).