--- |
|
|
license: apache-2.0 |
|
|
tags: |
|
|
- diffusion-single-file |
|
|
- comfyui |
|
|
- distillation |
|
|
- lora |
|
|
- video |
|
|
- video generation
|
|
base_model: |
|
|
- Wan-AI/Wan2.1-T2V-14B |
|
|
- Wan-AI/Wan2.1-I2V-14B-480P |
|
|
- Wan-AI/Wan2.1-I2V-14B-720P |
|
|
library_name: diffusers |
|
|
--- |
|
|
<div align="center"> |
|
|
|
|
|
# 🎬 Wan2.1 Distilled Models
|
|
|
|
|
### ⚡ High-Performance Video Generation with 4-Step Inference
|
|
|
|
|
*Distillation-accelerated versions of Wan2.1: dramatically faster while maintaining exceptional quality*
|
|
|
|
|
 |
|
|
|
|
|
--- |
|
|
|
|
|
[](https://huggingface.co/lightx2v/Wan2.1-Distill-Models) |
|
|
[](https://github.com/ModelTC/LightX2V) |
|
|
[](LICENSE) |
|
|
|
|
|
</div> |
|
|
|
|
|
--- |
|
|
|
|
|
## 🚀 What's Special?
|
|
|
|
|
<table> |
|
|
<tr> |
|
|
<td width="50%"> |
|
|
|
|
|
### ⚡ Ultra-Fast Generation
|
|
- **4-step inference** (vs traditional 50+ steps) |
|
|
- Up to **2x faster** than ComfyUI |
|
|
- Real-time video generation capability |
|
|
|
|
|
</td> |
|
|
<td width="50%"> |
|
|
|
|
|
### 🎯 Flexible Options
|
|
- Multiple resolutions (480P/720P) |
|
|
- Various precision formats (BF16/FP8/INT8) |
|
|
- I2V and T2V support |
|
|
|
|
|
</td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td width="50%"> |
|
|
|
|
|
### 💾 Memory Efficient
|
|
- FP8/INT8: **~50% size reduction** |
|
|
- CPU offload support |
|
|
- Optimized for consumer GPUs |
|
|
|
|
|
</td> |
|
|
<td width="50%"> |
|
|
|
|
|
### 🔧 Easy Integration
|
|
- Compatible with LightX2V framework |
|
|
- ComfyUI support available |
|
|
- Simple configuration files |
|
|
|
|
|
</td> |
|
|
</tr> |
|
|
</table> |
|
|
|
|
|
--- |
|
|
|
|
|
## 📦 Model Catalog
|
|
|
|
|
### 🔥 Model Types
|
|
|
|
|
<table> |
|
|
<tr> |
|
|
<td align="center" width="50%"> |
|
|
|
|
|
#### 🖼️ **Image-to-Video (I2V)**
|
|
Transform still images into dynamic videos |
|
|
- 📺 480P Resolution
|
|
- 🎬 720P Resolution
|
|
|
|
|
</td> |
|
|
<td align="center" width="50%"> |
|
|
|
|
|
#### 📝 **Text-to-Video (T2V)**
|
|
Generate videos from text descriptions |
|
|
- 📊 14B Parameters
|
|
- 🎨 High-quality synthesis
|
|
|
|
|
</td> |
|
|
</tr> |
|
|
</table> |
|
|
|
|
|
### 🎯 Precision Variants
|
|
|
|
|
| Precision | Model Identifier | Model Size | Framework | Quality vs Speed |
|:---------:|:-----------------|:----------:|:---------:|:-----------------|
| 🏆 **BF16** | `lightx2v_4step` | ~28-32 GB | LightX2V | ⭐⭐⭐⭐⭐ Highest quality |
| ⚡ **FP8** | `scaled_fp8_e4m3_lightx2v_4step` | ~15-17 GB | LightX2V | ⭐⭐⭐⭐ Excellent balance |
| 🎯 **INT8** | `int8_lightx2v_4step` | ~15-17 GB | LightX2V | ⭐⭐⭐⭐ Fast & efficient |
| 🔷 **FP8 ComfyUI** | `scaled_fp8_e4m3_lightx2v_4step_comfyui` | ~15-17 GB | ComfyUI | ⭐⭐⭐ ComfyUI ready |
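The approximate sizes in the table follow from parameter count times bytes per weight. A back-of-envelope check (my arithmetic, not an official figure):

```python
# Back-of-envelope size check (illustrative arithmetic, not official figures):
# a 14B-parameter DiT at 2 bytes/weight (BF16) vs 1 byte/weight (FP8/INT8).
params = 14e9

bf16_gb = params * 2 / 1e9  # 2 bytes per weight
int8_gb = params * 1 / 1e9  # 1 byte per weight

print(bf16_gb, int8_gb)  # 28.0 14.0
```

The on-disk FP8/INT8 files come out somewhat larger (~15-17 GB), presumably due to quantization scales and layers kept in higher precision, which is still consistent with the roughly 50% size reduction quoted above.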
|
|
|
|
|
### 📁 Naming Convention
|
|
|
|
|
```bash
# Pattern: wan2.1_{task}_{resolution}_{precision}.safetensors

# Examples:
wan2.1_i2v_720p_lightx2v_4step.safetensors                        # 720P I2V - BF16
wan2.1_i2v_720p_scaled_fp8_e4m3_lightx2v_4step.safetensors        # 720P I2V - FP8
wan2.1_i2v_480p_int8_lightx2v_4step.safetensors                   # 480P I2V - INT8
wan2.1_t2v_14b_scaled_fp8_e4m3_lightx2v_4step_comfyui.safetensors # T2V - FP8 ComfyUI
```
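To make the pattern concrete, here is a small hypothetical helper (not part of any official API) that assembles a filename from its components:

```python
def model_filename(task, precision, resolution=None):
    """Build wan2.1_{task}[_{resolution}]_{precision}.safetensors.

    T2V checkpoints use '14b' in place of a resolution.
    """
    parts = ["wan2.1", task]
    if resolution is not None:
        parts.append(resolution)
    parts.append(precision)
    return "_".join(parts) + ".safetensors"

print(model_filename("i2v", "scaled_fp8_e4m3_lightx2v_4step", resolution="720p"))
# -> wan2.1_i2v_720p_scaled_fp8_e4m3_lightx2v_4step.safetensors
```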
|
|
|
|
|
> 💡 **Explore all models**: [Browse Full Model Collection →](https://huggingface.co/lightx2v/Wan2.1-Distill-Models/tree/main)
|
|
|
|
|
## 🚀 Usage
|
|
|
|
|
**LightX2V is a high-performance inference framework optimized for these models. It runs roughly 2x faster than ComfyUI and delivers better quantization accuracy, so we highly recommend it.**
|
|
|
|
|
#### Quick Start |
|
|
|
|
|
1. Download the model (720P I2V FP8 example)
|
|
```bash
huggingface-cli download lightx2v/Wan2.1-Distill-Models \
  --local-dir ./models/wan2.1_i2v_720p \
  --include "wan2.1_i2v_720p_scaled_fp8_e4m3_lightx2v_4step.safetensors"
```
|
|
|
|
|
2. Clone LightX2V repository |
|
|
|
|
|
```bash
git clone https://github.com/ModelTC/LightX2V.git
cd LightX2V
```
|
|
|
|
|
3. Install dependencies |
|
|
|
|
|
```bash
pip install -r requirements.txt
```
|
|
Alternatively, refer to the [Quick Start Documentation](https://github.com/ModelTC/LightX2V/blob/main/docs/EN/source/getting_started/quickstart.md) to use Docker.
|
|
|
|
|
4. Select and modify a configuration file
|
|
|
|
|
Choose the appropriate configuration based on your GPU memory: |
|
|
|
|
|
**For 80GB+ GPU (A100/H100)** |
|
|
- I2V: [wan_i2v_distill_4step_cfg.json](https://github.com/ModelTC/LightX2V/blob/main/configs/distill/wan_i2v_distill_4step_cfg.json) |
|
|
- T2V: [wan_t2v_distill_4step_cfg.json](https://github.com/ModelTC/LightX2V/blob/main/configs/distill/wan_t2v_distill_4step_cfg.json) |
|
|
|
|
|
**For 24GB+ GPU (RTX 4090)** |
|
|
- I2V: [wan_i2v_distill_4step_cfg_4090.json](https://github.com/ModelTC/LightX2V/blob/main/configs/distill/wan_i2v_distill_4step_cfg_4090.json) |
|
|
- T2V: [wan_t2v_distill_4step_cfg_4090.json](https://github.com/ModelTC/LightX2V/blob/main/configs/distill/wan_t2v_distill_4step_cfg_4090.json) |
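As a sketch of step 4, the snippet below loads a config-like dictionary, adjusts a value, and writes a local copy. The key names (`infer_steps`, `cpu_offload`) are illustrative assumptions standing in for the real schema; consult the linked JSON files for the actual fields.

```python
import json

# Illustrative only: these keys are assumptions, not the verified
# LightX2V distill config schema -- check the linked config files.
cfg = {
    "infer_steps": 4,      # distilled models need only 4 steps
    "cpu_offload": False,  # enable on memory-constrained GPUs
}

cfg["cpu_offload"] = True  # e.g. for a 24GB RTX 4090

# Save a local variant to pass to the run script.
with open("wan_i2v_distill_4step_local.json", "w") as f:
    json.dump(cfg, f, indent=2)
```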
|
|
|
|
|
|
|
|
5. Run inference |
|
|
```bash
cd scripts
bash wan/run_wan_i2v_distill_4step_cfg.sh
```
|
|
|
|
|
#### Documentation |
|
|
- **Quick Start Guide**: [LightX2V Quick Start](https://github.com/ModelTC/LightX2V/blob/main/docs/EN/source/getting_started/quickstart.md) |
|
|
- **Complete Usage Guide**: [LightX2V Model Structure Documentation](https://github.com/ModelTC/LightX2V/blob/main/docs/EN/source/getting_started/model_structure.md) |
|
|
- **Configuration Guide**: [Configuration Files](https://github.com/ModelTC/LightX2V/tree/main/configs/distill) |
|
|
- **Quantization Usage**: [Quantization Documentation](https://github.com/ModelTC/LightX2V/blob/main/docs/EN/source/method_tutorials/quantization.md) |
|
|
- **Parameter Offload**: [Offload Documentation](https://github.com/ModelTC/LightX2V/blob/main/docs/EN/source/method_tutorials/offload.md) |
|
|
|
|
|
|
|
|
#### Performance Advantages |
|
|
|
|
|
- ⚡ **Fast**: Approximately **2x faster** than ComfyUI
|
|
- 🎯 **Optimized**: Deeply optimized for distilled models
|
|
- 💾 **Memory Efficient**: Supports CPU offload and other memory optimization techniques
|
|
- 🛠️ **Flexible**: Supports multiple quantization formats and configuration options
|
|
|
|
|
|
|
|
### Community |
|
|
- **Issues**: https://github.com/ModelTC/LightX2V/issues |
|
|
|
|
|
## ⚠️ Important Notes
|
|
|
|
|
1. **Additional Components**: These models contain only the DiT (diffusion transformer) weights. You also need:
|
|
- T5 text encoder |
|
|
- CLIP vision encoder |
|
|
- VAE encoder/decoder |
|
|
- Tokenizers |
|
|
|
|
|
Refer to [LightX2V Documentation](https://github.com/ModelTC/LightX2V/blob/main/docs/EN/source/getting_started/model_structure.md) for how to organize the complete model directory. |
|
|
|
|
|
If you find this project helpful, please give us a ⭐ on [GitHub](https://github.com/ModelTC/LightX2V)!