---
license: apache-2.0
tags:
- diffusion-single-file
- comfyui
- distillation
- lora
- video
- video generation
base_model:
- Wan-AI/Wan2.1-T2V-14B
- Wan-AI/Wan2.1-I2V-14B-480P
- Wan-AI/Wan2.1-I2V-14B-720P
library_name: diffusers
---
<div align="center">

# Wan2.1 Distilled Models

### High-Performance Video Generation with 4-Step Inference

*Distillation-accelerated versions of Wan2.1: dramatically faster inference while maintaining exceptional quality*

---

[Hugging Face Model Page](https://huggingface.co/lightx2v/Wan2.1-Distill-Models)
[GitHub Repository](https://github.com/ModelTC/LightX2V)
[License](LICENSE)

</div>

---

## What's Special?

<table>
<tr>
<td width="50%">

### Ultra-Fast Generation
- **4-step inference** (vs traditional 50+ steps)
- Up to **2x faster** than ComfyUI
- Real-time video generation capability

</td>
<td width="50%">

### Flexible Options
- Multiple resolutions (480P/720P)
- Various precision formats (BF16/FP8/INT8)
- I2V and T2V support

</td>
</tr>
<tr>
<td width="50%">

### Memory Efficient
- FP8/INT8: **~50% size reduction**
- CPU offload support
- Optimized for consumer GPUs

</td>
<td width="50%">

### Easy Integration
- Compatible with the LightX2V framework
- ComfyUI support available
- Simple configuration files

</td>
</tr>
</table>

---

## Model Catalog

### Model Types

<table>
<tr>
<td align="center" width="50%">

#### **Image-to-Video (I2V)**
Transform still images into dynamic videos
- 480P Resolution
- 720P Resolution

</td>
<td align="center" width="50%">

#### **Text-to-Video (T2V)**
Generate videos from text descriptions
- 14B Parameters
- High-quality synthesis

</td>
</tr>
</table>

### Precision Variants

| Precision | Model Identifier | Model Size | Framework | Quality vs Speed |
|:---------:|:-----------------|:----------:|:---------:|:-----------------|
| **BF16** | `lightx2v_4step` | ~28-32 GB | LightX2V | ★★★★★ Highest quality |
| **FP8** | `scaled_fp8_e4m3_lightx2v_4step` | ~15-17 GB | LightX2V | ★★★★ Excellent balance |
| **INT8** | `int8_lightx2v_4step` | ~15-17 GB | LightX2V | ★★★★ Fast & efficient |
| **FP8 ComfyUI** | `scaled_fp8_e4m3_lightx2v_4step_comfyui` | ~15-17 GB | ComfyUI | ★★★ ComfyUI ready |
|
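The size figures in the table can be sanity-checked with back-of-envelope arithmetic. The sketch below assumes storage is dominated by per-parameter weights (2 bytes for BF16, 1 byte for FP8/INT8); quantization scales and file metadata account for the few extra GB in the real files:

```shell
# Rough file-size estimate for a 14B-parameter model.
params=14000000000
bf16_gb=$(( params * 2 / 1000000000 ))   # 2 bytes per parameter
fp8_gb=$(( params * 1 / 1000000000 ))    # 1 byte per parameter
echo "BF16: ~${bf16_gb} GB, FP8/INT8: ~${fp8_gb} GB"
# prints "BF16: ~28 GB, FP8/INT8: ~14 GB"
```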

### Naming Convention

```bash
# Pattern: wan2.1_{task}_{resolution}_[{precision}_]lightx2v_4step[_comfyui].safetensors
# (BF16 files omit the precision segment; T2V files use "14b" in place of a resolution)

# Examples:
wan2.1_i2v_720p_lightx2v_4step.safetensors                        # 720P I2V - BF16
wan2.1_i2v_720p_scaled_fp8_e4m3_lightx2v_4step.safetensors        # 720P I2V - FP8
wan2.1_i2v_480p_int8_lightx2v_4step.safetensors                   # 480P I2V - INT8
wan2.1_t2v_14b_scaled_fp8_e4m3_lightx2v_4step_comfyui.safetensors # T2V - FP8 ComfyUI
```
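The pattern can be applied mechanically. A small sketch (the variant values come from the catalog table above; verify any combination against the repository listing before downloading, since not every task/resolution/precision combination necessarily exists):

```shell
# Compose a filename from the naming pattern.
# Leave precision empty to get the BF16 variant.
task="i2v"; resolution="720p"; precision="scaled_fp8_e4m3"
file="wan2.1_${task}_${resolution}_${precision:+${precision}_}lightx2v_4step.safetensors"
echo "$file"  # wan2.1_i2v_720p_scaled_fp8_e4m3_lightx2v_4step.safetensors
```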

> **Explore all models**: [Browse the Full Model Collection →](https://huggingface.co/lightx2v/Wan2.1-Distill-Models/tree/main)

## Usage

**LightX2V is a high-performance inference framework optimized for these models; it is approximately 2x faster than ComfyUI and offers better quantization accuracy. Highly recommended!**

### Quick Start


1. Download a model (720P I2V FP8 example)
```bash
huggingface-cli download lightx2v/Wan2.1-Distill-Models \
  --local-dir ./models/wan2.1_i2v_720p \
  --include "wan2.1_i2v_720p_scaled_fp8_e4m3_lightx2v_4step.safetensors"
```

2. Clone the LightX2V repository

```bash
git clone https://github.com/ModelTC/LightX2V.git
cd LightX2V
```

3. Install dependencies

```bash
pip install -r requirements.txt
```
Or refer to the [Quick Start Documentation](https://github.com/ModelTC/LightX2V/blob/main/docs/EN/source/getting_started/quickstart.md) to use Docker instead.


4. Select and modify a configuration file

Choose the appropriate configuration based on your GPU memory:

**For 80GB+ GPUs (A100/H100)**
- I2V: [wan_i2v_distill_4step_cfg.json](https://github.com/ModelTC/LightX2V/blob/main/configs/distill/wan_i2v_distill_4step_cfg.json)
- T2V: [wan_t2v_distill_4step_cfg.json](https://github.com/ModelTC/LightX2V/blob/main/configs/distill/wan_t2v_distill_4step_cfg.json)

**For 24GB+ GPUs (RTX 4090)**
- I2V: [wan_i2v_distill_4step_cfg_4090.json](https://github.com/ModelTC/LightX2V/blob/main/configs/distill/wan_i2v_distill_4step_cfg_4090.json)
- T2V: [wan_t2v_distill_4step_cfg_4090.json](https://github.com/ModelTC/LightX2V/blob/main/configs/distill/wan_t2v_distill_4step_cfg_4090.json)
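After choosing a configuration, you will typically point it at your local weights before running. The snippet below is only a sketch of that kind of edit on a stand-in file: the `model_path` key is an assumption used for illustration, not necessarily a real LightX2V field, so edit whichever path keys actually appear in the config you downloaded:

```shell
# Stand-in config file (key names are illustrative assumptions).
cat > sample_cfg.json <<'EOF'
{"model_path": "/path/to/models", "infer_steps": 4}
EOF

# Point the config at the downloaded weights.
sed 's#/path/to/models#./models/wan2.1_i2v_720p#' sample_cfg.json > my_cfg.json
grep model_path my_cfg.json
```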

5. Run inference
```bash
cd scripts
bash wan/run_wan_i2v_distill_4step_cfg.sh
```

### Documentation
- **Quick Start Guide**: [LightX2V Quick Start](https://github.com/ModelTC/LightX2V/blob/main/docs/EN/source/getting_started/quickstart.md)
- **Complete Usage Guide**: [LightX2V Model Structure Documentation](https://github.com/ModelTC/LightX2V/blob/main/docs/EN/source/getting_started/model_structure.md)
- **Configuration Guide**: [Configuration Files](https://github.com/ModelTC/LightX2V/tree/main/configs/distill)
- **Quantization Usage**: [Quantization Documentation](https://github.com/ModelTC/LightX2V/blob/main/docs/EN/source/method_tutorials/quantization.md)
- **Parameter Offload**: [Offload Documentation](https://github.com/ModelTC/LightX2V/blob/main/docs/EN/source/method_tutorials/offload.md)

### Performance Advantages

- **Fast**: approximately **2x faster** than ComfyUI
- **Optimized**: deeply optimized for distilled models
- **Memory Efficient**: supports CPU offload and other memory optimization techniques
- **Flexible**: supports multiple quantization formats and configuration options

### Community
- **Issues**: https://github.com/ModelTC/LightX2V/issues


## Important Notes

1. **Additional Components**: These models contain only the DiT weights. You also need:
   - T5 text encoder
   - CLIP vision encoder
   - VAE encoder/decoder
   - Tokenizers

Refer to the [LightX2V Documentation](https://github.com/ModelTC/LightX2V/blob/main/docs/EN/source/getting_started/model_structure.md) for how to organize the complete model directory.
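As a concrete illustration of what a "complete" directory involves, the sketch below assembles a dummy layout. Every directory name here is an assumption for illustration only; follow the linked model-structure documentation for the authoritative layout:

```shell
# Dummy layout: DiT weights alongside the auxiliary components listed above.
mkdir -p demo_model/t5_encoder demo_model/clip_vision demo_model/vae demo_model/tokenizer
touch demo_model/wan2.1_i2v_720p_scaled_fp8_e4m3_lightx2v_4step.safetensors
find demo_model | sort
```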

If you find this project helpful, please give it a star on [GitHub](https://github.com/ModelTC/LightX2V)!