---
base_model:
- Wan-AI/Wan2.1-T2V-14B
- Wan-AI/Wan2.1-I2V-14B-480P
- Wan-AI/Wan2.1-I2V-14B-720P
library_name: diffusers
license: apache-2.0
pipeline_tag: text-to-video
tags:
- diffusion-single-file
- comfyui
- distillation
- lora
- video
- video generation
---
<div align="center">

# 🎬 Wan2.1 Distilled Models

### ⚡ High-Performance Video Generation with 4-Step Inference

*Distillation-accelerated versions of Wan2.1 based on the paper [SGMD: Score Gradient Matching Distillation for Few-Step Video Diffusion Distillation](https://huggingface.co/papers/2605.30116) - Dramatically faster while maintaining exceptional quality*



---

[](https://huggingface.co/lightx2v/Wan2.1-Distill-Models)
[](https://github.com/ModelTC/LightX2V)
[](LICENSE)

</div>

---

## What's Special?

<table>
<tr>
<td width="50%">

### ⚡ Ultra-Fast Generation
- **4-step inference** (vs. the traditional 50+ steps)
- Up to **2x faster** than ComfyUI
- Real-time video generation capability

</td>
<td width="50%">

### 🎯 Flexible Options
- Multiple resolutions (480P/720P)
- Various precision formats (BF16/FP8/INT8)
- I2V and T2V support

</td>
</tr>
<tr>
<td width="50%">

### 💾 Memory Efficient
- FP8/INT8: **~50% size reduction**
- CPU offload support
- Optimized for consumer GPUs

</td>
<td width="50%">

### 🔧 Easy Integration
- Compatible with LightX2V framework
- ComfyUI support available
- Simple configuration files

</td>
</tr>
</table>

---

## 📦 Model Catalog

### 🎥 Model Types

<table>
<tr>
<td align="center" width="50%">

#### 🖼️ **Image-to-Video (I2V)**
Transform still images into dynamic videos
- 📺 480P Resolution
- 🎬 720P Resolution

</td>
<td align="center" width="50%">

#### **Text-to-Video (T2V)**
Generate videos from text descriptions
- 14B Parameters
- 🎨 High-quality synthesis

</td>
</tr>
</table>

### 🎯 Precision Variants

| Precision | Model Identifier | Model Size | Framework | Quality vs Speed |
|:---------:|:-----------------|:----------:|:---------:|:-----------------|
| **BF16** | `lightx2v_4step` | ~28-32 GB | LightX2V | ⭐⭐⭐⭐⭐ Highest quality |
| ⚡ **FP8** | `scaled_fp8_e4m3_lightx2v_4step` | ~15-17 GB | LightX2V | ⭐⭐⭐⭐ Excellent balance |
| 🎯 **INT8** | `int8_lightx2v_4step` | ~15-17 GB | LightX2V | ⭐⭐⭐⭐ Fast & efficient |
| 🔷 **FP8 ComfyUI** | `scaled_fp8_e4m3_lightx2v_4step_comfyui` | ~15-17 GB | ComfyUI | ⭐⭐⭐ ComfyUI ready |
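
The sizes in the table follow roughly from the number of bytes stored per weight. The sketch below is a back-of-the-envelope check, assuming exactly 14 billion parameters (the figure listed for the T2V model) and decimal gigabytes. `estimate_size_gb` is a hypothetical helper, not part of LightX2V, and it ignores quantization scales, metadata, and any layers kept at higher precision, which is presumably why the listed FP8/INT8 sizes sit slightly above the raw estimate.

```python
# Rough checkpoint-size estimate from parameter count and bytes per weight.
# Assumption: 14e9 parameters; quantization scales and metadata are ignored.
BYTES_PER_WEIGHT = {"bf16": 2, "fp8": 1, "int8": 1}


def estimate_size_gb(num_params: float, precision: str) -> float:
    """Return the approximate weight size in decimal gigabytes."""
    return num_params * BYTES_PER_WEIGHT[precision] / 1e9


for precision in BYTES_PER_WEIGHT:
    print(f"{precision}: ~{estimate_size_gb(14e9, precision):.0f} GB")
```

Under these assumptions BF16 comes to about 28 GB and the 8-bit formats to about 14 GB, consistent with the "~50% size reduction" quoted earlier.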

### Naming Convention

```bash
# Pattern: wan2.1_{task}_{resolution}_{precision}.safetensors
# (T2V files carry the model size, 14b, in the resolution slot)
# Examples:
wan2.1_i2v_720p_lightx2v_4step.safetensors                          # 720P I2V - BF16
wan2.1_i2v_720p_scaled_fp8_e4m3_lightx2v_4step.safetensors          # 720P I2V - FP8
wan2.1_i2v_480p_int8_lightx2v_4step.safetensors                     # 480P I2V - INT8
wan2.1_t2v_14b_scaled_fp8_e4m3_lightx2v_4step_comfyui.safetensors   # T2V - FP8 ComfyUI
```
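
If you need to sort or filter checkpoints programmatically, the pattern can be split mechanically. The following is a minimal sketch built only from the four identifiers in the precision table. `parse_checkpoint_name` and `Checkpoint` are hypothetical names, not part of LightX2V, and the second field is called `variant` because it holds a resolution for I2V files and the model size for T2V files.

```python
from typing import NamedTuple

# Precision identifiers as listed in the Precision Variants table.
PRECISIONS = {
    "lightx2v_4step": "BF16",
    "scaled_fp8_e4m3_lightx2v_4step": "FP8",
    "int8_lightx2v_4step": "INT8",
    "scaled_fp8_e4m3_lightx2v_4step_comfyui": "FP8 ComfyUI",
}


class Checkpoint(NamedTuple):
    task: str       # "i2v" or "t2v"
    variant: str    # resolution ("480p", "720p") or model size ("14b")
    precision: str  # human-readable label from the table


def parse_checkpoint_name(filename: str) -> Checkpoint:
    """Split wan2.1_{task}_{resolution}_{precision}.safetensors into parts."""
    prefix, suffix = "wan2.1_", ".safetensors"
    if not (filename.startswith(prefix) and filename.endswith(suffix)):
        raise ValueError(f"unexpected file name: {filename}")
    stem = filename[len(prefix):-len(suffix)]
    # The first two underscore-separated fields are task and variant;
    # everything after them is the precision identifier.
    task, variant, identifier = stem.split("_", 2)
    if identifier not in PRECISIONS:
        raise ValueError(f"unknown precision identifier: {identifier}")
    return Checkpoint(task, variant, PRECISIONS[identifier])


print(parse_checkpoint_name("wan2.1_i2v_480p_int8_lightx2v_4step.safetensors"))
```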

> 💡 **Explore all models**: [Browse Full Model Collection →](https://huggingface.co/lightx2v/Wan2.1-Distill-Models/tree/main)

## Usage

**LightX2V is a high-performance inference framework optimized for these models, approximately 2x faster than ComfyUI with better quantization accuracy. Highly recommended!**

### Python Sample Usage

```python
from lightx2v import LightX2VPipeline

# Initialize pipeline for Wan2.1 I2V task
pipe = LightX2VPipeline(
    model_path="lightx2v/Wan2.1-Distill-Models",
    model_cls="wan2.1",
    task="i2v",
)

# Enable offloading to reduce VRAM usage (suitable for consumer GPUs)
pipe.enable_offload(
    cpu_offload=True,
    offload_granularity="block",
    text_encoder_offload=True,
    image_encoder_offload=False,
    vae_offload=False,
)

# Create generator with 4-step distilled inference
pipe.create_generator(
    attn_mode="sage_attn2",
    infer_steps=4,
    height=480,
    width=832,
    num_frames=81,
    guidance_scale=5.0,
    sample_shift=5.0,
)

# Generate video
pipe.generate(
    seed=42,
    image_path="path/to/image.jpg",
    prompt="Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard.",
    negative_prompt="shaking camera, low quality, static",
    save_result_path="output.mp4",
)
```
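
To get a feel for what `height=480`, `width=832`, and `num_frames=81` imply, the sketch below works out the clip length and latent grid. It rests on assumptions that this model card does not state: a 16 fps output rate and a VAE with 4x temporal and 8x spatial compression, as commonly cited for Wan2.1. `clip_summary` is a hypothetical helper, so treat the numbers as an illustration rather than a specification.

```python
# Assumptions (not stated in this model card): 16 fps output and a VAE with
# 4x temporal / 8x spatial compression, as commonly cited for Wan2.1.
FPS = 16
TEMPORAL_STRIDE = 4
SPATIAL_STRIDE = 8


def clip_summary(height: int, width: int, num_frames: int) -> dict:
    """Return clip duration and latent grid size for the given settings."""
    if (num_frames - 1) % TEMPORAL_STRIDE != 0:
        raise ValueError("num_frames is expected to have the form 4k + 1")
    return {
        "seconds": num_frames / FPS,
        "latent_frames": (num_frames - 1) // TEMPORAL_STRIDE + 1,
        "latent_height": height // SPATIAL_STRIDE,
        "latent_width": width // SPATIAL_STRIDE,
    }


print(clip_summary(height=480, width=832, num_frames=81))
```

Under these assumptions the sample settings yield a clip of roughly five seconds.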

#### Quick Start (CLI)

1. Download model (720P I2V FP8 example)
```bash
huggingface-cli download lightx2v/Wan2.1-Distill-Models \
    --local-dir ./models/wan2.1_i2v_720p \
    --include "wan2.1_i2v_720p_scaled_fp8_e4m3_lightx2v_4step.safetensors"
```
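
The same download can be scripted from Python. This is a minimal sketch using `hf_hub_download` from the `huggingface_hub` package (assumed to be installed); `download_checkpoint` is a hypothetical wrapper, and the repository, file name, and target directory simply mirror the CLI example.

```python
REPO_ID = "lightx2v/Wan2.1-Distill-Models"
FILENAME = "wan2.1_i2v_720p_scaled_fp8_e4m3_lightx2v_4step.safetensors"


def download_checkpoint(local_dir: str = "./models/wan2.1_i2v_720p") -> str:
    """Download the 720P I2V FP8 checkpoint and return its local path."""
    # Imported lazily so that defining this helper does not require the package.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=FILENAME, local_dir=local_dir)
```

Calling `download_checkpoint()` fetches a file of roughly 15-17 GB, so check available disk space first.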

2. Clone LightX2V repository
```bash
git clone https://github.com/ModelTC/LightX2V.git
cd LightX2V
```

3. Install dependencies
```bash
pip install -r requirements.txt
```
Alternatively, see the [Quick Start Documentation](https://github.com/ModelTC/LightX2V/blob/main/docs/EN/source/getting_started/quickstart.md) for a Docker-based setup.

4. Select and modify configuration file

Choose the appropriate configuration based on your GPU memory:

**For 80GB+ GPU (A100/H100)**
- I2V: [wan_i2v_distill_4step_cfg.json](https://github.com/ModelTC/LightX2V/blob/main/configs/distill/wan_i2v_distill_4step_cfg.json)
- T2V: [wan_t2v_distill_4step_cfg.json](https://github.com/ModelTC/LightX2V/blob/main/configs/distill/wan_t2v_distill_4step_cfg.json)

**For 24GB+ GPU (RTX 4090)**
- I2V: [wan_i2v_distill_4step_cfg_4090.json](https://github.com/ModelTC/LightX2V/blob/main/configs/distill/wan_i2v_distill_4step_cfg_4090.json)
- T2V: [wan_t2v_distill_4step_cfg_4090.json](https://github.com/ModelTC/LightX2V/blob/main/configs/distill/wan_t2v_distill_4step_cfg_4090.json)
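
The choice above can be expressed as a small lookup. The sketch below is illustrative only: `pick_config` is a hypothetical helper, the paths are relative to the LightX2V repository root, and treating every GPU between 24 GB and 80 GB as the "4090" tier is an assumption, since only the two tiers above are listed.

```python
CONFIG_DIR = "configs/distill"


def pick_config(task: str, vram_gb: float) -> str:
    """Return the distill config path for a task ("i2v" or "t2v") and GPU memory."""
    if task not in ("i2v", "t2v"):
        raise ValueError(f"unknown task: {task}")
    if vram_gb >= 80:
        suffix = ""        # A100/H100 tier
    elif vram_gb >= 24:
        suffix = "_4090"   # RTX 4090 tier (assumed to cover 24-79 GB)
    else:
        raise ValueError("no config is listed here for GPUs under 24 GB")
    return f"{CONFIG_DIR}/wan_{task}_distill_4step_cfg{suffix}.json"


print(pick_config("i2v", 24))
```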

5. Run inference
```bash
cd scripts
bash wan/run_wan_i2v_distill_4step_cfg.sh
```

#### Documentation
- **Quick Start Guide**: [LightX2V Quick Start](https://github.com/ModelTC/LightX2V/blob/main/docs/EN/source/getting_started/quickstart.md)
- **Complete Usage Guide**: [LightX2V Model Structure Documentation](https://github.com/ModelTC/LightX2V/blob/main/docs/EN/source/getting_started/model_structure.md)
- **Configuration Guide**: [Configuration Files](https://github.com/ModelTC/LightX2V/tree/main/configs/distill)
- **Quantization Usage**: [Quantization Documentation](https://github.com/ModelTC/LightX2V/blob/main/docs/EN/source/method_tutorials/quantization.md)
- **Parameter Offload**: [Offload Documentation](https://github.com/ModelTC/LightX2V/blob/main/docs/EN/source/method_tutorials/offload.md)

#### Performance Advantages
- ⚡ **Fast**: Approximately **2x faster** than ComfyUI
- 🎯 **Optimized**: Deeply optimized for distilled models
- 💾 **Memory Efficient**: Supports CPU offload and other memory optimization techniques
- 🛠️ **Flexible**: Supports multiple quantization formats and configuration options

### Community
- **Issues**: https://github.com/ModelTC/LightX2V/issues

## ⚠️ Important Notes

1. **Additional Components**: These models contain only the DiT weights. You also need:
   - T5 text encoder
   - CLIP vision encoder
   - VAE encoder/decoder
   - Tokenizers

   Refer to the [LightX2V Documentation](https://github.com/ModelTC/LightX2V/blob/main/docs/EN/source/getting_started/model_structure.md) for how to organize the complete model directory.

If you find this project helpful, please give us a ⭐ on [GitHub](https://github.com/ModelTC/LightX2V).