---
license: apache-2.0
tags:
- diffusion-single-file
- comfyui
- distillation
- NVFP4
- video
- video generation
base_model:
- Wan-AI/Wan2.1-I2V-14B-480P
- Wan-AI/Wan2.1-T2V-1.3B
pipeline_tags:
- image-to-video
- text-to-video
library_name: diffusers
---
# 🎬 Wan-NVFP4-4Steps Models
> **NVFP4 Quantization-Aware Step Distillation for Blackwell Architecture**
[GitHub: ModelTC/LightX2V](https://github.com/ModelTC/LightX2V)
[Hugging Face: lightx2v](https://huggingface.co/lightx2v/)
## 📋 Table of Contents
- [✨ Features](#-features)
- [🚀 Quick Start](#-quick-start)
- [🎬 Generation Results](#-generation-results)
- [⚡ Performance Comparison](#-performance-comparison)
- [⚠️ Notes](#️-notes)
- [🤝 Community](#-community)
## ✨ Features
- **⚡ 4-Step Inference**: Step distillation cuts denoising to 4 steps; on a single RTX 5090, end-to-end generation takes 17.65s for I2V-14B-480P and 6.54s for T2V-1.3B-480P
- **🎯 NVFP4 Quantization**: 4-bit floating-point (NVFP4) quantization reduces memory and bandwidth usage and is optimized for the Blackwell architecture
- **🔧 LightX2V Integration**: Built for and benchmarked on the official LightX2V framework
- **🚀 High-Quality Generation**: Maintains the video quality of the Wan2.1 base models while cutting inference time (see Generation Results for side-by-side comparisons)
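To make the memory saving concrete, here is a back-of-the-envelope estimate of weight storage. It is a sketch under assumptions not stated in this card: the original weights are held in 16-bit precision, every parameter is quantized, and NVFP4 stores 4-bit values plus one 8-bit scale per 16-element block. The helper `weight_gigabytes` is hypothetical and not part of LightX2V.

```python
def weight_gigabytes(num_params, bits_per_value, block_size=None, scale_bits=0):
    """Estimate weight storage in GB (1 GB = 1e9 bytes).

    block_size / scale_bits model a block-scaled format: one scale of
    `scale_bits` bits is stored for every `block_size` values.
    """
    total_bits = num_params * bits_per_value
    if block_size:
        total_bits += (num_params / block_size) * scale_bits
    return total_bits / 8 / 1e9


if __name__ == "__main__":
    # Parameter counts taken from the base model names (14B and 1.3B).
    for name, num_params in [("I2V-14B", 14e9), ("T2V-1.3B", 1.3e9)]:
        # Assumption: 16-bit baseline weights.
        baseline = weight_gigabytes(num_params, 16)
        # Assumption: 4-bit values with an 8-bit scale per 16-value block.
        nvfp4 = weight_gigabytes(num_params, 4, block_size=16, scale_bits=8)
        print(f"{name}: 16-bit ~{baseline:.2f} GB, NVFP4 ~{nvfp4:.2f} GB")
```

Under these assumptions the 14B weights shrink from about 28 GB to just under 8 GB. Real checkpoints will differ, since some layers typically stay in higher precision.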
## 🚀 Quick Start
```bash
# 1. Install LightX2V (uv and scikit_build_core are needed for the build steps below)
pip install scikit_build_core uv
git clone https://github.com/ModelTC/LightX2V.git
cd LightX2V
uv pip install -v .
# 2. Build and install the NVFP4 kernel (requires a CUTLASS checkout)
git clone https://github.com/NVIDIA/cutlass.git
cd lightx2v_kernel
# Replace /path/to/cutlass with the absolute path of the CUTLASS clone above
MAX_JOBS=$(nproc) CMAKE_BUILD_PARALLEL_LEVEL=$(nproc) \
uv build --wheel \
-Cbuild-dir=build . \
-Ccmake.define.CUTLASS_PATH=/path/to/cutlass \
--verbose --color=always --no-build-isolation
pip install dist/*.whl --force-reinstall --no-deps
# 3. Run inference (examples/wan is relative to the LightX2V repository root)
cd ../examples/wan
python wan_i2v_nvfp4.py # Image-to-Video
python wan_t2v_nvfp4.py # Text-to-Video
```
## 🎬 Generation Results
<div style="background: #f8fafc; border: 1px solid #e2e8f0; border-radius: 8px; padding: 16px; margin: 16px 0;">
<p style="font-style: italic; color: #475569; margin: 0; padding: 12px; background: white; border-radius: 6px; border-left: 4px solid #3b82f6;">
"A cinematic, hyper-realistic 3D animation, in the somber and beautiful style of Sekiro: Shadows Die Twice. In a vast field of silvery-white pampas grass, under a luminous full moon, the shinobi Wolf stands ready for a final duel..."
</p>
</div>
<table style="width: 100%; border-collapse: collapse; margin: 20px 0;">
<tr>
<th style="text-align: center; padding: 12px; background: #f1f5f9; border: 1px solid #e2e8f0; font-weight: 600;">Input Image</th>
<th style="text-align: center; padding: 12px; background: #f1f5f9; border: 1px solid #e2e8f0; font-weight: 600;">Wan2.1-I2V-14B-480P</th>
<th style="text-align: center; padding: 12px; background: #f1f5f9; border: 1px solid #e2e8f0; font-weight: 600;">wan2.1_i2v_480p_nvfp4_lightx2v_4step</th>
</tr>
<tr>
<td style="text-align: center; padding: 12px; border: 1px solid #e2e8f0;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/9lybVJ9QSkbNC4QiP1ygo.png" style="max-width: 200px; height: auto; border-radius: 6px;">
</td>
<td style="text-align: center; padding: 12px; border: 1px solid #e2e8f0;">
<video controls style="width: 260px; height: 180px; border-radius: 6px; object-fit: cover;" src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/jA_3eRiYWjBAif6PDnx_Q.mp4"></video>
</td>
<td style="text-align: center; padding: 12px; border: 1px solid #e2e8f0;">
<video controls style="width: 260px; height: 180px; border-radius: 6px; object-fit: cover;" src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/VJfHDcXEQ7zlixizKFrD7.mp4"></video>
</td>
</tr>
</table>
<div style="background: #f8fafc; border: 1px solid #e2e8f0; border-radius: 8px; padding: 16px; margin: 16px 0;">
<p style="font-style: italic; color: #475569; margin: 0; padding: 12px; background: white; border-radius: 6px; border-left: 4px solid #10b981;">
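"High contrast, high saturation, short-side composition, sunset, medium focal length, soft light, backlight, warm tones, rim light, medium close-up, daylight, sunny-day light, a close-up of a foreign white woman wearing a yellow plaid dress and earrings. As the low-angle shot rises, the woman lifts her head, tears in her eyes, looking ahead and speaking..." (prompt translated from Chinese)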
</p>
</div>
| Wan2.1-T2V-1.3B | wan2.1_t2v_1_3b_nvfp4_lightx2v_4step |
| --- | --- |
| <video controls style="width: 260px; height: 180px; border-radius: 6px; object-fit: cover;" src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/dwr0pPbtIe2fHg0hmEM5M.mp4"></video> | <video controls style="width: 260px; height: 180px; border-radius: 6px; object-fit: cover;" src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/cm-S4EaZlCOShlXxOnJ-3.mp4"></video> |
## ⚡ Performance Comparison
**Test Environment**: RTX 5090 Single GPU | LightX2V Framework
<table style="width: 100%; border-collapse: collapse;">
<tr>
<td style="vertical-align: top; padding-right: 20px;">
<h4 style="margin: 0 0 15px 0;">📸 Image-to-Video (I2V-14B-480P)</h4>
<table style="width: 100%; border-collapse: collapse;">
<tr>
<th style="text-align: left; padding: 8px; border-bottom: 2px solid #e2e8f0;">Metric</th>
<th style="text-align: left; padding: 8px; border-bottom: 2px solid #e2e8f0;">Original Model</th>
<th style="text-align: left; padding: 8px; border-bottom: 2px solid #e2e8f0;">Optimized Model</th>
<th style="text-align: left; padding: 8px; border-bottom: 2px solid #e2e8f0;">Speedup</th>
</tr>
<tr>
<td style="padding: 8px; border-bottom: 1px solid #f1f5f9;"><strong>Single-step Denoising</strong></td>
<td style="padding: 8px; border-bottom: 1px solid #f1f5f9;"><span style="color: #64748b; font-weight: bold;">12.10s</span></td>
<td style="padding: 8px; border-bottom: 1px solid #f1f5f9;"><span style="color: #2563eb; font-weight: bold;">3.40s</span></td>
<td style="padding: 8px; border-bottom: 1px solid #f1f5f9;"><span style="background: #16a34a; color: white; padding: 4px 8px; border-radius: 12px; font-weight: bold;">3.5x</span></td>
</tr>
<tr>
<td style="padding: 8px;"><strong>End-to-End</strong></td>
<td style="padding: 8px;"><span style="color: #64748b; font-weight: bold;">498.90s</span></td>
<td style="padding: 8px;"><span style="color: #2563eb; font-weight: bold;">17.65s</span></td>
<td style="padding: 8px;"><span style="background: #16a34a; color: white; padding: 4px 8px; border-radius: 12px; font-weight: bold;">28x</span></td>
</tr>
</table>
</td>
<td style="vertical-align: top; padding-left: 20px;">
<h4 style="margin: 0 0 15px 0;">🎬 Text-to-Video (T2V-1.3B-480P)</h4>
<table style="width: 100%; border-collapse: collapse;">
<tr>
<th style="text-align: left; padding: 8px; border-bottom: 2px solid #e2e8f0;">Metric</th>
<th style="text-align: left; padding: 8px; border-bottom: 2px solid #e2e8f0;">Original Model</th>
<th style="text-align: left; padding: 8px; border-bottom: 2px solid #e2e8f0;">Optimized Model</th>
<th style="text-align: left; padding: 8px; border-bottom: 2px solid #e2e8f0;">Speedup</th>
</tr>
<tr>
<td style="padding: 8px; border-bottom: 1px solid #f1f5f9;"><strong>Single-step Denoising</strong></td>
<td style="padding: 8px; border-bottom: 1px solid #f1f5f9;"><span style="color: #64748b; font-weight: bold;">2.00s</span></td>
<td style="padding: 8px; border-bottom: 1px solid #f1f5f9;"><span style="color: #2563eb; font-weight: bold;">0.70s</span></td>
<td style="padding: 8px; border-bottom: 1px solid #f1f5f9;"><span style="background: #16a34a; color: white; padding: 4px 8px; border-radius: 12px; font-weight: bold;">2.9x</span></td>
</tr>
<tr>
<td style="padding: 8px;"><strong>End-to-End</strong></td>
<td style="padding: 8px;"><span style="color: #64748b; font-weight: bold;">83.50s</span></td>
<td style="padding: 8px;"><span style="color: #2563eb; font-weight: bold;">6.54s</span></td>
<td style="padding: 8px;"><span style="background: #16a34a; color: white; padding: 4px 8px; border-radius: 12px; font-weight: bold;">12.8x</span></td>
</tr>
</table>
</td>
</tr>
</table>
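
The Speedup column is the original time divided by the optimized time. As a quick check, the snippet below recomputes it from the figures in the tables above. The `TIMINGS` dictionary and the function names are illustrative only.

```python
# Timings in seconds as (original, optimized), copied from the tables above.
TIMINGS = {
    "I2V-14B-480P": {
        "Single-step Denoising": (12.10, 3.40),
        "End-to-End": (498.90, 17.65),
    },
    "T2V-1.3B-480P": {
        "Single-step Denoising": (2.00, 0.70),
        "End-to-End": (83.50, 6.54),
    },
}


def speedup(original_s, optimized_s):
    """Return how many times faster the optimized run is."""
    return original_s / optimized_s


def speedup_table(timings):
    """Map each model and metric to its speedup factor."""
    return {
        model: {metric: speedup(*pair) for metric, pair in metrics.items()}
        for model, metrics in timings.items()
    }


if __name__ == "__main__":
    for model, metrics in speedup_table(TIMINGS).items():
        for metric, factor in metrics.items():
            print(f"{model} | {metric}: {factor:.2f}x")
```

The end-to-end ratios are much larger than the single-step ratios. This is consistent with the distilled models running only 4 denoising steps on top of the faster per-step kernel.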
## ⚠️ Notes
### System Requirements
- **Required Hardware**: NVIDIA RTX 50-series GPUs (RTX 5090/5080/5070/5060) or other Blackwell architecture GPUs
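
A quick pre-flight check can catch unsupported hardware before you build the kernel. The sketch below assumes that Blackwell GPUs report a CUDA compute capability with major version 10 or higher (for example 12.0 on RTX 50-series cards), and that PyTorch with CUDA support is installed. The function names are hypothetical and not part of LightX2V.

```python
def is_blackwell_or_newer(capability):
    """Return True if a (major, minor) compute capability is Blackwell-class.

    Assumption: Blackwell parts report major version >= 10; earlier
    architectures such as Hopper (9.x) and Ada (8.9) report lower values.
    """
    major, _minor = capability
    return major >= 10


def check_local_gpu(device_index=0):
    """Return True/False for the local GPU, or None if it cannot be queried."""
    try:
        import torch  # optional dependency, only needed for the live check
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return is_blackwell_or_newer(torch.cuda.get_device_capability(device_index))


if __name__ == "__main__":
    result = check_local_gpu()
    if result is None:
        print("Could not query a CUDA GPU (PyTorch missing or no CUDA device).")
    elif result:
        print("GPU looks Blackwell-class; NVFP4 kernels should be usable.")
    else:
        print("GPU predates Blackwell; these NVFP4 models are not supported.")
```

Note that any future architecture with a higher compute capability would also pass this check, so treat it as a coarse filter and not a guarantee of kernel compatibility.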
### Dependencies
- The T5, CLIP, and VAE components must be prepared separately; arrange them in the same structure used by Wan2.x
### Performance Tips
- Use Blackwell + NVFP4 for best performance
- Enable CPU offload for GPUs with limited memory
## 🤝 Community
- **🐛 Issues**: [GitHub Issues](https://github.com/ModelTC/LightX2V/issues)
- **🤗 Models**: [HuggingFace Hub](https://huggingface.co/lightx2v/)
- **📖 Documentation**: [LightX2V Docs](https://github.com/ModelTC/LightX2V)
---
<div align="center">
**If you find this project helpful, please give us a ⭐ on [GitHub](https://github.com/ModelTC/LightX2V)**
</div>