---
license: apache-2.0
tags:
- diffusion-single-file
- comfyui
- distillation
- NVFP4
- video
- video generation
base_model:
- Wan-AI/Wan2.1-I2V-14B-480P
- Wan-AI/Wan2.1-T2V-1.3B
pipeline_tags:
- image-to-video
- text-to-video
library_name: diffusers
---
# 🎬 Wan-NVFP4-4Steps Models

> **NVFP4 Quantization-Aware Step Distillation for Blackwell Architecture**

[![GitHub](https://img.shields.io/badge/GitHub-ModelTC/LightX2V-blue)](https://github.com/ModelTC/LightX2V)
[![HuggingFace](https://img.shields.io/badge/HuggingFace-lightx2v-yellow)](https://huggingface.co/lightx2v/)

## 📋 Table of Contents

- [✨ Features](#-features)
- [🚀 Quick Start](#-quick-start)
- [🎬 Generation Results](#-generation-results)
- [⚡ Performance Comparison](#-performance-comparison)
- [⚠️ Notes](#️-notes)
- [🤝 Community](#-community)

## ✨ Features

- **⚡ 4-Step Inference**: Step distillation reduces sampling to 4 denoising steps, bringing end-to-end generation close to real time (tested on a single RTX 5090)
- **🎯 NVFP4 Quantization**: Lower memory footprint and bandwidth usage, optimized for the Blackwell architecture
- **🔧 LightX2V Integration**: Best performance and stability when run on the official LightX2V framework
- **🚀 High-Quality Generation**: Preserves Wan2.1's video quality at a much higher generation speed

## 🚀 Quick Start

```bash
# 1. Install LightX2V (uv and scikit_build_core are needed for the build steps below)
pip install scikit_build_core uv
git clone https://github.com/ModelTC/LightX2V.git
cd LightX2V
uv pip install -v .

# 2. Build and install the NVFP4 kernel
git clone https://github.com/NVIDIA/cutlass.git
cd lightx2v_kernel

# Replace /path/to/cutlass with the absolute path of the cutlass checkout cloned above
MAX_JOBS=$(nproc) CMAKE_BUILD_PARALLEL_LEVEL=$(nproc) \
uv build --wheel \
  -Cbuild-dir=build . \
  -Ccmake.define.CUTLASS_PATH=/path/to/cutlass \
  --verbose --color=always --no-build-isolation

pip install dist/*.whl --force-reinstall --no-deps

# 3. Run inference (examples/ is assumed to sit at the LightX2V repository root)
cd ../examples/wan
python wan_i2v_nvfp4.py   # Image-to-Video
python wan_t2v_nvfp4.py   # Text-to-Video
```
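
Before running inference, it can help to confirm that both installed packages are visible to the Python interpreter. The sketch below is a minimal check, not part of LightX2V. It assumes the packages are importable as `lightx2v` and `lightx2v_kernel`; those names are inferred from the repository directory names and may differ in practice.

```python
import importlib.util
from typing import Iterable, List

# Assumed import names, inferred from the repository layout (not confirmed).
ASSUMED_MODULES = ("lightx2v", "lightx2v_kernel")


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the names that the import system cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(ASSUMED_MODULES)
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All assumed modules are importable.")
```

If a name is reported as not importable, re-check the corresponding install step (and the assumed module name) before debugging the inference scripts.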

## 🎬 Generation Results

<div style="background: #f8fafc; border: 1px solid #e2e8f0; border-radius: 8px; padding: 16px; margin: 16px 0;">
<p style="font-style: italic; color: #475569; margin: 0; padding: 12px; background: white; border-radius: 6px; border-left: 4px solid #3b82f6;">
"A cinematic, hyper-realistic 3D animation, in the somber and beautiful style of Sekiro: Shadows Die Twice. In a vast field of silvery-white pampas grass, under a luminous full moon, the shinobi Wolf stands ready for a final duel..."
</p>
</div>

<table style="width: 100%; border-collapse: collapse; margin: 20px 0;">
<tr>
<th style="text-align: center; padding: 12px; background: #f1f5f9; border: 1px solid #e2e8f0; font-weight: 600;">Input Image</th>
<th style="text-align: center; padding: 12px; background: #f1f5f9; border: 1px solid #e2e8f0; font-weight: 600;">Wan2.1-I2V-14B-480P</th>
<th style="text-align: center; padding: 12px; background: #f1f5f9; border: 1px solid #e2e8f0; font-weight: 600;">wan2.1_i2v_480p_nvfp4_lightx2v_4step</th>
</tr>
<tr>
<td style="text-align: center; padding: 12px; border: 1px solid #e2e8f0;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/9lybVJ9QSkbNC4QiP1ygo.png" style="max-width: 200px; height: auto; border-radius: 6px;">
</td>
<td style="text-align: center; padding: 12px; border: 1px solid #e2e8f0;">
<video controls style="width: 260px; height: 180px; border-radius: 6px; object-fit: cover;" src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/jA_3eRiYWjBAif6PDnx_Q.mp4"></video>
</td>
<td style="text-align: center; padding: 12px; border: 1px solid #e2e8f0;">
<video controls style="width: 260px; height: 180px; border-radius: 6px; object-fit: cover;" src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/VJfHDcXEQ7zlixizKFrD7.mp4"></video>
</td>
</tr>
</table>

<div style="background: #f8fafc; border: 1px solid #e2e8f0; border-radius: 8px; padding: 16px; margin: 16px 0;">
<p style="font-style: italic; color: #475569; margin: 0; padding: 12px; background: white; border-radius: 6px; border-left: 4px solid #10b981;">
"高对比度,高饱和度,短边构图,日落,中焦距,柔光,背光,暖色调,边缘光,中近景,日光,晴天光,一位外国白人女性的近景,她身穿黄色格子连衣裙,戴着耳环。随着仰拍镜头的上升,女子抬起头来,眼睛里含着泪水,看着前方说着话..."
</p>
</div>

| Wan2.1-T2V-1.3B | wan2.1_t2v_1_3b_nvfp4_lightx2v_4step |
| --- | --- |
| <video controls style="width: 260px; height: 180px; border-radius: 6px; object-fit: cover;" src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/dwr0pPbtIe2fHg0hmEM5M.mp4"></video> | <video controls style="width: 260px; height: 180px; border-radius: 6px; object-fit: cover;" src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/cm-S4EaZlCOShlXxOnJ-3.mp4"></video> |

## ⚡ Performance Comparison

**Test Environment**: RTX 5090 Single GPU | LightX2V Framework

<table style="width: 100%; border-collapse: collapse;">
<tr>
<td style="vertical-align: top; padding-right: 20px;">
<h4 style="margin: 0 0 15px 0;">📸 Image-to-Video (I2V-14B-480P)</h4>
<table style="width: 100%; border-collapse: collapse;">
<tr>
<th style="text-align: left; padding: 8px; border-bottom: 2px solid #e2e8f0;">Metric</th>
<th style="text-align: left; padding: 8px; border-bottom: 2px solid #e2e8f0;">Original Model</th>
<th style="text-align: left; padding: 8px; border-bottom: 2px solid #e2e8f0;">Optimized Model</th>
<th style="text-align: left; padding: 8px; border-bottom: 2px solid #e2e8f0;">Speedup</th>
</tr>
<tr>
<td style="padding: 8px; border-bottom: 1px solid #f1f5f9;"><strong>Single-step Denoising</strong></td>
<td style="padding: 8px; border-bottom: 1px solid #f1f5f9;"><span style="color: #64748b; font-weight: bold;">12.10s</span></td>
<td style="padding: 8px; border-bottom: 1px solid #f1f5f9;"><span style="color: #2563eb; font-weight: bold;">3.40s</span></td>
<td style="padding: 8px; border-bottom: 1px solid #f1f5f9;"><span style="background: #16a34a; color: white; padding: 4px 8px; border-radius: 12px; font-weight: bold;">3.6x</span></td>
</tr>
<tr>
<td style="padding: 8px;"><strong>End-to-End</strong></td>
<td style="padding: 8px;"><span style="color: #64748b; font-weight: bold;">498.90s</span></td>
<td style="padding: 8px;"><span style="color: #2563eb; font-weight: bold;">17.65s</span></td>
<td style="padding: 8px;"><span style="background: #16a34a; color: white; padding: 4px 8px; border-radius: 12px; font-weight: bold;">28x</span></td>
</tr>
</table>
</td>
<td style="vertical-align: top; padding-left: 20px;">
<h4 style="margin: 0 0 15px 0;">🎬 Text-to-Video (T2V-1.3B-480P)</h4>
<table style="width: 100%; border-collapse: collapse;">
<tr>
<th style="text-align: left; padding: 8px; border-bottom: 2px solid #e2e8f0;">Metric</th>
<th style="text-align: left; padding: 8px; border-bottom: 2px solid #e2e8f0;">Original Model</th>
<th style="text-align: left; padding: 8px; border-bottom: 2px solid #e2e8f0;">Optimized Model</th>
<th style="text-align: left; padding: 8px; border-bottom: 2px solid #e2e8f0;">Speedup</th>
</tr>
<tr>
<td style="padding: 8px; border-bottom: 1px solid #f1f5f9;"><strong>Single-step Denoising</strong></td>
<td style="padding: 8px; border-bottom: 1px solid #f1f5f9;"><span style="color: #64748b; font-weight: bold;">2.00s</span></td>
<td style="padding: 8px; border-bottom: 1px solid #f1f5f9;"><span style="color: #2563eb; font-weight: bold;">0.70s</span></td>
<td style="padding: 8px; border-bottom: 1px solid #f1f5f9;"><span style="background: #16a34a; color: white; padding: 4px 8px; border-radius: 12px; font-weight: bold;">2.9x</span></td>
</tr>
<tr>
<td style="padding: 8px;"><strong>End-to-End</strong></td>
<td style="padding: 8px;"><span style="color: #64748b; font-weight: bold;">83.50s</span></td>
<td style="padding: 8px;"><span style="color: #2563eb; font-weight: bold;">6.54s</span></td>
<td style="padding: 8px;"><span style="background: #16a34a; color: white; padding: 4px 8px; border-radius: 12px; font-weight: bold;">12.8x</span></td>
</tr>
</table>
</td>
</tr>
</table>
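
The speedup column is the ratio of original to optimized time. The sketch below recomputes those ratios from the timings in the tables and also estimates how much of the optimized end-to-end time falls outside the denoising loop. That estimate rests on two assumptions not stated in the tables: the optimized run uses exactly 4 denoising steps, and every step costs the measured single-step time. The helper names are illustrative only.

```python
# Timings in seconds, copied from the tables above (single RTX 5090, LightX2V).
# Each tuple is (original model, optimized model).
TIMINGS = {
    "I2V-14B-480P": {"step": (12.10, 3.40), "e2e": (498.90, 17.65)},
    "T2V-1.3B-480P": {"step": (2.00, 0.70), "e2e": (83.50, 6.54)},
}

# Assumption: the distilled model always runs 4 denoising steps.
DISTILLED_STEPS = 4


def speedup(original: float, optimized: float) -> float:
    """Ratio of original time to optimized time."""
    return original / optimized


def non_denoising_time(e2e: float, step: float, steps: int = DISTILLED_STEPS) -> float:
    """End-to-end time left after subtracting `steps` equal-cost denoising steps."""
    return e2e - steps * step


if __name__ == "__main__":
    for name, t in TIMINGS.items():
        step_x = speedup(*t["step"])
        e2e_x = speedup(*t["e2e"])
        rest = non_denoising_time(t["e2e"][1], t["step"][1])
        print(f"{name}: step {step_x:.1f}x, end-to-end {e2e_x:.1f}x, "
              f"non-denoising time ~{rest:.2f}s")
```

The end-to-end speedup exceeds the single-step speedup because the distilled models also run fewer denoising steps, so both effects compound.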

## ⚠️ Notes

### System Requirements
- **Required Hardware**: NVIDIA RTX 50-series GPUs (RTX 5090/5080/5070/5060) or other Blackwell architecture GPUs
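
A quick way to check the hardware requirement is to read the GPU's CUDA compute capability. The sketch below is a minimal, hedged example: it assumes that a compute capability with major version 10 or higher indicates a Blackwell-class (or newer) GPU, for example 12.0 on RTX 50-series cards. The exact set of architectures the NVFP4 kernel supports is not stated here, so treat the threshold as an assumption. The function names are hypothetical; `torch.cuda.get_device_capability` is the standard PyTorch call.

```python
from typing import Optional, Tuple

# Assumption: compute capability major >= 10 means Blackwell-class or newer.
MIN_BLACKWELL_MAJOR = 10


def is_blackwell_or_newer(capability: Tuple[int, int]) -> bool:
    """Check a (major, minor) compute capability against the assumed threshold."""
    major, _minor = capability
    return major >= MIN_BLACKWELL_MAJOR


def detect_capability() -> Optional[Tuple[int, int]]:
    """Return the compute capability of GPU 0, or None if it cannot be queried."""
    try:
        import torch  # optional dependency; only needed for live detection
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


if __name__ == "__main__":
    cap = detect_capability()
    if cap is None:
        print("Could not query a CUDA device (PyTorch missing or no GPU visible).")
    elif is_blackwell_or_newer(cap):
        print(f"Compute capability {cap[0]}.{cap[1]}: meets the assumed threshold.")
    else:
        print(f"Compute capability {cap[0]}.{cap[1]}: below the assumed threshold.")
```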

### Dependencies
- Provide the T5, CLIP, and VAE components yourself, laid out in the same structure as Wan2.x

### Performance Tips
- Use Blackwell + NVFP4 for best performance
- Enable CPU offload for GPUs with limited memory

## 🤝 Community

- **🐛 Issues**: [GitHub Issues](https://github.com/ModelTC/LightX2V/issues)
- **🤗 Models**: [HuggingFace Hub](https://huggingface.co/lightx2v/)
- **📖 Documentation**: [LightX2V Docs](https://github.com/ModelTC/LightX2V)

---

<div align="center">

**If you find this project helpful, please give us a ⭐ on [GitHub](https://github.com/ModelTC/LightX2V)**

</div>