---
license: apache-2.0
tags:
- diffusion-single-file
- comfyui
- distillation
- NVFP4
- video
- video generation
base_model:
- Wan-AI/Wan2.1-I2V-14B-480P
- Wan-AI/Wan2.1-T2V-1.3B
pipeline_tags:
- image-to-video
- text-to-video
library_name: diffusers
---
# 🎬 Wan-NVFP4-4Steps Models

> **NVFP4 Quantization-Aware Step Distillation for Blackwell Architecture**

[![GitHub](https://img.shields.io/badge/GitHub-ModelTC/LightX2V-blue)](https://github.com/ModelTC/LightX2V)
[![HuggingFace](https://img.shields.io/badge/HuggingFace-lightx2v-yellow)](https://huggingface.co/lightx2v/)

## 📋 Table of Contents

- [✨ Features](#-features)
- [🚀 Quick Start](#-quick-start)
- [🎬 Generation Results](#-generation-results)
- [⚡ Performance Comparison](#-performance-comparison)
- [⚠️ Notes](#️-notes)
- [🤝 Community](#-community)

## ✨ Features

- **⚡ 4-Step Inference**: Step distillation cuts denoising to 4 steps; on a single RTX 5090, end-to-end generation takes 17.65s for I2V-14B-480P and 6.54s for T2V-1.3B-480P
- **🎯 NVFP4 Quantization**: Lower memory footprint and bandwidth usage, targeting the Blackwell architecture
- **🔧 LightX2V Integration**: Runs on the official LightX2V framework, where it is tuned for performance and stability
- **🚀 High-Quality Generation**: Aims to preserve Wan2.1's video quality at the reduced step count (see the side-by-side results below)
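
To make the NVFP4 bullet concrete, here is a minimal pure-Python sketch of block-scaled FP4 "fake quantization" (quantize, then dequantize). It is an illustration, not the LightX2V kernel: it assumes the usual NVFP4 layout of 4-bit E2M1 values sharing one scale per 16-element block, keeps the scale in full precision instead of FP8, skips the additional per-tensor scale, and breaks rounding ties toward the smaller magnitude. The function names are hypothetical.

```python
# Magnitudes representable by E2M1 (1 sign, 2 exponent, 1 mantissa bit).
E2M1_VALUES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 16  # assumption: one shared scale per 16 consecutive elements


def fake_quantize_block(block):
    """Quantize one block to E2M1 and dequantize it again (simulation only)."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return [0.0] * len(block)
    scale = amax / E2M1_VALUES[-1]  # map the largest magnitude onto 6.0
    out = []
    for x in block:
        # Pick the closest representable magnitude for the scaled value.
        nearest = min(E2M1_VALUES, key=lambda v: abs(abs(x) / scale - v))
        out.append((nearest if x >= 0 else -nearest) * scale)
    return out


def fake_quantize(values):
    """Apply block-wise fake quantization to a flat list of floats."""
    out = []
    for i in range(0, len(values), BLOCK_SIZE):
        out.extend(fake_quantize_block(values[i:i + BLOCK_SIZE]))
    return out


def bits_per_element():
    """4-bit payload plus one 8-bit scale amortized over the block."""
    return 4 + 8 / BLOCK_SIZE
```

Under these assumptions each weight costs about 4.5 bits instead of 16 for BF16, which is where the memory and bandwidth savings come from; the actual footprint depends on which layers are quantized.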

## 🚀 Quick Start

```bash
# 1. Install LightX2V (uv is installed first because the commands below use it)
pip install scikit_build_core uv
git clone https://github.com/ModelTC/LightX2V.git
cd LightX2V
uv pip install -v .

# 2. Build and install the NVFP4 kernel
git clone https://github.com/NVIDIA/cutlass.git
cd lightx2v_kernel

# Replace /path/to/cutlass with the absolute path of the CUTLASS clone above
MAX_JOBS=$(nproc) CMAKE_BUILD_PARALLEL_LEVEL=$(nproc) \
uv build --wheel \
  -Cbuild-dir=build . \
  -Ccmake.define.CUTLASS_PATH=/path/to/cutlass \
  --verbose --color=always --no-build-isolation

pip install dist/*whl --force-reinstall --no-deps

# 3. Run inference (examples/ is assumed to sit at the repository root, one level above lightx2v_kernel)
cd ../examples/wan
python wan_i2v_nvfp4.py   # Image-to-Video
python wan_t2v_nvfp4.py   # Text-to-Video
```

## 🎬 Generation Results

<div style="background: #f8fafc; border: 1px solid #e2e8f0; border-radius: 8px; padding: 16px; margin: 16px 0;">
<p style="font-style: italic; color: #475569; margin: 0; padding: 12px; background: white; border-radius: 6px; border-left: 4px solid #3b82f6;">
"A cinematic, hyper-realistic 3D animation, in the somber and beautiful style of Sekiro: Shadows Die Twice. In a vast field of silvery-white pampas grass, under a luminous full moon, the shinobi Wolf stands ready for a final duel..."
</p>
</div>

<table style="width: 100%; border-collapse: collapse; margin: 20px 0;">
<tr>
<th style="text-align: center; padding: 12px; background: #f1f5f9; border: 1px solid #e2e8f0; font-weight: 600;">Input Image</th>
<th style="text-align: center; padding: 12px; background: #f1f5f9; border: 1px solid #e2e8f0; font-weight: 600;">Wan2.1-I2V-14B-480P</th>
<th style="text-align: center; padding: 12px; background: #f1f5f9; border: 1px solid #e2e8f0; font-weight: 600;">wan2.1_i2v_480p_nvfp4_lightx2v_4step</th>
</tr>
<tr>
<td style="text-align: center; padding: 12px; border: 1px solid #e2e8f0;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/9lybVJ9QSkbNC4QiP1ygo.png" style="max-width: 200px; height: auto; border-radius: 6px;">
</td>
<td style="text-align: center; padding: 12px; border: 1px solid #e2e8f0;">
<video controls style="width: 260px; height: 180px; border-radius: 6px; object-fit: cover;" src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/jA_3eRiYWjBAif6PDnx_Q.mp4"></video>
</td>
<td style="text-align: center; padding: 12px; border: 1px solid #e2e8f0;">
<video controls style="width: 260px; height: 180px; border-radius: 6px; object-fit: cover;" src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/VJfHDcXEQ7zlixizKFrD7.mp4"></video>
</td>
</tr>
</table>

<div style="background: #f8fafc; border: 1px solid #e2e8f0; border-radius: 8px; padding: 16px; margin: 16px 0;">
<p style="font-style: italic; color: #475569; margin: 0; padding: 12px; background: white; border-radius: 6px; border-left: 4px solid #10b981;">
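"High contrast, high saturation, short-side composition, sunset, medium focal length, soft light, backlight, warm tones, rim light, medium close-up, daylight, sunny-day lighting. A close-up of a foreign Caucasian woman wearing a yellow plaid dress and earrings. As the low-angle camera rises, the woman lifts her head, tears in her eyes, looking ahead and speaking..."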
</p>
</div>

| Wan2.1-T2V-1.3B | wan2.1_t2v_1_3b_nvfp4_lightx2v_4step |
| --- | --- |
| <video controls style="width: 260px; height: 180px; border-radius: 6px; object-fit: cover;" src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/dwr0pPbtIe2fHg0hmEM5M.mp4"></video> | <video controls style="width: 260px; height: 180px; border-radius: 6px; object-fit: cover;" src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/cm-S4EaZlCOShlXxOnJ-3.mp4"></video> |

## ⚡ Performance Comparison

**Test Environment**: RTX 5090 Single GPU | LightX2V Framework

<table style="width: 100%; border-collapse: collapse;">
<tr>
<td style="vertical-align: top; padding-right: 20px;">
<h4 style="margin: 0 0 15px 0;">📸 Image-to-Video (I2V-14B-480P)</h4>
<table style="width: 100%; border-collapse: collapse;">
<tr>
<th style="text-align: left; padding: 8px; border-bottom: 2px solid #e2e8f0;">Metric</th>
<th style="text-align: left; padding: 8px; border-bottom: 2px solid #e2e8f0;">Original Model</th>
<th style="text-align: left; padding: 8px; border-bottom: 2px solid #e2e8f0;">Optimized Model</th>
<th style="text-align: left; padding: 8px; border-bottom: 2px solid #e2e8f0;">Speedup</th>
</tr>
<tr>
<td style="padding: 8px; border-bottom: 1px solid #f1f5f9;"><strong>Single-step Denoising</strong></td>
<td style="padding: 8px; border-bottom: 1px solid #f1f5f9;"><span style="color: #64748b; font-weight: bold;">12.10s</span></td>
<td style="padding: 8px; border-bottom: 1px solid #f1f5f9;"><span style="color: #2563eb; font-weight: bold;">3.40s</span></td>
<td style="padding: 8px; border-bottom: 1px solid #f1f5f9;"><span style="background: #16a34a; color: white; padding: 4px 8px; border-radius: 12px; font-weight: bold;">3.6x</span></td>
</tr>
<tr>
<td style="padding: 8px;"><strong>End-to-End</strong></td>
<td style="padding: 8px;"><span style="color: #64748b; font-weight: bold;">498.90s</span></td>
<td style="padding: 8px;"><span style="color: #2563eb; font-weight: bold;">17.65s</span></td>
<td style="padding: 8px;"><span style="background: #16a34a; color: white; padding: 4px 8px; border-radius: 12px; font-weight: bold;">28x</span></td>
</tr>
</table>
</td>
<td style="vertical-align: top; padding-left: 20px;">
<h4 style="margin: 0 0 15px 0;">🎬 Text-to-Video (T2V-1.3B-480P)</h4>
<table style="width: 100%; border-collapse: collapse;">
<tr>
<th style="text-align: left; padding: 8px; border-bottom: 2px solid #e2e8f0;">Metric</th>
<th style="text-align: left; padding: 8px; border-bottom: 2px solid #e2e8f0;">Original Model</th>
<th style="text-align: left; padding: 8px; border-bottom: 2px solid #e2e8f0;">Optimized Model</th>
<th style="text-align: left; padding: 8px; border-bottom: 2px solid #e2e8f0;">Speedup</th>
</tr>
<tr>
<td style="padding: 8px; border-bottom: 1px solid #f1f5f9;"><strong>Single-step Denoising</strong></td>
<td style="padding: 8px; border-bottom: 1px solid #f1f5f9;"><span style="color: #64748b; font-weight: bold;">2.00s</span></td>
<td style="padding: 8px; border-bottom: 1px solid #f1f5f9;"><span style="color: #2563eb; font-weight: bold;">0.70s</span></td>
<td style="padding: 8px; border-bottom: 1px solid #f1f5f9;"><span style="background: #16a34a; color: white; padding: 4px 8px; border-radius: 12px; font-weight: bold;">2.9x</span></td>
</tr>
<tr>
<td style="padding: 8px;"><strong>End-to-End</strong></td>
<td style="padding: 8px;"><span style="color: #64748b; font-weight: bold;">83.50s</span></td>
<td style="padding: 8px;"><span style="color: #2563eb; font-weight: bold;">6.54s</span></td>
<td style="padding: 8px;"><span style="background: #16a34a; color: white; padding: 4px 8px; border-radius: 12px; font-weight: bold;">12.8x</span></td>
</tr>
</table>
</td>
</tr>
</table>
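
The speedup column can be recomputed from the timings above. The sketch below does that and also estimates how much of the optimized end-to-end time falls outside the denoising loop. That estimate assumes the optimized models run exactly 4 denoising steps and that per-step times add linearly. The dictionary layout and function names are illustrative only.

```python
# Timings in seconds, copied from the tables above (RTX 5090, LightX2V).
# Each tuple is (original model, optimized model).
TIMINGS = {
    "I2V-14B-480P": {"step": (12.10, 3.40), "end_to_end": (498.90, 17.65)},
    "T2V-1.3B-480P": {"step": (2.00, 0.70), "end_to_end": (83.50, 6.54)},
}
DISTILLED_STEPS = 4  # assumption: taken from the "4step" model names


def speedup(original, optimized):
    """Ratio of original to optimized wall-clock time."""
    return original / optimized


def report(timings=TIMINGS):
    """Return per-model speedups and the estimated non-denoising time."""
    rows = {}
    for name, t in timings.items():
        step_orig, step_opt = t["step"]
        e2e_orig, e2e_opt = t["end_to_end"]
        rows[name] = {
            "step_speedup": speedup(step_orig, step_opt),
            "end_to_end_speedup": speedup(e2e_orig, e2e_opt),
            # Time left after subtracting the 4 denoising steps
            # (text/image encoding, VAE decode, and other overhead).
            "non_denoising_s": e2e_opt - DISTILLED_STEPS * step_opt,
        }
    return rows


if __name__ == "__main__":
    for name, row in report().items():
        print(name, {k: round(v, 2) for k, v in row.items()})
```

The end-to-end gain is larger than the per-step gain because it combines two effects: each step is faster, and the distilled models need far fewer steps.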

## ⚠️ Notes

### System Requirements
- **Required Hardware**: NVIDIA RTX 50-series GPUs (RTX 5090/5080/5070/5060) or other Blackwell architecture GPUs
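
A quick way to check the hardware requirement before building the kernel is to read the GPU's CUDA compute capability. The sketch below assumes PyTorch is installed and that Blackwell devices report major version 10 (data-center parts) or 12 (RTX 50-series). The helper names are hypothetical, and the NVFP4 kernel may target only a subset of these devices.

```python
def is_blackwell(major):
    """Return True if a compute-capability major version looks like Blackwell.

    Assumption: data-center Blackwell reports 10.x, RTX 50-series reports 12.x.
    """
    return major in (10, 12)


def current_gpu_is_blackwell():
    """Check the default CUDA device; returns False if PyTorch or CUDA is missing."""
    try:
        import torch
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False
    major, _minor = torch.cuda.get_device_capability()
    return is_blackwell(major)


if __name__ == "__main__":
    print("Blackwell GPU detected:", current_gpu_is_blackwell())
```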

### Dependencies
- Prepare the T5, CLIP, and VAE components yourself; they follow the same structure as in Wan2.x

### Performance Tips
- For best performance, run the NVFP4 checkpoints on a Blackwell GPU
- Enable CPU offload on GPUs with limited memory

## 🤝 Community

- **🐛 Issues**: [GitHub Issues](https://github.com/ModelTC/LightX2V/issues)
- **🤗 Models**: [HuggingFace Hub](https://huggingface.co/lightx2v/)
- **📖 Documentation**: [LightX2V Docs](https://github.com/ModelTC/LightX2V)

---

<div align="center">

**If you find this project helpful, please give us a ⭐ on [GitHub](https://github.com/ModelTC/LightX2V)**

</div>