license: apache-2.0
tags:
  - diffusion-single-file
  - comfyui
  - distillation
  - NVFP4
  - video
  - video generation
base_model:
  - Wan-AI/Wan2.1-I2V-14B-480P
  - Wan-AI/Wan2.1-T2V-1.3B
pipeline_tags:
  - image-to-video
  - text-to-video
library_name: diffusers

🎬 Wan-NVFP4-4Steps Models

NVFP4 Quantization-Aware Step Distillation for Blackwell Architecture

📋 Table of Contents

✨ Features
🚀 Quick Start
🎬 Generation Results
⚡ Performance Comparison
⚠️ Notes
🤝 Community

✨ Features

⚡ 4-Step Inference: Distilled to 4 denoising steps; end-to-end generation takes 17.65 s for I2V-14B-480P and 6.54 s for T2V-1.3B on a single RTX 5090
🎯 NVFP4 Quantization: NVFP4 (4-bit floating point) quantization reduces memory and bandwidth usage; optimized for the Blackwell architecture
🔧 LightX2V Integration: Best performance and stability are obtained with the official LightX2V framework
🚀 High-Quality Generation: Designed to retain the video quality of the Wan2.1 base models at much lower latency (12.8x to 28x faster end-to-end in the Performance Comparison)
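
To make the NVFP4 bullet more concrete, the sketch below simulates NVFP4-style rounding in plain Python. It is an illustration under stated assumptions, not the kernel or training code used by LightX2V: it assumes the usual NVFP4 layout (4-bit E2M1 values sharing one scale per block of 16), keeps the block scale as an ordinary Python float instead of FP8, and omits the per-tensor scale. The function names `fake_quant_block` and `fake_quant` are hypothetical.

```python
import math

# Magnitudes representable by a 4-bit E2M1 float; the sign is a separate bit.
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 16  # assumption: one shared scale per 16 consecutive values


def fake_quant_block(block):
    """Round one block to the E2M1 grid and return the dequantized values."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return [0.0] * len(block)
    # Map the largest magnitude in the block onto the largest grid value (6.0).
    scale = amax / E2M1_GRID[-1]
    out = []
    for x in block:
        target = abs(x) / scale
        # Nearest grid point; ties go to the smaller value in this sketch.
        q = min(E2M1_GRID, key=lambda g: abs(target - g))
        out.append(math.copysign(q * scale, x))
    return out


def fake_quant(values):
    """Apply block-wise fake quantization to a flat list of floats."""
    out = []
    for start in range(0, len(values), BLOCK_SIZE):
        out.extend(fake_quant_block(values[start:start + BLOCK_SIZE]))
    return out


# Toy weights: two blocks of 16 values each.
weights = [0.02 * math.sin(i) for i in range(32)]
dequantized = fake_quant(weights)
max_err = max(abs(w, ) if False else abs(w - d) for w, d in zip(weights, dequantized))
print(f"max abs rounding error over {len(weights)} weights: {max_err:.5f}")
```

Quantization-aware methods generally run a rounding step like this in the forward pass so the model learns to tolerate it; how the authors combine it with step distillation is not described here.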

🚀 Quick Start

# 1. Install LightX2V (uv must already be available, e.g. via `pip install uv`)
git clone https://github.com/ModelTC/LightX2V.git
cd LightX2V
uv pip install -v .

# 2. Build and install the NVFP4 kernel
pip install scikit_build_core uv
git clone https://github.com/NVIDIA/cutlass.git
cd lightx2v_kernel

# Replace /path/to/cutlass with the absolute path of the CUTLASS clone above
MAX_JOBS=$(nproc) CMAKE_BUILD_PARALLEL_LEVEL=$(nproc) \
uv build --wheel \
  -Cbuild-dir=build . \
  -Ccmake.define.CUTLASS_PATH=/path/to/cutlass \
  --verbose --color=always --no-build-isolation

pip install dist/*.whl --force-reinstall --no-deps

# 3. Run inference (adjust the path to examples/wan relative to your current directory)
cd examples/wan
python wan_i2v_nvfp4.py   # Image-to-Video
python wan_t2v_nvfp4.py   # Text-to-Video
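
Before running the example scripts, it can help to confirm that both installs above are visible to the Python interpreter. The sketch below is a small pre-flight check; the import names `lightx2v` and `lightx2v_kernel` are assumptions based on the directory names in the commands above and may differ from the actual package names.

```python
import importlib.util

# Assumption: the installed import names match the repository directory names.
ASSUMED_MODULES = ("lightx2v", "lightx2v_kernel")


def missing_modules(names):
    """Return the module names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(ASSUMED_MODULES)
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All assumed modules are importable.")
```

`find_spec` only locates a module without importing it, so the check is cheap and does not initialize CUDA.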

🎬 Generation Results

"A cinematic, hyper-realistic 3D animation, in the somber and beautiful style of Sekiro: Shadows Die Twice. In a vast field of silvery-white pampas grass, under a luminous full moon, the shinobi Wolf stands ready for a final duel..."

Input Image	Wan2.1-I2V-14B-480P	wan2.1_i2v_480p_nvfp4_lightx2v_4step

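"High contrast, high saturation, short-side composition, sunset, medium focal length, soft light, backlight, warm tones, rim light, medium close-up, daylight, sunny-day light; a close-up of a foreign Caucasian woman wearing a yellow plaid dress and earrings. As the low-angle shot rises, the woman lifts her head, tears in her eyes, looking ahead and speaking..."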

Wan2.1-T2V-1.3B	wan2.1_t2v_1_3b_nvfp4_lightx2v_4step

⚡ Performance Comparison

Test Environment: RTX 5090 Single GPU | LightX2V Framework

📸 Image-to-Video (I2V-14B-480P)

Metric	Original Model	Optimized Model	Speedup
Single-step Denoising	12.10s	3.40s	3.5x
End-to-End	498.90s	17.65s	28x

🎬 Text-to-Video (T2V-1.3B-480P)

Metric	Original Model	Optimized Model	Speedup
Single-step Denoising	2.00s	0.70s	2.9x
End-to-End	83.50s	6.54s	12.8x
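
The speedup columns can be re-derived from the timings with a few lines of Python. The sketch below uses only the numbers in the tables; the one added assumption, marked in the code, is that each of the 4 distilled steps costs the reported single-step time. Ratios computed from the rounded timings can differ from the table in the last digit (12.10 / 3.40 is about 3.56).

```python
# Timings in seconds, copied from the tables above (RTX 5090, LightX2V).
# Each entry is (original model, optimized model).
TIMINGS = {
    ("I2V-14B-480P", "single-step denoising"): (12.10, 3.40),
    ("I2V-14B-480P", "end-to-end"): (498.90, 17.65),
    ("T2V-1.3B-480P", "single-step denoising"): (2.00, 0.70),
    ("T2V-1.3B-480P", "end-to-end"): (83.50, 6.54),
}
DISTILLED_STEPS = 4


def speedup(original_s, optimized_s):
    """Ratio of original to optimized wall-clock time."""
    return original_s / optimized_s


def denoising_share(step_s, end_to_end_s, steps=DISTILLED_STEPS):
    """Fraction of end-to-end time spent denoising.

    Assumption: every step costs the reported single-step time.
    """
    return steps * step_s / end_to_end_s


for (model, metric), (orig, opt) in TIMINGS.items():
    print(f"{model:14s} {metric:22s} {speedup(orig, opt):6.2f}x")

for model in ("I2V-14B-480P", "T2V-1.3B-480P"):
    step = TIMINGS[(model, "single-step denoising")][1]
    total = TIMINGS[(model, "end-to-end")][1]
    print(f"{model:14s} denoising share of optimized run: {denoising_share(step, total):.0%}")
```

The end-to-end gain is larger than the per-step gain because the distilled model also runs far fewer denoising steps. Under the assumption above, denoising accounts for roughly 77% of the optimized I2V run and 43% of the optimized T2V run, so the remaining time goes to the other pipeline stages.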

⚠️ Notes

System Requirements

Required Hardware: NVIDIA RTX 50-series GPUs (RTX 5090/5080/5070/5060) or other Blackwell architecture GPUs
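
A quick way to check the hardware requirement is to read the CUDA compute capability of the installed GPU. The sketch below assumes PyTorch is available and that Blackwell parts report a compute capability with major version 10 to 12 (RTX 50-series cards report 12.0); this range is an assumption of the example and is not taken from the LightX2V code. The helper names are hypothetical.

```python
def is_blackwell(major, minor=0):
    """Assumption: Blackwell GPUs report CUDA compute capability 10.x to 12.x."""
    return 10 <= major <= 12


def describe_local_gpu():
    """Return a one-line verdict for GPU 0, or a reason why it cannot be queried."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed; cannot query the GPU."
    if not torch.cuda.is_available():
        return "No CUDA device is visible to PyTorch."
    name = torch.cuda.get_device_name(0)
    major, minor = torch.cuda.get_device_capability(0)
    if is_blackwell(major, minor):
        verdict = "looks like a Blackwell GPU"
    else:
        verdict = "does not look like a Blackwell GPU"
    return f"{name} (compute capability {major}.{minor}) {verdict}"


if __name__ == "__main__":
    print(describe_local_gpu())
```

On a machine without PyTorch or without a visible GPU, the function returns an explanatory message instead of raising.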

Dependencies

The T5 text encoder, CLIP image encoder, and VAE must be prepared separately; use the same directory structure as the Wan2.x models

Performance Tips

Use Blackwell + NVFP4 for best performance
Enable CPU offload for GPUs with limited memory
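
To judge whether CPU offload is likely to be needed, a back-of-the-envelope estimate of the transformer weight footprint can help. The sketch below assumes the nominal parameter counts from the model names (14B and 1.3B), a 16-bit baseline, and the usual NVFP4 layout of a 4-bit value plus one 8-bit scale per 16 values (4.5 bits per weight). It also assumes every weight is quantized, which real checkpoints may not do, and it ignores the T5, CLIP, and VAE components as well as activations.

```python
GIB = 2**30

BASELINE_BITS = 16          # assumption: 16-bit baseline weights
NVFP4_BITS = 4 + 8 / 16     # 4-bit value + one 8-bit scale per 16 values = 4.5

# Nominal parameter counts taken from the model names.
MODELS = {
    "Wan2.1-I2V-14B-480P": 14e9,
    "Wan2.1-T2V-1.3B": 1.3e9,
}


def weight_gib(num_params, bits_per_weight):
    """Approximate weight storage in GiB for a given precision."""
    return num_params * bits_per_weight / 8 / GIB


for name, params in MODELS.items():
    base = weight_gib(params, BASELINE_BITS)
    nvfp4 = weight_gib(params, NVFP4_BITS)
    print(f"{name:22s} 16-bit: {base:6.2f} GiB   NVFP4: {nvfp4:6.2f} GiB")
```

Under these assumptions the 14B transformer shrinks from about 26 GiB to about 7.3 GiB of weights. Whether the full pipeline fits on a given card still depends on the encoders, the VAE, and activation memory, so treat the result as a lower bound.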

🤝 Community

🐛 Issues: GitHub Issues
🤗 Models: HuggingFace Hub
📖 Documentation: LightX2V Docs

If you find this project helpful, please give us a ⭐ on GitHub