---
license: apache-2.0
base_model:
- Wan-AI/Wan2.2-T2V-A14B
library_name: diffusers
tags:
- video_generation
- NVFP4
- Sparse_Attention
- Wan
---
# 🎬 Wan2.2-NVFP4-Sparse
> **An extremely efficient Wan 2.2 14B variant: NVFP4 Quantization-Aware Step Distillation with Sparse Attention for Blackwell Architecture**

[![GitHub](https://img.shields.io/badge/GitHub-ModelTC/LightX2V-blue)](https://github.com/ModelTC/LightX2V)
[![HuggingFace](https://img.shields.io/badge/HuggingFace-lightx2v-yellow)](https://huggingface.co/lightx2v/)
## 📋 Table of Contents
- [✨ Features](#-features)
- [🚀 Quick Start](#-quick-start)
- [🎬 Generation Results](#-generation-results)
- [⚡ Performance Comparison](#-performance-comparison)
- [⚠️ Notes](#️-notes)
- [🤝 Community](#-community)
## ✨ Features
- **⚡ 4-Step Inference**: Sampling takes only four steps, two with the high-noise expert followed by two with the low-noise expert, enabling very fast Wan2.2 MoE generation on a single Blackwell GPU.
- **🎯 NVFP4 Quantization**: Quantization-aware step distillation targets NVFP4, the 4-bit floating-point format supported natively by Blackwell GPUs, reducing memory traffic and compute cost.
- **🧩 Sparse Attention**: Sparse attention reduces the O(n²) cost of self-attention over n video tokens, lowering end-to-end latency for high-resolution video generation.
- **🔧 LightX2V Integration**: LightX2V is the recommended runtime stack for stable deployment and best performance.
- **🚀 High-Quality Generation**: Preserves the visual quality of Wan2.2-T2V-14B while cutting end-to-end latency by more than 50x on a single RTX 5090 (see Performance Comparison).
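
The 4-step, two-expert schedule described above can be sketched as a toy flow-matching Euler loop. This is a minimal illustration under stated assumptions, not LightX2V's sampler: the expert callables, the timestep grid, and the update rule are hypothetical. The only point it makes is the routing, where steps 1 and 2 go to the high-noise expert and steps 3 and 4 go to the low-noise expert.

```python
from typing import Callable, List

# Hypothetical expert interface: (latent, t) -> predicted velocity, same shape as latent.
Expert = Callable[[List[float], float], List[float]]


def sample_two_expert(latent: List[float], high_noise: Expert, low_noise: Expert,
                      timesteps: List[float], num_high_steps: int = 2) -> List[float]:
    """Toy sampler: the first `num_high_steps` steps use the high-noise expert,
    the remaining steps use the low-noise expert."""
    grid = timesteps + [0.0]  # end at t = 0 (clean sample)
    for i in range(len(timesteps)):
        t_cur, t_next = grid[i], grid[i + 1]
        expert = high_noise if i < num_high_steps else low_noise
        velocity = expert(latent, t_cur)
        # Euler step of a flow-matching ODE from t_cur to t_next (assumed update rule).
        latent = [x + (t_next - t_cur) * v for x, v in zip(latent, velocity)]
    return latent


# Usage with dummy experts that record which expert ran at which timestep.
calls = []


def high_expert(latent: List[float], t: float) -> List[float]:
    calls.append(("high", t))
    return list(latent)  # dummy velocity, not a real model


def low_expert(latent: List[float], t: float) -> List[float]:
    calls.append(("low", t))
    return list(latent)  # dummy velocity, not a real model


out = sample_two_expert([1.0, -1.0], high_expert, low_expert,
                        timesteps=[1.0, 0.75, 0.5, 0.25])  # hypothetical 4-step grid
print(calls)  # [('high', 1.0), ('high', 0.75), ('low', 0.5), ('low', 0.25)]
```

Routing by step index matches the "two high-noise steps, then two low-noise steps" description; a real pipeline may instead switch experts at a timestep boundary.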
## 🚀 Quick Start
We strongly recommend using the official LightX2V Docker image for the cleanest environment and best reproducibility.
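### Option A: Docker (Recommended)
```bash
# 1. Pull the LightX2V Docker image
docker pull lightx2v/lightx2v:26052301-cu130-5090
# 2. Start an interactive container with GPU access; mount your model directory as needed
docker run --gpus all -it --rm -v /path/to/models:/models lightx2v/lightx2v:26052301-cu130-5090
# 3. Inside the container, from the root of the LightX2V repository
#    (clone https://github.com/ModelTC/LightX2V if it is not already present), run inference
bash scripts/wan22/distill/run_wan22_moe_t2v_extreme.sh
```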
### Option B: Manual Installation
If Docker is not available, install the environment manually:
```bash
# 1. Install uv and scikit-build-core (the kernel build below uses --no-build-isolation)
pip install uv scikit_build_core
# 2. Install LightX2V (inside an activated virtual environment; otherwise add --system)
git clone https://github.com/ModelTC/LightX2V.git
cd LightX2V
uv pip install -v .
# 3. Build and install the NVFP4 kernel against a local CUTLASS checkout
git clone https://github.com/NVIDIA/cutlass.git
CUTLASS_DIR="$(pwd)/cutlass"
cd lightx2v_kernel
MAX_JOBS=$(nproc) CMAKE_BUILD_PARALLEL_LEVEL=$(nproc) \
uv build --wheel \
    -Cbuild-dir=build \
    -Ccmake.define.CUTLASS_PATH="${CUTLASS_DIR}" \
    --verbose --color=always --no-build-isolation \
    .
pip install dist/*.whl --force-reinstall --no-deps
cd ..
# 4. Run inference from the repository root
bash scripts/wan22/distill/run_wan22_moe_t2v_extreme.sh
```
Inference script used in both options: [run_wan22_moe_t2v_extreme.sh](https://github.com/ModelTC/LightX2V/blob/main/scripts/wan22/distill/run_wan22_moe_t2v_extreme.sh)
## 🎬 Generation Results
<div style="background: #f8fafc; border: 1px solid #e2e8f0; border-radius: 8px; padding: 16px; margin: 16px 0;">
<p style="font-style: italic; color: #475569; margin: 0; padding: 12px; background: white; border-radius: 6px; border-left: 4px solid #3b82f6;">
"Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage"
</p>
</div>

| Resolution | Wan2.2-T2V-14B | Wan2.2-NVFP4-Sparse |
| --- | --- | --- |
| 480p | <video controls style="width: 260px; height: 180px; border-radius: 6px; object-fit: cover;" src="https://cdn-uploads.huggingface.co/production/uploads/658e760cccbc1e2cc78b4258/WTHhrzx7XR4S1Ys_6Kzx4.mp4"></video> | <video controls style="width: 260px; height: 180px; border-radius: 6px; object-fit: cover;" src="https://cdn-uploads.huggingface.co/production/uploads/658e760cccbc1e2cc78b4258/zorpw7gm9At0J2kCmvkDr.mp4"></video> |
| 720p | <video controls style="width: 260px; height: 180px; border-radius: 6px; object-fit: cover;" src="https://cdn-uploads.huggingface.co/production/uploads/658e760cccbc1e2cc78b4258/vkiyKj7CJA-r0yTz7TEum.mp4"></video> | <video controls style="width: 260px; height: 180px; border-radius: 6px; object-fit: cover;" src="https://cdn-uploads.huggingface.co/production/uploads/658e760cccbc1e2cc78b4258/TuECbzvW5jI9NHG6GLvIR.mp4"></video> |
## ⚡ Performance Comparison
**Test environment**: single RTX 5090 GPU, LightX2V framework, end-to-end latency in seconds.

| Resolution | Wan2.2-T2V-14B | Wan2.2-NVFP4-Sparse | Speedup |
| --- | ---: | ---: | ---: |
| 480p | 734s | 14.15s | 51.9x |
| 720p | 2668s | 45s | 59.3x |
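
The Speedup column is the baseline latency divided by the NVFP4-Sparse latency. The sketch below reproduces it and also estimates how the transformer sequence length grows from 480p to 720p. The sequence-length part rests on assumptions that this card does not state: 81 frames at 832x480 and 1280x720, a video VAE with 4x temporal and 8x spatial compression, and 2x2 spatial patchification. Under those assumptions 720p has about 2.3x as many tokens, so dense self-attention costs roughly 5.3x as much, which is one way to see why sparse attention pays off more at higher resolution.

```python
# End-to-end latencies from the table above (seconds, single RTX 5090, LightX2V).
LATENCY_S = {
    "480p": {"wan22_t2v_14b": 734.0, "nvfp4_sparse": 14.15},
    "720p": {"wan22_t2v_14b": 2668.0, "nvfp4_sparse": 45.0},
}


def speedup(resolution: str) -> float:
    """Speedup = baseline latency / NVFP4-Sparse latency."""
    row = LATENCY_S[resolution]
    return row["wan22_t2v_14b"] / row["nvfp4_sparse"]


def num_tokens(width: int, height: int, frames: int = 81, t_stride: int = 4,
               s_stride: int = 8, patch: int = 2) -> int:
    """Transformer sequence length under ASSUMED settings (not stated in this card):
    a causal video VAE giving 1 + (frames - 1) // t_stride latent frames and
    s_stride x spatial downsampling, followed by patch x patch patchification."""
    latent_frames = 1 + (frames - 1) // t_stride
    return latent_frames * (height // s_stride // patch) * (width // s_stride // patch)


for res in LATENCY_S:
    print(f"{res}: {speedup(res):.1f}x")  # 480p: 51.9x, then 720p: 59.3x

tokens_480p = num_tokens(832, 480)   # assumed 480p frame size
tokens_720p = num_tokens(1280, 720)  # assumed 720p frame size
# Dense self-attention cost scales with the square of the sequence length.
attention_ratio = (tokens_720p / tokens_480p) ** 2
print(tokens_480p, tokens_720p, round(attention_ratio, 2))  # 32760 75600 5.33
```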
## ⚠️ Notes
### System Requirements
- **Required Hardware**: an NVIDIA Blackwell-architecture GPU, such as the RTX 50 series.
- **Recommended Runtime**: the LightX2V Docker image `lightx2v/lightx2v:26052301-cu130-5090`.
### Dependencies
- Prepare the Wan2.2 T5 text encoder and VAE components following the standard LightX2V Wan2.2 model directory structure.
- Run with the Blackwell NVFP4 kernels (`lightx2v_kernel`) for optimal speed and memory efficiency.
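
To make the NVFP4 format more concrete, here is a simplified fake-quantization sketch in plain Python. It follows NVFP4's structure, with FP4 E2M1 elements and one shared scale per 16-element block. To keep it short it makes three simplifications: the block scale stays a Python float instead of being encoded in FP8 (E4M3), the per-tensor scale is omitted, and rounding ties go toward the smaller magnitude rather than to nearest-even. It is not the LightX2V kernel, and the function names are hypothetical.

```python
from typing import List, Tuple

# Non-negative magnitudes representable by FP4 E2M1, the element format of NVFP4.
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 16  # NVFP4 shares one scale factor across each 16-element block


def quantize_block(block: List[float]) -> Tuple[float, List[float]]:
    """Return (scale, codes) so that code * scale approximates each input value.

    Simplifications: the scale stays a Python float (NVFP4 stores it as FP8 E4M3),
    there is no per-tensor scale, and ties round toward the smaller magnitude."""
    amax = max(abs(x) for x in block)
    scale = amax / 6.0 if amax > 0 else 1.0  # map the block's largest |x| onto 6.0
    codes = []
    for x in block:
        target = abs(x) / scale
        mag = min(E2M1_MAGNITUDES, key=lambda m: abs(target - m))  # nearest E2M1 value
        codes.append(mag if x >= 0 else -mag)
    return scale, codes


def fake_quantize(values: List[float]) -> List[float]:
    """Quantize and immediately dequantize, block by block, to expose the rounding error."""
    restored = []
    for start in range(0, len(values), BLOCK_SIZE):
        scale, codes = quantize_block(values[start:start + BLOCK_SIZE])
        restored.extend(code * scale for code in codes)
    return restored


weights = [0.1 * i for i in range(-8, 8)]  # one 16-element block of toy weights
restored = fake_quantize(weights)
print(max(abs(w - r) for w, r in zip(weights, restored)))  # worst-case error in this block
```

Because the scale is chosen per 16 values, one large outlier only coarsens the rounding of its own block. This is the main reason block-scaled 4-bit formats hold up better than a single per-tensor scale.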
### Performance Tips
- Use the provided extreme inference script for the 4-step high-noise / low-noise expert schedule.
- Sparse attention is most beneficial at higher resolutions where self-attention dominates latency.
- Enable CPU offload only when GPU memory is limited, since offload can reduce throughput.
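
This card does not specify which sparse-attention algorithm the model uses, so the following is only a generic illustration of block-sparse attention. Each query block scores every key block using mean-pooled vectors, keeps the top `keep` key blocks, and computes softmax attention over those tokens only. The function and parameter names are hypothetical, and a real implementation would run as a fused GPU kernel rather than in Python loops. When `keep` equals the number of key blocks, the result reduces to dense attention (up to floating-point summation order).

```python
import math
from typing import List

Matrix = List[List[float]]


def mean_pool(rows: Matrix) -> List[float]:
    """Average a block of row vectors into one representative vector."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def block_sparse_attention(q: Matrix, k: Matrix, v: Matrix, block: int, keep: int) -> Matrix:
    """Generic top-k block-sparse attention (illustrative, not this model's kernel).

    Each query block keeps only the `keep` key blocks whose mean-pooled keys best match
    its mean-pooled query, then runs ordinary softmax attention over those keys."""
    scale = 1.0 / math.sqrt(len(q[0]))
    key_blocks = [list(range(s, min(s + block, len(k)))) for s in range(0, len(k), block)]
    pooled_k = [mean_pool([k[j] for j in idx]) for idx in key_blocks]
    out: Matrix = []
    for qs in range(0, len(q), block):
        q_rows = q[qs:qs + block]
        pooled_q = mean_pool(q_rows)
        # Rank key blocks by pooled similarity and skip all but the top `keep`.
        ranked = sorted(range(len(key_blocks)),
                        key=lambda b: dot(pooled_q, pooled_k[b]), reverse=True)
        cols = [j for b in ranked[:keep] for j in key_blocks[b]]
        for qi in q_rows:
            scores = [dot(qi, k[j]) * scale for j in cols]
            m = max(scores)  # subtract the max for numerical stability
            weights = [math.exp(s - m) for s in scores]
            z = sum(weights)
            out.append([sum(w * v[j][c] for w, j in zip(weights, cols)) / z
                        for c in range(len(v[0]))])
    return out


tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
dense = block_sparse_attention(tokens, tokens, tokens, block=2, keep=2)   # all blocks kept
sparse = block_sparse_attention(tokens, tokens, tokens, block=2, keep=1)  # 1 of 2 key blocks
```

Lowering `keep` reduces how many key tokens each query attends to. Because the number of key blocks grows with resolution, a fixed or slowly growing `keep` saves proportionally more work on larger videos.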
## 🤝 Community
- **🐛 Issues**: [GitHub Issues](https://github.com/ModelTC/LightX2V/issues)
- **🤗 Models**: [HuggingFace Hub](https://huggingface.co/lightx2v/)
- **📖 Documentation**: [LightX2V Docs](https://github.com/ModelTC/LightX2V)
---
<div align="center">

**If you find this project helpful, please give us a ⭐ on [GitHub](https://github.com/ModelTC/LightX2V)**

For questions or issues, please open an issue on [LightX2V](https://github.com/ModelTC/LightX2V/issues) or contact lvchengtao0319@gmail.com.

</div>