 - NVFP4
 - Sparse_Attention
 - Wan
+---
+# 🎬 Wan2.2-NVFP4-Sparse
+> **An extremely efficient Wan2.2 14B variant: NVFP4 Quantization-Aware Step Distillation with Sparse Attention for the Blackwell Architecture**
+[![GitHub](https://img.shields.io/badge/GitHub-ModelTC/LightX2V-blue)](https://github.com/ModelTC/LightX2V)
+[![HuggingFace](https://img.shields.io/badge/HuggingFace-lightx2v-yellow)](https://huggingface.co/lightx2v/)
+## 📋 Table of Contents
+- [✨ Features](#-features)
+- [🚀 Quick Start](#-quick-start)
+- [🎬 Generation Results](#-generation-results)
+- [⚡ Performance Comparison](#-performance-comparison)
+- [⚠️ Notes](#️-notes)
+- [🤝 Community](#-community)
+## ✨ Features
+- **⚡ 4-Step Inference**: Two high-noise expert steps followed by two low-noise expert steps, enabling extremely fast Wan2.2 MoE generation on a single Blackwell GPU.
+- **🎯 NVFP4 Quantization**: Quantization-aware step distillation targets the Blackwell architecture, reducing memory traffic and compute cost.
+- **🧩 Sparse Attention**: Replaces dense O(n²) self-attention with sparse attention, reducing end-to-end latency for high-resolution video generation.
+- **🔧 LightX2V Integration**: Recommended runtime stack for stable deployment and best performance.
+- **🚀 High-Quality Generation**: Preserves the visual quality of Wan2.2-T2V-14B while dramatically improving inference speed.
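The 4-step schedule described above can be pictured as a simple mapping from step index to expert. This is only an illustrative sketch: `expert_for_step` and the labels `high_noise` / `low_noise` are hypothetical names, not identifiers from LightX2V, and the real schedule is configured by the provided inference script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: which expert handles a given 1-based denoising step.
# Steps 1-2 go to the high-noise expert, steps 3-4 to the low-noise expert.
expert_for_step() {
  local step=$1
  if (( step <= 2 )); then
    echo "high_noise"
  else
    echo "low_noise"
  fi
}

for step in 1 2 3 4; do
  echo "step $step -> $(expert_for_step "$step")"
done
```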
+## 🚀 Quick Start
+We strongly recommend using the official LightX2V Docker image for the cleanest environment and best reproducibility.
+### Option A: Docker (Recommended)
+```bash
+# 1. Pull the LightX2V Docker image
+docker pull lightx2v/lightx2v:26052301-cu130-5090
+# 2. Run inference (inside a container started from this image, from the LightX2V repository root)
+bash scripts/wan22/distill/run_wan22_moe_t2v_extreme.sh
+```
+### Option B: Manual Installation
+If Docker is not available, install the environment manually:
+```bash
+# 1. Install LightX2V (uv is needed here and for the kernel build below)
+pip install scikit_build_core uv
+git clone https://github.com/ModelTC/LightX2V.git
+cd LightX2V
+uv pip install -v .
+# 2. Build and install the NVFP4 kernel
+git clone https://github.com/NVIDIA/cutlass.git
+cd lightx2v_kernel
+# Replace /path/to/cutlass with the absolute path of the CUTLASS checkout above
+MAX_JOBS=$(nproc) CMAKE_BUILD_PARALLEL_LEVEL=$(nproc) \
+uv build --wheel \
+  -Cbuild-dir=build . \
+  -Ccmake.define.CUTLASS_PATH=/path/to/cutlass \
+  --verbose --color=always --no-build-isolation
+pip install dist/*.whl --force-reinstall --no-deps
+# 3. Run inference from the LightX2V repository root
+cd ..
+bash scripts/wan22/distill/run_wan22_moe_t2v_extreme.sh
+```
+Script: [run_wan22_moe_t2v_extreme.sh](https://github.com/ModelTC/LightX2V/blob/main/scripts/wan22/distill/run_wan22_moe_t2v_extreme.sh)
+## 🎬 Generation Results
+<div style="background: #f8fafc; border: 1px solid #e2e8f0; border-radius: 8px; padding: 16px; margin: 16px 0;">
+<p style="font-style: italic; color: #475569; margin: 0; padding: 12px; background: white; border-radius: 6px; border-left: 4px solid #3b82f6;">
+"Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage"
+</p>
+</div>
+| Resolution | Wan2.2-T2V-14B | Wan2.2-NVFP4-Sparse |
+| --- | --- | --- |
+| 480p | <video controls style="width: 260px; height: 180px; border-radius: 6px; object-fit: cover;" src="https://cdn-uploads.huggingface.co/production/uploads/658e760cccbc1e2cc78b4258/WTHhrzx7XR4S1Ys_6Kzx4.mp4"></video> | <video controls style="width: 260px; height: 180px; border-radius: 6px; object-fit: cover;" src="https://cdn-uploads.huggingface.co/production/uploads/658e760cccbc1e2cc78b4258/zorpw7gm9At0J2kCmvkDr.mp4"></video> |
+| 720p | <video controls style="width: 260px; height: 180px; border-radius: 6px; object-fit: cover;" src="https://cdn-uploads.huggingface.co/production/uploads/658e760cccbc1e2cc78b4258/vkiyKj7CJA-r0yTz7TEum.mp4"></video> | <video controls style="width: 260px; height: 180px; border-radius: 6px; object-fit: cover;" src="https://cdn-uploads.huggingface.co/production/uploads/658e760cccbc1e2cc78b4258/TuECbzvW5jI9NHG6GLvIR.mp4"></video> |
+## ⚡ Performance Comparison
+**Test Environment**: Single RTX 5090 GPU | LightX2V Framework | End-to-End Latency
+| Resolution | Wan2.2-T2V-14B | Wan2.2-NVFP4-Sparse | Speedup |
+| --- | ---: | ---: | ---: |
+| 480p | 734s | 14.15s | 51.9x |
+| 720p | 2668s | 45s | 59.3x |
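The Speedup column is the baseline latency divided by the Wan2.2-NVFP4-Sparse latency. A minimal sketch that recomputes it from the latencies in the table (the `speedup` function is a hypothetical helper, not part of LightX2V):

```shell
#!/usr/bin/env bash
# Speedup = baseline latency / optimized latency, rounded to one decimal.
speedup() {
  LC_ALL=C awk -v base="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", base / fast }'
}

# Latencies in seconds, copied from the table above.
echo "480p: $(speedup 734 14.15)x"
echo "720p: $(speedup 2668 45)x"
```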
+## ⚠️ Notes
+### System Requirements
+- **Required Hardware**: NVIDIA RTX 50-series GPUs or other Blackwell architecture GPUs.
+- **Recommended Runtime**: `lightx2v/lightx2v:26052301-cu130-5090`.
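A quick way to check the hardware requirement is to read each GPU's compute capability from `nvidia-smi`. The sketch below assumes that Blackwell GPUs report a compute capability of 10.x or 12.x (the RTX 5090 reports 12.0), so it accepts any major version of 10 or higher. That rule would also match later architectures, and it does not guarantee that the NVFP4 kernel builds for every such GPU. `is_blackwell_cc` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Succeeds when a compute capability string such as "12.0" has a major
# version of 10 or higher (assumption: this means Blackwell or newer).
is_blackwell_cc() {
  local major="${1%%.*}"
  [[ "$major" =~ ^[0-9]+$ ]] && (( major >= 10 ))
}

if command -v nvidia-smi >/dev/null 2>&1; then
  # One CSV line per GPU, e.g. "NVIDIA GeForce RTX 5090, 12.0"
  nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader |
    while IFS=, read -r name cc; do
      cc="${cc// /}"   # strip the space that follows the comma
      if is_blackwell_cc "$cc"; then
        echo "OK: $name (compute capability $cc)"
      else
        echo "Pre-Blackwell GPU: $name (compute capability $cc)"
      fi
    done
else
  echo "nvidia-smi not found; cannot check the GPU" >&2
fi
```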
+### Dependencies
+- Prepare Wan2.2 T5 / VAE components following the standard LightX2V Wan2.2 model structure.
+- Use the NVFP4 kernels on a Blackwell GPU for optimal speed and memory efficiency.
+### Performance Tips
+- Use the provided extreme inference script for the 4-step high-noise / low-noise expert schedule.
+- Sparse attention is most beneficial at higher resolutions where self-attention dominates latency.
+- Enable CPU offload only when GPU memory is limited, since offload can reduce throughput.
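To see why sparse attention matters more at higher resolutions, it helps to estimate how many tokens self-attention processes per clip. The sketch below is a rough estimate under assumptions not stated in this model card: an 81-frame clip, 832x480 and 1280x720 frame sizes, a VAE that compresses 4x in time and 8x in each spatial dimension, and 1x2x2 patches in the transformer. `token_count` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimated number of attention tokens for one clip.
# Arguments: frame count, frame height, frame width (pixels).
token_count() {
  local frames=$1 height=$2 width=$3
  local latent_frames=$(( (frames - 1) / 4 + 1 ))  # assumed 4x temporal compression
  local patches_h=$(( height / 16 ))               # assumed 8x VAE downsampling, then 2x patching
  local patches_w=$(( width / 16 ))
  echo $(( latent_frames * patches_h * patches_w ))
}

n480=$(token_count 81 480 832)
n720=$(token_count 81 720 1280)
echo "480p tokens: $n480"
echo "720p tokens: $n720"

# Dense attention cost grows with the square of the token count.
LC_ALL=C awk -v a="$n720" -v b="$n480" \
  'BEGIN { r = a / b; printf "dense attention cost ratio (720p / 480p): %.1fx\n", r * r }'
```

Under these assumptions the 720p clip has roughly 2.3 times as many tokens as the 480p clip, so dense attention costs roughly 5.3 times as much, which is the part of the workload that sparse attention reduces.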
+## 🤝 Community
+- **🐛 Issues**: [GitHub Issues](https://github.com/ModelTC/LightX2V/issues)
+- **🤗 Models**: [HuggingFace Hub](https://huggingface.co/lightx2v/)
+- **📖 Documentation**: [LightX2V Docs](https://github.com/ModelTC/LightX2V)
+---
+<div align="center">
+**If you find this project helpful, please give us a ⭐ on [GitHub](https://github.com/ModelTC/LightX2V)**
+For questions or issues, please open an issue on [LightX2V](https://github.com/ModelTC/LightX2V/issues) or contact lvchengtao0319@gmail.com.
+</div>