	---
	license: apache-2.0
	base_model:
	- Wan-AI/Wan2.2-T2V-A14B
	library_name: diffusers
	tags:
	- video_generation
	- NVFP4
	- Sparse_Attention
	- Wan
	---
	# 🎬 Wan2.2-NVFP4-Sparse

	> An extremely efficient Wan 2.2 14B variant: NVFP4 Quantization-Aware Step Distillation with Sparse Attention for Blackwell Architecture

	[![GitHub](https://img.shields.io/badge/GitHub-ModelTC/LightX2V-blue)](https://github.com/ModelTC/LightX2V)
	[![HuggingFace](https://img.shields.io/badge/HuggingFace-lightx2v-yellow)](https://huggingface.co/lightx2v/)

	## 📋 Table of Contents

	- [✨ Features](#-features)
	- [🚀 Quick Start](#-quick-start)
	- [🎬 Generation Results](#-generation-results)
	- [⚡ Performance Comparison](#-performance-comparison)
	- [⚠️ Notes](#️-notes)
	- [🤝 Community](#-community)

	## ✨ Features

	- ⚡ 4-Step Inference: Two high-noise expert steps followed by two low-noise expert steps, enabling extremely fast Wan2.2 MoE generation on a single Blackwell GPU.
	- 🎯 NVFP4 Quantization: Quantization-aware step distillation reduces memory traffic and compute cost while targeting Blackwell architecture.
	- 🧩 Sparse Attention: Cuts the cost of O(n²) self-attention, where n is the number of video tokens, reducing end-to-end latency for high-resolution video generation.
	- 🔧 LightX2V Integration: Recommended runtime stack for stable deployment and best performance.
	- 🚀 High-Quality Generation: Preserves the visual quality of Wan2.2-T2V-14B while cutting end-to-end latency by roughly 52x at 480p and 59x at 720p on a single RTX 5090.
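
	The 4-step schedule in the first bullet can be pictured with a small shell sketch. The step count and the two-plus-two split come from the feature list; the helper name `expert_for_step` and the labels `high_noise` / `low_noise` are illustrative assumptions, not LightX2V identifiers.

	```bash
	# Illustrative only: map each of the 4 denoising steps to the expert that handles it.
	# expert_for_step is a hypothetical helper, not part of LightX2V.
	expert_for_step() {
	    step="$1"                      # 1-based step index
	    if [ "$step" -le 2 ]; then
	        echo "high_noise"          # steps 1-2: high-noise expert
	    else
	        echo "low_noise"           # steps 3-4: low-noise expert
	    fi
	}

	for step in 1 2 3 4; do
	    echo "step $step -> $(expert_for_step "$step")"
	done
	```

	Because only one expert is needed for any given step, the two experts run back to back rather than together.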

	## 🚀 Quick Start

	We strongly recommend using the official LightX2V Docker image for the cleanest environment and best reproducibility.

	### Option A: Docker (Recommended)

	```bash
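	# 1. Pull the LightX2V Docker image
	docker pull lightx2v/lightx2v:26052801-cu130-5090

	# 2. Start a container with GPU access (flags and mount path are a suggested setup; /path/to/LightX2V is a placeholder)
	docker run --gpus all -it --rm -v /path/to/LightX2V:/workspace/LightX2V -w /workspace/LightX2V \
	  --entrypoint /bin/bash lightx2v/lightx2v:26052801-cu130-5090

	# 3. Inside the container, run inference from the repository root
	bash scripts/wan22/distill/run_wan22_moe_t2v_extreme.sh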
	```

	### Option B: Manual Installation

	If Docker is not available, install the environment manually:

	```bash
	# 1. Install LightX2V (uv must be installed before it is used)
	pip install scikit_build_core uv
	git clone https://github.com/ModelTC/LightX2V.git
	cd LightX2V
	uv pip install -v .

	# 2. Build and install the NVFP4 kernel
	git clone https://github.com/NVIDIA/cutlass.git
	cd lightx2v_kernel

	# Replace /path/to/cutlass with the absolute path of the cutlass clone above
	MAX_JOBS=$(nproc) CMAKE_BUILD_PARALLEL_LEVEL=$(nproc) \
	uv build --wheel \
	-Cbuild-dir=build . \
	-Ccmake.define.CUTLASS_PATH=/path/to/cutlass \
	--verbose --color=always --no-build-isolation

	pip install dist/*.whl --force-reinstall --no-deps

	# 3. Return to the repository root and run inference
	cd ..
	bash scripts/wan22/distill/run_wan22_moe_t2v_extreme.sh
	```

	Script: [run_wan22_moe_t2v_extreme.sh](https://github.com/ModelTC/LightX2V/blob/main/scripts/wan22/distill/run_wan22_moe_t2v_extreme.sh)
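
	The kernel build only works if `CUTLASS_PATH` points at a real CUTLASS checkout, so a quick pre-flight check can save a failed build. The sketch below is an assumption-laden convenience, not part of LightX2V: `has_cutlass_headers` is a hypothetical helper, and it relies on the CUTLASS repository keeping its main header at `include/cutlass/cutlass.h`.

	```bash
	# Hypothetical pre-flight helper: succeed only if the directory looks like a CUTLASS checkout.
	has_cutlass_headers() {
	    [ -f "$1/include/cutlass/cutlass.h" ]
	}

	# Placeholder default; export CUTLASS_PATH with your real clone location before running.
	CUTLASS_PATH="${CUTLASS_PATH:-/path/to/cutlass}"
	if has_cutlass_headers "$CUTLASS_PATH"; then
	    echo "CUTLASS found at $CUTLASS_PATH"
	else
	    echo "CUTLASS headers not found at $CUTLASS_PATH; set CUTLASS_PATH to your clone" >&2
	fi
	```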

	## 🎬 Generation Results

	<div style="background: #f8fafc; border: 1px solid #e2e8f0; border-radius: 8px; padding: 16px; margin: 16px 0;">
	<p style="font-style: italic; color: #475569; margin: 0; padding: 12px; background: white; border-radius: 6px; border-left: 4px solid #3b82f6;">
	"Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage"
	</p>
	</div>


	| Resolution | Wan2.2-T2V-14B | Wan2.2-NVFP4-Sparse |
	| --- | --- | --- |
	| 480p | <video controls style="width: 260px; height: 180px; border-radius: 6px; object-fit: cover;" src="https://cdn-uploads.huggingface.co/production/uploads/658e760cccbc1e2cc78b4258/WTHhrzx7XR4S1Ys_6Kzx4.mp4"></video> | <video controls style="width: 260px; height: 180px; border-radius: 6px; object-fit: cover;" src="https://cdn-uploads.huggingface.co/production/uploads/658e760cccbc1e2cc78b4258/zorpw7gm9At0J2kCmvkDr.mp4"></video> |
	| 720p | <video controls style="width: 260px; height: 180px; border-radius: 6px; object-fit: cover;" src="https://cdn-uploads.huggingface.co/production/uploads/658e760cccbc1e2cc78b4258/vkiyKj7CJA-r0yTz7TEum.mp4"></video> | <video controls style="width: 260px; height: 180px; border-radius: 6px; object-fit: cover;" src="https://cdn-uploads.huggingface.co/production/uploads/658e760cccbc1e2cc78b4258/TuECbzvW5jI9NHG6GLvIR.mp4"></video> |


	## ⚡ Performance Comparison

	Test Environment: RTX 5090 Single GPU | LightX2V Framework | End-to-End Latency

	| Resolution | Wan2.2-T2V-14B | Wan2.2-NVFP4-Sparse | Speedup |
	| --- | ---: | ---: | ---: |
	| 480p | 734s | 14.15s | 51.9x |
	| 720p | 2668s | 45s | 59.3x |
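
	The Speedup column is the baseline latency divided by the optimized latency. As a quick check of that arithmetic, the sketch below recomputes both rows from the numbers in the table; the `speedup` function name is only an illustration.

	```bash
	# speedup BASELINE_SECONDS OPTIMIZED_SECONDS -> prints the ratio rounded to one decimal
	speedup() {
	    awk -v base="$1" -v opt="$2" 'BEGIN { printf "%.1fx\n", base / opt }'
	}

	speedup 734 14.15   # 480p row
	speedup 2668 45     # 720p row
	```

	This prints `51.9x` and `59.3x`, matching the table.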

	## ⚠️ Notes

	### System Requirements

	- Required Hardware: NVIDIA RTX 50-series GPUs or other Blackwell architecture GPUs.
	- Recommended Runtime: `lightx2v/lightx2v:26052801-cu130-5090`.
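
	One way to confirm the hardware requirement before installing anything is to read the GPU's compute capability. The sketch below is a hedged example rather than an official check: `is_blackwell_cc` is a hypothetical helper, it assumes Blackwell GPUs report a compute capability major version of 10 or higher (RTX 50-series cards report 12.0), and the `compute_cap` query field needs a reasonably recent NVIDIA driver. The same threshold would also accept any later architecture.

	```bash
	# Hypothetical helper: treat a compute capability major version >= 10 as Blackwell-class.
	is_blackwell_cc() {
	    major="${1%%.*}"                       # "12.0" -> "12"
	    case "$major" in
	        ''|*[!0-9]*) return 1 ;;           # empty or non-numeric input is rejected
	    esac
	    [ "$major" -ge 10 ]
	}

	# Query the first GPU only when nvidia-smi is available.
	if command -v nvidia-smi >/dev/null 2>&1; then
	    cc="$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1)"
	    if is_blackwell_cc "$cc"; then
	        echo "compute capability $cc: Blackwell-class GPU"
	    else
	        echo "compute capability $cc: NVFP4 kernels are not expected to run here"
	    fi
	fi
	```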

	### Dependencies

	- Prepare Wan2.2 T5 / VAE components following the standard LightX2V Wan2.2 model structure.
	- Use Blackwell + NVFP4 kernels for optimal speed and memory efficiency.

	### Performance Tips

	- Use the provided extreme inference script for the 4-step high-noise / low-noise expert schedule.
	- Sparse attention is most beneficial at higher resolutions where self-attention dominates latency.
	- Enable CPU offload only when GPU memory is limited, since offload can reduce throughput.
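
	To see why sparse attention matters more at higher resolutions, it helps to estimate how many tokens self-attention has to process. The sketch below is a back-of-the-envelope calculation under assumptions that this README does not state: an 81-frame clip, 832x480 and 1280x720 frame sizes, a VAE that compresses 4x in time and 8x in space, and 2x2 spatial patching. Substitute your own settings if they differ.

	```bash
	# token_count FRAMES HEIGHT WIDTH
	# Assumptions (not stated in this README): 4x temporal and 8x spatial VAE compression,
	# then 2x2 spatial patching, so each token covers 16x16 pixels of one latent frame.
	token_count() {
	    echo $(( (($1 - 1) / 4 + 1) * ($2 / 16) * ($3 / 16) ))
	}

	n480=$(token_count 81 480 832)     # assumed 480p clip: 81 frames at 832x480
	n720=$(token_count 81 720 1280)    # assumed 720p clip: 81 frames at 1280x720
	echo "480p tokens: $n480, 720p tokens: $n720"

	# Dense attention cost grows with the square of the token count.
	awk -v a="$n480" -v b="$n720" 'BEGIN { printf "attention pair ratio: %.1fx\n", (b / a) ^ 2 }'
	```

	Under these assumptions, 720p has about 2.3x the tokens of 480p but about 5.3x the token pairs, so dense attention takes a larger share of the total work as resolution grows.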

	## 🤝 Community

	- 🐛 Issues: [GitHub Issues](https://github.com/ModelTC/LightX2V/issues)
	- 🤗 Models: [HuggingFace Hub](https://huggingface.co/lightx2v/)
	- 📖 Documentation: [LightX2V Docs](https://github.com/ModelTC/LightX2V)

	---

	<div align="center">

	If you find this project helpful, please give us a ⭐ on [GitHub](https://github.com/ModelTC/LightX2V)

	For questions or issues, please open an issue on [LightX2V](https://github.com/ModelTC/LightX2V/issues) or contact lvchengtao0319@gmail.com.

	</div>