---
license: apache-2.0
base_model:
- Wan-AI/Wan2.2-T2V-A14B
library_name: diffusers
tags:
- video_generation
- NVFP4
- Sparse_Attention
- Wan
---
# 🎬 Wan2.2-NVFP4-Sparse

> **An extremely efficient Wan 2.2 14B variant: NVFP4 Quantization-Aware Step Distillation with Sparse Attention for Blackwell Architecture**

[![GitHub](https://img.shields.io/badge/GitHub-ModelTC/LightX2V-blue)](https://github.com/ModelTC/LightX2V)
[![HuggingFace](https://img.shields.io/badge/HuggingFace-lightx2v-yellow)](https://huggingface.co/lightx2v/)

## 📋 Table of Contents

- [✨ Features](#-features)
- [🚀 Quick Start](#-quick-start)
- [🎬 Generation Results](#-generation-results)
- [⚡ Performance Comparison](#-performance-comparison)
- [⚠️ Notes](#️-notes)
- [🤝 Community](#-community)

## ✨ Features

- **⚡ 4-Step Inference**: Two high-noise expert steps followed by two low-noise expert steps, enabling extremely fast Wan2.2 MoE generation on a single Blackwell GPU.
- **🎯 NVFP4 Quantization**: Quantization-aware step distillation trains the model for NVFP4, NVIDIA's 4-bit floating-point format, reducing memory traffic and compute cost on Blackwell GPUs.
- **🧩 Sparse Attention**: Sparse attention cuts the cost of self-attention, which grows as O(n²) in the number of tokens, reducing end-to-end latency for high-resolution video generation.
- **🔧 LightX2V Integration**: Recommended runtime stack for stable deployment and best performance.
- **🚀 High-Quality Generation**: Preserves the visual quality of Wan2.2-T2V-A14B while dramatically improving inference speed.
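
The 4-step schedule from the first bullet can be pictured with a small shell sketch. The helper name `expert_for_step` and the labels `high_noise` / `low_noise` are illustrative assumptions, not LightX2V identifiers; the actual schedule is set up by the inference script.

```bash
#!/usr/bin/env bash
# Illustrative only: maps each of the 4 denoising steps to the expert that runs it.
# expert_for_step is a hypothetical helper, not part of LightX2V.
expert_for_step() {
  local step="$1"
  if [ "$step" -le 2 ]; then
    echo "high_noise"   # steps 1-2: high-noise expert
  else
    echo "low_noise"    # steps 3-4: low-noise expert
  fi
}

for step in 1 2 3 4; do
  echo "step ${step}: $(expert_for_step "$step") expert"
done
```

Only one expert is active at any given step, so the schedule switches experts once, between step 2 and step 3.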

## 🚀 Quick Start

We strongly recommend using the official LightX2V Docker image for the cleanest environment and best reproducibility.

### Option A: Docker (Recommended)

```bash
# 1. Pull LightX2V Docker image
docker pull lightx2v/lightx2v:26052801-cu130-5090

# 2. Run inference inside the container, from the root of a LightX2V checkout
bash scripts/wan22/distill/run_wan22_moe_t2v_extreme.sh
```

### Option B: Manual Installation

If Docker is not available, install the environment manually:

```bash
# 1. Install the build tools, then LightX2V
pip install scikit_build_core uv
git clone https://github.com/ModelTC/LightX2V.git
cd LightX2V
uv pip install -v .

# 2. Build and install the NVFP4 kernel
git clone https://github.com/NVIDIA/cutlass.git
cd lightx2v_kernel

# Replace /path/to/cutlass with the absolute path of the cutlass checkout cloned above.
MAX_JOBS=$(nproc) CMAKE_BUILD_PARALLEL_LEVEL=$(nproc) \
uv build --wheel \
  -Cbuild-dir=build . \
  -Ccmake.define.CUTLASS_PATH=/path/to/cutlass \
  --verbose --color=always --no-build-isolation

pip install dist/*.whl --force-reinstall --no-deps

# 3. Run inference from the LightX2V repository root
cd ..
bash scripts/wan22/distill/run_wan22_moe_t2v_extreme.sh
```

Script: [run_wan22_moe_t2v_extreme.sh](https://github.com/ModelTC/LightX2V/blob/main/scripts/wan22/distill/run_wan22_moe_t2v_extreme.sh)

## 🎬 Generation Results

<div style="background: #f8fafc; border: 1px solid #e2e8f0; border-radius: 8px; padding: 16px; margin: 16px 0;">
<p style="font-style: italic; color: #475569; margin: 0; padding: 12px; background: white; border-radius: 6px; border-left: 4px solid #3b82f6;">
"Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage"
</p>
</div>


| Resolution | Wan2.2-T2V-A14B | Wan2.2-NVFP4-Sparse |
| --- | --- | --- |
| 480p | <video controls style="width: 260px; height: 180px; border-radius: 6px; object-fit: cover;" src="https://cdn-uploads.huggingface.co/production/uploads/658e760cccbc1e2cc78b4258/WTHhrzx7XR4S1Ys_6Kzx4.mp4"></video> | <video controls style="width: 260px; height: 180px; border-radius: 6px; object-fit: cover;" src="https://cdn-uploads.huggingface.co/production/uploads/658e760cccbc1e2cc78b4258/zorpw7gm9At0J2kCmvkDr.mp4"></video> |
| 720p | <video controls style="width: 260px; height: 180px; border-radius: 6px; object-fit: cover;" src="https://cdn-uploads.huggingface.co/production/uploads/658e760cccbc1e2cc78b4258/vkiyKj7CJA-r0yTz7TEum.mp4"></video> | <video controls style="width: 260px; height: 180px; border-radius: 6px; object-fit: cover;" src="https://cdn-uploads.huggingface.co/production/uploads/658e760cccbc1e2cc78b4258/TuECbzvW5jI9NHG6GLvIR.mp4"></video> |


## ⚡ Performance Comparison

**Test Environment**: RTX 5090 Single GPU | LightX2V Framework | End-to-End Latency

| Resolution | Wan2.2-T2V-A14B | Wan2.2-NVFP4-Sparse | Speedup |
| --- | ---: | ---: | ---: |
| 480p | 734s | 14.15s | 51.9x |
| 720p | 2668s | 45s | 59.3x |
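
The Speedup column is the baseline latency divided by the optimized latency. A minimal sketch for reproducing it with your own timings follows; the `speedup` helper is a hypothetical name introduced here, not a LightX2V tool.

```bash
#!/usr/bin/env bash
# speedup BASELINE_SECONDS OPTIMIZED_SECONDS -> prints the ratio with one decimal.
# Hypothetical helper for checking the table, not part of LightX2V.
speedup() {
  awk -v base="$1" -v fast="$2" 'BEGIN { printf "%.1fx\n", base / fast }'
}

speedup 734 14.15   # 480p row -> 51.9x
speedup 2668 45     # 720p row -> 59.3x
```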

## ⚠️ Notes

### System Requirements

- **Required Hardware**: NVIDIA RTX 50-series GPUs or other Blackwell architecture GPUs.
- **Recommended Runtime**: `lightx2v/lightx2v:26052801-cu130-5090`.
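
A quick preflight check can confirm the GPU before building the kernels. The sketch below assumes Blackwell devices report a compute capability with major version 10 or higher (RTX 50-series cards report 12.0) and that the installed driver supports the `compute_cap` query field of `nvidia-smi`. The helper `is_blackwell_cap` is a hypothetical name, and the threshold is a heuristic rather than an official LightX2V check.

```bash
#!/usr/bin/env bash
# is_blackwell_cap CAP -> succeeds if the compute capability string (e.g. "12.0")
# has a major version of 10 or higher. Heuristic, hypothetical helper.
is_blackwell_cap() {
  local major="${1%%.*}"
  [ "$major" -ge 10 ] 2>/dev/null
}

if command -v nvidia-smi >/dev/null 2>&1; then
  # One CSV line per GPU: "<name>, <compute capability>"
  nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader |
  while IFS=, read -r name cap; do
    cap="${cap// /}"   # strip the space that follows the comma
    if is_blackwell_cap "$cap"; then
      echo "OK: ${name} (compute capability ${cap})"
    else
      echo "Likely unsupported for NVFP4 kernels: ${name} (compute capability ${cap})"
    fi
  done
else
  echo "nvidia-smi not found; cannot check the GPU"
fi
```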

### Dependencies

- Prepare the Wan2.2 T5 text encoder and VAE components following the standard LightX2V Wan2.2 model structure.
- Run on a Blackwell GPU with the NVFP4 kernels installed for optimal speed and memory efficiency.

### Performance Tips

- Use the provided extreme inference script (`run_wan22_moe_t2v_extreme.sh`) for the 4-step high-noise / low-noise expert schedule.
- Sparse attention is most beneficial at higher resolutions where self-attention dominates latency.
- Enable CPU offload only when GPU memory is limited, since offload can reduce throughput.
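
To see why sparse attention matters more at higher resolutions, it helps to estimate the sequence length. The sketch below rests on assumptions not stated in this card: 8x spatial and 4x temporal VAE compression, 2x2 spatial patches, 81 frames, and 832x480 / 1280x720 as the 480p / 720p sizes. The `tokens` helper is a hypothetical name.

```bash
#!/usr/bin/env bash
# tokens WIDTH HEIGHT FRAMES -> approximate transformer sequence length.
# Assumed: 8x spatial + 4x temporal VAE compression, then 2x2 spatial patches,
# i.e. one token per 16x16 pixels in each latent frame.
tokens() {
  local w="$1" h="$2" f="$3"
  echo $(( (w / 16) * (h / 16) * ((f - 1) / 4 + 1) ))
}

n480=$(tokens 832 480 81)    # assumed 480p setting
n720=$(tokens 1280 720 81)   # assumed 720p setting
echo "480p tokens: ${n480}, 720p tokens: ${n720}"

# Dense self-attention cost scales with n^2, so compare the squared ratio.
awk -v a="$n720" -v b="$n480" \
  'BEGIN { printf "dense attention cost ratio (720p vs 480p): %.1fx\n", (a / b) ^ 2 }'
```

Under these assumptions, the 720p sequence is a little over twice as long as the 480p one, so dense attention costs several times more, which is the regime where skipping attention blocks pays off most.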

## 🤝 Community

- **🐛 Issues**: [GitHub Issues](https://github.com/ModelTC/LightX2V/issues)
- **🤗 Models**: [HuggingFace Hub](https://huggingface.co/lightx2v/)
- **📖 Documentation**: [LightX2V Docs](https://github.com/ModelTC/LightX2V)

---

<div align="center">

**If you find this project helpful, please give us a ⭐ on [GitHub](https://github.com/ModelTC/LightX2V)**

For questions or issues, please open an issue on [LightX2V](https://github.com/ModelTC/LightX2V/issues) or contact lvchengtao0319@gmail.com.

</div>