---
license: apache-2.0
base_model:
- Wan-AI/Wan2.2-T2V-A14B
library_name: diffusers
tags:
- video_generation
- NVFP4
- Sparse_Attention
- Wan
---
# 🎬 Wan2.2-NVFP4-Sparse

> **An extremely efficient Wan 2.2 14B variant: NVFP4 Quantization-Aware Step Distillation with Sparse Attention for Blackwell Architecture**

[![GitHub](https://img.shields.io/badge/GitHub-ModelTC/LightX2V-blue)](https://github.com/ModelTC/LightX2V)
[![HuggingFace](https://img.shields.io/badge/HuggingFace-lightx2v-yellow)](https://huggingface.co/lightx2v/)

## 📋 Table of Contents

- [✨ Features](#-features)
- [🚀 Quick Start](#-quick-start)
- [🎬 Generation Results](#-generation-results)
- [⚡ Performance Comparison](#-performance-comparison)
- [⚠️ Notes](#️-notes)
- [🤝 Community](#-community)

## ✨ Features

- **⚡ 4-Step Inference**: Two high-noise expert steps followed by two low-noise expert steps, enabling extremely fast Wan2.2 MoE generation on a single Blackwell GPU.
- **🎯 NVFP4 Quantization**: Quantization-aware step distillation reduces memory traffic and compute cost while targeting Blackwell architecture.
- **🧩 Sparse Attention**: Accelerates the costly O(n²) self-attention workload with sparse attention, reducing end-to-end latency for high-resolution video generation.
- **🔧 LightX2V Integration**: Recommended runtime stack for stable deployment and best performance.
- **🚀 High-Quality Generation**: Preserves the visual quality of Wan2.2-T2V-A14B while dramatically improving inference speed.

## 🚀 Quick Start

We strongly recommend using the official LightX2V Docker image for the cleanest environment and best reproducibility.

### Option A: Docker (Recommended)

```bash
# 1. Pull LightX2V Docker image
docker pull lightx2v/lightx2v:26052801-cu130-5090

# 2. Run inference
bash scripts/wan22/distill/run_wan22_moe_t2v_extreme.sh
```
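
The commands above pull the image but do not show how the container is started. The following is a minimal sketch of one way to do that, assuming the LightX2V repository is checked out in the current directory and the NVIDIA Container Toolkit is installed so that `--gpus all` works. The `/workspace` mount point and the helper name `build_docker_cmd` are illustrative choices, not part of LightX2V.

```shell
#!/usr/bin/env bash
# Sketch: start the LightX2V container with a local checkout mounted.
IMAGE="lightx2v/lightx2v:26052801-cu130-5090"   # image tag from this README

# Build the docker command as an array so paths with spaces stay intact.
# /workspace is an assumed mount point, not a documented one.
build_docker_cmd() {
  local repo_dir=$1
  DOCKER_CMD=(docker run --rm -it --gpus all
              -v "${repo_dir}:/workspace" -w /workspace
              "$IMAGE" bash)
}

build_docker_cmd "$PWD"
echo "${DOCKER_CMD[*]}"     # inspect the command first
# "${DOCKER_CMD[@]}"        # uncomment to actually start the container
```

Inside the container, the inference script can then be run from `/workspace`. Model weights stored outside the checkout would need an additional `-v` mount.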

### Option B: Manual Installation

If Docker is not available, install the environment manually:

```bash
# 1. Install build tools (uv is needed by the next steps) and LightX2V
pip install scikit_build_core uv
git clone https://github.com/ModelTC/LightX2V.git
cd LightX2V
uv pip install -v .

# 2. Build and install the NVFP4 kernel
#    CUTLASS is cloned into the LightX2V checkout; set CUTLASS_PATH to its absolute path
git clone https://github.com/NVIDIA/cutlass.git
cd lightx2v_kernel

MAX_JOBS=$(nproc) CMAKE_BUILD_PARALLEL_LEVEL=$(nproc) \
uv build --wheel \
  -Cbuild-dir=build . \
  -Ccmake.define.CUTLASS_PATH=/path/to/cutlass \
  --verbose --color=always --no-build-isolation

pip install dist/*whl --force-reinstall --no-deps

# 3. Return to the repository root and run inference
cd ..
bash scripts/wan22/distill/run_wan22_moe_t2v_extreme.sh
```

Script: [run_wan22_moe_t2v_extreme.sh](https://github.com/ModelTC/LightX2V/blob/main/scripts/wan22/distill/run_wan22_moe_t2v_extreme.sh)

## 🎬 Generation Results

<div style="background: #f8fafc; border: 1px solid #e2e8f0; border-radius: 8px; padding: 16px; margin: 16px 0;">
<p style="font-style: italic; color: #475569; margin: 0; padding: 12px; background: white; border-radius: 6px; border-left: 4px solid #3b82f6;">
"Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage"
</p>
</div>


| Resolution | Wan2.2-T2V-A14B | Wan2.2-NVFP4-Sparse |
| --- | --- | --- |
| 480p | <video controls style="width: 260px; height: 180px; border-radius: 6px; object-fit: cover;" src="https://cdn-uploads.huggingface.co/production/uploads/658e760cccbc1e2cc78b4258/WTHhrzx7XR4S1Ys_6Kzx4.mp4"></video> | <video controls style="width: 260px; height: 180px; border-radius: 6px; object-fit: cover;" src="https://cdn-uploads.huggingface.co/production/uploads/658e760cccbc1e2cc78b4258/zorpw7gm9At0J2kCmvkDr.mp4"></video> |
| 720p | <video controls style="width: 260px; height: 180px; border-radius: 6px; object-fit: cover;" src="https://cdn-uploads.huggingface.co/production/uploads/658e760cccbc1e2cc78b4258/vkiyKj7CJA-r0yTz7TEum.mp4"></video> | <video controls style="width: 260px; height: 180px; border-radius: 6px; object-fit: cover;" src="https://cdn-uploads.huggingface.co/production/uploads/658e760cccbc1e2cc78b4258/TuECbzvW5jI9NHG6GLvIR.mp4"></video> |


## ⚡ Performance Comparison

**Test Environment**: RTX 5090 Single GPU | LightX2V Framework | End-to-End Latency

| Resolution | Wan2.2-T2V-A14B (baseline) | Wan2.2-NVFP4-Sparse | Speedup |
| --- | ---: | ---: | ---: |
| 480p | 734s | 14.15s | 51.9x |
| 720p | 2668s | 45s | 59.3x |
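
The speedup column is the baseline latency divided by the NVFP4-Sparse latency. A small sketch that recomputes it from the numbers in the table (the helper name `speedup` is illustrative):

```shell
#!/usr/bin/env bash
# Speedup = baseline latency / optimized latency, rounded to one decimal.
speedup() {
  awk -v base="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", base / fast }'
}

echo "480p: $(speedup 734 14.15)x"    # 734 s vs 14.15 s
echo "720p: $(speedup 2668 45)x"      # 2668 s vs 45 s
```

The same helper can be used to compare your own measurements against the table, provided they are also end-to-end latencies.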

## ⚠️ Notes

### System Requirements

- **Required Hardware**: NVIDIA RTX 50-series GPUs or other Blackwell architecture GPUs.
- **Recommended Runtime**: `lightx2v/lightx2v:26052801-cu130-5090`.
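
The hardware requirement above can be checked before installing anything. The sketch below assumes a driver recent enough for `nvidia-smi` to support the `compute_cap` query field, and it treats a compute capability major version of 10 or higher as Blackwell (for example 12.0 on RTX 50-series cards). That threshold is an assumption of this example and would also accept later architectures; the helper name `is_blackwell_cap` is illustrative.

```shell
#!/usr/bin/env bash
# Return success if a compute capability string such as "12.0" has major >= 10.
# Assumption: Blackwell GPUs report 10.x or 12.x; Hopper and Ada report 9.0 and 8.9.
is_blackwell_cap() {
  local major=${1%%.*}
  [ "$major" -ge 10 ] 2>/dev/null
}

if command -v nvidia-smi >/dev/null 2>&1; then
  # One CSV line per GPU, e.g. "NVIDIA GeForce RTX 5090, 12.0"
  nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader |
  while IFS=, read -r name cap; do
    cap=${cap# }   # drop the space that follows the comma
    if is_blackwell_cap "$cap"; then
      echo "OK: $name (compute capability $cap)"
    else
      echo "Unsupported for NVFP4 kernels: $name (compute capability $cap)"
    fi
  done
else
  echo "nvidia-smi not found; cannot check the GPU"
fi
```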

### Dependencies

- Prepare Wan2.2 T5 / VAE components following the standard LightX2V Wan2.2 model structure.
- Use Blackwell + NVFP4 kernels for optimal speed and memory efficiency.

### Performance Tips

- Use the provided extreme inference script for the 4-step high-noise / low-noise expert schedule.
- Sparse attention is most beneficial at higher resolutions where self-attention dominates latency.
- Enable CPU offload only when GPU memory is limited, since offload can reduce throughput.
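
To see why sparse attention matters more at higher resolutions, it helps to estimate the attention sequence length. The sketch below is a back-of-the-envelope calculation, not taken from the LightX2V code. It assumes 81-frame clips at 832x480 and 1280x720, a VAE that compresses 4x in time and 8x in space, and 2x2 spatial patches; all of these values are assumptions of this example.

```shell
#!/usr/bin/env bash
# Estimate the number of attention tokens for one video clip.
# Assumptions: 4x temporal / 8x spatial VAE compression, 2x2 patches (16 px per token side).
tokens() {
  local width=$1 height=$2 frames=$3
  local latent_frames=$(( (frames - 1) / 4 + 1 ))
  echo $(( latent_frames * (height / 16) * (width / 16) ))
}

n480=$(tokens 832 480 81)
n720=$(tokens 1280 720 81)
echo "480p tokens: $n480"
echo "720p tokens: $n720"

# Dense self-attention cost grows with the square of the token count.
awk -v a="$n480" -v b="$n720" \
  'BEGIN { printf "dense attention cost ratio (720p / 480p): %.1fx\n", (b / a) ^ 2 }'
```

Under these assumptions the 720p clip has roughly 2.3 times as many tokens as the 480p clip, so dense attention costs more than 5 times as much, which is where sparsity has the most room to help.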

## 🤝 Community

- **🐛 Issues**: [GitHub Issues](https://github.com/ModelTC/LightX2V/issues)
- **🤗 Models**: [HuggingFace Hub](https://huggingface.co/lightx2v/)
- **📖 Documentation**: [LightX2V Docs](https://github.com/ModelTC/LightX2V)

---

<div align="center">

**If you find this project helpful, please give us a ⭐ on [GitHub](https://github.com/ModelTC/LightX2V)**

For questions or issues, please open an issue on [LightX2V](https://github.com/ModelTC/LightX2V/issues) or contact lvchengtao0319@gmail.com.

</div>