---
license: apache-2.0
pipeline_tag: text-to-video
library_name: diffusers
arxiv: 2603.00040
---
# FastWan-QAD-1.3B
<p align="center">
<img src="https://raw.githubusercontent.com/hao-ai-lab/FastVideo/main/assets/logos/logo.svg" width="200"/>
</p>
<div>
<div align="center">
<a href="https://github.com/hao-ai-lab/FastVideo">GitHub</a> |
<a href="https://haoailab.com/blogs/fastwan-qad/">Blog</a> |
<a href="https://hao-ai-lab.github.io/FastVideo">Documentation</a>
</div>
</div>
## Introduction
FastWan-QAD-1.3B is the fastest variant of the FastWan-QAD series, targeting RTX 5090 users. It uses **NVFP4 quantized linear layers** paired with the **SageAttention3 FP4 attention backend**, achieving end-to-end generation of a 5-second 480p video in **1.78 seconds**. That is over 3.4× faster than prior distilled models such as TurboDiffusion (6.10 s) and LightX2V (6.91 s) on the same hardware.
The model is built on [Wan-AI/Wan2.1-T2V-1.3B-Diffusers](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B-Diffusers) and trained with **quantization-aware distillation (QAD)**, jointly optimizing for low-bit precision and 3-step inference quality.
> **Hardware requirement:** RTX 5090 (sm100+). NVFP4 is a Blackwell-native format and is not supported on older GPUs. See [FastWan-QAD-1.3B-SA2](https://huggingface.co/FastVideo/FastWan-QAD-1.3B-SA2) for an alternative using SageAttention2++ or [FastWan-QAD-FP8-1.3B](https://huggingface.co/FastVideo/FastWan-QAD-FP8-1.3B) for RTX 4090 support.
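
The hardware note above can be turned into a small selection helper. The following is a minimal sketch, not part of FastVideo: the function names `pick_variant` and `detect_capability` are hypothetical, and the compute-capability thresholds (10.0 and above for NVFP4, 8.9 and above for FP8) are assumptions derived from the "sm100+" and "RTX 4090" wording in this card. On an RTX 5090, the SA2 variant linked above remains an alternative that this sketch does not select.

```python
# Sketch: map a CUDA compute capability to a FastWan-QAD variant.
# The thresholds are assumptions for illustration, not values taken
# from the FastVideo code base.

def pick_variant(major: int, minor: int) -> str:
    """Return a suggested model repo for a GPU with capability (major, minor)."""
    if major >= 10:
        # Blackwell-class GPUs (the card's "sm100+" requirement): NVFP4 variant.
        return "FastVideo/FastWan-QAD-1.3B"
    if (major, minor) >= (8, 9):
        # Ada-class GPUs such as the RTX 4090 (capability 8.9): FP8 variant.
        return "FastVideo/FastWan-QAD-FP8-1.3B"
    raise RuntimeError(
        f"compute capability {major}.{minor} is not covered by the variants listed here"
    )


def detect_capability():
    """Query the first CUDA device through PyTorch; return None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    # Returns a (major, minor) tuple for the given device index.
    return torch.cuda.get_device_capability(0)
```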
---
## Model Overview
- **3-step inference** via quantization-aware distillation
- **NVFP4 linear layers** for maximum throughput on Blackwell GPUs
- **SageAttention3 FP4 backend** for attention computation
- Trained at **480p (832×480)** resolution, 81 frames (5 seconds at 16 fps)
- No classifier-free guidance at inference time
- Fast decoding via [TAEHV](https://github.com/madebyollin/taehv) tiny autoencoder
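
The clip geometry listed above can be checked with a few lines of arithmetic. This is an illustrative sketch: the resolution, frame count, and frame rate come from this card, while the 4× temporal and 8× spatial compression factors are an assumption about the Wan2.1 VAE that this card does not state.

```python
# Values stated in this model card.
WIDTH, HEIGHT, NUM_FRAMES, FPS = 832, 480, 81, 16

# Assumed compression factors of the Wan2.1 VAE (not stated in this card).
TEMPORAL_STRIDE, SPATIAL_STRIDE = 4, 8


def clip_duration_s(num_frames: int, fps: int) -> float:
    """Playback length of the clip in seconds."""
    return num_frames / fps


def latent_shape(num_frames: int, height: int, width: int) -> tuple:
    """(frames, height, width) of the latent under the assumed strides."""
    # Under the assumed scheme the first frame is encoded on its own and the
    # remaining frames are grouped in blocks of TEMPORAL_STRIDE.
    latent_frames = (num_frames - 1) // TEMPORAL_STRIDE + 1
    return latent_frames, height // SPATIAL_STRIDE, width // SPATIAL_STRIDE


print(clip_duration_s(NUM_FRAMES, FPS))         # 5.0625
print(latent_shape(NUM_FRAMES, HEIGHT, WIDTH))  # (21, 60, 104)
```

Under these assumptions, 81 frames is a natural choice because it has the form 4k + 1, and it plays back in just over 5 seconds at 16 fps.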
## Performance
| Model | Hardware | Generation Time (5s 480p) |
|---|---|---|
| FastWan-QAD-1.3B | RTX 5090 | **1.78s** |
| [FastWan-QAD-1.3B-SA2](https://huggingface.co/FastVideo/FastWan-QAD-1.3B-SA2) | RTX 5090 | ~2.0s |
| [FastWan-QAD-FP8-1.3B](https://huggingface.co/FastVideo/FastWan-QAD-FP8-1.3B) | RTX 4090 | ~3.4s |
| TurboDiffusion | RTX 5090 | 6.10s |
| LightX2V | RTX 5090 | 6.91s |
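
The speedups implied by the table can be reproduced directly. The sketch below only uses the timings listed above; the helper name `speedup` is illustrative. Note that the FP8 row was measured on an RTX 4090, so its ratio mixes hardware and model differences.

```python
# Generation times in seconds for a 5 s 480p clip, copied from the table above.
BASELINE_S = 1.78  # FastWan-QAD-1.3B on RTX 5090
VIDEO_SECONDS = 5.0

TIMES_S = {
    "FastWan-QAD-1.3B-SA2 (RTX 5090)": 2.0,
    "FastWan-QAD-FP8-1.3B (RTX 4090)": 3.4,
    "TurboDiffusion (RTX 5090)": 6.10,
    "LightX2V (RTX 5090)": 6.91,
}


def speedup(other_s: float, baseline_s: float = BASELINE_S) -> float:
    """How many times faster the baseline is than `other_s`."""
    return other_s / baseline_s


for name, seconds in TIMES_S.items():
    print(f"{name}: {speedup(seconds):.2f}x")

# Ratio of clip length to generation time (values above 1 mean faster than real time).
print(f"real-time factor: {VIDEO_SECONDS / BASELINE_S:.2f}x")
```

With these numbers, the ratios against TurboDiffusion and LightX2V come out to about 3.43× and 3.88×, and the clip is generated roughly 2.81× faster than it plays back.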
## Inference
```bash
docker run --gpus all --ipc=host --rm -it ghcr.io/hao-ai-lab/fastvideo/fastvideo-dev:py3.12-sha-f889e6b bash
# should drop you in /FastVideo with venv already activated
git fetch && git checkout main
# build fastvideo-kernel
cd fastvideo-kernels/ && ./build.sh && cd ..
git clone https://github.com/madebyollin/taehv
uv pip install ./taehv
# run generation:
FASTVIDEO_DISABLE_ATTENTION_COMPILE=0 \
FASTVIDEO_ATTENTION_BACKEND=ATTN_QAT_INFER \
python examples/inference/optimizations/FastWan_QAD_TAEHV.py \
  --model FastVideo/FastWan-QAD-1.3B \
  --distilled_model "" \
  --taehv_checkpoint taehv/taew2_1.pth
```
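
For scripted runs, the generation command can also be launched from Python. This is a minimal sketch that wraps the exact command and environment variables shown above with the standard library; the helper names `build_run` and `generate` are hypothetical and not part of FastVideo. It assumes it is executed from the `/FastVideo` directory inside the container, after the build steps have completed.

```python
import os
import subprocess
import time


def build_run(model="FastVideo/FastWan-QAD-1.3B",
              taehv_checkpoint="taehv/taew2_1.pth"):
    """Return (env, argv) mirroring the shell command from this card."""
    env = dict(
        os.environ,
        FASTVIDEO_DISABLE_ATTENTION_COMPILE="0",
        FASTVIDEO_ATTENTION_BACKEND="ATTN_QAT_INFER",
    )
    argv = [
        "python",
        "examples/inference/optimizations/FastWan_QAD_TAEHV.py",
        "--model", model,
        "--distilled_model", "",
        "--taehv_checkpoint", taehv_checkpoint,
    ]
    return env, argv


def generate(**kwargs) -> float:
    """Run the example script and return the wall-clock time in seconds."""
    env, argv = build_run(**kwargs)
    start = time.perf_counter()
    subprocess.run(argv, env=env, check=True)  # raises if the script fails
    return time.perf_counter() - start
```

The time returned by `generate()` covers the whole process, including model loading, so it is not directly comparable to the generation times in the Performance table.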
## Training
More details coming soon.
---
If you find this model useful, please cite our paper:
```bibtex
@article{Zhang2026AttnQAT,
title={Attn-QAT: 4-Bit Attention With Quantization-Aware Training},
author={Zhang, Peiyuan and Noto, Matthew and Tan, Wenxuan and Jiang, Chengquan and Lin, Will and Zhou, Wei and Zhang, Hao},
journal={arXiv preprint arXiv:2603.00040},
year={2026}
}
```