---
license: apache-2.0
pipeline_tag: text-to-video
library_name: diffusers
arxiv: 2603.00040
---

# FastWan-QAD-1.3B

<p align="center">
  <img src="https://raw.githubusercontent.com/hao-ai-lab/FastVideo/main/assets/logos/logo.svg" width="200"/>
</p>
<div>
  <div align="center">
    <a href="https://github.com/hao-ai-lab/FastVideo">GitHub</a> |
    <a href="https://haoailab.com/blogs/fastwan-qad/">Blog</a> |
    <a href="https://hao-ai-lab.github.io/FastVideo">Documentation</a>
  </div>
</div>

## Introduction

FastWan-QAD-1.3B is the fastest variant of the FastWan-QAD series and targets RTX 5090 users. It pairs **NVFP4 quantized linear layers** with the **SageAttention3 FP4 attention backend** and generates a 5-second 480p video end to end in **1.78 seconds**, over 3.4× faster than prior distilled models on the same hardware (see Performance).

The model is built on [Wan-AI/Wan2.1-T2V-1.3B-Diffusers](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B-Diffusers) and trained with **quantization-aware distillation (QAD)**, which jointly optimizes for low-bit precision and 3-step inference quality.

> **Hardware requirement:** RTX 5090 (sm100+). NVFP4 is a Blackwell-native format and is not supported on older GPUs. See [FastWan-QAD-1.3B-SA2](https://huggingface.co/FastVideo/FastWan-QAD-1.3B-SA2) for an alternative that uses SageAttention2++, or [FastWan-QAD-FP8-1.3B](https://huggingface.co/FastVideo/FastWan-QAD-FP8-1.3B) for RTX 4090 support.
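
The hardware note above can be turned into a quick pre-flight check. The sketch below is not part of FastVideo: `pick_variant` and `detect_capability` are hypothetical helpers, the `(10, 0)` threshold mirrors the "sm100+" wording on this card, and the `(8, 9)` threshold for the FP8 variant is an assumption based on the RTX 4090 reporting compute capability 8.9. It ignores the SageAttention2++ variant, which also runs on the RTX 5090.

```python
from typing import Optional, Tuple

# Model IDs are taken from this card; the capability thresholds are assumptions.
NVFP4_MODEL = "FastVideo/FastWan-QAD-1.3B"
FP8_MODEL = "FastVideo/FastWan-QAD-FP8-1.3B"


def pick_variant(capability: Tuple[int, int]) -> Optional[str]:
    """Map a CUDA compute capability (major, minor) to a model ID (hypothetical helper)."""
    if capability >= (10, 0):  # "sm100+" as stated on this card
        return NVFP4_MODEL
    if capability >= (8, 9):   # assumption: Ada-class GPUs such as the RTX 4090
        return FP8_MODEL
    return None                # none of the variants listed here applies


def detect_capability() -> Optional[Tuple[int, int]]:
    """Return the capability of GPU 0, or None without PyTorch or a CUDA device."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


cap = detect_capability()
print("capability:", cap, "-> variant:", pick_variant(cap) if cap else None)
```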

---

## Model Overview

- **3-step inference** via quantization-aware distillation
- **NVFP4 linear layers** for maximum throughput on Blackwell GPUs
- **SageAttention3 FP4 backend** for attention computation
- Trained at **480p (832×480)** resolution, 81 frames (5 seconds at 16 fps)
- No classifier-free guidance at inference time
- Fast decoding via the [TAEHV](https://github.com/madebyollin/taehv) tiny autoencoder
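
To make the resolution and frame count above concrete, the following sketch works out the clip duration and the approximate size of the latent the model denoises. Only the 832×480, 81-frame, 16 fps figures come from this card. The compression factors (4× temporal and 8× spatial in the VAE, a 1×2×2 patch size in the transformer) are assumptions carried over from the Wan2.1 base model, so treat the derived numbers as an estimate.

```python
# Values stated on this card.
WIDTH, HEIGHT, FRAMES, FPS = 832, 480, 81, 16

# Assumed Wan2.1 compression factors (not stated on this card).
VAE_TEMPORAL, VAE_SPATIAL = 4, 8
PATCH_T, PATCH_H, PATCH_W = 1, 2, 2

duration_s = FRAMES / FPS

# The first frame is encoded on its own; the rest are grouped in blocks of four.
latent_frames = (FRAMES - 1) // VAE_TEMPORAL + 1
latent_h = HEIGHT // VAE_SPATIAL
latent_w = WIDTH // VAE_SPATIAL

# Number of tokens the attention layers see per denoising step.
tokens = (latent_frames // PATCH_T) * (latent_h // PATCH_H) * (latent_w // PATCH_W)

print(f"duration: {duration_s:.2f} s")
print(f"latent grid (frames, height, width): {latent_frames} x {latent_h} x {latent_w}")
print(f"tokens per step: {tokens}")
```

Under these assumptions each of the 3 denoising steps attends over roughly 33 thousand tokens, which is why the attention backend matters as much as the linear layers.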

## Performance

| Model | Hardware | Generation time (5 s, 480p) |
|---|---|---|
| FastWan-QAD-1.3B | RTX 5090 | **1.78 s** |
| [FastWan-QAD-1.3B-SA2](https://huggingface.co/FastVideo/FastWan-QAD-1.3B-SA2) | RTX 5090 | ~2.0 s |
| [FastWan-QAD-FP8-1.3B](https://huggingface.co/FastVideo/FastWan-QAD-FP8-1.3B) | RTX 4090 | ~3.4 s |
| TurboDiffusion | RTX 5090 | 6.10 s |
| LightX2V | RTX 5090 | 6.91 s |
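
The speedup quoted in the introduction follows directly from the table. The short calculation below uses only the exact RTX 5090 timings listed above (the approximate "~" rows are left out) together with the 81-frame clip length from the Model Overview.

```python
# Exact RTX 5090 timings from the table above, in seconds.
times_s = {
    "FastWan-QAD-1.3B": 1.78,
    "TurboDiffusion": 6.10,
    "LightX2V": 6.91,
}
FRAMES = 81  # frames per generated clip, from the Model Overview

baseline = times_s["FastWan-QAD-1.3B"]

# Speedup of FastWan-QAD-1.3B over each baseline on the same hardware.
speedups = {
    name: t / baseline
    for name, t in times_s.items()
    if name != "FastWan-QAD-1.3B"
}

# Generated frames per wall-clock second, to compare against 16 fps playback.
frames_per_second = FRAMES / baseline

for name, s in speedups.items():
    print(f"vs {name}: {s:.2f}x")
print(f"throughput: {frames_per_second:.1f} frames/s")
```

The smaller of the two ratios is about 3.43, which matches the "over 3.4×" claim, and the throughput of roughly 45 frames per second is well above the 16 fps playback rate.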

## Inference

```bash
# Start the FastVideo dev container. It should drop you in /FastVideo
# with the venv already activated.
docker run --gpus all --ipc=host --rm -it ghcr.io/hao-ai-lab/fastvideo/fastvideo-dev:py3.12-sha-f889e6b bash

# The remaining commands run inside the container.
git fetch && git checkout main

# Build fastvideo-kernel
cd fastvideo-kernels/ && ./build.sh && cd ..

# Fetch and install TAEHV; the checkpoint path used below points into this clone
git clone https://github.com/madebyollin/taehv
uv pip install ./taehv

# Run generation
FASTVIDEO_DISABLE_ATTENTION_COMPILE=0 FASTVIDEO_ATTENTION_BACKEND=ATTN_QAT_INFER \
  python examples/inference/optimizations/FastWan_QAD_TAEHV.py \
    --model FastVideo/FastWan-QAD-1.3B \
    --distilled_model "" \
    --taehv_checkpoint taehv/taew2_1.pth
```
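
If you prefer to launch the generation step from Python, for example inside a batch script, the same environment variables and arguments can be assembled programmatically. This is a minimal sketch, not a FastVideo API: `build_launch` is a hypothetical helper, and every variable, flag, and path in it is copied from the shell command above. It only prints the command, and the commented lines show how it could be executed from the repository root inside the container.

```python
import shlex
from typing import Dict, List, Tuple

# Script path copied from the shell command above.
SCRIPT = "examples/inference/optimizations/FastWan_QAD_TAEHV.py"


def build_launch(model: str, taehv_checkpoint: str) -> Tuple[Dict[str, str], List[str]]:
    """Return (environment overrides, argv) for the generation script (hypothetical helper)."""
    env = {
        "FASTVIDEO_DISABLE_ATTENTION_COMPILE": "0",
        "FASTVIDEO_ATTENTION_BACKEND": "ATTN_QAT_INFER",
    }
    argv = [
        "python", SCRIPT,
        "--model", model,
        "--distilled_model", "",  # passed as an empty string, as in the shell command
        "--taehv_checkpoint", taehv_checkpoint,
    ]
    return env, argv


env, argv = build_launch("FastVideo/FastWan-QAD-1.3B", "taehv/taew2_1.pth")
print(" ".join(f"{k}={v}" for k, v in env.items()), shlex.join(argv))

# To run it for real from the repository root inside the container:
#   import os, subprocess
#   subprocess.run(argv, env={**os.environ, **env}, check=True)
```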

## Training

More details coming soon.

---

If you use this model, please cite our paper:

```
@article{Zhang2026AttnQAT,
  title={Attn-QAT: 4-Bit Attention With Quantization-Aware Training},
  author={Zhang, Peiyuan and Noto, Matthew and Tan, Wenxuan and Jiang, Chengquan and Lin, Will and Zhou, Wei and Zhang, Hao},
  journal={arXiv preprint arXiv:2603.00040},
  year={2026}
}
```