---
license: apache-2.0
pipeline_tag: text-to-video
library_name: diffusers
arxiv: 2603.00040
---

# FastWan-QAD-1.3B

## Introduction

FastWan-QAD-1.3B is the fastest variant of the FastWan-QAD series and targets RTX 5090 users. It pairs **NVFP4 quantized linear layers** with the **SageAttention3 FP4 attention backend** and generates a 5-second 480p video end to end in **1.78 seconds**, over 3.4× faster than the fastest prior distilled model measured on the same hardware (TurboDiffusion, 6.10 s).

The model is built on [Wan-AI/Wan2.1-T2V-1.3B-Diffusers](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B-Diffusers) and trained with **quantization-aware distillation (QAD)**, which jointly optimizes for low-bit precision and 3-step inference quality.

> **Hardware requirement:** RTX 5090 (sm100+). NVFP4 is a Blackwell-native format and is not supported on older GPUs. For alternatives, see [FastWan-QAD-1.3B-SA2](https://huggingface.co/FastVideo/FastWan-QAD-1.3B-SA2), which uses SageAttention2++, or [FastWan-QAD-FP8-1.3B](https://huggingface.co/FastVideo/FastWan-QAD-FP8-1.3B) for RTX 4090 support.

---

## Model Overview

- **3-step inference** via quantization-aware distillation
- **NVFP4 linear layers** for maximum throughput on Blackwell GPUs
- **SageAttention3 FP4 backend** for attention computation
- Trained at **480p (832×480)** resolution, 81 frames (5 seconds at 16 fps)
- No classifier-free guidance at inference time
- Fast decoding via the [TAEHV](https://github.com/madebyollin/taehv) tiny autoencoder

## Performance

| Model | Hardware | End-to-end generation time (5 s, 480p) |
|---|---|---|
| FastWan-QAD-1.3B | RTX 5090 | **1.78 s** |
| [FastWan-QAD-1.3B-SA2](https://huggingface.co/FastVideo/FastWan-QAD-1.3B-SA2) | RTX 5090 | ~2.0 s |
| [FastWan-QAD-FP8-1.3B](https://huggingface.co/FastVideo/FastWan-QAD-FP8-1.3B) | RTX 4090 | ~3.4 s |
| TurboDiffusion | RTX 5090 | 6.10 s |
| LightX2V | RTX 5090 | 6.91 s |

## Inference

```bash
docker run --gpus all --ipc=host --rm -it ghcr.io/hao-ai-lab/fastvideo/fastvideo-dev:py3.12-sha-f889e6b bash
# This should drop you into /FastVideo with the venv already activated
git fetch && git checkout main

# Build fastvideo-kernels
cd fastvideo-kernels/ && ./build.sh && cd ..

# Install the TAEHV tiny autoencoder used for fast decoding
git clone https://github.com/madebyollin/taehv
uv pip install ./taehv

# Run generation
FASTVIDEO_DISABLE_ATTENTION_COMPILE=0 FASTVIDEO_ATTENTION_BACKEND=ATTN_QAT_INFER \
python examples/inference/optimizations/FastWan_QAD_TAEHV.py \
    --model FastVideo/FastWan-QAD-1.3B \
    --distilled_model "" \
    --taehv_checkpoint taehv/taew2_1.pth
```

## Training

More details coming soon.

---

If you find this work useful, please cite our paper:

```
@article{Zhang2026AttnQAT,
  title={Attn-QAT: 4-Bit Attention With Quantization-Aware Training},
  author={Zhang, Peiyuan and Noto, Matthew and Tan, Wenxuan and Jiang, Chengquan and Lin, Will and Zhou, Wei and Zhang, Hao},
  journal={arXiv preprint arXiv:2603.00040},
  year={2026}
}
```