nielsr HF Staff

Improve model card: Add detailed description, sample usage, and update paper link

91503b5 verified 4 months ago

4.7 kB

base_model: Wan-AI/Wan2.1-T2V-14B
license: apache-2.0
pipeline_tag: text-to-video
tags:
  - text-to-video
  - diffusion
  - video-generation
  - turbodiffusion
  - wan2.1

TurboWan2.1-T2V-14B-480P

This repository contains the TurboWan2.1-T2V-14B-480P model, part of the TurboDiffusion framework presented in TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times. TurboDiffusion is a video generation acceleration framework that can speed up end-to-end diffusion generation by 100-200x while maintaining video quality.

For RTX 5090 or similar GPUs, please use the TurboWan2.1-T2V-14B-480P-quant. For other GPUs with a bigger GPU memory than 40GB, we recommend using TurboWan2.1-T2V-14B-480P.
For usage instructions and more details, please see the official GitHub repository: https://github.com/thu-ml/TurboDiffusion

Sample Usage

To run text-to-video inference using the TurboWan2.1-T2V-1.3B-480P-quant model, follow these steps. For full instructions, including downloading necessary VAE and text encoder checkpoints, refer to the GitHub repository.

export PYTHONPATH=turbodiffusion

# Example for Text-to-Video (T2V) inference
python turbodiffusion/inference/wan2.1_t2v_infer.py \
    --model Wan2.1-1.3B \
    --dit_path checkpoints/TurboWan2.1-T2V-1.3B-480P-quant.pth \
    --resolution 480p \
    --prompt "A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about." \
    --num_samples 1 \
    --num_steps 4 \
    --quant_linear \
    --attention_type sagesla \
    --sla_topk 0.1

Citation

@article{zhang2025turbodiffusion,
  title={TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times},
  author={Zhang, Jintao and Zheng, Kaiwen and Jiang, Kai and Wang, Haoxu and Stoica, Ion and Gonzalez, Joseph E. and Chen, Jianfei and Zhu, Jun},
  journal={arXiv preprint arXiv:2512.16093},
  year={2025}
}

@software{turbodiffusion2025,
  title={TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times},
  author={The TurboDiffusion Team},
  url={https://github.com/thu-ml/TurboDiffusion},
  year={2025}
}

@inproceedings{zhang2025sageattention,
  title={SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration}, 
  author={Zhang, Jintao and Wei, Jia and Zhang, Pengle and Zhu, Jun and Chen, Jianfei},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2025}
}

@article{zhang2025sla,
  title={SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention},
  author={Zhang, Jintao and Wang, Haoxu and Jiang, Kai and Yang, Shuo and Zheng, Kaiwen and Xi, Haocheng and Wang, Ziteng and Zhu, Hongzhou and Zhao, Min and Stoica, Ion and Gonzalez, Joseph E. and Zhu, Jun and Chen, Jianfei},
  journal={arXiv preprint arXiv:2509.24006},
  year={2025}
}

@article{zheng2025rcm,
  title={Large Scale Diffusion Distillation via Score-Regularized Continuous-Time Consistency},
  author={Zheng, Kaiwen and Wang, Yuji and Ma, Qianli and Chen, Huayu and Zhang, Jintao and Balaji, Yogesh and Chen, Jianfei and Liu, Ming-Yu and Zhu, Jun and Zhang, Qinsheng},
  journal={arXiv preprint arXiv:2510.08431},
  year={2025}
}

@inproceedings{zhang2024sageattention2,
  title={Sageattention2: Efficient attention with thorough outlier smoothing and per-thread int4 quantization},
  author={Zhang, Jintao and Huang, Haofeng and Zhang, Pengle and Wei, Jia and Zhu, Jun and Chen, Jianfei},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2025}
}

@article{zhang2025sageattention2++,
  title={Sageattention2++: A more efficient implementation of sageattention2},
  author={Zhang, Jintao and Xu, Xiaoming and Wei, Jia and Huang, Haofeng and Zhang, Pengle and Xiang, Chendong and Zhu, Jun and Chen, Jianfei},
  journal={arXiv preprint arXiv:2505.21136},
  year={2025}
}
@article{zhang2025sageattention3,
  title={SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training},
  author={Zhang, Jintao and Wei, Jia and Zhang, Pengle and Xu, Xiaoming and Huang, Haofeng and Wang, Haoxu and Jiang, Kai and Zhu, Jun and Chen, Jianfei},
  journal={arXiv preprint arXiv:2505.11594},
  year={2025}
}