MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head
Kewei Zhang1*,
Ye Huang1*,
Yufan Deng1,
Jincheng Yu2,
Junsong Chen2,
Huan Ling2,
Enze Xie2,
Daquan Zhou1
1Peking University · 2NVIDIA
MHLA is a universal high-efficiency linear attention operator. It can be applied to image classification, image generation, language modeling, and video generation, matching the performance of Flash Attention while delivering significant speedups on long sequences. For more details, please refer to our paper.
This repository is organized into four sub-projects: mhla_dit, mhla_image_classification, mhla_nlp, and mhla_videogen. Each corresponds to the experimental code for the four tasks presented in our paper. Each sub-project contains its own README.md with detailed instructions.
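For readers unfamiliar with linear attention, the sketch below illustrates the generic multi-head linear attention computation that MHLA builds on. This is a minimal NumPy illustration, not the paper's exact token-level MHLA operator; the feature map `phi`, the function name, and all shapes are illustrative assumptions.

```python
import numpy as np

def multi_head_linear_attention(q, k, v, eps=1e-6):
    """Generic (non-causal) multi-head linear attention sketch.

    q, k, v: arrays of shape (heads, seq_len, head_dim).
    Cost is O(seq_len * head_dim^2) per head instead of the
    O(seq_len^2 * head_dim) of softmax attention.
    """
    phi = lambda x: np.maximum(x, 0.0) + 1e-2   # simple positive feature map (assumption)
    q, k = phi(q), phi(k)
    kv = np.einsum('hsd,hse->hde', k, v)        # per-head (head_dim x head_dim) summary of k^T v
    z = k.sum(axis=1)                           # per-head normalizer, shape (heads, head_dim)
    num = np.einsum('hsd,hde->hse', q, kv)      # numerator: q @ (k^T v)
    den = np.einsum('hsd,hd->hs', q, z)[..., None] + eps
    return num / den                            # shape (heads, seq_len, head_dim)

# Toy usage: 2 heads, 8 tokens, head dim 4.
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(2, 8, 4)) for _ in range(3))
out = multi_head_linear_attention(q, k, v)
```

The long-sequence speedup comes from reordering the matrix products: the per-head summary `k^T v` is computed once, so the full `seq_len × seq_len` attention matrix of softmax attention is never materialized.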
Updates
- [2026.01.12] 🔥 Our paper is available on arXiv.
- [2026.01.12] 🔥 We release the code of MHLA, including training and inference code for image classification, image generation, language modeling, and video generation.
Installation & Usage
Please refer to the README.md in each sub-project (mhla_dit, mhla_image_classification, mhla_nlp, and mhla_videogen) for detailed installation and usage instructions.
Performance & Efficiency
On Wan2.1-1.3B
| Method | Quality score | Semantic score | Total score | Latency |
|---|---|---|---|---|
| Wan2.1 1.3B | 85.23 | 75.65 | 83.31 | 139s |
| Full MHLA | 83.93 | 78.40 | 82.83 | 62s |
| Full Linear | 69.96 | 11.38 | 58.24 | 62s |
| MHLA Hybrid 2/3 | 84.87 | 79.59 | 83.82 | 84s |
Full MHLA and Full Linear replace all attention layers with MHLA and linear attention, respectively; MHLA Hybrid 2/3 replaces only 2/3 of the layers.
Acknowledgement
Our project is built on multiple inspiring projects including: timm, DiT, Sana and flash-linear-attention.
Support Us
If you find this work useful, please consider:
- Starring the repository
- Citing our paper
- Contributing to the codebase
Citation
@misc{mhla,
title={MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head},
author={Kewei Zhang and Ye Huang and Yufan Deng and Jincheng Yu and Junsong Chen and Huan Ling and Enze Xie and Daquan Zhou},
year={2026},
eprint={2601.07832},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2601.07832},
}
Base model: Wan-AI/Wan2.1-T2V-1.3B