MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head

Kewei Zhang¹*, Ye Huang¹*, Yufan Deng¹, Jincheng Yu², Junsong Chen²,
Huan Ling², Enze Xie², Daquan Zhou¹

¹Peking University   ²NVIDIA

MHLA is a universal, high-efficiency linear attention operator. It applies to image classification, image generation, language modeling, and video generation, matching the quality of FlashAttention while delivering significant speedups over it on long sequences. For more details, please refer to our paper.
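For intuition on where the long-sequence speedup comes from, below is a minimal sketch of the generic (non-causal) linear attention that operators like MHLA build on. The feature map `phi(x) = elu(x) + 1` and the toy shapes are illustrative assumptions, not the MHLA implementation itself; see the sub-projects for the actual operator.

```python
import torch
import torch.nn.functional as F

def softmax_attention(q, k, v):
    # Standard attention: materializes an (N, N) score matrix, O(N^2 * d).
    scores = (q @ k.transpose(-2, -1)) / q.shape[-1] ** 0.5
    return scores.softmax(dim=-1) @ v

def linear_attention(q, k, v, eps=1e-6):
    # Kernelized attention: with a feature map phi (elu + 1 is a common
    # choice, assumed here for illustration), attention factorizes as
    # phi(Q) @ (phi(K)^T V), costing O(N * d^2) instead of O(N^2 * d).
    phi_q = F.elu(q) + 1
    phi_k = F.elu(k) + 1
    kv = phi_k.transpose(-2, -1) @ v  # (d, d) summary of the whole sequence
    z = phi_q @ phi_k.sum(dim=-2, keepdim=True).transpose(-2, -1)  # normalizer
    return (phi_q @ kv) / (z + eps)

# Toy shapes: batch 2, sequence length 4096, head dim 64.
q, k, v = (torch.randn(2, 4096, 64) for _ in range(3))
out = linear_attention(q, k, v)
print(out.shape)  # torch.Size([2, 4096, 64])
```

Because the cost grows linearly in sequence length, the latency advantage over FlashAttention widens as sequences get longer; MHLA's contribution is restoring the expressivity that this factorization normally loses, via token-level multi-head.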

This repository is organized into four sub-projects: mhla_dit, mhla_image_classification, mhla_nlp, and mhla_videogen, each corresponding to one of the four tasks presented in our paper. Each sub-project contains its own README.md with detailed instructions.

Updates

  • [2026.01.12] 🔥 Our paper is available on arXiv.
  • [2026.01.12] 🔥 We release the code of MHLA, including training and inference code for image classification, image generation, language modeling, and video generation.

Installation & Usage

Please refer to the README.md files in the following sub-projects for detailed information:

  • mhla_dit (image generation)
  • mhla_image_classification (image classification)
  • mhla_nlp (language modeling)
  • mhla_videogen (video generation)

Performance & Efficiency

On Wan2.1-1.3B

| Method | Quality score | Semantic score | Total score | Latency |
|---|---|---|---|---|
| Wan2.1 1.3B | 85.23 | 75.65 | 83.31 | 139s |
| Full MHLA | 83.93 | 78.40 | 82.83 | 62s |
| Full Linear | 69.96 | 11.38 | 58.24 | 62s |
| MHLA Hybrid 2/3 | 84.87 | 79.59 | 83.82 | 84s |

Full MHLA and Full Linear replace all attention layers with MHLA and linear attention, respectively; MHLA Hybrid 2/3 replaces only 2/3 of the layers with MHLA.
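As a rough sketch of what such a hybrid configuration looks like, the snippet below swaps the attention module in two out of every three transformer blocks. The names `model.blocks`, `.attn`, and `make_linear_attn` are hypothetical placeholders for illustration, not the repository's actual API.

```python
import torch.nn as nn

def hybridize(model: nn.Module, make_linear_attn, keep_every=3, replace=2):
    """Replace the attention module in `replace` of every `keep_every`
    transformer blocks with a linear-attention variant.

    `model.blocks` and the `.attn` attribute are assumed names; adapt
    them to the block/attribute layout of the real model.
    """
    for i, block in enumerate(model.blocks):
        if i % keep_every < replace:  # e.g. layers 0,1 of every 3 -> 2/3 hybrid
            block.attn = make_linear_attn(block.attn)  # build MHLA from old attn
    return model
```

Which layers keep full attention is a design choice; the table above suggests a 2/3 hybrid recovers most of the quality gap to the full-attention baseline while still cutting latency substantially.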

Acknowledgement

Our project builds on several inspiring projects, including timm, DiT, Sana, and flash-linear-attention.

Support Us

If you find this work useful, please consider:

  • Starring the repository
  • Citing our paper
  • Contributing to the codebase

Citation

```bibtex
@misc{mhla,
      title={MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head},
      author={Kewei Zhang and Ye Huang and Yufan Deng and Jincheng Yu and Junsong Chen and Huan Ling and Enze Xie and Daquan Zhou},
      year={2026},
      eprint={2601.07832},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2601.07832},
}
```