**Research Paper**: ["MHA2MLA-VLM: Enabling DeepSeek's Economical Multi-Head Latent Attention across Vision-Language Models"](https://arxiv.org/abs/2601.11464)

## Description
This repository contains **our proposed MD-SVD (Modality-Decoupled Singular Value Decomposition) initialization weights**, extracted from Stage 1 checkpoints and used to initialize Stage 2 MHA2MLA-VLM models. MD-SVD compresses the visual and textual KV spaces independently, enabling efficient KV compression while maintaining model performance.
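To make the idea concrete, here is a minimal sketch of what modality-decoupled SVD initialization could look like: key/value states of visual and textual tokens are factored separately, so each modality receives its own rank-`d_kv` latent projection. The function name `md_svd_init`, the tensor layout, and the use of calibration KV states are illustrative assumptions, not the paper's exact recipe (which also handles RoPE dimensions).

```python
import torch

def md_svd_init(kv_states: torch.Tensor, is_visual: torch.Tensor, d_kv: int):
    """Illustrative MD-SVD sketch (assumptions, not the paper's exact recipe).

    kv_states: (n_tokens, d_model) key/value states from calibration data.
    is_visual: (n_tokens,) boolean mask marking visual tokens.
    d_kv:      target latent dimension per modality.
    """
    projections = {}
    for modality, mask in (("visual", is_visual), ("textual", ~is_visual)):
        states = kv_states[mask]  # (n_modality_tokens, d_model)
        # Truncated SVD: keep the top-d_kv right singular vectors of this
        # modality's states as its dedicated latent down-projection basis.
        _, _, vh = torch.linalg.svd(states.float(), full_matrices=False)
        projections[modality] = vh[:d_kv]  # (d_kv, d_model)
    return projections
```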
## Available Weight Files
| File Name | Latent Dimension (d_kv) |
|-----------|-------------------------|
| `Qwen2.5-VL-7B-rope32-d_kv_32.pt` | 32 |
| `Qwen2.5-VL-7B-rope32-d_kv_64.pt` | 64 |
| `Qwen2.5-VL-7B-rope32-d_kv_128.pt` | 128 |
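As a quick sanity check, the weight files can be inspected with plain PyTorch before merging them into a Stage 2 model. The snippet below assumes the `.pt` files store a state-dict-like mapping from parameter names to tensors; the actual key layout is not documented here, so inspect the keys first.

```python
import torch

# Load the d_kv=64 initialization on CPU and inspect its contents.
# The state-dict-style layout is an assumption; verify the keys first.
init_weights = torch.load("Qwen2.5-VL-7B-rope32-d_kv_64.pt", map_location="cpu")
for name, tensor in list(init_weights.items())[:5]:
    print(name, tuple(tensor.shape))
```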
## Citation

```bibtex
@misc{fan2026mha2mlavlmenablingdeepseekseconomical,
      title={MHA2MLA-VLM: Enabling DeepSeek's Economical Multi-Head Latent Attention across Vision-Language Models},
      author={Xiaoran Fan and Zhichao Sun and Tao Ji and Lixing Shen and Tao Gui},
      year={2026},
      eprint={2601.11464},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2601.11464},
}
```