Upload folder using huggingface_hub

Files changed (4)

Qwen2.5-VL-7B-rope32-d_kv_128.pt ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:c9148ff63a705b03a6f0141cfa68c2c3e15baab92c75e4505e5136ffb94f5bcb
+size 513855082

Qwen2.5-VL-7B-rope32-d_kv_32.pt ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:d6c47ebb75a0e585efc6bff9a14502769eb784a858226e45f1ba3cc9e0c64417
+size 128502782

Qwen2.5-VL-7B-rope32-d_kv_64.pt ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:41e1e81d12ba9f917fc01f3e571a855cf31769bf437b1ab4d0ddb7c4945e40a5
+size 256953790
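
The three hunks above are Git LFS pointer files: each records a spec version, a SHA-256 object ID, and the byte size of the real `.pt` file. As a minimal sketch (the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of this repository), a downloaded weight file could be checked against its pointer like this:

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file.

    Leading '+' characters are stripped so that lines copied straight
    from a diff view are accepted as well.
    """
    fields = {}
    for raw in text.splitlines():
        line = raw.lstrip("+").strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict) -> bool:
    """Return True if the file at `path` has the size and digest in `pointer`."""
    file_path = Path(path)
    # Cheap check first: a size mismatch means the download is incomplete.
    if file_path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["hash_algo"])
    with file_path.open("rb") as handle:
        # Read in 1 MiB chunks so large weight files are not loaded at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]
```

For example, passing the three `+version` / `+oid` / `+size` lines of the `d_kv_32` hunk to `parse_lfs_pointer` and then calling `matches_pointer` on a local copy of `Qwen2.5-VL-7B-rope32-d_kv_32.pt` would confirm that the download is complete and uncorrupted.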

README.md CHANGED

@@ -1,3 +1,31 @@
----
-license: apache-2.0
----

+**Research Paper**: ["MHA2MLA-VLM: Enabling DeepSeek's Economical Multi-Head Latent Attention across Vision-Language Models"](https://arxiv.org/abs/2601.11464)
+## Description
+This repository contains **our proposed MD-SVD (Modality-Decoupled Singular Value Decomposition) initialization weights**, extracted from Stage 1 checkpoints and used to initialize Stage 2 MHA2MLA-VLM models. MD-SVD compresses the visual and textual KV spaces independently, enabling efficient compression while maintaining model performance.
+## Available Weight Files
+| File Name | Latent Dimension (d_kv) |
+|-----------|------------------------|
+| `Qwen2.5-VL-7B-rope32-d_kv_32.pt` | 32 |
+| `Qwen2.5-VL-7B-rope32-d_kv_64.pt` | 64 |
+| `Qwen2.5-VL-7B-rope32-d_kv_128.pt` | 128 |
+## Citation
+```bibtex
+@misc{fan2026mha2mlavlmenablingdeepseekseconomical,
+      title={MHA2MLA-VLM: Enabling DeepSeek's Economical Multi-Head Latent Attention across Vision-Language Models},
+      author={Xiaoran Fan and Zhichao Sun and Tao Ji and Lixing Shen and Tao Gui},
+      year={2026},
+      eprint={2601.11464},
+      archivePrefix={arXiv},
+      primaryClass={cs.CV},
+      url={https://arxiv.org/abs/2601.11464},
+}
+```