The MHA2MLA-VLM model published in the paper "MHA2MLA-VLM: Enabling DeepSeek's Economical Multi-Head Latent Attention across Vision-Language Models"
Xiaoran Fan
cnxup
AI & ML interests
NLP, CV, LLM
Recent Activity
Updated a model about 8 hours ago: cnxup/Qwen2.5-VL-7B-MLA-stage2-rope32-d_kv_128
Updated a model about 8 hours ago: cnxup/Qwen2.5-VL-7B-MLA-stage2-rope32-d_kv_64
Updated a model about 8 hours ago: cnxup/Qwen2.5-VL-7B-MLA-stage2-rope32-d_kv_32