| --- |
| license: mit |
| --- |
| |
| # Zipper-LoRA |
|
|
| <p> |
| <a href="https://arxiv.org/abs/2603.17558"><img src="https://img.shields.io/badge/arXiv-2603.17558-b31c1b.svg" alt="arXiv"></a> |
| <a href="https://github.com/YuCeong-May/Zipper-LoRA"><img src="https://img.shields.io/badge/GitHub-Zipper--LoRA-green.svg" alt="GitHub"></a> |
| <a href="https://huggingface.co/YuCeong-May/Zipper-LoRA"><img src="https://img.shields.io/badge/🤗-Zipper--LoRA-yellowbadge.svg" alt="HuggingFace"></a> |
| <img src="https://img.shields.io/badge/License-MIT-blue.svg" alt="License"> |
| </p> |
|
|
| ## Abstract |
|
|
| Speech Large Language Models (Speech-LLMs) have emerged as a powerful approach for automatic speech recognition (ASR) by aligning speech encoders with large language models. However, adapting these systems to multilingual settings with imbalanced data distributions remains challenging. |
|
|
| In such scenarios, a **stability-plasticity dilemma** often arises: |
| - Fully shared Parameter-Efficient Fine-Tuning (PEFT) can cause **negative inter-lingual interference** for under-represented languages |
| - Fully language-specific tuning limits the **beneficial cross-lingual knowledge transfer** that low-resource languages need |
|
|
| To address this, we propose **Zipper-LoRA**, a novel rank-level decoupling framework with three variants (Static, Hard, and Soft) that dynamically synthesizes LoRA updates from shared and language-specific subspaces. |
|
|
| ### Key Features |
|
|
| - **Language-Conditioned Router**: Dynamically controls the contribution of each subspace at the LoRA rank level |
| - **Fine-grained Sharing**: Shares parameters where languages are compatible and strictly decouples them where they conflict |
| - **Two-Stage Training**: Uses an Initial-B warm start to accelerate convergence |
| - **Robust Performance**: Works across both chunked and non-chunked encoder configurations |
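
The rank-level routing described above can be illustrated with a small NumPy sketch. This is not the released implementation (the code is not yet public). It assumes each LoRA rank gets a per-language gate in [0, 1] that weights a shared factor pair against a language-specific one. The names `rank_gate` and `zipper_update`, the sigmoid router, the 0.5 threshold, and the toy language IDs are all hypothetical, and the mapping of the soft and thresholded gates to the paper's Soft and Hard variants is a guess.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions and language IDs (illustrative assumptions only).
d_in, d_out, rank = 8, 6, 4
languages = ["en", "sw"]

# One shared LoRA factor pair plus one pair per language.
A_shared = rng.normal(size=(rank, d_in))
B_shared = rng.normal(size=(d_out, rank))
A_lang = {lang: rng.normal(size=(rank, d_in)) for lang in languages}
B_lang = {lang: rng.normal(size=(d_out, rank)) for lang in languages}

# Hypothetical router state: one logit per rank for each language.
router_logits = {lang: rng.normal(size=rank) for lang in languages}


def rank_gate(lang, hard=False):
    """Return a per-rank gate in [0, 1]; 1 means 'use the shared rank'."""
    gate = 1.0 / (1.0 + np.exp(-router_logits[lang]))
    if hard:
        # Assumed thresholding: each rank is either fully shared or fully specific.
        gate = (gate > 0.5).astype(float)
    return gate


def zipper_update(lang, gate):
    """Compose a weight update: B_s diag(g) A_s + B_l diag(1 - g) A_l."""
    shared = (B_shared * gate) @ A_shared
    specific = (B_lang[lang] * (1.0 - gate)) @ A_lang[lang]
    return shared + specific


delta_w = zipper_update("sw", rank_gate("sw"))
print(delta_w.shape)
```

With a gate of all ones the update reduces to the fully shared LoRA baseline, and with all zeros it reduces to fully language-specific tuning, which are the two extremes the abstract contrasts.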
|
|
| ### Results |
|
|
| Experiments on a 12-language mixed-resource setting show that Zipper-LoRA consistently outperforms both fully shared and independent baselines, particularly in **extremely low-resource scenarios**. |
|
|
| --- |
|
|
| ## TODO |
|
|
| - [x] Paper |
| - [x] Data |
| - [ ] Code (to be released after the paper is accepted) |
| - [x] Model Weights (coming soon) |
|
|
|
|
| ## Citation |
|
|
| If you find this work helpful, please cite: |
|
|
| ```bibtex |
| @article{ZipperLoRA2026, |
| title={Dynamic Parameter Decoupling for Speech-LLM based Multilingual Speech Recognition}, |
| author={Mei, Yuxiang and Qiu, Delai and Liu, Shengping and Liang, Jiaen and Long, Yanhua}, |
| journal={arXiv preprint arXiv:2603.17558}, |
| year={2026} |
| } |
| ``` |
|
|
| --- |
|
|
| <p align="center"> |
| <img src="https://img.shields.io/badge/Status-Under_Construction-yellow" alt="Status"> |
| </p> |