# MLC-SLM: Bridging the Gap in Multilingual Conversational ASR
This repository contains the models and code presented in the paper *Bridging the gap: A comparative exploration of Speech-LLM and end-to-end architecture for multilingual conversational ASR*.
The project was developed for the INTERSPEECH 2025 Challenge on Multilingual Conversational Speech Language Models (MLC-SLM).
- Paper: arXiv:2601.01461
- Code: GitHub - MLC-SLM
## Description
The proposed Speech-LLM is an enhanced framework that integrates fine-tuned Whisper and mHuBERT encoders with a Large Language Model (Qwen2.5-7B) to enrich speech representations for multilingual conversational ASR. It utilizes cross-attention-based fusion mechanisms to exploit complementary information between generative (Whisper) and discriminative (mHuBERT) speech features.
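The cross-attention fusion described above can be sketched in a few lines. The following is a minimal, illustrative NumPy example, not the authors' implementation: projection sizes, random initialization, and the final concatenation are assumptions made for clarity. Whisper frames act as queries attending over mHuBERT frames, so each Whisper feature is enriched with complementary discriminative information.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(whisper_feats, mhubert_feats, d_k=64, seed=0):
    """Single-head cross-attention fusion sketch (hypothetical shapes).

    whisper_feats: (T_w, d_w) generative features (queries)
    mhubert_feats: (T_h, d_h) discriminative features (keys/values)
    Returns a (T_w, 2 * d_k) fused representation.
    """
    rng = np.random.default_rng(seed)
    d_w, d_h = whisper_feats.shape[-1], mhubert_feats.shape[-1]
    # Randomly initialized projections stand in for learned weights.
    W_q = rng.standard_normal((d_w, d_k)) / np.sqrt(d_w)
    W_k = rng.standard_normal((d_h, d_k)) / np.sqrt(d_h)
    W_v = rng.standard_normal((d_h, d_k)) / np.sqrt(d_h)
    Q = whisper_feats @ W_q                    # (T_w, d_k)
    K = mhubert_feats @ W_k                    # (T_h, d_k)
    V = mhubert_feats @ W_v                    # (T_h, d_k)
    attn = softmax(Q @ K.T / np.sqrt(d_k))     # (T_w, T_h)
    fused = attn @ V                           # attended mHuBERT info
    # Concatenate the projected Whisper stream with the attended stream.
    return np.concatenate([Q, fused], axis=-1)

# Toy example: 100 Whisper frames (d=1280), 150 mHuBERT frames (d=768).
w = np.random.default_rng(1).standard_normal((100, 1280))
h = np.random.default_rng(2).standard_normal((150, 768))
out = cross_attention_fuse(w, h)
print(out.shape)  # → (100, 128)
```

In the full model, the fused representation would then be projected into the LLM's embedding space and consumed by Qwen2.5-7B as a prefix for transcription.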
## Results
Performance (CER/WER, %) on the MLC-SLM Challenge datasets; lower is better:
| System | Dev | Eval | CV-Test |
|---|---|---|---|
| Whisper (LoRA-fine-tuned) | 11.40 | 10.71 | 11.47 |
| Whisper (Full-fine-tuned) | 10.99 | 10.07 | 13.11 |
| Proposed Speech-LLM | 11.74 | 10.69 | 15.26 |
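For reference, word error rate is the word-level edit distance between reference and hypothesis, divided by the reference length. A minimal sketch (the challenge's official scoring tooling may normalize text differently, so this is illustrative only):

```python
def wer(ref, hyp):
    """Word error rate (%) via Levenshtein distance over word tokens."""
    r, h = ref.split(), hyp.split()
    # dp[i][j] = edit distance between r[:i] and h[:j]
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i
    for j in range(len(h) + 1):
        dp[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return 100.0 * dp[len(r)][len(h)] / len(r)

# One substitution ("world" -> "word") and one deletion ("are") in 5 words.
print(wer("hello world how are you", "hello word how you"))  # → 40.0
```

CER is computed the same way over characters instead of words, which is why it is used for languages without whitespace word boundaries.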
## Dataset
The models were trained on the official ~1500-hour training set from the MLC-SLM Challenge, covering 11 languages across 15 categories (including several English accents).
## Citation

```bibtex
@article{mlcslm2025bridging,
  title={Bridging the gap: A comparative exploration of Speech-LLM and end-to-end architecture for multilingual conversational ASR},
  author={Anonymous Authors},
  journal={arXiv preprint arXiv:2601.01461},
  year={2025}
}
```
## Model tree for YuCeong-May/MLC-SLM

- Base model: Qwen/Qwen2.5-7B