---
base_model:
- Qwen/Qwen2.5-7B
- openai/whisper-large-v3
- utter-project/mHuBERT-147
datasets:
- Nexdata/INTERSPEECH_2025_MLC-SLM_Challenge_Dataset
- bsmu/MLC-SLM-Eval
language:
- en
- fr
- it
- ja
- ko
- vi
- th
- pt
- ru
- es
- de
license: apache-2.0
metrics:
- cer
- wer
pipeline_tag: automatic-speech-recognition
tags:
- speech-llm
- conversational-asr
---
# MLC-SLM: Bridging the Gap in Multilingual Conversational ASR
This repository contains the models and code presented in the paper [Bridging the gap: A comparative exploration of Speech-LLM and end-to-end architecture for multilingual conversational ASR](https://huggingface.co/papers/2601.01461).
The project was developed for the INTERSPEECH 2025 Challenge on Multilingual Conversational Speech Language Models (MLC-SLM).
- **Paper:** [arXiv:2601.01461](https://huggingface.co/papers/2601.01461)
- **Code:** [GitHub - MLC-SLM](https://github.com/1535176727/MLC-SLM)
## Description
The proposed **Speech-LLM** combines fine-tuned Whisper and mHuBERT encoders with a large language model (Qwen2.5-7B) to enrich speech representations for multilingual conversational ASR. It uses cross-attention-based fusion to exploit complementary information between the generative (Whisper) and discriminative (mHuBERT) speech features.
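To make the fusion idea concrete, here is a minimal single-head cross-attention sketch in NumPy. It is an illustration under stated assumptions, not the implementation from the paper: the choice of Whisper frames as queries and mHuBERT frames as keys/values, the residual connection, the random projection weights, the frame counts, and the attention width `D_ATTN` are all hypothetical. The feature widths (1280 and 768) are meant to resemble the two encoders' hidden sizes.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention_fuse(query_feats, context_feats, w_q, w_k, w_v):
    """Single-head cross-attention: each query frame attends over all context frames.

    query_feats:   (T_q, d_q) frames from one encoder
    context_feats: (T_c, d_c) frames from the other encoder
    Returns the fused features (T_q, d_q) and the attention weights (T_q, T_c).
    """
    q = query_feats @ w_q        # (T_q, d_attn)
    k = context_feats @ w_k      # (T_c, d_attn)
    v = context_feats @ w_v      # (T_c, d_q)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    # Residual connection (an assumption): keep the query stream and add context.
    fused = query_feats + weights @ v
    return fused, weights


# Hypothetical shapes; the two streams need not have the same number of frames.
T_WHISPER, T_MHUBERT = 50, 40
D_WHISPER, D_MHUBERT, D_ATTN = 1280, 768, 256

rng = np.random.default_rng(0)
whisper_feats = rng.standard_normal((T_WHISPER, D_WHISPER))
mhubert_feats = rng.standard_normal((T_MHUBERT, D_MHUBERT))

# Random projections stand in for learned parameters.
w_q = rng.standard_normal((D_WHISPER, D_ATTN)) / np.sqrt(D_WHISPER)
w_k = rng.standard_normal((D_MHUBERT, D_ATTN)) / np.sqrt(D_MHUBERT)
w_v = rng.standard_normal((D_MHUBERT, D_WHISPER)) / np.sqrt(D_MHUBERT)

fused, weights = cross_attention_fuse(whisper_feats, mhubert_feats, w_q, w_k, w_v)
print(fused.shape, weights.shape)
```

The fused output keeps the time resolution and width of the query stream, so it could be passed to a projector in front of the LLM without changing the downstream interface. Swapping the roles of the two encoders, or attending in both directions, are equally plausible variants.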
## Results
Error rates (CER/WER, lower is better) on the MLC-SLM Challenge datasets. Bold numbers mark the best result in each column:
| **System**                | **Dev**   | **Eval**  | **CV-Test** |
|---------------------------|-----------|-----------|-------------|
| Whisper (LoRA-fine-tuned) | 11.40     | 10.71     | **11.47**   |
| Whisper (Full-fine-tuned) | **10.99** | **10.07** | 13.11       |
| **Proposed Speech-LLM**   | 11.74     | 10.69     | 15.26       |
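For readers unfamiliar with the metrics, the sketch below shows how WER and CER are commonly computed: the Levenshtein edit distance between reference and hypothesis, divided by the reference length, over words for WER and over characters for CER. CER is typically preferred for languages written without whitespace word boundaries. This is a generic illustration, and it assumes no text normalization; the challenge's official scoring may normalize text differently, so numbers from this snippet are not guaranteed to match the table.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_units, hyp_units):
    """Edit distance as a percentage of the reference length."""
    if not ref_units:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref_units, hyp_units) / len(ref_units)


def wer(reference, hypothesis):
    """Word error rate: units are whitespace-separated tokens."""
    return error_rate(reference.split(), hypothesis.split())


def cer(reference, hypothesis):
    """Character error rate: units are characters, with spaces ignored."""
    return error_rate(list(reference.replace(" ", "")),
                      list(hypothesis.replace(" ", "")))


print(f"WER: {wer('the cat sat on the mat', 'the cat sit on mat'):.2f}")
print(f"CER: {cer('こんにちは', 'こんばんは'):.2f}")
```

Because the denominator is the reference length, both rates can exceed 100 when the hypothesis contains many insertions.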
## Dataset
The models were trained on the official ~1500h training set from the MLC-SLM Challenge, covering 11 languages and 15 categories (including various English accents).
## Citation
```bibtex
@article{mlcslm2025bridging,
title={Bridging the gap: A comparative exploration of Speech-LLM and end-to-end architecture for multilingual conversational ASR},
author={Anonymous Authors},
journal={arXiv preprint arXiv:2601.01461},
year={2025}
}
```