---
base_model:
- Qwen/Qwen2.5-7B
- openai/whisper-large-v3
- utter-project/mHuBERT-147
datasets:
- Nexdata/INTERSPEECH_2025_MLC-SLM_Challenge_Dataset
- bsmu/MLC-SLM-Eval
language:
- en
- fr
- it
- ja
- ko
- vi
- th
- pt
- ru
- es
- de
license: apache-2.0
metrics:
- cer
- wer
pipeline_tag: automatic-speech-recognition
tags:
- speech-llm
- conversational-asr
---
|
# MLC-SLM: Bridging the Gap in Multilingual Conversational ASR |
|
This repository contains the models and code presented in the paper [Bridging the gap: A comparative exploration of Speech-LLM and end-to-end architecture for multilingual conversational ASR](https://huggingface.co/papers/2601.01461). |
|
The project was developed for the INTERSPEECH 2025 Challenge on Multilingual Conversational Speech Language Models (MLC-SLM). |
|
- **Paper:** [arXiv:2601.01461](https://huggingface.co/papers/2601.01461) |
- **Code:** [GitHub - MLC-SLM](https://github.com/1535176727/MLC-SLM) |
|
## Description |
|
The proposed **Speech-LLM** framework integrates fine-tuned Whisper and mHuBERT speech encoders with a large language model (Qwen2.5-7B) to enrich speech representations for multilingual conversational ASR. A cross-attention-based fusion mechanism exploits the complementary information carried by the generative (Whisper) and discriminative (mHuBERT) speech features.
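
The exact fusion code lives in the linked repository; the sketch below is only an illustrative single-head cross-attention in NumPy, where Whisper frames act as queries over mHuBERT frames as keys/values. All dimensions, the residual connection, and the random projection weights are assumptions for illustration, not the paper's configuration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(q_feats, kv_feats, wq, wk, wv):
    """Single-head cross-attention: q_feats attends over kv_feats.

    q_feats:  (T_q, D_q)   e.g. Whisper encoder output
    kv_feats: (T_kv, D_kv) e.g. mHuBERT encoder output
    wq/wk/wv: projections into a shared dimension d (illustrative)
    """
    q = q_feats @ wq                            # (T_q, d)
    k = kv_feats @ wk                           # (T_kv, d)
    v = kv_feats @ wv                           # (T_kv, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])     # (T_q, T_kv)
    attn = softmax(scores, axis=-1)             # each query row sums to 1
    return q + attn @ v                         # residual: queries enriched by kv stream

# Toy shapes: 100 Whisper frames (dim 1280), 200 mHuBERT frames (dim 768).
rng = np.random.default_rng(0)
whisper = rng.standard_normal((100, 1280))
mhubert = rng.standard_normal((200, 768))
d = 256
wq = rng.standard_normal((1280, d)) * 0.02
wk = rng.standard_normal((768, d)) * 0.02
wv = rng.standard_normal((768, d)) * 0.02
fused = cross_attention_fuse(whisper, mhubert, wq, wk, wv)
print(fused.shape)  # (100, 256)
```

Note that the two streams may have different frame rates, so the attention matrix is rectangular; the fused output keeps the query stream's time resolution.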
|
## Results |
|
Performance (CER/WER) on the MLC-SLM Challenge datasets: |
|
| **System**                | **Dev**   | **Eval**  | **CV-Test** |
|---------------------------|-----------|-----------|-------------|
| Whisper (LoRA fine-tuned) | 11.40     | 10.71     | **11.47**   |
| Whisper (full fine-tuned) | **10.99** | **10.07** | 13.11       |
| **Proposed Speech-LLM**   | 11.74     | 10.69     | 15.26       |
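
Both metrics are normalized edit distances: WER over word tokens, CER over characters. The official challenge scorer should be used for comparable numbers; the following is just a minimal pure-Python sketch of the computation.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (single-row DP)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            cur = min(dp[j] + 1,        # deletion
                      dp[j - 1] + 1,    # insertion
                      prev + (r != h))  # substitution (or match)
            prev, dp[j] = dp[j], cur
    return dp[-1]

def wer(ref: str, hyp: str) -> float:
    """Word error rate: edit distance over words / reference length."""
    ref_toks = ref.split()
    return edit_distance(ref_toks, hyp.split()) / len(ref_toks)

def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance over characters / reference length."""
    return edit_distance(ref, hyp) / len(ref)

print(round(wer("the cat sat on the mat", "the cat sit on mat"), 3))  # 0.333
```

CER is typically reported for languages without whitespace word boundaries (e.g. Japanese, Thai) and WER for the others; consult the paper for the exact per-language protocol.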
|
## Dataset |
|
The models were trained on the official ~1,500-hour training set of the MLC-SLM Challenge, which covers 11 languages across 15 categories (English is split into several regional accents).
|
## Citation |
|
```bibtex
@article{mlcslm2025bridging,
  title={Bridging the gap: A comparative exploration of Speech-LLM and end-to-end architecture for multilingual conversational ASR},
  author={Anonymous Authors},
  journal={arXiv preprint arXiv:2601.01461},
  year={2025}
}
```