YuCeong-May/MLC-SLM

@@ -1,5 +1,8 @@
 ---
-license: apache-2.0
 datasets:
 - Nexdata/INTERSPEECH_2025_MLC-SLM_Challenge_Dataset
 - bsmu/MLC-SLM-Eval
@@ -15,21 +18,50 @@ language:
 - ru
 - es
 - de
 metrics:
 - cer
 - wer
-base_model:
-- Qwen/Qwen2.5-7B
-- openai/whisper-large-v3
-- utter-project/mHuBERT-147
 pipeline_tag: automatic-speech-recognition
 ---
-The fine-tuned Whisper models and Speech-LLM we proposed.
 | **System**                | **Dev** | **Eval** | **CV-Test** |
 |----------------------------|---------|----------|-------------|
 | Whisper (LoRA-fine-tuned)  | 11.40   | 10.71    | **11.47**   |
 | Whisper (Full-fine-tuned)  | **10.99**   | **10.07**    | 13.11       |
-| **Proposed Speech-LLM**    | 11.74   | 10.69| 15.26       |

 ---
+base_model:
+- Qwen/Qwen2.5-7B
+- openai/whisper-large-v3
+- utter-project/mHuBERT-147
 datasets:
 - Nexdata/INTERSPEECH_2025_MLC-SLM_Challenge_Dataset
 - bsmu/MLC-SLM-Eval
 - ru
 - es
 - de
+license: apache-2.0
 metrics:
 - cer
 - wer
 pipeline_tag: automatic-speech-recognition
+tags:
+- speech-llm
+- conversational-asr
 ---
+# MLC-SLM: Bridging the Gap in Multilingual Conversational ASR
+This repository contains the models and code presented in the paper [Bridging the gap: A comparative exploration of Speech-LLM and end-to-end architecture for multilingual conversational ASR](https://huggingface.co/papers/2601.01461).
+The project was developed for the INTERSPEECH 2025 Challenge on Multilingual Conversational Speech Language Models (MLC-SLM).
+- **Paper:** [arXiv:2601.01461](https://huggingface.co/papers/2601.01461)
+- **Code:** [GitHub - MLC-SLM](https://github.com/1535176727/MLC-SLM)
+## Description
+The proposed **Speech-LLM** integrates fine-tuned Whisper and mHuBERT encoders with a large language model (Qwen2.5-7B) to enrich the speech representations used for multilingual conversational ASR. Cross-attention-based fusion combines the complementary information carried by generative (Whisper) and discriminative (mHuBERT) speech features.
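As a rough illustration of the fusion step, the sketch below implements single-head scaled dot-product cross-attention in plain Python. It is not the authors' implementation (see the GitHub repository for that). Using Whisper frames as queries and mHuBERT frames as keys and values, adding a residual connection, and the helper names `cross_attention_fuse`, `matmul` and `softmax` are all assumptions made for illustration; how the fused sequence is then passed to Qwen2.5-7B is outside this sketch.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_fuse(whisper_feats: Matrix, hubert_feats: Matrix,
                         w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Hypothetical fusion: Whisper frames attend over mHuBERT frames.

    whisper_feats: (T_w x d) source of the queries.
    hubert_feats:  (T_h x d) source of the keys and values.
    w_v must map d -> d so the residual addition is well defined.
    Returns a (T_w x d) matrix: Whisper features plus attended mHuBERT context.
    """
    q = matmul(whisper_feats, w_q)        # (T_w x d_k)
    k = matmul(hubert_feats, w_k)         # (T_h x d_k)
    v = matmul(hubert_feats, w_v)         # (T_h x d)
    scale = 1.0 / math.sqrt(len(q[0]))
    scores = matmul(q, transpose(k))      # (T_w x T_h)
    weights = [softmax([s * scale for s in row]) for row in scores]
    context = matmul(weights, v)          # (T_w x d)
    # Residual connection keeps the original Whisper representation.
    return [[x + c for x, c in zip(xr, cr)] for xr, cr in zip(whisper_feats, context)]


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    whisper = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 toy Whisper frames, d = 2
    hubert = [[0.5, 0.5], [2.0, 0.0]]               # 2 toy mHuBERT frames
    fused = cross_attention_fuse(whisper, hubert, identity, identity, identity)
    print(len(fused), len(fused[0]))  # 3 2
```

Because attention weights are computed per query frame, the two encoders need not emit the same number of frames in this sketch: the output always has one row per Whisper frame.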
+## Results
+Error rates (CER/WER, %; lower is better) on the MLC-SLM Challenge datasets. Bold numbers mark the best result in each column:
 | **System**                | **Dev** | **Eval** | **CV-Test** |
 |----------------------------|---------|----------|-------------|
 | Whisper (LoRA-fine-tuned)  | 11.40   | 10.71    | **11.47**   |
 | Whisper (Full-fine-tuned)  | **10.99**   | **10.07**    | 13.11       |
+| **Proposed Speech-LLM**    | 11.74   | 10.69    | 15.26       |
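For readers less familiar with the metrics, the sketch below shows how WER and CER are conventionally computed from the Levenshtein edit distance between a reference and a hypothesis. It is a generic illustration, not the challenge's official scoring script: whitespace tokenization, dropping spaces for CER, and the absence of any text normalization are simplifying assumptions, and which languages are scored by CER rather than WER is not encoded here.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of substitutions, deletions and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (or match at cost 0)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent; assumes a non-empty reference."""
    ref_words = reference.split()
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent, ignoring spaces (one common convention)."""
    ref_chars = list(reference.replace(" ", ""))
    hyp_chars = list(hypothesis.replace(" ", ""))
    return 100.0 * edit_distance(ref_chars, hyp_chars) / len(ref_chars)


if __name__ == "__main__":
    print(round(wer("the cat sat on the mat", "the cat sat on mat"), 2))  # 16.67
    print(cer("今日は晴れ", "今日は雨"))  # 40.0
```

Both rates are normalized by the reference length, so a hypothesis with many insertions can push them above 100%.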
+## Dataset
+The models were trained on the official ~1,500-hour training set of the MLC-SLM Challenge, which covers 11 languages in 15 categories (English is further divided by accent).
+## Citation
+```bibtex
+@article{mlcslm2025bridging,
+  title={Bridging the gap: A comparative exploration of Speech-LLM and end-to-end architecture for multilingual conversational ASR},
+  author={Anonymous Authors},
+  journal={arXiv preprint arXiv:2601.01461},
+  year={2026}
+}
+```