Automatic Speech Recognition
speech-llm
conversational-asr

Improve model card and add paper/GitHub links

#1 by nielsr (HF Staff) - opened
Files changed (1)
  1. README.md +39 -7
README.md CHANGED
@@ -1,5 +1,8 @@
  ---
- license: apache-2.0
+ base_model:
+ - Qwen/Qwen2.5-7B
+ - openai/whisper-large-v3
+ - utter-project/mHuBERT-147
  datasets:
  - Nexdata/INTERSPEECH_2025_MLC-SLM_Challenge_Dataset
  - bsmu/MLC-SLM-Eval
@@ -15,21 +18,50 @@ language:
  - ru
  - es
  - de
+ license: apache-2.0
  metrics:
  - cer
  - wer
- base_model:
- - Qwen/Qwen2.5-7B
- - openai/whisper-large-v3
- - utter-project/mHuBERT-147
  pipeline_tag: automatic-speech-recognition
+ tags:
+ - speech-llm
+ - conversational-asr
  ---

+ # MLC-SLM: Bridging the Gap in Multilingual Conversational ASR

+ This repository contains the models and code presented in the paper [Bridging the gap: A comparative exploration of Speech-LLM and end-to-end architecture for multilingual conversational ASR](https://huggingface.co/papers/2601.01461).
+
+ The project was developed for the INTERSPEECH 2025 Challenge on Multilingual Conversational Speech Language Models (MLC-SLM).
+
+ - **Paper:** [arXiv:2601.01461](https://huggingface.co/papers/2601.01461)
+ - **Code:** [GitHub - MLC-SLM](https://github.com/1535176727/MLC-SLM)
+
+ ## Description
+
+ The proposed **Speech-LLM** is an enhanced framework that integrates fine-tuned Whisper and mHuBERT encoders with a Large Language Model (Qwen2.5-7B) to enrich speech representations for multilingual conversational ASR. It utilizes cross-attention-based fusion mechanisms to exploit complementary information between generative (Whisper) and discriminative (mHuBERT) speech features.
+
+ ## Results
+
+ Performance (CER/WER) on the MLC-SLM Challenge datasets:

- The fine-tuned Whisper models and Speech-LLM we proposed.
  | **System** | **Dev** | **Eval** | **CV-Test** |
  |----------------------------|---------|----------|-------------|
  | Whisper (LoRA-fine-tuned) | 11.40 | 10.71 | **11.47** |
  | Whisper (Full-fine-tuned) | **10.99** | **10.07** | 13.11 |
- | **Proposed Speech-LLM** | 11.74 | 10.69| 15.26 |
+ | **Proposed Speech-LLM** | 11.74 | 10.69| 15.26 |
+
+ ## Dataset
+
+ The models were trained on the official ~1500h training set from the MLC-SLM Challenge, covering 11 languages and 15 categories (including various English accents).
+
+ ## Citation
+
+ ```bibtex
+ @article{mlcslm2025bridging,
+ title={Bridging the gap: A comparative exploration of Speech-LLM and end-to-end architecture for multilingual conversational ASR},
+ author={Anonymous Authors},
+ journal={arXiv preprint arXiv:2601.01461},
+ year={2025}
+ }
+ ```
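
The new card's Description section outlines the architecture only at a high level. The sketch below is one plausible reading of it, not the code from the linked repository: a single cross-attention block in which Whisper features query mHuBERT features before being projected to the LLM's embedding width. The class and variable names (`WhisperMHubertFusion`, `d_model`, ...) and the default dimensions are assumptions.

```python
# Minimal sketch (assumption, not the released implementation): fuse Whisper and
# mHuBERT encoder outputs with cross-attention, then project the fused features
# into the LLM's embedding space, as described in the model card.
import torch
import torch.nn as nn

class WhisperMHubertFusion(nn.Module):  # hypothetical name
    def __init__(self, d_whisper=1280, d_hubert=768, d_model=1024, d_llm=3584, n_heads=8):
        super().__init__()
        # Assumed widths: Whisper large-v3 encoder 1280, mHuBERT-147 768, Qwen2.5-7B 3584.
        self.q_proj = nn.Linear(d_whisper, d_model)   # queries from Whisper features
        self.kv_proj = nn.Linear(d_hubert, d_model)   # keys/values from mHuBERT features
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.out_proj = nn.Linear(d_model, d_llm)     # map fused features to LLM width

    def forward(self, whisper_feats, hubert_feats):
        # whisper_feats: (B, T_w, d_whisper), hubert_feats: (B, T_h, d_hubert)
        q = self.q_proj(whisper_feats)
        kv = self.kv_proj(hubert_feats)
        fused, _ = self.cross_attn(q, kv, kv)   # Whisper stream attends to mHuBERT stream
        fused = fused + q                       # residual keeps the Whisper representation
        return self.out_proj(fused)             # speech embeddings fed to the LLM

# Toy shapes only; real sequence lengths depend on the encoders' frame rates.
fusion = WhisperMHubertFusion()
out = fusion(torch.randn(1, 150, 1280), torch.randn(1, 300, 768))
print(out.shape)  # torch.Size([1, 150, 3584])
```

A single fusion block with the Whisper stream as queries is only one way to realize "cross-attention-based fusion"; the paper and the linked GitHub repository are authoritative for the actual layout.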
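The card lists both CER and WER as metrics, and the Results table reports them per split. For a quick check on decoded transcripts, the generic `jiwer` package computes either metric from hypothesis/reference pairs; the challenge's official scoring script may normalize text differently, so treat this only as an illustration.

```python
# Illustration only: computing WER and CER with the generic jiwer package.
# The MLC-SLM challenge ships its own scoring pipeline, which may apply
# different punctuation/casing normalization than shown here.
import jiwer

references = ["we will meet again tomorrow", "ich gehe morgen ins büro"]
hypotheses = ["we will meet tomorrow", "ich gehe morgen ins buro"]

print("WER:", jiwer.wer(references, hypotheses))  # word error rate over all pairs
print("CER:", jiwer.cer(references, hypotheses))  # character error rate over all pairs
```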