myaccountfor committed on
Commit
b07f058
·
1 Parent(s): 1c87807

Update evaluation results and README with new benchmarks


- Updated MER results: SEAME (0.2530), EMILIA (0.3046), CS-Dialogue (0.2541)
- Added 5 examples showing significant improvements over baseline
- Added link to original MERaLiON-2-3B model
- Updated eval_results with latest evaluation files

README.md CHANGED
@@ -26,21 +26,65 @@ A fine-tuned version of [MERaLiON/MERaLiON-2-3B](https://huggingface.co/MERaLiON
 
 | Benchmark | Baseline | This Model | Improvement |
 |-----------|----------|------------|-------------|
- | **SEAME** (Code-Switching) | 0.3372 | **0.2753** | **+18.4%** |
- | EMILIA | 0.5046 | | |
- | CS-Dialogue | 0.7082 | | |
+ | **SEAME** | 0.3372 | **0.2530** | **-25.0%** |
+ | **EMILIA** | 0.3201 | **0.3046** | **-4.8%** |
+ | **CS-Dialogue** | 0.2258 | 0.2541 | +12.5% |
 
 ### Benchmark Descriptions
 - **SEAME**: English-Mandarin code-switching conversational speech from Singapore/Malaysia (9,764 samples)
 - **EMILIA**: Synthetic code-switching evaluation set (1,000 samples)
 - **CS-Dialogue**: Code-switching dialogue evaluation set (359 samples)
 
+ ## Examples
+
+ Below are examples showing improvements from the baseline to the DPO-trained model:
+
+ ### Example 1: Hallucination Fixed (Valentine's Day)
+ | | Transcription |
+ |---|---|
+ | **Ground Truth** | (呃) 我们 二月 多 有 valentine's day |
+ | **Baseline** | ah moment ah month ah month ah month ah month... *(repeated 250+ times)* |
+ | **This Model** | (呃) 我们二月多有 valentine's day |
+ | **MER** | 56.89 → **0.00** |
+
+ ### Example 2: Repetition Fixed (Code-Switch Preserved)
+ | | Transcription |
+ |---|---|
+ | **Ground Truth** | it's to give yourself 一个 台阶 right |
+ | **Baseline** | You have to give yourself a a a a a a a a... *(repeated 500+ times)* |
+ | **This Model** | is to give yourself 一个台阶 right |
+ | **MER** | 56.56 → **0.11** |
+
+ ### Example 3: Code-Switching Preserved
+ | | Transcription |
+ |---|---|
+ | **Ground Truth** | inside circle yah like 进出 进出 会 生病 的 leh |
+ | **Baseline** | And you say so could yeah like you can you can you can... *(repeated 500+ times)* |
+ | **This Model** | inside the circle ya like 进出进出会生病的 (leh) |
+ | **MER** | 39.31 → **0.15** |
+
+ ### Example 4: Perfect Recovery from Repetition
+ | | Transcription |
+ |---|---|
+ | **Ground Truth** | 有 有 有 有 有 有 control 有 有 有 他们 要 control |
+ | **Baseline** | 有有有有有有有有有有有有有有有有有有有有... *(repeated 500+ times, no "control")* |
+ | **This Model** | 有有有有有有 control 有有有他们要 control |
+ | **MER** | 35.93 → **0.00** |
+
+ ### Example 5: Technical Terms Preserved
+ | | Transcription |
+ |---|---|
+ | **Ground Truth** | 大部分 [哪] 大部分 是 triple e. 跟 computer en~ com~ computer [lah] |
+ | **Baseline** | 大部分呐大部分是跟跟跟跟跟跟跟... *(repeated 500+ times, lost "triple e" and "computer")* |
+ | **This Model** | 大部分 (啊) 大部分是 triple e 跟 computer (啊) computer (啦) |
+ | **MER** | 31.56 → **0.25** |
+
 ## Training Configuration
 
 ### Model Architecture
 | Parameter | Value |
 |-----------|-------|
- | Base Model | MERaLiON/MERaLiON-2-3B |
+ | Base Model | [MERaLiON/MERaLiON-2-3B](https://huggingface.co/MERaLiON/MERaLiON-2-3B) |
 | Training Type | Full Fine-Tuning |
 | Total Parameters | ~3.47B |
 | Trainable Parameters | ~3.47B |
@@ -133,4 +177,4 @@ print(transcription)
 
 ## License
 
- This model inherits the license of the base MERaLiON-2-3B model.
+ This model inherits the license of the base [MERaLiON-2-3B](https://huggingface.co/MERaLiON/MERaLiON-2-3B) model.
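The "Improvement" column in the updated benchmark table reads as the signed relative change in MER against the baseline, with negative values meaning the error rate dropped. A quick Python sanity check of the reported figures (the helper name `relative_change` is ours, not from the repository):

```python
def relative_change(baseline: float, trained: float) -> float:
    """Signed % change of MER vs. the baseline (negative = improvement)."""
    return (trained - baseline) / baseline * 100

results = {  # (baseline MER, this-model MER) from the benchmark table
    "SEAME": (0.3372, 0.2530),
    "EMILIA": (0.3201, 0.3046),
    "CS-Dialogue": (0.2258, 0.2541),
}

for name, (base, trained) in results.items():
    print(f"{name}: {relative_change(base, trained):+.1f}%")
```

Running this reproduces the table's -25.0%, -4.8%, and +12.5% for SEAME, EMILIA, and CS-Dialogue respectively, confirming the column's sign convention.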
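The per-example MER figures are Mixed Error Rate: token-level edit distance divided by reference length, with Mandarin tokenized per character and English per word. The scorer below is a minimal sketch under that assumption; the repository's actual tokenizer and text normalization may differ:

```python
import re

def mixed_tokens(text: str) -> list[str]:
    """Split into Mandarin characters and English words (assumed tokenization)."""
    return re.findall(r"[\u4e00-\u9fff]|[a-zA-Z']+", text.lower())

def mer(ref: str, hyp: str) -> float:
    """Mixed Error Rate: Levenshtein distance over mixed tokens / reference length."""
    r, h = mixed_tokens(ref), mixed_tokens(hyp)
    # One-row dynamic-programming Levenshtein distance.
    d = list(range(len(h) + 1))
    for i in range(1, len(r) + 1):
        prev, d[0] = d[0], i
        for j in range(1, len(h) + 1):
            cur = min(d[j] + 1,            # deletion
                      d[j - 1] + 1,        # insertion
                      prev + (r[i - 1] != h[j - 1]))  # substitution
            prev, d[j] = d[j], cur
    return d[len(h)] / len(r)
```

With this tokenization, Example 2 is one substitution ("it's" → "is") over nine reference tokens, about 0.11, consistent with the table. Note that MER can exceed 1 when insertions dominate, which is how the baseline's runaway repetitions reach scores like 56.89.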
eval_results/baseline_cs_dialogue.json CHANGED
The diff for this file is too large to render. See raw diff
 
eval_results/baseline_emilia.json CHANGED
The diff for this file is too large to render. See raw diff
 
eval_results/trained_cs_dialogue.json CHANGED
The diff for this file is too large to render. See raw diff
 
eval_results/trained_emilia.json CHANGED
The diff for this file is too large to render. See raw diff
 
eval_results/trained_seame.json CHANGED
The diff for this file is too large to render. See raw diff