myaccountfor committed on
Commit
b07f058
·
1 Parent(s): 1c87807

Update evaluation results and README with new benchmarks


- Updated MER results: SEAME (0.2530), EMILIA (0.3046), CS-Dialogue (0.2541)
- Added 5 examples showing significant improvements over baseline
- Added link to original MERaLiON-2-3B model
- Updated eval_results with latest evaluation files

README.md CHANGED
@@ -26,21 +26,65 @@ A fine-tuned version of [MERaLiON/MERaLiON-2-3B](https://huggingface.co/MERaLiON
 
 | Benchmark | Baseline | This Model | Improvement |
 |-----------|----------|------------|-------------|
- | **SEAME** (Code-Switching) | 0.3372 | **0.2753** | **+18.4%** |
- | EMILIA | 0.5046 | | |
- | CS-Dialogue | 0.7082 | | |
+ | **SEAME** | 0.3372 | **0.2530** | **-25.0%** |
+ | **EMILIA** | 0.3201 | **0.3046** | **-4.8%** |
+ | **CS-Dialogue** | 0.2258 | 0.2541 | +12.5% |
 
 ### Benchmark Descriptions
 - **SEAME**: English-Mandarin code-switching conversational speech from Singapore/Malaysia (9,764 samples)
 - **EMILIA**: Synthetic code-switching evaluation set (1,000 samples)
 - **CS-Dialogue**: Code-switching dialogue evaluation set (359 samples)
 
+ ## Examples
+
+ Below are examples showing improvements from the baseline to the DPO-trained model:
+
+ ### Example 1: Hallucination Fixed (Valentine's Day)
+ | | Transcription |
+ |---|---|
+ | **Ground Truth** | (呃) 我们 二月 多 有 valentine's day |
+ | **Baseline** | ah moment ah month ah month ah month ah month... *(repeated 250+ times)* |
+ | **This Model** | (呃) 我们二月多有 valentine's day |
+ | **MER** | 56.89 → **0.00** |
+
+ ### Example 2: Repetition Fixed (Code-Switch Preserved)
+ | | Transcription |
+ |---|---|
+ | **Ground Truth** | it's to give yourself 一个 台阶 right |
+ | **Baseline** | You have to give yourself a a a a a a a a... *(repeated 500+ times)* |
+ | **This Model** | is to give yourself 一个台阶 right |
+ | **MER** | 56.56 → **0.11** |
+
+ ### Example 3: Code-Switching Preserved
+ | | Transcription |
+ |---|---|
+ | **Ground Truth** | inside circle yah like 进出 进出 会 生病 的 leh |
+ | **Baseline** | And you say so could yeah like you can you can you can... *(repeated 500+ times)* |
+ | **This Model** | inside the circle ya like 进出进出会生病的 (leh) |
+ | **MER** | 39.31 → **0.15** |
+
+ ### Example 4: Perfect Recovery from Repetition
+ | | Transcription |
+ |---|---|
+ | **Ground Truth** | 有 有 有 有 有 有 control 有 有 有 他们 要 control |
+ | **Baseline** | 有有有有有有有有有有有有有有有有有有有有... *(repeated 500+ times, no "control")* |
+ | **This Model** | 有有有有有有 control 有有有他们要 control |
+ | **MER** | 35.93 → **0.00** |
+
+ ### Example 5: Technical Terms Preserved
+ | | Transcription |
+ |---|---|
+ | **Ground Truth** | 大部分 [哪] 大部分 是 triple e. 跟 computer en~ com~ computer [lah] |
+ | **Baseline** | 大部分呐大部分是跟跟跟跟跟跟跟... *(repeated 500+ times, lost "triple e" and "computer")* |
+ | **This Model** | 大部分 (啊) 大部分是 triple e 跟 computer (啊) computer (啦) |
+ | **MER** | 31.56 → **0.25** |
+
 ## Training Configuration
 
 ### Model Architecture
 | Parameter | Value |
 |-----------|-------|
- | Base Model | MERaLiON/MERaLiON-2-3B |
+ | Base Model | [MERaLiON/MERaLiON-2-3B](https://huggingface.co/MERaLiON/MERaLiON-2-3B) |
 | Training Type | Full Fine-Tuning |
 | Total Parameters | ~3.47B |
 | Trainable Parameters | ~3.47B |
@@ -133,4 +177,4 @@ print(transcription)
 
 ## License
 
- This model inherits the license of the base MERaLiON-2-3B model.
+ This model inherits the license of the base [MERaLiON-2-3B](https://huggingface.co/MERaLiON/MERaLiON-2-3B) model.
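The "Improvement" column in the updated benchmark table reads as the signed relative change in MER against the baseline, with negative values meaning the error rate dropped. A quick Python sanity check of the reported figures (the helper name `relative_change` is ours, not from the repository):

```python
def relative_change(baseline: float, trained: float) -> float:
    """Signed % change of MER vs. the baseline (negative = improvement)."""
    return (trained - baseline) / baseline * 100

results = {  # (baseline MER, this-model MER) from the benchmark table
    "SEAME": (0.3372, 0.2530),
    "EMILIA": (0.3201, 0.3046),
    "CS-Dialogue": (0.2258, 0.2541),
}

for name, (base, trained) in results.items():
    print(f"{name}: {relative_change(base, trained):+.1f}%")
```

Running this reproduces the table's -25.0%, -4.8%, and +12.5% for SEAME, EMILIA, and CS-Dialogue respectively, confirming the column's sign convention.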
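The per-example MER figures are Mixed Error Rate: token-level edit distance divided by reference length, with Mandarin tokenized per character and English per word. The scorer below is a minimal sketch under that assumption; the repository's actual tokenizer and text normalization may differ:

```python
import re

def mixed_tokens(text: str) -> list[str]:
    """Split into Mandarin characters and English words (assumed tokenization)."""
    return re.findall(r"[\u4e00-\u9fff]|[a-zA-Z']+", text.lower())

def mer(ref: str, hyp: str) -> float:
    """Mixed Error Rate: Levenshtein distance over mixed tokens / reference length."""
    r, h = mixed_tokens(ref), mixed_tokens(hyp)
    # One-row dynamic-programming Levenshtein distance.
    d = list(range(len(h) + 1))
    for i in range(1, len(r) + 1):
        prev, d[0] = d[0], i
        for j in range(1, len(h) + 1):
            cur = min(d[j] + 1,            # deletion
                      d[j - 1] + 1,        # insertion
                      prev + (r[i - 1] != h[j - 1]))  # substitution
            prev, d[j] = d[j], cur
    return d[len(h)] / len(r)
```

With this tokenization, Example 2 is one substitution ("it's" → "is") over nine reference tokens, about 0.11, consistent with the table. Note that MER can exceed 1 when insertions dominate, which is how the baseline's runaway repetitions reach scores like 56.89.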
eval_results/baseline_cs_dialogue.json CHANGED
The diff for this file is too large to render. See raw diff
 
eval_results/baseline_emilia.json CHANGED
The diff for this file is too large to render. See raw diff
 
eval_results/trained_cs_dialogue.json CHANGED
The diff for this file is too large to render. See raw diff
 
eval_results/trained_emilia.json CHANGED
The diff for this file is too large to render. See raw diff
 
eval_results/trained_seame.json CHANGED
The diff for this file is too large to render. See raw diff