For detailed inference and full pipelines, refer to:
👉 [DiCoW GitHub inference repo](https://github.com/BUTSpeechFIT/DiCoW)
### tcpWER/CER (%) on the MLC-SLM development set

| Language       | Baseline (GT) | DiCoW (GT) | FT (GT)  | Baseline (Real diar) | DiCoW (Real diar) | FT (Real diar) |
|----------------|---------------|------------|----------|----------------------|-------------------|----------------|
| American En.   | 14.1          | 20.6       | 11.1     | 53.7                 | 36.5              | 22.5           |
| Australian En. | 11.7          | 19.4       | 7.4      | 52.6                 | 23.6              | 13.0           |
| British En.    | 10.1          | 16.7       | 7.7      | 71.9                 | 26.1              | 17.6           |
| Filipino En.   | 9.2           | 17.7       | 7.5      | 50.4                 | 25.5              | 15.2           |
| Indian En.     | 14.0          | 14.3       | 13.3     | 70.7                 | 14.9              | 14.0           |
| French         | 28.1          | 27.7       | 16.1     | 96.0                 | 37.8              | 27.5           |
| German         | 20.7          | 21.2       | 23.9     | 86.7                 | 30.1              | 27.3           |
| Italian        | 17.9          | 16.2       | 12.3     | 83.3                 | 19.8              | 16.4           |
| Japanese (\*)  | 21.6          | 19.2       | 13.7     | 71.3                 | 25.8              | 23.3           |
| Korean (\*)    | 13.8          | 12.8       | 8.5      | 59.6                 | 24.5              | 22.8           |
| Portuguese     | 21.2          | 24.5       | 19.5     | 118.8                | 33.1              | 29.7           |
| Russian        | 17.7          | 17.6       | 11.6     | 69.2                 | 22.5              | 16.7           |
| Spanish        | 12.3          | 11.6       | 8.7      | 75.6                 | 18.2              | 16.3           |
| Thai (\*)      | 14.5          | 31.9       | 14.2     | 83.6                 | 34.4              | 20.1           |
| Vietnamese     | 27.2          | 30.0       | 15.3     | 82.8                 | 33.8              | 24.7           |
| **Overall**    | **16.8**      | **22.0**   | **12.9** | **76.1**             | **28.4**          | **20.8**       |
> *Languages marked with an asterisk (\*) are scored with tcpCER instead of tcpWER, following the official evaluation protocol.*

**Notes:**

- GT = ground-truth segmentation
- Real diar = speaker segmentation produced by an automatic diarization system
- Baseline uses Whisper large-v3 with chunked inference and fine-tuned Pyannote diarization.
- DiCoW uses fine-tuned DiariZen diarization.
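To read the table in relative terms, the relative error-rate reduction of DiCoW and FT over the Baseline can be computed from the Overall row. This is a minimal Python sketch: `relative_reduction` is a hypothetical helper, not part of the DiCoW code, and because the Overall row pools tcpWER and tcpCER languages, the result is only arithmetic on the reported numbers.

```python
# Overall tcpWER/CER (%) copied from the "Overall" row of the results table.
overall = {
    "GT": {"Baseline": 16.8, "DiCoW": 22.0, "FT": 12.9},
    "Real diar": {"Baseline": 76.1, "DiCoW": 28.4, "FT": 20.8},
}


def relative_reduction(reference: float, system: float) -> float:
    """Hypothetical helper: relative error-rate reduction of `system`
    w.r.t. `reference`, in percent (positive = fewer errors)."""
    return 100.0 * (reference - system) / reference


# Compare each system against the Baseline within the same segmentation setting.
for setting, scores in overall.items():
    baseline = scores["Baseline"]
    for name in ("DiCoW", "FT"):
        change = relative_reduction(baseline, scores[name])
        print(f"{setting:9s} {name:5s}: {change:+.1f}% vs. Baseline")
```

Positive values mean a lower error rate than the Baseline. With the reported figures this gives about -31.0% (DiCoW) and +23.2% (FT) with GT segmentation, and about +62.7% (DiCoW) and +72.7% (FT) with real diarization.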
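The asterisk note separates word-level from character-level scoring. As a simplified illustration only, the Python sketch below computes a plain WER and CER for a single utterance pair using a Levenshtein edit distance. It is not tcpWER/tcpCER: those metrics additionally constrain matches by word timing and choose the best mapping of hypothesis speakers to reference speakers, which this sketch omits. The helper names and the Japanese example sentence are hypothetical, and the text normalization (whitespace splitting for WER, dropping spaces for CER) is an assumption; official scoring tools may normalize differently.

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Minimum number of substitutions, deletions and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distance from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,              # delete r
                curr[j - 1] + 1,          # insert h
                prev[j - 1] + (r != h),   # substitute (free when tokens match)
            )
        prev = curr
    return prev[-1]


def error_rate(ref_tokens: list[str], hyp_tokens: list[str]) -> float:
    """Edit distance normalized by reference length, in percent."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def wer(ref: str, hyp: str) -> float:
    # Word level: tokens are whitespace-separated words (assumed normalization).
    return error_rate(ref.split(), hyp.split())


def cer(ref: str, hyp: str) -> float:
    # Character level: spaces are dropped before comparing (assumed normalization).
    return error_rate(list(ref.replace(" ", "")), list(hyp.replace(" ", "")))


ref, hyp = "今日は晴れです", "今日は雨です"
print(wer(ref, hyp))            # 100.0 (the whole sentence is one whitespace "word")
print(round(cer(ref, hyp), 1))  # 28.6 (2 edits over 7 characters)
```

For a language written without spaces between words, such as Japanese, whitespace splitting turns the whole sentence into a single "word", so one small error costs 100% WER, while the character-level rate reflects the two actual edits.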
## Citation