BUT-FIT/DiCoW_v3_MLC

Automatic Speech Recognition

speaker-diarization

meeting-transcription


Lakoc committed on Jun 20, 2025

Commit 989b69f · verified · 1 Parent(s): bd2e2ab

Update README.md

Files changed (1)

README.md +30 -0


@@ -68,6 +68,36 @@ For detailed inference and full pipelines, refer to:
 👉 [DiCoW GitHub inference repo](https://github.com/BUTSpeechFIT/DiCoW)
+### tcpWER/CER (%) on the MLC-SLM development set
+| Language       | Baseline (GT) | DiCoW (GT) | FT (GT) | Baseline (Real diar) | DiCoW (Real diar) | FT (Real diar) |
+|----------------|---------------|------------|---------|-----------------------|-------------------|----------------|
+| American En.   | 14.1          | 20.6       | 11.1    | 53.7                  | 36.5              | 22.5           |
+| Australian En. | 11.7          | 19.4       | 7.4     | 52.6                  | 23.6              | 13.0           |
+| British En.    | 10.1          | 16.7       | 7.7     | 71.9                  | 26.1              | 17.6           |
+| Filipino En.   | 9.2           | 17.7       | 7.5     | 50.4                  | 25.5              | 15.2           |
+| Indian En.     | 14.0          | 14.3       | 13.3    | 70.7                  | 14.9              | 14.0           |
+| French         | 28.1          | 27.7       | 16.1    | 96.0                  | 37.8              | 27.5           |
+| German         | 20.7          | 21.2       | 23.9    | 86.7                  | 30.1              | 27.3           |
+| Italian        | 17.9          | 16.2       | 12.3    | 83.3                  | 19.8              | 16.4           |
+| Japanese (\*)   | 21.6          | 19.2       | 13.7    | 71.3                  | 25.8              | 23.3           |
+| Korean (\*)     | 13.8          | 12.8       | 8.5     | 59.6                  | 24.5              | 22.8           |
+| Portuguese     | 21.2          | 24.5       | 19.5    | 118.8                 | 33.1              | 29.7           |
+| Russian        | 17.7          | 17.6       | 11.6    | 69.2                  | 22.5              | 16.7           |
+| Spanish        | 12.3          | 11.6       | 8.7     | 75.6                  | 18.2              | 16.3           |
+| Thai (\*)       | 14.5          | 31.9       | 14.2    | 83.6                  | 34.4              | 20.1           |
+| Vietnamese     | 27.2          | 30.0       | 15.3    | 82.8                  | 33.8              | 24.7           |
+| **Overall**    | **16.8**      | **22.0**   | **12.9**| **76.1**              | **28.4**          | **20.8**       |
+> *Results marked with an asterisk (\*) are reported using tcpCER, following the official evaluation protocol.*
+**Notes:**
+- GT = Ground-Truth Segmentation
+- Real diar = Real Diarization
+- Baseline uses Whisper large-v3 with chunked inference and fine-tuned Pyannote diarization.
+- DiCoW uses fine-tuned DiariZen diarization.
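
To read the table more easily, the relative error-rate reduction over the baseline can be computed from the **Overall** row. The snippet below is a minimal sketch and not part of the commit: the helper `relative_reduction` and the `overall` dictionary are hypothetical names introduced for illustration, and only the numbers are taken from the table.

```python
def relative_reduction(baseline: float, system: float) -> float:
    """Relative error-rate reduction in percent (positive: `system` is better)."""
    return 100.0 * (baseline - system) / baseline


# "Overall" row of the table (tcpWER/CER, %), copied as-is.
overall = {
    "GT": {"Baseline": 16.8, "DiCoW": 22.0, "FT": 12.9},
    "Real diar": {"Baseline": 76.1, "DiCoW": 28.4, "FT": 20.8},
}

# Compare every non-baseline column against the baseline of the same condition.
reductions = {
    (condition, name): relative_reduction(scores["Baseline"], score)
    for condition, scores in overall.items()
    for name, score in scores.items()
    if name != "Baseline"
}

for (condition, name), value in reductions.items():
    print(f"{name} vs. Baseline ({condition}): {value:+.1f}% relative")
```

A negative value means the system scores worse than the baseline in that condition, as happens for DiCoW with ground-truth segmentation.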
 ## Citation