# **Acoustic and language models**

The acoustic model is built on the [QuartzNet15x5](https://arxiv.org/pdf/1910.10261.pdf) architecture and trained with the [NeMo toolkit](https://github.com/NVIDIA/NeMo/tree/r1.0.0b4).
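
QuartzNet is trained with the CTC (Connectionist Temporal Classification) loss. For every audio frame it outputs a score for each vocabulary symbol plus a special *blank* symbol. The "Greedy decoder" row in the Evaluation table refers to the simplest way of turning these scores into text: take the best symbol per frame, merge consecutive repeats, then drop blanks. The following is a minimal, library-free sketch of that standard procedure, not NeMo's implementation. The vocabulary, the blank index and the toy frame scores are assumptions made only for this example.

```python
from typing import List, Optional, Sequence


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      vocab: Sequence[str],
                      blank: int) -> str:
    """Greedy CTC decoding: best symbol per frame, merge repeats, drop blanks.

    frame_scores[t][k] is the score (probability or log-probability) of
    symbol k at frame t; only the per-frame argmax matters here.
    blank is the index of the CTC blank symbol (an assumption of this
    sketch: pass whatever index your model actually uses).
    """
    # 1. Most likely symbol index for every frame.
    best_path = [max(range(len(frame)), key=frame.__getitem__)
                 for frame in frame_scores]

    # 2. Collapse consecutive duplicates and 3. remove blanks.
    chars: List[str] = []
    prev: Optional[int] = None
    for idx in best_path:
        if idx != prev and idx != blank:
            chars.append(vocab[idx])
        prev = idx
    return "".join(chars)


def one_hot_frames(indices: Sequence[int], size: int) -> List[List[float]]:
    """Hypothetical toy helper: turn a symbol-index path into peaked frame scores."""
    return [[0.9 if k == i else 0.1 / (size - 1) for k in range(size)]
            for i in indices]


# Toy vocabulary; index 3 ("_") plays the role of the blank here.
VOCAB = ["a", "d", " ", "_"]
BLANK = 3

frames = one_hot_frames([1, 1, 3, 0, 0, 3, 1, 1, 0], size=len(VOCAB))
print(ctc_greedy_decode(frames, VOCAB, BLANK))  # dada
```

Adjacent repeats of a symbol collapse into one character, while a blank between two identical symbols keeps them as separate letters; this is how a CTC model can emit doubled letters.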

Three n-gram language models were built with the [KenLM Language Model Toolkit](https://kheafield.com/code/kenlm):

* LM built on the Russian-language part of the [Common Crawl](https://commoncrawl.org) dataset
* LM built on the Golos train set
* LM built on the [Common Crawl](https://commoncrawl.org) and Golos datasets together, in equal proportions (50/50)
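
In the Evaluation table these LMs are combined with the acoustic model through beam search. A widely used way for CTC beam-search decoders to do this is *shallow fusion*: each candidate transcript is scored as acoustic log-probability + α · LM log-probability + β · word count, where α weights the LM and β offsets the LM's bias toward shorter outputs. This README does not state the exact decoder settings, so the sketch below only illustrates the formula. It applies the formula to an already-generated n-best list rather than inside the beam search itself, and the hypotheses, their scores, the toy LM and the α/β values are all made up.

```python
from typing import Callable, Dict, List, Tuple


def fused_score(acoustic_logprob: float, lm_logprob: float, num_words: int,
                alpha: float, beta: float) -> float:
    """Shallow fusion: acoustic score + alpha * LM score + beta * word count."""
    return acoustic_logprob + alpha * lm_logprob + beta * num_words


def rescore(nbest: List[Tuple[str, float]],
            lm_logprob: Callable[[str], float],
            alpha: float, beta: float) -> str:
    """Return the hypothesis with the highest fused score.

    nbest holds (transcript, acoustic log-probability) pairs.
    """
    def score(hyp: Tuple[str, float]) -> float:
        text, acoustic = hyp
        return fused_score(acoustic, lm_logprob(text), len(text.split()),
                           alpha, beta)

    return max(nbest, key=score)[0]


# Hypothetical n-best list: the acoustically best candidate is assumed wrong.
NBEST = [("мама мыла раму", -4.0), ("мама мыла рамку", -3.8)]

# Hypothetical stand-in for an n-gram LM: log-probabilities of whole sentences.
TOY_LM: Dict[str, float] = {"мама мыла раму": -2.0, "мама мыла рамку": -6.0}

print(rescore(NBEST, TOY_LM.__getitem__, alpha=0.0, beta=0.0))  # мама мыла рамку
print(rescore(NBEST, TOY_LM.__getitem__, alpha=0.5, beta=1.0))  # мама мыла раму
```

With α = 0 the choice is purely acoustic. Once the LM is weighted in, the more plausible sentence wins. In practice α and β are usually tuned on a development set.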

| Archive                  | Size   | Link                               |
|--------------------------|--------|------------------------------------|
| QuartzNet15x5_golos.nemo | 68 MB  | [sc.link/ZMv](https://sc.link/ZMv) |
| KenLMs.tar               | 4.8 GB | [sc.link/YL0](https://sc.link/YL0) |

Golos data and models are also available in DataHub ML Space, a hub of pre-trained models, datasets, and containers. You can train the model and deploy it on the high-performance SberCloud infrastructure in [ML Space](https://sbercloud.ru/ru/aicloud/mlspace), a full-cycle machine learning development platform for data science team collaboration, built on the Christofari supercomputer.

## **Evaluation**

Word error rate (WER, %) on different test sets:

| Decoder \ Test set                         | Crowd test | Farfield test | MCV<sup>1</sup> dev | MCV<sup>1</sup> test |
|--------------------------------------------|------------|---------------|---------------------|----------------------|
| Greedy decoder                             | 4.389      | 14.949        | 9.314               | 11.278               |
| Beam Search with Common Crawl LM           | 4.709      | 12.503        | 6.341               | 7.976                |
| Beam Search with Golos train set LM        | 3.548      | 12.384        | -                   | -                    |
| Beam Search with Common Crawl and Golos LM | 3.318      | 11.488        | 6.4                 | 8.06                 |

<sup>1</sup> MCV: [Mozilla Common Voice](https://commonvoice.mozilla.org), Mozilla's initiative to help teach machines how real people speak.
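
WER is the word-level edit distance (substitutions + deletions + insertions) between the reference transcript and the recognized text, divided by the number of reference words. Below is a minimal sketch, assuming corpus-level WER (total edits over total reference words across a test set) and plain whitespace tokenization. Any text normalization applied before computing the figures above is not described in this README.

```python
from typing import List, Sequence


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance over words (substitution, deletion, insertion cost 1)."""
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion of reference word r
                         cur[j - 1] + 1,          # insertion of hypothesis word h
                         prev[j - 1] + (r != h))  # substitution (or match, cost 0)
        prev = cur
    return prev[-1]


def corpus_wer(references: List[str], hypotheses: List[str]) -> float:
    """Corpus-level WER in percent: total word edits / total reference words."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    edits = 0
    ref_words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tokens, hyp_tokens = ref.split(), hyp.split()
        edits += word_edit_distance(ref_tokens, hyp_tokens)
        ref_words += len(ref_tokens)
    if ref_words == 0:
        raise ValueError("references contain no words")
    return 100.0 * edits / ref_words


# One substitution (раму -> рамку) and one deletion (на) over 5 reference words.
print(corpus_wer(["мама мыла раму на кухне"], ["мама мыла рамку кухне"]))  # 40.0
```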

## **Resources**

[[arxiv.org] Golos: Russian Dataset for Speech Research](https://arxiv.org/abs/2106.10161)

[[habr.com] Golos, the largest manually annotated Russian speech dataset, is now publicly available (in Russian)](https://habr.com/ru/company/sberdevices/blog/559496/)

[[habr.com] How to improve Russian speech recognition to 3% WER using open data (in Russian)](https://habr.com/ru/company/sberdevices/blog/569082/)