Added references for the ParlaSpeech-HR dataset
#1, opened by nljubesi
README.md (changed)
@@ -97,7 +97,7 @@ Full config can be found inside the `.nemo` files.

 ### Datasets

-All the models in this collection are trained on ParlaSpeech-HR v1.0 Croatian dataset, which contains around 1665 hours of training data, 2.2 hours of development and 2.3 hours of test data after data cleaning.
+All the models in this collection are trained on ParlaSpeech-HR v1.0 Croatian dataset [4,5], which contains around 1665 hours of training data, 2.2 hours of development and 2.3 hours of test data after data cleaning.

 ## Performance

@@ -130,4 +130,8 @@ Check out [Riva live demo](https://developer.nvidia.com/riva#demos).

 - [2] [Google Sentencepiece Tokenizer](https://github.com/google/sentencepiece)

-- [3] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)
+- [3] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)
+
+- [4] [ParlaSpeech-HR dataset](http://hdl.handle.net/11356/1494)
+
+- [5] [ParlaSpeech-HR - a Freely Available ASR Dataset for Croatian Bootstrapped from the ParlaMint Corpus](https://aclanthology.org/2022.parlaclarin-1.16/)