Update README.md
README.md
CHANGED
@@ -61,10 +61,8 @@ The model is available for use in the NeMo toolkit [4], and can be used as a pre
 
 
 ## Training, Testing, and Evaluation Datasets:
-The Low Frame-rate Speech Codec is trained on a total of 28.7k hrs of speech data from 105 languages.
 
-
-of audio from about one-hundred thousand speakers. The [MLS English](https://www.openslr.org/94/) training dataset consists of 6.2 million utterances and 25.5k hours of audio from 4329 speakers. =
+The Low Frame-rate Speech Codec was trained on 28.7k hours of speech data spanning 105 languages. The model was evaluated using multilingual audiobook-style data and high-quality English recordings. For further details, refer to [our paper](https://arxiv.org/abs/2409.12117).
 
 
 ### Training Datasets
@@ -106,7 +104,7 @@ The Low Frame-rate Speech Codec is trained on a total of 28.7k hrs of speech dat
 
 - Labeling Method: Automated
 
-- Properties: We randomly selected 200 samples from each of the eight languages in the 44kHz MLS dataset.
+- Properties: We randomly selected 200 samples from each of the eight languages in the 44kHz MLS dataset.
 
 - [DAPS](https://zenodo.org/records/4660670)
 
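The "Properties" line in the second hunk describes drawing 200 random samples from each of eight languages. A minimal sketch of that kind of per-language sampling is shown here, assuming a hypothetical in-memory manifest of dicts with a `language` key. The function name, manifest layout, and seed are illustrative assumptions and are not taken from the README or the NeMo toolkit.

```python
import random
from collections import defaultdict


def sample_per_language(utterances, per_language=200, seed=0):
    """Randomly pick up to `per_language` utterances for each language.

    `utterances` is a hypothetical manifest: an iterable of dicts that
    each carry a "language" key. This layout is an assumption made for
    illustration only.
    """
    rng = random.Random(seed)  # fixed seed so the selection is repeatable

    # Group the manifest entries by language.
    by_language = defaultdict(list)
    for utt in utterances:
        by_language[utt["language"]].append(utt)

    # Sample without replacement inside each language group.
    selected = {}
    for language, items in sorted(by_language.items()):
        k = min(per_language, len(items))
        selected[language] = rng.sample(items, k)
    return selected


# Hypothetical manifest: 8 placeholder languages with 300 entries each.
manifest = [
    {"language": f"lang{i}", "path": f"lang{i}/utt{j}.wav"}
    for i in range(8)
    for j in range(300)
]
subset = sample_per_language(manifest)
```

Sampling within each language group, rather than over the pooled manifest, keeps the test set balanced across languages regardless of how much audio each language contributes.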