Commit c2caf64 by CasanovaE · verified · 1 parent: a9f3251

Update README.md

Files changed (1)
  1. README.md +2 -4
README.md CHANGED
@@ -61,10 +61,8 @@ The model is available for use in the NeMo toolkit [4], and can be used as a pre
 
 
 ## Training, Testing, and Evaluation Datasets:
-The Low Frame-rate Speech Codec is trained on a total of 28.7k hrs of speech data from 105 languages.
 
-For training our model we have used [Common Voice](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0) and an English subset of MLS dataset. The Common Voice derived training set comprises 105 languages, totaling 2.7 million utterances, and 3.2k hours
-of audio from about one-hundred thousand speakers. The [MLS English](https://www.openslr.org/94/) training dataset consists of 6.2 million utterances and 25.5k hours of audio from 4329 speakers. =
+The Low Frame-rate Speech Codec was trained on 28.7k hours of speech data spanning 105 languages. The model was evaluated using multilingual audiobook-style data and high-quality English recordings. For further details, refer to [our paper](https://arxiv.org/abs/2409.12117).
 
 
 ### Training Datasets
@@ -106,7 +104,7 @@ The Low Frame-rate Speech Codec is trained on a total of 28.7k hrs of speech dat
 
 - Labeling Method: Automated
 
-- Properties: We randomly selected 200 samples from each of the eight languages in the 44kHz MLS dataset. For more details, please refer to [our paper](https://arxiv.org/abs/2409.12117).
+- Properties: We randomly selected 200 samples from each of the eight languages in the 44kHz MLS dataset.
 
 - [DAPS](https://zenodo.org/records/4660670)
 
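The evaluation subset described in the Properties line (200 randomly selected samples from each of the eight MLS languages, 1,600 in total) amounts to per-language random sampling. The following is a minimal sketch under stated assumptions, not the authors' selection script: the `(language, utterance_id)` manifest format, the helper name `sample_per_language`, the synthetic utterance ids, and the fixed seed are all hypothetical.

```python
import random
from collections import defaultdict


def sample_per_language(utterances, per_language=200, seed=0):
    """Draw a fixed-size random subset of utterances from each language.

    utterances: iterable of (language, utterance_id) pairs; this manifest
    format is a hypothetical assumption for illustration.
    Returns a dict mapping language -> sorted list of sampled utterance ids.
    """
    # Group utterance ids by language.
    by_language = defaultdict(list)
    for language, utt_id in utterances:
        by_language[language].append(utt_id)

    # Assumed fixed seed so the same subset is drawn on every run.
    rng = random.Random(seed)
    subset = {}
    for language, ids in sorted(by_language.items()):
        if len(ids) < per_language:
            raise ValueError(f"{language}: only {len(ids)} utterances, need {per_language}")
        # Sample without replacement within this language only.
        subset[language] = sorted(rng.sample(ids, per_language))
    return subset


# Hypothetical manifest: 1,000 synthetic utterance ids for each of the eight MLS languages.
languages = ["english", "german", "dutch", "french", "spanish", "italian", "portuguese", "polish"]
manifest = [(lang, f"{lang}_{i:05d}") for lang in languages for i in range(1000)]

subset = sample_per_language(manifest)
print(sum(len(ids) for ids in subset.values()))  # 1600
```

Sampling within each language, rather than from the pooled manifest, guarantees every language contributes the same number of samples regardless of how unevenly the languages are represented in the full dataset.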