# FastSpeech 2, multi-speaker, English language

## Prepare

Everything is run from the main repo folder, i.e. `TensorFlowTTS/`.

0. Optional* [Download](http://www.openslr.org/60/) and prepare LibriTTS (a helper notebook for preparing it is at `examples/fastspeech2_libritts/libri_experiment/prepare_libri.ipynb`).
    - Dataset structure after finishing this step:
      ```
      |- TensorFlowTTS/
      |   |- LibriTTS/
      |   |-  |- train-clean-100/
      |   |-  |- SPEAKERS.txt
      |   |-  |- ...
      |   |- libritts/
      |   |-  |- 200/
      |   |-  |-  |- 200_124139_000001_000000.txt
      |   |-  |-  |- 200_124139_000001_000000.wav
      |   |-  |-  |- ...
      |   |-  |- 250/
      |   |-  |- ...
      |   |- tensorflow_tts/
      |   |- models/
      |   |- ...
      ```
1. Extract durations (use `examples/mfa_extraction` or a pretrained Tacotron 2).
2. Optional* Build the Docker image:
    - ```
      bash examples/fastspeech2_libritts/scripts/build.sh
      ```
3. Optional* Run the Docker container:
    - ```
      bash examples/fastspeech2_libritts/scripts/interactive.sh
      ```
4. Preprocessing:
    - ```
      tensorflow-tts-preprocess --rootdir ./libritts \
        --outdir ./dump_libritts \
        --config preprocess/libritts_preprocess.yaml \
        --dataset libritts
      ```
5. Normalization:
    - ```
      tensorflow-tts-normalize --rootdir ./dump_libritts \
        --outdir ./dump_libritts \
        --config preprocess/libritts_preprocess.yaml \
        --dataset libritts
      ```
6. Change the speaker mapper of `CharactorDurationF0EnergyMelDataset` in `fastspeech2_dataset` to match your dataset (if you use LibriTTS with `mfa_extraction`, you do not need to change anything).
7. Change `train_libri.sh` to match your dataset and run:
    - ```
      bash examples/fastspeech2_libritts/scripts/train_libri.sh
      ```
8. Optional* If you run into tensor size mismatches, check step 5 in the `examples/mfa_extraction` directory.
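
Before preprocessing, it can help to confirm that `./libritts` has the layout from step 0 and to see which speaker names a speaker mapper (step 6) would have to cover. The following is a minimal sketch and not part of TensorFlowTTS: `scan_libritts` and `build_speaker_map` are hypothetical helpers, and it assumes that each speaker folder holds `.txt`/`.wav` pairs sharing a file stem. The structure the dataset class actually expects for its mapper may differ, so treat the resulting dict as illustrative only.

```python
from pathlib import Path


def scan_libritts(root):
    """Return {speaker: (n_pairs, unpaired_stems)} for each speaker folder under root."""
    report = {}
    for speaker_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        txt = {p.stem for p in speaker_dir.glob("*.txt")}
        wav = {p.stem for p in speaker_dir.glob("*.wav")}
        # Stems found in only one of the two sets are missing their partner file.
        report[speaker_dir.name] = (len(txt & wav), sorted(txt ^ wav))
    return report


def build_speaker_map(speakers):
    """Assign consecutive integer ids to speaker names in sorted order.

    The id scheme is an assumption made for illustration.
    """
    return {name: idx for idx, name in enumerate(sorted(speakers))}


if __name__ == "__main__":
    root = Path("./libritts")
    if root.is_dir():
        report = scan_libritts(root)
        for speaker, (n_pairs, unpaired) in report.items():
            print(f"{speaker}: {n_pairs} pairs, unpaired stems: {unpaired}")
        print(build_speaker_map(report))
    else:
        print(f"{root} not found; run this from the main repo folder")
```

Any non-empty list of unpaired stems points at an utterance whose transcript or audio is missing and is worth fixing before step 4.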
## Comments

This version uses the popular `train.txt` format with '|' as the field separator, as used in other repos. Training files should look like this:

Wav Path | Text | Speaker Name

Wav Path2 | Text | Speaker Name
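
As an illustration of the '|' split described above, here is a minimal sketch of reading such a file. `Utterance`, `parse_train_line` and `read_train_file` are hypothetical names, not functions from this repo. The sketch assumes exactly three fields per line and that the text itself never contains a '|' character.

```python
from typing import List, NamedTuple


class Utterance(NamedTuple):
    wav_path: str
    text: str
    speaker: str


def parse_train_line(line: str) -> Utterance:
    """Split one 'wav path | text | speaker name' line into its three fields."""
    # Whitespace around each field is stripped, so both "a|b|c" and "a | b | c" work.
    fields = [field.strip() for field in line.split("|")]
    if len(fields) != 3:
        raise ValueError(
            f"expected 3 '|'-separated fields, got {len(fields)}: {line!r}"
        )
    return Utterance(*fields)


def read_train_file(path: str) -> List[Utterance]:
    """Parse every non-empty line of a train.txt-style file."""
    with open(path, encoding="utf-8") as f:
        return [parse_train_line(line) for line in f if line.strip()]


if __name__ == "__main__":
    # The path and sentence below are made up for the example.
    print(parse_train_line("200_124139_000001_000000.wav | Hello there. | 200"))
```

Raising on a wrong field count, rather than silently skipping the line, makes a malformed entry visible before training starts.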