---
language: en
tags:
- text-to-speech
- StyleTTS2
- speech-synthesis
license: mit
pipeline_tag: text-to-speech
---

# StyleTTS2 Fine-tuned Model

This repository contains a fine-tuned StyleTTS2 model together with all components needed for inference.

## Model Details
- **Base Model:** StyleTTS2-LibriTTS
- **Architecture:** StyleTTS2
- **Task:** Text-to-Speech
- **Last Checkpoint:** epoch_2nd_00014.pth

## Training Details
- **Total Epochs:** 30
- **Completed Epochs:** 14
- **Total Iterations:** 1169
- **Batch Size:** 2
- **Max Length:** 120
- **Learning Rate:** 0.0001
- **Final Validation Loss:** 0.418901

## Model Components
The repository includes all necessary components for inference:

### Main Model Components:
- bert.pth
- bert_encoder.pth
- predictor.pth
- decoder.pth
- text_encoder.pth
- predictor_encoder.pth
- style_encoder.pth
- diffusion.pth
- text_aligner.pth
- pitch_extractor.pth
- mpd.pth
- msd.pth
- wd.pth

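As a rough illustration of how the checkpoint files listed above might be located before loading, the sketch below maps each component name to its expected path. It assumes the `.pth` files sit directly under `model_components/`, which the directory structure in this card suggests but does not confirm. `MAIN_COMPONENTS` and `component_paths` are hypothetical names, not part of StyleTTS2.

```python
from pathlib import Path

# Component names taken from the "Main Model Components" list in this card.
MAIN_COMPONENTS = [
    "bert", "bert_encoder", "predictor", "decoder", "text_encoder",
    "predictor_encoder", "style_encoder", "diffusion", "text_aligner",
    "pitch_extractor", "mpd", "msd", "wd",
]


def component_paths(repo_root, subdir="model_components"):
    """Map each component name to the path where its .pth file is expected.

    Assumption: every checkpoint is stored as <repo_root>/<subdir>/<name>.pth.
    """
    base = Path(repo_root) / subdir
    return {name: base / f"{name}.pth" for name in MAIN_COMPONENTS}
```

Each path could then be passed to `torch.load(path, map_location="cpu")`. This card does not state whether a file holds a bare state dict or a wrapped checkpoint, so inspect the loaded object before calling `load_state_dict`.
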
### Utility Components:
- ASR (Automatic Speech Recognition)
  - epoch_00080.pth
  - config.yml
  - models.py
  - layers.py
- JDC (F0 Prediction)
  - bst.t7
  - model.py
- PLBERT
  - step_1000000.t7
  - config.yml
  - util.py

### Additional Files:
- text_utils.py: Text preprocessing utilities
- models.py: Model architecture definitions
- utils.py: Utility functions
- config.yml: Model configuration
- config.json: Detailed configuration and training metrics

## Training Metrics
A visualization of the training metrics is available in training_metrics.png.

## Directory Structure

    ├── Utils/
    │   ├── ASR/
    │   ├── JDC/
    │   └── PLBERT/
    ├── model_components/
    └── configs/

## Usage Instructions
1. Load the model using the provided config.yml.
2. Ensure all utility components (ASR, JDC, PLBERT) are in their respective directories under Utils/.
3. Use text_utils.py for text preprocessing.
4. Follow the inference example in the StyleTTS2 documentation.

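Step 2 can be checked programmatically before attempting inference. The following is a minimal sketch, assuming each utility's files sit in `Utils/<name>/` with the file names given under Utility Components. `UTILITY_FILES` and `missing_utility_files` are hypothetical names, not part of StyleTTS2.

```python
from pathlib import Path

# Expected files per utility directory, copied from the "Utility Components" list.
UTILITY_FILES = {
    "ASR": ["epoch_00080.pth", "config.yml", "models.py", "layers.py"],
    "JDC": ["bst.t7", "model.py"],
    "PLBERT": ["step_1000000.t7", "config.yml", "util.py"],
}


def missing_utility_files(repo_root):
    """Return the expected utility files that are absent under <repo_root>/Utils."""
    utils_dir = Path(repo_root) / "Utils"
    missing = []
    for component, filenames in UTILITY_FILES.items():
        for filename in filenames:
            path = utils_dir / component / filename
            if not path.is_file():
                missing.append(path)
    return missing
```

An empty return value means every listed file was found. Otherwise the returned paths show which files still need to be put in place.
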