---
language:
- en
base_model:
- coqui/XTTS-v2
pipeline_tag: text-to-speech
---

# RobCaamano/Sherlock-Holmes

This model is a fine-tuned version of [XTTS-v2](https://huggingface.co/coqui/XTTS-v2) on a custom dataset of Sherlock Holmes audiobook narration.

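A minimal inference sketch using the coqui-ai-TTS `TTS` API. The file names below (`model.pth`, `config.json`, `vocab.json`, and a short reference clip `ref.wav`) are assumptions; adjust them to match the checkpoint files shipped with this repo.

```python
# Hypothetical usage sketch: load the fine-tuned XTTS-v2 checkpoint and
# synthesize speech. Paths and file names are placeholders, not
# guaranteed to match this repo's layout.
from TTS.api import TTS

tts = TTS(
    model_path="./Sherlock-Holmes",              # directory with model.pth and vocab.json
    config_path="./Sherlock-Holmes/config.json",
)

tts.tts_to_file(
    text="It is a capital mistake to theorise before one has data.",
    speaker_wav="ref.wav",   # short reference clip of the target voice
    language="en",
    file_path="output.wav",
)
```

XTTS is a voice-cloning model, so a short reference clip of the target speaker is required at inference time.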
## Training and evaluation data

Audio was collected from a Sherlock Holmes [audiobook](https://www.youtube.com/watch?v=84VVFgGmeSA&ab_channel=JustFreeAudiobooks) on YouTube.

### Training hyperparameters

The following hyperparameters were used during training:

GPTArgs():
- max_conditioning_length=143677
- min_conditioning_length=66150
- max_wav_length=255995
- max_text_length=66150
- gpt_use_masking_gt_prompt_approach=True
- gpt_use_perceiver_resampler=True
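The length limits above are sample counts, and can be read as durations. The 22,050 Hz rate used below is an assumption (the sample rate XTTS uses for its GPT training audio), though it fits: min_conditioning_length is exactly 3 s at that rate.

```python
# Convert the GPTArgs sample counts above into seconds, assuming a
# 22,050 Hz sample rate (an assumption, not stated in this card).
SAMPLE_RATE = 22_050

def samples_to_seconds(n_samples: int, sr: int = SAMPLE_RATE) -> float:
    """Duration in seconds of n_samples at sample rate sr."""
    return n_samples / sr

min_cond = samples_to_seconds(66_150)   # 3.0 s minimum conditioning clip
max_cond = samples_to_seconds(143_677)  # ~6.5 s maximum conditioning clip
max_wav = samples_to_seconds(255_995)   # ~11.6 s maximum training utterance

print(f"conditioning: {min_cond:.1f}-{max_cond:.1f} s, utterance cap: {max_wav:.1f} s")
```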

GPTTrainerConfig():
- BATCH_SIZE=3
- batch_group_size=48
- GRAD_ACUMM_STEPS=84
- optimizer_params={"betas": [0.9, 0.96], "eps": 1e-8, "weight_decay": 1e-2}
- lr_scheduler_params={"milestones": [50000 * 18, 150000 * 18, 300000 * 18], "gamma": 0.5, "last_epoch": -1}
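With gradient accumulation, the values above imply an effective batch size per optimizer step; the milestones/gamma parameters correspond to a PyTorch `MultiStepLR`-style schedule. A small arithmetic sketch:

```python
# Effective batch size and LR milestones implied by the
# GPTTrainerConfig values above (pure arithmetic, no training code).
BATCH_SIZE = 3
GRAD_ACUMM_STEPS = 84

# Gradients are accumulated over 84 micro-batches before each optimizer step.
effective_batch_size = BATCH_SIZE * GRAD_ACUMM_STEPS

# A MultiStepLR-style schedule halves the learning rate (gamma=0.5)
# at each of these step counts.
milestones = [50_000 * 18, 150_000 * 18, 300_000 * 18]

print(effective_batch_size)  # 252
print(milestones)            # [900000, 2700000, 5400000]
```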

### Training results

- Train avg_loss: 0.05243
- Train avg_loss_mel_ce: 4.38085
- Train avg_loss_text_ce: 0.02336
- Validation avg_loss: 4.1927
- Validation avg_loss_mel_ce: 4.17117
- Validation avg_loss_text_ce: 0.02153
- Epoch: 1

### Framework versions

- Transformers 4.51.3
- PyTorch 2.6.0+cu126
- TorchAudio 2.6.0
- Tokenizers 0.21.1
- [TTS](https://github.com/idiap/coqui-ai-TTS)