How to use sil-ai/bap-chapter-audio-dataset-force-aligned-speecht5 with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-to-audio", model="sil-ai/bap-chapter-audio-dataset-force-aligned-speecht5")

# Load model directly
from transformers import AutoProcessor, AutoModelForTextToSpectrogram

processor = AutoProcessor.from_pretrained("sil-ai/bap-chapter-audio-dataset-force-aligned-speecht5")
model = AutoModelForTextToSpectrogram.from_pretrained("sil-ai/bap-chapter-audio-dataset-force-aligned-speecht5")
```
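Loading the model is not enough to produce audio. SpeechT5 conditions generation on a 512-dimensional speaker embedding (an x-vector), and speech generation fails without one. The sketch below makes two assumptions. First, it assumes you already have such a vector. Second, it assumes the stock `microsoft/speecht5_hifigan` vocoder is appropriate; this is the vocoder that the Transformers text-to-audio pipeline loads for SpeechT5 by default. The card names neither the vocoder nor the speaker embeddings used in training. `validate_speaker_embedding` and `synthesize` are hypothetical helper names.

```python
MODEL_ID = "sil-ai/bap-chapter-audio-dataset-force-aligned-speecht5"
# Assumption: the stock SpeechT5 HiFi-GAN vocoder; the card does not name one.
VOCODER_ID = "microsoft/speecht5_hifigan"
# SpeechT5 speaker embeddings (x-vectors) have 512 dimensions.
EMBEDDING_DIM = 512


def validate_speaker_embedding(values):
    """Return `values` as a list of floats, or raise if it is not 512 long."""
    floats = [float(v) for v in values]
    if len(floats) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} values, got {len(floats)}")
    return floats


def synthesize(text, speaker_embedding):
    """Hypothetical helper: return a 16 kHz waveform (NumPy array) for `text`."""
    embedding = validate_speaker_embedding(speaker_embedding)

    # Imported here so the helper above can be used without torch installed.
    import torch
    from transformers import SpeechT5ForTextToSpeech, SpeechT5HifiGan, SpeechT5Processor

    processor = SpeechT5Processor.from_pretrained(MODEL_ID)
    model = SpeechT5ForTextToSpeech.from_pretrained(MODEL_ID)
    vocoder = SpeechT5HifiGan.from_pretrained(VOCODER_ID)

    inputs = processor(text=text, return_tensors="pt")
    # generate_speech expects speaker embeddings shaped (batch, 512).
    speakers = torch.tensor([embedding], dtype=torch.float32)
    with torch.no_grad():
        waveform = model.generate_speech(inputs["input_ids"], speakers, vocoder=vocoder)
    return waveform.numpy()
```

With the pipeline, the same tensor can be passed as `pipe(text, forward_params={"speaker_embeddings": speakers})`. The pipeline returns a dict with `"audio"` and `"sampling_rate"` keys.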
bap-chapter-audio-dataset-force-aligned-speecht5
This model is a fine-tuned version of microsoft/speecht5_tts; the card does not name the training dataset. It achieves the following result on the evaluation set:
- Loss: 0.3966 (validation loss at the final training step, 40000)
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 8
- seed: 3407
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: AdamW, PyTorch fused implementation (`adamw_torch_fused`), with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 4000
- training_steps: 40000
- mixed_precision_training: Native AMP
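The batch and scheduler settings above combine as follows. This is a plain-Python sketch of the `linear` schedule with warmup. It follows the formula used by `transformers.get_linear_schedule_with_warmup`, but it is not that function. It assumes a single device. That assumption is consistent with total_train_batch_size being exactly 8 × 4, although the card does not state the device count. `effective_batch_size` and `linear_warmup_lr` are hypothetical names.

```python
BASE_LR = 1e-4          # learning_rate
WARMUP_STEPS = 4_000    # lr_scheduler_warmup_steps
TOTAL_STEPS = 40_000    # training_steps
PER_DEVICE_BATCH = 8    # train_batch_size
GRAD_ACCUM = 4          # gradient_accumulation_steps


def effective_batch_size(per_device=PER_DEVICE_BATCH, accum=GRAD_ACCUM, devices=1):
    """Examples that contribute to one optimizer update (devices=1 is an assumption)."""
    return per_device * accum * devices


def linear_warmup_lr(step, base_lr=BASE_LR, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Learning rate after `step` optimizer updates.

    Rises linearly from 0 to base_lr over `warmup` steps, then decays
    linearly back to 0 at `total` steps.
    """
    if step < warmup:
        return base_lr * step / warmup
    return base_lr * max(0.0, (total - step) / (total - warmup))


for s in (0, 2_000, 4_000, 22_000, 40_000):
    print(f"step {s:>6}: lr = {linear_warmup_lr(s):.2e}")
```

The loop prints 5.00e-05 at step 2,000 and again at step 22,000, halfway through the decay phase. The schedule advances once per optimizer update (every 4 micro-batches of 8), so warmup covers 4,000 × 32 = 128,000 examples.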
Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 0.4925 | 11.7647 | 1000 | 0.4308 |
| 0.443 | 23.5294 | 2000 | 0.4077 |
| 0.4186 | 35.2941 | 3000 | 0.3994 |
| 0.4148 | 47.0588 | 4000 | 0.3940 |
| 0.3967 | 58.8235 | 5000 | 0.3910 |
| 0.3953 | 70.5882 | 6000 | 0.3907 |
| 0.3835 | 82.3529 | 7000 | 0.3913 |
| 0.3715 | 94.1176 | 8000 | 0.3893 |
| 0.3778 | 105.8824 | 9000 | 0.3890 |
| 0.3706 | 117.6471 | 10000 | 0.3880 |
| 0.3572 | 129.4118 | 11000 | 0.3868 |
| 0.3658 | 141.1765 | 12000 | 0.3885 |
| 0.3581 | 152.9412 | 13000 | 0.3893 |
| 0.3564 | 164.7059 | 14000 | 0.3904 |
| 0.3495 | 176.4706 | 15000 | 0.3879 |
| 0.3535 | 188.2353 | 16000 | 0.3884 |
| 0.3495 | 200.0 | 17000 | 0.3890 |
| 0.3593 | 211.7647 | 18000 | 0.3907 |
| 0.345 | 223.5294 | 19000 | 0.3910 |
| 0.3464 | 235.2941 | 20000 | 0.3909 |
| 0.3363 | 247.0588 | 21000 | 0.3920 |
| 0.3422 | 258.8235 | 22000 | 0.3917 |
| 0.3383 | 270.5882 | 23000 | 0.3930 |
| 0.3364 | 282.3529 | 24000 | 0.3927 |
| 0.3334 | 294.1176 | 25000 | 0.3937 |
| 0.3337 | 305.8824 | 26000 | 0.3943 |
| 0.3251 | 317.6471 | 27000 | 0.3932 |
| 0.3247 | 329.4118 | 28000 | 0.3944 |
| 0.3294 | 341.1765 | 29000 | 0.3951 |
| 0.3293 | 352.9412 | 30000 | 0.3954 |
| 0.3455 | 364.7059 | 31000 | 0.3978 |
| 0.3229 | 376.4706 | 32000 | 0.3962 |
| 0.3201 | 388.2353 | 33000 | 0.3966 |
| 0.3249 | 400.0 | 34000 | 0.3969 |
| 0.3203 | 411.7647 | 35000 | 0.3965 |
| 0.3173 | 423.5294 | 36000 | 0.3964 |
| 0.3223 | 435.2941 | 37000 | 0.3968 |
| 0.3183 | 447.0588 | 38000 | 0.3972 |
| 0.3218 | 458.8235 | 39000 | 0.3967 |
| 0.3262 | 470.5882 | 40000 | 0.3966 |
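The table can be read programmatically to locate the best checkpoint. In this sketch, `RESULTS` holds the Step, Training Loss and Validation Loss columns copied from the table, and `best_checkpoint` and `updates_per_epoch` are hypothetical helper names. Validation loss bottoms out at 0.3868 at step 11,000 while training loss keeps falling. The reported final loss of 0.3966 is therefore not the minimum.

```python
# (step, training loss, validation loss), copied from the table above.
RESULTS = [
    (1000, 0.4925, 0.4308), (2000, 0.443, 0.4077), (3000, 0.4186, 0.3994),
    (4000, 0.4148, 0.3940), (5000, 0.3967, 0.3910), (6000, 0.3953, 0.3907),
    (7000, 0.3835, 0.3913), (8000, 0.3715, 0.3893), (9000, 0.3778, 0.3890),
    (10000, 0.3706, 0.3880), (11000, 0.3572, 0.3868), (12000, 0.3658, 0.3885),
    (13000, 0.3581, 0.3893), (14000, 0.3564, 0.3904), (15000, 0.3495, 0.3879),
    (16000, 0.3535, 0.3884), (17000, 0.3495, 0.3890), (18000, 0.3593, 0.3907),
    (19000, 0.345, 0.3910), (20000, 0.3464, 0.3909), (21000, 0.3363, 0.3920),
    (22000, 0.3422, 0.3917), (23000, 0.3383, 0.3930), (24000, 0.3364, 0.3927),
    (25000, 0.3334, 0.3937), (26000, 0.3337, 0.3943), (27000, 0.3251, 0.3932),
    (28000, 0.3247, 0.3944), (29000, 0.3294, 0.3951), (30000, 0.3293, 0.3954),
    (31000, 0.3455, 0.3978), (32000, 0.3229, 0.3962), (33000, 0.3201, 0.3966),
    (34000, 0.3249, 0.3969), (35000, 0.3203, 0.3965), (36000, 0.3173, 0.3964),
    (37000, 0.3223, 0.3968), (38000, 0.3183, 0.3972), (39000, 0.3218, 0.3967),
    (40000, 0.3262, 0.3966),
]


def best_checkpoint(rows):
    """Return (step, validation loss) for the row with the lowest validation loss."""
    step, _, val_loss = min(rows, key=lambda row: row[2])
    return step, val_loss


def updates_per_epoch(step, epoch):
    """Optimizer updates per epoch implied by a (Step, Epoch) pair from the table."""
    return step / epoch


best_step, best_val = best_checkpoint(RESULTS)
final_step, _, final_val = RESULTS[-1]
print(f"best: step {best_step}, validation loss {best_val}")
print(f"final: step {final_step}, validation loss {final_val} (+{final_val - best_val:.4f})")
print(f"updates per epoch: {updates_per_epoch(17000, 200.0):.0f}")
```

The Epoch column implies 85 optimizer updates per epoch (17,000 steps at epoch 200.0). Transformers' `Trainer` can keep the best checkpoint automatically: set `load_best_model_at_end=True` and `metric_for_best_model="eval_loss"` in `TrainingArguments`, with save and evaluation strategies that match. The card does not say whether these were set, but the reported loss matches the final row.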
Framework versions
- Transformers 4.57.1
- Pytorch 2.8.0+cu128
- Datasets 4.2.0
- Tokenizers 0.22.1
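The sketch below compares a local environment against the versions listed above, reading installed package versions with the standard library's `importlib.metadata`. `EXPECTED`, `installed_versions` and `mismatches` are hypothetical names. Exact string comparison is a deliberate simplification: a CPU-only PyTorch build will be reported as a mismatch against `2.8.0+cu128` even if it works.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed on this card (distribution name -> version string).
EXPECTED = {
    "transformers": "4.57.1",
    "torch": "2.8.0+cu128",
    "datasets": "4.2.0",
    "tokenizers": "0.22.1",
}


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def mismatches(expected, installed):
    """Return {name: (expected, installed)} for every entry that differs."""
    return {
        name: (want, installed.get(name))
        for name, want in expected.items()
        if installed.get(name) != want
    }


for name, (want, have) in mismatches(EXPECTED, installed_versions(EXPECTED)).items():
    print(f"{name}: card lists {want}, found {have}")
```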
Model tree for sil-ai/bap-chapter-audio-dataset-force-aligned-speecht5
Base model
microsoft/speecht5_tts