---
library_name: transformers
language:
- en
license: mit
base_model: microsoft/speecht5_tts
tags:
- generated_from_trainer
datasets:
- custom
model-index:
- name: SpeechT5 TTS technical train2
  results: []
---

| **PAGE**                            | **LINK**                                                                                                            |
|-------------------------------------|---------------------------------------------------------------------------------------------------------------------|
| **MARATHI TTS GITHUB LINK**         | [MARATHI TTS REPO](https://github.com/dawarepranav/speechT5_marathi_finetuned-)                                     |
| **HUGGING FACE ENG TECHNICAL DATA** | [HUGGING FACE TECHNICAL DATA](https://huggingface.co/pranavdaware/speecht5_tts_technical_train2)                    |
| **HUGGING FACE MARATHI TTS**        | [HUGGING FACE MARATHI TTS](https://huggingface.co/pranavdaware/speecht5_tts_marathi_train2)                         |
| **REPORT**                          | [REPORT](https://github.com/dawarepranav/speecht5_tts_english_technical_data/blob/main/A%20Technical%20Report.docx) |

# 🎀 SpeechT5 TTS Technical Train2

This model is a fine-tuned version of [microsoft/speecht5_tts](https://huggingface.co/microsoft/speecht5_tts), trained on a custom dataset for *Text-to-Speech (TTS)* tasks.

🎯 *Key Metric:*
- *Loss* on the evaluation set: 0.3763

πŸ“’ *Listen to the generated sample:*
The input text is: "Hello, few technical terms I used while fine-tuning are API and REST and CUDA and TTS."

---

## πŸ“ Model Description

*SpeechT5 TTS Technical Train2* is built on the *SpeechT5* architecture and was fine-tuned for speech synthesis (TTS). The fine-tuning focused on improving the naturalness and clarity of the audio generated from text. A usage sketch appears at the end of this card.

πŸ›  *Base Model*: [Microsoft SpeechT5](https://huggingface.co/microsoft/speecht5_tts)
πŸ“š *Dataset*: Custom (specific details to be provided)

---

## πŸ”§ Intended Uses & Limitations

### βœ… *Primary Use Cases:*
- *Text-to-Speech (TTS)* for technical interview texts.
- *Virtual assistants*.

### ⚠ *Limitations:*
- Best suited for English TTS tasks.
- Requires further fine-tuning on a larger dataset.

---

## πŸ“… Training Data

The model was fine-tuned on a *custom dataset* curated to improve TTS output. The dataset consists of varied text types that help the model generate more natural speech, making it suitable for TTS applications.

### βš™ *Hyperparameters:*

The model was trained with the following hyperparameters (a hedged `Seq2SeqTrainingArguments` reconstruction appears at the end of this card):

- *Learning Rate*: 1e-05
- *Train Batch Size*: 16
- *Eval Batch Size*: 8
- *Seed*: 42
- *Gradient Accumulation Steps*: 2
- *Total Train Batch Size*: 32
- *Optimizer*: AdamW (betas=(0.9, 0.999), epsilon=1e-08)
- *LR Scheduler Type*: Linear
- *Warmup Steps*: 50
- *Training Steps*: 500
- *Mixed Precision Training*: Native AMP

### πŸ“Š *Training Results:*

| πŸ‹β€β™‚ Training Loss | πŸ•‘ Epoch | πŸ›€ Step | πŸ“‰ Validation Loss |
|:-------------------:|:--------:|:-------:|:------------------:|
| 1.1921              | 100.0    | 100     | 0.4136             |
| 0.8435              | 200.0    | 200     | 0.3791             |
| 0.8294              | 300.0    | 300     | 0.3766             |
| 0.7959              | 400.0    | 400     | 0.3744             |
| 0.7918              | 500.0    | 500     | 0.3763             |

### πŸ“¦ Framework Versions

- *Transformers*: 4.46.0.dev0
- *PyTorch*: 2.4.1+cu121
- *Datasets*: 3.0.2
- *Tokenizers*: 0.20.1
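---

## βš™ Training Configuration Sketch

The hyperparameters above map onto the πŸ€— Transformers `Seq2SeqTrainingArguments` roughly as follows. This is a hedged reconstruction, not the exact training script: `output_dir`, the logging cadence, and the 100-step eval/save cadence (inferred from the results table) are assumptions.

```python
from transformers import Seq2SeqTrainingArguments

# Hedged reconstruction of the listed hyperparameters; output_dir,
# logging_steps, and the eval/save cadence are assumptions.
training_args = Seq2SeqTrainingArguments(
    output_dir="speecht5_tts_technical_train2",  # assumed output path
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=2,  # effective train batch size: 16 * 2 = 32
    lr_scheduler_type="linear",
    warmup_steps=50,
    max_steps=500,
    fp16=True,                      # native AMP mixed precision
    eval_strategy="steps",
    eval_steps=100,                 # matches the 100-step results table
    save_steps=100,
    logging_steps=25,               # assumed logging cadence
    label_names=["labels"],         # SpeechT5 targets are spectrogram "labels"
)
```

AdamW with betas=(0.9, 0.999) and epsilon=1e-08 is the Transformers default optimizer, so it needs no explicit argument here.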
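---

## πŸ”Š How to Use

The card does not ship an inference snippet, so here is a minimal sketch using the standard SpeechT5 pipeline from πŸ€— Transformers. The x-vector speaker embedding (taken from `Matthijs/cmu-arctic-xvectors`, index 7306, as in the official SpeechT5 examples) is an arbitrary choice, not necessarily the voice learned during fine-tuning.

```python
import torch
import soundfile as sf
from datasets import load_dataset
from transformers import SpeechT5ForTextToSpeech, SpeechT5HifiGan, SpeechT5Processor

processor = SpeechT5Processor.from_pretrained("pranavdaware/speecht5_tts_technical_train2")
model = SpeechT5ForTextToSpeech.from_pretrained("pranavdaware/speecht5_tts_technical_train2")
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

# Tokenize the input text.
text = "Hello, few technical terms I used while fine-tuning are API and REST and CUDA and TTS."
inputs = processor(text=text, return_tensors="pt")

# SpeechT5 conditions on a 512-dim x-vector speaker embedding;
# index 7306 is an arbitrary voice, not necessarily the fine-tuned one.
embeddings = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
speaker_embedding = torch.tensor(embeddings[7306]["xvector"]).unsqueeze(0)

# Generate a mel spectrogram and vocode it to a 16 kHz waveform.
speech = model.generate_speech(inputs["input_ids"], speaker_embedding, vocoder=vocoder)
sf.write("output.wav", speech.numpy(), samplerate=16000)
```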