F5-TTS: Gujarati fine-tune

A fine-tuned F5-TTS model for Gujarati text-to-speech using flow matching.

Model details

| Attribute | Value |
|---|---|
| Base model | SWivid/F5-TTS (F5TTS_v1_Base) |
| Architecture | Diffusion Transformer with ConvNeXt V2 |
| Language | Gujarati (gu) |
| Training | 150K steps (21 epochs) on NVIDIA L4 (24 GB) |
| Tokenizer | Custom (extended for Gujarati characters) |

Training data

Fine-tuned on the Gujarati clips from Arjun4707/gu-hi-tts (~36K clips remaining after filtering by characters per second (CPS) and duration).
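As a rough illustration of what a CPS + duration filter can look like, here is a minimal sketch. The `Clip` type, the function names, and every threshold below are hypothetical placeholders; the values actually used for this model are not stated in this card.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    path: str
    text: str
    duration_s: float


def chars_per_second(clip: Clip) -> float:
    # Count Unicode code points, ignoring whitespace. Gujarati vowel signs
    # and the virama count as separate code points here.
    n_chars = sum(1 for ch in clip.text if not ch.isspace())
    return n_chars / clip.duration_s


def keep_clip(clip: Clip,
              min_dur: float = 1.0, max_dur: float = 20.0,
              min_cps: float = 5.0, max_cps: float = 25.0) -> bool:
    # All four thresholds are illustrative assumptions, not the card's values.
    if not (min_dur <= clip.duration_s <= max_dur):
        return False
    return min_cps <= chars_per_second(clip) <= max_cps


clips = [
    Clip("a.wav", "નમસ્તે, તમે કેમ છો?", 2.0),  # plausible speaking rate
    Clip("b.wav", "હા", 12.0),                  # far too little text for 12 s
    Clip("c.wav", "આ એક પરીક્ષણ વાક્ય છે.", 0.4),  # shorter than min_dur
]
kept = [c.path for c in clips if keep_clip(c)]
print(kept)
```

A CPS bound of this kind tends to catch transcripts that do not match their audio (long silences, music, or truncated text), which a duration check alone would miss.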

Data source: audio clips scraped from publicly available YouTube videos. Clips were preprocessed to 24 kHz mono 16-bit PCM, silence-trimmed, and peak-normalized to -3 dBFS.
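The peak-normalization step can be sketched in a few lines of pure Python. This is an illustration under assumptions, not the pipeline's actual code: the function names are hypothetical, samples are assumed to be floats in [-1, 1], and resampling and silence trimming are left out.

```python
import math


def peak_normalize(samples, target_dbfs=-3.0):
    """Scale samples so the largest absolute value sits at target_dbfs."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # all-silent input: nothing to scale
    target_peak = 10.0 ** (target_dbfs / 20.0)  # -3 dBFS is about 0.708
    gain = target_peak / peak
    return [s * gain for s in samples]


def to_pcm16(samples):
    """Convert float samples in [-1, 1] to clamped 16-bit integers."""
    return [max(-32768, min(32767, round(s * 32767))) for s in samples]


# 10 ms of a quiet 440 Hz tone at 24 kHz as stand-in audio.
sr = 24000
tone = [0.2 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr // 100)]

normalized = peak_normalize(tone)
peak_dbfs = 20 * math.log10(max(abs(s) for s in normalized))
pcm = to_pcm16(normalized)
print(round(peak_dbfs, 3))
```

Peak normalization fixes only the loudest sample, so clips with very different dynamics can still differ in perceived loudness after this step.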

Known limitations

  • Single-speaker clips produce good quality; multi-speaker clips in the training data caused stopping and babbling artifacts
  • Total audio length per generation (prompt + generated speech) is capped at ~30 seconds
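One way to work within the ~30-second cap is to estimate how much text fits alongside a given prompt and split longer input into chunks. The sketch below assumes the generated speech runs at roughly the same characters-per-second rate as the prompt; `max_gen_chars` and `split_text` are hypothetical helpers, not part of F5-TTS.

```python
def max_gen_chars(prompt_seconds, prompt_chars, total_cap_seconds=30.0):
    """Estimate how many characters fit in the time left after the prompt."""
    if prompt_seconds <= 0 or prompt_chars <= 0:
        raise ValueError("prompt duration and length must be positive")
    remaining = total_cap_seconds - prompt_seconds
    if remaining <= 0:
        return 0
    # Assumption: generated speech keeps the prompt's speaking rate.
    chars_per_second = prompt_chars / prompt_seconds
    return int(remaining * chars_per_second)


def split_text(text, max_chars):
    """Greedily pack whitespace-separated words into chunks of <= max_chars.

    A single word longer than max_chars becomes its own (overlong) chunk.
    """
    chunks, current = [], ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


budget = max_gen_chars(prompt_seconds=6.0, prompt_chars=60)
chunks = split_text("aa bb cc dd", max_chars=5)
print(budget, chunks)
```

Shorter prompts leave more of the budget for generated speech, so a reference clip of a few seconds is usually a better trade than one that uses most of the 30 seconds.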

Training code

Full training pipeline and troubleshooting: BhammarArjun/TTS_2_training

License

CC-BY-NC-4.0: non-commercial use only.

The base F5-TTS model is CC-BY-NC-4.0 (trained on the Emilia in-the-wild dataset). Our fine-tuning data was also sourced from YouTube audio.

Citation

@misc{arjun2026f5ttsgu,
  title   = {F5-TTS fine-tuned for Gujarati},
  author  = {Arjun Bhammar},
  year    = {2026},
  url     = {https://huggingface.co/Arjun4707/F5-TTS-Gujarati}
}

Acknowledgements
