Chatterbox — Gujarati fine-tune

A fine-tuned Chatterbox TTS model for Gujarati text-to-speech with voice cloning and emotion control.

Model details

Attribute	Value
Base model	ResembleAI/chatterbox (Multilingual)
Architecture	CosyVoice 2.0 — LLaMA-based T3 (0.5B params)
Language	Gujarati (`gu`)
Training hardware	NVIDIA L4 (24 GB VRAM)
Fine-tuning repo	gokhaneraslan/chatterbox-finetuning
Vocab extension	2454 → 2514 tokens (+60 Gujarati characters)
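
The vocabulary extension in the last row can be pictured as appending new character tokens after the highest existing ID. The following is only a minimal sketch under stated assumptions: `extend_vocab` is a hypothetical helper, the base vocabulary is a placeholder mapping, and the three Gujarati letters are illustrative examples, not the actual 60 characters that were added. The fine-tuning repo may do this differently.

```python
def extend_vocab(vocab: dict, new_tokens: list) -> dict:
    """Return a copy of `vocab` with unseen tokens appended after the highest ID.

    Hypothetical helper for illustration; not taken from the fine-tuning repo.
    """
    extended = dict(vocab)
    next_id = max(extended.values(), default=-1) + 1
    for token in new_tokens:
        if token not in extended:  # skip tokens the base vocab already has
            extended[token] = next_id
            next_id += 1
    return extended


# Placeholder base vocabulary with 2454 entries (IDs 0..2453).
base_vocab = {f"tok{i}": i for i in range(2454)}

# Illustrative Gujarati letters only; the real model adds 60 characters.
example_chars = ["ક", "ખ", "ગ"]

extended_vocab = extend_vocab(base_vocab, example_chars)
print(len(base_vocab), "->", len(extended_vocab))
```

With the full set of 60 new characters the same procedure would take the vocabulary from 2454 to 2514 entries. Any embedding table indexed by these IDs would also need rows for the new tokens.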

Training data

Fine-tuned on Gujarati clips from Arjun4707/gu-hi-tts (~33,851 clips after characters-per-second (CPS) and diarization filtering).

Data source: audio clips scraped from publicly available YouTube videos. Preprocessing used speaker diarization (pyannote-audio) to keep only single-speaker clips, a CPS filter (4-25 characters per second), and a duration filter (2-20 s).
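
The filtering criteria above can be sketched as a single predicate. This is an illustration, not the actual preprocessing code: the `Clip` class, its field names, and `keep_clip` are hypothetical, the speaker count is assumed to come from a separate diarization step, the bounds are assumed to be inclusive, and characters are counted as Unicode code points including spaces.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    """Hypothetical record for one transcribed audio clip."""
    text: str           # transcript of the clip
    duration_s: float   # clip length in seconds
    num_speakers: int   # speaker count, assumed to come from diarization


def chars_per_second(clip: Clip) -> float:
    """Characters (Unicode code points, spaces included) per second of audio."""
    return len(clip.text) / clip.duration_s


def keep_clip(clip: Clip,
              min_cps: float = 4.0, max_cps: float = 25.0,
              min_dur_s: float = 2.0, max_dur_s: float = 20.0) -> bool:
    """Keep single-speaker clips whose duration and CPS fall inside the ranges."""
    if clip.num_speakers != 1:
        return False
    # Checking duration first also guards the division against zero-length clips.
    if not (min_dur_s <= clip.duration_s <= max_dur_s):
        return False
    return min_cps <= chars_per_second(clip) <= max_cps
```

A clip with a very high CPS usually means the transcript covers more speech than the audio contains, and a very low CPS suggests long silences or a truncated transcript, which is why both ends of the range are filtered.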

Known limitations

Very short utterances (1-3 words) produce poor quality; the architecture needs a minimum of ~5 words of input
Medium to long sentences (5-30 words) produce good quality with clear Gujarati pronunciation
Not suitable for generating audio shorter than 2 s
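
Given these limits, callers may want to check input length before synthesis. The sketch below assumes whitespace-separated words and uses the ~5-word minimum stated above. `is_long_enough` is a hypothetical helper and is not part of Chatterbox or the fine-tuning repo.

```python
MIN_WORDS = 5  # approximate minimum from the limitations listed above


def is_long_enough(text: str, min_words: int = MIN_WORDS) -> bool:
    """Return True if `text` has at least `min_words` whitespace-separated words."""
    return len(text.split()) >= min_words


def warn_if_too_short(text: str) -> None:
    """Print a warning for inputs that are likely to synthesize poorly."""
    if not is_long_enough(text):
        print(f"Warning: only {len(text.split())} word(s); "
              f"quality may be poor below {MIN_WORDS} words.")


warn_if_too_short("કેમ છો")  # two words, so a warning is printed
```

One option for short prompts is to embed them in a longer carrier sentence, though whether that suits a given application is left to the caller.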

Training code

Full training pipeline and troubleshooting: BhammarArjun/TTS_4_training

License

CC-BY-NC-4.0 — Non-commercial use only.

The base Chatterbox model is MIT-licensed, but this fine-tuned version was trained on YouTube-sourced data. To be transparent and responsible about data provenance, it is released under CC-BY-NC-4.0.

Citation

@misc{arjun2026chatterboxgu,
  title   = {Chatterbox fine-tuned for Gujarati},
  author  = {Arjun Bhammar},
  year    = {2026},
  url     = {https://huggingface.co/Arjun4707/chatterbox-gujarati}
}

Acknowledgements

Resemble AI for the Chatterbox TTS model
gokhaneraslan for the fine-tuning framework
