| --- |
| license: mit |
| language: |
| - bn |
| - hi |
| - mr |
| - gu |
| - ta |
| - te |
| base_model: ResembleAI/chatterbox |
| tags: |
| - text-to-speech |
| - tts |
| - bengali |
| - hindi |
| - marathi |
| - gujarati |
| - tamil |
| - telugu |
| - indic |
| - chatterbox |
| - fine-tuned |
| - zero-shot-tts |
| - speech |
| - speech-synthesis |
| datasets: |
| - ai4bharat/Shrutilipi |
| - ai4bharat/Rasa |
| - SPRINGLab/IndicTTS_Bengali |
| - SPRINGLab/IndicTTS_Gujarati |
| - SPRINGLab/IndicTTS_Marathi |
| - SPRINGLab/IndicTTS_Tamil |
| - SPRINGLab/IndicTTS_Telugu |
| --- |
| |
| # ChatterBox Desi |
|
|
| A fine-tuned version of [ResembleAI/chatterbox](https://huggingface.co/ResembleAI/chatterbox) for text-to-speech synthesis in **6 Indic languages**: Bengali, Hindi, Marathi, Gujarati, Tamil, and Telugu. |
|
|
| ## Zero-shot TTS Output |
|
|
| | Language | Reference | Output | Text | |
| |--|--|--|--| |
| | Bengali (বাংলা) |<audio controls src="https://huggingface.co/BosonLab/chatterbox-desi/resolve/main/audios/refs/female_shadowheart.flac" style="width: 100px;"></audio>|<audio controls src="https://huggingface.co/BosonLab/chatterbox-desi/resolve/main/audios/output/female_bn.wav" style="width: 100px;"></audio>|আমরা কেউ মাষ্টার হতে চেয়েছিলাম, কেউ ডাক্তার, কেউ উকিল। অমলকান্তি সে-সব কিছু হতে চায়নি। সে রোদ্দুর হতে চেয়েছিল!| |
| | Bengali (বাংলা) |<audio controls src="https://huggingface.co/BosonLab/chatterbox-desi/resolve/main/audios/refs/male_stewie.wav" style="width: 100px;"></audio>|<audio controls src="https://huggingface.co/BosonLab/chatterbox-desi/resolve/main/audios/output/male_bn.wav" style="width: 100px;"></audio>|আমরা কেউ মাষ্টার হতে চেয়েছিলাম, কেউ ডাক্তার, কেউ উকিল। অমলকান্তি সে-সব কিছু হতে চায়নি। সে রোদ্দুর হতে চেয়েছিল!| |
| | Hindi (हिंदी) |<audio controls src="https://huggingface.co/BosonLab/chatterbox-desi/resolve/main/audios/refs/female_shadowheart.flac" style="width: 100px;"></audio>|<audio controls src="https://huggingface.co/BosonLab/chatterbox-desi/resolve/main/audios/output/female_hi.wav" style="width: 100px;"></audio>|हम में से कुछ मास्टर बनना चाहते थे, कुछ डॉक्टर, कुछ वकील। अमलकांति उन सब कुछ बनना नहीं चाहता था। वह धूप बनना चाहता था!| |
| | Hindi (हिंदी) |<audio controls src="https://huggingface.co/BosonLab/chatterbox-desi/resolve/main/audios/refs/male_stewie.wav" style="width: 100px;"></audio>|<audio controls src="https://huggingface.co/BosonLab/chatterbox-desi/resolve/main/audios/output/male_hi.wav" style="width: 100px;"></audio>|हम में से कुछ मास्टर बनना चाहते थे, कुछ डॉक्टर, कुछ वकील। अमलकांति उन सब कुछ बनना नहीं चाहता था। वह धूप बनना चाहता था!| |
| | Marathi (मराठी) |<audio controls src="https://huggingface.co/BosonLab/chatterbox-desi/resolve/main/audios/refs/female_shadowheart.flac" style="width: 100px;"></audio>|<audio controls src="https://huggingface.co/BosonLab/chatterbox-desi/resolve/main/audios/output/female_mr.wav" style="width: 100px;"></audio>|आम्ही कोणीतरी मास्टर होऊ इच्छित होतो, कोणीतरी डॉक्टर, कोणीतरी वकील. अमलकांती त्या सगळ्या काही होऊ इच्छित नव्हता. तो सूर्य होऊ इच्छित होता!| |
| | Marathi (मराठी) |<audio controls src="https://huggingface.co/BosonLab/chatterbox-desi/resolve/main/audios/refs/male_stewie.wav" style="width: 100px;"></audio>|<audio controls src="https://huggingface.co/BosonLab/chatterbox-desi/resolve/main/audios/output/male_mr.wav" style="width: 100px;"></audio>|आम्ही कोणीतरी मास्टर होऊ इच्छित होतो, कोणीतरी डॉक्टर, कोणीतरी वकील. अमलकांती त्या सगळ्या काही होऊ इच्छित नव्हता. तो सूर्य होऊ इच्छित होता!| |
| | Gujarati (ગુજરાતી) |<audio controls src="https://huggingface.co/BosonLab/chatterbox-desi/resolve/main/audios/refs/female_shadowheart.flac" style="width: 100px;"></audio>|<audio controls src="https://huggingface.co/BosonLab/chatterbox-desi/resolve/main/audios/output/female_gu.wav" style="width: 100px;"></audio>|અમલકાંતિ તે બધું બનવું નથી માંગતો હતો. તે ધૂપ બનવું માંગતો હતો!| |
| | Gujarati (ગુજરાતી) |<audio controls src="https://huggingface.co/BosonLab/chatterbox-desi/resolve/main/audios/refs/male_stewie.wav" style="width: 100px;"></audio>|<audio controls src="https://huggingface.co/BosonLab/chatterbox-desi/resolve/main/audios/output/male_gu.wav" style="width: 100px;"></audio>|અમલકાંતિ તે બધું બનવું નથી માંગતો હતો. તે ધૂપ બનવું માંગતો હતો!| |
| | Tamil (தமிழ்) |<audio controls src="https://huggingface.co/BosonLab/chatterbox-desi/resolve/main/audios/refs/female_shadowheart.flac" style="width: 100px;"></audio>|<audio controls src="https://huggingface.co/BosonLab/chatterbox-desi/resolve/main/audios/output/female_ta.wav" style="width: 100px;"></audio>|நாங்கள் யாரும் மாஸ்டர் ஆக விரும்பவில்லை, யாரும் டாக்டர் ஆக விரும்பவில்லை, யாரும் வக்கீல் ஆக விரும்பவில்லை. அமல்காந்தி அந்த எல்லாவற்றையும் ஆக விரும்பவில்லை. அவன் வெயிலாக இருக்க விரும்பினான்!| |
| | Tamil (தமிழ்) |<audio controls src="https://huggingface.co/BosonLab/chatterbox-desi/resolve/main/audios/refs/male_stewie.wav" style="width: 100px;"></audio>|<audio controls src="https://huggingface.co/BosonLab/chatterbox-desi/resolve/main/audios/output/male_ta.wav" style="width: 100px;"></audio>|நாங்கள் யாரும் மாஸ்டர் ஆக விரும்பவில்லை, யாரும் டாக்டர் ஆக விரும்பவில்லை, யாரும் வக்கீல் ஆக விரும்பவில்லை. அமல்காந்தி அந்த எல்லாவற்றையும் ஆக விரும்பவில்லை. அவன் வெயிலாக இருக்க விரும்பினான்!| |
| | Telugu (తెలుగు) |<audio controls src="https://huggingface.co/BosonLab/chatterbox-desi/resolve/main/audios/refs/female_shadowheart.flac" style="width: 100px;"></audio>|<audio controls src="https://huggingface.co/BosonLab/chatterbox-desi/resolve/main/audios/output/female_te.wav" style="width: 100px;"></audio>|మనలో కొందరు మాస్టర్ కావాలని కోరుకున్నారు, కొందరు డాక్టర్ కావాలని కోరుకున్నారు, కొందరు వకీల్ కావాలని కోరుకున్నారు. అమల్కాంతి ఆ అన్ని కావాలని కోరుకోలేదు. అతను సూర్యుడిగా ఉండాలని కోరుకున్నాడు!| |
| | Telugu (తెలుగు) |<audio controls src="https://huggingface.co/BosonLab/chatterbox-desi/resolve/main/audios/refs/male_stewie.wav" style="width: 100px;"></audio>|<audio controls src="https://huggingface.co/BosonLab/chatterbox-desi/resolve/main/audios/output/male_te.wav" style="width: 100px;"></audio>|మనలో కొందరు మాస్టర్ కావాలని కోరుకున్నారు, కొందరు డాక్టర్ కావాలని కోరుకున్నారు, కొందరు వకీల్ కావాలని కోరుకున్నారు. అమల్కాంతి ఆ అన్ని కావాలని కోరుకోలేదు. అతను సూర్యుడిగా ఉండాలని కోరుకున్నాడు!| |
|
|
| ## Model Details |
|
|
| - **Base model**: ResembleAI/chatterbox, the multilingual ChatterBox release (supports 23 languages) |
| - **Fine-tuned on**: a speech corpus covering 6 Indic languages (~424 hours, 216,819 samples), drawn from: |
| - ai4bharat/Shrutilipi (Bengali, Hindi splits) |
| - ai4bharat/Rasa (Bengali, Hindi, Marathi, Gujarati, Tamil, Telugu splits) |
| - SPRINGLab/IndicTTS (Bengali, Gujarati, Marathi, Tamil, Telugu) |
| - **Training steps**: 10,000 |
| - **Architecture**: T3 (Text-to-Token Transformer) + HiFT-GAN vocoder |
| - **Vocabulary**: Extended from 2,530 → 2,820 tokens to cover all 6 Indic scripts |
| - **Language tagging**: Text must be prefixed with a language tag (`[bn]`, `[hi]`, `[mr]`, `[gu]`, `[ta]`, or `[te]`) |
|
|
| ### Training Data by Language |
|
|
| | Language | Code | Samples | Hours | |
| |----------|------|---------|-------| |
| | Bengali | bn | 58,820 | 99.95 | |
| | Gujarati | gu | 32,604 | 73.17 | |
| | Hindi | hi | 12,116 | 21.55 | |
| | Marathi | mr | 37,899 | 72.70 | |
| | Tamil | ta | 39,437 | 72.74 | |
| | Telugu | te | 35,943 | 84.04 | |
| | **Total**| | **216,819** | **424.15** | |
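
The per-language figures above can be cross-checked with a little arithmetic. The sketch below copies the sample counts and hours from the table, recomputes the totals, and derives two numbers the table does not state directly: the average clip length and each language's share of the training hours. The names `TRAINING_DATA` and `summarize` are illustrative and not part of the model repository.

```python
# (samples, hours) per language code, copied from the table above.
TRAINING_DATA = {
    "bn": (58_820, 99.95),
    "gu": (32_604, 73.17),
    "hi": (12_116, 21.55),
    "mr": (37_899, 72.70),
    "ta": (39_437, 72.74),
    "te": (35_943, 84.04),
}


def summarize(data):
    """Return (total_samples, total_hours, per-language stats)."""
    total_samples = sum(samples for samples, _ in data.values())
    total_hours = sum(hours for _, hours in data.values())
    rows = {}
    for code, (samples, hours) in data.items():
        rows[code] = {
            # Mean clip duration in seconds.
            "avg_clip_seconds": hours * 3600 / samples,
            # Fraction of all training hours contributed by this language.
            "share_of_hours": hours / total_hours,
        }
    return total_samples, total_hours, rows


total_samples, total_hours, rows = summarize(TRAINING_DATA)
print(f"total: {total_samples} samples, {total_hours:.2f} hours")
for code, stats in rows.items():
    print(f"{code}: {stats['avg_clip_seconds']:.2f} s/clip, "
          f"{stats['share_of_hours']:.1%} of hours")
```

Clips average roughly six to eight seconds in every language, and Hindi contributes only about 5% of the hours, which may be worth keeping in mind when comparing output quality across languages.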
|
|
| ## Requirements |
|
|
| ```bash |
| git clone https://github.com/gokhaneraslan/chatterbox-finetuning |
| cd chatterbox-finetuning |
| pip install -r requirements.txt |
| ``` |
|
|
| ### One-time patch (upstream vocab resize fix) |
|
|
| The upstream `chatterbox-finetuning` repo initialises T3 with a hard-coded 704-token vocabulary, which causes a size mismatch when loading this model (vocab=2820). Apply this one-time patch before running inference; it makes `tts.py` load the checkpoint first and size the text vocabulary from `text_emb.weight`: |
|
|
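| ```bash |
| # Run from inside the cloned chatterbox-finetuning directory |
| python - <<'EOF' |
| import pathlib, re |
| f = pathlib.Path("src/chatterbox_/tts.py") |
| txt = f.read_text() |
| # Match the two original lines at any indentation and remember the indent. |
| pattern = re.compile( |
|     r'^([ \t]*)t3 = T3\(\)\n[ \t]*t3_state = load_file\(ckpt_dir / "t3_cfg\.safetensors"\)', |
|     re.MULTILINE, |
| ) |
| def repl(m): |
|     ind = m.group(1) |
|     # Load the checkpoint first, then size T3's text vocabulary from it. |
|     return ( |
|         f'{ind}t3_state = load_file(ckpt_dir / "t3_cfg.safetensors")\n' |
|         f'{ind}from .models.t3.modules.t3_config import T3Config\n' |
|         f'{ind}t3 = T3(hp=T3Config(text_tokens_dict_size=t3_state["text_emb.weight"].shape[0]))' |
|     ) |
| new_txt, count = pattern.subn(repl, txt, count=1) |
| if count == 0: |
|     raise SystemExit("Target lines not found: tts.py is already patched or differs upstream") |
| f.write_text(new_txt) |
| print("Patched tts.py") |
| EOF |
| ``` |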
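
To confirm that the patch took effect before loading the model, one option is to look for the checkpoint-derived constructor call in `tts.py`. This is a minimal sketch: `is_patched` and `PATCH_MARKER` are hypothetical names, and the check only looks for the text the patch inserts, so it does not prove the file is otherwise unmodified.

```python
import pathlib

# Text that only appears in tts.py once the patch above has been applied.
PATCH_MARKER = "T3Config(text_tokens_dict_size="


def is_patched(tts_path="src/chatterbox_/tts.py"):
    """Return True if tts.py builds T3 from the checkpoint's vocabulary size."""
    source = pathlib.Path(tts_path).read_text(encoding="utf-8")
    return PATCH_MARKER in source
```

Calling `is_patched()` from inside the cloned `chatterbox-finetuning` directory should return `True` after the patch and `False` on a fresh clone.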
|
|
| ## Usage |
|
|
| **Important**: Text must be prefixed with a language tag: `[bn]`, `[hi]`, `[mr]`, `[gu]`, `[ta]`, or `[te]`. |
|
|
| ```python |
| import sys |
| sys.path.insert(0, "/path/to/chatterbox-finetuning") |
| |
| from huggingface_hub import snapshot_download |
| from src.chatterbox_.tts import ChatterboxTTS |
| import torchaudio |
| |
| model_dir = snapshot_download("BosonLab/chatterbox-desi") |
| model = ChatterboxTTS.from_local(model_dir, device="cuda") |
| |
| # Bengali |
| text = "[bn] আমি বাংলায় কথা বলতে পারি। এটি একটি পরীক্ষামূলক বাক্য।" |
| wav = model.generate(text) |
| torchaudio.save("output_bn.wav", wav, model.sr) |
| |
| # Hindi |
| text = "[hi] मैं हिंदी में बोल सकता हूँ। यह एक परीक्षण वाक्य है।" |
| wav = model.generate(text) |
| torchaudio.save("output_hi.wav", wav, model.sr) |
| |
| # Marathi |
| text = "[mr] मी मराठीत बोलू शकतो. हे एक चाचणी वाक्य आहे." |
| wav = model.generate(text) |
| torchaudio.save("output_mr.wav", wav, model.sr) |
| |
| # Gujarati |
| text = "[gu] હું ગુજરાતીમાં બોલી શકું છું. આ એક પ્રાયોગિક વાક્ય છે." |
| wav = model.generate(text) |
| torchaudio.save("output_gu.wav", wav, model.sr) |
| |
| # Tamil |
| text = "[ta] நான் தமிழில் பேச முடியும். இது ஒரு சோதனை வாக்கியம்." |
| wav = model.generate(text) |
| torchaudio.save("output_ta.wav", wav, model.sr) |
| |
| # Telugu |
| text = "[te] నేను తెలుగులో మాట్లాడగలను. ఇది ఒక పరీక్ష వాక్యం." |
| wav = model.generate(text) |
| torchaudio.save("output_te.wav", wav, model.sr) |
| ``` |
|
|
| ## With Voice Cloning |
|
|
| ```python |
| wav = model.generate(text, audio_prompt_path="reference.wav") |
| ``` |
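
When the same reference voice is used for several languages, the calls above can be wrapped in a small loop. The sketch below assumes only what the usage examples show: `model.generate(text, audio_prompt_path=...)`, `model.sr`, and `torchaudio.save(path, wav, sample_rate)`. The function name `synthesize_all` and its `save` hook are hypothetical; the hook exists so the writer can be swapped out.

```python
from pathlib import Path


def synthesize_all(model, sentences, out_dir, audio_prompt_path=None, save=None):
    """Generate one WAV per language-tagged sentence.

    sentences: dict mapping a language code (e.g. "bn") to text that already
               carries its language tag (e.g. "[bn] ...").
    save:      hypothetical hook called as save(path, wav, sample_rate);
               defaults to torchaudio.save.
    """
    # Fail early if the reference clip is missing, before any generation.
    if audio_prompt_path is not None and not Path(audio_prompt_path).is_file():
        raise FileNotFoundError(f"reference audio not found: {audio_prompt_path}")
    if save is None:
        import torchaudio  # imported lazily so the helper can be used with a custom hook
        save = torchaudio.save

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for code, text in sentences.items():
        if audio_prompt_path is None:
            wav = model.generate(text)
        else:
            wav = model.generate(text, audio_prompt_path=str(audio_prompt_path))
        target = out_dir / f"output_{code}.wav"
        save(str(target), wav, model.sr)
        written.append(target)
    return written
```

With the `model` loaded as in the Usage section, `synthesize_all(model, {"bn": "[bn] ...", "hi": "[hi] ..."}, "out", audio_prompt_path="reference.wav")` would write `out/output_bn.wav` and `out/output_hi.wav` in the reference voice.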
|
|
| ## Files |
|
|
| | File | Description | |
| |------|-------------| |
| | `t3_cfg.safetensors` | Fine-tuned T3 text-to-token transformer (6 Indic langs, vocab=2820) | |
| | `s3gen.safetensors` | Speech codec decoder (unchanged from base) | |
| | `ve.safetensors` | Voice encoder (unchanged from base) | |
| | `conds.pt` | Conditioning embeddings (unchanged from base) | |
| | `tokenizer.json` | Tokenizer extended with 6 Indic scripts | |
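
A quick way to confirm that a download is complete is to compare the local directory against the file list above. This is a sketch only; `EXPECTED_FILES` and `missing_files` are illustrative names, and the check tests for presence, not integrity.

```python
from pathlib import Path

# File names taken from the table above.
EXPECTED_FILES = (
    "t3_cfg.safetensors",
    "s3gen.safetensors",
    "ve.safetensors",
    "conds.pt",
    "tokenizer.json",
)


def missing_files(model_dir):
    """Return the expected model files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

Passing the directory returned by `snapshot_download("BosonLab/chatterbox-desi")` should give an empty list when the snapshot is complete.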
|
|
| ## Training Data |
|
|
| All audio was resampled to 16 kHz. Text was cleaned, normalized, and prefixed with language tags. |
| The datasets come from the public AI4Bharat and SPRINGLab releases (CC BY 4.0). |
|
|
| ## Language Tags |
|
|
| Prefix your text with the tag for its language; the tag is required for correct language identification: |
|
|
| | Language | Tag | Script | |
| |----------|-----|--------| |
| | Bengali | `[bn]` | Bengali | |
| | Hindi | `[hi]` | Devanagari | |
| | Marathi | `[mr]` | Devanagari | |
| | Gujarati | `[gu]` | Gujarati | |
| | Tamil | `[ta]` | Tamil | |
| | Telugu | `[te]` | Telugu | |
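
Because a missing or wrong tag is easy to overlook, the prefixing can be done by a small helper. The sketch below uses the six tags from the table and the `[xx] text` layout from the usage examples; `tag_text` and `SUPPORTED_TAGS` are hypothetical names, not part of the model's API.

```python
import re

# Language codes from the table above.
SUPPORTED_TAGS = ("bn", "hi", "mr", "gu", "ta", "te")
# Matches a leading two-letter tag such as "[bn]".
_TAG_RE = re.compile(r"^\[([a-z]{2})\]\s*")


def tag_text(text, lang):
    """Return text prefixed with "[lang] ", validating the language code."""
    if lang not in SUPPORTED_TAGS:
        raise ValueError(f"unsupported language {lang!r}; expected one of {SUPPORTED_TAGS}")
    text = text.strip()
    match = _TAG_RE.match(text)
    if match:
        # Already tagged: accept a matching tag, reject a conflicting one.
        if match.group(1) != lang:
            raise ValueError(f"text is tagged [{match.group(1)}] but lang is {lang!r}")
        return text
    return f"[{lang}] {text}"
```

For example, `tag_text("नमस्ते", "hi")` produces `"[hi] नमस्ते"`, which can be passed directly to `model.generate`.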
|
|
| ## Limitations |
|
|
| - Optimized for the 6 Indic languages listed above; quality in other languages may be degraded |
| - The language tag prefix is required for correct language identification |
| - Best results come from clear, well-punctuated text |
| - Emotion control is inherited from the base ChatterBox multilingual model |
| - Requires the chatterbox-finetuning kit because of the extended vocabulary |
|
|