---
license: mit
language:
- bn
- hi
- mr
- gu
- ta
- te
base_model: ResembleAI/chatterbox
tags:
- text-to-speech
- tts
- bengali
- hindi
- marathi
- gujarati
- tamil
- telugu
- indic
- chatterbox
- fine-tuned
- zero-shot-tts
- speech
- speech-synthesis
datasets:
- ai4bharat/Shrutilipi
- ai4bharat/Rasa
- SPRINGLab/IndicTTS_Bengali
- SPRINGLab/IndicTTS_Gujarati
- SPRINGLab/IndicTTS_Marathi
- SPRINGLab/IndicTTS_Tamil
- SPRINGLab/IndicTTS_Telugu
---
# ChatterBox Desi
A fine-tuned version of [ResembleAI/chatterbox](https://huggingface.co/ResembleAI/chatterbox) for text-to-speech synthesis in **6 Indic languages**: Bengali, Hindi, Marathi, Gujarati, Tamil, and Telugu.
## Zero-shot TTS Output
| Language | Reference | Output | Text |
|--|--|--|--|
| Bengali (বাংলা) |<audio controls src="https://huggingface.co/BosonLab/chatterbox-desi/resolve/main/audios/refs/female_shadowheart.flac" style="width: 100px;"></audio>|<audio controls src="https://huggingface.co/BosonLab/chatterbox-desi/resolve/main/audios/output/female_bn.wav" style="width: 100px;"></audio>|আমরা কেউ মাষ্টার হতে চেয়েছিলাম, কেউ ডাক্তার, কেউ উকিল। অমলকান্তি সে-সব কিছু হতে চায়নি। সে রোদ্দুর হতে চেয়েছিল!|
| Bengali (বাংলা) |<audio controls src="https://huggingface.co/BosonLab/chatterbox-desi/resolve/main/audios/refs/male_stewie.wav" style="width: 100px;"></audio>|<audio controls src="https://huggingface.co/BosonLab/chatterbox-desi/resolve/main/audios/output/male_bn.wav" style="width: 100px;"></audio>|আমরা কেউ মাষ্টার হতে চেয়েছিলাম, কেউ ডাক্তার, কেউ উকিল। অমলকান্তি সে-সব কিছু হতে চায়নি। সে রোদ্দুর হতে চেয়েছিল!|
| Hindi (हिंदी) |<audio controls src="https://huggingface.co/BosonLab/chatterbox-desi/resolve/main/audios/refs/female_shadowheart.flac" style="width: 100px;"></audio>|<audio controls src="https://huggingface.co/BosonLab/chatterbox-desi/resolve/main/audios/output/female_hi.wav" style="width: 100px;"></audio>|हम में से कुछ मास्टर बनना चाहते थे, कुछ डॉक्टर, कुछ वकील। अमलकांति उन सब कुछ बनना नहीं चाहता था। वह धूप बनना चाहता था!|
| Hindi (हिंदी) |<audio controls src="https://huggingface.co/BosonLab/chatterbox-desi/resolve/main/audios/refs/male_stewie.wav" style="width: 100px;"></audio>|<audio controls src="https://huggingface.co/BosonLab/chatterbox-desi/resolve/main/audios/output/male_hi.wav" style="width: 100px;"></audio>|हम में से कुछ मास्टर बनना चाहते थे, कुछ डॉक्टर, कुछ वकील। अमलकांति उन सब कुछ बनना नहीं चाहता था। वह धूप बनना चाहता था!|
| Marathi (मराठी) |<audio controls src="https://huggingface.co/BosonLab/chatterbox-desi/resolve/main/audios/refs/female_shadowheart.flac" style="width: 100px;"></audio>|<audio controls src="https://huggingface.co/BosonLab/chatterbox-desi/resolve/main/audios/output/female_mr.wav" style="width: 100px;"></audio>|आम्ही कोणीतरी मास्टर होऊ इच्छित होतो, कोणीतरी डॉक्टर, कोणीतरी वकील. अमलकांती त्या सगळ्या काही होऊ इच्छित नव्हता. तो सूर्य होऊ इच्छित होता!|
| Marathi (मराठी) |<audio controls src="https://huggingface.co/BosonLab/chatterbox-desi/resolve/main/audios/refs/male_stewie.wav" style="width: 100px;"></audio>|<audio controls src="https://huggingface.co/BosonLab/chatterbox-desi/resolve/main/audios/output/male_mr.wav" style="width: 100px;"></audio>|आम्ही कोणीतरी मास्टर होऊ इच्छित होतो, कोणीतरी डॉक्टर, कोणीतरी वकील. अमलकांती त्या सगळ्या काही होऊ इच्छित नव्हता. तो सूर्य होऊ इच्छित होता!|
| Gujarati (ગુજરાતી) |<audio controls src="https://huggingface.co/BosonLab/chatterbox-desi/resolve/main/audios/refs/female_shadowheart.flac" style="width: 100px;"></audio>|<audio controls src="https://huggingface.co/BosonLab/chatterbox-desi/resolve/main/audios/output/female_gu.wav" style="width: 100px;"></audio>|અમલકાંતિ તે બધું બનવું નથી માંગતો હતો. તે ધૂપ બનવું માંગતો હતો!|
| Gujarati (ગુજરાતી) |<audio controls src="https://huggingface.co/BosonLab/chatterbox-desi/resolve/main/audios/refs/male_stewie.wav" style="width: 100px;"></audio>|<audio controls src="https://huggingface.co/BosonLab/chatterbox-desi/resolve/main/audios/output/male_gu.wav" style="width: 100px;"></audio>|અમલકાંતિ તે બધું બનવું નથી માંગતો હતો. તે ધૂપ બનવું માંગતો હતો!|
| Tamil (தமிழ்) |<audio controls src="https://huggingface.co/BosonLab/chatterbox-desi/resolve/main/audios/refs/female_shadowheart.flac" style="width: 100px;"></audio>|<audio controls src="https://huggingface.co/BosonLab/chatterbox-desi/resolve/main/audios/output/female_ta.wav" style="width: 100px;"></audio>|நாங்கள் யாரும் மாஸ்டர் ஆக விரும்பவில்லை, யாரும் டாக்டர் ஆக விரும்பவில்லை, யாரும் வக்கீல் ஆக விரும்பவில்லை. அமல்காந்தி அந்த எல்லாவற்றையும் ஆக விரும்பவில்லை. அவன் வெயிலாக இருக்க விரும்பினான்!|
| Tamil (தமிழ்) |<audio controls src="https://huggingface.co/BosonLab/chatterbox-desi/resolve/main/audios/refs/male_stewie.wav" style="width: 100px;"></audio>|<audio controls src="https://huggingface.co/BosonLab/chatterbox-desi/resolve/main/audios/output/male_ta.wav" style="width: 100px;"></audio>|நாங்கள் யாரும் மாஸ்டர் ஆக விரும்பவில்லை, யாரும் டாக்டர் ஆக விரும்பவில்லை, யாரும் வக்கீல் ஆக விரும்பவில்லை. அமல்காந்தி அந்த எல்லாவற்றையும் ஆக விரும்பவில்லை. அவன் வெயிலாக இருக்க விரும்பினான்!|
| Telugu (తెలుగు) |<audio controls src="https://huggingface.co/BosonLab/chatterbox-desi/resolve/main/audios/refs/female_shadowheart.flac" style="width: 100px;"></audio>|<audio controls src="https://huggingface.co/BosonLab/chatterbox-desi/resolve/main/audios/output/female_te.wav" style="width: 100px;"></audio>|మనలో కొందరు మాస్టర్ కావాలని కోరుకున్నారు, కొందరు డాక్టర్ కావాలని కోరుకున్నారు, కొందరు వకీల్ కావాలని కోరుకున్నారు. అమల్కాంతి ఆ అన్ని కావాలని కోరుకోలేదు. అతను సూర్యుడిగా ఉండాలని కోరుకున్నాడు!|
| Telugu (తెలుగు) |<audio controls src="https://huggingface.co/BosonLab/chatterbox-desi/resolve/main/audios/refs/male_stewie.wav" style="width: 100px;"></audio>|<audio controls src="https://huggingface.co/BosonLab/chatterbox-desi/resolve/main/audios/output/male_te.wav" style="width: 100px;"></audio>|మనలో కొందరు మాస్టర్ కావాలని కోరుకున్నారు, కొందరు డాక్టర్ కావాలని కోరుకున్నారు, కొందరు వకీల్ కావాలని కోరుకున్నారు. అమల్కాంతి ఆ అన్ని కావాలని కోరుకోలేదు. అతను సూర్యుడిగా ఉండాలని కోరుకున్నాడు!|
## Model Details
- **Base model**: ResembleAI/chatterbox, the multilingual ChatterBox (supports 23 languages)
- **Fine-tuned on**: a 6-language Indic speech corpus (~424 hours, 216,819 samples) drawn from:
  - ai4bharat/Shrutilipi (Bengali, Hindi splits)
  - ai4bharat/Rasa (Bengali, Hindi, Marathi, Gujarati, Tamil, Telugu splits)
  - SPRINGLab/IndicTTS (Bengali, Gujarati, Marathi, Tamil, Telugu)
- **Training steps**: 10,000
- **Architecture**: T3 (Text-to-Token Transformer) + HiFT-GAN vocoder
- **Vocabulary**: Extended from 2,530 → 2,820 tokens to cover all 6 Indic scripts
- **Language tagging**: Text must be prefixed with a language tag (e.g. `[bn]`, `[hi]`, `[mr]`, `[gu]`, `[ta]`, `[te]`)
### Training Data by Language
| Language | Code | Samples | Hours |
|----------|------|---------|-------|
| Bengali | bn | 58,820 | 99.95 |
| Gujarati | gu | 32,604 | 73.17 |
| Hindi | hi | 12,116 | 21.55 |
| Marathi | mr | 37,899 | 72.70 |
| Tamil | ta | 39,437 | 72.74 |
| Telugu | te | 35,943 | 84.04 |
| **Total**| | **216,819** | **424.15** |
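The balance of the training mix can be read off the table with a little arithmetic. The sketch below copies the sample counts and hours from the table and derives each language's share of total hours and its mean clip length; the names `TRAINING_DATA` and `summarize` are illustrative and not part of the model or the fine-tuning kit.

```python
# Per-language (samples, hours), copied from the table above.
TRAINING_DATA = {
    "bn": (58_820, 99.95),
    "gu": (32_604, 73.17),
    "hi": (12_116, 21.55),
    "mr": (37_899, 72.70),
    "ta": (39_437, 72.74),
    "te": (35_943, 84.04),
}


def summarize(data):
    """Return each language's share of total hours (%) and mean clip length (s)."""
    total_hours = sum(hours for _, hours in data.values())
    summary = {}
    for code, (samples, hours) in data.items():
        summary[code] = {
            "share_pct": 100.0 * hours / total_hours,
            "mean_clip_s": hours * 3600.0 / samples,
        }
    return summary


if __name__ == "__main__":
    for code, row in summarize(TRAINING_DATA).items():
        print(f"{code}: {row['share_pct']:.1f}% of hours, "
              f"{row['mean_clip_s']:.1f} s mean clip length")
```

By these figures Hindi contributes roughly 5% of the hours, and mean clip length falls between about 6 and 8.5 seconds for every language.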
## Requirements
```bash
git clone https://github.com/gokhaneraslan/chatterbox-finetuning
cd chatterbox-finetuning
pip install -r requirements.txt
```
### One-time patch (upstream vocab resize fix)
The upstream `chatterbox-finetuning` repo initialises T3 with a hard-coded 704-token vocabulary, which causes a size mismatch when loading this model (vocab=2820). Apply this small patch once before running inference; it makes the loader build T3 with the vocabulary size read from the checkpoint:
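```bash
# Run from inside the cloned chatterbox-finetuning directory
python - <<'EOF'
import pathlib
import re

f = pathlib.Path("src/chatterbox_/tts.py")
txt = f.read_text()

# Match the two original lines at whatever indentation tts.py uses.
pattern = re.compile(
    r'^([ \t]*)t3 = T3\(\)\n[ \t]*t3_state = load_file\(ckpt_dir / "t3_cfg\.safetensors"\)',
    re.MULTILINE,
)

def patched(m):
    # Load the checkpoint first, then size T3's text vocabulary from it.
    ind = m.group(1)
    return (
        f'{ind}t3_state = load_file(ckpt_dir / "t3_cfg.safetensors")\n'
        f'{ind}from .models.t3.modules.t3_config import T3Config\n'
        f'{ind}t3 = T3(hp=T3Config(text_tokens_dict_size=t3_state["text_emb.weight"].shape[0]))'
    )

new_txt, n = pattern.subn(patched, txt)
if n == 0:
    raise SystemExit("Nothing patched: tts.py is already patched or differs from the expected upstream code")
f.write_text(new_txt)
print("Patched tts.py")
EOF
```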
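To confirm the vocabulary size that the patched loader will pick up, the checkpoint header can be inspected without loading any weights. This is a minimal sketch using only the standard library. It relies on the safetensors file layout (an 8-byte little-endian header length followed by a JSON header) and assumes, as the patch above does, that the text embedding is stored under `text_emb.weight`. The helper names `safetensors_shape` and `text_vocab_size` are illustrative.

```python
import json
import struct
from pathlib import Path


def safetensors_shape(path, tensor_name):
    """Read one tensor's shape from a .safetensors header without loading weights."""
    with Path(path).open("rb") as fh:
        # The file starts with an unsigned 64-bit little-endian header length,
        # followed by that many bytes of UTF-8 JSON metadata.
        (header_len,) = struct.unpack("<Q", fh.read(8))
        header = json.loads(fh.read(header_len).decode("utf-8"))
    if tensor_name not in header:
        raise KeyError(f"{tensor_name!r} not found in {path}")
    return tuple(header[tensor_name]["shape"])


def text_vocab_size(model_dir):
    """Rows of the text embedding matrix, i.e. the text vocabulary size."""
    shape = safetensors_shape(Path(model_dir) / "t3_cfg.safetensors", "text_emb.weight")
    return shape[0]
```

Calling `text_vocab_size(model_dir)` on the downloaded snapshot should give the 2820 quoted above; a different value would suggest the wrong checkpoint is being read.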
## Usage
**Important**: Text must be prefixed with a language tag: `[bn]`, `[hi]`, `[mr]`, `[gu]`, `[ta]`, or `[te]`.
```python
import sys
sys.path.insert(0, "/path/to/chatterbox-finetuning")
from huggingface_hub import snapshot_download
from src.chatterbox_.tts import ChatterboxTTS
import torchaudio
model_dir = snapshot_download("BosonLab/chatterbox-desi")
model = ChatterboxTTS.from_local(model_dir, device="cuda")
# Bengali
text = "[bn] আমি বাংলায় কথা বলতে পারি। এটি একটি পরীক্ষামূলক বাক্য।"
wav = model.generate(text)
torchaudio.save("output_bn.wav", wav, model.sr)
# Hindi
text = "[hi] मैं हिंदी में बोल सकता हूँ। यह एक परीक्षण वाक्य है।"
wav = model.generate(text)
torchaudio.save("output_hi.wav", wav, model.sr)
# Marathi
text = "[mr] मी मराठीत बोलू शकतो. हे एक चाचणी वाक्य आहे."
wav = model.generate(text)
torchaudio.save("output_mr.wav", wav, model.sr)
# Gujarati
text = "[gu] હું ગુજરાતીમાં બોલી શકું છું. આ એક પ્રાયોગિક વાક્ય છે."
wav = model.generate(text)
torchaudio.save("output_gu.wav", wav, model.sr)
# Tamil
text = "[ta] நான் தமிழில் பேச முடியும். இது ஒரு சோதனை வாக்கியம்."
wav = model.generate(text)
torchaudio.save("output_ta.wav", wav, model.sr)
# Telugu
text = "[te] నేను తెలుగులో మాట్లాడగలను. ఇది ఒక పరీక్ష వాక్యం."
wav = model.generate(text)
torchaudio.save("output_te.wav", wav, model.sr)
```
## With Voice Cloning
```python
wav = model.generate(text, audio_prompt_path="reference.wav")
```
## Files
| File | Description |
|------|-------------|
| `t3_cfg.safetensors` | Fine-tuned T3 text-to-token transformer (6 Indic langs, vocab=2820) |
| `s3gen.safetensors` | Speech codec decoder (unchanged from base) |
| `ve.safetensors` | Voice encoder (unchanged from base) |
| `conds.pt` | Conditioning embeddings (unchanged from base) |
| `tokenizer.json` | Tokenizer extended with 6 Indic scripts |
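Before loading the model, it can help to check that the downloaded directory really contains every file in the table. The following sketch does only that; `EXPECTED_FILES` and `missing_files` are illustrative names, and the check is based on file names alone, not on file contents.

```python
from pathlib import Path

# File names taken from the table above.
EXPECTED_FILES = (
    "t3_cfg.safetensors",
    "s3gen.safetensors",
    "ve.safetensors",
    "conds.pt",
    "tokenizer.json",
)


def missing_files(model_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]
```

An empty list from `missing_files(model_dir)` means all five files are present; any names returned point to an incomplete download.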
## Training Data
All audio was resampled to 16 kHz. Text was cleaned, normalized, and prefixed with language tags.
The data comes from public AI4Bharat and SPRINGLab datasets (CC BY 4.0).
## Language Tags
Prefix every input text with the tag for its language; the model relies on it to identify the language:
| Language | Tag | Script |
|----------|-----|--------|
| Bengali | `[bn]` | Bengali |
| Hindi | `[hi]` | Devanagari |
| Marathi | `[mr]` | Devanagari |
| Gujarati | `[gu]` | Gujarati |
| Tamil | `[ta]` | Tamil |
| Telugu | `[te]` | Telugu |
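Since a missing or mismatched tag is an easy mistake to make, a small helper can apply the tag and reject unsupported codes. This is a sketch only: `LANGUAGE_TAGS` mirrors the table above, and `with_language_tag` is a hypothetical convenience function, not part of the model or the fine-tuning kit.

```python
# Language code -> tag, mirroring the table above.
LANGUAGE_TAGS = {
    "bn": "[bn]",
    "hi": "[hi]",
    "mr": "[mr]",
    "gu": "[gu]",
    "ta": "[ta]",
    "te": "[te]",
}


def with_language_tag(text, lang):
    """Prefix text with its language tag, leaving correctly tagged text unchanged."""
    if lang not in LANGUAGE_TAGS:
        raise ValueError(f"Unsupported language code: {lang!r}")
    tag = LANGUAGE_TAGS[lang]
    stripped = text.strip()
    if not stripped:
        raise ValueError("Text is empty")
    if stripped.startswith(tag):
        return stripped
    # Refuse text that already carries a different supported tag.
    for other in LANGUAGE_TAGS.values():
        if stripped.startswith(other):
            raise ValueError(f"Text is tagged {other}, expected {tag}")
    return f"{tag} {stripped}"
```

For example, `with_language_tag("नमस्ते", "hi")` produces `"[hi] नमस्ते"`, which can be passed straight to `model.generate`.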
## Limitations
- Optimized for the 6 Indic languages listed above; quality in other languages may degrade
- A language tag prefix is required for correct language identification
- Works best with clear, well-punctuated text
- Emotion control is inherited from the base ChatterBox multilingual model
- Requires the chatterbox-finetuning kit because of the extended vocabulary
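Because every input needs a tag and punctuation matters, one way to handle longer passages is to split them at sentence boundaries and tag each sentence separately. The sketch below is an assumption about a convenient preprocessing step, not something this model card prescribes. It splits on `.`, `!`, `?` and the danda (`।`) used in Bengali and Hindi, and `split_tagged_sentences` is a hypothetical helper name.

```python
import re

# Split after ASCII sentence enders or the danda (U+0964), consuming the whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?\u0964])\s+")


def split_tagged_sentences(text, tag):
    """Split untagged text into sentences and prefix each one with the language tag."""
    sentences = [s.strip() for s in _SENTENCE_END.split(text.strip())]
    return [f"{tag} {s}" for s in sentences if s]
```

Each returned string can then be synthesized on its own. Whether per-sentence generation sounds better than a single call for a given passage is something to check by listening.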