---
license: apache-2.0
language:
- mr
library_name: kokoro
pipeline_tag: text-to-speech
base_model: hexgrad/Kokoro-82M
base_model_relation: finetune
datasets:
- ai4bharat/Rasa
- ai4bharat/indicvoices_r
- SPRINGLab/IndicTTS_Marathi
tags:
- text-to-speech
- tts
- kokoro
- marathi
- minglish
- indic
- styletts2
- bol-tts
---
# bol-tts-marathi — Kokoro-82M fine-tuned for Marathi
A Marathi (मराठी) text-to-speech fine-tune of [hexgrad/Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M), trained with the [semidark/kokoro-deutsch](https://github.com/semidark/kokoro-deutsch) recipe. Handles pure Marathi and **Minglish** (Marathi + English code-switching) via a client-side Devanagari-transliteration preprocessor.
- **Architecture:** StyleTTS2 acoustic model + ISTFTNet decoder (Kokoro-82M, unchanged)
- **Parameters:** 81.76 M
- **Sample rate:** 24 kHz
- **Voices:** 25 (4 Marathi-trained + 19 stock-Kokoro crossovers + 2 synthetic) — see voice catalog below
- **Live demo:** [shreyask/bol-tts-marathi](https://huggingface.co/spaces/shreyask/bol-tts-marathi) (in-browser via WebGPU)
- **Write-up:** [kshreyas.dev/post/bol-tts-marathi](https://kshreyas.dev/post/bol-tts-marathi/) — full design + debugging story with audio samples
- **Code:** [github.com/shreyaskarnik/bol-tts-marathi](https://github.com/shreyaskarnik/bol-tts-marathi)
- **ONNX export:** [shreyask/bol-tts-marathi-onnx](https://huggingface.co/shreyask/bol-tts-marathi-onnx)
## Voice catalog (25 voices)
### Marathi-trained (4)
| ID | Display | Source | Default speed |
|---|---|---|---|
| `mf_asha` | Asha (आशा) | [Rasa](https://huggingface.co/datasets/ai4bharat/Rasa) `marathi_female` | 1.00× |
| `mm_vivek` | Vivek (विवेक) | Rasa `marathi_male` | 1.00× |
| `mf_mukta` | Mukta (मुक्ता) | [SPRINGLab](https://huggingface.co/datasets/SPRINGLab/IndicTTS_Marathi) female | 0.80× |
| `mm_dnyanesh` | Dnyanesh (ज्ञानेश) | SPRINGLab male | 0.80× |
### Stock-Kokoro crossovers (19)
Stock voicepacks from [hexgrad/kokoro.js](https://github.com/hexgrad/kokoro.js) are used as `ref_s` with this fine-tune. Because v0.2 is a continuation fine-tune, its encoder latent space stays close enough to stock Kokoro's that stock voicepacks plug in directly. Voicepacks were pre-screened with a peak-amplitude check (`peak < 0.95`) to filter out ones that clip.
| ID | Display | Source language |
|---|---|---|
| `af_heart` | Svara (स्वरा) | US English F |
| `af_alloy` | Anvita (अन्विता) | US English F |
| `af_aoede` | Sanika (सानिका) | US English F |
| `af_bella` | Naina (नैना) | US English F |
| `af_jessica` | Ishani (ईशानी) | US English F |
| `af_nova` | Tara (तारा) | US English F |
| `af_sarah` | Kavya (काव्या) | US English F |
| `af_sky` | Akasha (आकाशा) | US English F |
| `am_liam` | Atharv (अथर्व) | US English M |
| `bf_isabella` | Ira (इरा) | UK English F |
| `bm_fable` | Aaryan (आर्यन) | UK English M |
| `ff_siwis` | Esha (ईशा) | French F |
| `hm_omega` | Vihaan (विहान) | Hindi M |
| `im_nicola` | Niraj (निरज) | Italian M |
| `pf_dora` | Rhea (रिया) | Portuguese F |
| `zf_xiaoni` | Nyra (नयरा) | Mandarin F |
| `zf_xiaoxiao` | Pari (परी) | Mandarin F (kid) |
| `zf_xiaoyi` | Vir (वीर) | Mandarin F (perceived M kid) |
| `zm_yunyang` | Aakash (आकाश) | Mandarin M |
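The `peak < 0.95` pre-screen can be sketched as follows. This is a minimal sketch, assuming each candidate voicepack has already been used to render a test utterance and the waveform (float samples, full scale 1.0) is available as a NumPy array. `screen_voices` and the test-render step are hypothetical, not the repo's screening script.

```python
import numpy as np

PEAK_THRESHOLD = 0.95  # threshold from the model card; 1.0 is full scale


def peak_amplitude(audio: np.ndarray) -> float:
    """Largest absolute sample value of a float waveform."""
    return float(np.max(np.abs(audio)))


def screen_voices(renders: dict, threshold: float = PEAK_THRESHOLD) -> list:
    """Keep voice IDs whose rendered test audio stays below the peak threshold.

    `renders` maps voice ID -> waveform rendered with that voicepack
    (hypothetical input; how the test sentence is rendered is up to you).
    """
    return sorted(vid for vid, audio in renders.items() if peak_amplitude(audio) < threshold)
```

Checking the rendered output rather than the voicepack itself matters here, since clipping is a property of the audio the decoder produces with that `ref_s`.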
### Synthetic — generated arithmetically with no reference audio (2)
| ID | Display | Recipe |
|---|---|---|
| `syn_sama` | Sama (समा) | Centroid (mean) of 5 modern English female voicepacks |
| `syn_navya` | Navya (नव्या) | Centroid + per-position Gaussian noise (1σ) |
The voicepack tensor `[510, 1, 256]` is a plain embedding — it can be constructed by averaging existing voicepacks, sampling near the centroid, or interpolating. See [voicepack zoo](https://github.com/shreyaskarnik/bol-tts-marathi#synthetic-voicepacks) in the repo for recipes.
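Because a voicepack is just a `[510, 1, 256]` array, the synthetic recipes above reduce to elementwise arithmetic. Below is a minimal NumPy sketch, assuming the source voicepacks were converted with `tensor.numpy()`. It interprets "1σ" as the per-position standard deviation across the source voicepacks, which is an assumption, not the repo's documented recipe. The helper names are hypothetical.

```python
import numpy as np

VOICEPACK_SHAPE = (510, 1, 256)  # shape stated in the model card


def centroid(packs: list) -> np.ndarray:
    """Elementwise mean of several voicepacks (a `syn_sama`-style centroid)."""
    stacked = np.stack(packs).astype(np.float32)  # (n, 510, 1, 256)
    return stacked.mean(axis=0)


def centroid_plus_noise(packs: list, scale: float = 1.0, seed: int = 0) -> np.ndarray:
    """Centroid plus per-position Gaussian noise.

    Assumption: the noise std at each position is the spread (std) of the
    source voicepacks at that position, multiplied by `scale`.
    """
    stacked = np.stack(packs).astype(np.float32)
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(stacked.shape[1:]).astype(np.float32)
    return stacked.mean(axis=0) + scale * stacked.std(axis=0) * noise


def interpolate(a: np.ndarray, b: np.ndarray, t: float) -> np.ndarray:
    """Linear blend between two voicepacks: t=0 gives `a`, t=1 gives `b`."""
    return ((1.0 - t) * a + t * b).astype(np.float32)
```

Convert the result back with `torch.from_numpy(pack)` (it stays float32) before passing it to the pipeline as `voice`.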
## Usage
```python
import torch, soundfile as sf
from huggingface_hub import hf_hub_download
from kokoro import KModel, KPipeline
import kokoro.pipeline as _kp

_kp.LANG_CODES["m"] = "mr"  # monkey-patch: map lang code "m" to espeak-ng's Marathi ("mr")

REPO = "shreyask/bol-tts-marathi"
# KModel treats `config` and `model` as local file paths, so fetch them from the Hub first.
kmodel = KModel(
    repo_id=REPO,
    config=hf_hub_download(REPO, "config.json"),
    model=hf_hub_download(REPO, "kokoro-mr-v0_2.pth"),
).eval()  # inference mode
pipeline = KPipeline(lang_code="m", repo_id=REPO, model=kmodel)
voice = torch.load(hf_hub_download(REPO, "voices/mf_asha.pt"), map_location="cpu", weights_only=True)

text = "नमस्कार, मी मराठी बोलतो."
chunks = []
for _gs, _ps, audio in pipeline(text, voice=voice, speed=1.0):
    chunks.append(audio)  # one waveform tensor per text chunk
sf.write("out.wav", torch.cat(chunks).numpy(), 24000)  # 24 kHz output
```
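The usage example hardcodes `speed=1.0`, but the voice catalog lists 0.80× as the default for the SPRINGLab voices. A small lookup keeps those defaults in one place. `DEFAULT_SPEED` and `default_speed` are hypothetical helpers, not part of the kokoro package, and falling back to 1.0 for voices without a listed default is an assumption.

```python
# Default speeds copied from the "Marathi-trained" table in the voice catalog.
# Assumption: voices with no listed default (crossovers, synthetic) use 1.0.
DEFAULT_SPEED = {
    "mf_asha": 1.00,
    "mm_vivek": 1.00,
    "mf_mukta": 0.80,
    "mm_dnyanesh": 0.80,
}


def default_speed(voice_id: str) -> float:
    """Return the catalog's default speed for a voice ID, falling back to 1.0."""
    return DEFAULT_SPEED.get(voice_id, 1.0)
```

For example, `pipeline(text, voice=voice, speed=default_speed("mf_mukta"))` renders Mukta at 0.80×.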
### Minglish (loanword) handling
For Marathi mixed with English (`"Friday ला Zomato वर dinner order करूया का?"`), use the loanword preprocessor first to transliterate Latin tokens to Devanagari before phonemization:
```python
from preprocess_loanwords import preprocess
text = preprocess("Friday ला Zomato वर dinner order करूया का?")
# → "फ्रायडे ला झोमॅटो वर डिनर ऑर्डर करूया का?"
# Then feed to the pipeline as usual.
```
Source + ~19,500-entry lookup table: [scripts/preprocess_loanwords.py](https://github.com/shreyaskarnik/bol-tts-marathi/blob/main/scripts/preprocess_loanwords.py) and [data/loanword_map.json](https://github.com/shreyaskarnik/bol-tts-marathi/blob/main/data/loanword_map.json).
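The core lookup step can be sketched as a regex pass over Latin-script tokens. This is a minimal sketch, assuming the JSON map is a flat `{lowercase English word: Devanagari spelling}` dictionary. The real file format and tokenization rules in `preprocess_loanwords.py` may differ, and `transliterate_loanwords` is a hypothetical name.

```python
import re

LATIN_TOKEN = re.compile(r"[A-Za-z]+")  # runs of Latin letters; Devanagari is untouched


def transliterate_loanwords(text: str, loanword_map: dict) -> str:
    """Replace Latin-script tokens with Devanagari spellings from a lookup table.

    Lookup is case-insensitive. Tokens missing from the map are left unchanged,
    matching the fall-through behavior described under Limitations.
    """
    def replace(match):
        word = match.group(0)
        return loanword_map.get(word.lower(), word)

    return LATIN_TOKEN.sub(replace, text)
```

Loading the table would then be `json.load(open("data/loanword_map.json", encoding="utf-8"))`, under the same format assumption.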
### Per-phoneme timestamps
Kokoro predicts per-phoneme durations. `KModel.forward_with_tokens` returns `(audio, pred_dur)`. `pred_dur` is in **predictor frames** where 1 frame = 600 audio samples at 24 kHz (the prosody predictor runs at half the mel-frame rate; the decoder upsamples 2× before iSTFT):
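```python
# Reuses kmodel, pipeline, voice, text from the Usage example above.
_gs, ps, _ = next(pipeline(text, voice=voice))  # phoneme string of the first chunk
input_ids = torch.LongTensor([[0, *(kmodel.vocab[p] for p in ps if p in kmodel.vocab), 0]])  # 0 = boundary pad
audio, pred_dur = kmodel.forward_with_tokens(input_ids, voice[len(ps) - 1], speed=1.0)
durations_sec = pred_dur.squeeze().cpu().numpy() * 600 / 24000  # 600 samples per frame at 24 kHz
starts = durations_sec.cumsum() - durations_sec
# Entries 0 and -1 are the boundary pad tokens, so phoneme[i] spans
# (starts[i + 1], starts[i + 1] + durations_sec[i + 1]).
```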
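To turn raw durations into labeled spans, pair each phoneme with its duration and skip the boundary tokens. Kokoro's `KModel.forward` wraps the token ids in pad id 0 on both sides, so `pred_dur` has two more entries than phonemes. This minimal sketch assumes that layout and assumes no phonemes were dropped as out-of-vocabulary; `phoneme_spans` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple

SAMPLES_PER_FRAME = 600  # from the model card: 1 predictor frame = 600 samples
SAMPLE_RATE = 24_000


def phoneme_spans(phonemes: Sequence[str], pred_dur: Sequence[int]) -> List[Tuple[str, float, float]]:
    """Map per-token durations (incl. leading/trailing pad) to (phoneme, start_s, end_s)."""
    if len(pred_dur) != len(phonemes) + 2:
        raise ValueError("expected one duration per phoneme plus two boundary pad tokens")
    sec_per_frame = SAMPLES_PER_FRAME / SAMPLE_RATE  # 0.025 s per frame
    t = pred_dur[0] * sec_per_frame  # time consumed by the leading pad token
    spans = []
    for ph, d in zip(phonemes, pred_dur[1:-1]):
        end = t + d * sec_per_frame
        spans.append((ph, t, end))
        t = end
    return spans
```

With the objects from the example above: `phoneme_spans(ps, pred_dur.squeeze().tolist())`.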
## Training
| Phase | Details |
|---|---|
| Base | `hexgrad/Kokoro-82M` |
| Stage 1 | 10 epochs, bs=12, fp32, ~9h on A100 SXM 80GB. Final val_loss ≈ 0.23 |
| Stage 2 | 10 epochs, bs=8, `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True`, ~13h |
| Train utts | 24,676 (95/5 split) |
| Speakers | 331 (2 Rasa + 329 IndicVoices-R) + SPRINGLab IndicTTS-Marathi (single F + single M) |
| Vocab change | `ɭ` (U+026D, retroflex lateral) at Kokoro slot 144 — Marathi-specific phoneme that Hindi doesn't have |
Full methodology: [TRAINING_GUIDE.md](https://github.com/shreyaskarnik/bol-tts-marathi/blob/main/docs/TRAINING_GUIDE.md).
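Adding `ɭ` amounts to assigning a new phoneme symbol to an unused token id that still fits inside the model's text-embedding table (sized by `n_token`). A minimal sketch, assuming the `vocab` (phoneme → id) and `n_token` fields of Kokoro's `config.json`; `add_phoneme` is a hypothetical helper, not the repo's code.

```python
def add_phoneme(vocab: dict, symbol: str, slot: int, n_token: int) -> dict:
    """Return a copy of a phoneme->id vocab with `symbol` mapped to `slot`.

    Refuses to overwrite an existing symbol or reuse an occupied id, and requires
    the id to fit inside the embedding table (id 0 is the pad/boundary token).
    """
    if symbol in vocab:
        raise ValueError(f"{symbol!r} already has id {vocab[symbol]}")
    if slot in vocab.values():
        raise ValueError(f"id {slot} is already used")
    if not 0 < slot < n_token:
        raise ValueError(f"id {slot} is outside 1..{n_token - 1}")
    updated = dict(vocab)
    updated[symbol] = slot
    return updated
```

For this model the call would be `add_phoneme(config["vocab"], "ɭ", 144, config["n_token"])`, with `config` loaded from `config.json`.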
## Datasets
- **[AI4Bharat/Rasa](https://huggingface.co/datasets/ai4bharat/Rasa)** (CC-BY-4.0) — Marathi, 13,900 studio-quality utts, 2 speakers.
- **[AI4Bharat/IndicVoices-R](https://huggingface.co/datasets/ai4bharat/indicvoices_r)** (CC-BY-4.0, gated) — Marathi, ~11,910 utts, 329 speakers after filtering.
- **[SPRINGLab/IndicTTS-Marathi](https://huggingface.co/datasets/SPRINGLab/IndicTTS_Marathi)** (IITM EULA, commercial-OK) — single female + single male speaker, used for Mukta + Dnyanesh.
## Limitations
- **Pure-English sentences:** the decoder hallucinates Marathi acoustics when the input has no Devanagari context. The Minglish trick handles mixed input via Devanagari transliteration; pure English needs a different fallback.
- **Long-tail loanwords:** the ~19,500-entry map covers high-frequency English words in Indian usage; rarer words fall through to espeak-mr unchanged.
- **Decoder English-leakage is accidental, not designed:** v0.2's decoder happens to render `/ɟʰ/` (Devanagari झ) with an English-flavored `/z/` quality, so "amazing" transliterated as अमेझिंग still sounds like English "amazing". The follow-up v0.5 retraining lost this property by being more correctly Marathi; v0.6 is planned to preserve the leakage deliberately.
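One way to act on the first limitation is to check for Devanagari before synthesis and send Latin-only input elsewhere. This is a minimal sketch: the route labels are hypothetical, and what the English fallback should be (for example a stock English Kokoro pipeline) is an assumption the model card leaves open. Minglish input passes the check because its Marathi words are already in Devanagari.

```python
import re

# Devanagari Unicode block: U+0900 to U+097F.
DEVANAGARI = re.compile("[\u0900-\u097F]")


def has_devanagari(text: str) -> bool:
    """True if the text contains at least one Devanagari character."""
    return DEVANAGARI.search(text) is not None


def route(text: str) -> str:
    """Return 'marathi' for text with any Devanagari, else 'english_fallback'."""
    return "marathi" if has_devanagari(text) else "english_fallback"
```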
## License
Apache 2.0. Training data under their respective licenses (Rasa CC-BY-4.0, IndicVoices-R CC-BY-4.0, SPRINGLab IITM EULA).
## Citation
```bibtex
@software{bol_tts_marathi_2026,
title={bol-tts-marathi: Kokoro-82M fine-tuned for Marathi},
author={Karnik, Shreyas},
year={2026},
url={https://github.com/shreyaskarnik/bol-tts-marathi},
license={Apache-2.0}
}
@software{kokoro_2025,
title={Kokoro-82M},
author={hexgrad},
year={2025},
url={https://github.com/hexgrad/kokoro}
}
@software{kokoro_deutsch_2026,
title={kokoro-deutsch},
author={semidark},
year={2026},
url={https://github.com/semidark/kokoro-deutsch}
}
```