	---
	license: apache-2.0
	language:
	- mr
	library_name: kokoro
	pipeline_tag: text-to-speech
	base_model: hexgrad/Kokoro-82M
	base_model_relation: finetune
	datasets:
	- ai4bharat/Rasa
	- ai4bharat/indicvoices_r
	- SPRINGLab/IndicTTS_Marathi
	tags:
	- text-to-speech
	- tts
	- kokoro
	- marathi
	- minglish
	- indic
	- styletts2
	- bol-tts
	---

	# bol-tts-marathi — Kokoro-82M fine-tuned for Marathi

	A Marathi (मराठी) text-to-speech fine-tune of [hexgrad/Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M), trained with the [semidark/kokoro-deutsch](https://github.com/semidark/kokoro-deutsch) recipe. Handles pure Marathi and Minglish (Marathi + English code-switching) via a client-side Devanagari-transliteration preprocessor.

	- Architecture: StyleTTS2 acoustic model + ISTFTNet decoder (Kokoro-82M, unchanged)
	- Parameters: 81.76 M
	- Sample rate: 24 kHz
	- Voices: 25 (4 Marathi-trained + 19 stock-Kokoro crossovers + 2 synthetic) — see voice catalog below
	- Live demo: [shreyask/bol-tts-marathi](https://huggingface.co/spaces/shreyask/bol-tts-marathi) (in-browser via WebGPU)
	- Write-up: [kshreyas.dev/post/bol-tts-marathi](https://kshreyas.dev/post/bol-tts-marathi/) — full design + debugging story with audio samples
	- Code: [github.com/shreyaskarnik/bol-tts-marathi](https://github.com/shreyaskarnik/bol-tts-marathi)
	- ONNX export: [shreyask/bol-tts-marathi-onnx](https://huggingface.co/shreyask/bol-tts-marathi-onnx)

	## Voice catalog (25 voices)

	### Marathi-trained (4)

	| ID | Display | Source | Default speed |
	|---|---|---|---|
	| `mf_asha` | Asha (आशा) | [Rasa](https://huggingface.co/datasets/ai4bharat/Rasa) `marathi_female` | 1.00× |
	| `mm_vivek` | Vivek (विवेक) | Rasa `marathi_male` | 1.00× |
	| `mf_mukta` | Mukta (मुक्ता) | [SPRINGLab](https://huggingface.co/datasets/SPRINGLab/IndicTTS_Marathi) female | 0.80× |
	| `mm_dnyanesh` | Dnyanesh (ज्ञानेश) | SPRINGLab male | 0.80× |
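
The default speeds in the table above can be kept in a small lookup so callers do not hard-code them. This is a minimal sketch: the dict and helper names are illustrative, and the 1.0 fallback for voices outside this table is an assumption, not something the model card specifies.

```python
# Default speeds copied from the Marathi-trained voice table.
# DEFAULT_SPEED and default_speed are illustrative names, not part of kokoro.
DEFAULT_SPEED = {
    "mf_asha": 1.00,
    "mm_vivek": 1.00,
    "mf_mukta": 0.80,
    "mm_dnyanesh": 0.80,
}


def default_speed(voice_id: str, fallback: float = 1.0) -> float:
    """Return the table's default speed, or `fallback` (assumed) for other voices."""
    return DEFAULT_SPEED.get(voice_id, fallback)
```

The returned value would be passed as the `speed=` argument when calling the pipeline.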

	### Stock-Kokoro crossovers (19)

	Stock voicepacks from [hexgrad/kokoro.js](https://github.com/hexgrad/kokoro.js) used as `ref_s` on this fine-tune. Because v0.2 is a continuation fine-tune, its encoder latent space stays close enough to stock Kokoro's that stock voicepacks plug in directly. The voicepacks were pre-screened with a `peak < 0.95` check to filter out ones that clip.

	| ID | Display | Source language |
	|---|---|---|
	| `af_heart` | Svara (स्वरा) | US English F |
	| `af_alloy` | Anvita (अन्विता) | US English F |
	| `af_aoede` | Sanika (सानिका) | US English F |
	| `af_bella` | Naina (नैना) | US English F |
	| `af_jessica` | Ishani (ईशानी) | US English F |
	| `af_nova` | Tara (तारा) | US English F |
	| `af_sarah` | Kavya (काव्या) | US English F |
	| `af_sky` | Akasha (आकाशा) | US English F |
	| `am_liam` | Atharv (अथर्व) | US English M |
	| `bf_isabella` | Ira (इरा) | UK English F |
	| `bm_fable` | Aaryan (आर्यन) | UK English M |
	| `ff_siwis` | Esha (ईशा) | French F |
	| `hm_omega` | Vihaan (विहान) | Hindi M |
	| `im_nicola` | Niraj (निरज) | Italian M |
	| `pf_dora` | Rhea (रिया) | Portuguese F |
	| `zf_xiaoni` | Nyra (नयरा) | Mandarin F |
	| `zf_xiaoxiao` | Pari (परी) | Mandarin F (kid) |
	| `zf_xiaoyi` | Vir (वीर) | Mandarin F (perceived M kid) |
	| `zm_yunyang` | Aakash (आकाश) | Mandarin M |
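
The `peak < 0.95` pre-screen could look roughly like the sketch below. It assumes "peak" means the largest absolute sample value of audio synthesized with a candidate voicepack, which is an interpretation rather than something the model card states. The function name is hypothetical.

```python
def passes_peak_screen(samples, threshold: float = 0.95) -> bool:
    """Return True when the peak absolute amplitude stays below `threshold`.

    `samples` is any iterable of floats in [-1, 1]; for a torch tensor or
    NumPy array, pass `audio.tolist()`. Empty audio is rejected.
    """
    peak = max((abs(float(s)) for s in samples), default=None)
    if peak is None:
        return False
    return peak < threshold
```

A voicepack would be kept only if audio generated with it on a few test sentences passes this check.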

	### Synthetic — generated arithmetically with no reference audio (2)

	| ID | Display | Recipe |
	|---|---|---|
	| `syn_sama` | Sama (समा) | Centroid (mean) of 5 modern English female voicepacks |
	| `syn_navya` | Navya (नव्या) | Centroid + per-position Gaussian noise (1σ) |

	The voicepack tensor `[510, 1, 256]` is a plain embedding — it can be constructed by averaging existing voicepacks, sampling near the centroid, or interpolating. See [voicepack zoo](https://github.com/shreyaskarnik/bol-tts-marathi#synthetic-voicepacks) in the repo for recipes.
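
As a rough illustration of the three constructions named above, here is a NumPy sketch. It is not the repo's recipe. In particular, reading "1σ" as noise scaled by the per-position standard deviation across the source voicepacks is an assumption, and all function names are hypothetical.

```python
import numpy as np


def centroid_voicepack(packs):
    """Mean of several voicepacks, each shaped [510, 1, 256]."""
    return np.stack(packs).mean(axis=0).astype(np.float32)


def noisy_voicepack(packs, sigma_scale: float = 1.0, seed: int = 0):
    """Centroid plus Gaussian noise scaled by the per-position spread.

    Assumption: "1 sigma" means the standard deviation across `packs`
    at each position of the tensor.
    """
    stacked = np.stack(packs)
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(stacked.shape[1:]) * stacked.std(axis=0)
    return (stacked.mean(axis=0) + sigma_scale * noise).astype(np.float32)


def interpolate_voicepacks(a, b, t: float):
    """Linear blend: t=0 returns `a`, t=1 returns `b`."""
    return ((1.0 - t) * a + t * b).astype(np.float32)
```

To use a result with the pipeline, convert it with `torch.from_numpy(pack)`. Stock voicepacks loaded with `torch.load` can be turned into arrays via `.numpy()` first.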

	## Usage

	```python
	import torch
	import soundfile as sf
	from kokoro import KModel, KPipeline
	import kokoro.pipeline as _kp

	# Kokoro ships no Marathi language code, so register "m" -> espeak language "mr".
	_kp.LANG_CODES["m"] = "mr"

	# config.json, the .pth checkpoint, and voices/ are local copies of files from this repo.
	kmodel = KModel(
	    repo_id="shreyask/bol-tts-marathi",
	    config="config.json",
	    model="kokoro-mr-v0_2.pth",
	)
	kmodel.train(False)  # inference mode

	pipeline = KPipeline(lang_code="m", repo_id="shreyask/bol-tts-marathi", model=kmodel)
	voice = torch.load("voices/mf_asha.pt", map_location="cpu", weights_only=True)

	text = "नमस्कार, मी मराठी बोलतो."
	chunks = []
	# The pipeline yields (graphemes, phonemes, audio) per text chunk.
	for _gs, _ps, audio in pipeline(text, voice=voice, speed=1.0):
	    chunks.append(audio)

	# Concatenate all chunks and write 24 kHz audio.
	sf.write("out.wav", torch.cat(chunks).numpy(), 24000)
	```

	### Minglish (loanword) handling

	For Marathi mixed with English (`"Friday ला Zomato वर dinner order करूया का?"`), use the loanword preprocessor first to transliterate Latin tokens to Devanagari before phonemization:

	```python
	from preprocess_loanwords import preprocess
	text = preprocess("Friday ला Zomato वर dinner order करूया का?")
	# → "फ्रायडे ला झोमॅटो वर डिनर ऑर्डर करूया का?"
	# Then feed to the pipeline as usual.
	```

	Source + ~19,500-entry lookup table: [scripts/preprocess_loanwords.py](https://github.com/shreyaskarnik/bol-tts-marathi/blob/main/scripts/preprocess_loanwords.py) and [data/loanword_map.json](https://github.com/shreyaskarnik/bol-tts-marathi/blob/main/data/loanword_map.json).
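
The general idea of a lookup-based loanword pass can be sketched as follows. This is not the repo's `preprocess` implementation: the four-entry table is a toy stand-in for `data/loanword_map.json`, and the function name and the simple `[A-Za-z]+` token rule are assumptions for illustration.

```python
import re

# Toy stand-in for the real ~19,500-entry map (keys lowercased).
TOY_LOANWORD_MAP = {
    "friday": "फ्रायडे",
    "zomato": "झोमॅटो",
    "dinner": "डिनर",
    "order": "ऑर्डर",
}

LATIN_TOKEN = re.compile(r"[A-Za-z]+")


def transliterate_latin_tokens(text: str, table: dict) -> str:
    """Replace each Latin-script token found in `table`; leave others unchanged."""

    def swap(match):
        word = match.group(0)
        return table.get(word.lower(), word)

    return LATIN_TOKEN.sub(swap, text)
```

Tokens missing from the table pass through untouched, which mirrors the fall-through behavior described under Limitations.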

	### Per-phoneme timestamps

	Kokoro predicts per-phoneme durations. `KModel.forward_with_tokens` returns `(audio, pred_dur)`, where `pred_dur` is measured in predictor frames: 1 frame = 600 audio samples at 24 kHz, i.e. 25 ms (the prosody predictor runs at half the mel-frame rate; the decoder upsamples 2× before iSTFT):

	```python
	audio, pred_dur = kmodel.forward_with_tokens(input_ids, ref_s, speed=1.0)
	durations_sec = pred_dur.squeeze().cpu().numpy() * 600 / 24000
	starts = durations_sec.cumsum() - durations_sec
	# (starts[i], starts[i] + durations_sec[i]) is the time span of phoneme[i]
	```
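
To attach labels to those spans without NumPy, a small pure-Python helper can do the same arithmetic. This is a sketch with hypothetical names. It assumes the caller supplies one label per `pred_dur` entry, including any boundary tokens that were added to `input_ids`.

```python
from itertools import accumulate

SAMPLES_PER_FRAME = 600  # from the text above: 1 predictor frame = 600 samples
SAMPLE_RATE = 24_000


def phoneme_spans(frame_counts, labels):
    """Return (label, start_sec, end_sec) for each phoneme.

    `frame_counts` is pred_dur as a flat list of ints, e.g.
    pred_dur.squeeze().tolist().
    """
    if len(frame_counts) != len(labels):
        raise ValueError("need exactly one label per duration")
    ends = accumulate(frame_counts)  # running total of frames
    spans = []
    for label, n_frames, end in zip(labels, frame_counts, ends):
        start = end - n_frames
        spans.append((
            label,
            start * SAMPLES_PER_FRAME / SAMPLE_RATE,
            end * SAMPLES_PER_FRAME / SAMPLE_RATE,
        ))
    return spans
```

For example, durations of 2, 4, and 1 frames give spans ending at 0.05 s, 0.15 s, and 0.175 s.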

	## Training

	| Phase | Details |
	|---|---|
	| Base | `hexgrad/Kokoro-82M` |
	| Stage 1 | 10 epochs, bs=12, fp32, ~9h on A100 SXM 80GB. Final val_loss ≈ 0.23 |
	| Stage 2 | 10 epochs, bs=8, `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True`, ~13h |
	| Train utts | 24,676 (95/5 split) |
	| Speakers | 331 (2 Rasa + 329 IndicVoices-R) + SPRINGLab IndicTTS-Marathi (single F + single M) |
	| Vocab change | `ɭ` (U+026D, retroflex lateral) at Kokoro slot 144, a Marathi-specific phoneme that Hindi doesn't have |

	Full methodology: [TRAINING_GUIDE.md](https://github.com/shreyaskarnik/bol-tts-marathi/blob/main/docs/TRAINING_GUIDE.md).
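
The Stage 2 allocator setting from the table is an environment variable read by PyTorch's CUDA caching allocator. One way to apply it from Python, assuming the training entry point is a script you control, is to set it before `torch` is imported:

```python
import os

# Set before importing torch so the CUDA caching allocator picks it up
# when it initializes.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

# import torch  # import torch only after the variable is set
```

Exporting the variable in the shell that launches training has the same effect.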

	## Datasets

	- [AI4Bharat/Rasa](https://huggingface.co/datasets/ai4bharat/Rasa) (CC-BY-4.0) — Marathi, 13,900 studio-quality utts, 2 speakers.
	- [AI4Bharat/IndicVoices-R](https://huggingface.co/datasets/ai4bharat/indicvoices_r) (CC-BY-4.0, gated) — Marathi, ~11,910 utts, 329 speakers after filtering.
	- [SPRINGLab/IndicTTS-Marathi](https://huggingface.co/datasets/SPRINGLab/IndicTTS_Marathi) (IITM EULA, commercial-OK) — single female + single male speaker, used for Mukta + Dnyanesh.

	## Limitations

	- Pure-English-only sentences — the decoder hallucinates Marathi acoustics if you don't give it any Devanagari context. The Minglish trick handles mixed input via Devanagari transliteration; pure English needs a different fallback.
	- Long-tail loanwords — the 19,500-entry map covers high-frequency English words in Indian usage; rarer words fall through to espeak-mr unchanged.
	- Decoder English-leakage is accidental, not designed — v0.2's decoder happens to render `/ɟʰ/` (Devanagari झ) with an English-flavored `/z/` quality, which makes "amazing" → अमेझिंग → audible "amazing." The follow-up v0.5 retraining lost this property by being more correctly Marathi; v0.6 is planned to preserve the leakage deliberately.
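
Given the first limitation, a caller may want to detect input that contains no Devanagari at all and route it elsewhere. The sketch below checks raw input against the basic Devanagari Unicode block (U+0900 to U+097F). The function names are hypothetical, and what the fallback does is left to the caller.

```python
def has_devanagari(text: str) -> bool:
    """True if any character falls in the Devanagari block U+0900..U+097F."""
    return any("\u0900" <= ch <= "\u097f" for ch in text)


def needs_english_fallback(text: str) -> bool:
    """True for text that has letters but no Devanagari context."""
    has_letters = any(ch.isalpha() for ch in text)
    return has_letters and not has_devanagari(text)
```

Run the check on the original input, before the loanword preprocessor, since transliteration turns Latin tokens into Devanagari.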

	## License

	Apache 2.0. Training data under their respective licenses (Rasa CC-BY-4.0, IndicVoices-R CC-BY-4.0, SPRINGLab IITM EULA).

	## Citation

	```bibtex
	@software{bol_tts_marathi_2026,
	  title={bol-tts-marathi: Kokoro-82M fine-tuned for Marathi},
	  author={Karnik, Shreyas},
	  year={2026},
	  url={https://github.com/shreyaskarnik/bol-tts-marathi},
	  license={Apache-2.0}
	}
	@software{kokoro_2025,
	  title={Kokoro-82M},
	  author={hexgrad},
	  year={2025},
	  url={https://github.com/hexgrad/kokoro}
	}
	@software{kokoro_deutsch_2026,
	  title={kokoro-deutsch},
	  author={semidark},
	  year={2026},
	  url={https://github.com/semidark/kokoro-deutsch}
	}
	```