# sumit-maithili-tiny

Fine-tuned `openai/whisper-tiny` for Maithili (मैथिली) automatic speech recognition.

## Model Details
| Property | Value |
|---|---|
| Base Model | openai/whisper-tiny (39M params) |
| Language | Maithili (mai) - Devanagari script |
| Task | Automatic Speech Recognition (ASR) |
| Dataset | IISc SYSPIN Maithili TTS Dataset |
| Training Samples | 2,332 (90% of 2,592 total) |
| Test Samples | 260 (10% held out) |
| Best WER | 63.90% (step 1500) |
## Dataset
The model was trained on the IISc SYSPIN Project Maithili TTS Dataset — studio-recorded speech data released under the SYSPIN project by the Indian Institute of Science (IISc), Bengaluru.
| Speaker | Utterances | Duration | Age | Experience |
|---|---|---|---|---|
| Male (Spk001) | 2,060 | 3h 27m 47s | 43 | 11 Years |
| Female (Spk001) | 532 | 0h 54m 9s | 30 | 1 Year |
- Recording: Neumann TLM-103 microphone, professional studio, ~40dB SNR
- Audio format: 48kHz, 24-bit, Mono WAV (resampled to 16kHz for training)
- Domains: Agriculture, Books, Finance, Food, Health, India Related, Local Conversation, Politics, Social, Sports, Technology
- Transcripts: Devanagari script (Maithili language)
- License: Audio data is released under CC-BY-4.0 by IISc, Bengaluru
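The 48 kHz studio audio is downsampled to the 16 kHz that Whisper expects. A minimal sketch of that step using naive linear interpolation (illustration only; in practice a proper resampler such as `torchaudio.transforms.Resample` or `librosa.resample` should be used to avoid aliasing):

```python
import numpy as np

def resample(audio: np.ndarray, orig_sr: int, target_sr: int) -> np.ndarray:
    """Naive linear-interpolation resampler (sketch; no anti-aliasing filter)."""
    duration = len(audio) / orig_sr
    n_target = int(round(duration * target_sr))
    old_t = np.linspace(0.0, duration, num=len(audio), endpoint=False)
    new_t = np.linspace(0.0, duration, num=n_target, endpoint=False)
    return np.interp(new_t, old_t, audio)

# One second of synthetic 48 kHz audio -> 16 kHz for Whisper
wave_48k = np.sin(2 * np.pi * 440 * np.arange(48_000) / 48_000)
wave_16k = resample(wave_48k, 48_000, 16_000)
print(len(wave_16k))  # 16000
```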
## Training
| Hyperparameter | Value |
|---|---|
| Batch size | 4 (effective 16 with gradient accumulation) |
| Learning rate | 1e-5 |
| Warmup steps | 200 |
| Total steps | 2,000 |
| Precision | FP16 |
| Gradient checkpointing | Enabled |
| Optimizer | AdamW (fused) |
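The effective batch size of 16 follows from the per-device batch of 4 times a gradient-accumulation factor of 4 (the factor is inferred from the table, not stated explicitly). A sketch of the configuration as a plain dict whose keys mirror `transformers.Seq2SeqTrainingArguments`:

```python
# Hyperparameters from the table above; key names mirror
# transformers.Seq2SeqTrainingArguments (values from this model card,
# gradient_accumulation_steps=4 is an assumption).
config = {
    "per_device_train_batch_size": 4,
    "gradient_accumulation_steps": 4,
    "learning_rate": 1e-5,
    "warmup_steps": 200,
    "max_steps": 2000,
    "fp16": True,
    "gradient_checkpointing": True,
    "optim": "adamw_torch_fused",
}

# Effective batch size = per-device batch x accumulation steps
effective_batch = (config["per_device_train_batch_size"]
                   * config["gradient_accumulation_steps"])
print(effective_batch)  # 16
```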
### Training Metrics
| Step | Eval Loss | Eval WER |
|---|---|---|
| 500 | 0.573 | 68.78% |
| 1000 | 0.537 | 64.68% |
| 1500 | 0.557 | 63.90% |
| 2000 | 0.573 | 64.20% |
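The WER figures above are word-level edit distance divided by the number of reference words. A self-contained sketch of the metric (the Devanagari sentences below are made-up single-character words, not dataset samples; libraries like `jiwer` compute this in practice):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # DP table for edit distance over words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[-1][-1] / len(ref)

# One missing word out of four reference words -> WER 0.25
print(wer("क ख ग घ", "क ग घ"))  # 0.25
```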
## Usage
```python
from transformers import WhisperForConditionalGeneration, WhisperProcessor, pipeline

model = WhisperForConditionalGeneration.from_pretrained("rockerritesh/whisper-tiny-maithili")
processor = WhisperProcessor.from_pretrained("rockerritesh/whisper-tiny-maithili")

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
)

result = pipe("path/to/maithili_audio.wav")
print(result["text"])
```
## Demo

Try it out: Maithili ASR Space
## Notes

- Whisper does not natively support Maithili, so Hindi (`hi`) was used as the language token during training. Both languages use Devanagari script, making the tokenizer compatible.
- This is a Whisper Tiny model (39M params). For better accuracy, consider fine-tuning `whisper-small` or `whisper-medium`.
- The WER of ~64% reflects the challenge of low-resource Maithili ASR with limited training data. Performance can be improved with more data and larger models.
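Because decoding used the Hindi language token, the same token can be pinned at inference time. A sketch of the generation kwargs to pass to the ASR pipeline (`generate_kwargs` is the standard transformers pipeline parameter; `"hi"` stands in for Maithili, as noted above):

```python
# Language/task pair matching the fine-tuning setup: Whisper has no
# "mai" token, so Hindi is used as a Devanagari-compatible stand-in.
generate_kwargs = {"language": "hi", "task": "transcribe"}

# Usage (assumes `pipe` is the ASR pipeline from the Usage section):
#   result = pipe("path/to/maithili_audio.wav", generate_kwargs=generate_kwargs)
print(generate_kwargs["language"])  # hi
```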
## Citation

If you use this model, please cite:

```bibtex
@misc{sumit-maithili-tiny,
  title  = {sumit-maithili-tiny: Whisper Tiny Fine-tuned for Maithili ASR},
  author = {Sumit Yadav},
  year   = {2026},
  url    = {https://huggingface.co/rockerritesh/whisper-tiny-maithili}
}
```
The audio data is from the SYSPIN project. Please also cite:

```bibtex
@misc{SYSPIN_S1.0_Corpus,
  title  = {SYSPIN_S1.0 Corpus - A TTS Corpus of 900+ hours in nine Indian Languages},
  author = {Abhayjeet et al.},
  year   = {2025}
}
```
## Acknowledgments
The audio dataset was created under the SYSPIN project by the Indian Institute of Science (IISc), Bengaluru and is released under CC-BY-4.0. We are grateful to the voice artists and to the SPIRE Lab, EE Dept., IISc for making this data publicly available.

Special thanks to the German Development Cooperation project "FAIR Forward - AI for All" and to Bhashini AI Solutions Private Limited for their financial support in developing the TTS corpus.
Contact (dataset): SPIRE Lab, EE Dept., IISc, Bengaluru — contact.syspin@iisc.ac.in