sumit-maithili-tiny

Fine-tuned openai/whisper-tiny for Maithili (मैथिली) automatic speech recognition.

Model Details

| Property | Value |
|---|---|
| Base Model | openai/whisper-tiny (39M params) |
| Language | Maithili (`mai`), Devanagari script |
| Task | Automatic Speech Recognition (ASR) |
| Dataset | IISc SYSPIN Maithili TTS Dataset |
| Training Samples | 2,332 (90% of 2,592 total) |
| Test Samples | 260 (10% held out) |
| Best WER | 63.90% (step 1500) |

Dataset

The model was trained on the IISc SYSPIN Project Maithili TTS Dataset — studio-recorded speech data released under the SYSPIN project by the Indian Institute of Science (IISc), Bengaluru.

| Speaker | Utterances | Duration | Age | Experience |
|---|---|---|---|---|
| Male (Spk001) | 2,060 | 3h 27m 47s | 43 | 11 years |
| Female (Spk001) | 532 | 0h 54m 9s | 30 | 1 year |

- Recording: Neumann TLM-103 microphone, professional studio, ~40 dB SNR
- Audio format: 48 kHz, 24-bit, mono WAV (resampled to 16 kHz for training)
- Domains: Agriculture, Books, Finance, Food, Health, India Related, Local Conversation, Politics, Social, Sports, Technology
- Transcripts: Devanagari script (Maithili language)
- License: Audio data is released under CC-BY-4.0 by IISc, Bengaluru
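The 48 kHz studio recordings have to be downsampled to the 16 kHz input rate Whisper expects. A minimal sketch using scipy's polyphase resampler — the actual training pipeline may instead have used torchaudio or the `datasets` library's `Audio` cast, which is an assumption on my part:

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

def to_whisper_rate(audio: np.ndarray, orig_sr: int = 48000, target_sr: int = 16000) -> np.ndarray:
    """Resample a mono waveform to 16 kHz with anti-aliasing.

    48000/16000 reduces to 1/3, so this is a polyphase decimation by 3;
    resample_poly applies the low-pass filter internally.
    """
    g = gcd(orig_sr, target_sr)
    return resample_poly(audio, target_sr // g, orig_sr // g)

one_second = np.zeros(48000)  # 1 s of 48 kHz audio
print(to_whisper_rate(one_second).shape)  # → (16000,)
```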

Training

| Hyperparameter | Value |
|---|---|
| Batch size | 4 (effective 16 with gradient accumulation) |
| Learning rate | 1e-5 |
| Warmup steps | 200 |
| Total steps | 2,000 |
| Precision | FP16 |
| Gradient checkpointing | Enabled |
| Optimizer | AdamW (fused) |
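The hyperparameters above map onto `Seq2SeqTrainingArguments` roughly as follows. This is a hedged reconstruction, not the exact training script: `output_dir`, the 500-step eval/save cadence, and `predict_with_generate` are assumptions inferred from the metrics table, not stated on the card.

```python
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="./whisper-tiny-maithili",  # assumed name
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,         # 4 x 4 = effective batch of 16
    learning_rate=1e-5,
    warmup_steps=200,
    max_steps=2000,
    fp16=True,
    gradient_checkpointing=True,
    optim="adamw_torch_fused",
    eval_strategy="steps",                 # transformers >= 4.41; assumed from eval cadence
    eval_steps=500,
    save_steps=500,
    predict_with_generate=True,            # assumed, needed to compute WER during eval
)
```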

Training Metrics

| Step | Eval Loss | Eval WER |
|---|---|---|
| 500 | 0.573 | 68.78% |
| 1000 | 0.537 | 64.68% |
| 1500 | 0.557 | 63.90% |
| 2000 | 0.573 | 64.20% |
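WER here is the word-level edit distance between the reference transcript and the model's hypothesis, divided by the number of reference words. A self-contained sketch of that metric (libraries such as `jiwer` or `evaluate`'s `wer` compute the same quantity):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: Levenshtein distance over words / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # DP table: d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution + one deletion over a 4-word reference:
print(wer("a b c d", "a x c"))  # → 0.5
```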

Usage

```python
from transformers import WhisperForConditionalGeneration, WhisperProcessor, pipeline

# Load the fine-tuned model and its processor (tokenizer + feature extractor)
model = WhisperForConditionalGeneration.from_pretrained("rockerritesh/whisper-tiny-maithili")
processor = WhisperProcessor.from_pretrained("rockerritesh/whisper-tiny-maithili")

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
)

result = pipe("path/to/maithili_audio.wav")
print(result["text"])
```

Demo

Try it out: Maithili ASR Space

Notes

- Whisper does not natively support Maithili, so Hindi (`hi`) was used as the language token during training. Both languages use Devanagari script, making the tokenizer compatible.
- This is a Whisper Tiny model (39M params). For better accuracy, consider fine-tuning whisper-small or whisper-medium.
- The WER of ~64% reflects the challenge of low-resource Maithili ASR with limited training data. Performance can be improved with more data and larger models.
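The script-compatibility claim in the first note can be checked directly: every character of Devanagari Maithili text sits in the same Unicode block (U+0900–U+097F) that Whisper's Hindi tokenizer already covers. A stdlib-only illustration, using the word "मैथिली" ("Maithili") itself:

```python
import unicodedata

# Collect the Unicode block prefix of each character's official name.
word = "मैथिली"
blocks = {unicodedata.name(ch).split()[0] for ch in word}
print(blocks)  # → {'DEVANAGARI'}
```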

Citation

If you use this model, please cite:

```bibtex
@misc{sumit-maithili-tiny,
    title = {sumit-maithili-tiny: Whisper Tiny Fine-tuned for Maithili ASR},
    author = {Sumit Yadav},
    year = {2026},
    url = {https://huggingface.co/rockerritesh/whisper-tiny-maithili}
}
```

The audio data is from the SYSPIN project. Please also cite:

```bibtex
@misc{SYSPIN_S1.0_Corpus,
    title = {SYSPIN_S1.0 Corpus - A TTS Corpus of 900+ hours in nine Indian Languages},
    author = {Abhayjeet et al.},
    year = {2025}
}
```

Acknowledgments

The audio dataset was created under the SYSPIN project by Indian Institute of Science (IISc), Bengaluru and is released under CC-BY-4.0. We are grateful to the voice artists and the SPIRE Lab, EE Dept., IISc for making this data publicly available.

Special thanks to the project of German Development Cooperation "FAIR Forward - AI for All" and Bhashini AI Solutions Private Limited for their financial support in developing the TTS corpus.

Contact (dataset): SPIRE Lab, EE Dept., IISc, Bengaluru — contact.syspin@iisc.ac.in
