sumit-maithili-tiny

Fine-tuned openai/whisper-tiny for Maithili (मैथिली) automatic speech recognition.

Model Details

| Property | Value |
|---|---|
| Base Model | openai/whisper-tiny (39M params) |
| Language | Maithili (`mai`), Devanagari script |
| Task | Automatic Speech Recognition (ASR) |
| Dataset | IISc SYSPIN Maithili TTS Dataset |
| Training Samples | 2,332 (90% of 2,592 total) |
| Test Samples | 260 (10% held out) |
| Best WER | 63.90% (step 1500) |

Dataset

The model was trained on the IISc SYSPIN Project Maithili TTS Dataset — studio-recorded speech data released under the SYSPIN project by the Indian Institute of Science (IISc), Bengaluru.

| Speaker | Utterances | Duration | Age | Experience |
|---|---|---|---|---|
| Male (Spk001) | 2,060 | 3h 27m 47s | 43 | 11 years |
| Female (Spk001) | 532 | 0h 54m 9s | 30 | 1 year |

- Recording: Neumann TLM-103 microphone, professional studio, ~40 dB SNR
- Audio format: 48 kHz, 24-bit, mono WAV (resampled to 16 kHz for training)
- Domains: Agriculture, Books, Finance, Food, Health, India Related, Local Conversation, Politics, Social, Sports, Technology
- Transcripts: Devanagari script (Maithili language)
- License: Audio data is released under CC-BY-4.0 by IISc, Bengaluru
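The 48 kHz studio recordings have to be downsampled to the 16 kHz input rate Whisper expects. A minimal sketch using scipy's polyphase resampler — the actual training pipeline may instead have used torchaudio or the `datasets` library's `Audio` cast, which is an assumption on my part:

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

def to_whisper_rate(audio: np.ndarray, orig_sr: int = 48000, target_sr: int = 16000) -> np.ndarray:
    """Resample a mono waveform to 16 kHz with anti-aliasing.

    48000/16000 reduces to 1/3, so this is a polyphase decimation by 3;
    resample_poly applies the low-pass filter internally.
    """
    g = gcd(orig_sr, target_sr)
    return resample_poly(audio, target_sr // g, orig_sr // g)

one_second = np.zeros(48000)  # 1 s of 48 kHz audio
print(to_whisper_rate(one_second).shape)  # → (16000,)
```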

Training

| Hyperparameter | Value |
|---|---|
| Batch size | 4 (effective 16 with gradient accumulation) |
| Learning rate | 1e-5 |
| Warmup steps | 200 |
| Total steps | 2,000 |
| Precision | FP16 |
| Gradient checkpointing | Enabled |
| Optimizer | AdamW (fused) |
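The hyperparameters above map onto `Seq2SeqTrainingArguments` roughly as follows. This is a hedged reconstruction, not the exact training script: `output_dir`, the 500-step eval/save cadence, and `predict_with_generate` are assumptions inferred from the metrics table, not stated on the card.

```python
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="./whisper-tiny-maithili",  # assumed name
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,         # 4 x 4 = effective batch of 16
    learning_rate=1e-5,
    warmup_steps=200,
    max_steps=2000,
    fp16=True,
    gradient_checkpointing=True,
    optim="adamw_torch_fused",
    eval_strategy="steps",                 # transformers >= 4.41; assumed from eval cadence
    eval_steps=500,
    save_steps=500,
    predict_with_generate=True,            # assumed, needed to compute WER during eval
)
```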

Training Metrics

| Step | Eval Loss | Eval WER |
|---|---|---|
| 500 | 0.573 | 68.78% |
| 1000 | 0.537 | 64.68% |
| 1500 | 0.557 | 63.90% |
| 2000 | 0.573 | 64.20% |
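WER here is the word-level edit distance between the reference transcript and the model's hypothesis, divided by the number of reference words. A self-contained sketch of that metric (libraries such as `jiwer` or `evaluate`'s `wer` compute the same quantity):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: Levenshtein distance over words / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # DP table: d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution + one deletion over a 4-word reference:
print(wer("a b c d", "a x c"))  # → 0.5
```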

Usage

```python
from transformers import WhisperForConditionalGeneration, WhisperProcessor, pipeline

# Load the fine-tuned model and its processor (tokenizer + feature extractor)
model = WhisperForConditionalGeneration.from_pretrained("rockerritesh/whisper-tiny-maithili")
processor = WhisperProcessor.from_pretrained("rockerritesh/whisper-tiny-maithili")

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
)

result = pipe("path/to/maithili_audio.wav")
print(result["text"])
```

Demo

Try it out: Maithili ASR Space

Notes

- Whisper does not natively support Maithili, so Hindi (`hi`) was used as the language token during training. Both languages use Devanagari script, making the tokenizer compatible.
- This is a Whisper Tiny model (39M params). For better accuracy, consider fine-tuning whisper-small or whisper-medium.
- The WER of ~64% reflects the challenge of low-resource Maithili ASR with limited training data. Performance can be improved with more data and larger models.
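The script-compatibility claim in the first note can be checked directly: every character of Devanagari Maithili text sits in the same Unicode block (U+0900–U+097F) that Whisper's Hindi tokenizer already covers. A stdlib-only illustration, using the word "मैथिली" ("Maithili") itself:

```python
import unicodedata

# Collect the Unicode block prefix of each character's official name.
word = "मैथिली"
blocks = {unicodedata.name(ch).split()[0] for ch in word}
print(blocks)  # → {'DEVANAGARI'}
```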

Citation

If you use this model, please cite:

```bibtex
@misc{sumit-maithili-tiny,
    title = {sumit-maithili-tiny: Whisper Tiny Fine-tuned for Maithili ASR},
    author = {Sumit Yadav},
    year = {2026},
    url = {https://huggingface.co/rockerritesh/whisper-tiny-maithili}
}
```

The audio data is from the SYSPIN project. Please also cite:

```bibtex
@misc{SYSPIN_S1.0_Corpus,
    title = {SYSPIN_S1.0 Corpus - A TTS Corpus of 900+ hours in nine Indian Languages},
    author = {Abhayjeet et al.},
    year = {2025}
}
```

Acknowledgments

The audio dataset was created under the SYSPIN project by Indian Institute of Science (IISc), Bengaluru and is released under CC-BY-4.0. We are grateful to the voice artists and the SPIRE Lab, EE Dept., IISc for making this data publicly available.

Special thanks to the project of German Development Cooperation "FAIR Forward - AI for All" and Bhashini AI Solutions Private Limited for their financial support in developing the TTS corpus.

Contact (dataset): SPIRE Lab, EE Dept., IISc, Bengaluru — contact.syspin@iisc.ac.in
