# Whisper Marathi Small – Fine-tuned ASR Model
This model is a fine-tuned version of Whisper Small for Marathi Automatic Speech Recognition (ASR). It provides higher recognition accuracy on Marathi speech than the base Whisper Small model and is optimized for conversations, YouTube speech, interviews, phone calls, and general Marathi audio.
## Model Details

### Model Description
`whisper-Marathi-small-finetuned` is trained on curated Marathi audio datasets to improve transcription quality while retaining the efficiency of Whisper Small.
- Developed by: Varun, Sumedh
- Model type: Encoder–Decoder Transformer (Speech-to-Text)
- Language: Marathi
- License: MIT (same as Whisper)
- Base Model: openai/whisper-small
- Framework: transformers
### Model Sources
- Model Repository: https://huggingface.co/Prasad12344321/whisper-Marathi-small-finetuned
- Demo (optional): Coming soon
- Paper (Base Whisper): “Robust Speech Recognition via Large-Scale Weak Supervision”
## Uses

### Direct Use
This model can be used for:
- General Marathi ASR
- Subtitling Marathi videos and media
- Transcribing conversations, calls, interviews
- Speech recognition for chatbots / voice assistants
- Marathi podcast or lecture transcription
### Downstream Use
- Fine-tuning on domain-specific audio (medical, education, customer support)
- Building ASR-based AI tools in Marathi
- Large-scale subtitle and caption generation
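As an illustration of subtitle generation, the sketch below converts timestamped chunks into SRT format. It assumes the chunk structure returned by the transformers ASR pipeline when called with `return_timestamps=True` (a list of `{"timestamp": (start, end), "text": ...}` dicts); the `srt_timestamp` and `to_srt` helpers are illustrative, not part of this model.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 00:01:15,250."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(chunks) -> str:
    """Render pipeline chunks (assumed format: {"timestamp": (start, end),
    "text": ...}) as an SRT subtitle file."""
    blocks = []
    for i, c in enumerate(chunks, start=1):
        start, end = c["timestamp"]
        blocks.append(
            f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{c['text'].strip()}\n"
        )
    return "\n".join(blocks)

# Example with dummy chunks
subs = to_srt([
    {"timestamp": (0.0, 2.5), "text": " Hello"},
    {"timestamp": (2.5, 5.0), "text": " world"},
])
```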
### Out-of-Scope Use
- Non-Marathi speech
- Heavy background noise
- Multi-speaker overlapping conversations
- Legal/medical transcription without human verification
## Bias, Risks, and Limitations
- Whisper can hallucinate text on very noisy audio
- Accuracy drops with thick accents or dialects not seen in training
- Not suitable for extremely long single-pass audio without chunking
- Not a translation model (use Whisper translation models instead)
### Recommendations
- Prefer 16 kHz WAV audio
- Use chunking for long audio (>30 sec)
- Avoid overlapping speakers
- Always verify the output in critical applications
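The chunking recommendation above can also be sketched by hand. The `chunk_audio` helper below is a hypothetical illustration of splitting a 16 kHz mono waveform into overlapping 30-second windows; in practice the pipeline's `chunk_length_s` / `stride_length_s` arguments (shown in the usage code) handle this for you.

```python
import numpy as np

def chunk_audio(samples, sr=16_000, chunk_s=30, overlap_s=2):
    """Split a mono waveform into chunks of chunk_s seconds,
    each overlapping the previous one by overlap_s seconds."""
    chunk_len = chunk_s * sr
    step = (chunk_s - overlap_s) * sr
    chunks = []
    for start in range(0, len(samples), step):
        chunks.append(samples[start:start + chunk_len])
        if start + chunk_len >= len(samples):
            break
    return chunks

# 90 seconds of silence at 16 kHz -> overlapping 30-second windows
audio = np.zeros(90 * 16_000, dtype=np.float32)
pieces = chunk_audio(audio)
```

Each piece can then be transcribed independently and the texts concatenated, trimming the overlap region to avoid duplicated words.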
## ⭐ How to Use

Below is the recommended usage code for this model.

### 🔥 Recommended Inference Code (supports long audio)
```python
from transformers import pipeline

model_name = "Prasad12344321/whisper-Marathi-small-finetuned"

pipe = pipeline(
    "automatic-speech-recognition",
    model=model_name,
    chunk_length_s=30,       # split long audio into 30-second windows
    stride_length_s=(4, 2),  # overlap between windows to avoid cutting words
)

print(pipe("/content/test100.mp3")["text"])
```