# Whisper Marathi Small – Fine-tuned ASR Model
This model is a fine-tuned version of Whisper Small for Marathi Automatic Speech Recognition (ASR). It provides higher recognition accuracy on Marathi speech than the base Whisper Small model and is optimized for conversations, YouTube speech, interviews, phone calls, and general Marathi audio.
## Model Details

### Model Description
`whisper-Marathi-small-finetuned` is trained on curated Marathi audio datasets to improve transcription quality while retaining the efficiency of Whisper Small.
- Developed by: Varun, Sumedh
- Model type: Encoder–Decoder Transformer (Speech-to-Text)
- Language: Marathi
- License: MIT (same as Whisper)
- Base Model: openai/whisper-small
- Framework: transformers
### Model Sources
- Model Repository: https://huggingface.co/Prasad12344321/whisper-Marathi-small-finetuned
- Demo (optional): Coming soon
- Paper (Base Whisper): “Robust Speech Recognition via Large-Scale Weak Supervision”
## Uses

### Direct Use
This model can be used for:
- General Marathi ASR
- Subtitling Marathi videos and media
- Transcribing conversations, calls, interviews
- Speech recognition for chatbots / voice assistants
- Marathi podcast or lecture transcription
### Downstream Use
- Fine-tuning on domain-specific audio (medical, education, customer support)
- Building ASR-based AI tools in Marathi
- Large-scale subtitle and caption generation
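As an illustration of subtitle generation, the sketch below converts timestamped chunks into SRT format. It assumes the chunk structure returned by the transformers ASR pipeline when called with `return_timestamps=True` (a list of `{"timestamp": (start, end), "text": ...}` dicts); the `srt_timestamp` and `to_srt` helpers are illustrative, not part of this model.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 00:01:15,250."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(chunks) -> str:
    """Render pipeline chunks (assumed format: {"timestamp": (start, end),
    "text": ...}) as an SRT subtitle file."""
    blocks = []
    for i, c in enumerate(chunks, start=1):
        start, end = c["timestamp"]
        blocks.append(
            f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{c['text'].strip()}\n"
        )
    return "\n".join(blocks)

# Example with dummy chunks
subs = to_srt([
    {"timestamp": (0.0, 2.5), "text": " Hello"},
    {"timestamp": (2.5, 5.0), "text": " world"},
])
```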
### Out-of-Scope Use
- Non-Marathi speech
- Heavy background noise
- Multi-speaker overlapping conversations
- Legal/medical transcription without human verification
## Bias, Risks, and Limitations
- Whisper can hallucinate text on very noisy audio
- Accuracy drops with thick accents or dialects not seen in training
- Not suitable for extremely long single-pass audio without chunking
- Not a translation model (use Whisper translation models instead)
### Recommendations
- Prefer 16 kHz WAV audio
- Use chunking for long audio (>30 sec)
- Avoid overlapping speakers
- Always verify the output in critical applications
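The chunking recommendation above can also be sketched by hand. The `chunk_audio` helper below is a hypothetical illustration of splitting a 16 kHz mono waveform into overlapping 30-second windows; in practice the pipeline's `chunk_length_s` / `stride_length_s` arguments (shown in the usage code) handle this for you.

```python
import numpy as np

def chunk_audio(samples, sr=16_000, chunk_s=30, overlap_s=2):
    """Split a mono waveform into chunks of chunk_s seconds,
    each overlapping the previous one by overlap_s seconds."""
    chunk_len = chunk_s * sr
    step = (chunk_s - overlap_s) * sr
    chunks = []
    for start in range(0, len(samples), step):
        chunks.append(samples[start:start + chunk_len])
        if start + chunk_len >= len(samples):
            break
    return chunks

# 90 seconds of silence at 16 kHz -> overlapping 30-second windows
audio = np.zeros(90 * 16_000, dtype=np.float32)
pieces = chunk_audio(audio)
```

Each piece can then be transcribed independently and the texts concatenated, trimming the overlap region to avoid duplicated words.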
## ⭐ How to Use

Below is the recommended usage code for this model.

### 🔥 Recommended Inference Code (supports long audio)
```python
from transformers import pipeline

model_name = "Prasad12344321/whisper-Marathi-small-finetuned"

pipe = pipeline(
    "automatic-speech-recognition",
    model=model_name,
    chunk_length_s=30,       # split long audio into 30-second windows
    stride_length_s=(4, 2),  # overlap between windows to avoid cutting words
)

print(pipe("/content/test100.mp3")["text"])
```