Whispering Voices: Fine-Tuning for Indic-to-English Translation

Model Name: Ablecredit-Whisper-Small

  • Model Overview:

    This is a fine-tuned version of OpenAI's Whisper model, trained to perform speech-to-text translation from Indic languages to English. The model has been optimized using open-source datasets and proprietary dataset created by AbleCredit.com to enhance transcription quality for Indic languages.

  • Languages Supported

    The model is trained to handle speech-to-text translation for the following Indic languages:

    • Hindi (hi)
    • Marathi (mr)
    • Gujarati (gu)
    • Tamil (ta)
    • Telugu (te)
    • Kannada (kn)

Fine-Tuning Process

The base Whisper model was fine-tuned using a carefully curated dataset consisting of open-source corpus and a well curated internal dataset from AbleCredit. The dataset underwent preprocessing, including text normalization, score based filtering and feature engineering.

  • Key Improvements After Fine-Tuning

    ๐Ÿ“‰ Word Error Rate (WER) Reduction: 54% lower WER compared to the base model.

    ๐Ÿ“ˆ BLEU Score Increase: 5.5ร— higher BLEU score, indicating improved translation accuracy.

Evaluation Metrics

  • Word Error Rate (WER) WER is used to measure the error rate in transcriptions. A lower WER means better transcription accuracy.

    WER Reduction: 54% improvement over the base Whisper model.

  • BLEU Score The BLEU (Bilingual Evaluation Understudy) Score is a metric used to evaluate machine translation quality. Higher BLEU scores indicate better translation quality.

    BLEU Score Improvement: 5.5ร— higher BLEU compared to the base model.

Use Cases Transcribing Indic-language speech into English text

Connect with me over LinkedIn

AbleCredit

Downloads last month
1
Safetensors
Model size
0.2B params
Tensor type
F32
ยท
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for AbleCredit/Ablecredit-Whisper-Small

Finetuned
(3269)
this model