Whisper Tiny Telugu Fine-tuning

Overview

This project fine-tunes OpenAI's Whisper Tiny model for Telugu speech recognition, using the AI4Bharat Kathbath Telugu dataset to improve transcription quality on Telugu audio.

Model Details

  • Base Model: OpenAI Whisper Tiny
  • Fine-tuned Model: vanshnawander/whisper-tiny-telugu (Whisper Tiny fine-tuned on Telugu)
  • Training Dataset: AI4Bharat Kathbath Telugu Dataset
  • Language: Telugu (te)

Training Dataset

The model was fine-tuned on the AI4Bharat Kathbath Telugu Dataset, a collection of Telugu speech recordings built for speech recognition tasks. It covers a range of domains and speakers, which makes it a suitable source of fine-tuning data.

Performance Comparison

Overall Metrics

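| Metric        | Base Model | Fine-tuned | Change           |
|---------------|------------|------------|------------------|
| WER (%)       | 402.96     | 97.83      | ↓ 75.7%          |
| CER (%)       | 171.09     | 77.13      | ↓ 54.9%          |
| Avg Time (ms) | 367.3      | 711.0      | ↑ 93.6% (slower) |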

Detailed Word Error Analysis

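| Error Type     | Base Model | Fine-tuned | Change |
|----------------|------------|------------|--------|
| Substitutions  | 362        | 268        | ↓ 94   |
| Deletions      | 145        | 212        | ↑ 67   |
| Insertions     | 1536       | 16         | ↓ 1520 |
| Hits (correct) | 0          | 27         | ↑ 27   |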
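
The overall WER figures can be reproduced from these counts. The sketch below assumes the standard definition WER = (S + D + I) / N, where N = S + D + H is the number of reference words; the card does not state its formula, and the function name `word_error_rate` is only illustrative.

```python
def word_error_rate(substitutions, deletions, insertions, hits):
    """Return WER as a percentage: (S + D + I) / N, with N = S + D + H."""
    # N counts reference words only, so insertions do not enlarge it
    reference_words = substitutions + deletions + hits
    if reference_words == 0:
        raise ValueError("reference must contain at least one word")
    errors = substitutions + deletions + insertions
    return 100.0 * errors / reference_words


# Counts taken from the error analysis table
base_wer = word_error_rate(362, 145, 1536, 0)
tuned_wer = word_error_rate(268, 212, 16, 27)
print(f"{base_wer:.2f} {tuned_wer:.2f}")  # prints "402.96 97.83"
```

Under this definition both rows imply the same 507 reference words. It also shows why the base model's WER exceeds 100%: insertions add to the error count but not to the denominator.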

Key Findings

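  • Significant WER Improvement: WER fell from 402.96% to 97.83%, a 75.7% relative reduction
  • CER Reduction: CER fell from 171.09% to 77.13%, a 54.9% relative reduction
  • Insertion Reduction: Insertion errors dropped from 1536 to 16, which accounts for most of the WER gain; deletions rose from 145 to 212
  • Processing Time: Average inference time rose from 367.3 ms to 711.0 ms (93.6% slower); the cause is not established by these measurements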
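
The percentages quoted above are relative changes against the base model. As a minimal sketch (the helper name `relative_change` is hypothetical, not part of any evaluation script from this project), they can be recomputed from the reported metrics:

```python
def relative_change(before, after):
    """Return the percent change from before to after; negative means a decrease."""
    return 100.0 * (after - before) / before


# Values taken from the overall metrics table
metrics = [
    ("WER", 402.96, 97.83),
    ("CER", 171.09, 77.13),
    ("Avg time", 367.3, 711.0),
]
for name, before, after in metrics:
    print(f"{name}: {relative_change(before, after):+.1f}%")
# prints:
# WER: -75.7%
# CER: -54.9%
# Avg time: +93.6%
```

A negative value is an improvement for WER and CER, while the positive value for average time means inference became slower.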

Usage

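import numpy as np
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Load the fine-tuned model and its processor
model = WhisperForConditionalGeneration.from_pretrained("vanshnawander/whisper-tiny-telugu")
processor = WhisperProcessor.from_pretrained("vanshnawander/whisper-tiny-telugu")

# Placeholder input: replace with your own mono waveform sampled at 16 kHz
# (a 1-D float array). One second of silence is used here so the script runs.
audio_input = np.zeros(16000, dtype=np.float32)

# Convert the waveform to log-Mel input features
input_features = processor(audio_input, sampling_rate=16000, return_tensors="pt").input_features

# Generate token IDs without tracking gradients, then decode them to text
with torch.no_grad():
    predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)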
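
The snippet above expects a raw waveform rather than a file path. One possible way to obtain one, sketched here with only the Python standard library, is to read a 16-bit PCM WAV file into a list of floats. The helper name `load_wav_mono_16bit` is hypothetical and not part of this model or of Transformers; the sketch assumes uncompressed 16-bit PCM input and does no resampling.

```python
import array
import sys
import wave


def load_wav_mono_16bit(path):
    """Read a 16-bit PCM WAV file and return (samples, sample_rate).

    Samples are floats in [-1.0, 1.0); multi-channel audio is averaged to mono.
    """
    with wave.open(path, "rb") as wav_file:
        if wav_file.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        sample_rate = wav_file.getframerate()
        channels = wav_file.getnchannels()
        raw = wav_file.readframes(wav_file.getnframes())

    pcm = array.array("h")
    pcm.frombytes(raw)
    # WAV data is little-endian, so swap bytes on big-endian hosts
    if sys.byteorder == "big":
        pcm.byteswap()

    # Average the interleaved channels of each frame and scale to [-1.0, 1.0)
    samples = [
        sum(pcm[i:i + channels]) / (channels * 32768.0)
        for i in range(0, len(pcm), channels)
    ]
    return samples, sample_rate
```

The returned `samples` can be passed to the processor as the audio input, with `sampling_rate=sample_rate`. Whisper's feature extractor expects 16 kHz audio, so a file recorded at another rate needs to be resampled first.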

Citation

If you use this fine-tuned model or the methodology, please cite:

AI4Bharat Kathbath Dataset

@misc{ai4bharat_kathbath,
  title={Kathbath: A Large-Scale Speech Corpus for Indian Languages},
  author={AI4Bharat Team},
  year={2022},
  url={https://ai4bharat.iitm.ac.in/kathbath}
}

Whisper Original Paper

@article{radford2022robust,
  title={Robust Speech Recognition via Large-Scale Weak Supervision},
  author={Radford, Alec and Kim, Jong Wook and Xu, Tao and others},
  journal={arXiv preprint arXiv:2212.04356},
  year={2022}
}

Fine-tuned Model

@misc{vanshnawander2026whisper_tiny_telugu,
  title={Whisper Tiny Telugu: Fine-tuned for Telugu Speech Recognition},
  author={Vansh Nawander},
  year={2026},
  url={https://huggingface.co/vanshnawander/whisper-tiny-telugu}
}

Acknowledgements

Whisper Base Model Acknowledgement

This work builds upon OpenAI's Whisper model. We acknowledge the OpenAI team for creating the Whisper architecture and pre-trained models:

"Whisper is a robust speech recognition model. It approaches human level robustness and accuracy across a wide range of conditions. The model is trained on 680,000 hours of multilingual and multitask supervised data collected from the web."

AI4Bharat Acknowledgement

We thank AI4Bharat for creating and maintaining the Kathbath Telugu dataset, which was crucial for this fine-tuning work. AI4Bharat's mission to advance Indian language AI research has made significant contributions to the NLP community.

Additional Thanks

  • The open-source community for tools and libraries that made this research possible
  • Contributors to the Hugging Face Transformers ecosystem
  • The Telugu-speaking community for providing valuable speech data

License

This fine-tuned model inherits the license terms from the original Whisper model and the AI4Bharat Kathbath dataset. Please refer to the original sources for specific licensing information.
