Whisper Tiny Telugu Fine-tuning

Overview

This project fine-tunes OpenAI's Whisper Tiny model for Telugu speech recognition. The base Whisper Tiny model was fine-tuned on the AI4Bharat Kathbath Telugu dataset to improve its accuracy on Telugu transcription.

Model Details

Base Model: OpenAI Whisper Tiny (openai/whisper-tiny)
Fine-tuned Model: vanshnawander/whisper-tiny-telugu (Whisper Tiny fine-tuned on Telugu)
Training Dataset: AI4Bharat Kathbath Telugu Dataset
Language: Telugu (te)

Training Dataset

The model was fine-tuned on the AI4Bharat Kathbath Telugu dataset, the Telugu subset of AI4Bharat's Kathbath speech corpus for Indian languages. It contains transcribed Telugu speech from many different speakers and covers a variety of domains, which makes it a good fit for fine-tuning a Telugu speech recognition model.

Performance Comparison

Overall Metrics

Metric	Base Model	Fine-tuned	Relative Change
WER (%)	402.96	97.83	↓ 75.7%
CER (%)	171.09	77.13	↓ 54.9%
Avg Time (ms)	367.3	711.0	↑ 93.6% (slower)
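The Relative Change column follows from the absolute values as (fine-tuned − base) / base. A negative result means the value went down, which is an improvement for WER and CER but would be a speedup for time. The short sketch below reproduces the column. The function name `relative_change` is illustrative only and is not part of this project.

```python
def relative_change(base: float, new: float) -> float:
    """Percentage change from base to new; negative means the value decreased."""
    return 100 * (new - base) / base


# (base model, fine-tuned) values taken from the Overall Metrics table
metrics = {
    "WER (%)": (402.96, 97.83),
    "CER (%)": (171.09, 77.13),
    "Avg Time (ms)": (367.3, 711.0),
}

for name, (base, tuned) in metrics.items():
    print(f"{name}: {relative_change(base, tuned):+.1f}%")
# WER (%): -75.7%
# CER (%): -54.9%
# Avg Time (ms): +93.6%
```

The positive sign for Avg Time is why that row reads as an increase: the fine-tuned model took about 1.9 times as long on average.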

Detailed Word Error Analysis

Error Type	Base Model	Fine-tuned	Change
Substitutions	362	268	↓ 94
Deletions	145	212	↑ 67
Insertions	1536	16	↓ 1520
Hits (correct)	0	27	↑ 27
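In both columns, substitutions + deletions + hits add up to 507, which is the number of words in the reference transcripts. Using the standard definition WER = (S + D + I) / N with N = S + D + H, the WER values in Overall Metrics can be recovered exactly from these counts. Insertions add to the error count but not to N, so the base model's 1536 insertions push its WER above 100%. The sketch below shows this arithmetic. `wer_from_counts` is an illustrative name only.

```python
def wer_from_counts(substitutions: int, deletions: int, insertions: int, hits: int) -> float:
    """Word error rate in percent: (S + D + I) / N, where N = S + D + H is
    the number of words in the reference transcripts."""
    n_ref = substitutions + deletions + hits
    return 100 * (substitutions + deletions + insertions) / n_ref


# Counts from the Detailed Word Error Analysis table
base = dict(substitutions=362, deletions=145, insertions=1536, hits=0)
fine_tuned = dict(substitutions=268, deletions=212, insertions=16, hits=27)

for name, c in [("Base Model", base), ("Fine-tuned", fine_tuned)]:
    n_ref = c["substitutions"] + c["deletions"] + c["hits"]
    print(f"{name}: N = {n_ref}, WER = {wer_from_counts(**c):.2f}%")
# Base Model: N = 507, WER = 402.96%
# Fine-tuned: N = 507, WER = 97.83%
```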

Key Findings

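Significant WER Improvement: 75.7% relative reduction in Word Error Rate (402.96% → 97.83%), although a WER near 98% still means most words are transcribed incorrectly
CER Reduction: 54.9% relative reduction in Character Error Rate (171.09% → 77.13%)
Insertion Reduction: insertion errors fell by 1520 (1536 → 16), which accounts for most of the WER gain; these insertions are also why the base model's WER exceeds 100%
Deletion Increase: deletions rose from 145 to 212, so the fine-tuned model omits more words than the base model
Processing Time: average inference time rose by 93.6% (367.3 ms → 711.0 ms); fine-tuning does not change the model's architecture or parameter count, so model complexity does not explain the slowdown, and its cause is not established here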
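Hit, substitution, deletion, and insertion counts like those above are usually obtained by aligning each hypothesis with its reference using a minimum edit distance (Levenshtein) alignment. Libraries such as jiwer work this way. The card does not say which tool produced its figures, so the plain-Python sketch below only illustrates the technique and is not the evaluation code used for this model. `align_counts` and `ErrorCounts` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ErrorCounts:
    hits: int
    substitutions: int
    deletions: int
    insertions: int

    @property
    def error_rate(self) -> float:
        # Reference length N = H + S + D; insertions do not add to N
        n = self.hits + self.substitutions + self.deletions
        return (self.substitutions + self.deletions + self.insertions) / n


def align_counts(reference: list[str], hypothesis: list[str]) -> ErrorCounts:
    """Count H/S/D/I from one minimum-edit-distance alignment of two token lists."""
    r, h = len(reference), len(hypothesis)
    # dp[i][j] = minimum edits turning reference[:i] into hypothesis[:j]
    dp = [[0] * (h + 1) for _ in range(r + 1)]
    for i in range(1, r + 1):
        dp[i][0] = i  # only deletions
    for j in range(1, h + 1):
        dp[0][j] = j  # only insertions
    for i in range(1, r + 1):
        for j in range(1, h + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j - 1] + cost,  # hit or substitution
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
            )
    # Walk back from the bottom-right cell to recover one optimal alignment
    hits = subs = dels = ins = 0
    i, j = r, h
    while i > 0 or j > 0:
        same = i > 0 and j > 0 and reference[i - 1] == hypothesis[j - 1]
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (0 if same else 1):
            if same:
                hits += 1
            else:
                subs += 1
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return ErrorCounts(hits, subs, dels, ins)


# Word level (WER): split on whitespace
words = align_counts("a b c d".split(), "a x c d e".split())
print(words, words.error_rate)  # one substitution, one insertion, WER 0.5

# Character level (CER): compare character sequences
chars = align_counts(list("abc"), list("ac"))
print(chars, chars.error_rate)  # one deletion
```

When several alignments share the same minimum cost, different tools can break ties differently. The total error count stays the same, but the split between S, D, and I may change. Corpus-level figures such as those in the tables are normally computed by summing the counts over all utterances before dividing. For Telugu CER, note that `list(text)` splits a string into Unicode code points. Vowel signs are separate code points, so code-point CER can differ from a count based on grapheme clusters.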

Usage

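import librosa  # assumption: librosa is used here only to load and resample audio
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Load the fine-tuned model and its processor
model = WhisperForConditionalGeneration.from_pretrained("vanshnawander/whisper-tiny-telugu")
processor = WhisperProcessor.from_pretrained("vanshnawander/whisper-tiny-telugu")

# Load audio as a 16 kHz mono waveform ("audio.wav" is a placeholder path)
audio_input, sampling_rate = librosa.load("audio.wav", sr=16000)

# Convert the waveform to log-Mel input features
input_features = processor(audio_input, sampling_rate=sampling_rate, return_tensors="pt").input_features

# Generate token IDs and decode them to text
with torch.no_grad():
    predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription[0])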
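Whisper's feature extractor works on a fixed 30-second window of 16 kHz audio, and by default it truncates anything longer. One minimal way to handle a longer recording is to split the waveform into consecutive 30-second pieces and run the inference code above on each piece. The NumPy helper below is a hypothetical sketch of that approach. Cutting at fixed points can split a word across two chunks. The Transformers ASR `pipeline` avoids this by using overlapping chunks, controlled by its `chunk_length_s` and `stride_length_s` parameters.

```python
import numpy as np

SAMPLE_RATE = 16_000   # Whisper expects 16 kHz mono audio
CHUNK_SECONDS = 30     # length of Whisper's input window


def split_into_chunks(audio: np.ndarray,
                      sample_rate: int = SAMPLE_RATE,
                      chunk_seconds: int = CHUNK_SECONDS) -> list[np.ndarray]:
    """Split a 1-D waveform into consecutive, non-overlapping chunks of at most
    chunk_seconds each; the last chunk may be shorter."""
    if audio.ndim != 1:
        raise ValueError("expected a 1-D mono waveform")
    step = sample_rate * chunk_seconds
    return [audio[start:start + step] for start in range(0, len(audio), step)]


# Example: a 70-second waveform becomes chunks of 30 s, 30 s and 10 s
audio = np.zeros(SAMPLE_RATE * 70, dtype=np.float32)
print([len(chunk) / SAMPLE_RATE for chunk in split_into_chunks(audio)])
```

Each chunk can then be passed to `processor(chunk, sampling_rate=16000, return_tensors="pt")`, and the decoded texts joined with spaces.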

Citation

If you use this fine-tuned model or the methodology, please cite:

AI4Bharat Kathbath Dataset

@misc{ai4bharat_kathbath,
  title={Kathbath: A Large-Scale Speech Corpus for Indian Languages},
  author={AI4Bharat Team},
  year={2022},
  url={https://ai4bharat.iitm.ac.in/kathbath}
}

Whisper Original Paper

@article{radford2022robust,
  title={Robust Speech Recognition via Large-Scale Weak Supervision},
  author={Radford, Alec and Kim, Jong Wook and Xu, Tao and others},
  journal={arXiv preprint arXiv:2212.04356},
  year={2022}
}

Fine-tuned Model

@misc{vanshnawander2026whisper_tiny_telugu,
  title={Whisper Tiny Telugu: Fine-tuned for Telugu Speech Recognition},
  author={Vansh Nawander},
  year={2026},
  url={https://huggingface.co/vanshnawander/whisper-tiny-telugu}
}

Acknowledgements

Whisper Base Model Acknowledgement

This work builds upon OpenAI's Whisper model. We acknowledge the OpenAI team for creating the Whisper architecture and pre-trained models:

"Whisper is a robust speech recognition model. It approaches human level robustness and accuracy across a wide range of conditions. The model is trained on 680,000 hours of multilingual and multitask supervised data collected from the web."

AI4Bharat Acknowledgement

We thank AI4Bharat for creating and maintaining the Kathbath Telugu dataset, which was crucial for this fine-tuning work. AI4Bharat's mission to advance Indian language AI research has made significant contributions to the NLP community.

Additional Thanks

The open-source community for tools and libraries that made this research possible
Contributors to the Hugging Face Transformers ecosystem
The Telugu-speaking community for providing valuable speech data

License

This fine-tuned model inherits the license terms from the original Whisper model and the AI4Bharat Kathbath dataset. Please refer to the original sources for specific licensing information.

Model Files

Format: Safetensors
Model size: 37.8M parameters
Tensor type: F32
