Whisper Tiny Telugu Fine-tuning
Overview
This project demonstrates the fine-tuning of OpenAI's Whisper Tiny model for Telugu speech recognition. The base Whisper Tiny model was fine-tuned on the AI4Bharat Kathbath Telugu dataset to improve performance specifically for Telugu language transcription tasks.
Model Details
- Base Model: OpenAI Whisper Tiny
- Fine-tuned Model: Whisper Tiny fine-tuned on Telugu
- Training Dataset: AI4Bharat Kathbath Telugu Dataset
- Language: Telugu (te)
Training Dataset
The model was fine-tuned on the AI4Bharat Kathbath Telugu dataset, a collection of Telugu speech recordings intended for speech recognition. It covers multiple domains and speakers, which makes it a suitable source of fine-tuning data.
Performance Comparison
Overall Metrics
| Metric | Base Model | Fine-tuned | Relative Change |
|---|---|---|---|
| WER (%) | 402.96 | 97.83 | ↓ 75.7% |
| CER (%) | 171.09 | 77.13 | ↓ 54.9% |
| Avg Time (ms) | 367.3 | 711.0 | ↑ 93.6% (slower) |
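The relative changes in the table can be reproduced from the two raw columns. The following is a minimal sketch; the helper name `relative_change` is illustrative and not part of the original evaluation code.

```python
def relative_change(base: float, fine_tuned: float) -> float:
    """Percentage change from base to fine_tuned; negative means the value dropped."""
    return (fine_tuned - base) / base * 100.0


# Values copied from the Overall Metrics table.
metrics = {
    "WER (%)": (402.96, 97.83),
    "CER (%)": (171.09, 77.13),
    "Avg Time (ms)": (367.3, 711.0),
}

for name, (base, tuned) in metrics.items():
    print(f"{name}: {relative_change(base, tuned):+.1f}%")
```

A negative value is an improvement for WER and CER, while the positive value for average time means inference became slower.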
Detailed Word Error Analysis
| Error Type | Base Model | Fine-tuned | Change |
|---|---|---|---|
| Substitutions | 362 | 268 | ↓ 94 |
| Deletions | 145 | 212 | ↑ 67 |
| Insertions | 1536 | 16 | ↓ 1520 |
| Hits (correct) | 0 | 27 | ↑ 27 |
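The error counts above are consistent with the reported WER values under the standard definition, WER = (S + D + I) / N, where N = hits + S + D is the number of reference words. The sketch below checks this; the function name `wer_from_counts` is illustrative.

```python
def wer_from_counts(substitutions: int, deletions: int, insertions: int, hits: int) -> float:
    """Word error rate in percent, computed from alignment counts."""
    # Every reference word is either a hit, a substitution, or a deletion.
    reference_words = hits + substitutions + deletions
    errors = substitutions + deletions + insertions
    return errors / reference_words * 100.0


# Counts copied from the Detailed Word Error Analysis table.
base_wer = wer_from_counts(substitutions=362, deletions=145, insertions=1536, hits=0)
tuned_wer = wer_from_counts(substitutions=268, deletions=212, insertions=16, hits=27)

print(f"Base WER: {base_wer:.2f}%")
print(f"Fine-tuned WER: {tuned_wer:.2f}%")
```

Both models are scored against the same 507 reference words. Insertions are not bounded by the reference length, which is why the base model's WER can exceed 100%.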
Key Findings
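- WER Improvement: Word Error Rate fell from 402.96% to 97.83%, a 75.7% relative reduction
- CER Reduction: Character Error Rate fell from 171.09% to 77.13%, a 54.9% relative reduction
- Insertion Reduction: Insertion errors dropped from 1536 to 16, which accounts for most of the WER gain; deletions rose from 145 to 212
- Processing Time: Average inference time increased from 367.3 ms to 711.0 ms (about 94% slower); the cause was not analyzed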
Usage
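import numpy as np
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor
# Load the fine-tuned model and its processor
model = WhisperForConditionalGeneration.from_pretrained("vanshnawander/whisper-tiny-telugu")
processor = WhisperProcessor.from_pretrained("vanshnawander/whisper-tiny-telugu")
# Placeholder input: one second of silence.
# Replace with your own mono waveform sampled at 16 kHz (1-D float array).
audio_input = np.zeros(16000, dtype=np.float32)
# Convert the waveform to log-mel input features
input_features = processor(audio_input, sampling_rate=16000, return_tensors="pt").input_features
# Generate token IDs without tracking gradients, then decode them to text
with torch.no_grad():
    predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription[0])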
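As an alternative to loading the model and processor by hand, the Transformers `pipeline` helper wraps feature extraction, generation, and decoding in one call. The sketch below assumes `transformers` and a PyTorch backend are installed and that `ffmpeg` is available for decoding audio files; the helper name `transcribe_file` and the file name in the usage comment are hypothetical.

```python
def transcribe_file(audio_path: str, model_id: str = "vanshnawander/whisper-tiny-telugu") -> str:
    """Transcribe one audio file with the automatic-speech-recognition pipeline."""
    # Imported lazily so the helper can be defined without loading the library.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # The pipeline accepts a path to an audio file and returns a dict with a "text" key.
    result = asr(audio_path)
    return result["text"]


# Example call (hypothetical file name; downloads the model on first use):
# print(transcribe_file("sample_telugu.wav"))
```

The helper rebuilds the pipeline on every call for simplicity. When transcribing many files, create the pipeline once and reuse it.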
Citation
If you use this fine-tuned model or the methodology, please cite:
AI4Bharat Kathbath Dataset
@misc{ai4bharat_kathbath,
title={Kathbath: A Large-Scale Speech Corpus for Indian Languages},
author={AI4Bharat Team},
year={2022},
url={https://ai4bharat.iitm.ac.in/kathbath}
}
Whisper Original Paper
@article{radford2022robust,
title={Robust Speech Recognition via Large-Scale Weak Supervision},
author={Radford, Alec and Kim, Jong Wook and Xu, Tao and others},
journal={arXiv preprint arXiv:2212.04356},
year={2022}
}
Fine-tuned Model
@misc{vanshnawander2026whisper_tiny_telugu,
title={Whisper Tiny Telugu: Fine-tuned for Telugu Speech Recognition},
author={Vansh Nawander},
year={2026},
url={https://huggingface.co/vanshnawander/whisper-tiny-telugu}
}
Acknowledgements
Whisper Base Model Acknowledgement
This work builds upon OpenAI's Whisper model. We acknowledge the OpenAI team for creating the Whisper architecture and pre-trained models:
"Whisper is a robust speech recognition model. It approaches human level robustness and accuracy across a wide range of conditions. The model is trained on 680,000 hours of multilingual and multitask supervised data collected from the web."
AI4Bharat Acknowledgement
We thank AI4Bharat for creating and maintaining the Kathbath Telugu dataset, which was crucial for this fine-tuning work. AI4Bharat's mission to advance Indian language AI research has made significant contributions to the NLP community.
Additional Thanks
- The open-source community for tools and libraries that made this research possible
- Contributors to the Hugging Face Transformers ecosystem
- The Telugu-speaking community for providing valuable speech data
License
This fine-tuned model inherits the license terms from the original Whisper model and the AI4Bharat Kathbath dataset. Please refer to the original sources for specific licensing information.