Shunya Labs Hinglish ASR Model
We set out to build an ASR model that intuitively captures how conversational Hindi is actually spoken. On average, 2 out of every 10 words spoken in conversational Hindi are English. Traditional ASR models are trained to handle one language at a time, which makes them slow and inaccurate when transcribing multilingual speech.
And so we innovated. We trained Zero STT Codeswitch to natively process Hinglish speech and generate mixed-script tokens.
This is a model worthy of how India actually speaks, because it can capture the way people naturally switch between Hindi and English mid-conversation.
And now, we're making the lighter version of Zero STT Codeswitch open source for the community!
For a faster version of Zero STT Codeswitch, visit shunyalabs.ai.
Model Details
Base Model: OpenAI Whisper Medium
Post-trained by: Shunya Labs
Language: Hinglish (Hindi-English code-switching)
Why This Model?
Standard ASR models treat Hindi and English as separate languages, forcing transcription into one or the other. This creates errors when speakers switch between languages mid-sentence, which is how millions of people actually talk. This model was trained specifically on code-switched speech, so it:
- Transcribes Hindi and English tokens as they naturally occur
- Handles mid-sentence language switches accurately
- Produces faster inference by avoiding language detection overhead
- Delivers higher accuracy on real-world Hinglish speech
Demo
- Try the model at: https://www.shunyalabs.ai/zero-code-switch
Transcription Comparison
| Zero STT Codeswitch | Whisper Medium |
|---|---|
| Rome में अलग अलग जगों पर कई बढ़े television screens लगाए गये ग ताकि लोग समारो देख सकें | रोम में अलग अलग जगहों पर कई बड़े टेलिवीजन स्क्रीन लगाए गए ताकि लोग स्मारो देख सकें |
| और बागल में एक building है लाल कलर का पिंट किया हुआ | और बगल में एक बिल्डिंग है लाल कलर का पेंट किया हुआ है |
| yoga med पर yoga कर रहे हैं | योगा मैट पर योगा कर रहे हैं |
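One rough way to quantify the difference between the transcripts above is to count how many tokens are written in Latin script versus Devanagari. The sketch below is illustrative only and is not part of the model or its evaluation: the helper names `token_script` and `latin_share` are hypothetical, and classifying tokens by Unicode character name is an assumption that will miss English loanwords written in Devanagari.

```python
import unicodedata


def token_script(token: str) -> str:
    """Classify a token as 'devanagari', 'latin', or 'other' by its characters."""
    devanagari = 0
    latin = 0
    for ch in token:
        # Unicode names start with the script, e.g. "DEVANAGARI LETTER KA".
        name = unicodedata.name(ch, "")
        if name.startswith("DEVANAGARI"):
            devanagari += 1
        elif name.startswith("LATIN"):
            latin += 1
    if devanagari == 0 and latin == 0:
        return "other"  # digits, punctuation, or another script
    return "devanagari" if devanagari >= latin else "latin"


def latin_share(text: str) -> float:
    """Fraction of script-bearing tokens that are written in Latin script."""
    scripts = [token_script(tok) for tok in text.split()]
    scored = [s for s in scripts if s != "other"]
    if not scored:
        return 0.0
    return scored.count("latin") / len(scored)


if __name__ == "__main__":
    mixed = "yoga med पर yoga कर रहे हैं"      # 3 of 7 tokens are Latin-script
    devanagari_only = "योगा मैट पर योगा कर रहे हैं"  # no Latin-script tokens
    print(latin_share(mixed), latin_share(devanagari_only))
```

A share near zero on speech that is known to contain English words suggests the transcript has been forced into a single script.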
Use Cases
- Transcription of Hinglish conversations, podcasts, and videos
- Customer support and conversational agents serving Indian users
- Meeting transcription for Indian workplaces
- Content creation and subtitling
How to Get Started with the Model
Use the code below to get started with the model.
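from transformers import pipeline
transcriber = pipeline("automatic-speech-recognition", model="shunyalabs/zero-stt-hinglish")  # repository id as listed in the model tree
result = transcriber("audio.mp3")  # path to a local audio file
print(result["text"])  # mixed-script Hinglish transcript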
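For the subtitling use case, a possible extension is to request timestamps from the pipeline and render them as an SRT file. This is a sketch under stated assumptions, not the authors' workflow: `format_srt_time`, `chunks_to_srt`, and `transcribe_to_srt` are hypothetical helpers, the default `model_id` is taken from the model tree listed on this page, and the 30-second chunk length is an arbitrary choice for long recordings. It relies on the Transformers ASR pipeline returning a `chunks` list of `{"timestamp": (start, end), "text": ...}` entries when called with `return_timestamps=True`.

```python
def format_srt_time(seconds: float) -> str:
    """Convert a time in seconds to the SRT form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def chunks_to_srt(chunks) -> str:
    """Render pipeline-style timestamped chunks as SRT text."""
    blocks = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        if end is None:
            # The final chunk can come back without an end time.
            end = start
        blocks.append(
            f"{index}\n"
            f"{format_srt_time(start)} --> {format_srt_time(end)}\n"
            f"{chunk['text'].strip()}\n"
        )
    return "\n".join(blocks)


def transcribe_to_srt(audio_path: str, model_id: str = "shunyalabs/zero-stt-hinglish") -> str:
    """Transcribe an audio file and return SRT text (downloads the model on first use)."""
    # Imported lazily so the formatting helpers work without transformers installed.
    from transformers import pipeline

    transcriber = pipeline(
        "automatic-speech-recognition",
        model=model_id,        # assumption: id from the model tree on this page
        chunk_length_s=30,     # assumption: split long audio into 30 s windows
    )
    result = transcriber(audio_path, return_timestamps=True)
    return chunks_to_srt(result["chunks"])


if __name__ == "__main__":
    with open("audio.srt", "w", encoding="utf-8") as handle:
        handle.write(transcribe_to_srt("audio.mp3"))
```

Keeping the formatting helpers separate from the model call means they can be checked on hand-written chunks before running a full transcription.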
Training Details
The base model, openai/whisper-medium, was post-trained on Google Vaani as well as proprietary datasets.
Model tree for shunyalabs/zero-stt-hinglish
Base model
openai/whisper-medium