Shunya Labs Hinglish ASR Model

We wanted to build an ASR model that captures how conversational Hindi is actually spoken. On average, about 2 of every 10 words spoken in conversational Hindi are English. Traditional ASR models are trained to handle one language at a time, which makes them slow and inaccurate when transcribing multilingual speech.

And so we innovated. We trained Zero STT Codeswitch to natively process Hinglish speech and generate mixed-script tokens.

This is a model worthy of how India actually speaks, because it can capture the way people naturally switch between Hindi and English mid-conversation.

And now, we're making the lighter version of Zero STT Codeswitch open source for the community!

For a faster version of Zero STT Codeswitch, visit shunyalabs.ai.

Model Details

Base Model: OpenAI Whisper Medium

Post-trained by: Shunya Labs

Language: Hinglish (Hindi-English code-switching)

Why This Model?

Standard ASR models treat Hindi and English as separate languages, forcing transcription into one or the other. This creates errors when speakers switch between languages mid-sentence, which is how millions of people actually talk. This model was trained specifically on code-switched speech, so it:

Transcribes Hindi and English tokens as they naturally occur
Handles mid-sentence language switches accurately
Runs inference faster by avoiding language-detection overhead
Delivers higher accuracy on real-world Hinglish speech

Demo

Try the model at: https://www.shunyalabs.ai/zero-code-switch

Transcription Comparison

Audio	Zero STT Codeswitch	Whisper Medium
	Rome में अलग अलग जगों पर कई बढ़े television screens लगाए गये ग ताकि लोग समारो देख सकें	रोम में अलग अलग जगहों पर कई बड़े टेलिवीजन स्क्रीन लगाए गए ताकि लोग स्मारो देख सकें
	और बागल में एक building है लाल कलर का पिंट किया हुआ	और बगल में एक बिल्डिंग है लाल कलर का पेंट किया हुआ है
	yoga med पर yoga कर रहे हैं	योगा मैट पर योगा कर रहे हैं
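
One way to quantify the difference between the two columns is to count how many tokens in a transcript are written in Latin script versus Devanagari. The sketch below is illustrative only: `script_mix` is a hypothetical helper, not part of the model or its tooling, and it assumes whitespace tokenization and classifies a token as Devanagari if it contains any character from the Unicode block U+0900 to U+097F.

```python
import re

# Devanagari occupies the Unicode block U+0900 to U+097F.
DEVANAGARI = re.compile(r"[\u0900-\u097F]")
LATIN = re.compile(r"[A-Za-z]")


def script_mix(transcript: str) -> dict:
    """Count whitespace-separated tokens by script (hypothetical helper)."""
    counts = {"latin": 0, "devanagari": 0, "other": 0}
    for token in transcript.split():
        if DEVANAGARI.search(token):
            counts["devanagari"] += 1
        elif LATIN.search(token):
            counts["latin"] += 1
        else:
            counts["other"] += 1
    return counts


# Third row of the comparison table above.
print(script_mix("yoga med पर yoga कर रहे हैं"))
# {'latin': 3, 'devanagari': 4, 'other': 0}
print(script_mix("योगा मैट पर योगा कर रहे हैं"))
# {'latin': 0, 'devanagari': 7, 'other': 0}
```

On this row, the mixed-script transcript keeps the English words in Latin script, while the Whisper Medium transcript renders every token in Devanagari.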

Use Cases

Transcription of Hinglish conversations, podcasts, and videos
Customer support and conversational agents serving Indian users
Meeting transcription for Indian workplaces
Content creation and subtitling
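
For the subtitling use case, timestamped transcripts can be turned into SRT subtitle text. The following is a minimal sketch, not an official utility: `format_timestamp` and `chunks_to_srt` are hypothetical helpers, and the input is assumed to be a list of `{"timestamp": (start, end), "text": ...}` dictionaries, which is the shape the transformers speech-recognition pipeline is expected to return under `"chunks"` when called with `return_timestamps=True`.

```python
def format_timestamp(seconds: float) -> str:
    """Render seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def chunks_to_srt(chunks: list) -> str:
    """Build SRT text from timestamped chunks (hypothetical helper)."""
    entries = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        if end is None:
            # Assumption: a final segment may lack an end time; reuse its start.
            end = start
        text = chunk["text"].strip()
        entries.append(
            f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"
        )
    # A blank line separates consecutive subtitle entries.
    return "\n".join(entries)


example = [{"timestamp": (0.0, 2.5), "text": " yoga med पर yoga कर रहे हैं"}]
print(chunks_to_srt(example))
```

The timestamps in `example` are made up for illustration; only the text comes from the comparison table.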

How to Get Started with the Model

Use the code below to get started with the model.

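from transformers import pipeline

# Load the speech-recognition pipeline with this model's weights from the Hugging Face Hub.
transcriber = pipeline("automatic-speech-recognition", model="shunyalabs/zero-stt-hinglish")

# Transcribe a local audio file; decoding compressed formats such as MP3 requires ffmpeg.
result = transcriber("audio.mp3")
print(result["text"])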
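
For recordings longer than a few seconds, it may help to run on a GPU when one is available and to let the pipeline split the audio into windows. The sketch below is one possible setup under stated assumptions, not a recommended configuration from Shunya Labs: `transcription_settings` and `transcribe` are hypothetical helpers, the 30-second window is an assumed value, and `model_id` should be whichever repository id you load the model from.

```python
def transcription_settings(cuda_available: bool) -> dict:
    """Pipeline keyword arguments for long recordings (hypothetical helper)."""
    return {
        "device": 0 if cuda_available else -1,  # first GPU if present, else CPU
        "chunk_length_s": 30,  # assumed window size for splitting long audio
    }


def transcribe(path: str, model_id: str) -> dict:
    """Transcribe one audio file and return text plus timestamped chunks."""
    # Imported here so transcription_settings can be used without these packages.
    import torch
    from transformers import pipeline

    settings = transcription_settings(torch.cuda.is_available())
    transcriber = pipeline(
        "automatic-speech-recognition", model=model_id, **settings
    )
    # return_timestamps=True adds a "chunks" list alongside the full "text".
    return transcriber(path, return_timestamps=True)
```

Calling `transcribe("audio.mp3", model_id)` should return a dictionary whose `"text"` entry holds the full transcript.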

Training Details

The base model, openai/whisper-medium, was post-trained on the Google Vaani dataset and on proprietary datasets.

Model size: 0.8B params

Tensor type: F16

Format: Safetensors

Model tree for shunyalabs/zero-stt-hinglish

Base model: openai/whisper-medium

Finetuned: shunyalabs/zero-stt-hinglish (this model)