Shunya Labs Hinglish ASR Model

We wanted to build an ASR model that intuitively captures how conversational Hindi is actually spoken. On average, 2 out of every 10 words spoken in conversational Hindi are English. Traditional ASR models are trained to handle one language at a time, which makes them slow and inaccurate when transcribing multilingual speech.

And so we innovated. We trained Zero STT Codeswitch to natively process Hinglish speech and generate mixed-script tokens.

This is a model worthy of how India actually speaks, because it can capture the way people naturally switch between Hindi and English mid-conversation.

And now, we're making the lighter version of Zero STT Codeswitch open source for the community!

For a faster version of Zero STT Codeswitch, visit shunyalabs.ai.

Model Details

Base Model: OpenAI Whisper Medium

Post-trained by: Shunya Labs

Language: Hinglish (Hindi-English code-switching)

Why This Model?

Standard ASR models treat Hindi and English as separate languages, forcing transcription into one or the other. This creates errors when speakers switch languages mid-sentence, which is how millions of people actually talk. This model was trained specifically on code-switched speech, so it:

  • Transcribes Hindi and English tokens as they naturally occur
  • Handles mid-sentence language switches accurately
  • Produces faster inference by avoiding language detection overhead
  • Delivers higher accuracy on real-world Hinglish speech

Demo

Transcription Comparison

Audio sample 1
  Zero STT Codeswitch: Rome में अलग अलग जगों पर कई बढ़े television screens लगाए गये ग ताकि लोग समारो देख सकें
  Whisper Medium: रोम में अलग अलग जगहों पर कई बड़े टेलिवीजन स्क्रीन लगाए गए ताकि लोग स्मारो देख सकें

Audio sample 2
  Zero STT Codeswitch: और बागल में एक building है लाल कलर का पिंट किया हुआ
  Whisper Medium: और बगल में एक बिल्डिंग है लाल कलर का पेंट किया हुआ है

Audio sample 3
  Zero STT Codeswitch: yoga med पर yoga कर रहे हैं
  Whisper Medium: योगा मैट पर योगा कर रहे हैं
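
One rough way to see the mixed-script output in the comparison above is to count how many tokens in a transcript are written in Latin script and how many in Devanagari. The sketch below is illustrative only: the function names are hypothetical, and treating Latin-script tokens as English is a simplifying assumption, not how the model itself decides language.

```python
from collections import Counter


def token_script(token: str) -> str:
    """Classify a whitespace-separated token by the script of its characters."""
    # Devanagari occupies the Unicode block U+0900 to U+097F.
    if any("\u0900" <= ch <= "\u097f" for ch in token):
        return "devanagari"
    # Assumption: ASCII letters are used as a proxy for English words.
    if any(ch.isascii() and ch.isalpha() for ch in token):
        return "latin"
    return "other"


def script_counts(transcript: str) -> dict:
    """Count tokens per script in a transcript."""
    counts = Counter(token_script(tok) for tok in transcript.split())
    return {key: counts.get(key, 0) for key in ("latin", "devanagari", "other")}


def latin_share(transcript: str) -> float:
    """Fraction of tokens written in Latin script (0.0 for an empty transcript)."""
    counts = script_counts(transcript)
    total = sum(counts.values())
    return counts["latin"] / total if total else 0.0


if __name__ == "__main__":
    sample = "yoga med पर yoga कर रहे हैं"
    print(script_counts(sample))
    print(round(latin_share(sample), 2))
```

Applied to the two transcripts of the third sample, this separates the mixed-script output from the all-Devanagari one, since the latter contains no Latin-script tokens.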

Use Cases

  • Transcription of Hinglish conversations, podcasts, and videos
  • Customer support and conversational agents serving Indian users
  • Meeting transcription for Indian workplaces
  • Content creation and subtitling

How to Get Started with the Model

Use the code below to get started with the model.

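from transformers import pipeline

# Load the model as a speech-recognition pipeline.
# The weights are downloaded from the Hugging Face Hub on first use.
transcriber = pipeline("automatic-speech-recognition", model="shunyalabs/zero-stt-hinglish")

# Transcribe a local audio file; decoding compressed formats such as MP3 requires ffmpeg.
result = transcriber("audio.mp3")
print(result["text"])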
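
For subtitling, the Transformers ASR pipeline can return timestamped chunks when called with `return_timestamps=True`. The sketch below turns such chunks into SRT text. The helper names (`format_ts`, `to_srt`, `transcribe_to_srt`) are hypothetical. The 30-second chunk length and the repository ID `shunyalabs/zero-stt-hinglish` are assumptions, and this has not been verified against this specific checkpoint.

```python
def format_ts(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(chunks) -> str:
    """Convert pipeline chunks ({"timestamp": (start, end), "text": ...}) to SRT text."""
    entries = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        # The final chunk may come back without an end time; fall back to its start.
        if end is None:
            end = start
        text = chunk["text"].strip()
        entries.append(f"{index}\n{format_ts(start)} --> {format_ts(end)}\n{text}\n")
    return "\n".join(entries)


def transcribe_to_srt(audio_path: str, model_id: str = "shunyalabs/zero-stt-hinglish") -> str:
    """Transcribe an audio file and return SRT text (downloads the model on first use)."""
    # Imported lazily so the formatting helpers above work without transformers installed.
    from transformers import pipeline

    transcriber = pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,  # assumption: process long audio in 30-second windows
    )
    result = transcriber(audio_path, return_timestamps=True)
    return to_srt(result["chunks"])
```

Calling `transcribe_to_srt("audio.mp3")` would then return a string that can be written to a `.srt` file.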

Training Details

The base model, openai/whisper-medium, was post-trained on Google Vaani and on proprietary datasets.

For a faster version of Zero STT Codeswitch, visit shunyalabs.ai.

Model size: 0.8B params

Tensor type: F16

Weights format: Safetensors