Papers
arxiv:2512.10967

ASR Under the Stethoscope: Evaluating Biases in Clinical Speech Recognition across Indian Languages

Published on Nov 30, 2025
Authors:
,
,
,
,
,
,
,
,

Abstract

Automatic Speech Recognition (ASR) is increasingly used to document clinical encounters, yet its reliability in multilingual and demographically diverse Indian healthcare contexts remains largely unknown. In this study, we conduct the first systematic audit of ASR performance on real world clinical interview data spanning Kannada, Hindi, and Indian English, comparing leading models including Indic Whisper, Whisper, Sarvam, Google speech to text, Gemma3n, Omnilingual, Vaani, and Gemini. We evaluate transcription accuracy across languages, speakers, and demographic subgroups, with a particular focus on error patterns affecting patients vs. clinicians and gender based or intersectional disparities. Our results reveal substantial variability across models and languages, with some systems performing competitively on Indian English but failing on code mixed or vernacular speech. We also uncover systematic performance gaps tied to speaker role and gender, raising concerns about equitable deployment in clinical settings. By providing a comprehensive multilingual benchmark and fairness analysis, our work highlights the need for culturally and demographically inclusive ASR development for healthcare ecosystem in India.

Community

Sign up or log in to comment

Get this paper in your agent:

hf papers read 2512.10967
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2512.10967 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2512.10967 in a dataset README.md to link it from this page.

Spaces citing this paper 1

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.