Indic Conformer ASR — Hindi (600M)

600M-parameter Conformer encoder for Hindi automatic speech recognition, evaluated on all 7 subsets of the Vistaar benchmark. Achieves 12.09% average WER with a custom 5-gram KenLM across read speech, noisy speech, broadcast, conversational, and rural dialectal Hindi.

Runs locally on CPU, Apple Silicon MPS, and NVIDIA CUDA; a GPU is not required. On an Apple M4 CPU the real-time factor (RTF) is 0.27 (about 3.7× faster than real time). On Apple MPS the RTF is roughly 0.03–0.05 (about 20–33× faster than real time).
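For readers unfamiliar with RTF, the sketch below shows how the figure relates to the "faster than real time" number. The function names and the example timing are illustrative assumptions, not part of the repository.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent transcribing / duration of the audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def speedup_over_real_time(rtf: float) -> float:
    """Seconds of audio processed per second of compute (the inverse of RTF)."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return 1.0 / rtf


# Hypothetical timing: a 60 s clip transcribed in 16.2 s.
rtf = real_time_factor(16.2, 60.0)
print(f"RTF {rtf:.2f}, {speedup_over_real_time(rtf):.1f}x real time")
```

An RTF below 1.0 means transcription finishes before the audio would have finished playing.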

Code and evaluation scripts: github.com/abhayverma6300/indic-asr-conformer


Vistaar Results

WER with Devanagari-aware normalisation (dandas and punctuation stripped). Beam width 100.

| Dataset | Domain | Greedy WER | + Hindi-5M LM |
|---|---|---|---|
| Kathbath | Read speech | 10.34% | 9.00% |
| Kathbath Noisy | Noisy read speech | 11.86% | 10.19% |
| FLEURS | Broadcast / read | 12.68% | 11.18% |
| CommonVoice | Crowd-sourced read | 16.57% | 12.54% |
| IndicTTS | TTS-derived | 9.49% | 8.55% |
| MUCS | Conversational | 10.41% | 9.05% |
| Gramvaani | Rural / dialectal | 27.61% | 24.09% |
| Average | | 14.14% | 12.09% |
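The scoring described above (WER after stripping dandas and punctuation) can be sketched as follows. This is a minimal illustration, not the repository's evaluation script: the exact normalisation rules, such as replacing punctuation with a space and applying NFC, are assumptions.

```python
import unicodedata

DANDAS = {"\u0964", "\u0965"}  # Devanagari danda and double danda


def normalise(text: str) -> list[str]:
    """Drop dandas and Unicode punctuation, then split on whitespace."""
    kept = []
    for ch in unicodedata.normalize("NFC", text):
        if ch in DANDAS or unicodedata.category(ch).startswith("P"):
            kept.append(" ")  # assumption: punctuation acts as a word break
        else:
            kept.append(ch)
    return "".join(kept).split()


def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance over words (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate of hypothesis against reference, after normalisation."""
    ref, hyp = normalise(reference), normalise(hypothesis)
    if not ref:
        raise ValueError("reference is empty after normalisation")
    return word_edit_distance(ref, hyp) / len(ref)
```

With this normalisation a trailing danda in the reference does not count as an error, so `wer("यह एक परीक्षण है।", "यह एक परीक्षण है")` is zero.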

Leaderboard context

| Model | Avg WER | Open weights | CPU inference |
|---|---|---|---|
| Indic Conformer 600M + Hindi-5M LM | 12.09% | yes | yes |
| IndicWhisper (Whisper-medium fine-tuned) | 13.6% | yes | slow |
| Nvidia NeMo large | 18.6% | yes | no |
| Azure STT | ~20% | no | no |
| Google STT | ~24% | no | no |

Numbers for the other models are taken from the Vistaar paper (AI4Bharat, 2023).


Model files

| File | Size | Description |
|---|---|---|
| am_model.pt | 2.4 GB | Original TorchScript AM (CUDA device literals) |
| am_model_cpu.pt | 2.4 GB | Patched for CPU inference |
| am_model_mps.pt | 2.4 GB | Patched for Apple Silicon MPS |
| preprocessor.pt | ~92 KB | Log-Mel frontend |
| lm/hindi/hi.bin | 145 MB | 5-gram KenLM (Hindi-5M) |
| lm/hindi/unigrams.txt | - | 201k Hindi words for pyctcdecode |

Quickstart

Install dependencies

pip install torch torchaudio pyctcdecode

CPU inference

git clone https://github.com/Abhay-Verma031/indic-asr-conformer
cd indic-asr-conformer

huggingface-cli download Abhay-Verma031/indic-conformer-600m \
    --local-dir extracted_models_v3/

python inference/cpu_infer.py \
    --audio speech.wav \
    --language hi \
    --preprocessor extracted_models_v3/preprocessor.pt \
    --am extracted_models_v3/am_model_cpu.pt \
    --lm extracted_models_v3/lm/hindi/hi.bin

Apple Silicon MPS

python inference/cpu_infer.py \
    --audio speech.wav \
    --language hi \
    --preprocessor extracted_models_v3/preprocessor.pt \
    --am extracted_models_v3/am_model_mps.pt \
    --device mps \
    --lm extracted_models_v3/lm/hindi/hi.bin

NVIDIA GPU

python inference/gpu_infer.py \
    --audio speech.wav \
    --language hi \
    --preprocessor extracted_models_v3/preprocessor.pt \
    --am extracted_models_v3/am_model.pt \
    --lm extracted_models_v3/lm/hindi/hi.bin

Architecture

AUDIO (16 kHz mono, FP32)
        │
        ▼
  asr_preprocessor      80-dim log-Mel filterbank  [B, 80, T']
        │
        ▼
      asr_am             Conformer encoder, ~600M params
                         output: CTC logprobs  [B, T', 257]
                         (256 Hindi BPE tokens + CTC blank)
        │
        ▼
    asr_decoder          pyctcdecode CTC beam search + KenLM
                         α=0.3  β=1.0  beam_width=100
        │
        ▼
    TRANSCRIPT
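The greedy baseline reported in the results skips the beam search and decodes the CTC output directly. A minimal sketch of greedy CTC decoding over per-frame log-probabilities is shown below; the function name and the toy three-class example are hypothetical, and the blank index of the real model is not stated here, so it is passed in as a parameter.

```python
def ctc_greedy_decode(frame_logprobs, blank_id):
    """Take the argmax per frame, merge consecutive repeats, then drop blanks."""
    best_ids = [
        max(range(len(frame)), key=frame.__getitem__) for frame in frame_logprobs
    ]
    tokens = []
    previous = None
    for token_id in best_ids:
        # A token is emitted only when it differs from the previous frame's
        # argmax and is not the blank symbol.
        if token_id != previous and token_id != blank_id:
            tokens.append(token_id)
        previous = token_id
    return tokens


# Toy example: 3 classes, with class 2 playing the role of the blank.
frames = [
    [-3.0, -0.1, -2.5],  # argmax 1
    [-3.0, -0.2, -2.0],  # argmax 1 (repeat, merged)
    [-3.0, -2.0, -0.1],  # argmax 2 (blank)
    [-2.5, -0.3, -2.0],  # argmax 1 (new token, separated by the blank)
    [-0.2, -2.0, -3.0],  # argmax 0
    [-0.1, -3.0, -3.0],  # argmax 0 (repeat, merged)
]
decoded = ctc_greedy_decode(frames, blank_id=2)
```

The resulting token ids would then be mapped back to BPE pieces and joined into text.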

The AM is a multilingual model covering all 22 scheduled Indian languages via a 5633-token multilingual BPE vocabulary. Each language uses a 256-token slice at a fixed offset; for Hindi the slice starts at offset 1536. The model is exported as TorchScript, so running the AM requires only torch and torchaudio; pyctcdecode is needed only for the LM beam-search decoding step.
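The vocabulary layout can be checked with a little arithmetic: 22 × 256 = 5632, which leaves exactly one index for the CTC blank. The sketch below illustrates how a 257-way Hindi output could be picked out of a 5633-way frame. The blank's position (assumed here to be the last index) and the helper names are assumptions, and the exported model may already perform this selection internally.

```python
SLICE_SIZE = 256      # tokens per language (from the text)
NUM_LANGUAGES = 22    # scheduled languages (from the text)
VOCAB_SIZE = 5633     # multilingual vocabulary size (from the text)
HINDI_OFFSET = 1536   # start of the Hindi slice (from the text)
BLANK_INDEX = VOCAB_SIZE - 1  # assumption: the blank is the last index


def language_slice(offset, slice_size=SLICE_SIZE):
    """Return the [start, end) range of a language's token slice."""
    if offset % slice_size != 0:
        raise ValueError("offset is expected to be a multiple of the slice size")
    return offset, offset + slice_size


def select_language_scores(frame, offset, blank_index=BLANK_INDEX):
    """Keep one language's 256 token scores and append the blank score."""
    start, end = language_slice(offset)
    return list(frame[start:end]) + [frame[blank_index]]


start, end = language_slice(HINDI_OFFSET)
toy_frame = list(range(VOCAB_SIZE))  # stand-in for one frame of scores
hindi_scores = select_language_scores(toy_frame, HINDI_OFFSET)
```

Here `start, end` evaluates to `(1536, 1792)`, and `hindi_scores` has 257 entries, matching the `[B, T', 257]` output shape in the diagram.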


Hindi language model

The greedy CTC baseline (14.14% avg WER) is already competitive. The Hindi-5M KenLM lowers it to 12.09%, an absolute reduction of 2.05 percentage points (roughly 14.5% relative), by scoring beam-search candidates with the 5-gram language model.
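The averages and the improvement quoted above follow directly from the per-dataset figures in the Vistaar results. The short check below recomputes them as an unweighted mean over the seven subsets; the variable and function names are illustrative.

```python
greedy = {
    "Kathbath": 10.34, "Kathbath Noisy": 11.86, "FLEURS": 12.68,
    "CommonVoice": 16.57, "IndicTTS": 9.49, "MUCS": 10.41, "Gramvaani": 27.61,
}
with_lm = {
    "Kathbath": 9.00, "Kathbath Noisy": 10.19, "FLEURS": 11.18,
    "CommonVoice": 12.54, "IndicTTS": 8.55, "MUCS": 9.05, "Gramvaani": 24.09,
}


def macro_average(scores):
    """Unweighted mean across datasets (each subset counts equally)."""
    return sum(scores.values()) / len(scores)


avg_greedy = macro_average(greedy)
avg_lm = macro_average(with_lm)
absolute_gain = avg_greedy - avg_lm          # in percentage points
relative_gain = absolute_gain / avg_greedy   # fraction of the greedy WER

print(f"greedy {avg_greedy:.2f}%, with LM {avg_lm:.2f}%, "
      f"gain {absolute_gain:.2f}pp ({relative_gain:.1%} relative)")
```

Because this is a macro average, Gramvaani's high WER weighs as much as any other subset regardless of how many utterances it contains.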

| Property | Hindi-5M |
|---|---|
| Order | 5-gram |
| Binary size | 145 MB |
| Training sentences | 5,000,000 |
| Unigrams | 201,136 |
| α | 0.3 |
| β | 1.0 |
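To give a sense of what α and β do, the sketch below shows the usual shallow-fusion form in which an LM weight and a per-word bonus are added to a hypothesis' acoustic score. It is a simplified illustration under stated assumptions (KenLM-style log10 word scores, hypothetical numbers), not a copy of the decoder's internals.

```python
import math

ALPHA = 0.3  # LM weight listed for Hindi-5M
BETA = 1.0   # per-word bonus listed for Hindi-5M


def fused_score(acoustic_logprob, lm_log10_probs, alpha=ALPHA, beta=BETA):
    """Combine a hypothesis' CTC score with word-level LM scores.

    acoustic_logprob: natural-log CTC score of the hypothesis
    lm_log10_probs:   one log10 probability per word (KenLM convention)
    """
    ln10 = math.log(10.0)  # converts log10 scores to natural log
    lm_term = sum(alpha * p * ln10 + beta for p in lm_log10_probs)
    return acoustic_logprob + lm_term


# Hypothetical two-word hypotheses: B is slightly better acoustically,
# but the LM finds its words much less likely.
score_a = fused_score(-4.0, [-1.0, -1.2])
score_b = fused_score(-3.8, [-2.5, -3.0])
best = "A" if score_a > score_b else "B"
```

A larger α trusts the language model more, while β offsets the LM's tendency to favour hypotheses with fewer words.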

Training corpus: Wikipedia (hi), CC-100 (hi), CulturaX (hi), OSCAR-2301 (hi), C4 (hi) — ~5M sentences after deduplication and Devanagari filtering.
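The deduplication and Devanagari filtering step could look something like the sketch below. The 0.9 threshold, the whitespace collapsing, and the function names are assumptions for illustration; the actual filtering rules used to build Hindi-5M are not specified here.

```python
def devanagari_ratio(sentence: str) -> float:
    """Share of non-space characters in the Devanagari block (U+0900 to U+097F)."""
    chars = [c for c in sentence if not c.isspace()]
    if not chars:
        return 0.0
    return sum("\u0900" <= c <= "\u097F" for c in chars) / len(chars)


def clean_corpus(sentences, min_ratio=0.9):
    """Yield unique, mostly-Devanagari sentences in their original order."""
    seen = set()
    for sentence in sentences:
        sentence = " ".join(sentence.split())  # collapse runs of whitespace
        if not sentence or sentence in seen:
            continue  # skip empty lines and exact duplicates
        seen.add(sentence)
        if devanagari_ratio(sentence) >= min_ratio:
            yield sentence


raw = [
    "यह एक वाक्य है",
    "यह  एक वाक्य है",        # duplicate once whitespace is collapsed
    "This is English",
    "हिंदी and English mix",  # mostly Latin script, filtered out
]
cleaned = list(clean_corpus(raw))
```

For a corpus of millions of sentences, storing a hash of each sentence in `seen` instead of the full string would reduce memory use.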


Citation

@misc{indic-conformer-600m,
  author = {Abhay Verma},
  title  = {Indic Conformer ASR — Hindi 600M},
  year   = {2026},
  url    = {https://huggingface.co/abhayverma6300/indic-conformer-600m}
}