Indic CrisperWhisper v1 (Hindi)

Verbatim Hindi ASR with word-level timestamps. Fine-tuned from vasista22/whisper-hindi-large-v2 on IndicVoices-R Hindi using the CrisperWhisper dual-loss approach.

Results (IndicVoices-R Hindi test set, 376 samples)

| Model | WER vs Normalised | WER vs Verbatim |
|---|---|---|
| Base (vasista22/whisper-hindi-large-v2) | 23.52% | 25.91% |
| Indic CrisperWhisper v1 | 18.86% | 18.60% |
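
For readers unfamiliar with the metric, WER is the word-level edit distance between hypothesis and reference, divided by the number of reference words. The sketch below is illustrative only: `word_error_rate` is a hypothetical helper, not the evaluation script behind the table, and the normalisation rules used for the "Normalised" column are not specified here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words consumed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,                         # deletion
                curr[j - 1] + 1,                     # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: a hypothesis that drops one filler from a six-word
# verbatim reference incurs one deletion.
verbatim_ref = "मैं [UH] घर जा रहा हूँ"
hypothesis = "मैं घर जा रहा हूँ"
print(f"{word_error_rate(verbatim_ref, hypothesis):.2%}")
```

Scoring against a verbatim reference penalises dropped fillers in this way, which is presumably why the two columns differ.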

Usage

Install the dependencies, then transcribe from the command line:

pip install torch torchaudio "transformers>=4.37" huggingface_hub numpy
python transcribe_hindi.py audio.wav --model_dir user71/indic-crisperwhisper-hindi-v1

Or call it from Python:

from transcribe_hindi import load_model, transcribe

pipe   = load_model("user71/indic-crisperwhisper-hindi-v1")
result = transcribe("audio.wav", pipe=pipe)

print(result["text"])
for chunk in result["chunks"]:
    word  = chunk["text"]
    start = chunk["timestamp"][0]
    end   = chunk["timestamp"][1]
    print(f"{word:<20}  {start:.3f}s – {end:.3f}s")

Output includes Hindi filler tokens: [UM], [UH], [FILLER], [PAUSE].
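
If a downstream task needs a clean transcript, the filler tokens can be filtered out afterwards. This is a minimal sketch, assuming each filler arrives as its own chunk whose text matches one of the four tokens exactly, and that chunks have the `text`/`timestamp` shape shown in the usage example. `split_fillers` and `clean_text` are hypothetical helpers, not part of `transcribe_hindi.py`.

```python
FILLER_TOKENS = {"[UM]", "[UH]", "[FILLER]", "[PAUSE]"}


def split_fillers(chunks):
    """Separate lexical words from filler tokens, keeping their timestamps."""
    words, fillers = [], []
    for chunk in chunks:
        token = chunk["text"].strip()
        (fillers if token in FILLER_TOKENS else words).append(chunk)
    return words, fillers


def clean_text(chunks):
    """Join the non-filler words into a single transcript string."""
    words, _ = split_fillers(chunks)
    return " ".join(chunk["text"].strip() for chunk in words)


# Made-up chunks in the same shape as result["chunks"] (assumption).
example_chunks = [
    {"text": "मैं", "timestamp": (0.00, 0.30)},
    {"text": " [UH]", "timestamp": (0.30, 0.62)},
    {"text": "घर", "timestamp": (0.62, 0.95)},
]
words, fillers = split_fillers(example_chunks)
print(clean_text(example_chunks))
print(f"{len(fillers)} filler(s), {len(words)} word(s)")
```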

Files

| File | Description |
|---|---|
| model-*.safetensors | Model weights (Whisper large-v2 based, 1.5B params) |
| retok_config.json | Retokenisation config (filler token IDs, vocab size=50368) |
| alignment_heads.json | 15 selected cross-attention heads used for DTW timestamps |
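
As background on the DTW step: dynamic time warping finds the cheapest monotonic path through a token × audio-frame cost matrix, and the frames assigned to each token give its time span. The toy sketch below is not this model's implementation and assumes nothing about the layout of `alignment_heads.json`. The attention values are made up, and using 1 minus the attention weight as the cost is an illustrative choice.

```python
def dtw_path(cost):
    """Return the minimum-cost monotonic path through a token x frame cost matrix."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    # acc[i][j] = cheapest cumulative cost of a path ending at cell (i-1, j-1)
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i][j] = cost[i - 1][j - 1] + min(
                acc[i - 1][j - 1],  # advance token and frame
                acc[i - 1][j],      # advance token only
                acc[i][j - 1],      # advance frame only
            )
    # Backtrack from the bottom-right corner to recover the path.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (acc[i - 1][j - 1], i - 1, j - 1),
            (acc[i - 1][j], i - 1, j),
            (acc[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def token_spans(path):
    """Map each token index to its (first_frame, last_frame) on the path."""
    spans = {}
    for token, frame in path:
        first, _ = spans.get(token, (frame, frame))
        spans[token] = (first, frame)
    return spans


# Toy attention weights for 2 tokens over 4 frames (made-up values).
attention = [[0.9, 0.8, 0.1, 0.1],
             [0.1, 0.2, 0.9, 0.8]]
cost = [[1.0 - a for a in row] for row in attention]
print(token_spans(dtw_path(cost)))  # {0: (0, 1), 1: (2, 3)}
```

Multiplying the frame indices by the encoder frame duration (20 ms for Whisper) converts the spans to seconds.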

Citation

Built on CrisperWhisper (Kohler et al., INTERSPEECH 2024).
Training data: IndicVoices-R (Sangroya et al., 2024).

To cite this work:

Sanat Kumar Agarwal and Srikanth Raj Chetupalli, "IndicCrisperWhisper-Time-stamped transcription for IndicVoices using CrisperWhisper", May 2026.
