Indic Whisper: ggml builds for whisper.cpp
ggml + q5_1 quantized builds of Indian-language Whisper fine-tunes, for
on-device speech-to-text via whisper.cpp.
Used by the Ukta in-store feedback kiosk for accurate regional-language
transcription.
Files
File naming: ggml-<langCode>-small.bin (q5_1 quantized, ~181 MiB each).
| Language | Code | File |
|---|---|---|
| Hindi | hi | ggml-hi-small.bin |
| Kannada | kn | ggml-kn-small.bin |
| Tamil | ta | ggml-ta-small.bin |
| Telugu | te | ggml-te-small.bin |
| Gujarati | gu | ggml-gu-small.bin |
These are monolingual: each model transcribes only its own language.
Malayalam/Marathi/Odia/Punjabi/Bengali are not yet covered (no published
vasista22 small fine-tune); those languages fall back to a general model.
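
The naming scheme and fallback rule above can be sketched as a small lookup. This is an illustrative sketch only, not code from the kiosk: the function name `model_file_for`, the `models` directory, and the fallback file name `ggml-small.bin` are assumptions, since this README does not name the general model.

```python
from pathlib import Path

# Language codes that have a dedicated fine-tune, taken from the table above.
SUPPORTED_CODES = {"hi", "kn", "ta", "te", "gu"}

# Hypothetical file name for the general fallback model (not named in this README).
FALLBACK_MODEL = "ggml-small.bin"


def model_file_for(lang_code: str, model_dir: str = "models") -> Path:
    """Return the model path for a language code, or the fallback if uncovered."""
    code = lang_code.strip().lower()
    if code in SUPPORTED_CODES:
        # Follows the documented pattern ggml-<langCode>-small.bin.
        name = f"ggml-{code}-small.bin"
    else:
        # Uncovered languages (e.g. ml, mr) use the general model.
        name = FALLBACK_MODEL
    return Path(model_dir) / name
```

For example, `model_file_for("kn")` resolves to the Kannada build, while an uncovered code such as `"ml"` resolves to the assumed fallback file.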
Provenance & attribution
- Fine-tuned source models: vasista22 (whisper-<language>-{base,small}), © Speech Lab, IIT Madras (Apache 2.0).
- Base architecture/weights: OpenAI Whisper (MIT).
- Training corpora include AI4Bharat datasets (Shrutilipi, Vistaar) and Fleurs (CC-BY).
- Conversion: whisper.cpp/models/convert-h5-to-ggml.py → f16 ggml, then quantize ... q5_1.
This repository redistributes derivatives of the above under the Apache License
2.0; see LICENSE. No change was made to the model weights other than format
conversion and quantization.