Papers
arxiv:2606.03504

BaltiVoice: A Speech Corpus and Fine-tuned Whisper ASR System for the Balti Language

Published on Jun 2
Authors:

Abstract

A 16.8-hour Balti speech corpus was created and used to fine-tune OpenAI Whisper-small, achieving a 30.07% WER compared to 182.18% for zero-shot baseline performance.

We present BaltiVoice, a 16.8-hour read-speech corpus for Balti (ISO 639-3: bft), a Tibetic language spoken in Gilgit-Baltistan, Pakistan, with no prior publicly available ASR resources. The corpus contains 10,060 validated utterances in native Nastaliq script, derived from Mozilla Common Voice recordings. We fine-tune OpenAI Whisper-small on this corpus and report a Word Error Rate (WER) of 30.07% on a held-out validation set of 538 utterances, down from a measured zero-shot baseline of 182.18% for Whisper-small on Balti. The dataset, fine-tuned model, and a live transcription demo are publicly available on HuggingFace.

Community

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2606.03504 in a model README.md to link it from this page.

Datasets citing this paper 1

Spaces citing this paper 1

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.