
Garo ASR - Whisper Small Fine-tuned

Fine-tuned Whisper Small model for Automatic Speech Recognition in the Garo language.

Model Details

  • Base Model: openai/whisper-small (244M parameters)
  • Language: Garo (Tibeto-Burman language family)
  • Training Data: ARTPARK-IISc
  • Training Samples: 26,784
  • Test Samples: 3,348

Performance

| Metric | Score |
|---|---|
| Word Error Rate (WER) | 9.74% |
| Character Error Rate (CER) | 3.82% |

Baseline Comparison

| Model | WER | CER |
|---|---|---|
| Whisper-small (zero-shot) | 382.7% | - |
| Whisper-base (zero-shot) | - | 203.46% |
| This model (fine-tuned) | 9.74% | 3.82% |
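WER is the word-level edit distance between the reference and the hypothesis, divided by the reference length (which is why a poor zero-shot model can exceed 100%). A minimal self-contained sketch, for illustration only (in practice a library such as jiwer is typically used):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)
```

CER is computed the same way over characters instead of words.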

Usage

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torch

# Load model and processor
processor = WhisperProcessor.from_pretrained("MWirelabs/garo-asr")
model = WhisperForConditionalGeneration.from_pretrained("MWirelabs/garo-asr")

# Load audio as a 16 kHz mono numpy array
# audio_array = your audio as a numpy array

# Generate transcription
inputs = processor(audio_array, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    predicted_ids = model.generate(inputs.input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
```
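The snippet above leaves `audio_array` undefined. One way to produce a suitable array is shown below; the synthetic tone is only a stand-in for real speech, and the `librosa.load` call in the comment is a common (assumed, not card-specified) way to read a file while resampling to the 16 kHz rate Whisper expects:

```python
import numpy as np

# Stand-in for real audio: 1 second of a 440 Hz tone sampled at 16 kHz.
# In practice you would load a file instead, e.g.:
#   audio_array, sr = librosa.load("path/to/audio.wav", sr=16000)
sr = 16000
t = np.arange(sr) / sr
audio_array = (0.1 * np.sin(2 * np.pi * 440.0 * t)).astype(np.float32)
```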

Training Details

  • Training Steps: 4,000 (best checkpoint at 2,500)
  • Batch Size: 16 per device
  • Gradient Accumulation: 2 steps
  • Learning Rate: 1e-5
  • Warmup Steps: 500
  • Precision: FP16
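The hyperparameters above map directly onto a `transformers` training configuration. A sketch, assuming `Seq2SeqTrainingArguments`; the `output_dir` and any settings not listed on this card are illustrative assumptions:

```python
from transformers import Seq2SeqTrainingArguments

# Sketch only: mirrors the hyperparameters listed above.
# output_dir is a hypothetical path; eval/save cadence is not specified here.
training_args = Seq2SeqTrainingArguments(
    output_dir="./garo-asr",            # assumption
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,      # effective batch size 32 per device pair
    learning_rate=1e-5,
    warmup_steps=500,
    max_steps=4000,                     # best checkpoint reported at step 2,500
    fp16=True,
)
```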

Limitations

  • Performance degrades on English loanwords and code-switching
  • 432 samples (12.9%) in test set contain annotation noise
  • ~43% error rate on code-switched utterances with English words

Dataset Statistics

  • Audio Duration: Mean 4.04s, Median 3.81s (range: 1.78-11.13s)
  • Vocabulary: 3,621 unique words
  • Type-Token Ratio: 0.148
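Type-token ratio is simply the number of unique word types divided by the total token count; lower values indicate more word repetition in the corpus. A minimal sketch with toy (non-Garo) data:

```python
def type_token_ratio(tokens: list[str]) -> float:
    """Unique word types divided by total tokens."""
    return len(set(tokens)) / len(tokens)

# Toy example, not Garo data: 3 types over 4 tokens -> 0.75
ratio = type_token_ratio("a b a c".split())
```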

Inference Speed

  • Average: 0.252s per sample
  • Real-time Factor: 0.05x (20x faster than real-time)
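Real-time factor is processing time divided by audio duration, so an RTF of 0.05 corresponds to a 20x speedup over real time. A sketch with illustrative numbers (not measurements from this model):

```python
def real_time_factor(processing_time_s: float, audio_duration_s: float) -> float:
    """RTF < 1 means faster than real time; 1/RTF is the speedup factor."""
    return processing_time_s / audio_duration_s

# Illustrative: 0.25 s to process 5 s of audio -> RTF 0.05, i.e. 20x real time
rtf = real_time_factor(0.25, 5.0)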

Citation

If you use this model, please cite:

@misc{garo-asr-2026,
  author = {MWire Labs},
  title = {Garo ASR: Fine-tuned Whisper for Garo Language},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/MWirelabs/garo-asr}
}

Acknowledgments

  • Dataset: ARTPARK-IISc Vaani project
  • Base Model: OpenAI Whisper