Model: d3b4g/whisper-small-dv-corrected
Base model: openai/whisper-small
Language: Dhivehi (Thaana)
Dataset: Mozilla Common Voice 17.0 (dv)
Training Platform: RunPod RTX 4090
Training Epochs: 4.05
Best Checkpoint: 7000
Eval WER: 0.1107 (≈ 11%)
Eval Loss: 0.0055
This model is a fine-tuned version of OpenAI’s Whisper Small, trained specifically for Dhivehi (Thaana) speech recognition using the Mozilla Common Voice 17.0 dataset.
The goal of this project was to create an open, accurate Dhivehi ASR model suitable for transcription, voice assistants, and dataset generation.
During fine-tuning, the best checkpoint (step 7000) reached an evaluation Word Error Rate (WER) of 11.07%, which indicates strong performance on clean speech and conversational recordings.
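To make the WER figure concrete, here is a minimal sketch of how the metric is defined: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` and the English placeholder sentences are illustrative assumptions; this is not the evaluation script used for this model, which is not stated here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into an empty hypothesis = i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Placeholder sentences (hypothetical, English for readability):
# one substituted word out of nine reference words.
reference = "one two three four five six seven eight nine"
hypothesis = "one two three four five six seven eight nein"
print(f"{word_error_rate(reference, hypothesis):.4f}")
```

A WER of 0.1107 corresponds to roughly one word error per nine reference words. When scoring a whole evaluation set, WER is usually computed as total errors over total reference words rather than as the mean of per-utterance scores.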
import librosa
from transformers import WhisperProcessor, WhisperForConditionalGeneration

# Load processor and model
processor = WhisperProcessor.from_pretrained("d3b4g/whisper-small-dv-corrected")
model = WhisperForConditionalGeneration.from_pretrained("d3b4g/whisper-small-dv-corrected")

# Load audio as mono and resample to 16 kHz, the rate Whisper's feature extractor expects
audio, rate = librosa.load("sample.wav", sr=16000, mono=True)

# Preprocess and predict
input_features = processor(audio, sampling_rate=rate, return_tensors="pt").input_features
predicted_ids = model.generate(input_features)

# Decode to text
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)
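Whisper's feature extractor pads or truncates each input to a 30-second window, so the snippet above only transcribes the first 30 seconds of a longer recording. The following is a minimal sketch of one way to handle that, by cutting the waveform into consecutive 30-second windows. The helper name `split_into_windows` and its constants are assumptions made for illustration and are not part of this model's code.

```python
from typing import Iterator, Sequence

SAMPLE_RATE = 16_000   # Hz, the rate Whisper's feature extractor expects
WINDOW_SECONDS = 30    # length of the window Whisper processes at once


def split_into_windows(
    samples: Sequence[float],
    sample_rate: int = SAMPLE_RATE,
    window_seconds: int = WINDOW_SECONDS,
) -> Iterator[Sequence[float]]:
    """Yield consecutive, non-overlapping windows of the waveform.

    Works on any sliceable sequence (a list or a NumPy array);
    the final window may be shorter than the others.
    """
    window = sample_rate * window_seconds
    for start in range(0, len(samples), window):
        yield samples[start:start + window]


# Toy input: 70 "samples" at 1 Hz stands in for 70 seconds of audio.
toy_audio = [0.0] * 70
lengths = [len(w) for w in split_into_windows(toy_audio, sample_rate=1)]
print(lengths)  # [30, 30, 10]
```

Each window can then be passed through `processor(...)` and `model.generate(...)` as in the snippet above, and the resulting transcriptions joined in order. A hard cut like this can split a word at a window boundary, so overlapping windows, or the `chunk_length_s` argument of the Transformers automatic-speech-recognition pipeline, may give cleaner results.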