
Whisper Small Setswana (LoRA) - 5,000 Steps

This model is a fine-tuned version of openai/whisper-small on the Setswana (tn) subset of the Common Voice dataset, optimized for high-accuracy automatic speech recognition (ASR) as part of the PuoSpeaker project.

🚀 Training Summary

  • Total Steps: 5,000
  • Final Training Loss: 0.1736
  • Hardware: NVIDIA RTX A4000 (16GB VRAM)
  • Method: Parameter-Efficient Fine-Tuning (PEFT) using LoRA (Rank 64, Alpha 128)
  • Duration: ~5.5 hours

📊 Capabilities & Limitations

✅ Automatic Speech Recognition (ASR) - "Near-Perfect"

The model shows exceptional performance in capturing Setswana phonetics, tone, and rhythm.

  • Pros: Handles fast native speech and subtle vowel shifts (e.g., 'ê' and 'ô') with high precision.
  • Suitability: Professional-grade transcription and pronunciation scoring.

⚠️ Text-to-Speech (TTS) - "Low Quality"

While the ASR is state-of-the-art, the current Text-to-Speech (TTS) integration (XTTS-v2) is in a prototype stage.

  • Current State: Low prosody alignment and robotic rhythm.
  • Next Steps: Dedicated TTS prosody fine-tuning is required to match the ASR's quality.

πŸ› οΈ Usage (Python)

from transformers import WhisperProcessor, WhisperForConditionalGeneration
from peft import PeftModel
import torch

base_model = "openai/whisper-small"
adapter_model = "ogaufi/whisper-small-tn-lora-v2" # Recommended 5k checkpoint

processor = WhisperProcessor.from_pretrained(base_model)
model = WhisperForConditionalGeneration.from_pretrained(base_model)
model = PeftModel.from_pretrained(model, adapter_model)

# Standard Whisper inference pipeline follows...

📚 Dataset Details

Trained on the Common Voice Setswana corpus, specifically focusing on the validated split to ensure high-quality linguistic grounding.
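As a sketch of the preprocessing this implies, each clip is resampled to 16 kHz and converted to a fixed-size log-mel spectrogram. The example below uses a default-constructed WhisperFeatureExtractor for illustration; training would use the extractor bundled with the checkpoint, and the silent clip is a stand-in for real Common Voice audio:

```python
import numpy as np
from transformers import WhisperFeatureExtractor

# Default Whisper front end: 80 mel bins, 16 kHz sampling, 30 s windows.
feature_extractor = WhisperFeatureExtractor()

# Placeholder: 2 s of silence standing in for a resampled Common Voice clip.
audio = np.zeros(2 * 16000, dtype=np.float32)

features = feature_extractor(audio, sampling_rate=16000, return_tensors="np")
# Clips are padded/trimmed to 30 s, giving a fixed (1, 80, 3000) log-mel tensor.
print(features.input_features.shape)
```

Because every clip is padded or trimmed to the same 30-second window, batches have a uniform shape regardless of the original recording length.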
