
Whisper Medium Uzbek v1 by Kotibai & Rubai Team

Developed by Kotibai & Rubai Team

Uzbek Automatic Speech Recognition (ASR) model fine-tuned from Whisper Medium.

Model Description

  • Base Model: OpenAI Whisper Medium (769M parameters)
  • Language: Uzbek (uz)
  • Training Data: ~1,600 hours of Uzbek audio
  • Precision: BF16
  • Script: Latin (handles Russian loanwords in Latin script: "brat", "davay", "prosto", etc.)

Evaluation Results

| Category        | WER     |
|-----------------|---------|
| Overall         | 16.7%   |
| Clean Speech    | ~6-11%  |
| Noisy/Augmented | ~12-24% |
| Dialects        | ~16-25% |

Evaluated on 1,864 samples across 8 diverse test sets.
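WER (word error rate) is conventionally the word-level edit distance between the reference and the hypothesis, divided by the number of reference words; the figures above are presumably computed this way. A minimal sketch (the function name and example strings are illustrative, not from this repository's evaluation code):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            cur[j] = min(prev[j] + 1,              # deletion
                         cur[j - 1] + 1,           # insertion
                         prev[j - 1] + (r != h))   # substitution or match
        prev = cur
    return prev[len(hyp)] / max(len(ref), 1)

print(wer("salom dunyo", "salom dunyo"))  # 0.0
print(wer("salom dunyo", "salom"))        # 0.5 (one deletion over two words)
```

In practice, text is usually normalized (lowercasing, punctuation stripping) before scoring; libraries such as jiwer implement this end to end.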

Usage

Using Transformers

from transformers import WhisperProcessor, WhisperForConditionalGeneration
import librosa

processor = WhisperProcessor.from_pretrained("Kotib/uzbek_stt_v1")
model = WhisperForConditionalGeneration.from_pretrained("Kotib/uzbek_stt_v1")

# Whisper expects 16 kHz mono audio; librosa resamples on load
audio, sr = librosa.load("audio.wav", sr=16000)
input_features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features

predicted_ids = model.generate(input_features, language="uz", task="transcribe")
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)

Using Pipeline

from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="Kotib/uzbek_stt_v1",
    chunk_length_s=30,
    device="cuda"
)

result = pipe("audio.wav", generate_kwargs={"language": "uz", "task": "transcribe"})
print(result["text"])
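Whisper itself only processes 30-second windows, so `chunk_length_s=30` tells the pipeline to split longer audio into overlapping windows, transcribe each, and merge the results. A simplified sketch of that windowing (the function, its overlap handling, and the parameter values are illustrative; the real pipeline also merges tokens in the overlapped regions):

```python
def chunk_audio(samples, sr=16000, chunk_s=30.0, stride_s=5.0):
    """Split a 1-D sample sequence into fixed-size windows that overlap
    their neighbors by stride_s seconds on each side."""
    chunk = int(chunk_s * sr)             # window length in samples
    step = chunk - int(2 * stride_s * sr) # hop between window starts
    chunks = []
    for start in range(0, len(samples), step):
        chunks.append(samples[start:start + chunk])
        if start + chunk >= len(samples):
            break  # last window already covers the end of the audio
    return chunks

# 70 s of audio at a toy 100 Hz rate -> three overlapping 30 s windows
print(len(chunk_audio(list(range(7000)), sr=100)))  # 3
```

Larger overlap (`stride_s`) reduces boundary errors at the cost of more compute per second of audio.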

Training

Trained in 3 stages using curriculum learning:

| Stage             | Hours |
|-------------------|-------|
| Foundation        | 725h  |
| Robustness        | 394h  |
| Domain Adaptation | 474h  |

Intended Use

  • Uzbek speech-to-text transcription
  • Voice assistants and dictation
  • Media transcription and subtitling

Limitations

  • Performance degrades on very noisy audio
  • May struggle with heavy code-switching
  • Optimized for Uzbek only

License

Apache 2.0
