Whisper Uyghur ASR

Fine-tuned OpenAI Whisper-medium model for Uyghur Automatic Speech Recognition.

Model

  • Base Model: openai/whisper-medium
  • Language: Uyghur (ug)
  • Checkpoint: step-2000

Performance

Metric  Value
WER     22.53%
CER     12.56%
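WER and CER are edit-distance metrics: the number of substitutions, insertions, and deletions needed to turn the hypothesis into the reference, divided by the reference length, counted over words (WER) or characters (CER). A minimal pure-Python sketch of the computation (libraries such as jiwer or evaluate are normally used in practice):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (one-row DP)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,            # deletion
                        dp[j - 1] + 1,        # insertion
                        prev + (r != h))      # substitution (0 if tokens match)
            prev = cur
    return dp[-1]

def wer(reference, hypothesis):
    """Word error rate: word-level edits / reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    """Character error rate: character-level edits / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

Uyghur text in Arabic script is whitespace-separated, so the same word split applies.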

Dataset

Split       Samples  Duration
Train        11,287   17.44h
Validation    1,045    1.37h
Test            795    0.86h
Total        13,127   19.67h

Sources:

  • Common Voice Uyghur (CC0-1.0): 5,197 samples
  • Uyghur Whisper Finetune (CC-BY-4.0): 7,930 samples
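The split and source counts are internally consistent: 11,287 + 1,045 + 795 = 13,127 samples, matching 5,197 + 7,930 from the two sources, with an average clip length of about 5.4 seconds. A quick arithmetic check:

```python
# Figures from the dataset table above
splits = {"train": (11_287, 17.44), "validation": (1_045, 1.37), "test": (795, 0.86)}
sources = {"common_voice_ug": 5_197, "uyghur_whisper_finetune": 7_930}

total_samples = sum(n for n, _ in splits.values())          # 13,127
total_hours = round(sum(h for _, h in splits.values()), 2)  # 19.67
avg_clip_s = total_hours * 3600 / total_samples             # mean clip length in seconds

assert total_samples == sum(sources.values())
print(total_samples, total_hours, round(avg_clip_s, 1))
```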

Installation

pip install torch transformers datasets librosa soundfile

Usage

from transformers import WhisperForConditionalGeneration, WhisperProcessor
import librosa

# Load the fine-tuned model and processor from the local checkpoint directory
model = WhisperForConditionalGeneration.from_pretrained("last_model")
processor = WhisperProcessor.from_pretrained("last_model")

# Transcribe (Whisper expects 16 kHz mono audio)
audio, sr = librosa.load("audio.wav", sr=16000)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
predicted_ids = model.generate(**inputs)
text = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(text)
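Whisper's encoder operates on 30-second windows, so recordings longer than that need to be split before the snippet above is applied per chunk (alternatively, transformers' automatic-speech-recognition pipeline handles this via its chunk_length_s parameter). A minimal sketch of fixed-size chunking, assuming the audio is a 1-D sequence of samples at 16 kHz:

```python
def chunk_audio(samples, sample_rate=16_000, chunk_seconds=30):
    """Split a 1-D sample sequence into consecutive fixed-length chunks.

    The final chunk may be shorter than chunk_seconds; Whisper's feature
    extractor pads it to the full window.
    """
    chunk_len = sample_rate * chunk_seconds
    return [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]

# Each chunk is then transcribed independently and the texts joined, e.g.:
# text = " ".join(transcribe(chunk) for chunk in chunk_audio(audio))
```

Simple fixed-size splitting can cut words at chunk boundaries; silence-aware splitting or the pipeline's overlapping chunks avoid that.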

Test Result

See last_model/test_audio.aac and last_model/test_audio.txt for sample inference.

Project Structure

├── last_model/           # Fine-tuned model
│   ├── config.json
│   ├── model.safetensors
│   ├── tokenizer.json
│   ├── infer.py
│   ├── test_audio.aac
│   └── test_audio.txt
├── merged_dataset_clean/ # Training dataset
├── finetune_whisper.py   # Training script
├── training_output.log   # Training log
└── requirements.txt

Training Configuration

Parameter              Value
Epochs                 10
Batch Size             8
Gradient Accumulation  2
Learning Rate          1e-5
Warmup Steps           500
FP16                   True
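With a per-device batch size of 8 and gradient accumulation of 2, the effective batch size is 16, so the 11,287 training samples yield roughly 706 optimizer steps per epoch; the step-2000 checkpoint therefore corresponds to a bit under three epochs. A sketch of the arithmetic (assumes a single GPU and that partial batches are kept):

```python
import math

# Figures from the training configuration and dataset tables
train_samples = 11_287
per_device_batch = 8
grad_accum = 2
epochs = 10

effective_batch = per_device_batch * grad_accum                # 16
steps_per_epoch = math.ceil(train_samples / effective_batch)   # ~706
total_steps = steps_per_epoch * epochs                         # ~7,060 over 10 epochs
checkpoint_epoch = 2000 / steps_per_epoch                      # epoch reached at step 2000

print(effective_batch, steps_per_epoch, total_steps, round(checkpoint_epoch, 1))
```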

License

  • Code: MIT License
  • Model: MIT License
  • Dataset: CC0-1.0 / CC-BY-4.0