---
language:
- uz
license: apache-2.0
tags:
- whisper
- speech-recognition
- uzbek
- fine-tuned
- asr
base_model: Kotib/uzbek_stt_v1
pipeline_tag: automatic-speech-recognition
---
# Zehnova STT: Speech-to-Text Model for Uzbek
An automatic speech recognition model for the Uzbek language, fine-tuned from a Whisper Medium base.
## About the model
- **Model type:** Automatic Speech Recognition (ASR)
- **Base model:** `Kotib/uzbek_stt_v1` (Whisper Medium)
- **Fine-tuning method:** LoRA (Low-Rank Adaptation)
- **Language:** Uzbek 🇺🇿
- **Author:** Jonibek21
## Usage
```python
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor, pipeline

model_id = "Jonibek21/Zehnova-stt-uzbek"

# Load the model in half precision and move it to the first GPU.
model = WhisperForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
).to("cuda")
processor = WhisperProcessor.from_pretrained(model_id)

# Audio longer than 30 s is processed in overlapping 30 s chunks.
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    chunk_length_s=30,
    stride_length_s=5,
    batch_size=4,
    device=0,
)

result = pipe(
    "audio.wav",
    generate_kwargs={
        "language": "uz",
        "task": "transcribe",
        "no_repeat_ngram_size": 3,  # forbid repeating any 3-token sequence
    },
)
print(result["text"])
```
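
The `no_repeat_ngram_size: 3` option in the usage example asks generation never to emit the same three-token sequence twice, which can help against looping transcriptions. The sketch below illustrates that constraint in plain Python. It is not the Transformers implementation: `has_repeated_ngram` is a hypothetical helper, and it works on whitespace-split words for readability, whereas the real constraint applies to tokenizer tokens.

```python
def has_repeated_ngram(tokens, n=3):
    """Return True if any n-gram occurs more than once in `tokens`."""
    seen = set()
    for i in range(len(tokens) - n + 1):
        ngram = tuple(tokens[i:i + n])
        if ngram in seen:
            return True
        seen.add(ngram)
    return False


# A looping hypothesis repeats the trigram ("men", "bugun", "men").
looping = "men bugun men bugun men bugun".split()
# A normal sentence has no repeated trigram.
normal = "men bugun maktabga bordim".split()

print(has_repeated_ngram(looping))  # True
print(has_repeated_ngram(normal))   # False
```

A check like this could also be run on finished transcripts as a cheap sanity filter for degenerate output.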
## Training details
- **Dataset:** Custom Uzbek-language audio dataset
- **Train samples:** 9,214
- **Test samples:** 1,024
- **Dataset duration:** 16 hours
- **Training hardware:** NVIDIA RTX 3090 (24GB)
- **Training framework:** Hugging Face Transformers + PEFT
- **Precision:** fp16
- **LoRA rank:** 32
- **LoRA alpha:** 64
- **LoRA target modules:** q_proj, v_proj
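
To give a feel for what these LoRA settings mean in practice, the following sketch estimates the number of trainable adapter parameters. It rests on assumptions not stated in this card: that the base model keeps the standard Whisper Medium layout (hidden size 1024, 24 encoder and 24 decoder layers), and that the `q_proj` and `v_proj` names match both self-attention and cross-attention in the decoder. The constant and function names are illustrative only.

```python
# Assumed Whisper Medium dimensions (standard architecture, not confirmed by this card).
D_MODEL = 1024
ENCODER_LAYERS = 24
DECODER_LAYERS = 24

# Values taken from the training details above.
LORA_RANK = 32
LORA_ALPHA = 64
TARGET_MODULES = ("q_proj", "v_proj")


def lora_params_per_module(d_in, d_out, rank):
    """LoRA adds two matrices per adapted layer: A (rank x d_in) and B (d_out x rank)."""
    return rank * d_in + d_out * rank


# Each encoder layer has one attention block (self-attention);
# each decoder layer has two (self-attention and cross-attention).
attention_blocks = ENCODER_LAYERS * 1 + DECODER_LAYERS * 2
adapted_modules = attention_blocks * len(TARGET_MODULES)

trainable = adapted_modules * lora_params_per_module(D_MODEL, D_MODEL, LORA_RANK)
scaling = LORA_ALPHA / LORA_RANK  # factor applied to the low-rank update

print(f"adapted modules: {adapted_modules}")       # 144
print(f"trainable LoRA parameters: {trainable:,}")  # 9,437,184
print(f"update scaling (alpha / rank): {scaling}")  # 2.0
```

Under these assumptions the adapters add roughly 9.4 million trainable parameters, a small fraction of the full model, which is consistent with training on a single 24 GB GPU.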
## 📊 Model Evaluation (WER)
| Category          | WER         |
|-------------------|-------------|
| **Overall**       | **~11-13%** |
| Clean Speech      | ~6-11%      |
| Noisy / Augmented | ~9-16%      |
| News / Formal     | ~11-12%     |
> Base model (`Kotib/uzbek_stt_v1`) overall WER: 16.7%
> The Zehnova model scores roughly **5 percentage points lower** (absolute) than the base model in overall WER.
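
For readers who want to reproduce this kind of measurement, here is a minimal sketch of word error rate: the word-level edit distance between reference and hypothesis, divided by the number of reference words. The `wer` and `wer_gain` helpers are illustrative and not the evaluation script used for this model. The card does not describe text normalization (casing, punctuation), so none is applied here. The 12.0 figure is an assumed midpoint of the reported 11-13% range.

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def wer_gain(base_wer, new_wer):
    """Return (absolute gain in points, relative gain as a fraction of the base WER)."""
    absolute = base_wer - new_wer
    return absolute, absolute / base_wer


# One substitution and one deletion against four reference words.
print(wer("a b c d", "a x c"))  # 0.5

# Base WER from the card; 12.0 is an assumed midpoint of the 11-13% range.
absolute, relative = wer_gain(16.7, 12.0)
print(f"absolute: {absolute:.1f} points, relative: {relative:.1%}")
```

Separating the absolute gain from the relative one matters when comparing models: the same difference in points is a much larger relative gain when the base WER is already low.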
## Limitations
- Works only for Uzbek
- Quality may degrade on noisy audio
- Audio longer than 30 seconds is split into chunks
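
To make the chunking behaviour concrete, the sketch below computes which time windows a long recording would be cut into with `chunk_length_s=30` and `stride_length_s=5`. It is a simplified model of overlapping-window chunking, assuming the same stride on both sides of each chunk. The real pipeline works on sample indices and also merges the overlapping transcriptions, which is not shown. `chunk_windows` is a hypothetical helper.

```python
def chunk_windows(duration_s, chunk_length_s=30.0, stride_length_s=5.0):
    """Return (start, end) windows in seconds for overlapping chunked inference."""
    # Each new chunk starts one chunk length minus both overlaps after the previous one.
    step = chunk_length_s - 2 * stride_length_s
    if step <= 0:
        raise ValueError("chunk_length_s must be larger than twice stride_length_s")

    windows = []
    start = 0.0
    while start < duration_s:
        end = start + chunk_length_s
        windows.append((start, min(end, duration_s)))
        if end >= duration_s:  # this chunk already reaches the end of the audio
            break
        start += step
    return windows


for start, end in chunk_windows(70):
    print(f"{start:5.1f} s to {end:5.1f} s")
```

For a 70-second file this gives three windows starting at 0, 20, and 40 seconds, so neighbouring chunks share 10 seconds of audio. Words near a chunk boundary are therefore heard with context on both sides, at the cost of some redundant computation.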
## Date
- 01/05/2026