Firdavs222 committed · Commit 32f70a1 (verified) · 1 Parent(s): fa18ce9

Update README.md

  1. README.md +83 -3
README.md CHANGED
@@ -1,3 +1,83 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ language:
+ - uz
+ - en
+ - ru
+ metrics:
+ - wer
+ base_model:
+ - openai/whisper-small
+ pipeline_tag: automatic-speech-recognition
+ tags:
+ - speech-recognition
+ - whisper
+ - multilingual
+ - uzbek
+ - russian
+ - english
+ ---
+
+ # Multilingual Whisper (Uz/En/Ru): Fine-tuned Speech-to-Text Model
+
+ A fine-tuned **Whisper Small** model that aims for comparable transcription quality across **Uzbek, English, and Russian**.
+ It is trained on a balanced multilingual dataset, is intended for real-world speech transcription, and reaches a WER competitive with strong open-source and commercial STT systems (see Evaluation).
+
+ ---
+
+ ## Model Details
+
+ ### Model Description
+
+ This model fine-tunes **OpenAI Whisper Small** on a multilingual speech mixture to deliver robust ASR performance for **Uzbek**, **English**, and **Russian** speakers.
+ The goal is to narrow the performance gap between the three languages, in particular by improving **Uzbek** speech recognition, for which public ASR resources are scarce.
+
+ - **Model type:** Automatic Speech Recognition (ASR)
+ - **Language(s):** Uzbek 🇺🇿, English 🇬🇧, Russian 🇷🇺
+ - **License:** Apache-2.0
+ - **Fine-tuned from:** openai/whisper-small
+ - **Intended usage:** Real-time and offline speech-to-text
+
+ ---
+ ## Training Datasets
+ - DavronSherbaev/uzbekvoice-filtered
+ - telegram-voice-messages (private collection)
+ - navaistt-open-datasets
+ - sovaai/russian-audiobooks
+ - librispeech
+
+ ## Evaluation
+
+ ### Word Error Rate (WER) Comparison
+
+ | Model                            | WER ↓      |
+ |----------------------------------|------------|
+ | Whisper-small-uz-v1 (this model) | **34.00%** |
+ | NavaiSTT v2 (Open-Source)        | 35.14%     |
+ | Gemini (Commercial)              | 36.21%     |
+ | Aisha STT (Commercial)           | 41.71%     |
+
+ With a WER of 34.00%, the model scores lower (better) than the open-source NavaiSTT v2 (35.14%) and the commercial Gemini (36.21%) and Aisha STT (41.71%) systems in this comparison, which suggests good generalization to informal real-world speech.
+
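For readers unfamiliar with the metric in the table above, WER is the number of word-level substitutions, deletions, and insertions needed to turn the model output into the reference transcript, divided by the number of reference words. The sketch below is a minimal illustration of that definition, not the evaluation script used for this model. The function name `word_error_rate` and the example sentences are hypothetical, and no text normalization (casing, punctuation) is applied, which real evaluations usually do.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences only: one substitution ("sat" -> "sit") and one
# deletion (the second "the") against six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER: {wer:.2%}")
```

Because the denominator is the reference length, a WER of 34.00% corresponds to roughly one word-level error for every three reference words.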
+ ---
+
+ ## Usage Example
+
+ ```python
+ from transformers import WhisperProcessor, WhisperForConditionalGeneration
+ import torch
+ import torchaudio
+
+ model_id = "Firdavs222/whisper-small-uz-v1"  # replace with real model repo
+
+ processor = WhisperProcessor.from_pretrained(model_id)
+ model = WhisperForConditionalGeneration.from_pretrained(model_id)
+
+ # torchaudio returns a (channels, samples) tensor at the file's native sample rate
+ audio, sr = torchaudio.load("audio.wav")
+ audio = audio.mean(dim=0)  # downmix to mono
+ if sr != 16000:
+     # Whisper's feature extractor expects 16 kHz input
+     audio = torchaudio.functional.resample(audio, orig_freq=sr, new_freq=16000)
+
+ inputs = processor(audio.numpy(), sampling_rate=16000, return_tensors="pt")
+
+ with torch.no_grad():
+     predicted_ids = model.generate(inputs.input_features)
+
+ text = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
+ print(text)  # → transcribed text here
+ ```