aict-sharif-edu committed (verified)
Commit 02447a9 · 1 Parent(s): 6f94cb1

Update README.md

Files changed (1): README.md (+22 -3)

README.md CHANGED
@@ -22,15 +22,32 @@ This model is a fine-tuned version of [openai/whisper-tiny](https://huggingface.

 ## Model description

- More information needed
+ Whisper-tiny-fa is an automatic speech recognition model adapted for Persian (Farsi) speech. It builds on OpenAI's Whisper-tiny architecture, using transfer learning to specialize in transcribing Persian audio, and is suited to applications such as voice assistants, captioning, and speech-driven user interfaces.
+
+ - Base model: openai/whisper-tiny
+ - Fine-tuned on: Common Voice 17.0, Persian subset
+ - Languages supported: Persian (Farsi)
+ - Model type: encoder-decoder transformer (speech-to-text)

 ## Intended uses & limitations

- More information needed
+ ### Intended uses
+
+ - Transcribing Persian (Farsi) speech to text from audio files or microphone input.
+ - Voice-controlled applications and speech interfaces for Persian speakers.
+ - Generating subtitles and closed captions in Persian for audio/video content.
+
+ ### Limitations
+
+ - The model is fine-tuned for Persian and may perform poorly on other languages.
+ - Performance may degrade on low-quality or noisy audio, or on accents and dialects under-represented in the training data.
+ - Not suitable for real-time applications with strict latency constraints, due to model size and processing requirements.

 ## Training and evaluation data

- More information needed
+ - Dataset: Common Voice 17.0 (Persian subset)
+ - Data splits: the training, validation, and test splits provided by Common Voice were used.
+ - Preprocessing: audio files were resampled to 16 kHz and normalized; transcripts were cleaned and normalized to standard Persian orthography.

 ## Training procedure

@@ -49,6 +66,8 @@ The following hyperparameters were used during training:

 ### Training results

+ - Best validation WER (word error rate): 0.915
+ - Best validation CER (character error rate): 0.428

 ### Framework versions
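
The preprocessing described in the diff (resampling to 16 kHz, the input rate Whisper models expect, plus normalization) can be sketched as below. This is a minimal illustration with NumPy linear interpolation, not the repo's actual pipeline — production code typically uses `librosa.resample` or `torchaudio`, which apply proper anti-aliasing filters when downsampling.

```python
import numpy as np

WHISPER_SR = 16_000  # Whisper models expect 16 kHz mono input


def resample_to_16k(audio: np.ndarray, orig_sr: int) -> np.ndarray:
    """Resample a 1-D waveform to 16 kHz via linear interpolation.

    A simple sketch; a real pipeline should use a polyphase or
    windowed-sinc resampler to avoid aliasing when downsampling.
    """
    if orig_sr == WHISPER_SR:
        return audio.astype(np.float32)
    n_out = int(round(len(audio) * WHISPER_SR / orig_sr))
    # Output sample positions expressed in units of input samples
    t_out = np.arange(n_out) * (orig_sr / WHISPER_SR)
    return np.interp(t_out, np.arange(len(audio)), audio).astype(np.float32)


def peak_normalize(audio: np.ndarray) -> np.ndarray:
    """Scale the waveform so its peak magnitude is 1.0."""
    peak = np.max(np.abs(audio))
    return audio / peak if peak > 0 else audio
```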
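
The WER and CER reported in the training results are edit-distance metrics: the Levenshtein distance between hypothesis and reference (at the word or character level) divided by the reference length. A minimal, generic sketch of how they are computed — not the evaluation script used for this model; in practice libraries such as `jiwer` or Hugging Face `evaluate` are used:

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (two-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (r != h),    # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: char-level edit distance / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```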