Paulwalker4884 committed on
Commit 80f96e5 · verified · 1 Parent(s): 3b80a8b

Upload folder using huggingface_hub

Files changed (1)
  1. README.md +90 -6
README.md CHANGED
@@ -1,6 +1,36 @@
  # Whisper Persian Fine-tuned Model
 
- A fine-tuned Whisper model optimized for Persian (Farsi) speech-to-text conversion using LoRA (Low-Rank Adaptation) technique.
 
  ## Model Details
 
@@ -45,6 +75,15 @@ The model can be integrated into larger applications such as:
  - Not suitable for noisy environments without proper audio preprocessing
  - May have reduced accuracy on dialects significantly different from the training data
 
  ## How to Get Started with the Model
 
  ### Installation
@@ -52,11 +91,48 @@ The model can be integrated into larger applications such as:
  First, install the required dependencies:
 
  ```bash
- pip install transformers torch torchaudio peft
  ```
 
  ### Usage
 
  ```python
  from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
  import torch
@@ -80,7 +156,7 @@ input_features = processor(waveform.squeeze(), sampling_rate=16000, return_tenso
 
  # Generate transcription
  with torch.no_grad():
-     predicted_ids = model.generate(input_features, language="fa", task="transcribe")
 
  # Decode the transcription
  transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
@@ -90,6 +166,14 @@ print(f"Transcription: {transcription}")
  ### Batch Processing
 
  ```python
  # For processing multiple audio files
  def transcribe_persian_audio(audio_paths):
      transcriptions = []
@@ -104,7 +188,7 @@ def transcribe_persian_audio(audio_paths):
          input_features = processor(waveform.squeeze(), sampling_rate=16000, return_tensors="pt").input_features
 
          with torch.no_grad():
-             predicted_ids = model.generate(input_features, language="fa", task="transcribe")
 
          transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
          transcriptions.append(transcription)
@@ -176,12 +260,12 @@ The model has been evaluated on Persian speech recognition benchmarks and shows
  If you use this model in your research or applications, please cite:
 
  ```bibtex
- @misc{whisper-persian-yasinkeykh,
  author = {Yasin Keykh},
  title = {Whisper Persian Fine-tuned Model},
  year = {2024},
  publisher = {Hugging Face},
- url = {https://huggingface.co/yasinkeykh/whisper-persian-base}
  }
  ```
 

+ ---
+ language:
+ - fa
+ base_model: openai/whisper-base
+ tags:
+ - whisper
+ - speech
+ - persian
+ - farsi
+ - speech-to-text
+ - audio
+ - automatic-speech-recognition
+ - peft
+ - lora
+ library_name: transformers
+ license: apache-2.0
+ model-index:
+ - name: whisper-persian
+   results: []
+ pipeline_tag: automatic-speech-recognition
+ widget:
+ - example_title: Persian Speech Recognition
+   src: https://huggingface.co/datasets/mozilla-foundation/common_voice_13_0/resolve/main/audio/fa/common_voice_fa_18904283.mp3
+ datasets:
+ - mozilla-foundation/common_voice_13_0
+ metrics:
+ - wer
+ - cer
+ ---
+
  # Whisper Persian Fine-tuned Model
 
+ A fine-tuned Whisper model optimized for Persian (Farsi) speech-to-text conversion using LoRA (Low-Rank Adaptation) technique. This model provides real-time speech recognition capabilities for Persian language with high accuracy.
 
  ## Model Details
 
 
  - Not suitable for noisy environments without proper audio preprocessing
  - May have reduced accuracy on dialects significantly different from the training data
 
+ ## Use in Transformers
+
+ ```python
+ from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
+
+ processor = AutoProcessor.from_pretrained("Paulwalker4884/whisper-persian")
+ model = AutoModelForSpeechSeq2Seq.from_pretrained("Paulwalker4884/whisper-persian")
+ ```
+
  ## How to Get Started with the Model
 
  ### Installation
 
  First, install the required dependencies:
 
  ```bash
+ pip install transformers torch torchaudio numpy sounddevice
  ```
 
  ### Usage
 
+ #### Real-time Audio Recording and Transcription
+
+ ```python
+ import numpy as np
+ import sounddevice as sd
+ from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
+ import torch
+
+ # Load the fine-tuned Persian model
+ processor = AutoProcessor.from_pretrained("Paulwalker4884/whisper-persian")
+ model = AutoModelForSpeechSeq2Seq.from_pretrained("Paulwalker4884/whisper-persian").to("cpu")
+
+ # Record audio
+ duration = 5 # seconds
+ sample_rate = 16000
+
+ print("شروع ضبط...")
+ audio = sd.rec(int(duration * sample_rate), samplerate=sample_rate, channels=1)
+ sd.wait()
+ print("پایان ضبط.")
+
+ # Convert to 1D array
+ audio = np.squeeze(audio)
+
+ # Process audio
+ input_features = processor(audio, sampling_rate=sample_rate, return_tensors="pt").input_features
+
+ # Generate transcription
+ predicted_ids = model.generate(input_features)
+ transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
+
+ print("متن شناسایی شده:")
+ print(transcription)
+ ```
+
+ #### Audio File Transcription
+
  ```python
  from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
  import torch
 
 
  # Generate transcription
  with torch.no_grad():
+     predicted_ids = model.generate(input_features)
 
  # Decode the transcription
  transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
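Both usage examples hand the whole signal to the processor in a single call. Whisper's feature extractor works on a fixed 30-second window (16 kHz × 30 s = 480,000 samples) and by default pads or truncates its input to that length, so anything after the first 30 seconds of a long file or recording would be dropped. A minimal sketch of splitting a 1-D signal into consecutive windows before transcription, assuming 16 kHz mono input; `num_frames` and `split_into_chunks` are hypothetical helpers, not part of this repository. Hard cuts can split a word across two windows; the transformers ASR `pipeline` supports overlapping chunked inference through its `chunk_length_s` argument, which may be a better fit in practice.

```python
from typing import List, Sequence

SAMPLE_RATE = 16_000   # matches sample_rate / sampling_rate in the README examples
CHUNK_SECONDS = 30     # Whisper's fixed input window length (assumed default)


def num_frames(duration_s: float, sample_rate: int = SAMPLE_RATE) -> int:
    """Number of samples in a recording of duration_s seconds (as passed to sd.rec)."""
    return int(duration_s * sample_rate)


def split_into_chunks(
    samples: Sequence[float],
    sample_rate: int = SAMPLE_RATE,
    chunk_seconds: float = CHUNK_SECONDS,
) -> List[Sequence[float]]:
    """Split a 1-D signal into consecutive, non-overlapping windows of at most chunk_seconds."""
    size = int(chunk_seconds * sample_rate)
    if size <= 0:
        raise ValueError("chunk_seconds * sample_rate must be positive")
    # Slicing works for Python lists and 1-D NumPy arrays alike
    return [samples[start:start + size] for start in range(0, len(samples), size)]


# A 65-second silent signal at 16 kHz splits into 30 s + 30 s + 5 s windows.
signal = [0.0] * num_frames(65)
chunks = split_into_chunks(signal)
print([len(c) for c in chunks])  # [480000, 480000, 80000]
# Each chunk would then go through processor(...) and model.generate(...)
# as in the README, and the decoded texts would be joined afterwards.
```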
 
  ### Batch Processing
 
  ```python
+ from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
+ import torch
+ import torchaudio
+
+ # Load the model and processor
+ processor = AutoProcessor.from_pretrained("Paulwalker4884/whisper-persian")
+ model = AutoModelForSpeechSeq2Seq.from_pretrained("Paulwalker4884/whisper-persian")
+
  # For processing multiple audio files
  def transcribe_persian_audio(audio_paths):
      transcriptions = []
 
          input_features = processor(waveform.squeeze(), sampling_rate=16000, return_tensors="pt").input_features
 
          with torch.no_grad():
+             predicted_ids = model.generate(input_features)
 
          transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
          transcriptions.append(transcription)
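The batch function passes `waveform.squeeze()` to the processor together with `sampling_rate=16000`. `torchaudio.load` returns a `(channels, frames)` tensor plus the file's own sample rate, so `squeeze()` only yields a 1-D signal for mono files, and the lines shown in this diff contain no step that converts other rates to 16 kHz. With torch, `waveform.mean(dim=0)` and `torchaudio.functional.resample(waveform, orig_freq, 16000)` cover both steps. The dependency-free sketch below only illustrates what those two operations do, assuming the audio is already decoded into Python lists; `to_mono` and `resample_linear` are hypothetical names, and linear interpolation has no anti-aliasing filter, so it is a stand-in for a proper resampler rather than a replacement.

```python
from typing import List, Sequence

TARGET_RATE = 16_000  # the rate the README passes as sampling_rate=16000


def to_mono(channels: Sequence[Sequence[float]]) -> List[float]:
    """Average a (channels, frames) signal into a single channel."""
    if not channels:
        return []
    n = len(channels)
    # zip(*channels) walks the frames, yielding one sample per channel
    return [sum(frame) / n for frame in zip(*channels)]


def resample_linear(samples: Sequence[float], orig_rate: int,
                    target_rate: int = TARGET_RATE) -> List[float]:
    """Naive linear-interpolation resampler (no anti-aliasing filter)."""
    if orig_rate <= 0 or target_rate <= 0:
        raise ValueError("sample rates must be positive")
    if orig_rate == target_rate:
        return list(samples)
    out_len = int(len(samples) * target_rate / orig_rate)
    step = orig_rate / target_rate  # input samples advanced per output sample
    out: List[float] = []
    for i in range(out_len):
        pos = i * step
        left = int(pos)
        right = min(left + 1, len(samples) - 1)  # clamp at the last input sample
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out


stereo = [[1.0, 3.0, 5.0, 7.0], [3.0, 5.0, 7.0, 9.0]]  # 2 channels, 4 frames
mono = to_mono(stereo)
print(mono)                           # [2.0, 4.0, 6.0, 8.0]
print(resample_linear(mono, 32_000))  # [2.0, 6.0]
```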
 
  If you use this model in your research or applications, please cite:
 
  ```bibtex
+ @misc{whisper-persian-paulwalker4884,
  author = {Yasin Keykh},
  title = {Whisper Persian Fine-tuned Model},
  year = {2024},
  publisher = {Hugging Face},
+ url = {https://huggingface.co/Paulwalker4884/whisper-persian}
  }
  ```
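The front matter added in this commit declares `wer` and `cer` under `metrics`, and the last hunk's context line refers to an evaluation on Persian speech recognition benchmarks, but the diff does not show how either metric is computed. The sketch below uses the standard definitions: Levenshtein edit distance over words (WER) or characters (CER), divided by the length of the reference. It is a minimal, stdlib-only illustration; `edit_distance`, `wer` and `cer` are hypothetical helper names, the Persian sentences are made-up inputs, and no text normalization is applied. Evaluations of Persian output usually normalize first, for example unifying the Arabic and Persian forms of ی and ک and handling the zero-width non-joiner, which this sketch leaves out.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance; substitutions, insertions and deletions each cost 1."""
    # prev[j] = distance between the reference prefix processed so far and hyp[:j]
    prev: List[int] = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (free when tokens match)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate over all characters, spaces included."""
    return edit_distance(list(reference), list(hypothesis)) / max(len(reference), 1)


print(wer("من به خانه رفتم", "من خانه رفتم"))  # 0.25: one deleted word out of four
```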