Paulwalker4884/whisper-persian

@@ -1,6 +1,36 @@
 # Whisper Persian Fine-tuned Model
-A fine-tuned Whisper model optimized for Persian (Farsi) speech-to-text conversion using LoRA (Low-Rank Adaptation) technique.
 ## Model Details
@@ -45,6 +75,15 @@ The model can be integrated into larger applications such as:
 - Not suitable for noisy environments without proper audio preprocessing
 - May have reduced accuracy on dialects significantly different from the training data
 ## How to Get Started with the Model
 ### Installation
@@ -52,11 +91,48 @@ The model can be integrated into larger applications such as:
 First, install the required dependencies:
 ```bash
-pip install transformers torch torchaudio peft
 ```
 ### Usage
 ```python
 from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
 import torch
@@ -80,7 +156,7 @@ input_features = processor(waveform.squeeze(), sampling_rate=16000, return_tenso
 # Generate transcription
 with torch.no_grad():
-    predicted_ids = model.generate(input_features, language="fa", task="transcribe")
 # Decode the transcription
 transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
@@ -90,6 +166,14 @@ print(f"Transcription: {transcription}")
 ### Batch Processing
 ```python
 # For processing multiple audio files
 def transcribe_persian_audio(audio_paths):
     transcriptions = []
@@ -104,7 +188,7 @@ def transcribe_persian_audio(audio_paths):
         input_features = processor(waveform.squeeze(), sampling_rate=16000, return_tensors="pt").input_features
         with torch.no_grad():
-            predicted_ids = model.generate(input_features, language="fa", task="transcribe")
         transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
         transcriptions.append(transcription)
@@ -176,12 +260,12 @@ The model has been evaluated on Persian speech recognition benchmarks and shows
 If you use this model in your research or applications, please cite:
 ```bibtex
-@misc{whisper-persian-yasinkeykh,
   author = {Yasin Keykh},
   title = {Whisper Persian Fine-tuned Model},
   year = {2024},
   publisher = {Hugging Face},
-  url = {https://huggingface.co/yasinkeykh/whisper-persian-base}
 }
 ```

+---
+language:
+- fa
+base_model: openai/whisper-base
+tags:
+- whisper
+- speech
+- persian
+- farsi
+- speech-to-text
+- audio
+- automatic-speech-recognition
+- peft
+- lora
+library_name: transformers
+license: apache-2.0
+model-index:
+- name: whisper-persian
+  results: []
+pipeline_tag: automatic-speech-recognition
+widget:
+- example_title: Persian Speech Recognition
+  src: https://huggingface.co/datasets/mozilla-foundation/common_voice_13_0/resolve/main/audio/fa/common_voice_fa_18904283.mp3
+datasets:
+- mozilla-foundation/common_voice_13_0
+metrics:
+- wer
+- cer
+---
 # Whisper Persian Fine-tuned Model
+A fine-tuned Whisper model optimized for Persian (Farsi) speech-to-text conversion using the LoRA (Low-Rank Adaptation) technique. This model provides real-time speech recognition for the Persian language with high accuracy.
 ## Model Details
 - Not suitable for noisy environments without proper audio preprocessing
 - May have reduced accuracy on dialects significantly different from the training data
+## Use in Transformers
+```python
+from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
+processor = AutoProcessor.from_pretrained("Paulwalker4884/whisper-persian")
+model = AutoModelForSpeechSeq2Seq.from_pretrained("Paulwalker4884/whisper-persian")
+```
 ## How to Get Started with the Model
 ### Installation
 First, install the required dependencies:
 ```bash
+pip install transformers torch torchaudio numpy sounddevice
 ```
 ### Usage
+#### Real-time Audio Recording and Transcription
+```python
+import numpy as np
+import sounddevice as sd
+from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
+import torch
+# Load the fine-tuned Persian model
+processor = AutoProcessor.from_pretrained("Paulwalker4884/whisper-persian")
+model = AutoModelForSpeechSeq2Seq.from_pretrained("Paulwalker4884/whisper-persian").to("cpu")
+# Record audio
+duration = 5  # seconds
+sample_rate = 16000
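+print("شروع ضبط...")  # "Recording started..."
+audio = sd.rec(int(duration * sample_rate), samplerate=sample_rate, channels=1)
+sd.wait()  # Block until the recording is complete
+print("پایان ضبط.")  # "Recording finished."
+# Flatten the (frames, 1) recording to a 1D array
+audio = np.squeeze(audio)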
+# Process audio
+input_features = processor(audio, sampling_rate=sample_rate, return_tensors="pt").input_features
+# Generate transcription
+predicted_ids = model.generate(input_features)
+transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
+print("متن شناسایی شده:")  # "Recognized text:"
+print(transcription)
+```
+#### Audio File Transcription
 ```python
 from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
 import torch
 # Generate transcription
 with torch.no_grad():
+    predicted_ids = model.generate(input_features)
 # Decode the transcription
 transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
 ### Batch Processing
 ```python
+from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
+import torch
+import torchaudio
+# Load the model and processor
+processor = AutoProcessor.from_pretrained("Paulwalker4884/whisper-persian")
+model = AutoModelForSpeechSeq2Seq.from_pretrained("Paulwalker4884/whisper-persian")
 # For processing multiple audio files
 def transcribe_persian_audio(audio_paths):
     transcriptions = []
         input_features = processor(waveform.squeeze(), sampling_rate=16000, return_tensors="pt").input_features
         with torch.no_grad():
+            predicted_ids = model.generate(input_features)
         transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
         transcriptions.append(transcription)
 If you use this model in your research or applications, please cite:
 ```bibtex
+@misc{whisper-persian-paulwalker4884,
   author = {Yasin Keykh},
   title = {Whisper Persian Fine-tuned Model},
   year = {2024},
   publisher = {Hugging Face},
+  url = {https://huggingface.co/Paulwalker4884/whisper-persian}
 }
 ```
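
The model card metadata lists `wer` and `cer` as metrics but reports no results. As a rough illustration of what those metrics measure, here is a minimal pure-Python sketch, assuming the usual Levenshtein-based definitions (edit distance divided by reference length). The function names `edit_distance`, `wer`, and `cer` are hypothetical helpers, not part of this repository or of `transformers`, and counting spaces as characters in `cer` is an assumption of this sketch.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Substitutions, insertions, and deletions each cost 1.
    """
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length.

    Assumption: spaces are counted as characters.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

To score a transcription, pass the ground-truth text as `reference` and the decoded model output as `hypothesis`. Persian text is often normalized first (for example, unifying Arabic and Persian forms of the same letter); that step is left out here.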