HebArabNlpProject/WhisperLevantine

Automatic Speech Recognition

carmi committed on May 20, 2025

Commit 33d7b75 · verified · 1 Parent(s): db8ae18

Update README.md

Files changed (1)

README.md +10 -3

@@ -28,7 +28,7 @@ This model is a fine-tuned version of [Whisper Medium](https://github.com/openai
 The dataset used for training and fine-tuning this model consists of approximately 2,200 hours of transcribed audio, primarily featuring Israeli Levantine Arabic, along with some general Levantine Arabic content. The data sources include:
-1. **Self-maintained Collection**: 2,000 hours of audio data curated by the team, covering a wide range of Israeli Levantine Arabic speech.
+1. **Self-maintained Collection**: 1,200 hours of audio data curated by the team, covering a wide range of Israeli Levantine Arabic speech.
 - **Total Dataset Size**: ~1,200 hours
 - **Sampling Rate**: 8kHz - upsampled to 16kHz
@@ -39,11 +39,18 @@ The dataset used for training and fine-tuning this model consists of approximate
 The model is compatible with 16kHz audio input. Ensure your files are at the same sample rate for optimal results. You can load the model as follows:
 ```python
+pip install faster-whisper
 import faster_whisper
 import librosa
+model = faster_whisper.WhisperModel("model.bin")
+audio_file = 'your audio file.wav'
 with torch.no_grad():
     audio_data, sample_rate = librosa.load(audio_file)
     audio_data = librosa.resample(audio_data, orig_sr=sample_rate, target_sr=16000)
-    segs, _ = model.transcribe(audio_data, language='ar')
-    transcript = ' '.join(s.text for s in segs)
+    segments, _ = model.transcribe(audio_data, language='ar')
+    for segment in segments:
+        for word in segment.words:
+            print("[%.2fs -> %.2fs] %s" % (word.start, word.end, word.word))
+    transcript = ' '.join(s.text for s in segments)
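
Review note on the updated snippet: as committed, it is unlikely to run unchanged. `pip install faster-whisper` is a shell command placed inside the Python block, `torch` is never imported (faster-whisper runs on CTranslate2, so the `torch.no_grad()` wrapper should not be needed), `segment.words` is only populated when `word_timestamps=True` is passed to `transcribe`, and `segments` is a one-shot generator, so the final `' '.join(...)` sees nothing after the `for` loop has consumed it. A minimal sketch of one possible fix follows. The names `collect_transcript`, `transcribe_file`, and `model_dir` are hypothetical, and it assumes `WhisperModel` is given a directory holding the converted model rather than the bare `model.bin` file; the snippet does not confirm how the model files are laid out.

```python
# Install beforehand from a shell (not inside Python):
#   pip install faster-whisper librosa


def collect_transcript(segments):
    """Return (word_lines, transcript) from an iterable of segments.

    Hypothetical helper, not part of faster-whisper.
    """
    # transcribe() yields segments lazily, so materialize them once
    # before iterating more than one time.
    segments = list(segments)
    word_lines = []
    for segment in segments:
        # `words` is None unless word timestamps were requested.
        for word in segment.words or []:
            word_lines.append(
                "[%.2fs -> %.2fs] %s" % (word.start, word.end, word.word)
            )
    # strip() drops the leading space Whisper usually puts on segment text.
    transcript = " ".join(s.text.strip() for s in segments)
    return word_lines, transcript


def transcribe_file(model_dir, audio_file):
    """Hypothetical wrapper: transcribe one audio file as Arabic."""
    # Imported lazily so collect_transcript() works without these packages.
    import faster_whisper
    import librosa

    # Assumption: model_dir is a directory with the converted model files.
    model = faster_whisper.WhisperModel(model_dir)
    # sr=16000 makes librosa resample while loading (its default is 22050 Hz).
    audio_data, _ = librosa.load(audio_file, sr=16000)
    segments, _info = model.transcribe(
        audio_data, language="ar", word_timestamps=True
    )
    return collect_transcript(segments)
```

Calling `transcribe_file` with a local model directory and a path such as `'your audio file.wav'` would return the per-word lines and the joined transcript; printing the lines reproduces the output format used in the commit.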