---
language:
- ar
metrics:
- wer
base_model:
- openai/whisper-medium
pipeline_tag: automatic-speech-recognition
tags:
- whisper
- arabic
- pytorch
license: apache-2.0
---
# WhisperLevantineArabic

Fine-tuned Whisper model for Levantine Arabic (Israeli dialect).

## Model Description
This model is a fine-tuned version of Whisper Medium tailored specifically for transcribing Levantine Arabic, focusing on the Israeli dialect. It is designed to improve automatic speech recognition (ASR) performance for this particular variant of Arabic.
- Base Model: openai/whisper-medium
- Fine-tuned for: Levantine Arabic (Israeli Dialect)
- WER on test set: 35%
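WER (word error rate) is the word-level edit distance between the reference transcript and the model's hypothesis, divided by the number of reference words. As a minimal pure-Python sketch of the metric (the reported 35% was presumably computed with a standard evaluation tool, not this toy function):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[-1][-1] / len(ref)
```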
## Training Data
The dataset used for training and fine-tuning this model consists of approximately 2,200 hours of transcribed audio, primarily featuring Israeli Levantine Arabic, along with some general Levantine Arabic content. The data sources include:
- Self-maintained Collection: 2,000 hours of audio data curated by the team, covering a wide range of Israeli Levantine Arabic speech.
- Total Dataset Size: ~2,200 hours
- Sampling Rate: 8 kHz source audio, upsampled to 16 kHz
- Annotation: Human-transcribed and annotated for high accuracy.
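Upsampling 8 kHz audio to 16 kHz is normally done with a proper resampler (e.g. `librosa.resample`, which applies band-limited filtering). Purely to illustrate the idea of doubling the sample rate, here is a toy linear-interpolation sketch; this is not the resampler used in the pipeline:

```python
def upsample_2x(samples):
    """Double the sample rate by inserting the midpoint between neighbors.

    Toy illustration only: real resamplers use band-limited interpolation.
    """
    out = []
    for a, b in zip(samples, samples[1:]):
        out.append(a)
        out.append((a + b) / 2)  # interpolated sample between a and b
    out.append(samples[-1])
    return out
```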
## How to Use
The model expects 16 kHz audio input; resample your files to that rate for best results. You can load and run the model with faster-whisper as follows:
```python
import librosa
from faster_whisper import WhisperModel

# Load the fine-tuned model (replace the path with your local copy or repo id)
model = WhisperModel("path/to/WhisperLevantineArabic")

audio_file = "example.wav"  # path to your recording
# Load at the file's native rate, then resample to the 16 kHz the model expects
audio_data, sample_rate = librosa.load(audio_file, sr=None)
audio_data = librosa.resample(audio_data, orig_sr=sample_rate, target_sr=16000)

segs, _ = model.transcribe(audio_data, language='ar')
transcript = ' '.join(s.text for s in segs)
```
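faster-whisper segments also carry start and end timestamps, so the output can be rendered as subtitles rather than a flat transcript. A small sketch that formats `(start, end, text)` triples as SRT (the helper name and the tuple shape are our own, not part of the library's API):

```python
def to_srt(segments):
    """Render (start_sec, end_sec, text) triples as an SRT subtitle string."""
    def ts(t):
        # SRT timestamp format: HH:MM:SS,mmm
        h, rem = divmod(int(t), 3600)
        m, s = divmod(rem, 60)
        ms = int(round((t - int(t)) * 1000))
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    blocks = []
    for i, (start, end, text) in enumerate(segments, 1):
        blocks.append(f"{i}\n{ts(start)} --> {ts(end)}\n{text.strip()}\n")
    return "\n".join(blocks)
```

With faster-whisper output you would pass `[(s.start, s.end, s.text) for s in segs]`.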