carmi committed on
Commit 06a313e · verified · 1 Parent(s): 125f9cd

Update README.md

Files changed (1)
  1. README.md +9 -23
README.md CHANGED
@@ -20,22 +20,18 @@ license: apache-2.0
 
 This model is a fine-tuned version of [Whisper Medium](https://github.com/openai/whisper) tailored specifically for transcribing Levantine Arabic, focusing on the Israeli dialect. It is designed to improve automatic speech recognition (ASR) performance for this particular variant of Arabic.
 
-- **Base Model**: Whisper Medium
+- **Base Model**: Whisper Large V3
 - **Fine-tuned for**: Levantine Arabic (Israeli Dialect)
-- **WER on test set**: 14%
+- **WER on test set**: 35%
 
 ## Training Data
 
 The dataset used for training and fine-tuning this model consists of approximately 2,200 hours of transcribed audio, primarily featuring Israeli Levantine Arabic, along with some general Levantine Arabic content. The data sources include:
 
 1. **Self-maintained Collection**: 2,000 hours of audio data curated by the team, covering a wide range of Israeli Levantine Arabic speech.
-2. **[MGB-2 Corpus (Filtered)](https://huggingface.co/datasets/BelalElhossany/mgb2_audios_transcriptions_preprocessed)**: 200 hours of broadcast media in Arabic.
-3. **[CommonVoice18 (Filtered)](https://huggingface.co/datasets/fsicoli/common_voice_18_0)**: A filtered portion of the CommonVoice18 dataset.
 
-Filtering was applied using the [AlcLaM](https://arxiv.org/abs/2407.13097) Arabic language model to ensure relevance to Levantine Arabic.
-
-- **Total Dataset Size**: ~2,200 hours
-- **Sampling Rate**: 16kHz
+- **Total Dataset Size**: ~1,200 hours
+- **Sampling Rate**: 8kHz - upsampled to 16kHz
 - **Annotation**: Human-transcribed and annotated for high accuracy.
 
 ## How to Use
@@ -43,20 +39,10 @@ Filtering was applied using the [AlcLaM](https://arxiv.org/abs/2407.13097) Arabi
 The model is compatible with 16kHz audio input. Ensure your files are at the same sample rate for optimal results. You can load the model as follows:
 
 ```python
-from transformers import WhisperProcessor, WhisperForConditionalGeneration
-import torch
-
-# Load the model and processor
-processor = WhisperProcessor.from_pretrained("HebArabNlpProject/whisperLevantine")
-model = WhisperForConditionalGeneration.from_pretrained("HebArabNlpProject/whisperLevantine").to("cuda" if torch.cuda.is_available() else "cpu")
-
-# Example usage: processing audio input
-file_path = ... # wav filepath goes here
-audio_input, samplerate = torchaudio.load(file_path)
-inputs = processor(audio_input.squeeze(), return_tensors="pt", sampling_rate=samplerate).to("cuda" if torch.cuda.is_available() else "cpu")
+import faster_whisper
 
-# Run inference
 with torch.no_grad():
-    generated_ids = model.generate(inputs["input_features"])
-    transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)
-    print(transcription[0])
+    audio_data, sample_rate = librosa.load(audio_file)
+    audio_data = librosa.resample(audio_data, orig_sr=sample_rate, target_sr=16000)
+    segs, _ = model.transcribe(audio_data, language='ar')
+    transcript = ' '.join(s.text for s in segs)
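As committed, the new usage snippet is a fragment. It calls `torch.no_grad()` and `librosa` without importing them, and it uses `model` and `audio_file` without defining them. The following is a minimal sketch of a self-contained version. It assumes that the `HebArabNlpProject/whisperLevantine` repo hosts CTranslate2-format weights that `faster_whisper.WhisperModel` can load, which this diff does not confirm. The helper names `to_16k_mono`, `transcribe`, `load_model`, and `load_audio` are hypothetical. The linear-interpolation resampler is a simple stand-in for a proper band-limited resampler. The sketch drops the `torch.no_grad()` wrapper because faster-whisper runs inference through CTranslate2 rather than PyTorch autograd.

```python
import numpy as np

TARGET_SR = 16_000  # the README says the model expects 16 kHz input


def to_16k_mono(audio, sr: int) -> np.ndarray:
    """Return float32 mono audio at 16 kHz.

    Multi-channel input is assumed to be laid out as (channels, samples),
    as librosa.load(..., mono=False) returns it. Resampling uses plain
    linear interpolation, a simple stand-in for librosa.resample or
    scipy.signal.resample_poly, which suppress aliasing better.
    """
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 2:
        audio = audio.mean(axis=0)  # downmix channels to mono
    if sr == TARGET_SR:
        return audio
    n_out = int(round(audio.shape[0] * TARGET_SR / sr))
    t_in = np.arange(audio.shape[0]) / sr   # input sample times (seconds)
    t_out = np.arange(n_out) / TARGET_SR    # output sample times (seconds)
    return np.interp(t_out, t_in, audio).astype(np.float32)


def transcribe(model, audio, sr: int, language: str = "ar") -> str:
    """Transcribe with a faster-whisper-style model and join the segment texts."""
    # faster-whisper treats a NumPy array input as 16 kHz float32 samples.
    segments, _info = model.transcribe(to_16k_mono(audio, sr), language=language)
    # With faster-whisper, segments is a generator, so decoding runs
    # while this join iterates over it.
    return " ".join(seg.text.strip() for seg in segments)


def load_model(repo_id: str = "HebArabNlpProject/whisperLevantine", device: str = "auto"):
    """Hypothetical helper. Assumes the repo ships CTranslate2 weights."""
    from faster_whisper import WhisperModel  # optional dependency, imported lazily
    return WhisperModel(repo_id, device=device)


def load_audio(path: str):
    """Hypothetical helper: decode a file and resample to 16 kHz mono on load."""
    import librosa  # optional dependency, imported lazily
    audio, sr = librosa.load(path, sr=TARGET_SR, mono=True)
    return audio, sr
```

In a real run, something like `text = transcribe(load_model(), *load_audio("call.wav"))` would tie the pieces together. Here `call.wav` is a placeholder path. Passing `sr=16000` to `librosa.load`, as `load_audio` does, resamples once during decoding. The committed `librosa.load(audio_file)` first decodes at librosa's default 22,050 Hz and then resamples a second time to 16 kHz. `to_16k_mono` is only needed for arrays that come from other sources, such as 8 kHz telephony audio.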