
Model Card for Moroccan Dialect Speech-to-Text Model

This model is designed to transcribe speech in the Moroccan dialect to text. It's built on top of the Wav2Vec 2.0 architecture, fine-tuned on a dataset of Moroccan dialect speech.

Model Details

Model Description

This model is part of a project aimed at improving speech recognition technology for underrepresented languages, with a focus on the Moroccan Arabic dialect. The model leverages the power of the Wav2Vec2 architecture, fine-tuned on a curated dataset of Moroccan speech.

  • Developed by: https://www.kaggle.com/khaireddinedalaa
  • Model type: Wav2Vec2ForCTC
  • Language(s) (NLP): Moroccan Arabic (Darija)
  • License: Apache 2.0
  • Finetuned from model: jonatasgrosman/wav2vec2-large-xlsr-53-arabic

Model Sources

  • Demo: Coming Soon

Uses

Direct Use

This model is intended for direct use in applications requiring speech-to-text capabilities for the Moroccan dialect. It can be integrated into services like voice-controlled assistants, dictation software, or for generating subtitles in real-time.

Out-of-Scope Use

This model is not intended for languages other than Moroccan Arabic or for non-speech audio. Performance is likely to degrade significantly on out-of-domain audio, such as other Arabic dialects, music, or heavily noisy recordings.

Bias, Risks, and Limitations

The model may exhibit biases present in the training data. It's important to note that dialectal variations within Morocco could affect transcription accuracy. Users should be aware of these limitations and consider additional validation for critical applications.

Recommendations

Continual monitoring and updating of the model with more diverse datasets can help mitigate biases and improve performance across different dialects and speaking styles.

How to Get Started with the Model

import torch
import soundfile as sf
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

# Load the fine-tuned model and its processor
processor = Wav2Vec2Processor.from_pretrained("Vrspi/SpeechToText")
model = Wav2Vec2ForCTC.from_pretrained("Vrspi/SpeechToText")

# Load an audio file (16 kHz mono, to match the training data)
speech, sampling_rate = sf.read("path_to_your_audio_file.wav")

# Extract input features from the raw waveform
inputs = processor(speech, sampling_rate=sampling_rate, return_tensors="pt")

# Run inference and greedily decode the CTC output
with torch.no_grad():
    logits = model(inputs.input_values).logits
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]
print(transcription)

Training Details

Training Data

The model was trained on a dataset comprising approximately 20 hours of spoken Moroccan Arabic collected from various sources, including public speeches, conversations, and media content.

Training Procedure

Preprocessing

The audio files were resampled to 16kHz and trimmed to remove silence. Noisy segments were manually annotated and excluded from training.
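The card does not name the tooling used for these steps; in practice they are usually done with librosa or torchaudio. Purely as an illustration, the two operations can be sketched in plain NumPy (linear-interpolation resampling is a rough stand-in for a proper band-limited resampler, and the amplitude threshold in the trimming helper is an arbitrary assumption):

```python
import numpy as np

def resample_linear(speech, orig_sr, target_sr=16_000):
    """Resample a 1-D waveform via linear interpolation
    (a rough stand-in for librosa/torchaudio resampling)."""
    n_out = int(round(len(speech) * target_sr / orig_sr))
    old_t = np.linspace(0.0, 1.0, num=len(speech), endpoint=False)
    new_t = np.linspace(0.0, 1.0, num=n_out, endpoint=False)
    return np.interp(new_t, old_t, speech)

def trim_silence(speech, threshold=0.01):
    """Drop leading/trailing samples below an amplitude threshold."""
    voiced = np.where(np.abs(speech) > threshold)[0]
    if len(voiced) == 0:
        return speech[:0]
    return speech[voiced[0]:voiced[-1] + 1]

# One second of 44.1 kHz audio becomes 16000 samples at 16 kHz
wave = np.sin(2 * np.pi * 440 * np.arange(44_100) / 44_100)
resampled = resample_linear(wave, orig_sr=44_100)
print(len(resampled))  # 16000
```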

Training Hyperparameters

  • Optimizer: AdamW
  • Learning rate: 3e-5
  • Epochs: 3
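Only the optimizer, learning rate, and epoch count are stated on this card. A fine-tuning configuration consistent with those values might look like the following Hugging Face TrainingArguments fragment; the batch size, accumulation steps, and logging/saving intervals are illustrative assumptions, not the values actually used:

```python
from transformers import TrainingArguments

# Learning rate, epoch count, and AdamW (the Trainer default optimizer)
# come from the card; every other value here is an assumption.
training_args = TrainingArguments(
    output_dir="./wav2vec2-darija",
    learning_rate=3e-5,
    num_train_epochs=3,
    per_device_train_batch_size=8,   # illustrative
    gradient_accumulation_steps=2,   # illustrative
    save_steps=500,                  # illustrative
    logging_steps=50,                # illustrative
)
```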

Evaluation

Results

The model has not yet been formally evaluated. Benchmark results (e.g., word error rate on a held-out test set) will be published as soon as they are available.

Environmental Impact

  • Hardware Type: Kaggle GPU environment
  • Hours used: approximately 10