Model Card for Moroccan Dialect Speech-to-Text Model
This model is designed to transcribe speech in the Moroccan dialect to text. It's built on top of the Wav2Vec 2.0 architecture, fine-tuned on a dataset of Moroccan dialect speech.
Model Details
Model Description
This model is part of a project aimed at improving speech recognition technology for underrepresented languages, with a focus on the Moroccan Arabic dialect. The model leverages the power of the Wav2Vec2 architecture, fine-tuned on a curated dataset of Moroccan speech.
- Developed by: https://www.kaggle.com/khaireddinedalaa
- Model type: Wav2Vec2ForCTC
- Language(s) (NLP): Moroccan Arabic (Darija)
- License: Apache 2.0
- Finetuned from model: jonatasgrosman/wav2vec2-large-xlsr-53-arabic
Model Sources
- Demo: Coming Soon
Uses
Direct Use
This model is intended for direct use in applications requiring speech-to-text capabilities for the Moroccan dialect. It can be integrated into services like voice-controlled assistants, dictation software, or for generating subtitles in real-time.
Out-of-Scope Use
This model is not intended for use with languages other than Moroccan Arabic or for non-speech audio transcription. Performance may significantly decrease when used out of context.
Bias, Risks, and Limitations
The model may exhibit biases present in the training data. It's important to note that dialectal variations within Morocco could affect transcription accuracy. Users should be aware of these limitations and consider additional validation for critical applications.
Recommendations
Continual monitoring and updating of the model with more diverse datasets can help mitigate biases and improve performance across different dialects and speaking styles.
How to Get Started with the Model
```python
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC, pipeline
import soundfile as sf

# Load the model and processor
processor = Wav2Vec2Processor.from_pretrained("Vrspi/SpeechToText")
model = Wav2Vec2ForCTC.from_pretrained("Vrspi/SpeechToText")

# Create a speech-to-text pipeline; the processor supplies both
# the feature extractor and the tokenizer
speech_recognizer = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
)

# Load an audio file (mono; the model expects 16 kHz input)
speech, sampling_rate = sf.read("path_to_your_audio_file.wav")

# Transcribe the speech
transcription = speech_recognizer({"raw": speech, "sampling_rate": sampling_rate})
print(transcription["text"])
Training Details
Training Data
The model was trained on a dataset comprising approximately 20 hours of spoken Moroccan Arabic collected from various sources, including public speeches, conversations, and media content.
Training Procedure
Preprocessing
The audio files were resampled to 16kHz and trimmed to remove silence. Noisy segments were manually annotated and excluded from training.
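The preprocessing above (16 kHz resampling plus silence trimming) can be sketched with plain NumPy. This is a minimal illustration, not the card's actual pipeline: the linear-interpolation resampler and the amplitude threshold are simplifying assumptions, and a real pipeline would typically use a dedicated library such as librosa or torchaudio.

```python
import numpy as np

TARGET_SR = 16_000  # Wav2Vec 2.0 expects 16 kHz input


def resample(audio: np.ndarray, orig_sr: int, target_sr: int = TARGET_SR) -> np.ndarray:
    """Resample by linear interpolation (a simple stand-in for a proper resampler)."""
    if orig_sr == target_sr:
        return audio
    duration = len(audio) / orig_sr
    n_target = int(duration * target_sr)
    old_t = np.linspace(0.0, duration, num=len(audio), endpoint=False)
    new_t = np.linspace(0.0, duration, num=n_target, endpoint=False)
    return np.interp(new_t, old_t, audio)


def trim_silence(audio: np.ndarray, threshold: float = 0.01) -> np.ndarray:
    """Trim leading/trailing samples whose amplitude stays below `threshold`."""
    voiced = np.flatnonzero(np.abs(audio) > threshold)
    if voiced.size == 0:
        return audio[:0]
    return audio[voiced[0] : voiced[-1] + 1]


# Example: a 1 s, 8 kHz tone padded with silence on both sides
sr = 8_000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 440 * t)
clip = np.concatenate([np.zeros(1000), tone, np.zeros(1000)])
clean = trim_silence(resample(clip, sr))  # upsampled to 16 kHz, silence removed
```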
Training Hyperparameters
- Training regime: AdamW optimizer, learning rate 3e-5, 3 epochs.
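A fine-tuning setup matching the stated hyperparameters could look like the sketch below, using the Hugging Face `Trainer`. Only the optimizer (AdamW), learning rate (3e-5), and epoch count (3) come from this card; every other value (batch size, warmup, output directory, and the `darija_dataset` variable) is an illustrative assumption.

```python
from transformers import TrainingArguments, Trainer, Wav2Vec2ForCTC

model = Wav2Vec2ForCTC.from_pretrained("jonatasgrosman/wav2vec2-large-xlsr-53-arabic")

training_args = TrainingArguments(
    output_dir="./wav2vec2-darija",  # assumption
    learning_rate=3e-5,              # stated in this card
    num_train_epochs=3,              # stated in this card
    optim="adamw_torch",             # AdamW, as stated
    per_device_train_batch_size=8,   # assumption
    warmup_steps=500,                # assumption
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=darija_dataset,  # hypothetical: your preprocessed 16 kHz dataset
)
trainer.train()
```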
Evaluation
Results
The model has not been formally evaluated yet; results will be published as soon as possible.
Environmental Impact
- Hardware Type: Training was performed on Kaggle's GPU environment.
- Hours used: Approximately 10 hours.