# Model Card for Moroccan Dialect Speech-to-Text Model
This model is designed to transcribe speech in the Moroccan dialect to text. It's built on top of the Wav2Vec 2.0 architecture, fine-tuned on a dataset of Moroccan dialect speech.
## Model Details
### Model Description
This model is part of a project aimed at improving speech recognition technology for underrepresented languages, with a focus on the Moroccan Arabic dialect. The model leverages the power of the Wav2Vec2 architecture, fine-tuned on a curated dataset of Moroccan speech.
- **Developed by:** https://www.kaggle.com/khaireddinedalaa
- **Model type:** Wav2Vec2ForCTC
- **Language(s) (NLP):** Moroccan Arabic (Darija)
- **License:** Apache 2.0
- **Finetuned from model:** jonatasgrosman/wav2vec2-large-xlsr-53-arabic
### Model Sources
- **Demo:** Coming Soon
## Uses
### Direct Use
This model is intended for direct use in applications requiring speech-to-text capabilities for the Moroccan dialect. It can be integrated into services like voice-controlled assistants, dictation software, or for generating subtitles in real-time.
### Out-of-Scope Use
This model is not intended for languages other than Moroccan Arabic or for non-speech audio transcription. Performance may degrade significantly when it is used outside this scope.
## Bias, Risks, and Limitations
The model may reflect biases present in its training data, and dialectal variation within Morocco can affect transcription accuracy. Users should be aware of these limitations and add their own validation for critical applications.
### Recommendations
Continual monitoring and updating of the model with more diverse datasets can help mitigate biases and improve performance across different dialects and speaking styles.
## How to Get Started with the Model
```python
import soundfile as sf
import torch
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

# Load the model and processor
processor = Wav2Vec2Processor.from_pretrained("Vrspi/SpeechToText")
model = Wav2Vec2ForCTC.from_pretrained("Vrspi/SpeechToText")

# Load an audio file (the model expects 16 kHz mono input)
speech, sampling_rate = sf.read("path_to_your_audio_file.wav")

# Convert the waveform to model inputs and run inference
inputs = processor(speech, sampling_rate=sampling_rate, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding: most likely token per frame, collapsed and decoded to text
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]
print(transcription)
```
## Training Details
### Training Data
The model was trained on a dataset comprising approximately 20 hours of spoken Moroccan Arabic collected from various sources, including public speeches, conversations, and media content.
### Training Procedure
#### Preprocessing
The audio files were resampled to 16kHz and trimmed to remove silence. Noisy segments were manually annotated and excluded from training.
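As a rough illustration of that pipeline (not the exact scripts used for training; the tone, rates, and threshold below are invented stand-ins), resampling and silence trimming can be sketched with NumPy and SciPy:

```python
import numpy as np
from scipy.signal import resample_poly

# Stand-in for a raw recording: a 1 s, 440 Hz tone at 48 kHz,
# padded with half a second of silence on each side.
orig_sr, target_sr = 48000, 16000
tone = 0.5 * np.sin(2 * np.pi * 440 * np.arange(orig_sr) / orig_sr)
clip = np.concatenate([np.zeros(orig_sr // 2), tone, np.zeros(orig_sr // 2)])

# Resample 48 kHz -> 16 kHz with a polyphase filter (ratio 1:3)
clip_16k = resample_poly(clip, up=1, down=orig_sr // target_sr)

# Trim leading/trailing silence with a simple amplitude threshold
active = np.flatnonzero(np.abs(clip_16k) > 0.01)
trimmed = clip_16k[active[0]:active[-1] + 1]
```

Libraries such as `librosa` offer equivalent one-call helpers (`librosa.load` with `sr=16000`, `librosa.effects.trim`) if preferred.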
#### Training Hyperparameters
- **Training regime:** Fine-tuned with the AdamW optimizer at a learning rate of 3e-5 for 3 epochs.
## Evaluation
### Results
The model has not been evaluated yet; results will be published here as soon as they are available.
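When results do arrive, the standard ASR metric is word error rate (WER): word-level edit distance divided by the number of reference words. A self-contained sketch (the example strings are invented):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits needed to turn the first i ref words into the first j hyp words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution / match
    return dp[-1][-1] / len(ref)

print(wer("salam labas", "salam labes"))  # 1 substitution / 2 words -> 0.5
```

Libraries such as `jiwer` compute the same metric and handle normalization details.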
## Environmental Impact
- **Hardware Type:** Kaggle GPU environment.
- **Hours used:** Approximately 10 hours.
---