Vrspi committed on
Commit
84486d5
·
verified ·
1 Parent(s): 45f2356

Create README.md

Files changed (1)
  1. README.md +87 -0
README.md ADDED
@@ -0,0 +1,87 @@
+ # Model Card for Moroccan Dialect Speech-to-Text Model
+
+ This model transcribes speech in the Moroccan Arabic dialect (Darija) to text. It is built on the Wav2Vec 2.0 architecture and fine-tuned on a dataset of Moroccan dialect speech.
+
+ ## Model Details
+
+ ### Model Description
+
+ This model is part of a project to improve speech recognition for underrepresented languages, with a focus on the Moroccan Arabic dialect. It uses the Wav2Vec2 architecture, fine-tuned on a curated dataset of Moroccan speech.
+
+ - **Developed by:** https://www.kaggle.com/khaireddinedalaa
+ - **Model type:** Wav2Vec2ForCTC
+ - **Language(s) (NLP):** Moroccan Arabic (Darija)
+ - **License:** Apache 2.0
+ - **Finetuned from model:** jonatasgrosman/wav2vec2-large-xlsr-53-arabic
+
+ ### Model Sources
+
+ - **Demo:** Coming soon
+
+ ## Uses
+
+ ### Direct Use
+
+ This model is intended for direct use in applications that need speech-to-text for the Moroccan dialect. It can be integrated into services such as voice-controlled assistants, dictation software, or real-time subtitle generation.
+
+ ### Out-of-Scope Use
+
+ This model is not intended for languages other than Moroccan Arabic or for transcribing non-speech audio. Performance may drop significantly on such inputs.
+
+ ## Bias, Risks, and Limitations
+
+ The model may exhibit biases present in the training data. Dialectal variation within Morocco could also affect transcription accuracy. Users should be aware of these limitations and validate the output independently for critical applications.
+
+ ### Recommendations
+
+ Continued monitoring and retraining on more diverse datasets can help mitigate biases and improve performance across different dialects and speaking styles.
+
+ ## How to Get Started with the Model
+
+ ```python
+ from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
+ from transformers import pipeline
+ import soundfile as sf
+
+ # Load the model and processor
+ processor = Wav2Vec2Processor.from_pretrained("Vrspi/SpeechToText")
+ model = Wav2Vec2ForCTC.from_pretrained("Vrspi/SpeechToText")
+
+ # Create a speech-to-text pipeline; the processor bundles the tokenizer and feature extractor
+ speech_recognizer = pipeline(
+     "automatic-speech-recognition",
+     model=model,
+     tokenizer=processor.tokenizer,
+     feature_extractor=processor.feature_extractor,
+ )
+
+ # Load an audio file as float32 and mix it down to mono if it has several channels
+ speech, sampling_rate = sf.read("path_to_your_audio_file.wav", dtype="float32")
+ if speech.ndim > 1:
+     speech = speech.mean(axis=1)
+
+ # Transcribe the speech; raw audio is passed as a dict together with its sampling rate
+ transcription = speech_recognizer({"raw": speech, "sampling_rate": sampling_rate})
+ print(transcription["text"])
+ ```
+
+ ## Training Details
+
+ ### Training Data
+
+ The model was trained on approximately 20 hours of spoken Moroccan Arabic collected from various sources, including public speeches, conversations, and media content.
+
+ ### Training Procedure
+
+ #### Preprocessing
+
+ The audio files were resampled to 16 kHz and trimmed to remove silence. Noisy segments were manually annotated and excluded from training.
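
The resampling and silence-trimming steps described above can be sketched as follows. This is a dependency-free illustration, not the author's actual pipeline: the linear-interpolation resampler, the frame length, and the RMS threshold are all assumptions made for the example, and a real pipeline would more likely use an audio library that applies proper anti-aliasing.

```python
import math

TARGET_RATE = 16_000  # sampling rate stated in the model card


def resample_linear(samples, source_rate, target_rate=TARGET_RATE):
    """Resample a mono signal by linear interpolation (illustrative, no anti-aliasing)."""
    if source_rate == target_rate or not samples:
        return list(samples)
    n_out = int(len(samples) * target_rate / source_rate)
    step = source_rate / target_rate  # distance between output samples, in input samples
    out = []
    for i in range(n_out):
        pos = i * step
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


def trim_silence(samples, frame_length=400, threshold=0.01):
    """Drop leading and trailing frames whose RMS energy is below `threshold`.

    `frame_length` (400 samples = 25 ms at 16 kHz) and `threshold` are
    hypothetical defaults chosen for this sketch.
    """
    def rms(frame):
        return math.sqrt(sum(x * x for x in frame) / len(frame))

    frames = [samples[i:i + frame_length] for i in range(0, len(samples), frame_length)]
    voiced = [idx for idx, frame in enumerate(frames) if rms(frame) >= threshold]
    if not voiced:
        return []
    start = voiced[0] * frame_length
    end = min((voiced[-1] + 1) * frame_length, len(samples))
    return samples[start:end]


# Usage on a synthetic 8 kHz signal: silence, a 440 Hz tone, then silence again
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(1600)]
signal = [0.0] * 800 + tone + [0.0] * 800
clean = trim_silence(resample_linear(signal, source_rate=8000))
print(len(signal), len(clean))
```

Trimming only the leading and trailing frames keeps pauses inside an utterance intact, which is usually what a CTC model should see during fine-tuning.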
+
+ #### Training Hyperparameters
+
+ - **Training regime:** AdamW optimizer with a learning rate of 3e-5, trained for 3 epochs.
+
+ ## Evaluation
+
+ ### Results
+
+ The model has not been tested yet. I will post results as soon as possible.
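
The card does not name an evaluation metric. Word error rate (WER) is a common choice for speech recognition, so the sketch below shows how a future evaluation might score one transcript against its reference. The function name and the placeholder strings are illustrative assumptions and do not come from this model or its data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Placeholder tokens stand in for real reference and predicted transcripts
print(word_error_rate("a b c d", "a x c"))
```

A corpus-level score would normally sum the edit distances over all utterances and divide by the total number of reference words, rather than averaging per-utterance rates.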
+
+ ## Environmental Impact
+
+ - **Hardware Type:** Training was performed on Kaggle's GPU environment.
+ - **Hours used:** Approximately 10 hours.
+ ---