Commit 5e823b7 (verified) by AdhamAshraf · 1 parent: 9022d7b

Update README.md

  1. README.md +230 -3
README.md CHANGED
@@ -1,13 +1,240 @@
  ---
  title: SlangGPT
- emoji: 🐢
  colorFrom: green
  colorTo: yellow
  sdk: gradio
  sdk_version: 6.14.0
- python_version: '3.13'
  app_file: app.py
  pinned: false
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
  title: SlangGPT
+ emoji: 🌐
  colorFrom: green
  colorTo: yellow
  sdk: gradio
  sdk_version: 6.14.0
+ python_version: '3.10'
  app_file: app.py
  pinned: false
+ license: mit
+ short_description: Egyptian Arabic slang → Modern Standard Arabic translation
  ---
 
+ # SlangGPT – Egyptian Arabic → Modern Standard Arabic
+
+ > ⚡ Real-time Egyptian Arabic slang translation powered by AraGPT-2.
+
+ [![GitHub Repository](https://img.shields.io/badge/GitHub-SlangGPT-181717?logo=github)](https://github.com/adhamashraf7788/SlangGPT)
+ [![🤗 Model](https://img.shields.io/badge/🤗%20Model-SlangGPT-blue)](https://huggingface.co/AdhamAshraf/SlangGPT)
+ [![🤗 Dataset](https://img.shields.io/badge/🤗%20Dataset-Egyptian%20Arabic%20↔%20MSA-orange)](https://huggingface.co/datasets/AdhamAshraf/egyptian-2-arabic)
+ [![🤗 Spaces](https://img.shields.io/badge/🤗%20Spaces-Live%20Demo-yellow)](https://huggingface.co/spaces/AdhamAshraf/SlangGPT)
+ [![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
+
+ ---
+
+ ## 🧠 About the Project
+
+ **SlangGPT** is a fine-tuned **AraGPT-2** model designed to translate **Egyptian Arabic slang/dialect** into **Modern Standard Arabic (MSA)**.
+
+ The project also includes:
+
+ - ✅ A translation verification (detection) model
+ - ⭐ Human feedback collection
+ - 📊 Public research datasets
+ - 🤖 RLHF-ready feedback pipeline
+
+ 👉 **Type an Egyptian Arabic sentence below and get the MSA translation instantly!**
+
+ ---
+
+ ## ✨ Features
+
+ - 🇪🇬 Egyptian Arabic slang understanding
+ - 📘 Translation into Modern Standard Arabic (MSA)
+ - 🤖 Fine-tuned AraGPT-2 language model
+ - 🧠 Translation verification / detection model
+ - ⭐ Human feedback collection pipeline
+ - 📊 Public feedback dataset for research
+ - 🌐 Interactive Gradio interface
+
+ ---
+
+ ## 💡 Example Inputs
+
+ Try these examples:
+
+ - `عامل ايه؟`
+ - `إيه الأخبار؟`
+ - `هو انت رايح فين؟`
+ - `عايز أروح البيت`
+ - `أنا زهقان جدًا`
+ - `الدنيا حر النهاردة`
+
+ ---
+
+ ## 📄 Full Project Report
+
+ For all technical details (architecture, training, hyperparameters, evaluation, error analysis, and comparison with Stanford CS224N baselines), read the full report:
+
+ 📄 https://github.com/adhamashraf7788/SlangGPT/blob/main/SlangGPT_report.pdf
+
+ ---
+
+ ## 🧠 How It Works
+
+ The model expects the following prompt format:
+
+ ```text
+ dialect: {your sentence} ↔ msa:
+ ```
+
+ The model then autoregressively generates the corresponding MSA translation.
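The prompt format above can be sketched in Python as follows. This is a minimal illustration, not the Space's actual `app.py`: the helper names `build_prompt` and `extract_translation` are hypothetical, and the extraction step assumes the decoded output echoes the prompt before the translation, as decoder-only models usually do.

```python
PROMPT_TEMPLATE = "dialect: {sentence} ↔ msa:"


def build_prompt(sentence: str) -> str:
    """Wrap an Egyptian Arabic sentence in the documented prompt format."""
    return PROMPT_TEMPLATE.format(sentence=sentence.strip())


def extract_translation(generated_text: str) -> str:
    """Return whatever follows the last 'msa:' marker in the decoded text.

    Assumption: the decoded output contains the prompt followed by the
    translation. If the marker is missing, the whole text is returned.
    """
    return generated_text.rsplit("msa:", 1)[-1].strip()


prompt = build_prompt("عامل ايه؟")
print(prompt)
```

Keeping the template in one constant means the Space and any offline evaluation script would format inputs identically, which matters because a fine-tuned model is sensitive to the exact prompt it was trained on.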
+
+ ### Decoding Strategy
+
+ - Temperature = 0.7
+ - Top-k = 50
+ - Top-p = 0.92
+ - Repetition penalty = 1.3
+
+ Sampling with a temperature below 1, combined with top-k and top-p truncation, keeps the output fluent, while the repetition penalty discourages repeated tokens.
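As a rough sketch, the settings listed above map onto the keyword arguments of the Transformers `generate` method as shown below. Two values are assumptions not stated in the README: `do_sample=True` (sampling has to be enabled for temperature, top-k, and top-p to take effect) and `max_new_tokens=64`. The `translate` helper is hypothetical and expects an already loaded causal LM and tokenizer.

```python
# Values from the Decoding Strategy list, plus two labeled assumptions.
GENERATION_KWARGS = {
    "do_sample": True,           # assumption: enables sampling-based decoding
    "temperature": 0.7,
    "top_k": 50,
    "top_p": 0.92,
    "repetition_penalty": 1.3,
    "max_new_tokens": 64,        # assumption: length cap not given in the README
}


def translate(sentence, model, tokenizer):
    """Generate an MSA translation with a Hugging Face causal LM (hypothetical helper)."""
    prompt = f"dialect: {sentence} ↔ msa:"
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, **GENERATION_KWARGS)
    text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    # The decoded text includes the prompt, so keep only what follows "msa:".
    return text.rsplit("msa:", 1)[-1].strip()
```

With `transformers` installed, the model and tokenizer would typically be loaded with `AutoModelForCausalLM.from_pretrained("AdhamAshraf/SlangGPT")` and `AutoTokenizer.from_pretrained("AdhamAshraf/SlangGPT")`, assuming the Hub repository is compatible with the Auto classes.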
+
+ ---
+
+ ## 📝 Feedback System
+
+ After each translation, users can provide feedback:
+
+ 1. ✅ Is the translation correct? (Yes / No)
+ 2. ✍️ Provide a corrected MSA translation (optional)
+ 3. ⭐ Rate translation quality (1–5)
+
+ All feedback is stored in the public dataset:
+
+ 🔗 https://huggingface.co/datasets/AdhamAshraf/slanggpt-feedback-dataset
+
+ Collected feedback will help improve future SlangGPT versions and support Arabic RLHF research.
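One way to picture a single feedback entry is the record below. It is only a sketch: the class name, field names, and validation rules are hypothetical and may not match the columns of the public feedback dataset, and the sample MSA sentence is an illustrative value rather than real model output.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class FeedbackRecord:
    dialect_input: str                   # sentence the user typed
    model_output: str                    # MSA translation shown to the user
    is_correct: bool                     # answer to "Is the translation correct?"
    rating: int                          # quality rating from 1 to 5
    corrected_msa: Optional[str] = None  # optional user-supplied correction

    def __post_init__(self):
        if not 1 <= self.rating <= 5:
            raise ValueError("rating must be between 1 and 5")
        # Treat a blank correction as "not provided".
        if self.corrected_msa is not None and not self.corrected_msa.strip():
            self.corrected_msa = None


record = FeedbackRecord("عامل ايه؟", "كيف حالك؟", is_correct=True, rating=5)
row = asdict(record)  # plain dict, ready to append to a tabular dataset
```

Validating the rating at construction time keeps out-of-range values from reaching the dataset, which makes the ratings easier to use later as a preference signal.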
+
+ ---
+
+ ## 📊 Model Performance
+
+ ### Generation Quality
+
+ | Metric | Zero-shot (Base AraGPT-2) | SlangGPT |
+ |---|---|---|
+ | chrF | 10.62 | **29.08** |
+ | BLEU | 0.02 | **6.63** |
+
+ ### Detection Model
+
+ | Task | Accuracy |
+ |---|---|
+ | Translation Verification | **0.956** |
+
+ ### Improvements
+
+ - 📈 chrF improvement: **+18.46**
+ - 📈 Detection accuracy improvement: **+45.6 points**
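The improvement figures can be re-derived from the tables above with a few lines of arithmetic. The detection baseline is not listed in the README, so the last step only infers what baseline the reported gain implies; treat that number as an inference, not a reported result.

```python
baseline = {"chrF": 10.62, "BLEU": 0.02}   # zero-shot base AraGPT-2
slanggpt = {"chrF": 29.08, "BLEU": 6.63}   # fine-tuned model

# Absolute gain per metric, rounded to the precision used in the tables.
gains = {metric: round(slanggpt[metric] - baseline[metric], 2) for metric in baseline}
print(gains)  # {'chrF': 18.46, 'BLEU': 6.61}

detection_accuracy = 0.956     # from the Detection Model table
reported_gain_points = 45.6    # from the Improvements list
# Baseline accuracy (in percent) implied by the reported gain.
implied_baseline = round(detection_accuracy * 100 - reported_gain_points, 1)
print(implied_baseline)  # 50.0
```

The chrF gain matches the reported +18.46, and the detection gain is consistent with a 50% starting accuracy, which is what chance level would be on a balanced two-class task.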
+
+ ---
+
+ ## 🔬 Research Contributions
+
+ This project contributes:
+
+ - A fine-tuned Egyptian Arabic → MSA generation model
+ - A translation verification classifier
+ - A public human-feedback dataset
+ - An RLHF-ready Arabic NLP pipeline
+
+ The project aims to support future Arabic dialect NLP research and low-resource language modeling.
+
+ ---
+
+ ## ⚠️ Limitations
+
+ The model may struggle with:
+
+ - Rare slang expressions
+ - Mixed Arabic-English text
+ - Heavy sarcasm or idioms
+ - Long conversational context
+
+ Translations are generated probabilistically and may occasionally contain inaccuracies.
+
+ ---
+
+ ## 🔐 Feedback & Privacy
+
+ Submitted feedback may be stored publicly in the research feedback dataset.
+
+ Please avoid submitting:
+
+ - Personal information
+ - Phone numbers
+ - Addresses
+ - Sensitive/private content
+
+ ---
+
+ ## 🚀 Future Work
+
+ Planned future improvements include:
+
+ - Larger instruction-tuned Arabic models
+ - RLHF fine-tuning using collected feedback
+ - Better dialect generalization
+ - Arabic-English code-switching support
+ - Faster inference optimization
+
+ ---
+
+ ## 🏗️ Technical Stack
+
+ - Transformers 🤗
+ - PyTorch
+ - Gradio
+ - Hugging Face Spaces
+ - AraGPT-2
+ - pandas
+ - scikit-learn
+
+ ---
+
+ ## 📚 Resources
+
+ | Resource | Link |
+ |---|---|
+ | Live Space | https://huggingface.co/spaces/AdhamAshraf/SlangGPT |
+ | Model on HF Hub | https://huggingface.co/AdhamAshraf/SlangGPT |
+ | Dataset (Egyptian ↔ MSA) | https://huggingface.co/datasets/AdhamAshraf/egyptian-2-arabic |
+ | Feedback Dataset | https://huggingface.co/datasets/AdhamAshraf/slanggpt-feedback-dataset |
+ | GitHub Repository | https://github.com/adhamashraf7788/SlangGPT |
+ | Full Report | https://github.com/adhamashraf7788/SlangGPT/blob/main/SlangGPT_report.pdf |
+
+ ---
+
+ ## 🙏 Acknowledgements
+
+ - AraGPT-2 by Antoun et al. (2021)
+ - Stanford CS224N educational framework
+ - The Arabic NLP open-source community
+ - All users who provide feedback to improve SlangGPT
+
+ ---
+
+ ## ⭐ Support the Project
+
+ If you find SlangGPT useful:
+
+ - ⭐ Star the GitHub repository
+ - 🤝 Contribute improvements
+ - 📝 Submit feedback
+ - 📢 Share the project
+
+ ---
+
+ ## 📜 License
+
+ This Space, model, and datasets are released under the **MIT License**.
+
+ Free for academic and commercial use with attribution.
+
+ ---
+
+ ## 🚀 Enjoy Translating!
+
+ Thank you for using SlangGPT and helping improve Arabic NLP research.