title: SlangGPT
colorFrom: green
colorTo: yellow
sdk: gradio
sdk_version: 6.14.0
python_version: '3.10'
app_file: app.py
pinned: false
license: mit
short_description: Egyptian Arabic slang → Modern Standard Arabic translation
SlangGPT: Egyptian Arabic → Modern Standard Arabic
Real-time Egyptian Arabic slang translation powered by AraGPT-2.
About the Project
SlangGPT is a fine-tuned AraGPT-2 model designed to translate Egyptian Arabic slang/dialect into Modern Standard Arabic (MSA).
The project also includes:
- A translation verification (detection) model
- Human feedback collection
- Public research datasets
- RLHF-ready feedback pipeline
Type an Egyptian Arabic sentence below and get the MSA translation instantly!
Features
- Egyptian Arabic slang understanding
- Translation into Modern Standard Arabic (MSA)
- Fine-tuned AraGPT-2 language model
- Translation verification / detection model
- Human feedback collection pipeline
- Public feedback dataset for research
- Interactive Gradio interface
Example Inputs
Try these examples:
- عامل ايه؟ (How are you?)
- إيه الأخبار؟ (What's the news?)
- هو انت رايح فين؟ (Where are you going?)
- عايز أروح البيت (I want to go home)
- أنا زهقان جدًا (I am very bored)
- الدنيا حر النهاردة (It is hot today)
Full Project Report
For all technical details (architecture, training, hyperparameters, evaluation, error analysis, and comparison with Stanford CS224N baselines), read the full report:
https://github.com/adhamashraf7788/SlangGPT/blob/main/SlangGPT_report.pdf
How It Works
The model expects the following prompt format:
dialect: {your sentence} → msa:
The model then autoregressively generates the corresponding MSA translation.
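A minimal sketch of that flow, assuming the checkpoint listed under Resources (`AdhamAshraf/SlangGPT`) loads with the standard `transformers` causal-LM classes. The separator between the sentence and `msa:` is passed in as a parameter because it should be copied exactly from the prompt format above. The helper names, the `max_new_tokens=64` cap, and the "keep the first line" stopping rule are illustrative assumptions, not taken from the Space's `app.py`.

```python
def build_prompt(sentence: str, separator: str) -> str:
    """Wrap a dialect sentence in the prompt format described above."""
    cleaned = " ".join(sentence.split())  # collapse stray whitespace and newlines
    return f"dialect: {cleaned} {separator} msa:"


def first_line(continuation: str) -> str:
    """Keep only the first generated line (an assumed stopping rule)."""
    stripped = continuation.strip()
    return stripped.splitlines()[0].strip() if stripped else ""


def translate(sentence, separator, model_id="AdhamAshraf/SlangGPT",
              max_new_tokens=64, **generate_kwargs):
    """Generate an MSA continuation for one sentence (hypothetical helper)."""
    # Imported lazily so the two helpers above work without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Loading on every call keeps the sketch short; a real app would cache these.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    inputs = tokenizer(build_prompt(sentence, separator), return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,        # assumed length cap
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 style models have no pad token
        **generate_kwargs,                    # e.g. the decoding settings of the Space
    )
    # generate() returns prompt + continuation, so drop the prompt tokens.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return first_line(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

Only `translate` needs `transformers` and network access; `build_prompt` and `first_line` are plain string helpers that can be tested on their own.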
Decoding Strategy
- Temperature = 0.7
- Top-k = 50
- Top-p = 0.92
- Repetition penalty = 1.3
These settings improve fluency while reducing repetitive outputs.
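To make the four settings concrete, here is a small pure-Python sketch of how they typically reshape one step of next-token sampling: the repetition penalty first, then temperature, top-k, and top-p. It follows the usual order used by common generation libraries but is an illustration only, not the Space's code; the function names are hypothetical and the toy logits in the usage note are made up.

```python
import math
import random


def filter_logits(logits, generated_ids, temperature=0.7, top_k=50,
                  top_p=0.92, repetition_penalty=1.3):
    """Return {token_id: probability} after applying the four settings.

    Assumes `logits` is a non-empty list indexed by token id.
    """
    scores = list(logits)
    # 1. Repetition penalty: push already-generated tokens toward lower scores.
    for tok in set(generated_ids):
        s = scores[tok]
        scores[tok] = s / repetition_penalty if s > 0 else s * repetition_penalty
    # 2. Temperature below 1 sharpens the distribution.
    scores = [s / temperature for s in scores]
    # 3. Top-k: keep only the k highest-scoring tokens.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Softmax over the survivors (shifted by the max for numerical stability).
    peak = scores[ranked[0]]
    weights = [math.exp(scores[i] - peak) for i in ranked]
    total = sum(weights)
    # 4. Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, w in zip(ranked, weights):
        p = w / total
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    norm = sum(p for _, p in kept)
    return {tok: p / norm for tok, p in kept}


def sample_next_token(logits, generated_ids, rng=random, **settings):
    """Draw one token id from the filtered distribution."""
    dist = filter_logits(logits, generated_ids, **settings)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]
```

For example, with toy logits `[2.0, 1.0, 0.0, -1.0]`, calling `filter_logits(logits, generated_ids=[0])` gives token 0 a lower probability than `filter_logits(logits, generated_ids=[])`, which is the repetition penalty at work.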
Feedback System
After each translation, users can provide feedback:
- Is the translation correct? (Yes / No)
- Provide a corrected MSA translation (optional)
- Rate translation quality (1-5)
All feedback is stored in the public dataset:
https://huggingface.co/datasets/AdhamAshraf/slanggpt-feedback-dataset
Collected feedback will help improve future SlangGPT versions and support Arabic RLHF research.
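One way to picture a single feedback entry is as a small validated record. The sketch below is an assumption for illustration: the field names are hypothetical and may not match the columns of the published dataset, and it treats the rating as required even though the Space may allow it to be skipped.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class FeedbackRecord:
    """One user judgement about one translation (hypothetical schema)."""
    dialect_input: str                    # sentence the user typed
    model_output: str                     # MSA translation shown to the user
    is_correct: bool                      # answer to the Yes / No question
    rating: int                           # quality score from 1 to 5
    corrected_msa: Optional[str] = None   # optional user-supplied correction

    def __post_init__(self):
        if not 1 <= self.rating <= 5:
            raise ValueError("rating must be between 1 and 5")
        if self.corrected_msa is not None:
            # Treat a blank correction box as "no correction given".
            self.corrected_msa = self.corrected_msa.strip() or None


def to_row(record: FeedbackRecord) -> dict:
    """Flatten a record into a plain dict, e.g. for appending to a table."""
    return asdict(record)
```

Validating at construction time keeps out-of-range ratings and empty corrections out of the stored rows, which matters if the data is later reused for training.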
Model Performance
Generation Quality
| Metric | Zero-shot (Base AraGPT-2) | SlangGPT |
|---|---|---|
| chrF | 10.62 | 29.08 |
| BLEU | 0.02 | 6.63 |
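For readers unfamiliar with chrF, the sketch below shows the idea behind the metric: an F-score over character n-grams that weights recall more heavily than precision. It is a simplified illustration, assuming n-gram orders 1 to 6, beta = 2, and whitespace removed. Standard tools differ in details such as how the orders are averaged, so it will not reproduce the numbers in the table.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, ignoring whitespace (an assumption of this sketch)."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF on a 0-100 scale."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # one of the strings is shorter than n characters
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta: beta = 2 makes recall count more than precision.
    return 100 * (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

Because it works on characters rather than whole words, a score of this kind gives partial credit for near-miss word forms, which is one reason it is often reported next to BLEU for morphologically rich languages such as Arabic.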
Detection Model
| Task | Accuracy |
|---|---|
| Translation Verification | 0.956 |
Improvements
- chrF improvement: +18.46
- Detection accuracy improvement: +45.6 points
Research Contributions
This project contributes:
- A fine-tuned Egyptian Arabic → MSA generation model
- A translation verification classifier
- A public human-feedback dataset
- An RLHF-ready Arabic NLP pipeline
The project aims to support future Arabic dialect NLP research and low-resource language modeling.
Limitations
The model may struggle with:
- Rare slang expressions
- Mixed Arabic-English text
- Heavy sarcasm or idioms
- Long conversational context
Translations are generated probabilistically and may occasionally contain inaccuracies.
Feedback & Privacy
Submitted feedback is stored publicly in the research feedback dataset.
Please avoid submitting:
- Personal information
- Phone numbers
- Addresses
- Sensitive/private content
Future Work
Planned improvements include:
- Larger instruction-tuned Arabic models
- RLHF fine-tuning using collected feedback
- Better dialect generalization
- Arabic-English code-switching support
- Faster inference optimization
Technical Stack
- Transformers
- PyTorch
- Gradio
- Hugging Face Spaces
- AraGPT-2
- pandas
- scikit-learn
Resources
| Resource | Link |
|---|---|
| Live Space | https://huggingface.co/spaces/AdhamAshraf/SlangGPT |
| Model on HF Hub | https://huggingface.co/AdhamAshraf/SlangGPT |
| Dataset (Egyptian → MSA) | https://huggingface.co/datasets/AdhamAshraf/egyptian-2-arabic |
| Feedback Dataset | https://huggingface.co/datasets/AdhamAshraf/slanggpt-feedback-dataset |
| GitHub Repository | https://github.com/adhamashraf7788/SlangGPT |
| Full Report | https://github.com/adhamashraf7788/SlangGPT/blob/main/SlangGPT_report.pdf |
Acknowledgements
- AraGPT-2 by Antoun et al. (2021)
- Stanford CS224N educational framework
- The Arabic NLP open-source community
- All users who provide feedback to improve SlangGPT
Support the Project
If you find SlangGPT useful:
- Star the GitHub repository
- Contribute improvements
- Submit feedback
- Share the project
License
This Space, model, and datasets are released under the MIT License.
Free for academic and commercial use with attribution.
Enjoy Translating!
Thank you for using SlangGPT and helping improve Arabic NLP research.