---
title: SlangGPT
emoji: 🌍
colorFrom: green
colorTo: yellow
sdk: gradio
sdk_version: 6.14.0
python_version: '3.10'
app_file: app.py
pinned: false
license: mit
short_description: Egyptian Arabic slang to Modern Standard Arabic translation
---
# SlangGPT – Egyptian Arabic → Modern Standard Arabic
> ⚡ Real-time Egyptian Arabic slang translation powered by AraGPT-2.
[![GitHub Repository](https://img.shields.io/badge/GitHub-SlangGPT-181717?logo=github)](https://github.com/adhamashraf7788/SlangGPT)
[![🤗 Model](https://img.shields.io/badge/🤗%20Model-SlangGPT-blue)](https://huggingface.co/AdhamAshraf/SlangGPT)
[![🤗 Dataset](https://img.shields.io/badge/🤗%20Dataset-Egyptian%20Arabic%20↔%20MSA-orange)](https://huggingface.co/datasets/AdhamAshraf/egyptian-2-arabic)
[![🤗 Spaces](https://img.shields.io/badge/🤗%20Spaces-Live%20Demo-yellow)](https://huggingface.co/spaces/AdhamAshraf/SlangGPT)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
---
## 🧠 About the Project
**SlangGPT** is a fine-tuned **AraGPT-2** model designed to translate **Egyptian Arabic slang/dialect** into **Modern Standard Arabic (MSA)**.
The project also includes:
- ✅ A translation verification (detection) model
- ⭐ Human feedback collection
- 📊 Public research datasets
- 🤖 RLHF-ready feedback pipeline
👉 **Type an Egyptian Arabic sentence below and get the MSA translation instantly!**
---
## ✨ Features
- 🇪🇬 Egyptian Arabic slang understanding
- 📘 Translation into Modern Standard Arabic (MSA)
- 🤖 Fine-tuned AraGPT-2 language model
- 🧠 Translation verification / detection model
- ⭐ Human feedback collection pipeline
- 📊 Public feedback dataset for research
- 🌐 Interactive Gradio interface
---
## 💡 Example Inputs
Try these examples:
- `عامل ايه؟`
- `إيه الأخبار؟`
- `هو انت رايح فين؟`
- `عايز أروح البيت`
- `أنا زهقان جدًا`
- `الدنيا حر النهاردة`
---
## 📄 Full Project Report
For full technical details (architecture, training, hyperparameters, evaluation, error analysis, and comparison with Stanford CS224N baselines), read the full report:
📄 https://github.com/adhamashraf7788/SlangGPT/blob/main/SlangGPT_report.pdf
---
## 🧠 How It Works
The model expects the following prompt format:
```text
dialect: {your sentence} ↔ msa:
```
The model then autoregressively generates the corresponding MSA translation.
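
As a minimal sketch of the prompt format above, the helpers below build the prompt and pull the translation out of the generated text. The function names are hypothetical, and the assumptions that the model echoes the prompt and that the translation ends at the first newline are illustrative rather than taken from the app's code.

```python
# Prompt template copied from the format shown above.
PROMPT_TEMPLATE = "dialect: {sentence} ↔ msa:"


def build_prompt(sentence: str) -> str:
    """Wrap an Egyptian Arabic sentence in the prompt format the model expects."""
    return PROMPT_TEMPLATE.format(sentence=sentence.strip())


def extract_translation(generated_text: str, prompt: str) -> str:
    """Return only the MSA continuation from the decoded model output.

    Assumption: a decoder-only model returns the prompt followed by the
    continuation, and anything after the first newline is not part of
    the translation.
    """
    if generated_text.startswith(prompt):
        completion = generated_text[len(prompt):]
    else:
        completion = generated_text
    return completion.strip().split("\n")[0].strip()


prompt = build_prompt("عامل ايه؟")
print(prompt)  # dialect: عامل ايه؟ ↔ msa:
```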
### Decoding Strategy
- Temperature = 0.7
- Top-k = 50
- Top-p = 0.92
- Repetition penalty = 1.3
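Temperature, top-k, and top-p sampling keep the output varied and fluent, while the repetition penalty discourages the model from repeating tokens it has already generated.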
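
The settings above map onto the `generate` arguments of 🤗 Transformers roughly as sketched below. This is an illustration, not the Space's actual `app.py`: the helper names, `max_new_tokens=64`, and loading the Hub checkpoint through the `Auto*` classes are assumptions. Sampling has to be switched on with `do_sample=True`, otherwise temperature, top-k, and top-p have no effect.

```python
# Decoding settings listed above, expressed as keyword arguments for generate().
GENERATION_KWARGS = {
    "do_sample": True,          # required for temperature / top-k / top-p to apply
    "temperature": 0.7,
    "top_k": 50,
    "top_p": 0.92,
    "repetition_penalty": 1.3,
    "max_new_tokens": 64,       # assumption: not stated in this README
}


def load_model(repo_id: str = "AdhamAshraf/SlangGPT"):
    """Load model and tokenizer from the Hub (assumes a causal-LM checkpoint)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained(repo_id)
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    return model, tokenizer


def translate(sentence: str, model, tokenizer) -> str:
    """Generate an MSA translation for one Egyptian Arabic sentence."""
    prompt = f"dialect: {sentence.strip()} ↔ msa:"
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 style tokenizers define no pad token
        **GENERATION_KWARGS,
    )
    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()


# Usage (downloads the checkpoint on first run):
# model, tokenizer = load_model()
# print(translate("عايز أروح البيت", model, tokenizer))
```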
---
## 📝 Feedback System
After each translation, users can provide feedback:
1. ✅ Is the translation correct? (Yes / No)
2. ✍️ Provide a corrected MSA translation (optional)
3. ⭐ Rate translation quality (1–5)
All feedback is stored in the public dataset:
🔗 https://huggingface.co/datasets/AdhamAshraf/slanggpt-feedback-dataset
Collected feedback will help improve future SlangGPT versions and support Arabic RLHF research.
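
One way to picture a single feedback entry is the record below. The field names are hypothetical and may not match the columns of the published feedback dataset; the sketch only mirrors the three questions listed above and enforces the 1–5 rating range.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class FeedbackRecord:
    """One feedback submission (hypothetical schema, for illustration only)."""

    source_text: str                 # Egyptian Arabic input
    model_translation: str           # MSA output shown to the user
    is_correct: bool                 # answer to "Is the translation correct?"
    rating: int                      # quality rating, 1 (worst) to 5 (best)
    corrected_translation: Optional[str] = None  # optional user correction

    def __post_init__(self) -> None:
        if not 1 <= self.rating <= 5:
            raise ValueError("rating must be between 1 and 5")
        # Treat an empty or whitespace-only correction as "not provided".
        if self.corrected_translation is not None:
            self.corrected_translation = self.corrected_translation.strip() or None


record = FeedbackRecord(
    source_text="عامل ايه؟",
    model_translation="كيف حالك؟",
    is_correct=True,
    rating=5,
)
row = asdict(record)  # plain dict, ready to append to a table of feedback rows
```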
---
## 📊 Model Performance
### Generation Quality
| Metric | Zero-shot (Base AraGPT-2) | SlangGPT |
|---|---|---|
| chrF | 10.62 | **29.08** |
| BLEU | 0.02 | **6.63** |
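
For readers unfamiliar with chrF, the sketch below shows the idea behind the metric: character n-gram precision and recall, averaged over n-gram orders and combined into an F-score that weights recall more heavily (β = 2). It is a simplified sentence-level illustration with a hypothetical function name, not the implementation used to produce the scores above; toolkits such as sacreBLEU handle details like whitespace and corpus-level aggregation differently, so numbers will not match exactly.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing spaces."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF on a 0-100 scale."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # one side is too short to contain n-grams of this order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return 100 * (1 + beta**2) * p * r / (beta**2 * p + r)


print(simple_chrf("كيف حالك؟", "كيف حالك؟"))  # 100.0 for an exact match
```

Because it works on characters rather than whole words, chrF gives partial credit when a translation gets a word's stem right but its affixes wrong, which is one reason it is often reported next to BLEU for morphologically rich languages such as Arabic.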
### Detection Model
| Task | Accuracy |
|---|---|
| Translation Verification | **0.956** |
### Improvements
- 📈 chrF improvement: **+18.46** (10.62 → 29.08)
- 📈 Detection accuracy improvement: **+45.6 points** (from an implied 50.0% baseline to 95.6%)
---
## 🔬 Research Contributions
This project contributes:
- A fine-tuned Egyptian Arabic → MSA generation model
- A translation verification classifier
- A public human-feedback dataset
- An RLHF-ready Arabic NLP pipeline
The project aims to support future Arabic dialect NLP research and low-resource language modeling.
---
## ⚠️ Limitations
The model may struggle with:
- Rare slang expressions
- Mixed Arabic-English text
- Heavy sarcasm or idioms
- Long conversational context
Because decoding uses sampling, the same input can produce different translations across runs, and outputs may contain inaccuracies.
---
## 🔐 Feedback & Privacy
Submitted feedback may be stored publicly in the research feedback dataset.
Please avoid submitting:
- Personal information
- Phone numbers
- Addresses
- Sensitive/private content
---
## 🚀 Future Work
Planned improvements include:
- Larger instruction-tuned Arabic models
- RLHF fine-tuning using collected feedback
- Better dialect generalization
- Arabic-English code-switching support
- Faster inference optimization
---
## 🏗️ Technical Stack
- Transformers 🤗
- PyTorch
- Gradio
- Hugging Face Spaces
- AraGPT-2
- pandas
- scikit-learn
---
## 📚 Resources
| Resource | Link |
|---|---|
| Live Space | https://huggingface.co/spaces/AdhamAshraf/SlangGPT |
| Model on HF Hub | https://huggingface.co/AdhamAshraf/SlangGPT |
| Dataset (Egyptian ↔ MSA) | https://huggingface.co/datasets/AdhamAshraf/egyptian-2-arabic |
| Feedback Dataset | https://huggingface.co/datasets/AdhamAshraf/slanggpt-feedback-dataset |
| GitHub Repository | https://github.com/adhamashraf7788/SlangGPT |
| Full Report | https://github.com/adhamashraf7788/SlangGPT/blob/main/SlangGPT_report.pdf |
---
## 🙏 Acknowledgements
- AraGPT-2 by Antoun et al. (2021)
- Stanford CS224N educational framework
- The Arabic NLP open-source community
- All users who provide feedback to improve SlangGPT
---
## ⭐ Support the Project
If you find SlangGPT useful:
- ⭐ Star the GitHub repository
- 🤝 Contribute improvements
- 📝 Submit feedback
- 📢 Share the project
---
## 📜 License
This Space, model, and datasets are released under the **MIT License**.
Free for academic and commercial use with attribution.
---
## 🚀 Enjoy Translating!
Thank you for using SlangGPT and helping improve Arabic NLP research.