---
title: SlangGPT
emoji: 🌍
colorFrom: green
colorTo: yellow
sdk: gradio
sdk_version: 6.14.0
python_version: '3.10'
app_file: app.py
pinned: false
license: mit
short_description: Egyptian Arabic slang → Modern Standard Arabic translation
---

# SlangGPT – Egyptian Arabic → Modern Standard Arabic

> ⚡ Real-time Egyptian Arabic slang translation powered by AraGPT-2.

[GitHub](https://github.com/adhamashraf7788/SlangGPT) · [Model](https://huggingface.co/AdhamAshraf/SlangGPT) · [Dataset](https://huggingface.co/datasets/AdhamAshraf/egyptian-2-arabic) · [Space](https://huggingface.co/spaces/AdhamAshraf/SlangGPT) · [License: MIT](https://opensource.org/licenses/MIT)

---

## 🧠 About the Project

**SlangGPT** is a fine-tuned **AraGPT-2** model designed to translate **Egyptian Arabic slang/dialect** into **Modern Standard Arabic (MSA)**.

The project also includes:

- ✅ A translation verification (detection) model
- ⭐ Human feedback collection
- 📊 Public research datasets
- 🤖 RLHF-ready feedback pipeline

👉 **Type an Egyptian Arabic sentence below and get the MSA translation instantly!**

---

## ✨ Features

- 🇪🇬 Egyptian Arabic slang understanding
- 📘 Translation into Modern Standard Arabic (MSA)
- 🤖 Fine-tuned AraGPT-2 language model
- 🧠 Translation verification / detection model
- ⭐ Human feedback collection pipeline
- 📊 Public feedback dataset for research
- 🌐 Interactive Gradio interface

---

## 💡 Example Inputs

Try these examples (English glosses in parentheses):

- `عامل ايه؟` (How are you?)
- `إيه الأخبار؟` (What's new?)
- `هو انت رايح فين؟` (Where are you going?)
- `عايز أروح البيت` (I want to go home)
- `أنا زهقان جدًا` (I'm very bored)
- `الدنيا حر النهاردة` (It's hot today)

---

## 📄 Full Project Report

For full technical details (architecture, training, hyperparameters, evaluation, error analysis, and comparison with Stanford CS224N baselines), read the full report:

📄 https://github.com/adhamashraf7788/SlangGPT/blob/main/SlangGPT_report.pdf

---

## 🧠 How It Works

The model expects the following prompt format:

```text
dialect: {your sentence} ↔ msa:
```

The model then autoregressively generates the corresponding MSA translation.
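
To illustrate the prompt format above, here is a minimal sketch that builds the prompt and pulls the MSA part out of decoded text. The helper names `build_prompt` and `extract_translation` are hypothetical and not taken from the project's `app.py`. The sketch also assumes that the decoded output echoes the prompt before the translation, as is typical for GPT-2-style causal models.

```python
PROMPT_TEMPLATE = "dialect: {sentence} ↔ msa:"
MSA_MARKER = "msa:"


def build_prompt(sentence: str) -> str:
    """Wrap an Egyptian Arabic sentence in the prompt format shown above."""
    return PROMPT_TEMPLATE.format(sentence=sentence.strip())


def extract_translation(generated_text: str) -> str:
    """Return the first line that follows the first 'msa:' marker.

    Assumption: the decoded text is the prompt followed by the continuation.
    Returns an empty string when the marker is missing.
    """
    _, _, continuation = generated_text.partition(MSA_MARKER)
    lines = continuation.strip().splitlines()
    return lines[0].strip() if lines else ""
```

Cutting at the first line break is a design choice for this sketch: it guards against the model continuing with further `dialect: … ↔ msa:` pairs after the first translation.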

### Decoding Strategy

- Temperature = 0.7
- Top-k = 50
- Top-p = 0.92
- Repetition penalty = 1.3

Sampling with a moderate temperature and top-k/top-p truncation keeps the output fluent, while the repetition penalty discourages repeated tokens.
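
The decoding settings listed above map onto keyword arguments of the Hugging Face `generate` method. The following is a sketch under stated assumptions, not the Space's actual code: the function names are hypothetical, `max_new_tokens` is not given in this README, and it is assumed that the Hub repository `AdhamAshraf/SlangGPT` loads with the `Auto*` classes.

```python
from typing import Any, Dict

MODEL_ID = "AdhamAshraf/SlangGPT"  # taken from the Hub URL listed in this README

GENERATION_KWARGS: Dict[str, Any] = {
    "do_sample": True,          # sampling must be enabled for temperature/top-k/top-p to apply
    "temperature": 0.7,
    "top_k": 50,
    "top_p": 0.92,
    "repetition_penalty": 1.3,
    "max_new_tokens": 64,       # assumption: value not stated in this README
}


def load_model(model_id: str = MODEL_ID):
    """Load tokenizer and model from the Hub (needs `transformers` and `torch`)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    model.eval()
    return tokenizer, model


def translate(sentence: str, tokenizer, model) -> str:
    """Sample an MSA continuation for one dialect sentence."""
    prompt = f"dialect: {sentence.strip()} ↔ msa:"
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, **GENERATION_KWARGS)
    # Decode only the newly generated tokens, skipping the echoed prompt.
    prompt_length = inputs["input_ids"].shape[1]
    new_tokens = output_ids[0][prompt_length:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```

A typical call would be `tokenizer, model = load_model()` followed by `translate("عامل ايه؟", tokenizer, model)`. Because sampling is enabled, repeated calls can return different translations.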

---

## 📝 Feedback System

After each translation, users can provide feedback:

1. ✅ Is the translation correct? (Yes / No)
2. ✍️ Provide a corrected MSA translation (optional)
3. ⭐ Rate translation quality (1–5)

All feedback is stored in the public dataset:

🔗 https://huggingface.co/datasets/AdhamAshraf/slanggpt-feedback-dataset

Collected feedback will help improve future SlangGPT versions and support Arabic RLHF research.
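
The three feedback questions above can be modelled as one validated record per translation. This is a minimal sketch only: the class name and field names are hypothetical, and the real schema of the feedback dataset may differ.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class FeedbackRecord:
    """One piece of user feedback (hypothetical schema, for illustration)."""

    dialect_input: str                   # sentence the user typed
    model_output: str                    # MSA translation shown to the user
    is_correct: bool                     # question 1: Yes / No
    rating: int                          # question 3: quality from 1 to 5
    corrected_msa: Optional[str] = None  # question 2: optional correction

    def __post_init__(self) -> None:
        if not self.dialect_input.strip():
            raise ValueError("dialect_input must not be empty")
        is_int = isinstance(self.rating, int) and not isinstance(self.rating, bool)
        if not is_int or not 1 <= self.rating <= 5:
            raise ValueError("rating must be an integer from 1 to 5")
        # Treat a blank correction as "no correction provided".
        if self.corrected_msa is not None and not self.corrected_msa.strip():
            self.corrected_msa = None

    def to_row(self) -> dict:
        """Return a plain dict, e.g. for appending to a table of feedback rows."""
        return asdict(self)
```

Validating at construction time keeps out-of-range ratings and empty inputs from reaching the stored dataset.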

---

## 📊 Model Performance

### Generation Quality

| Metric | Zero-shot (Base AraGPT-2) | SlangGPT |
|---|---|---|
| chrF | 10.62 | **29.08** |
| BLEU | 0.02 | **6.63** |

### Detection Model

| Task | Accuracy |
|---|---|
| Translation Verification | **0.956** |

### Improvements

- 📈 chrF improvement: **+18.46**
- 📈 Detection accuracy improvement: **+45.6 points**
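
The chrF improvement follows directly from the generation table. The sketch below recomputes it, derives the BLEU delta from the same table, and shows that the reported +45.6 points would correspond to a 50% detection baseline. That baseline is an assumption (chance level on a balanced binary task) and is not stated in this README; the full report is the place to check the actual reference point.

```python
# Scores copied from the tables above.
CHRF_BASE, CHRF_TUNED = 10.62, 29.08
BLEU_BASE, BLEU_TUNED = 0.02, 6.63
DETECTION_ACCURACY = 0.956

# Assumption, not from the README: chance-level baseline for a binary classifier.
ASSUMED_DETECTION_BASELINE = 0.5


def delta(tuned: float, base: float, digits: int = 2) -> float:
    """Absolute improvement of the fine-tuned score over the baseline."""
    return round(tuned - base, digits)


chrf_gain = delta(CHRF_TUNED, CHRF_BASE)
bleu_gain = delta(BLEU_TUNED, BLEU_BASE)
detection_gain_points = round(
    (DETECTION_ACCURACY - ASSUMED_DETECTION_BASELINE) * 100, 1
)

print(f"chrF: +{chrf_gain}")
print(f"BLEU: +{bleu_gain}")
print(f"Detection: +{detection_gain_points} points (under the assumed baseline)")
```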

---

## 🔬 Research Contributions

This project contributes:

- A fine-tuned Egyptian Arabic → MSA generation model
- A translation verification classifier
- A public human-feedback dataset
- An RLHF-ready Arabic NLP pipeline

The project aims to support future Arabic dialect NLP research and low-resource language modeling.

---

## ⚠️ Limitations

The model may struggle with:

- Rare slang expressions
- Mixed Arabic-English text
- Heavy sarcasm or idioms
- Long conversational context

Translations are generated probabilistically and may occasionally contain inaccuracies.

---

## 🔐 Feedback & Privacy

Submitted feedback may be stored publicly in the research feedback dataset.

Please avoid submitting:

- Personal information
- Phone numbers
- Addresses
- Sensitive/private content

---

## 🚀 Future Work

Planned improvements include:

- Larger instruction-tuned Arabic models
- RLHF fine-tuning using collected feedback
- Better dialect generalization
- Arabic-English code-switching support
- Faster inference

---

## 🏗️ Technical Stack

- Transformers 🤗
- PyTorch
- Gradio
- Hugging Face Spaces
- AraGPT-2
- pandas
- scikit-learn

---

## 📚 Resources

| Resource | Link |
|---|---|
| Live Space | https://huggingface.co/spaces/AdhamAshraf/SlangGPT |
| Model on HF Hub | https://huggingface.co/AdhamAshraf/SlangGPT |
| Dataset (Egyptian ↔ MSA) | https://huggingface.co/datasets/AdhamAshraf/egyptian-2-arabic |
| Feedback Dataset | https://huggingface.co/datasets/AdhamAshraf/slanggpt-feedback-dataset |
| GitHub Repository | https://github.com/adhamashraf7788/SlangGPT |
| Full Report | https://github.com/adhamashraf7788/SlangGPT/blob/main/SlangGPT_report.pdf |

---

## 🙏 Acknowledgements

- AraGPT-2 by Antoun et al. (2021)
- Stanford CS224N educational framework
- The Arabic NLP open-source community
- All users who provide feedback to improve SlangGPT

---

## ⭐ Support the Project

If you find SlangGPT useful:

- ⭐ Star the GitHub repository
- 🤝 Contribute improvements
- 📝 Submit feedback
- 📢 Share the project

---

## 📜 License

This Space, model, and datasets are released under the **MIT License**.
Free for academic and commercial use with attribution.

---

## 🚀 Enjoy Translating!

Thank you for using SlangGPT and helping improve Arabic NLP research.