---
title: SlangGPT
emoji: 🌍
colorFrom: green
colorTo: yellow
sdk: gradio
sdk_version: 6.14.0
python_version: '3.10'
app_file: app.py
pinned: false
license: mit
short_description: Egyptian Arabic slang → Modern Standard Arabic translation
---

# SlangGPT – Egyptian Arabic → Modern Standard Arabic

> ⚡ Real-time Egyptian Arabic slang translation powered by AraGPT-2.

[![GitHub Repository](https://img.shields.io/badge/GitHub-SlangGPT-181717?logo=github)](https://github.com/adhamashraf7788/SlangGPT)
[![🤗 Model](https://img.shields.io/badge/🤗%20Model-SlangGPT-blue)](https://huggingface.co/AdhamAshraf/SlangGPT)
[![🤗 Dataset](https://img.shields.io/badge/🤗%20Dataset-Egyptian%20Arabic%20↔%20MSA-orange)](https://huggingface.co/datasets/AdhamAshraf/egyptian-2-arabic)
[![🤗 Spaces](https://img.shields.io/badge/🤗%20Spaces-Live%20Demo-yellow)](https://huggingface.co/spaces/AdhamAshraf/SlangGPT)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)

---

## 🧠 About the Project

**SlangGPT** is a fine-tuned **AraGPT-2** model designed to translate **Egyptian Arabic slang/dialect** into **Modern Standard Arabic (MSA)**.

The project also includes:

- ✅ A translation verification (detection) model
- ⭐ Human feedback collection
- 📊 Public research datasets
- 🤖 RLHF-ready feedback pipeline

👉 **Type an Egyptian Arabic sentence below and get the MSA translation instantly!**

---

## ✨ Features

- 🇪🇬 Egyptian Arabic slang understanding
- 📘 Translation into Modern Standard Arabic (MSA)
- 🤖 Fine-tuned AraGPT-2 language model
- 🧠 Translation verification / detection model
- ⭐ Human feedback collection pipeline
- 📊 Public feedback dataset for research
- 🌐 Interactive Gradio interface

---

## 💡 Example Inputs

Try these examples:

- `عامل ايه؟`
- `إيه الأخبار؟`
- `هو انت رايح فين؟`
- `عايز أروح البيت`
- `أنا زهقان جدًا`
- `الدنيا حر النهاردة`

---

## 📄 Full Project Report

For all technical details (architecture, training, hyperparameters, evaluation, error analysis, and comparison with Stanford CS224N baselines), read the full report:

📄 https://github.com/adhamashraf7788/SlangGPT/blob/main/SlangGPT_report.pdf

---

## 🧠 How It Works

The model expects the following prompt format:

```text
dialect: {your sentence} ↔ msa:
```

The model then autoregressively generates the corresponding MSA translation.

### Decoding Strategy

- Temperature = 0.7
- Top-k = 50
- Top-p = 0.92
- Repetition penalty = 1.3

These settings improve fluency while reducing repetitive outputs.

---

## 📝 Feedback System

After each translation, users can provide feedback:

1. ✅ Is the translation correct? (Yes / No)
2. ✍️ Provide a corrected MSA translation (optional)
3. ⭐ Rate translation quality (1–5)

All feedback is stored in the public dataset:

🔗 https://huggingface.co/datasets/AdhamAshraf/slanggpt-feedback-dataset

Collected feedback will help improve future SlangGPT versions and support Arabic RLHF research.

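For readers who want to call the model outside the Space, the prompt format and decoding settings listed under "How It Works" can be sketched in Python roughly as follows. This is a minimal sketch, not the Space's actual `app.py`: it assumes the published checkpoint loads through the standard `transformers` auto classes, and `do_sample=True` and `max_new_tokens=64` are assumptions that this README does not state. The helper names `build_prompt`, `first_line`, and `translate` are hypothetical.

```python
"""Sketch: build a SlangGPT prompt and decode with the documented sampling settings."""

# Prompt format taken from the "How It Works" section.
PROMPT_TEMPLATE = "dialect: {sentence} ↔ msa:"

# Sampling values taken from the "Decoding Strategy" list.
GENERATION_KWARGS = {
    "do_sample": True,        # assumption: needed for temperature/top-k/top-p to apply
    "temperature": 0.7,
    "top_k": 50,
    "top_p": 0.92,
    "repetition_penalty": 1.3,
    "max_new_tokens": 64,     # assumption: output length is not specified in this README
}


def build_prompt(sentence: str) -> str:
    """Wrap an Egyptian Arabic sentence in the documented prompt format."""
    return PROMPT_TEMPLATE.format(sentence=sentence.strip())


def first_line(text: str) -> str:
    """Keep only the first non-empty line of the generated continuation."""
    stripped = text.strip()
    if not stripped:
        return ""
    return stripped.splitlines()[0].strip()


def translate(sentence: str, model_id: str = "AdhamAshraf/SlangGPT") -> str:
    """Generate an MSA translation (assumes the checkpoint is a causal LM)."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    inputs = tokenizer(build_prompt(sentence), return_tensors="pt")
    output_ids = model.generate(**inputs, **GENERATION_KWARGS)

    # Decode only the newly generated tokens, skipping the echoed prompt.
    prompt_length = inputs["input_ids"].shape[1]
    continuation = tokenizer.decode(
        output_ids[0][prompt_length:], skip_special_tokens=True
    )
    return first_line(continuation)
```

Calling `translate("عامل ايه؟")` would download the checkpoint and return a sampled translation. Because decoding is stochastic, the output can differ between runs.
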

---

## 📊 Model Performance

### Generation Quality

| Metric | Zero-shot (Base AraGPT-2) | SlangGPT |
|---|---|---|
| chrF | 10.62 | **29.08** |
| BLEU | 0.02 | **6.63** |

### Detection Model

| Task | Accuracy |
|---|---|
| Translation Verification | **0.956** |

### Improvements

- 📈 chrF improvement: **+18.46 points** (10.62 → 29.08)
- 📈 Detection accuracy improvement: **+45.6 points** (this implies a 50.0% baseline, which the table above does not list)

---

## 🔬 Research Contributions

This project contributes:

- A fine-tuned Egyptian Arabic → MSA generation model
- A translation verification classifier
- A public human-feedback dataset
- An RLHF-ready Arabic NLP pipeline

The project aims to support future Arabic dialect NLP research and low-resource language modeling.

---

## ⚠️ Limitations

The model may struggle with:

- Rare slang expressions
- Mixed Arabic-English text
- Heavy sarcasm or idioms
- Long conversational context

Translations are generated by sampling, so output can vary between runs and may occasionally be inaccurate.

---

## 🔐 Feedback & Privacy

Submitted feedback may be stored publicly in the research feedback dataset.
Please avoid submitting:

- Personal information
- Phone numbers
- Addresses
- Sensitive/private content

---

## 🚀 Future Work

Planned future improvements include:

- Larger instruction-tuned Arabic models
- RLHF fine-tuning using collected feedback
- Better dialect generalization
- Arabic-English code-switching support
- Faster inference optimization

---

## 🏗️ Technical Stack

- Transformers 🤗
- PyTorch
- Gradio
- Hugging Face Spaces
- AraGPT-2
- pandas
- scikit-learn

---

## 📚 Resources

| Resource | Link |
|---|---|
| Live Space | https://huggingface.co/spaces/AdhamAshraf/SlangGPT |
| Model on HF Hub | https://huggingface.co/AdhamAshraf/SlangGPT |
| Dataset (Egyptian ↔ MSA) | https://huggingface.co/datasets/AdhamAshraf/egyptian-2-arabic |
| Feedback Dataset | https://huggingface.co/datasets/AdhamAshraf/slanggpt-feedback-dataset |
| GitHub Repository | https://github.com/adhamashraf7788/SlangGPT |
| Full Report | https://github.com/adhamashraf7788/SlangGPT/blob/main/SlangGPT_report.pdf |

---

## 🙏 Acknowledgements

- AraGPT-2 by Antoun et al. (2021)
- Stanford CS224N educational framework
- The Arabic NLP open-source community
- All users who provide feedback to improve SlangGPT

---

## ⭐ Support the Project

If you find SlangGPT useful:

- ⭐ Star the GitHub repository
- 🤝 Contribute improvements
- 📝 Submit feedback
- 📢 Share the project

---

## 📜 License

This Space, model, and datasets are released under the **MIT License**. Free for academic and commercial use with attribution.

---

## 🚀 Enjoy Translating!

Thank you for using SlangGPT and helping improve Arabic NLP research.