title: SlangGPT
emoji: 🌍
colorFrom: green
colorTo: yellow
sdk: gradio
sdk_version: 6.14.0
python_version: '3.10'
app_file: app.py
pinned: false
license: mit
short_description: Egyptian Arabic slang → Modern Standard Arabic translation

SlangGPT – Egyptian Arabic → Modern Standard Arabic

⚡ Real-time Egyptian Arabic slang translation powered by AraGPT-2.

GitHub Repository | 🤗 Model | 🤗 Dataset | 🤗 Spaces | License: MIT


🧠 About the Project

SlangGPT is a fine-tuned AraGPT-2 model designed to translate Egyptian Arabic slang/dialect into Modern Standard Arabic (MSA).

The project also includes:

  • ✅ A translation verification (detection) model
  • ⭐ Human feedback collection
  • 📊 Public research datasets
  • 🤖 RLHF-ready feedback pipeline

👉 Type an Egyptian Arabic sentence below and get the MSA translation instantly!


✨ Features

  • 🇪🇬 Egyptian Arabic slang understanding
  • 📘 Translation into Modern Standard Arabic (MSA)
  • 🤖 Fine-tuned AraGPT-2 language model
  • 🧠 Translation verification / detection model
  • ⭐ Human feedback collection pipeline
  • 📊 Public feedback dataset for research
  • 🌐 Interactive Gradio interface

💡 Example Inputs

Try these examples:

  • عامل ايه؟ ("How are you doing?")
  • إيه الأخبار؟ ("What's the news?")
  • هو انت رايح فين؟ ("Where are you going?")
  • عايز أروح البيت ("I want to go home")
  • أنا زهقان جدًا ("I'm really bored")
  • الدنيا حر النهاردة ("It's hot today")

📄 Full Project Report

For all technical details, including architecture, training, hyperparameters, evaluation, error analysis, and comparison with Stanford CS224N baselines, read the full report:

📄 https://github.com/adhamashraf7788/SlangGPT/blob/main/SlangGPT_report.pdf


🧠 How It Works

The model expects the following prompt format:

dialect: {your sentence} ↔ msa:

The model then autoregressively generates the corresponding MSA translation.
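As a minimal sketch of the prompt format above, the helpers below wrap a sentence in the expected template and pull the translation back out of the generated text. The function names `build_prompt` and `extract_translation` are hypothetical, and the sketch assumes the separator is the `↔` character and that the generated text echoes the prompt before the translation.

```python
# Separator between the dialect sentence and the "msa:" marker,
# assumed from the prompt format shown above.
SEPARATOR = "↔"


def build_prompt(sentence: str) -> str:
    """Wrap an Egyptian Arabic sentence in the prompt template."""
    return f"dialect: {sentence.strip()} {SEPARATOR} msa:"


def extract_translation(generated_text: str) -> str:
    """Return the text that follows the first 'msa:' marker.

    Assumes the generated text starts with the prompt itself. If the
    marker is missing, the whole text is returned stripped.
    """
    _, marker, tail = generated_text.partition("msa:")
    if not marker:
        return generated_text.strip()
    # Keep only the first line, in case generation continues past the
    # translation (an assumption, not documented behavior).
    lines = tail.strip().splitlines()
    return lines[0].strip() if lines else ""
```

For example, `build_prompt("عامل ايه؟")` yields `dialect: عامل ايه؟ ↔ msa:`, which is the string the model would be asked to continue.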

Decoding Strategy

  • Temperature = 0.7
  • Top-k = 50
  • Top-p = 0.92
  • Repetition penalty = 1.3

These settings improve fluency while reducing repetitive outputs.
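The settings above map directly onto keyword arguments of the Hugging Face Transformers `generate` method. The sketch below is one way they might be wired up, not the Space's actual `app.py`. It assumes `do_sample=True` (sampling has to be enabled for temperature, top-k, and top-p to take effect) and a hypothetical `max_new_tokens` limit of 64; neither value is stated in this README. The `model` and `tokenizer` arguments are expected to be a causal language model and its tokenizer, loaded elsewhere.

```python
from typing import Any

# Decoding settings listed above. do_sample=True is an assumption:
# without sampling, temperature/top-k/top-p have no effect.
GENERATION_KWARGS = {
    "do_sample": True,
    "temperature": 0.7,
    "top_k": 50,
    "top_p": 0.92,
    "repetition_penalty": 1.3,
}


def translate(sentence: str, model: Any, tokenizer: Any,
              max_new_tokens: int = 64) -> str:
    """Generate an MSA translation for one Egyptian Arabic sentence.

    max_new_tokens is a hypothetical limit chosen for illustration.
    """
    prompt = f"dialect: {sentence.strip()} ↔ msa:"
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, **GENERATION_KWARGS
    )
    decoded = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    # Keep only what follows the "msa:" marker, i.e. the translation.
    return decoded.partition("msa:")[2].strip()
```

Because sampling is stochastic, repeated calls with the same sentence can return different translations.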


📝 Feedback System

After each translation, users can provide feedback:

  1. ✅ Is the translation correct? (Yes / No)
  2. ✏️ Provide a corrected MSA translation (optional)
  3. ⭐ Rate translation quality (1–5)

All feedback is stored in the public dataset:

🔗 https://huggingface.co/datasets/AdhamAshraf/slanggpt-feedback-dataset
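To illustrate the three feedback fields, here is a minimal sketch of how one feedback entry could be represented and validated before being stored. The class name `FeedbackRecord` and all field names are hypothetical; the actual schema of the feedback dataset is not described in this README.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class FeedbackRecord:
    """One user feedback entry (field names are illustrative only)."""
    dialect_input: str                  # sentence the user typed
    model_output: str                   # translation shown to the user
    is_correct: bool                    # answer to "Is the translation correct?"
    rating: int                         # quality rating, 1 to 5
    corrected_msa: Optional[str] = None  # optional user correction

    def __post_init__(self) -> None:
        if not 1 <= self.rating <= 5:
            raise ValueError("rating must be between 1 and 5")
        # Treat an empty or whitespace-only correction as "not provided".
        if self.corrected_msa is not None:
            self.corrected_msa = self.corrected_msa.strip() or None


def to_json_line(record: FeedbackRecord) -> str:
    """Serialize a record as one JSON line, keeping Arabic text readable."""
    return json.dumps(asdict(record), ensure_ascii=False)
```

Passing `ensure_ascii=False` keeps the Arabic text human-readable in the stored file instead of escaping it as `\uXXXX` sequences.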

Collected feedback will help improve future SlangGPT versions and support Arabic RLHF research.


📊 Model Performance

Generation Quality

| Metric | Zero-shot (Base AraGPT-2) | SlangGPT |
|---|---|---|
| chrF | 10.62 | 29.08 |
| BLEU | 0.02 | 6.63 |
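For readers unfamiliar with chrF, the sketch below shows the idea behind the metric: an F-score over character n-grams that weights recall more heavily than precision. It is a simplified illustration and not the evaluation code behind the numbers above. This README does not say which chrF implementation was used, and library implementations differ in details such as how n-gram orders are averaged. The function names and the defaults (`max_n=6`, `beta=2.0`) are assumptions.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, ignoring spaces."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF on a 0-100 scale for one sentence pair."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta: beta > 1 gives recall more weight than precision.
    return 100 * (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

Because it works on characters rather than whole words, chrF gives partial credit for near-miss word forms, which is one reason it is often reported alongside BLEU for morphologically rich languages such as Arabic.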

Detection Model

| Task | Accuracy |
|---|---|
| Translation Verification | 0.956 |

Improvements

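  • 📈 chrF improvement: +18.46 points over the zero-shot baseline (10.62 → 29.08)
  • 📈 Detection accuracy improvement: +45.6 points (the baseline is not listed above; with a final accuracy of 95.6%, this implies a 50.0% starting point)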

🔬 Research Contributions

This project contributes:

  • A fine-tuned Egyptian Arabic → MSA generation model
  • A translation verification classifier
  • A public human-feedback dataset
  • An RLHF-ready Arabic NLP pipeline

The project aims to support future Arabic dialect NLP research and low-resource language modeling.


⚠️ Limitations

The model may struggle with:

  • Rare slang expressions
  • Mixed Arabic-English text
  • Heavy sarcasm or idioms
  • Long conversational context

Translations are generated probabilistically and may occasionally contain inaccuracies.


🔐 Feedback & Privacy

Submitted feedback may be stored publicly in the research feedback dataset.

Please avoid submitting:

  • Personal information
  • Phone numbers
  • Addresses
  • Sensitive/private content

🚀 Future Work

Planned future improvements include:

  • Larger instruction-tuned Arabic models
  • RLHF fine-tuning using collected feedback
  • Better dialect generalization
  • Arabic-English code-switching support
  • Faster inference optimization

🏗️ Technical Stack

  • Transformers 🤗
  • PyTorch
  • Gradio
  • Hugging Face Spaces
  • AraGPT-2
  • pandas
  • scikit-learn

📚 Resources


🙏 Acknowledgements

  • AraGPT-2 by Antoun et al. (2021)
  • Stanford CS224N educational framework
  • The Arabic NLP open-source community
  • All users who provide feedback to improve SlangGPT

⭐ Support the Project

If you find SlangGPT useful:

  • ⭐ Star the GitHub repository
  • 🤝 Contribute improvements
  • 📝 Submit feedback
  • 📢 Share the project

📜 License

This Space, model, and datasets are released under the MIT License.

Free for academic and commercial use with attribution.


🚀 Enjoy Translating!

Thank you for using SlangGPT and helping improve Arabic NLP research.