| language: | |
| - ar | |
| license: mit | |
| base_model: aubmindlab/aragpt2-medium | |
| tags: | |
| - arabic | |
| - egyptian | |
| - dialect | |
| - slang | |
| - translation | |
| - gpt-2 | |
| - aragpt | |
| - seq2seq | |
| - causal-lm | |
| datasets: | |
| - AdhamAshraf/egyptian-2-arabic | |
| - AdhamAshraf/slanggpt-feedback-dataset | |
| metrics: | |
| - chrF | |
| - BLEU | |
| - perplexity | |
| pipeline_tag: text-generation | |
| library_name: transformers | |
| # SlangGPT: Egyptian Arabic → Modern Standard Arabic (MSA) | |
| **SlangGPT** is a fine-tuned **AraGPT-2-medium** model that translates **Egyptian Arabic slang/dialect** into **Modern Standard Arabic (MSA)**. | |
| It is part of the broader SlangGPT project, an end-to-end Arabic NLP system for dialect translation and translation verification. | |
| --- | |
| # Project Resources | |
| - **Paper:** | |
| https://github.com/adhamashraf7788/SlangGPT/blob/main/report/SlangGPT_report.pdf | |
| - **Main Dataset:** | |
| https://huggingface.co/datasets/AdhamAshraf/egyptian-2-arabic | |
| - **Feedback Dataset:** | |
| https://huggingface.co/datasets/AdhamAshraf/slanggpt-feedback-dataset | |
| - **GitHub Repository:** | |
| https://github.com/adhamashraf7788/SlangGPT | |
| - **Interactive Demo (Hugging Face Space):** | |
| https://huggingface.co/spaces/AdhamAshraf/SlangGPT | |
| --- | |
| # Model Description | |
| SlangGPT is a **decoder-only causal language model** built on top of: | |
| - **Base model:** `aubmindlab/aragpt2-medium` | |
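| The model was fine-tuned on Egyptian Arabic → MSA parallel text with conditional autoregressive training: the dialect sentence is given as the prompt and the model is trained to generate the MSA translation. | |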
| ## Prompt Format | |
| ```text | |
| dialect: {input} → msa: | |
| ``` | |
| The model generates the Modern Standard Arabic translation autoregressively. | |
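The prompt format can be wrapped in two small helpers, one that builds the prompt and one that strips it from the decoded output. This is a minimal sketch: the helper names `build_prompt` and `extract_msa` are hypothetical, and the separator is assumed to be the arrow character `→` (U+2192).

```python
# Assumption: the separator between the dialect text and "msa:" is U+2192 (→).
SEPARATOR = "\u2192"


def build_prompt(egyptian_text: str) -> str:
    """Format an Egyptian Arabic sentence with the card's prompt template."""
    return f"dialect: {egyptian_text.strip()} {SEPARATOR} msa:"


def extract_msa(generated: str) -> str:
    """Keep only the text after the last 'msa:' marker, if the marker is present."""
    marker = "msa:"
    if marker in generated:
        return generated.split(marker)[-1].strip()
    return generated.strip()
```

Because a causal language model returns the prompt together with its continuation, the second helper is what isolates the translation from the decoded string.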
| --- | |
| # Key Features | |
| - **Input:** Egyptian Arabic slang/dialect | |
| - **Output:** Modern Standard Arabic (MSA) | |
| - **Architecture:** GPT-2 style decoder-only transformer | |
| - **Tokenizer:** BPE tokenizer with 64k vocabulary | |
| - **Context length:** 1024 tokens | |
| - **Language:** Arabic | |
| --- | |
| # Training Configuration | |
| | Parameter | Value | | |
| |---|---| | |
| | Batch size | 8 (effective 32) | | |
| | Learning rate | 5e-5 | | |
| | Scheduler | Cosine | | |
| | Warmup | 10% | | |
| | Gradient clipping | 1.0 | | |
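To make the schedule concrete, the sketch below works through the step counts and the learning-rate curve implied by the table. Several values are assumptions not stated in this card: that the effective batch of 32 comes from 4 gradient-accumulation steps on a single device, that training uses the 80% train split of the 18,250-pair dataset, and a hypothetical 3 epochs. The curve is the common linear-warmup-then-cosine-decay shape.

```python
import math

TRAIN_SAMPLES = 18_250 * 80 // 100  # 14,600 pairs in the 80% train split
PER_DEVICE_BATCH = 8
GRAD_ACCUM_STEPS = 4                # assumption: 8 x 4 = 32 effective
NUM_EPOCHS = 3                      # hypothetical; not stated in this card
BASE_LR = 5e-5
WARMUP_RATIO = 0.10

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS
steps_per_epoch = math.ceil(TRAIN_SAMPLES / effective_batch)
total_steps = steps_per_epoch * NUM_EPOCHS
warmup_steps = int(WARMUP_RATIO * total_steps)


def lr_at(step: int) -> float:
    """Linear warmup to BASE_LR, then cosine decay to zero."""
    if step < warmup_steps:
        return BASE_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch, steps_per_epoch, total_steps, warmup_steps)
```

Under these assumptions the learning rate peaks at 5e-5 when warmup ends and decays to zero at the final optimizer step.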
| --- | |
| # Inference Configuration | |
| | Parameter | Value | | |
| |---|---| | |
| | Temperature | 0.7 | | |
| | Top-k | 50 | | |
| | Top-p | 0.92 | | |
| | Repetition penalty | 1.3 | | |
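The sketch below illustrates how temperature, top-k, and top-p interact when choosing which tokens remain eligible for sampling. It is a simplified pure-Python illustration under assumed semantics (temperature scaling, then top-k, then nucleus filtering), not the generation library's implementation, and it does not model the repetition penalty. The function name `filter_probs` is hypothetical.

```python
import math


def filter_probs(logits, temperature=0.7, top_k=50, top_p=0.92):
    """Return {token_index: probability} for the tokens that survive filtering."""
    # Temperature below 1 sharpens the distribution before filtering.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]

    # Top-k: keep only the k highest-scoring tokens.
    ranked = sorted(enumerate(weights), key=lambda kv: kv[1], reverse=True)[:top_k]
    total = sum(w for _, w in ranked)

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = {}, 0.0
    for index, weight in ranked:
        kept[index] = weight / total
        cumulative += weight / total
        if cumulative >= top_p:
            break

    # Renormalize so the surviving probabilities sum to 1.
    norm = sum(kept.values())
    return {index: p / norm for index, p in kept.items()}
```

With one dominant logit, the nucleus collapses to a single token and sampling becomes effectively deterministic; flatter distributions keep more candidates.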
| --- | |
| # Quantitative Performance | |
| | Metric | Base AraGPT-2 | SlangGPT | | |
| |---|---|---| | |
| | chrF | 10.62 | **29.08** | | |
| | BLEU | 0.02 | **6.63** | | |
| | chrF Improvement | - | **+18.46 (+174%)** | |
| ### Metric Notes | |
| - **chrF** measures character n-gram overlap. | |
| - **BLEU** measures word n-gram precision. | |
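As an illustration of what character n-gram overlap means, here is a simplified sentence-level sketch of a chrF-style score. It is not the implementation used to produce the table above, and standard evaluation tools differ in details, so it will not reproduce those numbers. The function names, the maximum n-gram order of 6, and beta = 2 are assumptions chosen to mirror common chrF defaults.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing spaces."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Average n-gram precision and recall over orders 1..max_n, then take F-beta."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return 100 * (1 + beta**2) * p * r / (beta**2 * p + r)
```

Because the score works on characters rather than whole words, it gives partial credit for shared stems and affixes, which is one reason it is often reported alongside BLEU for morphologically rich languages such as Arabic.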
| --- | |
| # Usage | |
| ## 1. Install Dependencies | |
| ```bash | |
| pip install transformers torch accelerate | |
| ``` | |
| --- | |
| ## 2. Load Model and Tokenizer | |
| ```python | |
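| from transformers import AutoTokenizer, AutoModelForCausalLM | |
| import torch | |
| model_name = "AdhamAshraf/SlangGPT" | |
| tokenizer = AutoTokenizer.from_pretrained(model_name) | |
| # GPT-2 style tokenizers may ship without a pad token; fall back to EOS. | |
| if tokenizer.pad_token is None: | |
|     tokenizer.pad_token = tokenizer.eos_token | |
| # Left padding keeps each prompt adjacent to its generated tokens in a batch. | |
| tokenizer.padding_side = "left" | |
| model = AutoModelForCausalLM.from_pretrained( | |
|     model_name, | |
|     torch_dtype=torch.float16,  # half precision; consider torch.float32 on CPU | |
|     device_map="auto",  # requires the accelerate package | |
| ) | |
| model.eval() | |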
| ``` | |
| --- | |
| ## 3. Translation Function | |
| ```python | |
| def translate(egyptian_text): | |
|     """Translate Egyptian Arabic text to MSA using the prompt format above.""" | |
|     prompt = f"dialect: {egyptian_text.strip()} → msa:" | |
|     inputs = tokenizer( | |
|         prompt, | |
|         return_tensors="pt", | |
|         truncation=True, | |
|         max_length=64 | |
|     ) | |
|     # Move the input tensors to the same device as the model. | |
|     inputs = { | |
|         k: v.to(model.device) | |
|         for k, v in inputs.items() | |
|     } | |
|     with torch.no_grad(): | |
|         outputs = model.generate( | |
|             **inputs, | |
|             max_new_tokens=64, | |
|             do_sample=True, | |
|             temperature=0.7, | |
|             top_k=50, | |
|             top_p=0.92, | |
|             repetition_penalty=1.3, | |
|             pad_token_id=tokenizer.pad_token_id, | |
|             eos_token_id=tokenizer.eos_token_id, | |
|         ) | |
|     full = tokenizer.decode( | |
|         outputs[0], | |
|         skip_special_tokens=True | |
|     ) | |
|     # The decoded text includes the prompt; keep only what follows "msa:". | |
|     if "msa:" in full: | |
|         return full.split("msa:")[-1].strip() | |
|     return full | |
| ``` | |
| --- | |
| ## 4. Example Usage | |
| ```python | |
| print(translate("يلا فين؟")) | |
| # هيا، أين أنت؟ | |
| print(translate("إنت رايح فين؟")) | |
| # أين أنت ذاهب؟ | |
| print(translate("عايز اكل")) | |
| # أريد الطعام | |
| ``` | |
| --- | |
| # Interactive Web App | |
| Try the live demo here: | |
| https://huggingface.co/spaces/AdhamAshraf/SlangGPT | |
| The Space allows users to: | |
| - Translate Egyptian Arabic to MSA | |
| - Submit feedback | |
| - Rate translation quality | |
| - Help improve future versions of SlangGPT | |
| --- | |
| # Training Dataset | |
| SlangGPT was fine-tuned using: | |
| ## AdhamAshraf/egyptian-2-arabic | |
| Dataset statistics: | |
| | Property | Value | | |
| |---|---| | |
| | Total samples | 18,250 | | |
| | Format | Parallel Egyptian → MSA | |
| | Train split | 80% | | |
| | Validation split | 10% | | |
| | Test split | 10% | | |
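The split sizes follow directly from the percentages. The sketch below shows one way to produce an 80/10/10 split over shuffled indices; the shuffling, the seed, and the function name `split_indices` are assumptions for illustration, since the card does not say how the split was drawn.

```python
import random


def split_indices(n_samples: int, seed: int = 42):
    """Shuffle indices and cut them into 80% train, 10% validation, 10% test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # hypothetical seed for reproducibility
    n_train = n_samples * 80 // 100
    n_val = n_samples * 10 // 100
    train_idx = indices[:n_train]
    val_idx = indices[n_train:n_train + n_val]
    test_idx = indices[n_train + n_val:]
    return train_idx, val_idx, test_idx


train_idx, val_idx, test_idx = split_indices(18_250)
print(len(train_idx), len(val_idx), len(test_idx))  # 14600 1825 1825
```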
| ### Preprocessing Steps | |
| - Diacritic removal | |
| - Punctuation normalization | |
| - English text filtering | |
| The dataset was derived from the original Egyptian-English corpus by Abdalrahmankamel, with English translations replaced by curated MSA equivalents. | |
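The three preprocessing steps could look roughly like the sketch below. The exact rules used for the dataset are not given here, so everything in it is an assumption: the diacritic range (Arabic tashkeel U+064B to U+0652 plus the dagger alef U+0670), the punctuation mapping from ASCII marks to their Arabic counterparts, and dropping any pair that contains Latin letters. All function names are hypothetical.

```python
import re

# Assumption: "diacritics" means tashkeel marks plus the dagger alef.
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")
LATIN = re.compile("[A-Za-z]")
# Hypothetical mapping: ASCII ? , ; to Arabic question mark, comma, semicolon.
PUNCT_MAP = str.maketrans({"?": "\u061F", ",": "\u060C", ";": "\u061B"})


def strip_diacritics(text: str) -> str:
    """Remove Arabic diacritic marks, leaving the base letters."""
    return DIACRITICS.sub("", text)


def normalize_punctuation(text: str) -> str:
    """Map ASCII punctuation to Arabic forms and collapse whitespace."""
    return re.sub(r"\s+", " ", text.translate(PUNCT_MAP)).strip()


def preprocess_pair(dialect: str, msa: str):
    """Return a cleaned (dialect, msa) pair, or None if either side has English text."""
    if LATIN.search(dialect) or LATIN.search(msa):
        return None
    return (
        normalize_punctuation(strip_diacritics(dialect)),
        normalize_punctuation(strip_diacritics(msa)),
    )
```

Applying the filter to the pair as a whole, rather than to each side separately, keeps the parallel corpus aligned.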
| --- | |
| # Evaluation & Feedback | |
| The model was evaluated using: | |
| - chrF | |
| - BLEU | |
| User feedback collected through the Gradio Space is publicly stored in: | |
| https://huggingface.co/datasets/AdhamAshraf/slanggpt-feedback-dataset | |
| This feedback dataset supports: | |
| - RLHF research | |
| - Translation verification | |
| - Reward model training | |
| - Error analysis | |
| --- | |
| # License | |
| This project is released under the MIT License. | |
| Free for academic and commercial use with attribution. | |
| --- | |
| # Acknowledgements | |
| - AraGPT-2 by Antoun et al. (2021) | |
| - Stanford CS224N framework and educational materials | |
| - The Arabic NLP open-source community | |
| --- | |
| # Citation | |
| ```bibtex | |
| @software{slanggpt2026, | |
| author = {Abdelrahman Ahmed and Adham Ashraf and Ahmed Fekry}, | |
| title = {SlangGPT: Fine-tuning AraGPT-2 for Egyptian Arabic Dialect-to-MSA Translation}, | |
| year = {2026}, | |
| url = {https://github.com/adhamashraf7788/SlangGPT} | |
| } | |
| @dataset{egyptian_2_arabic, | |
| author = {Adham Ashraf and Abdelrahman Ahmed and Ahmed Fekry}, | |
| title = {Egyptian Arabic Slang to Formal Arabic Dataset}, | |
| year = {2026}, | |
| publisher = {Hugging Face}, | |
| url = {https://huggingface.co/datasets/AdhamAshraf/egyptian-2-arabic} | |
| } | |
| ``` | |
| --- | |
| # Questions & Issues | |
| For bugs, issues, or feature requests: | |
| https://github.com/adhamashraf7788/SlangGPT/issues |