---
language:
- ar
license: mit
base_model: aubmindlab/aragpt2-medium
tags:
- arabic
- egyptian
- dialect
- slang
- translation
- gpt-2
- aragpt
- seq2seq
- causal-lm
datasets:
- AdhamAshraf/egyptian-2-arabic
- AdhamAshraf/slanggpt-feedback-dataset
metrics:
- chrF
- BLEU
- perplexity
pipeline_tag: text-generation
library_name: transformers
---
SlangGPT: Egyptian Arabic → Modern Standard Arabic (MSA)
SlangGPT is a fine-tuned AraGPT-2-medium model that translates Egyptian Arabic slang/dialect into Modern Standard Arabic (MSA).
It is part of the broader SlangGPT project, an end-to-end Arabic NLP system for dialect translation and translation verification.
Project Resources
- Paper: https://github.com/adhamashraf7788/SlangGPT/blob/main/report/SlangGPT_report.pdf
- Main Dataset: https://huggingface.co/datasets/AdhamAshraf/egyptian-2-arabic
- Feedback Dataset: https://huggingface.co/datasets/AdhamAshraf/slanggpt-feedback-dataset
- GitHub Repository: https://github.com/adhamashraf7788/SlangGPT
- Interactive Demo (Hugging Face Space): https://huggingface.co/spaces/AdhamAshraf/SlangGPT
Model Description
SlangGPT is a decoder-only causal language model built on top of:
- Base model: aubmindlab/aragpt2-medium
The model was fine-tuned on Egyptian Arabic → MSA parallel text using conditional autoregressive training: the dialect sentence is given as the prompt and the MSA translation is learned as its continuation.
Prompt Format
dialect: {input} → msa:
The model generates the Modern Standard Arabic translation autoregressively.
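As a minimal sketch of the prompt format above, the helpers below build the prompt string and pull the translation out of a decoded completion. The names `PROMPT_TEMPLATE`, `build_prompt`, and `extract_translation` are illustrative and not part of the released code, and the separator is assumed to be the `→` character.

```python
# Separator assumed to be "→"; adjust if the released prompt differs.
PROMPT_TEMPLATE = "dialect: {input} → msa:"


def build_prompt(egyptian_text: str) -> str:
    """Wrap a dialect sentence in the prompt format described above."""
    return PROMPT_TEMPLATE.format(input=egyptian_text.strip())


def extract_translation(decoded: str) -> str:
    """Return the text after the last 'msa:' marker, or the input unchanged."""
    if "msa:" in decoded:
        return decoded.split("msa:")[-1].strip()
    return decoded
```

Keeping prompt construction and output parsing in separate functions makes it easier to reuse the same format for both single-sentence and batched generation.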
Key Features
- Input: Egyptian Arabic slang/dialect
- Output: Modern Standard Arabic (MSA)
- Architecture: GPT-2 style decoder-only transformer
- Tokenizer: BPE tokenizer with 64k vocabulary
- Context length: 1024 tokens
- Language: Arabic
Training Configuration
| Parameter | Value |
|---|---|
| Batch size | 8 (effective 32) |
| Learning rate | 5e-5 |
| Scheduler | Cosine |
| Warmup | 10% |
| Gradient clipping | 1.0 |
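To make the schedule in the table concrete, here is a rough sketch of a cosine learning-rate curve with 10% warmup. It assumes linear warmup, decay to zero, and that the effective batch size of 32 comes from gradient accumulation; none of these details are confirmed by the table, and `learning_rate` is a hypothetical helper rather than the project's training code.

```python
import math

PEAK_LR = 5e-5          # from the table
WARMUP_FRACTION = 0.10  # from the table
PER_STEP_BATCH = 8      # from the table
EFFECTIVE_BATCH = 32    # from the table

# Assumption: the gap between 8 and 32 is closed by gradient accumulation.
ACCUMULATION_STEPS = EFFECTIVE_BATCH // PER_STEP_BATCH


def learning_rate(step: int, total_steps: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero (assumed shape)."""
    warmup_steps = int(WARMUP_FRACTION * total_steps)
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
```

With 1,000 optimizer steps, for example, the rate would climb linearly over the first 100 steps, peak at 5e-5, and then fall along a half cosine.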
Inference Configuration
| Parameter | Value |
|---|---|
| Temperature | 0.7 |
| Top-k | 50 |
| Top-p | 0.92 |
| Repetition penalty | 1.3 |
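The sketch below illustrates what the temperature, top-k, and top-p settings do to a next-token distribution. It is a simplified pure-Python illustration on a list of logits, not the Transformers implementation, and it leaves out the repetition penalty; `filter_distribution` is a hypothetical name.

```python
import math


def filter_distribution(logits, temperature=0.7, top_k=50, top_p=0.92):
    """Return {token_index: probability} after temperature, top-k and top-p."""
    # A temperature below 1 sharpens the distribution before the softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top-k: keep only the k most likely tokens.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    mass_k = sum(probs[i] for i in ranked)

    # Top-p: keep the smallest prefix whose renormalised mass reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / mass_k
        if cumulative >= top_p:
            break

    # Renormalise over the surviving tokens; sampling would draw from this.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}
```

Lower temperature and tighter top-k or top-p values shrink the set of candidate tokens, which tends to make translations more conservative.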
Quantitative Performance
| Metric | Base AraGPT-2 | SlangGPT |
|---|---|---|
| chrF | 10.62 | 29.08 |
| BLEU | 0.02 | 6.63 |
| chrF Improvement | n/a | +18.46 (+173%) |
Metric Notes
- chrF measures character n-gram overlap.
- BLEU measures word n-gram precision.
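To show what character n-gram overlap means in practice, here is a simplified sentence-level chrF sketch. It averages precision and recall over n-gram orders 1 to 6 and combines them with an F-score that weights recall (beta = 2). This is an approximation for illustration only: it is not the script used to produce the scores above, and standard toolkits differ in details such as whitespace handling and smoothing.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, ignoring whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF on a 0-100 scale."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return 100 * (1 + beta**2) * p * r / (beta**2 * p + r)
```

Because it works on characters rather than whole words, a score of this kind gives partial credit for near-miss word forms, which is one reason chrF is often reported alongside BLEU for morphologically rich languages such as Arabic.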
Usage
1. Install Dependencies
```bash
# accelerate is needed because the model is loaded with device_map="auto"
pip install transformers torch accelerate
```
2. Load Model and Tokenizer
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "AdhamAshraf/SlangGPT"

tokenizer = AutoTokenizer.from_pretrained(model_name)
# GPT-2 style tokenizers often ship without a pad token; reuse EOS if so.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
# Left padding keeps generated tokens adjacent to the prompt.
tokenizer.padding_side = "left"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # half precision
    device_map="auto",          # requires the accelerate package
)
model.eval()
```
3. Translation Function
```python
def translate(egyptian_text):
    # Build the prompt in the format used during fine-tuning.
    prompt = f"dialect: {egyptian_text.strip()} → msa:"

    # Note: truncation cuts from the right, so very long inputs can lose
    # the trailing "msa:" marker.
    inputs = tokenizer(
        prompt,
        return_tensors="pt",
        truncation=True,
        max_length=64,
    )
    # Move the input tensors to the same device as the model.
    inputs = {k: v.to(model.device) for k, v in inputs.items()}

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=64,
            do_sample=True,
            temperature=0.7,
            top_k=50,
            top_p=0.92,
            repetition_penalty=1.3,
            pad_token_id=tokenizer.pad_token_id,
            eos_token_id=tokenizer.eos_token_id,
        )

    full = tokenizer.decode(outputs[0], skip_special_tokens=True)

    # The decoded text contains the prompt; keep only what follows "msa:".
    if "msa:" in full:
        return full.split("msa:")[-1].strip()
    return full
```
4. Example Usage
```python
# Sampling is enabled, so the exact outputs may vary between runs.
print(translate("يلا فين؟"))
# هيا، أين أنت؟
print(translate("إنت رايح فين؟"))
# أين أنت ذاهب؟
print(translate("عايز اكل"))
# أريد الطعام
```
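The loading step sets left padding, which is what batched generation with a decoder-only model needs. As a hedged sketch, the function below translates several sentences in one `generate()` call. `translate_batch` is a hypothetical helper, not part of the released code; it assumes a tokenizer and model prepared as in step 2 and the `→` separator in the prompt.

```python
def translate_batch(texts, tokenizer, model, max_new_tokens=64):
    """Translate several dialect sentences in one generate() call."""
    prompts = [f"dialect: {t.strip()} → msa:" for t in texts]

    # padding=True pads to the longest prompt; with left padding the
    # generated tokens directly follow each prompt.
    inputs = tokenizer(
        prompts,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=64,
    ).to(model.device)

    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.7,
        top_k=50,
        top_p=0.92,
        repetition_penalty=1.3,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

    decoded = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    # Keep only the text after the final "msa:" marker of each completion.
    return [d.split("msa:")[-1].strip() if "msa:" in d else d for d in decoded]
```

It would be called as `translate_batch(sentences, tokenizer, model)`, where `sentences` is a list of Egyptian Arabic strings, and returns one MSA string per input in the same order.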
Interactive Web App
Try the live demo here:
https://huggingface.co/spaces/AdhamAshraf/SlangGPT
The Space allows users to:
- Translate Egyptian Arabic to MSA
- Submit feedback
- Rate translation quality
- Help improve future versions of SlangGPT
Training Dataset
SlangGPT was fine-tuned using:
AdhamAshraf/egyptian-2-arabic
Dataset statistics:
| Property | Value |
|---|---|
| Total samples | 18,250 |
| Format | Parallel Egyptian โ MSA |
| Train split | 80% |
| Validation split | 10% |
| Test split | 10% |
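As a quick worked example, the split percentages in the table translate into the following sample counts. This assumes the percentages apply exactly to the 18,250 total; how the data was shuffled or seeded is not stated, and `split_sizes` is an illustrative helper.

```python
TOTAL_SAMPLES = 18_250  # from the table


def split_sizes(total, train_pct=80, validation_pct=10):
    """Return (train, validation, test) counts; test takes the remainder."""
    n_train = total * train_pct // 100
    n_validation = total * validation_pct // 100
    return n_train, n_validation, total - n_train - n_validation


print(split_sizes(TOTAL_SAMPLES))  # (14600, 1825, 1825)
```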
Preprocessing Steps
- Diacritic removal
- Punctuation normalization
- English text filtering
The dataset was derived from the original Egyptian-English corpus by Abdalrahmankamel, with English translations replaced by curated MSA equivalents.
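The three preprocessing steps listed above could look roughly like the sketch below. The exact rules used to build the dataset are not documented here, so every choice is an assumption: diacritics are taken to be the Arabic tashkeel range, punctuation normalization is taken to mean collapsing repeated marks, and English filtering is taken to mean dropping any pair that contains Latin letters. All names are hypothetical.

```python
import re

# Arabic diacritics (tashkeel): fathatan through sukun, plus superscript alef.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")
# Assumed normalisation: collapse runs of the same punctuation mark.
REPEATED_PUNCT = re.compile(r"([!?.,؟،؛])\1+")
LATIN = re.compile(r"[A-Za-z]")


def normalize(text: str) -> str:
    """Strip diacritics, collapse repeated punctuation, tidy whitespace."""
    text = DIACRITICS.sub("", text)
    text = REPEATED_PUNCT.sub(r"\1", text)
    return re.sub(r"\s+", " ", text).strip()


def has_latin(text: str) -> bool:
    """True if the text contains any Latin-script letter."""
    return LATIN.search(text) is not None


def preprocess_pairs(pairs):
    """Normalise (dialect, msa) pairs and drop pairs containing Latin letters."""
    cleaned = []
    for dialect, msa in pairs:
        if has_latin(dialect) or has_latin(msa):
            continue
        cleaned.append((normalize(dialect), normalize(msa)))
    return cleaned
```

Applying the same `normalize` function to user input at inference time would keep it consistent with the training text, provided the real pipeline followed comparable rules.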
Evaluation & Feedback
The model was evaluated using:
- chrF
- BLEU
User feedback collected through the Gradio Space is publicly stored in:
https://huggingface.co/datasets/AdhamAshraf/slanggpt-feedback-dataset
This feedback dataset supports:
- RLHF research
- Translation verification
- Reward model training
- Error analysis
License
This project is released under the MIT License.
Free for academic and commercial use with attribution.
Acknowledgements
- AraGPT-2 by Antoun et al. (2021)
- Stanford CS224N framework and educational materials
- The Arabic NLP open-source community
Citation
```bibtex
@software{slanggpt2026,
  author = {Abdelrahman Ahmed and Adham Ashraf and Ahmed Fekry},
  title  = {SlangGPT: Fine-tuning AraGPT-2 for Egyptian Arabic Dialect-to-MSA Translation},
  year   = {2026},
  url    = {https://github.com/adhamashraf7788/SlangGPT}
}

@dataset{egyptian_2_arabic,
  author    = {Adham Ashraf and Abdelrahman Ahmed and Ahmed Fekry},
  title     = {Egyptian Arabic Slang to Formal Arabic Dataset},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/datasets/AdhamAshraf/egyptian-2-arabic}
}
```
Questions & Issues
For bugs, issues, or feature requests: