Egyptian_Qwen-3.5: The First Egyptian Dialect LLM
🌟 Introduction
Welcome to the future of Egyptian AI.
We are thrilled to introduce Egyptian_Qwen-3.5, the first series of Large Language Models specifically fine-tuned to understand and generate the Egyptian Arabic Dialect.
While most Arabic LLMs focus on Modern Standard Arabic (MSA/Fusha), they often fail to capture the nuance, warmth, and cultural specificity of local dialects. We changed that. By leveraging the powerful Qwen 3.5 architecture, we have fine-tuned this model not just to "speak Arabic," but to speak Egyptian.
From the streets of Egypt, this model understands the local idioms, slang, and cultural context that define Egyptian communication.
🚀 Key Features
- 🗣️ Native Dialect: Trained specifically on Egyptian colloquial data, not just MSA.
- 🧠 Smart & Small: Built on efficient Qwen small-model architecture for fast inference.
- 🤝 Community First: Open weights with a strong commitment to the open-source ecosystem.
🛠️ How to Load & Use
Getting started with Egyptian_Qwen-3.5 is seamless. You can load it using the standard transformers library.
Requirements
```bash
pip install transformers torch accelerate
```
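Before loading the model, you may want to sanity-check that the required packages are importable. The helper below (`check_env` is our own illustrative name, not part of the transformers API) is a minimal sketch:

```python
import importlib.util

def check_env(packages):
    """Return a dict mapping each package name to whether it is importable."""
    return {pkg: importlib.util.find_spec(pkg) is not None for pkg in packages}

# Report which of the required packages are installed
for pkg, ok in check_env(["transformers", "torch", "accelerate"]).items():
    print(f"{pkg}: {'found' if ok else 'MISSING - run pip install ' + pkg}")
```

`importlib.util.find_spec` checks availability without actually importing the (potentially heavy) package.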
Inference Code
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "theBOrg32/Egyptian_qwen_3.5_9B"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    trust_remote_code=True,
    torch_dtype="auto",
)

# Prepare your Egyptian Dialect prompt ("How are you, sir?")
prompt = "إزيك يا فندم؟"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.5,
    do_sample=True,
)

# Note: decoding the full sequence includes the prompt at the start
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
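The `temperature=0.5` passed to `generate` sharpens the sampling distribution: values below 1.0 make high-probability tokens even more likely, while higher values flatten the distribution. A minimal, self-contained sketch of how temperature rescales logits before the softmax, independent of the model itself:

```python
import math

def temperature_softmax(logits, temperature=1.0):
    """Softmax over logits divided by temperature (numerically stabilized)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.2]
print(temperature_softmax(logits, 1.0))   # baseline distribution
print(temperature_softmax(logits, 0.5))   # sharper: top token dominates more
```

With `temperature=0.5` the gap between the top token and the rest widens, which is why the generation above stays coherent while still sampling (`do_sample=True`).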
💬 Examples & Benchmarks
We tested Egyptian_Qwen-3.5 against general-purpose models. Notice how our model captures the tone and dialect while others revert to formal Arabic or English.
| Prompt (Egyptian Dialect) | Egyptian_Qwen-3.5 (Ours) | OpenAI | Gemini |
|---|---|---|---|
| تعرف إيه عن المنطق ؟ | المنطق هو جزء من الفلسفة والعلوم، وبيدرس القوانين والعمليات اللي بتتحكم في التفكير والاستدلال. وبيشمل دراسة الاستنتاجات والبراهين، وكمان تحليل الأفكار والمفاهيم، وبيحاول يحدد إيه اللي صواب وإيه اللي غلط. المنطق بيقسم لجزئين: المنطق الصوري والمنطق الفعلي. المنطق الصوري بيهتم بالقوانين والعلاقات اللي ممكن تتستخدم في التفكير والاستدلال، وده بيكون غالبًا على شكل قواعد رياضية أو دلالية. ومن الناحية التانية، المنطق الفعلي بيهتم بتطبيق القوانين دي على الواقع، وبيحاول يفهم إزاي بيبقى صعب أو سهل إنك تطبق المنطق في مواقف معينة. ومن أهم أنواع المنطق: المنطق الكلاسيكي، والمنطق النمطي، والمنطق التوضيحي، والمنطق الجبري | الـ منطق هو فرع من فروع الفلسفة والرياضيات يهتم بدراسة قواعد التفكير الصحيح والاستدلال. يعني ببساطة: كيف نميّز بين الحجة الصحيحة والحجة الخاطئة، وكيف نبني استنتاجات سليمة من معلومات موجودة. 🧠 ما الذي يدرسه المنطق؟ المنطق يهتم بأشياء مثل: الاستدلال: كيف نصل من مقدمات إلى نتيجة | المنطق هو "آلة عقلية" تعصم الذهن من الوقوع في الخطأ، أو ببساطة هو علم دراسة قواعد التفكير السليم وكيفية استخلاص النتائج من المقدمات. Wikipedia Wikipedia +1 إليك أهم ما يميزه باختصار: التعريف الفلسفي: اعتبره ابن سينا كـ "الميزان" الذي يميز الحق من الباطل، ونسبته للمعاني مثل نسبة النحو للكلام. |
Note: the full responses were trimmed for length for all models. The point here is that even SOTA general-purpose models fail to sustain the dialect, drifting back into formal Arabic.
⚖️ License & Commercial Use
We are strong believers in the Open Source Community. To ensure this technology remains accessible and beneficial to everyone, we have chosen a Copyleft License.
📄 License: CC-BY-SA-4.0
This model is released under the Creative Commons Attribution-ShareAlike 4.0 International License.
🤝 Usage Guidelines
- ✅ Open Source Projects: You are free to use, fine-tune, and distribute this model in your projects, provided your project also remains open-source and references Egyptian_Qwen-3.5.
- ✅ Commercial Use: Commercial usage is allowed under the terms of CC-BY-SA-4.0 (your derivative models must remain open).
- 🔒 Closed Source / Proprietary: If you wish to integrate this model (or a fine-tuned version) into a closed-source product without releasing your weights/code, you must obtain prior approval.
📧 For Closed-Source Licensing: Please contact us at info2@the-borg.ru to discuss agreements that respect our open-source mission.
🙏 Credits & Acknowledgments
This model would not be possible without the foundational work of the Qwen Team at Alibaba Cloud. We stand on the shoulders of giants.
- Base Model: Qwen 3.5
- Fine-Tuning & Alignment: The Borg Organization
- Dataset: Curated Egyptian Dialect Corpus
Citation
If you use Egyptian_Qwen-3.5 in your research or project, please cite us:
```bibtex
@misc{Egyptian_qwen_2026,
  title={Egyptian_Qwen-3.5: The First Egyptian Dialect Large Language Model},
  author={The Borg Organization},
  year={2026},
  license={CC-BY-SA-4.0}
}
```
Built with ❤️ for the Egyptian Community & The World
Preserving language, one token at a time.