Egyptian_Qwen-3.5: The First Egyptian Dialect LLM

🌟 Introduction

Welcome to the future of Egyptian AI.

We are thrilled to introduce Egyptian_Qwen-3.5, the first series of Large Language Models specifically fine-tuned to understand and generate the Egyptian Arabic Dialect.

While most Arabic LLMs focus on Modern Standard Arabic (MSA/Fusha), they often fail to capture the nuance, warmth, and cultural specificity of local dialects. We changed that. By leveraging the powerful Qwen 3.5 architecture, we have fine-tuned this model not just to "speak Arabic," but to speak Egyptian.

Trained on language from the streets of Egypt, this model understands the local idioms, slang, and cultural context that define everyday Egyptian communication.

🚀 Key Features

  • 🗣️ Native Dialect: Trained specifically on Egyptian colloquial data, not just MSA.
  • 🧠 Smart & Small: Built on efficient Qwen small-model architecture for fast inference.
  • 🤝 Community First: Open weights with a strong commitment to the open-source ecosystem.

🛠️ How to Load & Use

Getting started with Egyptian_Qwen-3.5 is straightforward: the model loads with the standard transformers library.

Requirements

pip install transformers torch accelerate

Inference Code

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "theBOrg32/Egyptian_qwen_3.5_9B"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name, 
    device_map="auto", 
    trust_remote_code=True,
    torch_dtype="auto"
)

# Prepare your Egyptian Dialect prompt
prompt = "إزيك يا فندم؟" 

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate
outputs = model.generate(
    **inputs, 
    max_new_tokens=256, 
    temperature=0.5,
    do_sample=True
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
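If the checkpoint ships a chat template (as Qwen instruct models typically do — check the repository's tokenizer_config.json to confirm), wrapping the prompt in chat-format messages usually yields better-aligned answers than feeding raw text. A minimal sketch, assuming such a template exists:

```python
# Hypothetical chat-style inference sketch. Assumes the checkpoint ships a
# Qwen-style chat template; verify against the repo's tokenizer_config.json.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "theBOrg32/Egyptian_qwen_3.5_9B"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name, device_map="auto", trust_remote_code=True, torch_dtype="auto"
)

# Chat-format input: a list of {"role", "content"} dicts.
messages = [
    {"role": "user", "content": "إزيك يا فندم؟"},
]

# apply_chat_template inserts the special tokens the model was trained with.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids, max_new_tokens=256, temperature=0.5, do_sample=True
)

# Decode only the newly generated tokens, not the echoed prompt.
response = tokenizer.decode(
    outputs[0][input_ids.shape[-1]:], skip_special_tokens=True
)
print(response)
```

Slicing the output at `input_ids.shape[-1]` strips the prompt from the decoded text, which the raw-prompt example above does not do.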

💬 Examples & Benchmarks

We tested Egyptian_Qwen-3.5 against general-purpose models. Notice how our model captures the tone and dialect while others revert to formal Arabic or English.

Prompt (Egyptian Dialect): تعرف إيه عن المنطق؟ ("What do you know about logic?")

Egyptian_Qwen-3.5 (Ours):
المنطق هو جزء من الفلسفة والعلوم، وبيدرس القوانين والعمليات اللي بتتحكم في التفكير والاستدلال. وبيشمل دراسة الاستنتاجات والبراهين، وكمان تحليل الأفكار والمفاهيم، وبيحاول يحدد إيه اللي صواب وإيه اللي غلط. المنطق بيقسم لجزئين: المنطق الصوري والمنطق الفعلي. المنطق الصوري بيهتم بالقوانين والعلاقات اللي ممكن تتستخدم في التفكير والاستدلال، وده بيكون غالبًا على شكل قواعد رياضية أو دلالية. ومن الناحية التانية، المنطق الفعلي بيهتم بتطبيق القوانين دي على الواقع، وبيحاول يفهم إزاي بيبقى صعب أو سهل إنك تطبق المنطق في مواقف معينة. ومن أهم أنواع المنطق: المنطق الكلاسيكي، والمنطق النمطي، والمنطق التوضيحي، والمنطق الجبري

OpenAI:
المنطق هو فرع من فروع الفلسفة والرياضيات يهتم بدراسة قواعد التفكير الصحيح والاستدلال. يعني ببساطة: كيف نميّز بين الحجة الصحيحة والحجة الخاطئة، وكيف نبني استنتاجات سليمة من معلومات موجودة. 🧠 ما الذي يدرسه المنطق؟ المنطق يهتم بأشياء مثل: الاستدلال: كيف نصل من مقدمات إلى نتيجة

Gemini:
المنطق هو "آلة عقلية" تعصم الذهن من الوقوع في الخطأ، أو ببساطة هو علم دراسة قواعد التفكير السليم وكيفية استخلاص النتائج من المقدمات. إليك أهم ما يميزه باختصار: التعريف الفلسفي: اعتبره ابن سينا كـ "الميزان" الذي يميز الحق من الباطل، ونسبته للمعاني مثل نسبة النحو للكلام.

Note: the full responses are trimmed for length. The point is that even SOTA general-purpose models drift back to formal Arabic (MSA) when prompted in the Egyptian dialect, while our model stays in dialect throughout.


⚖️ License & Commercial Use

We are strong believers in the Open Source Community. To ensure this technology remains accessible and beneficial to everyone, we have chosen a Copyleft License.

📄 License: CC-BY-SA-4.0

This model is released under the Creative Commons Attribution-ShareAlike 4.0 International License.

🤝 Usage Guidelines

  1. ✅ Open Source Projects: You are free to use, fine-tune, and distribute this model in your projects, provided your project also remains open-source and references Egyptian_Qwen-3.5.
  2. ✅ Commercial Use: Commercial usage is allowed under the terms of CC-BY-SA-4.0 (your derivative models must remain open).
  3. 🔒 Closed Source / Proprietary: If you wish to integrate this model (or a fine-tuned version) into a closed-source product without releasing your weights/code, you must obtain prior approval.

📧 For Closed-Source Licensing: Please contact us at info2@the-borg.ru to discuss agreements that respect our open-source mission.


🙏 Credits & Acknowledgments

This model would not be possible without the foundational work of the Qwen Team at Alibaba Cloud. We stand on the shoulders of giants.

  • Base Model: Qwen 3.5
  • Fine-Tuning & Alignment: The Borg Organization
  • Dataset: Curated Egyptian Dialect Corpus

Citation

If you use Egyptian_Qwen-3.5 in your research or project, please cite us:

@misc{Egyptian_qwen_2026,
  title={Egyptian_Qwen-3.5: The First Egyptian Dialect Large Language Model},
  author={The Borg Organization},
  year={2026},
  license={CC-BY-SA-4.0}
}

Built with ❤️ for the Egyptian Community & The World
Preserving language, one token at a time.
