theBOrg32/syrian_qwen_3.5_4B

Safetensors

qwen3_5

theborg321 committed on Mar 6

Commit

3dc8186

verified ·

1 Parent(s): f56387f

Update README.md

Files changed (1)

README.md +128 -3

@@ -1,3 +1,128 @@
----
-license: cc-by-sa-4.0
----

+---
+license: cc-by-sa-4.0
+---
+# 🇸🇾 Syrian_Qwen-3.5: The First Syrian Dialect LLM
+<p align="center">
+  <img src="https://img.shields.io/badge/License-CC--BY--SA--4.0-green.svg" alt="License">
+  <img src="https://img.shields.io/badge/Language-Arabic_(Syrian_Dialect)-red.svg" alt="Language">
+  <img src="https://img.shields.io/badge/Base_Model-Qwen_3.5-blue.svg" alt="Base Model">
+  <img src="https://img.shields.io/badge/Task-Text_Generation-orange.svg" alt="Task">
+</p>
+---
+## 🌟 Introduction
+**Welcome to the future of Levantine AI.**
+We are thrilled to introduce **Syrian_Qwen-3.5**, the **first series of Large Language Models specifically fine-tuned to understand and generate the Syrian Arabic Dialect.**
+While most Arabic LLMs focus on Modern Standard Arabic (MSA/Fusha), they often fail to capture the nuance, warmth, and cultural specificity of local dialects. We changed that. By leveraging the powerful **Qwen 3.5** architecture, we have fine-tuned this model not just to "speak Arabic," but to speak **Syrian**.
+The model understands the local idioms, slang, and cultural context that define everyday Syrian communication.
+### 🚀 Key Features
+*   🗣️ **Native Dialect:** Trained specifically on Syrian colloquial data, not just MSA.
+*   🧠 **Smart & Small:** Built on the efficient small-model Qwen architecture for fast inference.
+*   🤝 **Community First:** Open weights with a strong commitment to the open-source ecosystem.
+---
+## 🛠️ How to Load & Use
+Getting started with **Syrian_Qwen-3.5** is seamless. You can load it using the standard `transformers` library.
+### Requirements
+```bash
+pip install transformers torch accelerate
+```
+### Inference Code
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+model_name = "theBOrg32/syrian_qwen_3.5_4B"
+# Load tokenizer and model
+tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    device_map="auto",
+    trust_remote_code=True,
+    torch_dtype="auto"
+)
+# Prepare your Syrian Dialect prompt
+# (English: "What do you think of the new restaurant in Damascus?")
+prompt = "شو رأيك بالمطعم الجديد بدمشق؟"
+inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+# Generate
+outputs = model.generate(
+    **inputs,
+    max_new_tokens=256,
+    temperature=0.5,
+    do_sample=True
+)
+response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+print(response)
+```
+---
+## 💬 Examples & Benchmarks
+We tested **Syrian_Qwen-3.5** against general-purpose models on the same Syrian-dialect prompt. Notice how our model keeps the *tone* and *dialect*, while the others fall back on formal Arabic or mix in English.
+| Prompt (Syrian Dialect) | Syrian_Qwen-3.5 (Ours) | OpenAI | Gemini |
+| :--- | :--- | :--- | :--- |
+| **"شو بتعرف عن المنطق ؟"** | . المنطق بيستخدم استدلالات مختلفة متل الاستدلال الاستنتاجي والاستدلال التحليلي. الاستدلال الاستنتاجي بيستخدم القواعد والمبادئ العامة ليوصل لنتائج محددة، بينما الاستدلال التحليلي بيقوم بتحليل الأفكار والمعلومات ليوصل لفهم أعمق. المنطق كمان بيستخدم استدلالات شكلية وغير شكلية. الاستدلالات الشكلية بتستخدم قواعد ثابتة ومحددة، بينما الاستدلالات غير الشكلية بتتغير حسب السياق والمعلومات المتاحة. المنطق كمان بيستخدم استدلالات منطقية بيزيد من فهمنا للعالم اللي حولنا، | المنطق هو فرع من **Logic** يدرس قواعد التفكير الصحيح والاستدلال،<br>وكيف نميّز الحجج الصحيحة من الخاطئة.<br>يستخدم في الفلسفة والرياضيات وعلوم الحاسوب لتحليل البراهين وبناء استنتاجات دقيقة. | المنطق هو العلم الذي يدرس القواعد العامة للتفكير الصحيح، ويهدف إلى حماية العقل من الوقوع في الخطأ من خلال وضع معايير تميز بين الحجج السليمة والباطلة.|
+> Note: The full responses were too long to show for every model, so we trimmed them. The point here is only that even the SOTA models do not answer in the dialect.
+---
+## ⚖️ License & Commercial Use
+We are strong believers in the **Open Source Community**. To ensure this technology remains accessible and beneficial to everyone, we have chosen a **Copyleft License**.
+### 📄 License: CC-BY-SA-4.0
+This model is released under the **Creative Commons Attribution-ShareAlike 4.0 International License**.
+### 🤝 Usage Guidelines
+1.  **✅ Open Source Projects:** You are free to use, fine-tune, and distribute this model in your projects, **provided** your project also remains open-source and credits **Syrian_Qwen-3.5**.
+2.  **✅ Commercial Use:** Commercial usage is **allowed** under the terms of CC-BY-SA-4.0 (derivative models must be released under the same license).
+3.  **🔒 Closed Source / Proprietary:** If you wish to integrate this model (or a fine-tuned version) into a **closed-source product** without releasing your weights/code, **you must obtain prior approval.**
+> **📧 For Closed-Source Licensing:**
+> Please contact us at **[info2@the-borg.ru](mailto:info2@the-borg.ru)** to discuss agreements that respect our open-source mission.
+---
+## 🙏 Credits & Acknowledgments
+This model would not have been possible without the foundational work of the **Qwen Team** at Alibaba Cloud. We stand on the shoulders of giants.
+*   **Base Model:** [Qwen 3.5](https://huggingface.co/Qwen)
+*   **Fine-Tuning & Alignment:** The Borg Organization
+*   **Dataset:** Curated Syrian Dialect Corpus
+### Citation
+If you use **Syrian_Qwen-3.5** in your research or project, please cite us:
+```bibtex
+@misc{syrian_qwen_2026,
+  title={Syrian\_Qwen-3.5: The First Syrian Dialect Large Language Model},
+  author={{The Borg Organization}},
+  year={2026},
+  url={https://huggingface.co/theBOrg32/syrian_qwen_3.5_4B},
+  license={CC-BY-SA-4.0}
+}
+```
+---
+<p align="center">
+  <b>Built with ❤️ for the Syrian Community & The World</b><br>
+  <i>Preserving language, one token at a time.</i>
+</p>
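
A note on the inference snippet in the README's "How to Load & Use" section: for causal language models, `generate` returns the prompt tokens followed by the newly generated tokens, so decoding `outputs[0]` prints the prompt again before the reply. The sketch below shows one way to keep only the reply. The helper name `completion_only` and the stand-in token ids are illustrative assumptions, not part of the model or of the `transformers` API.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def completion_only(output_ids: Sequence[T], prompt_length: int) -> Sequence[T]:
    """Return only the tokens that follow the prompt.

    `output_ids` is one generated sequence (prompt + new tokens) and
    `prompt_length` is the number of prompt tokens at its start.
    Hypothetical helper for illustration; works on lists and on
    1-D tensors, since both support len() and slicing.
    """
    if not 0 <= prompt_length <= len(output_ids):
        raise ValueError("prompt_length must be between 0 and len(output_ids)")
    return output_ids[prompt_length:]


# Stand-in token ids (made up for the example, not real tokenizer output).
prompt_ids = [101, 102, 103]
generated = prompt_ids + [7, 8, 9]

print(completion_only(generated, len(prompt_ids)))  # prints [7, 8, 9]
```

With the README's variables, this could be called as `tokenizer.decode(completion_only(outputs[0], inputs["input_ids"].shape[1]), skip_special_tokens=True)`, assuming a single unpadded prompt as in the snippet.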