theborg321 commited on
Commit
3dc8186
·
verified ·
1 Parent(s): f56387f

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +128 -3
README.md CHANGED
@@ -1,3 +1,128 @@
1
- ---
2
- license: cc-by-sa-4.0
3
- ---
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: cc-by-sa-4.0
3
+ ---
4
+ # 🇸🇾 Syrian_Qwen-3.5: The First Syrian Dialect LLM
5
+
6
+ <p align="center">
7
+ <img src="https://img.shields.io/badge/License-CC--BY--SA--4.0-green.svg" alt="License">
8
+ <img src="https://img.shields.io/badge/Language-Arabic_(Syrian_Dialect)-red.svg" alt="Language">
9
+ <img src="https://img.shields.io/badge/Base_Model-Qwen_3.5-blue.svg" alt="Base Model">
10
+ <img src="https://img.shields.io/badge/Task-Text_Generation-orange.svg" alt="Task">
11
+ </p>
12
+
13
+ ---
14
+
15
+ ## 🌟 Introduction
16
+
17
+ **Welcome to the future of Levantine AI.**
18
+
19
+ We are thrilled to introduce **Syrian_Qwen-3.5**, the **first series of Large Language Models specifically fine-tuned to understand and generate the Syrian Arabic Dialect.**
20
+
21
+ While most Arabic LLMs focus on Modern Standard Arabic (MSA/Fusha), they often fail to capture the nuance, warmth, and cultural specificity of local dialects. We changed that. By leveraging the powerful **Qwen 3.5** architecture, we have fine-tuned this model not just to "speak Arabic," but to speak **Syrian**.
22
+
23
+ From the streets of Syria, this model understands the local idioms, slang, and cultural context that define Syrian communication.
24
+
25
+ ### 🚀 Key Features
26
+ * 🗣️ **Native Dialect:** Trained specifically on Syrian colloquial data, not just MSA.
27
+ * 🧠 **Smart & Small:** Built on efficient Qwen small-model architecture for fast inference.
28
+ * 🤝 **Community First:** Open weights with a strong commitment to the open-source ecosystem.
29
+
30
+ ---
31
+
32
+ ## 🛠️ How to Load & Use
33
+
34
+ Getting started with **Syrian_Qwen-3.5** is seamless. You can load it using the standard `transformers` library.
35
+
36
+ ### Requirements
37
+ ```bash
38
+ pip install transformers torch accelerate
39
+ ```
40
+
41
+ ### Inference Code
42
+ ```python
43
+ from transformers import AutoModelForCausalLM, AutoTokenizer
44
+
45
+ model_name = "theBOrg32/syrian_qwen_3.5_4B"
46
+
47
+ # Load tokenizer and model
48
+ tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
49
+ model = AutoModelForCausalLM.from_pretrained(
50
+ model_name,
51
+ device_map="auto",
52
+ trust_remote_code=True,
53
+ torch_dtype="auto"
54
+ )
55
+
56
+ # Prepare your Syrian Dialect prompt
57
+ prompt = "شو رأيك بالمطعم الجديد بدمشق؟"
58
+
59
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
60
+
61
+ # Generate
62
+ outputs = model.generate(
63
+ **inputs,
64
+ max_new_tokens=256,
65
+ temperature=0.5,
66
+ do_sample=True
67
+ )
68
+
69
+ response = tokenizer.decode(outputs[0], skip_special_tokens=True)
70
+ print(response)
71
+ ```
72
+
73
+ ---
74
+
75
+ ## 💬 Examples & Benchmarks
76
+
77
+ We tested **Syrian_Qwen-3.5** against general-purpose models. Notice how our model captures the *tone* and *dialect* while others revert to formal Arabic or English.
78
+
79
+ | Prompt (Syrian Dialect) | Syrian_Qwen-3.5 (Ours) | OpenAI | Gemini |
80
+ | :--- | :--- | :--- | :--- |
81
+ | **"شو بتعرف عن المنطق ؟"** | . المنطق بيستخدم استدلالات مختلفة متل الاستدلال الاستنتاجي والاستدلال التحليلي. الاستدلال الاستنتاجي بيستخدم القواعد والمبادئ العامة ليوصل لنتائج محددة، بينما الاستدلال التحليلي بيقوم بتحليل الأفكار والمعلومات ليوصل لفهم أعمق. المنطق كمان بيستخدم استدلالات شكلية وغير شكلية. الاستدلالات الشكلية بتستخدم قواعد ثابتة ومحددة، بينما الاستدلالات غير الشكلية بتتغير حسب السياق والمعلومات المتاحة. المنطق كمان بيستخدم استدلالات منطقية بيزيد من فهمنا للعالم اللي حولنا، | المنطق هو فرع من **Logic** يدرس قواعد التفكير الصحيح والاستدلال،<br>وكيف نميّز الحجج الصحيحة من الخاطئة.<br>يستخدم في الفلسفة والرياضيات وعلوم الحاسوب لتحليل البراهين وبناء استنتاجات دقيقة. | المنطق هو العلم الذي يدرس القواعد العامة للتفكير الصحيح، ويهدف إلى حماية العقل من الوقوع في الخطأ من خلال وضع معايير تميز بين الحجج السليمة والباطلة.|
82
+
83
+ > Note: the full response is too long for all models so we trimmed, the idea here is just to show that even the SOTA models cannot handle the dialect.
84
+ ---
85
+
86
+ ## ⚖️ License & Commercial Use
87
+
88
+ We are strong believers in the **Open Source Community**. To ensure this technology remains accessible and beneficial to everyone, we have chosen a **Copyleft License**.
89
+
90
+ ### 📄 License: CC-BY-SA-4.0
91
+ This model is released under the **Creative Commons Attribution-ShareAlike 4.0 International License**.
92
+
93
+ ### 🤝 Usage Guidelines
94
+ 1. **✅ Open Source Projects:** You are free to use, fine-tune, and distribute this model in your projects, **provided** your project also remains open-source and references **Syrian_Qwen-3.5**.
95
+ 2. **✅ Commercial Use:** Commercial usage is **allowed** under the terms of CC-BY-SA-4.0 (your derivative models must remain open).
96
+ 3. **🔒 Closed Source / Proprietary:** If you wish to integrate this model (or a fine-tuned version) into a **closed-source product** without releasing your weights/code, **you must obtain prior approval.**
97
+
98
+ > **📧 For Closed-Source Licensing:**
99
+ > Please contact us at **[info2@the-borg.ru](mailto:info2@the-borg.ru)** to discuss agreements that respect our open-source mission.
100
+
101
+ ---
102
+
103
+ ## 🙏 Credits & Acknowledgments
104
+
105
+ This model would not be possible without the foundational work of the **Qwen Team** at Alibaba Cloud. We stand on the shoulders of giants.
106
+
107
+ * **Base Model:** [Qwen 3.5](https://huggingface.co/Qwen)
108
+ * **Fine-Tuning & Alignment:** The Borg Organization
109
+ * **Dataset:** Curated Syrian Dialect Corpus
110
+
111
+ ### Citation
112
+ If you use **Syrian_Qwen-3.5** in your research or project, please cite us:
113
+
114
+ ```bibtex
115
+ @misc{syrian_qwen_2026,
116
+ title={Syrian_Qwen-3.5: The First Syrian Dialect Large Language Model},
117
+ author={The Borg Organization},
118
+ year={2026},
119
+ license={CC-BY-SA-4.0}
120
+ }
121
+ ```
122
+
123
+ ---
124
+
125
+ <p align="center">
126
+ <b>Built with ❤️ for the Syrian Community & The World</b><br>
127
+ <i>Preserving language, one token at a time.</i>
128
+ </p>