---
language:
- ar
- en
tags:
- code
- arabic
- gguf
- code-explanation
- text-generation
license: apache-2.0
---

# 🐪 AraCode-7B-GGUF

**The first open-source Arabic-specialized code explanation and generation model.**

AraCode-7B understands, explains, and generates code in Arabic, a capability no existing open model provides with comparable precision. Whether you are a student learning to code, a developer working in Arabic, or a researcher exploring multilingual code AI, this model was built for you.

---

## 🌟 What makes AraCode-7B different?

Existing code models (CodeLlama, StarCoder, DeepSeek-Coder) generate excellent code but communicate effectively only in English. General Arabic LLMs (Jais, ALLaM, Falcon-Arabic), on the other hand, handle Arabic beautifully but were never optimized specifically for coding tasks.

**AraCode-7B bridges this gap.** It combines strong Arabic linguistic capabilities with precise, executable code generation and strict instruction adherence.

---

## 📊 Comprehensive Benchmarks

We evaluated **AraCode-7B** using both a custom coding benchmark and standardized frameworks (IFEval, AraGen) to compare its performance against recent state-of-the-art Arabic and multilingual models.

### 1. Code Generation & Understanding (Zero-Shot)

Tested on a custom Arabic benchmark measuring raw coding capability, algorithmic logic, and debugging.

| Model | Code Gen (%) | Explain (%) | Debug (%) | Translate NL->Code (%) | Total Score |
|:---|:---:|:---:|:---:|:---:|:---:|
| **AraCode-7B (Ours)** | **90.0%** | **92.5%** | **100.0%** | **94.0%** | **94.12%** |
| ALLaM-7B-Instruct | 45.0% | 86.2% | 100.0% | 90.0% | 80.30% |

> **Key Takeaway:** AraCode-7B achieves **90% in executable Code Generation**. Unlike general conversational models, which tend toward excessive chatter or runaway generation loops, AraCode outputs clean, ready-to-run Python code efficiently.
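
The scoring harness behind this custom benchmark is not published in this card; as a rough sketch of what execution-based "Code Gen" scoring involves, a task can be checked by running the model's output against hidden assertions. The `run_task` helper and task format below are illustrative assumptions, not the actual harness:

```python
# Minimal sketch of execution-based scoring: a generated snippet passes
# only if it defines the requested function and survives the task's tests.
# NOTE: illustrative only; not the actual AraCode benchmark harness.

def run_task(generated_code: str, test_code: str) -> bool:
    namespace = {}
    try:
        exec(generated_code, namespace)  # define the requested function
        exec(test_code, namespace)       # assertions raise on failure
        return True
    except Exception:
        return False

# Example task: "اكتب دالة تحسب مضروب عدد" (write a factorial function).
sample_output = (
    "def factorial(n):\n"
    "    return 1 if n <= 1 else n * factorial(n - 1)\n"
)
sample_tests = "assert factorial(5) == 120\nassert factorial(0) == 1"

print(run_task(sample_output, sample_tests))  # True: the snippet passes
```

Under this scheme, a per-category score is simply the fraction of tasks whose generated code passes its tests.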

### 2. Instruction Following (IFEval - Arabic)

Evaluated on strict instruction adherence (e.g., "output only code", "start with a specific word"). *Competitor scores are based on published strict 0-shot IFEval (ar) benchmarks.*

| Model | IFEval (Arabic) (%) |
|:---|:---:|
| **AraCode-7B (Ours - Local Eval)** | **80.00%** |
| Jais-2-8B | 37.92% |
| Qwen2.5-7B-Instruct | 33.21% |
| ALLaM-7B-Instruct-preview | 19.40% |
| Llama-3.1-8B-Instruct | 10.87% |

> **Key Takeaway:** AraCode-7B excels at instruction following. For developers, this means the model respects formatting constraints (like returning raw code without Markdown blocks) far better than general-purpose LLMs.
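
Even so, a common defensive step in any pipeline is to strip Markdown fences from a response before executing it. A small helper along these lines (an illustrative assumption of this card, not part of the model's tooling) might look like:

```python
import re

def strip_code_fences(text: str) -> str:
    """Return the body of a ```lang ... ``` fence, or the text unchanged."""
    match = re.search(r"```[\w+-]*\n(.*?)```", text, re.DOTALL)
    return match.group(1).strip() if match else text.strip()

fenced = "```python\nprint('مرحبا بالعالم')\n```"
print(strip_code_fences(fenced))   # print('مرحبا بالعالم')
print(strip_code_fences("x = 1"))  # x = 1
```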

### 3. Cultural Alignment & Safety (AraGen 3C3H Framework)

Evaluated on the 3C3H dimensions (Correctness, Completeness, Conciseness, Helpfulness, Honesty, Harmlessness). *Competitor scores are based on published AraGen 12-24 benchmarks.*

| Model | AraGen 3C3H Average (%) |
|:---|:---:|
| Jais-2-8B | 67.20% |
| Qwen2.5-7B-Instruct | 53.20% |
| **AraCode-7B (Ours - Local Eval)** | **50.00%** |
| Llama-3.1-8B-Instruct | 40.65% |

> **Key Takeaway:** AraCode-7B maintains a reasonable balance (50%) in safety and cultural alignment. As a domain-specific model optimized for logic and programming, it avoids a heavy "alignment tax": strict conversational guardrails do not degrade its primary function as a coding assistant.

---

## 🚀 Quickstart

You can run this model locally using popular GGUF tools.

**Using llama.cpp:**
```bash
llama-cli -hf rahimdzx/AraCode-7B-GGUF --jinja
```

**Using Ollama:**
```bash
ollama run hf.co/rahimdzx/AraCode-7B-GGUF
```

**Using llama-cpp-python:**
```python
from llama_cpp import Llama

# Download the quantized model file from the Hub and load it.
llm = Llama.from_pretrained(
    repo_id="rahimdzx/AraCode-7B-GGUF",
    filename="aracode-7b.Q4_K_M.gguf",
    n_gpu_layers=-1,  # offload all layers to the GPU when available
    n_ctx=2048,
)

# Ask (in Arabic) for a Python binary search function.
response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "اكتب دالة بايثون للبحث الثنائي (Binary Search)."}
    ]
)
print(response["choices"][0]["message"]["content"])
```

---

## 💻 Example Usage

**Input:**
> اشرح الكود التالي بالعربية:
> ```python
> def binary_search(arr, target):
>     lo, hi = 0, len(arr) - 1
>     while lo <= hi:
>         mid = (lo + hi) // 2
>         if arr[mid] == target:
>             return mid
>         elif arr[mid] < target:
>             lo = mid + 1
>         else:
>             hi = mid - 1
>     return -1
> ```

**Output:**
> ### شرح الكود باللغة العربية
>
> #### 1. **تعريف الدالة**:
> الدالة المعرفة في الكود تُدعى `binary_search`، والتي تستخدم للبحث عن عنصر معين في مصفوفة مرتبة.
>
> #### 2. **المدخلات**:
> - `arr`: هي المصفوفة المرتبة التي نبحث فيها عن العنصر.
> - `target`: هو العنصر الذي نريد العثور عليه في المصفوفة.
>
> #### 3. **المخرجات**:
> الدالة تعيد الفهرس (index) للعنصر إذا تم العثور عليه، أو `-1` إذا لم يتم العثور عليه.
>
> #### 4. **الخطوات**:
> - **تثبيت الحدود**: نبدأ بتعريف متغيرين `lo` و `hi`، حيث `lo` يمثل بداية المصفوفة (0) و `hi` يمثل النهاية (طول المصفوفة - 1).
> - **تكرار البحث**: نستخدم حلقة `while` للتحقق من أن `lo` أقل أو يساوي `hi`.
> - **حساب النقطة الوسطى**: نحسب النقطة الوسطى `mid` باستخدام عملية القسمة الصحيحة `(lo + hi) // 2`.
> - **مقارنة العنصر**:
>   - إذا كان `arr[mid] == target`، نعيد الفهرس `mid`.
>   - إذا كان أقل، نبحث في النصف الأيمن (`lo = mid + 1`).
>   - إذا كان أكبر، نبحث في النصف الأيسر (`hi = mid - 1`).
>
> #### مثال عملي:
> إذا كان لدينا المصفوفة `arr = [1, 2, 3, 4, 5, 6, 7, 8, 9]` والهدف `target = 5`، الدالة ستعيد الفهرس `4`.
>
> #### ملخص:
> تستخدم الدالة تقنية البحث الثنائي بكفاءة عالية وبتعقيد زمني O(log n)، مما يجعلها ممتازة للمصفوفات الكبيرة.
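
As a quick sanity check, the behavior described in the explanation above can be reproduced by running the same function directly:

```python
def binary_search(arr, target):
    # Classic iterative binary search over a sorted list.
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

arr = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(binary_search(arr, 5))   # 4, matching the worked example
print(binary_search(arr, 10))  # -1, target absent
```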

**GitHub:** https://github.com/Rahimdzx/AraCode-7B

## 📄 License

This model is released under the **Apache 2.0** license.