---
language:
- ar
- en
tags:
- code
- arabic
- gguf
- code-explanation
- text-generation
license: apache-2.0
---
# 🐪 AraCode-7B-GGUF
**The first open-source Arabic-specialized code explanation and generation model.**
AraCode-7B understands, explains, and generates code in Arabic, a combination that existing open models do not deliver with comparable precision. Whether you are a student learning to code, a developer working in Arabic, or a researcher exploring multilingual code AI, this model was built specifically for you.
---
## 🌟 What makes AraCode-7B different?
Existing code models (CodeLlama, StarCoder, DeepSeek-Coder) generate excellent code but only communicate effectively in English. On the other hand, general Arabic LLMs (Jais, ALLaM, Falcon-Arabic) handle Arabic beautifully but were never natively optimized for strict coding tasks.
**AraCode-7B bridges this gap.** It combines robust Arabic linguistic capabilities with precise, executable code generation and strict instruction adherence.
---
## 📊 Comprehensive Benchmarks
We evaluated **AraCode-7B** using both custom coding benchmarks and standardized frameworks (IFEval, AraGen) to compare its performance against the latest state-of-the-art Arabic and multilingual models.
### 1. Code Generation & Understanding (Zero-Shot)
Tested on a custom Arabic benchmark measuring raw coding capability, algorithmic logic, and debugging.
| Model | Code Gen (%) | Explain (%) | Debug (%) | Translate NL->Code (%) | Total Score |
|:---|:---:|:---:|:---:|:---:|:---:|
| **AraCode-7B (Ours)** | **90.0%** | **92.5%** | **100.0%** | **94.0%** | **94.12%** |
| ALLaM-7B-Instruct | 45.0% | 86.2% | 100.0% | 90.0% | 80.30% |
> **Key Takeaway:** AraCode-7B achieves a massive **90% in executable Code Generation**. Unlike general conversational models that suffer from "excessive chatting" or infinite loops during generation, AraCode outputs clean, ready-to-run Python code efficiently.
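The executable-generation numbers above are pass rates over hidden test cases. As an illustration only (the actual AraCode benchmark harness and its test cases are not published in this card; the `pass_rate` helper and the samples below are hypothetical), such a scorer can be sketched as:

```python
# Minimal sketch of a pass-rate scorer for generated code.
# Hypothetical harness: the real AraCode benchmark is not published here.

def pass_rate(samples):
    """samples: list of (generated_code, test_snippet) pairs.
    A test_snippet raises AssertionError when the code is wrong."""
    passed = 0
    for code, test in samples:
        namespace = {}
        try:
            exec(code, namespace)  # define the generated function
            exec(test, namespace)  # run the assertions against it
            passed += 1
        except Exception:
            pass                   # any error or failed assertion counts as a miss
    return 100.0 * passed / len(samples)

# Hypothetical samples: one correct, one buggy.
samples = [
    ("def add(a, b):\n    return a + b", "assert add(2, 3) == 5"),
    ("def add(a, b):\n    return a - b", "assert add(2, 3) == 5"),
]
print(pass_rate(samples))  # 50.0
```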
### 2. Instruction Following (IFEval - Arabic)
Evaluated on strict instruction adherence (e.g., "output only code", "start with a specific word"). *Competitor scores are based on published strict 0-shot IFEval (ar) benchmarks.*
| Model | IFEval (Arabic) (%) |
|:---|:---:|
| **AraCode-7B (Ours - Local Eval)** | **80.00%** |
| Jais-2-8B | 37.92% |
| Qwen2.5-7B-Instruct | 33.21% |
| ALLaM-7B-Instruct-preview | 19.40% |
| Llama-3.1-8B-Instruct | 10.87% |
> **Key Takeaway:** AraCode-7B excels at instruction following. For developers, this means the model respects formatting constraints (like returning raw code without Markdown blocks) far better than general-purpose LLMs.
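"Returning raw code" matters because downstream tooling can consume the reply directly. When a model does wrap its answer in Markdown fences, a small normalizer recovers the raw code either way. The helper below is a sketch, not part of any published AraCode tooling:

```python
import re

def extract_code(reply: str) -> str:
    """Return the contents of the first fenced code block if present,
    otherwise return the reply unchanged (already raw code)."""
    match = re.search(r"```(?:\w+)?\n(.*?)```", reply, flags=re.DOTALL)
    return match.group(1).rstrip() if match else reply.strip()

fenced = "```python\nprint('hi')\n```"
raw = "print('hi')"
assert extract_code(fenced) == extract_code(raw) == "print('hi')"
```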
### 3. Cultural Alignment & Safety (AraGen 3C3H Framework)
Evaluated on Cultural awareness, Helpfulness, Harmlessness, Honesty, and Humility. *Competitor scores are based on published AraGen 12-24 benchmarks.*
| Model | AraGen 3C3H Average (%) |
|:---|:---:|
| Jais-2-8B | 67.20% |
| Qwen2.5-7B-Instruct | 53.20% |
| **AraCode-7B (Ours - Local Eval)** | **50.00%** |
| Llama-3.1-8B-Instruct | 40.65% |
> **Key Takeaway:** AraCode-7B maintains a healthy balance (50%) in safety and cultural alignment. As a domain-specific model optimized for logic and programming, it successfully avoids the "alignment tax"—ensuring that strict conversational guardrails do not degrade its primary function as a coding assistant.
---
## 🚀 Quickstart
You can easily run this model locally using popular GGUF tools.
**Using llama.cpp:**
```bash
llama-cli -hf rahimdzx/AraCode-7B-GGUF --jinja
```
**Using Ollama:**
```bash
ollama run hf.co/rahimdzx/AraCode-7B-GGUF
```
**Using llama-cpp-python:**
```python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="rahimdzx/AraCode-7B-GGUF",
    filename="aracode-7b.Q4_K_M.gguf",
    n_gpu_layers=-1,  # offload all layers to GPU when available
    n_ctx=2048,
)

response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "اكتب دالة بايثون للبحث الثنائي (Binary Search)."}
    ]
)
print(response["choices"][0]["message"]["content"])
```
---
## 💻 Example Usage
**Input:**
> اشرح الكود التالي بالعربية:
> ```python
> def binary_search(arr, target):
>     lo, hi = 0, len(arr) - 1
>     while lo <= hi:
>         mid = (lo + hi) // 2
>         if arr[mid] == target:
>             return mid
>         elif arr[mid] < target:
>             lo = mid + 1
>         else:
>             hi = mid - 1
>     return -1
> ```
**Output:**
> ### شرح الكود باللغة العربية
>
> #### 1. **تعريف الدالة**:
> الدالة المعرفة في الكود تُدعى `binary_search`، والتي تستخدم للبحث عن عنصر معين في مصفوفة مرتبة.
>
> #### 2. **المدخلات**:
> - `arr`: هي المصفوفة المرتبة التي نبحث فيها عن العنصر.
> - `target`: هو العنصر الذي نريد العثور عليه في المصفوفة.
>
> #### 3. **المخرجات**:
> الدالة تعيد الفهرس (index) للعنصر إذا تم العثور عليه، أو `-1` إذا لم يتم العثور عليه.
>
> #### 4. **الخطوات**:
> - **تثبيت الحدود**: نبدأ بتعريف متغيرين `lo` و `hi`، حيث `lo` يمثل بداية المصفوفة (0) و `hi` يمثل النهاية (طول المصفوفة - 1).
> - **تكرار البحث**: نستخدم حلقة `while` للتحقق من أن `lo` أقل أو يساوي `hi`.
> - **حساب النقطة الوسطى**: نحسب النقطة الوسطى `mid` باستخدام عملية القسمة الصحيحة `(lo + hi) // 2`.
> - **مقارنة العنصر**:
> - إذا كان `arr[mid] == target`، نعيد الفهرس `mid`.
> - إذا كان أقل، نبحث في النصف الأيمن (`lo = mid + 1`).
> - إذا كان أكبر، نبحث في النصف الأيسر (`hi = mid - 1`).
>
> #### مثال عملي:
> إذا كان لدينا المصفوفة `arr = [1, 2, 3, 4, 5, 6, 7, 8, 9]` والهدف `target = 5`، الدالة ستعيد الفهرس `4`.
>
> #### ملخص:
> تستخدم الدالة تقنية البحث الثنائي بكفاءة عالية وبتعقيد زمني O(log n)، مما يجعلها ممتازة للمصفوفات الكبيرة.
**GitHub:** https://github.com/Rahimdzx/AraCode-7B
## 📄 License
This model is released under the **Apache 2.0** license.