---
language:
- ar
- en
tags:
- code
- arabic
- gguf
- code-explanation
- text-generation
license: apache-2.0
---
# 🐪 AraCode-7B-GGUF
**The first open-source Arabic-specialized code explanation and generation model.**
AraCode-7B understands, explains, and generates code in Arabic, a combination that, to our knowledge, no existing open model provides with comparable precision. Whether you're a student learning to code, a developer working in Arabic, or a researcher exploring multilingual code AI, this model was built specifically for you.
---
## 🌟 What makes AraCode-7B different?
Existing code models (CodeLlama, StarCoder, DeepSeek-Coder) generate excellent code but only communicate effectively in English. On the other hand, general Arabic LLMs (Jais, ALLaM, Falcon-Arabic) handle Arabic beautifully but were never optimized specifically for coding tasks.
**AraCode-7B bridges this gap.** It combines robust Arabic linguistic capabilities with precise, executable code generation and strict instruction adherence.
---
## 📊 Comprehensive Benchmarks
We evaluated **AraCode-7B** using both custom coding benchmarks and standardized frameworks (IFEval, AraGen) to compare its performance against the latest state-of-the-art Arabic and multilingual models.
### 1. Code Generation & Understanding (Zero-Shot)
Tested on a custom Arabic benchmark measuring raw coding capability, algorithmic logic, and debugging.
| Model | Code Gen (%) | Explain (%) | Debug (%) | Translate NL->Code (%) | Total Score |
|:---|:---:|:---:|:---:|:---:|:---:|
| **AraCode-7B (Ours)** | **90.0%** | **92.5%** | **100.0%** | **94.0%** | **94.12%** |
| ALLaM-7B-Instruct | 45.0% | 86.2% | 100.0% | 90.0% | 80.30% |
> **Key Takeaway:** AraCode-7B achieves a massive **90% in executable Code Generation**. Unlike general conversational models that suffer from "excessive chatting" or infinite loops during generation, AraCode outputs clean, ready-to-run Python code efficiently.
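Scoring "executable" code generation typically means running each generated solution against the task's test cases and counting it as a pass only if it exits cleanly. A minimal sketch of such a harness (illustrative only; the custom benchmark's actual implementation is not published here, and `passes` is a hypothetical helper):

```python
# Hypothetical pass/fail harness for executable code generation:
# a solution passes if it runs together with its test cases and exits 0.
import subprocess
import sys
import tempfile

def passes(code: str, tests: str) -> bool:
    """Write generated code plus its tests to a file and execute it in a subprocess."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code + "\n\n" + tests + "\n")
        path = f.name
    result = subprocess.run([sys.executable, path], capture_output=True, timeout=30)
    return result.returncode == 0

generated = "def add(a, b):\n    return a + b"
print(passes(generated, "assert add(2, 3) == 5"))  # True for this toy case
```

A model that "chats excessively" or wraps its answer in prose fails this kind of check even when the underlying logic is right, which is what the Code Gen column penalizes.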
### 2. Instruction Following (IFEval - Arabic)
Evaluated on strict instruction adherence (e.g., "output only code", "start with a specific word"). *Competitor scores are based on published strict 0-shot IFEval (ar) benchmarks.*
| Model | IFEval (Arabic) (%) |
|:---|:---:|
| **AraCode-7B (Ours - Local Eval)** | **80.00%** |
| Jais-2-8B | 37.92% |
| Qwen2.5-7B-Instruct | 33.21% |
| ALLaM-7B-Instruct-preview | 19.40% |
| Llama-3.1-8B-Instruct | 10.87% |
> **Key Takeaway:** AraCode-7B excels at instruction following. For developers, this means the model respects formatting constraints (like returning raw code without Markdown blocks) far better than general-purpose LLMs.
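IFEval-style evaluation works by pairing each instruction with a programmatic check on the model's output. A minimal sketch of two such checks, matching the constraint types mentioned above (illustrative only, not the IFEval implementation):

```python
# IFEval-style verifiable checks: each instruction maps to a deterministic
# predicate over the model's raw output. (Illustrative sketch.)
FENCE = "`" * 3  # literal triple-backtick, built programmatically

def follows_code_only(output: str) -> bool:
    """Check the 'return raw code without Markdown blocks' constraint."""
    return FENCE not in output and output.strip() != ""

def starts_with(output: str, word: str) -> bool:
    """Check a 'start your answer with a specific word' constraint."""
    return output.strip().startswith(word)

print(follows_code_only("def f():\n    return 1"))                      # True
print(follows_code_only(FENCE + "python\ndef f(): ...\n" + FENCE))      # False
print(starts_with("Sure, here is the code.", "Sure"))                   # True
```

Under strict scoring, an answer passes only if every attached check returns true, which is why conversational models that add preambles or Markdown fences score so low.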
### 3. Cultural Alignment & Safety (AraGen 3C3H Framework)
Evaluated on Cultural awareness, Helpfulness, Harmlessness, Honesty, and Humility. *Competitor scores are based on published AraGen 12-24 benchmarks.*
| Model | AraGen 3C3H Average (%) |
|:---|:---:|
| Jais-2-8B | 67.20% |
| Qwen2.5-7B-Instruct | 53.20% |
| **AraCode-7B (Ours - Local Eval)** | **50.00%** |
| Llama-3.1-8B-Instruct | 40.65% |
> **Key Takeaway:** AraCode-7B scores a moderate 50% on safety and cultural alignment. As a domain-specific model optimized for logic and programming, it deliberately trades some conversational alignment for coding performance, avoiding the "alignment tax" where strict conversational guardrails degrade a model's primary function as a coding assistant.
---
## 🚀 Quickstart
You can easily run this model locally using popular GGUF tools.
**Using llama.cpp:**
```bash
llama-cli -hf rahimdzx/AraCode-7B-GGUF --jinja
```
**Using Ollama:**
```bash
ollama run hf.co/rahimdzx/AraCode-7B-GGUF
```
**Using llama-cpp-python:**
```python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="rahimdzx/AraCode-7B-GGUF",
    filename="aracode-7b.Q4_K_M.gguf",
    n_gpu_layers=-1,  # offload all layers to GPU; set to 0 for CPU-only
    n_ctx=2048,
)

response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "اكتب دالة بايثون للبحث الثنائي (Binary Search)."}
    ]
)
print(response["choices"][0]["message"]["content"])
```
---
## 💻 Example Usage
**Input:**
> اشرح الكود التالي بالعربية:
> ```python
> def binary_search(arr, target):
> lo, hi = 0, len(arr) - 1
> while lo <= hi:
> mid = (lo + hi) // 2
> if arr[mid] == target:
> return mid
> elif arr[mid] < target:
> lo = mid + 1
> else:
> hi = mid - 1
> return -1
> ```
**Output:**
> ### شرح الكود باللغة العربية
>
> #### 1. **تعريف الدالة**:
> الدالة المعرفة في الكود تُدعى `binary_search`، والتي تستخدم للبحث عن عنصر معين في مصفوفة مرتبة.
>
> #### 2. **المدخلات**:
> - `arr`: هي المصفوفة المرتبة التي نبحث فيها عن العنصر.
> - `target`: هو العنصر الذي نريد العثور عليه في المصفوفة.
>
> #### 3. **المخرجات**:
> الدالة تعيد الفهرس (index) للعنصر إذا تم العثور عليه، أو `-1` إذا لم يتم العثور عليه.
>
> #### 4. **الخطوات**:
> - **تثبيت الحدود**: نبدأ بتعريف متغيرين `lo` و `hi`، حيث `lo` يمثل بداية المصفوفة (0) و `hi` يمثل النهاية (طول المصفوفة - 1).
> - **تكرار البحث**: نستخدم حلقة `while` للتحقق من أن `lo` أقل أو يساوي `hi`.
> - **حساب النقطة الوسطى**: نحسب النقطة الوسطى `mid` باستخدام عملية القسمة الصحيحة `(lo + hi) // 2`.
> - **مقارنة العنصر**:
> - إذا كان `arr[mid] == target`، نعيد الفهرس `mid`.
> - إذا كان أقل، نبحث في النصف الأيمن (`lo = mid + 1`).
> - إذا كان أكبر، نبحث في النصف الأيسر (`hi = mid - 1`).
>
> #### مثال عملي:
> إذا كان لدينا المصفوفة `arr = [1, 2, 3, 4, 5, 6, 7, 8, 9]` والهدف `target = 5`، الدالة ستعيد الفهرس `4`.
>
> #### ملخص:
> تستخدم الدالة تقنية البحث الثنائي بكفاءة عالية وبتعقيد زمني O(log n)، مما يجعلها ممتازة للمصفوفات الكبيرة.
---
**GitHub:** https://github.com/Rahimdzx/AraCode-7B
## 📄 License
This model is released under the **Apache 2.0** license. |