---
language:
- ar
- en
tags:
- code
- arabic
- gguf
- code-explanation
- text-generation
license: apache-2.0
---

# 🐪 AraCode-7B-GGUF

**The first open-source Arabic-specialized code explanation and generation model.**

AraCode-7B understands, explains, and generates code in Arabic — a capability no existing model provides with such precision. Whether you're a student learning to code, a developer working in Arabic, or a researcher exploring multilingual code AI, this model was built specifically for you.

---

## 🌟 What makes AraCode-7B different?

Existing code models (CodeLlama, StarCoder, DeepSeek-Coder) generate excellent code but communicate effectively only in English. Conversely, general Arabic LLMs (Jais, ALLaM, Falcon-Arabic) handle Arabic beautifully but were never optimized specifically for coding tasks.

**AraCode-7B bridges this gap.** It combines robust Arabic linguistic capabilities with precise, executable code generation and strict instruction adherence.

---

## 📊 Comprehensive Benchmarks

We evaluated **AraCode-7B** using both custom coding benchmarks and standardized frameworks (IFEval, AraGen) to compare its performance against the latest state-of-the-art Arabic and multilingual models.

### 1. Code Generation & Understanding (Zero-Shot)
Tested on a custom Arabic benchmark measuring raw coding capability, algorithmic logic, and debugging.

| Model | Code Gen (%) | Explain (%) | Debug (%) | Translate NL->Code (%) | Total Score |
|:---|:---:|:---:|:---:|:---:|:---:|
| **AraCode-7B (Ours)** | **90.0%** | **92.5%** | **100.0%** | **94.0%** | **94.12%** |
| ALLaM-7B-Instruct | 45.0% | 86.2% | 100.0% | 90.0% | 80.30% |

> **Key Takeaway:** AraCode-7B achieves a massive **90% in executable Code Generation**. Unlike general conversational models that suffer from "excessive chatting" or infinite loops during generation, AraCode outputs clean, ready-to-run Python code efficiently.

### 2. Instruction Following (IFEval - Arabic)
Evaluated on strict instruction adherence (e.g., "output only code", "start with a specific word"). *Competitor scores are based on published strict 0-shot IFEval (ar) benchmarks.*

| Model | IFEval (Arabic) (%) |
|:---|:---:|
| **AraCode-7B (Ours - Local Eval)** | **80.00%** |
| Jais-2-8B | 37.92% |
| Qwen2.5-7B-Instruct | 33.21% |
| ALLaM-7B-Instruct-preview | 19.40% |
| Llama-3.1-8B-Instruct | 10.87% |

> **Key Takeaway:** AraCode-7B excels at instruction following. For developers, this means the model respects formatting constraints (like returning raw code without Markdown blocks) far better than general-purpose LLMs.
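This formatting discipline matters in practice: tooling that pipes a model's reply straight into a file or interpreter breaks when the reply is wrapped in Markdown fences. As an illustration only (a hypothetical helper, not part of this repo), a fallback stripper for less compliant models might look like:

```python
import re

def strip_markdown_fences(text: str) -> str:
    """Extract raw code from a model reply that may wrap it in ``` fences.

    If a fenced block is present, return its contents; otherwise return
    the text unchanged. Useful as a safety net for models that ignore
    "output only code" instructions.
    """
    match = re.search(r"```[\w+-]*\n(.*?)```", text, flags=re.DOTALL)
    return match.group(1).rstrip() if match else text.strip()

# A compliant reply is already raw code and passes through untouched:
print(strip_markdown_fences("print('hi')"))
# A fenced reply still yields usable code:
print(strip_markdown_fences("```python\nprint('hi')\n```"))
```

A model that follows the "output only code" constraint natively makes this post-processing step unnecessary.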

### 3. Cultural Alignment & Safety (AraGen 3C3H Framework)
Evaluated on Cultural awareness, Helpfulness, Harmlessness, Honesty, and Humility. *Competitor scores are based on published AraGen 12-24 benchmarks.*

| Model | AraGen 3C3H Average (%) |
|:---|:---:|
| Jais-2-8B | 67.20% |
| Qwen2.5-7B-Instruct | 53.20% |
| **AraCode-7B (Ours - Local Eval)** | **50.00%** |
| Llama-3.1-8B-Instruct | 40.65% |

> **Key Takeaway:** AraCode-7B maintains a healthy balance (50%) in safety and cultural alignment. As a domain-specific model optimized for logic and programming, it successfully avoids the "alignment tax"—ensuring that strict conversational guardrails do not degrade its primary function as a coding assistant.

---

## 🚀 Quickstart

You can easily run this model locally using popular GGUF tools.

**Using llama.cpp:**
```bash
llama-cli -hf rahimdzx/AraCode-7B-GGUF --jinja
```

**Using Ollama:**
```bash
ollama run hf.co/rahimdzx/AraCode-7B-GGUF
```

**Using llama-cpp-python:**
```python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="rahimdzx/AraCode-7B-GGUF",
    filename="aracode-7b.Q4_K_M.gguf",
    n_gpu_layers=-1,
    n_ctx=2048
)

response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "اكتب دالة بايثون للبحث الثنائي (Binary Search)."}
    ]
)
print(response["choices"][0]["message"]["content"])
```

---

## 💻 Example Usage

**Input:**
> اشرح الكود التالي بالعربية: *(Explain the following code in Arabic:)*
> ```python
> def binary_search(arr, target):
>     lo, hi = 0, len(arr) - 1
>     while lo <= hi:
>         mid = (lo + hi) // 2
>         if arr[mid] == target:
>             return mid
>         elif arr[mid] < target:
>             lo = mid + 1
>         else:
>             hi = mid - 1
>     return -1
> ```

**Output:**
> ### شرح الكود باللغة العربية
> 
> #### 1. **تعريف الدالة**:
> الدالة المعرفة في الكود تُدعى `binary_search`، والتي تستخدم للبحث عن عنصر معين في مصفوفة مرتبة.
> 
> #### 2. **المدخلات**:
> - `arr`: هي المصفوفة المرتبة التي نبحث فيها عن العنصر.
> - `target`: هو العنصر الذي نريد العثور عليه في المصفوفة.
> 
> #### 3. **المخرجات**:
> الدالة تعيد الفهرس (index) للعنصر إذا تم العثور عليه، أو `-1` إذا لم يتم العثور عليه.
> 
> #### 4. **الخطوات**:
> - **تثبيت الحدود**: نبدأ بتعريف متغيرين `lo` و `hi`، حيث `lo` يمثل بداية المصفوفة (0) و `hi` يمثل النهاية (طول المصفوفة - 1).
> - **تكرار البحث**: نستخدم حلقة `while` للتحقق من أن `lo` أقل أو يساوي `hi`.
> - **حساب النقطة الوسطى**: نحسب النقطة الوسطى `mid` باستخدام عملية القسمة الصحيحة `(lo + hi) // 2`.
> - **مقارنة العنصر**: 
>   - إذا كان `arr[mid] == target`، نعيد الفهرس `mid`.
>   - إذا كان أقل، نبحث في النصف الأيمن (`lo = mid + 1`).
>   - إذا كان أكبر، نبحث في النصف الأيسر (`hi = mid - 1`).
> 
> #### مثال عملي:
> إذا كان لدينا المصفوفة `arr = [1, 2, 3, 4, 5, 6, 7, 8, 9]` والهدف `target = 5`، الدالة ستعيد الفهرس `4`.
> 
> #### ملخص:
> تستخدم الدالة تقنية البحث الثنائي بكفاءة عالية وبتعقيد زمني O(log n)، مما يجعلها ممتازة للمصفوفات الكبيرة.

**GitHub:** https://github.com/Rahimdzx/AraCode-7B

## 📄 License
This model is released under the **Apache 2.0** license.