---
language:
- ar
- en
tags:
- code
- arabic
- gguf
- code-explanation
- text-generation
license: apache-2.0
---

# 🐪 AraCode-7B-GGUF

**The first open-source Arabic-specialized code explanation and generation model.**

AraCode-7B understands, explains, and generates code in Arabic, a capability no existing open model provides with comparable precision. Whether you are a student learning to code, a developer working in Arabic, or a researcher exploring multilingual code AI, this model was built for you.

---

## 🌟 What makes AraCode-7B different?

Existing code models (CodeLlama, StarCoder, DeepSeek-Coder) generate excellent code but communicate effectively only in English. General Arabic LLMs (Jais, ALLaM, Falcon-Arabic), on the other hand, handle Arabic beautifully but were never optimized specifically for coding tasks.

**AraCode-7B bridges this gap.** It combines strong Arabic linguistic capabilities with precise, executable code generation and strict instruction adherence.

---

## 📊 Comprehensive Benchmarks

We evaluated **AraCode-7B** using both a custom coding benchmark and standardized frameworks (IFEval, AraGen) to compare its performance against recent state-of-the-art Arabic and multilingual models.

### 1. Code Generation & Understanding (Zero-Shot)

Tested on a custom Arabic benchmark measuring raw coding capability, algorithmic logic, and debugging.

| Model | Code Gen (%) | Explain (%) | Debug (%) | Translate NL->Code (%) | Total Score |
|:---|:---:|:---:|:---:|:---:|:---:|
| **AraCode-7B (Ours)** | **90.0%** | **92.5%** | **100.0%** | **94.0%** | **94.12%** |
| ALLaM-7B-Instruct | 45.0% | 86.2% | 100.0% | 90.0% | 80.30% |

> **Key Takeaway:** AraCode-7B achieves **90% in executable Code Generation**. Unlike general conversational models, which tend toward excessive chatter or runaway generation loops, AraCode outputs clean, ready-to-run Python code efficiently.
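
The scoring harness behind this custom benchmark is not published in this card; as a rough sketch of what execution-based "Code Gen" scoring involves, a task can be checked by running the model's output against hidden assertions. The `run_task` helper and task format below are illustrative assumptions, not the actual harness:

```python
# Minimal sketch of execution-based scoring: a generated snippet passes
# only if it defines the requested function and survives the task's tests.
# NOTE: illustrative only; not the actual AraCode benchmark harness.

def run_task(generated_code: str, test_code: str) -> bool:
    namespace = {}
    try:
        exec(generated_code, namespace)  # define the requested function
        exec(test_code, namespace)       # assertions raise on failure
        return True
    except Exception:
        return False

# Example task: "اكتب دالة تحسب مضروب عدد" (write a factorial function).
sample_output = (
    "def factorial(n):\n"
    "    return 1 if n <= 1 else n * factorial(n - 1)\n"
)
sample_tests = "assert factorial(5) == 120\nassert factorial(0) == 1"

print(run_task(sample_output, sample_tests))  # True: the snippet passes
```

Under this scheme, a per-category score is simply the fraction of tasks whose generated code passes its tests.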

### 2. Instruction Following (IFEval - Arabic)

Evaluated on strict instruction adherence (e.g., "output only code", "start with a specific word"). *Competitor scores are based on published strict 0-shot IFEval (ar) benchmarks.*

| Model | IFEval (Arabic) (%) |
|:---|:---:|
| **AraCode-7B (Ours - Local Eval)** | **80.00%** |
| Jais-2-8B | 37.92% |
| Qwen2.5-7B-Instruct | 33.21% |
| ALLaM-7B-Instruct-preview | 19.40% |
| Llama-3.1-8B-Instruct | 10.87% |

> **Key Takeaway:** AraCode-7B excels at instruction following. For developers, this means the model respects formatting constraints (like returning raw code without Markdown blocks) far better than general-purpose LLMs.
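
Even so, a common defensive step in any pipeline is to strip Markdown fences from a response before executing it. A small helper along these lines (an illustrative assumption of this card, not part of the model's tooling) might look like:

```python
import re

def strip_code_fences(text: str) -> str:
    """Return the body of a ```lang ... ``` fence, or the text unchanged."""
    match = re.search(r"```[\w+-]*\n(.*?)```", text, re.DOTALL)
    return match.group(1).strip() if match else text.strip()

fenced = "```python\nprint('مرحبا بالعالم')\n```"
print(strip_code_fences(fenced))   # print('مرحبا بالعالم')
print(strip_code_fences("x = 1"))  # x = 1
```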

### 3. Cultural Alignment & Safety (AraGen 3C3H Framework)

Evaluated on the 3C3H dimensions (Correctness, Completeness, Conciseness, Helpfulness, Honesty, Harmlessness). *Competitor scores are based on published AraGen 12-24 benchmarks.*

| Model | AraGen 3C3H Average (%) |
|:---|:---:|
| Jais-2-8B | 67.20% |
| Qwen2.5-7B-Instruct | 53.20% |
| **AraCode-7B (Ours - Local Eval)** | **50.00%** |
| Llama-3.1-8B-Instruct | 40.65% |

> **Key Takeaway:** AraCode-7B maintains a reasonable balance (50%) in safety and cultural alignment. As a domain-specific model optimized for logic and programming, it avoids a heavy "alignment tax": strict conversational guardrails do not degrade its primary function as a coding assistant.

---

## 🚀 Quickstart

You can run this model locally using popular GGUF tools.

**Using llama.cpp:**
```bash
llama-cli -hf rahimdzx/AraCode-7B-GGUF --jinja
```

**Using Ollama:**
```bash
ollama run hf.co/rahimdzx/AraCode-7B-GGUF
```

**Using llama-cpp-python:**
```python
from llama_cpp import Llama

# Download the quantized model file from the Hub and load it.
llm = Llama.from_pretrained(
    repo_id="rahimdzx/AraCode-7B-GGUF",
    filename="aracode-7b.Q4_K_M.gguf",
    n_gpu_layers=-1,  # offload all layers to the GPU when available
    n_ctx=2048,
)

# Ask (in Arabic) for a Python binary search function.
response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "اكتب دالة بايثون للبحث الثنائي (Binary Search)."}
    ]
)
print(response["choices"][0]["message"]["content"])
```

---

## 💻 Example Usage

**Input:**
> اشرح الكود التالي بالعربية:
> ```python
> def binary_search(arr, target):
>     lo, hi = 0, len(arr) - 1
>     while lo <= hi:
>         mid = (lo + hi) // 2
>         if arr[mid] == target:
>             return mid
>         elif arr[mid] < target:
>             lo = mid + 1
>         else:
>             hi = mid - 1
>     return -1
> ```

**Output:**
> ### شرح الكود باللغة العربية
>
> #### 1. **تعريف الدالة**:
> الدالة المعرفة في الكود تُدعى `binary_search`، والتي تستخدم للبحث عن عنصر معين في مصفوفة مرتبة.
>
> #### 2. **المدخلات**:
> - `arr`: هي المصفوفة المرتبة التي نبحث فيها عن العنصر.
> - `target`: هو العنصر الذي نريد العثور عليه في المصفوفة.
>
> #### 3. **المخرجات**:
> الدالة تعيد الفهرس (index) للعنصر إذا تم العثور عليه، أو `-1` إذا لم يتم العثور عليه.
>
> #### 4. **الخطوات**:
> - **تثبيت الحدود**: نبدأ بتعريف متغيرين `lo` و `hi`، حيث `lo` يمثل بداية المصفوفة (0) و `hi` يمثل النهاية (طول المصفوفة - 1).
> - **تكرار البحث**: نستخدم حلقة `while` للتحقق من أن `lo` أقل أو يساوي `hi`.
> - **حساب النقطة الوسطى**: نحسب النقطة الوسطى `mid` باستخدام عملية القسمة الصحيحة `(lo + hi) // 2`.
> - **مقارنة العنصر**:
>   - إذا كان `arr[mid] == target`، نعيد الفهرس `mid`.
>   - إذا كان أقل، نبحث في النصف الأيمن (`lo = mid + 1`).
>   - إذا كان أكبر، نبحث في النصف الأيسر (`hi = mid - 1`).
>
> #### مثال عملي:
> إذا كان لدينا المصفوفة `arr = [1, 2, 3, 4, 5, 6, 7, 8, 9]` والهدف `target = 5`، الدالة ستعيد الفهرس `4`.
>
> #### ملخص:
> تستخدم الدالة تقنية البحث الثنائي بكفاءة عالية وبتعقيد زمني O(log n)، مما يجعلها ممتازة للمصفوفات الكبيرة.
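
As a quick sanity check, the behavior described in the explanation above can be reproduced by running the same function directly:

```python
def binary_search(arr, target):
    # Classic iterative binary search over a sorted list.
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

arr = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(binary_search(arr, 5))   # 4, matching the worked example
print(binary_search(arr, 10))  # -1, target absent
```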

**GitHub:** https://github.com/Rahimdzx/AraCode-7B

## 📄 License

This model is released under the **Apache 2.0** license.