---
language:
- ar
- en
tags:
- code
- arabic
- gguf
- code-explanation
- text-generation
license: apache-2.0
---

# 🐪 AraCode-7B-GGUF

**The first open-source Arabic-specialized code explanation and generation model.**

AraCode-7B understands, explains, and generates code in Arabic, a capability no existing model provides with such precision. Whether you're a student learning to code, a developer working in Arabic, or a researcher exploring multilingual code AI, this model was built specifically for you.

---

## 🌟 What makes AraCode-7B different?

Existing code models (CodeLlama, StarCoder, DeepSeek-Coder) generate excellent code but communicate effectively only in English. General Arabic LLMs (Jais, ALLaM, Falcon-Arabic), on the other hand, handle Arabic beautifully but were never optimized for strict coding tasks.

**AraCode-7B bridges this gap.** It combines robust Arabic linguistic capabilities with precise, executable code generation and strict instruction adherence.

---

## 📊 Comprehensive Benchmarks

We evaluated **AraCode-7B** on both custom coding benchmarks and standardized frameworks (IFEval, AraGen), comparing its performance against recent state-of-the-art Arabic and multilingual models.

### 1. Code Generation & Understanding (Zero-Shot)

Tested on a custom Arabic benchmark measuring raw coding capability, algorithmic logic, and debugging.

| Model | Code Gen (%) | Explain (%) | Debug (%) | Translate NL→Code (%) | Total Score (%) |
|:---|:---:|:---:|:---:|:---:|:---:|
| **AraCode-7B (Ours)** | **90.0** | **92.5** | **100.0** | **94.0** | **94.12** |
| ALLaM-7B-Instruct | 45.0 | 86.2 | 100.0 | 90.0 | 80.30 |

> **Key Takeaway:** AraCode-7B achieves **90% executable code generation**. Unlike general conversational models that suffer from "excessive chatting" or infinite loops during generation, AraCode outputs clean, ready-to-run Python code efficiently.

### 2. Instruction Following (IFEval - Arabic)

Evaluated on strict instruction adherence (e.g., "output only code", "start with a specific word"). *Competitor scores are taken from published strict 0-shot IFEval (ar) benchmarks.*

| Model | IFEval (Arabic) (%) |
|:---|:---:|
| **AraCode-7B (Ours - Local Eval)** | **80.00** |
| Jais-2-8B | 37.92 |
| Qwen2.5-7B-Instruct | 33.21 |
| ALLaM-7B-Instruct-preview | 19.40 |
| Llama-3.1-8B-Instruct | 10.87 |

> **Key Takeaway:** AraCode-7B excels at instruction following. For developers, this means the model respects formatting constraints (such as returning raw code without Markdown blocks) far better than general-purpose LLMs.

### 3. Cultural Alignment & Safety (AraGen 3C3H Framework)

Evaluated on the 3C3H dimensions: correctness, completeness, conciseness, helpfulness, honesty, and harmlessness. *Competitor scores are taken from published AraGen 12-24 benchmarks.*

| Model | AraGen 3C3H Average (%) |
|:---|:---:|
| Jais-2-8B | 67.20 |
| Qwen2.5-7B-Instruct | 53.20 |
| **AraCode-7B (Ours - Local Eval)** | **50.00** |
| Llama-3.1-8B-Instruct | 40.65 |

> **Key Takeaway:** AraCode-7B maintains a healthy balance (50%) in safety and cultural alignment. As a domain-specific model optimized for logic and programming, it avoids the "alignment tax": strict conversational guardrails do not degrade its primary function as a coding assistant.

---

## 🚀 Quickstart

You can easily run this model locally using popular GGUF tools.
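Since the IFEval results above suggest the model reliably honors constraints like "output only code", responses can usually be executed as-is; still, when scripting against any GGUF runtime it is cheap to defensively strip a stray Markdown fence. The helper below is an illustrative sketch (the name `extract_code` and its behavior are this card's illustration, not part of the model or any library API):

```python
def extract_code(text: str) -> str:
    """Strip one surrounding Markdown code fence, if the model added one."""
    t = text.strip()
    if t.startswith("```") and "\n" in t:
        t = t[t.index("\n") + 1:]            # drop the opening ```lang line
        if t.rstrip().endswith("```"):
            t = t.rstrip()[:-3].rstrip()     # drop the closing fence
    return t
```

For example, `extract_code(response["choices"][0]["message"]["content"])` before writing the reply to a `.py` file.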
**Using llama.cpp:**

```bash
llama-cli -hf rahimdzx/AraCode-7B-GGUF --jinja
```

**Using Ollama:**

```bash
ollama run hf.co/rahimdzx/AraCode-7B-GGUF
```

**Using llama-cpp-python:**

```python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="rahimdzx/AraCode-7B-GGUF",
    filename="aracode-7b.Q4_K_M.gguf",
    n_gpu_layers=-1,  # offload all layers to GPU if available
    n_ctx=2048,
)

response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "اكتب دالة بايثون للبحث الثنائي (Binary Search)."}
    ]
)
print(response["choices"][0]["message"]["content"])
```

---

## 💻 Example Usage

**Input:**

> اشرح الكود التالي بالعربية:
> ```python
> def binary_search(arr, target):
>     lo, hi = 0, len(arr) - 1
>     while lo <= hi:
>         mid = (lo + hi) // 2
>         if arr[mid] == target:
>             return mid
>         elif arr[mid] < target:
>             lo = mid + 1
>         else:
>             hi = mid - 1
>     return -1
> ```

**Output:**

> ### شرح الكود باللغة العربية
>
> #### 1. **تعريف الدالة**:
> الدالة المعرفة في الكود تُدعى `binary_search`، والتي تستخدم للبحث عن عنصر معين في مصفوفة مرتبة.
>
> #### 2. **المدخلات**:
> - `arr`: هي المصفوفة المرتبة التي نبحث فيها عن العنصر.
> - `target`: هو العنصر الذي نريد العثور عليه في المصفوفة.
>
> #### 3. **المخرجات**:
> الدالة تعيد الفهرس (index) للعنصر إذا تم العثور عليه، أو `-1` إذا لم يتم العثور عليه.
>
> #### 4. **الخطوات**:
> - **تثبيت الحدود**: نبدأ بتعريف متغيرين `lo` و `hi`، حيث `lo` يمثل بداية المصفوفة (0) و `hi` يمثل النهاية (طول المصفوفة - 1).
> - **تكرار البحث**: نستخدم حلقة `while` للتحقق من أن `lo` أقل أو يساوي `hi`.
> - **حساب النقطة الوسطى**: نحسب النقطة الوسطى `mid` باستخدام عملية القسمة الصحيحة `(lo + hi) // 2`.
> - **مقارنة العنصر**:
>   - إذا كان `arr[mid] == target`، نعيد الفهرس `mid`.
>   - إذا كان أقل، نبحث في النصف الأيمن (`lo = mid + 1`).
>   - إذا كان أكبر، نبحث في النصف الأيسر (`hi = mid - 1`).
>
> #### مثال عملي:
> إذا كان لدينا المصفوفة `arr = [1, 2, 3, 4, 5, 6, 7, 8, 9]` والهدف `target = 5`، الدالة ستعيد الفهرس `4`.
>
> #### ملخص:
> تستخدم الدالة تقنية البحث الثنائي بكفاءة عالية وبتعقيد زمني O(log n)، مما يجعلها ممتازة للمصفوفات الكبيرة.

---

**GitHub:** https://github.com/Rahimdzx/AraCode-7B

## 📄 License

This model is released under the **Apache 2.0** license.
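As a quick sanity check of the worked example above, the `binary_search` function returns exactly the values the explanation describes:

```python
def binary_search(arr, target):
    # Identical to the function explained in the example above
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

arr = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(binary_search(arr, 5))   # -> 4 (the index of target 5)
print(binary_search(arr, 10))  # -> -1 (not found)
```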