Hakim AI: Arabic Clinical LLM
Hakim AI is an open-source Arabic language model fine-tuned specifically to read medical text and answer patient questions accurately. It is based on the Cohere Command R7B architecture and has been heavily optimized to act as the brain of a Retrieval-Augmented Generation (RAG) system.
By compressing the model with AWQ 8-bit quantization, it runs efficiently on affordable GPUs (like an NVIDIA T4 or A10G) without losing its clinical accuracy or its ability to understand messy, scanned textbook text.
Model Details
- Base Model: c4ai-command-r7b-arabic-02-2025
- Language: Arabic (ar) / English (en)
- Domain: Medical / Healthcare
- Quantization: AWQ 8-bit (W8A16)
- Serving: Fully compatible with vLLM
- License: Free and Open-Source
What it does best
This model is built to sit inside a Medical RAG pipeline. Instead of relying on its own internal memory to answer medical questions, it is trained to read context (like retrieved pages from medical textbooks) and generate an answer based purely on that text.
It is specifically prompted and trained to always cite its sources. For example, it will format its answers like: "بناءً على [اسم المصدر]، صفحة [رقم الصفحة]" ("Based on [source name], page [page number]"). It is also trained to gracefully refuse questions that fall outside the medical scope.
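In practice, the retrieved chunks are formatted into a grounded prompt before generation. Here is a minimal sketch of how that assembly might look; the `build_rag_prompt` helper and the chunk fields (`source`, `page`, `text`) are illustrative assumptions, not part of the model's API:

```python
# Minimal sketch of assembling a grounded RAG prompt for Hakim AI.
# The chunk structure (source/page/text) is an assumption for illustration.

def build_rag_prompt(question: str, chunks: list[dict]) -> str:
    """Format retrieved chunks so the model can cite source and page."""
    context_blocks = []
    for chunk in chunks:
        context_blocks.append(
            f"المصدر: {chunk['source']}، صفحة {chunk['page']}\n"  # Source: ..., page ...
            f"النص: {chunk['text']}"                              # Text: ...
        )
    context = "\n\n".join(context_blocks)
    return (
        # "Use the following medical information to answer the patient's
        #  question. Cite the source name and page number."
        "استخدم المعلومات الطبية التالية للإجابة على سؤال المريض. "
        "اذكر اسم المصدر ورقم الصفحة.\n\n"
        f"{context}\n\n"
        f"السؤال: {question}\n"  # Question: ...
        "الإجابة:"               # Answer:
    )
```

The resulting string can be passed directly to `llm.generate` as shown in the serving example.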
How to run it
Since Hakim AI is quantized with AWQ, the fastest way to serve it is by using vLLM. Here is a quick example of how to load it up and structure a prompt.
```python
from vllm import LLM, SamplingParams

# Load the model with vLLM.
# Setting gpu_memory_utilization to 0.6 leaves room on the GPU
# for your embedding model.
llm = LLM(
    model="Shams03/Hakim-AI",
    quantization="awq",
    gpu_memory_utilization=0.6,
    max_model_len=4096,
)

# Example RAG prompt structure (in Arabic):
# "Use the following medical information to answer the patient's question.
#  Cite the source name and page number." -- followed by a source citation,
# a passage on kidney stones, and the patient's question in Egyptian Arabic
# ("I have terrible pain in my back radiating down to my pelvis,
#  what should I do?").
prompt = """
استخدم المعلومات الطبية التالية للإجابة على سؤال المريض. اذكر اسم المصدر ورقم الصفحة.

المصدر: الدليل العلاجي في الطب الباطني، صفحة 210
النص: حصوات الكلى تسبب ألماً شديداً يبدأ في الظهر ويمتد إلى أسفل الحوض...

السؤال: عندي وجع فظيع في ضهري من ورا ونازل على الحوض، أعمل إيه؟
الإجابة:
"""

sampling_params = SamplingParams(temperature=0.3, top_p=0.9, max_tokens=512)
outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)
```
How it was trained
We started with the Cohere Command R7B model and fine-tuned it on 50,000 real patient-doctor chats. Before training, we ran the raw data through the Gemini API to clean it up and ensure the medical Arabic sounded natural and accurate. You can check out the dataset here: Shams03/Ara-Egy-Medical-QA.
For the fine-tuning process, we used PEFT/LoRA (Rank=16, Alpha=32, Dropout=0.07) targeting the attention and MLP projections.
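With those hyperparameters, the PEFT configuration would look roughly like this. The `target_modules` list is an assumption based on "attention and MLP projections" for Command R-style decoder blocks, not a confirmed detail of the training run:

```python
from peft import LoraConfig

# LoRA hyperparameters from the fine-tuning run described above.
# target_modules is an assumption: typical attention + MLP projection
# names for Command R-style decoder blocks.
lora_config = LoraConfig(
    r=16,               # LoRA rank
    lora_alpha=32,      # scaling factor
    lora_dropout=0.07,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
    task_type="CAUSAL_LM",
)
```

This config is then passed to `get_peft_model` together with the base model to wrap only the targeted projections with trainable adapters.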
Handling messy OCR text
If you have ever tried to extract text from Arabic PDFs, you know the OCR can get incredibly messy. To prevent the model from hallucinating or breaking when reading bad text, we did something unique during the AWQ quantization phase.
Instead of using standard calibration text, we built a custom calibration dataset of 64 deliberately challenging examples. We intentionally included simulated OCR noise, inverted Arabic words, and disjointed phrasing. As a result, the compressed 8-bit model is highly resilient and handles standard pytesseract OCR artifacts gracefully.
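A calibration sample with simulated OCR noise can be produced along these lines. This is an illustrative sketch of the kinds of corruption described above (character drops, swapped word order, stray artifact characters), not the exact script we used:

```python
import random

# Illustrative sketch: inject OCR-style noise into Arabic calibration text.
# Approximates, but does not reproduce, the calibration set described above.

def add_ocr_noise(text: str, seed: int = 0, drop_rate: float = 0.05) -> str:
    rng = random.Random(seed)
    # 1) Randomly drop characters, as a bad scan might.
    noisy = "".join(c for c in text if rng.random() > drop_rate)
    # 2) Swap a pair of adjacent words to mimic inverted word order.
    words = noisy.split()
    if len(words) > 2:
        i = rng.randrange(len(words) - 1)
        words[i], words[i + 1] = words[i + 1], words[i]
    # 3) Insert a stray artifact character of the kind pytesseract emits.
    words.insert(rng.randrange(len(words) + 1), "~")
    return " ".join(words)
```

Running clean textbook passages through a function like this yields calibration text whose statistics resemble real scanned-PDF extractions.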
Real-world performance
When plugged into our local Hybrid RAG pipeline (using Qdrant and BAAI/bge-m3), the model performs really well under load:
- It easily handles 150 concurrent users at 25+ requests per second on a single g4dn.2xlarge instance.
- Most responses are generated in under 7 seconds, with the absolute worst-case scenario taking about 12 seconds.
- Compared to the baseline model, ROUGE-L scores jumped by 12%, and our MLflow tracking showed hallucination rates dropped below 3%.
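ROUGE-L scores a generated answer against a reference by the length of their longest common subsequence. The evaluation above used its own tooling; for clarity, here is a minimal self-contained sketch of the LCS-based F-measure on whitespace tokens:

```python
# Minimal ROUGE-L (LCS-based F-measure) sketch over whitespace tokens.
# Illustrative only; the reported numbers came from a separate eval setup.

def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ta in enumerate(a):
        for j, tb in enumerate(b):
            dp[i + 1][j + 1] = (
                dp[i][j] + 1 if ta == tb else max(dp[i][j + 1], dp[i + 1][j])
            )
    return dp[len(a)][len(b)]

def rouge_l(candidate: str, reference: str) -> float:
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For example, `rouge_l("a b c d", "a b x d")` gives 0.75, since the LCS (`a b d`) covers three of the four tokens on each side.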
Medical Disclaimer
Hakim AI is an AI assistant and a research tool, not a doctor. It should never be used as a replacement for professional medical diagnosis, advice, or treatment.