# MAQA-LLaMA-4bit — Arabic Medical Q&A (4-bit GPU Inference)

> ⚠️ **Disclaimer:** This model is intended for research and informational purposes only. It is not a substitute for professional medical advice, diagnosis, or treatment. It cannot and should not be used to prescribe or recommend medications.


## Model Summary

**maqa_llama_4bit** is the 4-bit GPU inference variant of the MAQA-LLaMA family: a Llama 3 8B model fine-tuned on 430,000 real Arabic doctor–patient interactions across 20 medical specialisations. It runs on consumer-grade GPUs with 6–8 GB of VRAM.

| Property | Value |
|---|---|
| Base model | `unsloth/llama-3-8b-Instruct-bnb-4bit` (Meta Llama 3 8B Instruct) |
| Fine-tuning method | QLoRA (via Unsloth) |
| Quantisation | 4-bit (bitsandbytes, `merged_4bit_forced`) |
| Tensor types | F16 / F32 / U8 |
| Model size | 8B parameters |
| Language | Arabic 🇸🇦 |
| License | Apache 2.0 |
| Developed by | Ali Abdelrasheed |

## Model Family

| Model | Format | Size | Best for |
|---|---|---|---|
| maqa_llama | BF16 SafeTensors | ~16 GB | Research / further fine-tuning |
| **maqa_llama_4bit** (this model) | 4-bit bitsandbytes | ~5 GB | ✅ GPU inference |
| maqa_llama_4bit_GGUF | GGUF q4_k_m | 4.92 GB | CPU / local deployment |
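The size gap between the variants follows directly from bytes per weight. A rough, illustrative calculation (decimal GB; the real checkpoints add quantisation constants plus embeddings and norm layers kept in higher precision, hence ~5 GB rather than ~4 GB for the 4-bit model):

```python
# Back-of-envelope checkpoint sizes for an 8B-parameter model.
params = 8e9

bf16_gb = params * 2 / 1e9    # BF16: 2 bytes per weight
int4_gb = params * 0.5 / 1e9  # 4-bit: 0.5 bytes per weight

print(f"BF16: ~{bf16_gb:.0f} GB, 4-bit: ~{int4_gb:.0f} GB (+ overhead)")
# BF16: ~16 GB, 4-bit: ~4 GB (+ overhead)
```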

## Quick Start

### With Unsloth (recommended: ~2x faster inference)

```python
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template
from transformers import TextStreamer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "AliAbdelrasheed/maqa_llama_4bit",
    max_seq_length = 2048,
    dtype = None,         # auto-detect: float16 for T4/V100, bfloat16 for Ampere+
    load_in_4bit = True,
)

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "llama-3",
    mapping = {"role": "from", "content": "value", "user": "human", "assistant": "gpt"},
)

FastLanguageModel.for_inference(model)  # enable Unsloth's fast inference path

messages = [
    {
        # System prompt (English: "You are a professional physician with experience
        # in all fields of medicine. Answer patients' questions about diseases in a
        # formal, friendly tone, with concise, helpful answers anyone can understand.")
        "from": "system",
        "value": "أنت طبيب محترف ولديك خبرة في كل مجالات الطب. يجيب على أسئلة المرضى حول الأمراض، باستخدام لهجة رسمية وودية، وإجابات موجزة ومفيدة يسهل على الجميع فهمها."
    },
    {
        # Patient question (English: "I feel pain in my lower abdomen, in my flank,
        # and it comes and goes. What are the causes?")
        "from": "human",
        "value": "أشعر بألم أسفل البطن بخاصرتي والألم يجي على فترات، ما الأسباب؟"
    },
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True,
    return_tensors = "pt",
).to("cuda")

streamer = TextStreamer(tokenizer)  # streams tokens to stdout as they are generated
_ = model.generate(
    input_ids = inputs,
    streamer = streamer,
    max_new_tokens = 256,
    use_cache = True,
)
```

### With Transformers + bitsandbytes (no Unsloth required)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

tokenizer = AutoTokenizer.from_pretrained("AliAbdelrasheed/maqa_llama_4bit")
model = AutoModelForCausalLM.from_pretrained(
    "AliAbdelrasheed/maqa_llama_4bit",
    # quantization_config replaces the deprecated load_in_4bit kwarg
    quantization_config = BitsAndBytesConfig(load_in_4bit=True),
    device_map = "auto",
)

# "What are the symptoms of type 2 diabetes and how can it be managed?"
prompt = "ما هي أعراض مرض السكري من النوع الثاني وكيف يمكن التعامل معه؟"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Dataset — MAQA

| Property | Value |
|---|---|
| Total records | 430,000 question–answer pairs |
| Columns | Patient question · Doctor diagnosis · Doctor treatment notes |
| Sources | altibbi.com · tbeeb.net · cura.healthcare |
| Specialisations | 20 medical fields |
| Language | Modern Standard Arabic |
| Training split | 70% train / 30% evaluation |

Dataset: "Deep learning for Arabic healthcare: MedicalBot", Springer (2023); data available via Harvard Dataverse.
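The 70/30 split above can be sketched as a simple shuffled cut. This is an illustrative helper with toy data, not the project's actual preprocessing code; `split_70_30` and the record layout are assumptions:

```python
import random

def split_70_30(records, seed=42):
    """Shuffle and split records into 70% train / 30% evaluation."""
    rng = random.Random(seed)       # fixed seed so the split is reproducible
    shuffled = records[:]           # copy: don't mutate the caller's list
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * 0.7)
    return shuffled[:cut], shuffled[cut:]

pairs = [{"question": f"q{i}", "answer": f"a{i}"} for i in range(1000)]
train, eval_ = split_70_30(pairs)
print(len(train), len(eval_))  # 700 300
```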


## Training Details

### LoRA Configuration

| Hyperparameter | Value |
|---|---|
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Gradient checkpointing | "unsloth" |
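The table above maps onto Unsloth's LoRA API roughly as follows. This is a sketch, not the project's exact training script; it assumes `model` is the base model returned by `FastLanguageModel.from_pretrained` (see Quick Start):

```python
from unsloth import FastLanguageModel

model = FastLanguageModel.get_peft_model(
    model,
    r = 16,                          # LoRA rank
    lora_alpha = 32,
    lora_dropout = 0,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing = "unsloth",  # Unsloth's memory-saving variant
)
```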

### Training Arguments

| Hyperparameter | Value |
|---|---|
| Epochs | 1 |
| Batch size | 52 |
| Learning rate | 2e-4 |
| LR scheduler | Linear |
| Warmup steps | 200 |
| Optimiser | AdamW 8-bit |
| Max sequence length | 2048 |
| Training environment | Google Colab Pro |
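Expressed as a `transformers.TrainingArguments` fragment, the hyperparameters above would look roughly like this. Only the values from the table come from the source; `output_dir` and everything else is an assumption, not the original notebook:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir = "outputs",              # assumed; not stated in the card
    num_train_epochs = 1,
    per_device_train_batch_size = 52,
    learning_rate = 2e-4,
    lr_scheduler_type = "linear",
    warmup_steps = 200,
    optim = "adamw_8bit",
)
```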

Full training details and notebook available on GitHub.


## Limitations

- Not a substitute for professional medical advice or clinical diagnosis
- Cannot prescribe or recommend medications
- Trained on a sampled subset of MAQA; performance may vary across the 20 specialisations
- Optimised for Modern Standard Arabic; dialectal Arabic performance may vary
- Web-scraped data may contain noise or outdated medical information

## Developed By

Ali Abdelrasheed · Graduation project, Nile University · B.Sc. Information Technology (Big Data), Class of 2024 · 🤗 Hugging Face profile
