Instructions for using AliAbdelrasheed/maqa_llama with libraries, inference servers, and local apps.
- Libraries
- Transformers
How to use AliAbdelrasheed/maqa_llama with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="AliAbdelrasheed/maqa_llama")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly (Llama 3 8B Instruct is a text-only causal LM)
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("AliAbdelrasheed/maqa_llama")
model = AutoModelForCausalLM.from_pretrained("AliAbdelrasheed/maqa_llama")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens (skip the prompt)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Local Apps
- vLLM
How to use AliAbdelrasheed/maqa_llama with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "AliAbdelrasheed/maqa_llama"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "AliAbdelrasheed/maqa_llama",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Use Docker
```shell
docker model run hf.co/AliAbdelrasheed/maqa_llama
```
- SGLang
How to use AliAbdelrasheed/maqa_llama with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "AliAbdelrasheed/maqa_llama" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "AliAbdelrasheed/maqa_llama",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "AliAbdelrasheed/maqa_llama" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "AliAbdelrasheed/maqa_llama",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
- Unsloth Studio
How to use AliAbdelrasheed/maqa_llama with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```shell
curl -fsSL https://unsloth.ai/install.sh | sh

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for AliAbdelrasheed/maqa_llama to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for AliAbdelrasheed/maqa_llama to start chatting
```
Using HuggingFace Spaces for Unsloth
No setup required: open https://huggingface.co/spaces/unsloth/studio in your browser and search for AliAbdelrasheed/maqa_llama to start chatting.
Load model with FastModel
```shell
pip install unsloth
```
```python
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="AliAbdelrasheed/maqa_llama",
    max_seq_length=2048,
)
```
- Docker Model Runner
How to use AliAbdelrasheed/maqa_llama with Docker Model Runner:
```shell
docker model run hf.co/AliAbdelrasheed/maqa_llama
```
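The vLLM and SGLang servers described above expose an OpenAI-compatible `/v1/chat/completions` endpoint, so the curl calls can also be made from Python using only the standard library. This is a minimal sketch that assumes a server is already running on `localhost:8000` (vLLM's default; use port 30000 for SGLang) and that the served chat template accepts standard `role`/`content` messages, as the curl examples do. `build_payload` and `ask` are illustrative helper names, not part of any library.

```python
import json
import urllib.request

# Assumption: vLLM on its default port; change to :30000 for the SGLang server.
BASE_URL = "http://localhost:8000/v1/chat/completions"

# System prompt copied from the "Chat Template & System Prompt" section.
SYSTEM_PROMPT = "أنت طبيب محترف ولديك خبرة في كل مجالات الطب. يجيب على أسئلة المرضى حول الأمراض، باستخدام لهجة رسمية وودية، وإجابات موجزة ومفيدة يسهل على الجميع فهمها."


def build_payload(question: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat completion request body (hypothetical helper)."""
    return {
        "model": "AliAbdelrasheed/maqa_llama",
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
        "max_tokens": max_tokens,
    }


def ask(question: str, url: str = BASE_URL) -> str:
    """POST the request to the running server and return the reply text."""
    # ensure_ascii=False keeps the Arabic text readable in the JSON body
    body = json.dumps(build_payload(question), ensure_ascii=False).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]
```

With a server running, `print(ask("ما هي أعراض مرض السكري من النوع الثاني؟"))` sends the Quick Start question ("What are the symptoms of type 2 diabetes?") together with the training system prompt.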
MAQA-LLaMA — Arabic Medical Question Answering (Base Model)
⚠️ Disclaimer: This model is intended for research and informational purposes only. It is not a substitute for professional medical advice, diagnosis, or treatment. It cannot and should not be used to prescribe or recommend medications.
Model Summary
maqa_llama is a Meta Llama 3 8B Instruct model fine-tuned on a sampled subset of the MAQA dataset (Medical Arabic Questions & Answers), which contains 430,000 real doctor-patient interactions across 20 medical specialisations, sourced from Arabic medical platforms.
This is the unquantised release, with the LoRA adapters merged into the base weights and saved in 16-bit (BF16) precision. It is best suited for research and further fine-tuning. For deployment on consumer hardware, see the quantised variants below.
| Property | Value |
|---|---|
| Base model | unsloth/llama-3-8b-Instruct-bnb-4bit (Meta Llama 3 8B Instruct) |
| Fine-tuning method | QLoRA (via Unsloth) |
| Precision | BF16 (merged 16-bit) |
| Model size | 8B parameters |
| Language | Arabic 🇸🇦 |
| License | Apache 2.0 |
| Developed by | Ali Abdelrasheed |
Model Family
| Model | Format | Size | Best for |
|---|---|---|---|
| maqa_llama ← this model | BF16 SafeTensors | ~16 GB | Research / further fine-tuning |
| maqa_llama_4bit | 4-bit (bitsandbytes) | ~5 GB | GPU inference |
| maqa_llama_4bit_GGUF | GGUF q4_k_m | 4.92 GB | CPU / local deployment |
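The sizes in this table follow roughly from parameter count times bytes per weight. A back-of-the-envelope sketch, assuming about 8.0 billion parameters and ignoring activations, KV cache, and file metadata:

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Assumption: ~8.0e9 parameters (Llama 3 8B); overheads are ignored.
PARAMS = 8.0e9

BYTES_PER_PARAM = {
    "bf16": 2.0,  # 16 bits per weight
    "4bit": 0.5,  # 4 bits per weight, before quantisation scales/metadata
}


def weight_gb(params: float, fmt: str) -> float:
    """Approximate weight size in gigabytes (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM[fmt] / 1e9


for fmt in BYTES_PER_PARAM:
    print(f"{fmt}: ~{weight_gb(PARAMS, fmt):.1f} GB")
```

This gives about 16 GB for BF16 and about 4 GB for pure 4-bit weights. The listed 4-bit sizes are higher because quantisation scales add overhead and some tensors (for example the embeddings) are typically kept at higher precision.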
Dataset — MAQA
The model was trained on MAQA (Medical Arabic Questions & Answers), the largest Arabic medical Q&A dataset available for NLP research.
| Property | Value |
|---|---|
| Total records | 430,000 question-answer pairs |
| Columns | question (patient) · answer (doctor diagnosis + treatment notes) |
| Medical specialisations | 20 (cardiology, dermatology, neurology, gastroenterology, paediatrics, and more) |
| Sources | altibbi.com · tbeeb.net · cura.healthcare |
| Language | Arabic (Modern Standard Arabic) |
| Quality | All questions unique and cleaned (not stemmed) |
| Training split | 70% train / 30% evaluation |
Dataset reference: "Deep learning for Arabic healthcare: MedicalBot", Social Network Analysis and Mining, Springer (2023). Available on Harvard Dataverse.
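The card does not include its preprocessing script. As a minimal, hypothetical sketch, the following turns (question, answer) rows into ShareGPT-style conversations shaped like the Quick Start messages and splits them 70/30. The helper names, the seed (reusing the LoRA random state 3407), and the toy rows are illustrative assumptions, not the author's pipeline.

```python
import random


def to_conversation(row: dict, system_prompt: str) -> dict:
    """Map one MAQA row to a ShareGPT-style conversation (hypothetical layout)."""
    return {
        "conversations": [
            {"from": "system", "value": system_prompt},
            {"from": "human", "value": row["question"].strip()},  # patient
            {"from": "gpt", "value": row["answer"].strip()},      # doctor
        ]
    }


def train_eval_split(rows: list, train_frac: float = 0.7, seed: int = 3407):
    """Shuffle a copy of rows deterministically and split into train/eval."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


# Toy rows standing in for MAQA records (not real dataset content).
rows = [{"question": f"question {i}", "answer": f"answer {i}"} for i in range(10)]
train, evaluation = train_eval_split([to_conversation(r, "SYSTEM") for r in rows])
print(len(train), len(evaluation))
```

Shuffling with a fixed seed keeps the split reproducible across runs, which matters when evaluation is repeated every 500 steps.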
Training Details
LoRA Configuration
| Hyperparameter | Value |
|---|---|
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0 (optimised) |
| Bias | none (optimised) |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Gradient checkpointing | "unsloth" (Unsloth reports ~30% less VRAM) |
| Random state | 3407 |
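As a worked example of what this configuration implies, the snippet below counts the trainable LoRA parameters and the alpha/r scaling factor. It assumes the public Llama 3 8B dimensions (hidden size 4096, MLP size 14336, 32 layers, 8 key/value heads of dimension 128), which this card does not state, and it ignores any embedding changes from the added special tokens.

```python
# Assumption: Llama 3 8B dimensions from Meta's published config.
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 8 * 128  # grouped-query attention: k_proj/v_proj output 1024
LAYERS = 32
R, ALPHA = 16, 32  # LoRA rank and alpha from the table above

# (in_features, out_features) of each targeted projection in one decoder layer
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_f: int, out_f: int, r: int) -> int:
    """LoRA adds A (r x in_f) and B (out_f x r) beside the frozen weight."""
    return r * in_f + out_f * r


per_layer = sum(lora_params(i, o, R) for i, o in TARGETS.values())
total = per_layer * LAYERS
scaling = ALPHA / R  # the low-rank update B @ A is multiplied by alpha / r

print(f"trainable LoRA params: {total:,}")  # prints 41,943,040
print(f"scaling factor alpha/r: {scaling}")  # prints 2.0
```

Under these assumptions roughly 42M parameters are trained, about half a percent of the 8B total, which is what makes single-GPU QLoRA fine-tuning feasible.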
Training Arguments
| Hyperparameter | Value |
|---|---|
| Epochs | 1 |
| Batch size (per device) | 52 |
| Gradient accumulation steps | 1 |
| Learning rate | 2e-4 |
| LR scheduler | Linear |
| Warmup steps | 200 |
| Optimiser | AdamW 8-bit |
| Max sequence length | 2048 |
| Evaluation strategy | Every 500 steps |
| Training environment | Google Colab Pro |
| Framework | Unsloth + HuggingFace TRL (SFTTrainer) |
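The learning-rate settings in this table can be sketched in plain Python. This mirrors the linear warmup-then-decay rule behind Hugging Face's `"linear"` scheduler, assuming a single device. The 100,000-row subset size is hypothetical, since the card does not say how many rows were sampled.

```python
import math

BASE_LR = 2e-4
WARMUP = 200
BATCH, ACCUM = 52, 1  # per-device batch size and gradient accumulation


def total_steps(n_train: int, epochs: int = 1) -> int:
    """Optimiser steps for n_train examples on one device."""
    return math.ceil(n_train / (BATCH * ACCUM)) * epochs


def linear_lr(step: int, total: int) -> float:
    """Ramp linearly from 0 to BASE_LR over WARMUP steps, then decay to 0."""
    if step < WARMUP:
        return BASE_LR * step / max(1, WARMUP)
    return BASE_LR * max(0.0, (total - step) / max(1, total - WARMUP))


# Hypothetical subset size (not stated in the card).
steps = total_steps(100_000)
for s in (0, 100, 200, steps // 2, steps):
    print(s, f"{linear_lr(s, steps):.2e}")
```

With gradient accumulation of 1, each optimiser step sees exactly one batch of 52 examples, so the step count scales directly with the size of the sampled subset.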
Chat Template & System Prompt
The model was fine-tuned using the Llama-3 chat template with a custom Arabic medical system prompt:
أنت طبيب محترف ولديك خبرة في كل مجالات الطب.
يجيب على أسئلة المرضى حول الأمراض، باستخدام لهجة رسمية وودية،
وإجابات موجزة ومفيدة يسهل على الجميع فهمها.
"You are a professional doctor with expertise in all fields of medicine. Answer patients' questions about diseases using a formal and friendly tone, with concise and helpful answers that everyone can understand."
Special tokens <|question|> and <|answer|> were added to the tokenizer
to clearly demarcate patient input and doctor response during training.
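For reference, here is a minimal sketch of the prompt string that the Llama 3 Instruct template produces for a system turn followed by a user turn. It assumes the standard Llama 3 header and `<|eot_id|>` tokens. The card does not document where `<|question|>` and `<|answer|>` were inserted, so they are deliberately left out, and `format_llama3` is an illustrative name.

```python
BOS = "<|begin_of_text|>"


def format_llama3(messages: list, add_generation_prompt: bool = True) -> str:
    """Render role/content messages in the Llama 3 Instruct layout."""
    parts = [BOS]
    for m in messages:
        # Each turn: role header, blank line, trimmed content, end-of-turn token
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content'].strip()}<|eot_id|>"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model writes the doctor's answer next
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)
```

Calling `format_llama3([{"role": "system", "content": ...}, {"role": "user", "content": ...}])` yields a string that ends with an open assistant header, which is what `add_generation_prompt=True` does in `apply_chat_template`.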
Training Journey
The development of this model went through multiple iterations:
- RAG approach (initial attempt): Explored Retrieval-Augmented Generation using LangChain with FAISS as the vector store. Although technically functional, this approach generalised poorly and did not deliver the fine-tuned conversational model the project was aiming for.
- Manual LoRA fine-tuning: Moved to supervised fine-tuning with LoRA adapters, training the model directly on the MAQA dataset instead of retrieving from it at inference time.
- Unsloth optimisation (final): Adopted the Unsloth framework to make the most of Colab Pro's GPU resources, achieving 2x faster training with lower VRAM usage. This produced the current published models.
Quick Start
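```python
# Import unsloth before transformers so its patches are applied
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template
from transformers import TextStreamer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="AliAbdelrasheed/maqa_llama",
    max_seq_length=2048,
    dtype=None,          # auto-detect (bfloat16 on GPUs that support it)
    load_in_4bit=False,  # load the merged 16-bit weights without quantisation
)

# The messages below use ShareGPT-style "from"/"value" keys, so map them
# onto the Llama-3 template's "role"/"content" fields.
tokenizer = get_chat_template(
    tokenizer,
    chat_template="llama-3",
    mapping={"role": "from", "content": "value", "user": "human", "assistant": "gpt"},
)
FastLanguageModel.for_inference(model)  # switch Unsloth to inference mode

messages = [
    # The system prompt used during fine-tuning
    {"from": "system", "value": "أنت طبيب محترف ولديك خبرة في كل مجالات الطب. يجيب على أسئلة المرضى حول الأمراض، باستخدام لهجة رسمية وودية، وإجابات موجزة ومفيدة يسهل على الجميع فهمها."},
    # "What are the symptoms of type 2 diabetes?"
    {"from": "human", "value": "ما هي أعراض مرض السكري من النوع الثاني؟"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,  # append the assistant header
    return_tensors="pt",
).to("cuda")

# Stream the answer to stdout as it is generated, without echoing the prompt
streamer = TextStreamer(tokenizer, skip_prompt=True)
_ = model.generate(input_ids=inputs, streamer=streamer, max_new_tokens=256, use_cache=True)
```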
Limitations
- Not a substitute for professional medical advice or clinical diagnosis
- Cannot prescribe, recommend, or endorse specific medications
- Trained on a sampled subset of MAQA; performance may be uneven across the 20 specialisations
- Optimised for Modern Standard Arabic; dialectal Arabic (Egyptian, Levantine, etc.) performance may vary
- Web-scraped training data may contain noise or outdated medical information
- Scarcity of high-quality Arabic medical data remains an open challenge in the field
Future Work
- Fine-tuning on the full 430,000-row MAQA dataset (current model trained on a sampled subset)
- Multi-turn conversational memory for sustained patient-doctor dialogue
- Dialect-specific fine-tuning (Egyptian Arabic, Gulf Arabic)
- Multimodal input support (e.g. dermatology image input)
- Integration with electronic medical records (EMR) systems
- Real-time knowledge updates from medical literature
Citation
If you use this model in your research, please cite the MAQA dataset:
```bibtex
@article{maqa2023,
  title={Deep learning for Arabic healthcare: MedicalBot},
  journal={Social Network Analysis and Mining},
  publisher={Springer},
  year={2023}
}
```
Developed By
Ali Abdelrasheed · Graduation Project, Nile University · B.Sc. Information Technology – Big Data · Class of 2024