How to use NAMAA-Space/Qari-OCR-v0.3-VL-2B-Instruct with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("image-text-to-text", model="NAMAA-Space/Qari-OCR-v0.3-VL-2B-Instruct")
# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("NAMAA-Space/Qari-OCR-v0.3-VL-2B-Instruct", dtype="auto")
How to use NAMAA-Space/Qari-OCR-v0.3-VL-2B-Instruct with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "NAMAA-Space/Qari-OCR-v0.3-VL-2B-Instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "NAMAA-Space/Qari-OCR-v0.3-VL-2B-Instruct",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
# Or run the model with Docker Model Runner:
docker model run hf.co/NAMAA-Space/Qari-OCR-v0.3-VL-2B-Instruct
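The curl call above sends a text-only prompt to the completions endpoint, which does not exercise the OCR capability. The following is a minimal sketch of how an image could be sent instead. It assumes the server also exposes the OpenAI-style `/v1/chat/completions` route and accepts base64 data URLs in `image_url` content parts. The helper names `build_ocr_request` and `send_ocr_request` are hypothetical and are not part of vLLM or this model card.

```python
import base64
import json
import urllib.request

MODEL = "NAMAA-Space/Qari-OCR-v0.3-VL-2B-Instruct"


def build_ocr_request(image_bytes, prompt, mime="image/png",
                      max_tokens=512, temperature=0.5):
    """Build an OpenAI-style chat payload with the page image inlined as a data URL."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": MODEL,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url",
                     "image_url": {"url": f"data:{mime};base64,{b64}"}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
        # Same sampling settings as the curl example above.
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def send_ocr_request(payload, base_url="http://localhost:8000"):
    """POST the payload to the chat endpoint (assumed route) and return the text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With a server running, usage would look like `send_ocr_request(build_ocr_request(open("page.png", "rb").read(), "Return the plain text of this page."))`, where `page.png` is a placeholder path.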
How to use NAMAA-Space/Qari-OCR-v0.3-VL-2B-Instruct with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "NAMAA-Space/Qari-OCR-v0.3-VL-2B-Instruct" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "NAMAA-Space/Qari-OCR-v0.3-VL-2B-Instruct",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
# Or start the SGLang server with Docker:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "NAMAA-Space/Qari-OCR-v0.3-VL-2B-Instruct" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "NAMAA-Space/Qari-OCR-v0.3-VL-2B-Instruct",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
How to use NAMAA-Space/Qari-OCR-v0.3-VL-2B-Instruct with Docker Model Runner:
docker model run hf.co/NAMAA-Space/Qari-OCR-v0.3-VL-2B-Instruct
| Metric | Score |
|---|---|
| Character Error Rate (CER) | 0.300 |
| Word Error Rate (WER) | 0.485 |
| BLEU Score | 0.545 |
| Training Time | 11 hours |
| CO₂ Emissions | 1.88 kg eq. |
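For readers unfamiliar with the first two metrics: CER and WER are conventionally the edit distance between the model output and the reference text, divided by the reference length in characters or words. The sketch below shows that standard definition only. The model card does not describe the evaluation script or any text normalization, so it should not be read as the authors' exact procedure. The function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate; assumes a non-empty reference string."""
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate on whitespace-split tokens; assumes a non-empty reference."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)
```

Under this definition, a CER of 0.300 corresponds to roughly 30 character edits per 100 reference characters. Both rates can exceed 1.0 when the output is much longer than the reference.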
While QARI v0.2 achieves better raw text accuracy (CER: 0.061), QARI v0.3 excels in:
You can load this model using the transformers and qwen_vl_utils libraries:
!pip install -U transformers qwen_vl_utils "accelerate>=0.26.0" peft
!pip install -U bitsandbytes
from PIL import Image
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
import torch
import os
from qwen_vl_utils import process_vision_info
model_name = "NAMAA-Space/Qari-OCR-v0.3-VL-2B-Instruct"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_name)
max_tokens = 2000
prompt = "Below is the image of one page of a document, as well as some raw textual content that was previously extracted for it. Just return the plain text representation of this document as if you were reading it naturally. Do not hallucinate."
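# Hypothetical input path: replace "page.png" with your own document image.
image = Image.open("page.png")
# Save a working copy; it is passed to the model below and deleted afterwards.
src = "image.png"
image.save(src)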
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": f"file://{src}"},
            {"type": "text", "text": prompt},
        ],
    }
]
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
# Move inputs to the device the model was placed on by device_map="auto".
inputs = inputs.to(model.device)
generated_ids = model.generate(**inputs, max_new_tokens=max_tokens)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)[0]
os.remove(src)
print(output_text)
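The trimming step in the code above exists because, for decoder-only models, `generate` returns each prompt followed by the newly generated tokens. Decoding without the slice would repeat the prompt in the output. The toy sketch below reproduces that slicing on plain Python lists. The token IDs are made up for illustration and `trim_prompt_tokens` is a hypothetical helper name.

```python
def trim_prompt_tokens(input_ids, generated_ids):
    """Drop the echoed prompt from each generated sequence, keeping only new tokens."""
    return [out[len(inp):] for inp, out in zip(input_ids, generated_ids)]


# Made-up token IDs: two prompts and the sequences generate() would return for them.
prompts = [[101, 7, 8], [101, 9]]
outputs = [[101, 7, 8, 55, 56], [101, 9, 42]]

print(trim_prompt_tokens(prompts, outputs))  # [[55, 56], [42]]
```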
Try the model on Google Colab, Notebook
BibTeX:
@article{wasfy2025qari,
  title={QARI-OCR: High-Fidelity Arabic Text Recognition through Multimodal Large Language Model Adaptation},
  author={Wasfy, Ahmed and Nacar, Omer and Elkhateb, Abdelakreem and Reda, Mahmoud and Elshehy, Omar and Ammar, Adel and Boulila, Wadii},
  journal={arXiv preprint arXiv:2506.02295},
  year={2025}
}