Libraries

How to use AhmedZaky1/DIMI-Arabic-OCR-V2 with PEFT:

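from peft import PeftModel
from transformers import AutoModelForImageTextToText

# Qwen2.5-VL is a vision-language model, so the base is loaded with the image-text-to-text auto class
base_model = AutoModelForImageTextToText.from_pretrained("unsloth/Qwen2.5-VL-7B-Instruct-bnb-4bit")
# Attach the LoRA adapter weights on top of the 4-bit base model
model = PeftModel.from_pretrained(base_model, "AhmedZaky1/DIMI-Arabic-OCR-V2")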

Transformers

How to use AhmedZaky1/DIMI-Arabic-OCR-V2 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="AhmedZaky1/DIMI-Arabic-OCR-V2")
messages = [
    {
        "role": "user",
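        "content": [
            # Sample image from the Hub documentation; replace the URL with an image that contains Arabic text
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "استخرج النص العربي والأرقام الموجودة في هذه الصورة بدقة عالية."}
        ]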
    },
]
pipe(text=messages)

# Load model directly
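from transformers import AutoModelForImageTextToText

# The image-text-to-text auto class keeps the generation head that OCR decoding needs
model = AutoModelForImageTextToText.from_pretrained("AhmedZaky1/DIMI-Arabic-OCR-V2", device_map="auto")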

vLLM

How to use AhmedZaky1/DIMI-Arabic-OCR-V2 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "AhmedZaky1/DIMI-Arabic-OCR-V2"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AhmedZaky1/DIMI-Arabic-OCR-V2",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
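
For OCR, the image usually lives on disk rather than at a public URL. The sketch below is one possible way to send a local file to the same OpenAI-compatible endpoint by inlining it as a base64 data URL. The helper names `build_ocr_request` and `send_ocr_request` and the file name in the usage comment are hypothetical, and the prompt is the Arabic OCR instruction used in the model card's own example.

```python
import base64
import json
import urllib.request

# Arabic instruction: "Extract the Arabic text and numbers in this image with high accuracy."
OCR_PROMPT = "استخرج النص العربي والأرقام الموجودة في هذه الصورة بدقة عالية."


def build_ocr_request(image_bytes: bytes, mime_type: str = "image/jpeg",
                      model: str = "AhmedZaky1/DIMI-Arabic-OCR-V2") -> dict:
    """Build an OpenAI-style chat payload with the image inlined as a data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": OCR_PROMPT},
                {"type": "image_url",
                 "image_url": {"url": f"data:{mime_type};base64,{encoded}"}},
            ],
        }],
    }


def send_ocr_request(payload: dict, base_url: str = "http://localhost:8000") -> str:
    """POST the payload to the chat completions endpoint and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Usage (requires a running server; the file name is a placeholder):
# with open("arabic_document.jpg", "rb") as f:
#     print(send_ocr_request(build_ocr_request(f.read())))
```

Only the standard library is used, so the client does not depend on any particular SDK version. Change `base_url` to match the port of whichever server is running.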

Use Docker

docker model run hf.co/AhmedZaky1/DIMI-Arabic-OCR-V2

SGLang

How to use AhmedZaky1/DIMI-Arabic-OCR-V2 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "AhmedZaky1/DIMI-Arabic-OCR-V2" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AhmedZaky1/DIMI-Arabic-OCR-V2",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "AhmedZaky1/DIMI-Arabic-OCR-V2" \
        --host 0.0.0.0 \
        --port 30000

Unsloth Studio

How to use AhmedZaky1/DIMI-Arabic-OCR-V2 with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for AhmedZaky1/DIMI-Arabic-OCR-V2 to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for AhmedZaky1/DIMI-Arabic-OCR-V2 to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for AhmedZaky1/DIMI-Arabic-OCR-V2 to start chatting

Load model with FastModel

# Install Unsloth first (shell command): pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="AhmedZaky1/DIMI-Arabic-OCR-V2",
    max_seq_length=2048,
)

DIMI Arabic OCR v2

An accurate Arabic OCR model, version 2, for extracting printed Arabic text from images

Model Description

DIMI Arabic OCR v2 is a specialized Arabic Optical Character Recognition model fine-tuned from Qwen2.5-VL-7B-Instruct using LoRA adapters. It is the second iteration and builds on v1 with improved diacritics handling and higher accuracy across diverse Arabic text scenarios.

Developed by: Ahmed Zaky
Base Model: AhmedZaky1/DIMI-Arabic-OCR (v1)
Original Base: Qwen/Qwen2.5-VL-7B-Instruct
Model Type: Vision-Language Model (VLM) for Arabic OCR
Language: Arabic (ar)
License: Apache 2.0
Fine-tuning Method: LoRA (Low-Rank Adaptation) with 4-bit quantization
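
To illustrate why LoRA keeps fine-tuning cheap, the sketch below compares the parameter count of one full weight matrix with that of its low-rank update. LoRA replaces a trainable `d_out x d_in` update with two factors of shapes `d_out x r` and `r x d_in`. The matrix size and the rank used here are hypothetical, since the model card does not state the LoRA configuration.

```python
def lora_param_counts(d_in: int, d_out: int, rank: int) -> tuple[int, int]:
    """Return (full_params, lora_params) for a single d_out x d_in weight matrix."""
    full_params = d_in * d_out
    # LoRA trains B (d_out x rank) and A (rank x d_in) instead of the full matrix
    lora_params = rank * (d_in + d_out)
    return full_params, lora_params


# Hypothetical example: a 3584 x 3584 projection with rank 16 (not taken from the model card)
full, lora = lora_param_counts(d_in=3584, d_out=3584, rank=16)
print(f"full matrix: {full:,} parameters")
print(f"LoRA update: {lora:,} parameters ({lora / full:.2%} of the full matrix)")
```

The base weights stay frozen, and here in 4-bit form, so only the small factors receive gradients.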

Key Improvements Over v1

✅ 30% reduction in WER on diacritics-heavy text
✅ Enhanced training dataset with balanced diacritics representation
✅ Improved generalization across news articles and formal documents
✅ Better preservation of text formatting and structure

📊 Performance Metrics

Test Set Results (500 samples from 2,600)

Metric	Score	Description
WER	0.3049	Word Error Rate (↓ lower is better)
CER	0.1119	Character Error Rate (↓ lower is better)
Perfect Predictions	23%	Exact matches with ground truth
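
For readers unfamiliar with these metrics, the following is a minimal sketch of how WER and CER are commonly computed: the Levenshtein edit distance between prediction and reference, divided by the reference length, counted in words for WER and in characters for CER. The model card does not describe its evaluation script, so the whitespace tokenization and the absence of any text normalization are assumptions.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over the reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance over the reference length."""
    return edit_distance(reference, hypothesis) / max(len(reference), 1)


reference = "ذهب الولد إلى المدرسة"
hypothesis = "ذهب الولد الى المدرسة"  # one word differs by a single character
print(f"WER = {wer(reference, hypothesis):.4f}, CER = {cer(reference, hypothesis):.4f}")
```

A single wrong character makes the whole word count as an error, which is why WER is typically much higher than CER on the same predictions.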

Validation Set Results (100 samples)

Metric	Score
WER	0.2315
CER	0.0776

Comparison with v1

Model	Test WER	Test CER	Val WER	Val CER
v1	0.404	0.226	0.3308	0.1820
v2	0.3049 ↓	0.1119 ↓	0.2315	0.0776

Improvements on the test set:

WER reduced by ~24.5% relative to v1 (0.404 → 0.3049)
CER reduced by ~50.5% relative to v1 (0.226 → 0.1119)
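
These percentages are relative reductions, (old - new) / old. The short sketch below recomputes them from the scores in the comparison table. The validation-set figures it prints are derived from that table and are not stated by the author.

```python
# (v1, v2) scores copied from the comparison table
scores = {
    "test": {"WER": (0.404, 0.3049), "CER": (0.226, 0.1119)},
    "validation": {"WER": (0.3308, 0.2315), "CER": (0.1820, 0.0776)},
}


def relative_reduction(old: float, new: float) -> float:
    """Relative reduction in percent when going from old to new."""
    return (old - new) / old * 100


for split, metrics in scores.items():
    for name, (v1, v2) in metrics.items():
        print(f"{split} {name}: {v1} -> {v2} ({relative_reduction(v1, v2):.1f}% lower)")
```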

🎯 Intended Use

Direct Use

This model is designed for extracting Arabic text from images, including:

📰 News articles and printed documents
📝 Formal Arabic text with diacritics (تشكيل)
🔢 Mixed Arabic text and numbers
📄 Scanned documents and screenshots

Example Use Case

from unsloth import FastVisionModel
from PIL import Image
import torch

# Load model
model, tokenizer = FastVisionModel.from_pretrained(
    "AhmedZaky1/DIMI-Arabic-OCR-V2",
    load_in_4bit=True,
    device_map="auto"
)
FastVisionModel.for_inference(model)

# Load image
image = Image.open("arabic_document.jpg")

# Prepare prompt
instruction = "استخرج النص العربي والأرقام الموجودة في هذه الصورة بدقة عالية."

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": instruction},
        ],
    }
]

# Apply chat template
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Tokenize
inputs = tokenizer(
    text=[text],
    images=[image],
    padding=True,
    return_tensors="pt",
    truncation=False
).to("cuda")

# Generate
with torch.inference_mode():
    outputs = model.generate(
        **inputs,
        max_new_tokens=2048,
        do_sample=False
    )

# Decode
generated_ids = [
    out[len(inp):] for inp, out in zip(inputs.input_ids, outputs)
]
prediction = tokenizer.batch_decode(
    generated_ids, 
    skip_special_tokens=True
)[0]

print(prediction)

🧾 Training Data

Fine-tuned on 11,000 Arabic text images combined from multiple sources.

The dataset covers Modern Standard Arabic with and without diacritics.
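
When working with text that may or may not carry diacritics, it can help to detect or strip them, for example to score predictions both with and without تشكيل. The sketch below is one way to do that. The helper names are hypothetical, and the character set (U+064B to U+0652 plus the superscript alef U+0670) is an assumption about which marks count as diacritics, not the author's preprocessing.

```python
import re

# Arabic diacritics (tashkeel): fathatan .. sukun, plus superscript alef
_DIACRITICS = re.compile("[\u064B-\u0652\u0670]")


def has_diacritics(text: str) -> bool:
    """Return True if the text contains at least one Arabic diacritic mark."""
    return _DIACRITICS.search(text) is not None


def strip_diacritics(text: str) -> str:
    """Remove Arabic diacritic marks while keeping the base letters intact."""
    return _DIACRITICS.sub("", text)


vocalized = "\u0643\u064E\u062A\u064E\u0628\u064E"  # the word "kataba" written with three fathas
print(has_diacritics(vocalized), strip_diacritics(vocalized))
```

Comparing metrics on the original and the stripped text separates diacritic errors from base-letter errors.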

📚 Citation

If you use this model, please cite:

@misc{dimi-arabic-ocr-2025,
  author = {Ahmed Zaky},
  title = {DIMI-Arabic-OCR: Fine-tuned Qwen2.5-VL for Arabic Text Recognition},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/AhmedZaky1/DIMI-Arabic-OCR}}
}

🔗 Related Projects

DIMI Models Series — Arabic Vision & Language Models

Built with ❤️ by Ahmed Zaky

Advancing Arabic NLP through state-of-the-art embedding models

Model tree for AhmedZaky1/DIMI-Arabic-OCR-V2

Base model: Qwen/Qwen2.5-VL-7B-Instruct
Quantized: unsloth/Qwen2.5-VL-7B-Instruct-bnb-4bit
Adapter: AhmedZaky1/DIMI-Arabic-OCR