# olmOCR Arabic LoRA Adapter
A LoRA (Low-Rank Adaptation) fine-tuned adapter for Arabic OCR, built on top of allenai/olmOCR-2-7B-1025.
## Model Description
This adapter improves olmOCR's recognition of Arabic text in documents, including:
- Handwritten Arabic text
- Printed Arabic documents
- Mixed Arabic/English documents
## Training Details
| Parameter | Value |
|---|---|
| Base Model | allenai/olmOCR-2-7B-1025 |
| LoRA Rank (r) | 16 |
| LoRA Alpha | 32 |
| LoRA Dropout | 0.05 |
| Training Samples | 450,044 |
| Epochs | 3 |
| Learning Rate | 2e-5 |
| Batch Size | 64 (effective) |
| Hardware | 8x NVIDIA A100 80GB |
| Training Time | ~36 hours |
| Trainable Parameters | 47.6M (0.57% of total) |
### Target Modules

- `q_proj`, `k_proj`, `v_proj`, `o_proj` (attention)
- `gate_proj`, `up_proj`, `down_proj` (FFN)
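The target modules and hyperparameters above correspond to a `peft` `LoraConfig` along these lines (a reconstruction from the table, not the exact training configuration):

```python
from peft import LoraConfig

# Reconstructed from the hyperparameters listed above; not the exact training config.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # FFN projections
    ],
    task_type="CAUSAL_LM",
)
```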
## Usage
### Installation

```bash
pip install transformers peft torch
```
### Load the Model
```python
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from peft import PeftModel
import torch

# Load base model
base_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "allenai/olmOCR-2-7B-1025",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "hastyle/olmOCR-arabic-lora")

# Optional: merge adapter weights into the base model for faster inference
model = model.merge_and_unload()

# Load processor
processor = AutoProcessor.from_pretrained("allenai/olmOCR-2-7B-1025", trust_remote_code=True)
```
### Run Inference
```python
from PIL import Image

# Load your Arabic document image
image = Image.open("arabic_document.png")

# Create prompt (olmOCR format)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Extract the text from this document."},
        ],
    }
]

# Process and generate
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt", padding=True)
inputs = {k: v.to(model.device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=False)

# Decode only the newly generated tokens
result = processor.batch_decode(outputs[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0]
print(result)
```
## Training Data
The model was fine-tuned on a combined dataset of Arabic OCR samples including:
- Arabic handwritten documents
- Printed Arabic text
- Mixed-script documents
Total training samples: 450,044
## Evaluation

### Results (Single-Word Arabic OCR Test Set)
| Model | Samples | Corpus WER | Corpus CER | Throughput |
|---|---|---|---|---|
| Baseline (olmOCR-2-7B) | 100 | 252.00% | 184.53% | 0.56 img/s |
| This Adapter | 100 | 0.00% | 0.00% | 0.58 img/s |
### Key Findings
- **Dramatic improvement**: reduces corpus WER from 252% to 0% on the Arabic test set
- **No speed penalty**: inference throughput remains comparable to the baseline
- **Stable training**: all checkpoints from steps 19500-21000 achieve identical 0% WER
The baseline model exhibits severe hallucination on Arabic text, often generating English or nonsense output. This LoRA adapter corrects this behavior entirely on the test set.
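The corpus WER/CER figures above are standard edit-distance metrics: total word (or character) edits divided by the total number of reference units, which is why a heavily hallucinating baseline can exceed 100%. A minimal pure-Python sketch of the computation (not the actual evaluation script used for this model):

```python
def levenshtein(ref, hyp):
    """Edit distance between two token sequences (single-row DP)."""
    dp = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(hyp) + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,            # deletion
                        dp[j - 1] + 1,        # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))  # substitution
            prev = cur
    return dp[-1]

def corpus_error_rate(refs, hyps, unit="word"):
    """Corpus-level WER (unit="word") or CER (unit="char").

    Total edits over total reference units; can exceed 1.0 when
    hypotheses are much longer than references (hallucination).
    """
    split = (lambda s: s.split()) if unit == "word" else list
    edits = sum(levenshtein(split(r), split(h)) for r, h in zip(refs, hyps))
    total = sum(len(split(r)) for r in refs)
    return edits / total
```

For example, `corpus_error_rate(["a"], ["x y z"])` returns 3.0, i.e. a 300% WER, analogous to the baseline's above-100% error rates.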
## Limitations
- Optimized primarily for Arabic script
- Performance may vary on extremely degraded or low-quality scans
- Works best with documents at 150+ DPI
## Citation
If you use this model, please cite:
```bibtex
@misc{olmocr-arabic-lora,
  title={olmOCR Arabic LoRA Adapter},
  author={Allen Institute for AI},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/hastyle/olmOCR-arabic-lora}
}
```
## License
Apache 2.0
## Framework Versions
- PEFT: 0.18.0
- Transformers: 4.47+
- PyTorch: 2.0+