State-of-the-art OCR model for historical Arabic manuscripts, achieving 5.10% CER through native-resolution encoding, Arabic-native tokenization, and synthetic pretraining.
| Dataset | CER | WER | Relative Improvement |
|---|---|---|---|
| MUHARAF | 8.35% | 24.76% | -71% vs TrOCR |
| KHATT | 11.21% | 37.36% | -37% vs TrOCR |
| RASAM | 4.95% | 18.94% | -86% vs TrOCR |
| Combined | 5.10% | 18.05% | -57% vs TrOCR |
State-of-the-Art: 36% relative improvement over previous best (HATFormer, 8% CER)
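The relative-improvement figures can be read as a relative error reduction: the share of the baseline error that the new model removes. The sketch below assumes the figures are computed from CER, which the text does not state explicitly, and `relative_reduction` is a hypothetical helper name used only for illustration.

```python
def relative_reduction(baseline: float, new: float) -> float:
    """Return the relative error reduction in percent.

    Both arguments are error rates in the same unit (here: CER in percent).
    """
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return 100.0 * (baseline - new) / baseline


if __name__ == "__main__":
    # Figures quoted above: previous best at 8% CER, HAFITH at 5.10% CER.
    print(f"{relative_reduction(8.0, 5.10):.2f}% relative CER reduction")
```

A relative figure depends strongly on the baseline value, so it is worth quoting the absolute error rates alongside it, as the table does.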
```bash
pip install transformers pillow torch
```

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

# Load model and tokenizer
model = AutoModel.from_pretrained("mdnaseif/hafith")
tokenizer = AutoTokenizer.from_pretrained("mdnaseif/hafith")

# Load a single manuscript line image as 3-channel RGB
image = Image.open("manuscript_line.jpg").convert("RGB")

# Run OCR without tracking gradients (inference only)
with torch.no_grad():
    outputs = model.generate(image, max_length=64)

# Decode the generated token IDs back to text
text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"Recognized text: {text}")
```
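```python
from datasets import load_dataset

# Load your manuscript dataset. "your_dataset" is a placeholder, and the
# "train" split name is an assumption: without split=, load_dataset returns
# a DatasetDict, which cannot be sliced by row.
dataset = load_dataset("your_dataset", split="train")

# Process in batches, reusing the model and tokenizer loaded above
batch_size = 32
for i in range(0, len(dataset), batch_size):
    batch = dataset[i:i + batch_size]  # dict mapping column name -> list of values
    images = [img.convert("RGB") for img in batch["image"]]
    outputs = model.generate(images, max_length=64)
    texts = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    for img_id, text in zip(batch["id"], texts):
        print(f"{img_id}: {text}")
```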
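The slicing pattern in the loop above produces a final, smaller batch whenever the dataset size is not a multiple of `batch_size`. A minimal sketch of the same pattern as a reusable generator follows; `batches` is a hypothetical helper name, not part of `transformers` or `datasets`, and it works on any sliceable sequence.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def batches(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        # The last slice may be shorter than batch_size.
        yield items[start:start + batch_size]


if __name__ == "__main__":
    for chunk in batches(list(range(5)), 2):
        print(chunk)
```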
```
Input Image (H×W×3)
        ↓
SigLIP V2 NaFlex Encoder
  - 400M parameters
  - Up to 512 patches (aspect-ratio preserving)
  - Output: 512×1152 embeddings
        ↓
Projection Layer (1152 → 1024)
        ↓
RoBERTa-Large Decoder
  - 24 layers, 16 attention heads
  - Trained from scratch with Aranizer
  - Cross-attention to visual features
        ↓
Aranizer Tokenizer (64K vocab)
        ↓
Arabic Text Output
```
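To make "up to 512 patches (aspect-ratio preserving)" concrete, the sketch below picks a patch grid whose shape roughly follows the image's aspect ratio while staying within the patch budget. It is a simplified illustration, not the SigLIP 2 preprocessing code: the function name `patch_grid`, the 16-pixel patch size, and the uniform-scaling rule are all assumptions.

```python
import math


def patch_grid(height: int, width: int,
               patch_size: int = 16, max_patches: int = 512) -> tuple[int, int]:
    """Return (rows, cols) of a patch grid with rows * cols <= max_patches.

    The image is scaled uniformly so that its area matches the patch budget,
    then each side is rounded down to a whole number of patches.
    """
    if height <= 0 or width <= 0:
        raise ValueError("image dimensions must be positive")
    # Uniform scale at which the image area equals max_patches full patches.
    scale = math.sqrt(max_patches * patch_size ** 2 / (height * width))
    rows = min(max(1, math.floor(height * scale / patch_size)), max_patches)
    # Clamp columns so extreme aspect ratios cannot exceed the budget.
    cols = min(max(1, math.floor(width * scale / patch_size)), max_patches // rows)
    return rows, cols


if __name__ == "__main__":
    # A wide text-line image (64 px high, 1024 px wide).
    rows, cols = patch_grid(64, 1024)
    print(rows, cols, rows * cols)
```

For a wide text line, this kind of grid spends most of the budget along the horizontal axis, whereas a fixed 384×384 resize (as used by the BEiT-based baselines in the comparison table) would squash the line into a square.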
| Model | Encoder | Tokenizer | CER | WER |
|---|---|---|---|---|
| CRNN+CTC | CNN | Character-level | 14.82% | - |
| TrOCR-Base | BEiT-B (384×384) | RoBERTa | 13.41% | - |
| TrOCR-Large | BEiT-L (384×384) | RoBERTa | 11.73% | 31.82% |
| HATFormer | BEiT-L (384×384) | RoBERTa | 8.60% | - |
| HAFITH (Ours) | SigLIP2 NaFlex | Aranizer | 5.10% | 18.05% |
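CER and WER in the table are edit-distance metrics: the number of character (or word) insertions, deletions, and substitutions needed to turn the prediction into the reference, divided by the reference length. A minimal sketch follows, assuming plain Levenshtein distance over Unicode code points and whitespace-separated words; the text normalization used for the reported numbers is not stated here, so results from this sketch may not match them exactly. The function names are illustrative.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character edits divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate over whitespace-separated tokens."""
    ref_words, hyp_words = reference.split(), hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


if __name__ == "__main__":
    reference = "بسم الله"
    hypothesis = "بسم اله"  # one character missing in the second word
    print(f"CER={cer(reference, hypothesis):.3f} WER={wer(reference, hypothesis):.3f}")
```

A single wrong character makes the whole word count as an error, which is why WER is consistently much higher than CER in the results above.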
```bibtex
@article{naseif2026hafith,
  title={HAFITH: Aspect-Ratio Preserving Vision-Language Model for Historical Arabic Manuscript Recognition},
  author={Naseif, Mohammed and Mesabah, Islam and Hajjaj, Dalia and Hassan, Abdulrahman and Elhayek, Ahmed and Koubaa, Anis},
  journal={arXiv preprint arXiv:XXXX.XXXXX},
  year={2026}
}
```
Apache 2.0