metadata
language:
- ar
license: cc-by-nc-4.0
tags:
- handwritten-text-recognition
- arabic
- khatt
- densenet
- transformer
- transfer-learning
- pytorch
- safetensors
datasets:
- KHATT
- DASTNUS
metrics:
- cer
- wer
pipeline_tag: image-to-text
Arabic Handwritten Text Recognition: DenseNet121-Transformer (Fine-tuned on KHATT)
Model Description
A lightweight DenseNet121-Transformer model for Arabic handwritten line recognition, pre-trained on the Kurdish DASTNUS dataset and fine-tuned on the KHATT Arabic handwritten dataset. It uses a unified 192-token vocabulary covering the Kurdish, Arabic, and Urdu scripts. The KHATT dataset is publicly available at https://www.kaggle.com/datasets/iraqyomar/khatt-arabic-hand-written-lines/code; only its unique handwritten lines were used.
Architecture
- CNN Backbone: DenseNet-121 (pretrained on ImageNet)
- Encoder: 3 Transformer encoder layers
- Decoder: 3 Transformer decoder layers
- Attention Heads: 8
- Hidden Size: 256
- Parameters: ~12.8M
- Vocabulary: 192 tokens (Triple unified: Kurdish + Arabic + Urdu)
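The hyperparameters above can be collected in a small configuration object. The following is a minimal sketch, not the contents of the released `config.json`: the class name `HTRConfig` and all field names are hypothetical, and only the values come from the list above. It also makes one implied quantity explicit: with a hidden size of 256 split over 8 heads, each attention head works on 32 dimensions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HTRConfig:
    """Values from the Architecture list; the field names are illustrative assumptions."""

    backbone: str = "densenet121"
    num_encoder_layers: int = 3
    num_decoder_layers: int = 3
    num_heads: int = 8
    hidden_size: int = 256
    vocab_size: int = 192

    def __post_init__(self) -> None:
        # Multi-head attention splits the hidden size evenly across the heads.
        if self.hidden_size % self.num_heads != 0:
            raise ValueError("hidden_size must be divisible by num_heads")

    @property
    def head_dim(self) -> int:
        """Number of dimensions handled by each attention head."""
        return self.hidden_size // self.num_heads


cfg = HTRConfig()
print(cfg.head_dim)  # 32
```

The divisibility check mirrors the constraint that standard multi-head attention implementations place on the embedding size and the number of heads.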
Transfer Learning Pipeline
- Pre-trained on Kurdish DASTNUS dataset (with unified vocabulary)
- Fine-tuned on KHATT Arabic handwritten line dataset
Performance on KHATT Test Set
| Metric | Value |
|---|---|
| CER | 0.1135 |
| WER | 0.4156 |
| CRR | 88.65% |
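The three metrics are related through edit distance. The sketch below assumes the common definitions: CER and WER are the Levenshtein distance between reference and prediction divided by the reference length (in characters and words respectively), and CRR is `(1 - CER) * 100`, which is consistent with the reported 0.1135 and 88.65%. How the scores were aggregated over the test set (per line or over the whole corpus) is not stated here, so the helper names and per-line form are illustrative only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate for one reference/prediction pair."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def wer(ref: str, hyp: str) -> float:
    """Word error rate for one pair, splitting on whitespace."""
    ref_words = ref.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def crr_percent(cer_value: float) -> float:
    """Character recognition rate in percent, assuming CRR = 1 - CER."""
    return (1.0 - cer_value) * 100.0


print(round(crr_percent(0.1135), 2))  # 88.65
```

A WER several times larger than the CER, as in the table, is expected: a single wrong character makes the whole word count as an error.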
Training Data
- Pre-training: DASTNUS Kurdish handwritten dataset
- Fine-tuning: KHATT Arabic handwritten dataset (5,166 training lines, 574 validation lines)
Usage
import json

from safetensors.torch import load_file

# Load model weights (a dict mapping parameter names to tensors)
state_dict = load_file("model.safetensors")

# Load config
with open("config.json", "r", encoding="utf-8") as f:
    config = json.load(f)

# Load vocabulary
with open("vocab.json", "r", encoding="utf-8") as f:
    vocab = json.load(f)

# Load full unified vocabulary info
with open("unified_vocabulary.json", "r", encoding="utf-8") as f:
    unified_vocab = json.load(f)
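Once the vocabulary is loaded, predicted token ids have to be mapped back to text. The sketch below shows one way this might look. It assumes that `vocab.json` maps token strings to integer ids and that special tokens named `<pad>`, `<sos>`, and `<eos>` exist; both points are assumptions and should be checked against the actual file. A toy vocabulary stands in for the real one so the example runs on its own.

```python
def build_id_to_token(vocab: dict) -> dict:
    """Invert a token -> id mapping (assumed layout of vocab.json)."""
    return {idx: tok for tok, idx in vocab.items()}


def decode_ids(ids, id_to_token, eos="<eos>", skip=("<pad>", "<sos>")) -> str:
    """Join tokens into text, stopping at the end marker and dropping skipped tokens."""
    tokens = []
    for i in ids:
        tok = id_to_token[i]
        if tok == eos:
            break
        if tok in skip:
            continue
        tokens.append(tok)
    return "".join(tokens)


# Toy vocabulary for illustration only; the special-token names are hypothetical.
toy_vocab = {"<pad>": 0, "<sos>": 1, "<eos>": 2, "س": 3, "ل": 4, "ا": 5, "م": 6}
id_to_token = build_id_to_token(toy_vocab)
print(decode_ids([1, 3, 4, 5, 6, 2, 0, 0], id_to_token))  # سلام
```

Tokens are joined in logical order; right-to-left display is left to the rendering layer.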
License
This model is released under the CC BY-NC 4.0 license, for non-commercial scientific research purposes only.