---
base_model: [Qwen2.5VL]
library_name: transformers
tags:
- mergekit
- merge
---

# Baseer-Nakba HTR: A State-of-the-Art VLM for Arabic Handwritten Text Recognition

## Overview

This repository contains the model weights and inference pipeline for our submission to the NAKBA NLP 2026 Arabic Handwritten Text Recognition (HTR) competition. Our approach adapts the 3B-parameter [Baseer](https://arxiv.org/abs/2509.18174) Vision-Language Model (VLM) to parse and recognize highly cursive, historical Arabic manuscripts. Through a progressive training pipeline, domain-matched data augmentation, and checkpoint merging, the resulting unified model mitigates the challenges of varying writer styles, age-related document degradation, and morphological complexity.

To try Baseer for document extraction, visit [baseerocr.com](https://baseerocr.com/): **Baseer** is the state-of-the-art model for Arabic document extraction.

---

## 🏆 Competition Results

Our final model (**Misraj AI**) secured **1st place** on the official Nakba hidden test set [leaderboard](https://www.codabench.org/competitions/12591/).

| Rank | Team | CER | WER |
| :--- | :--- | :--- | :--- |
| 🥇 1st | **Misraj AI** | **0.0790** | **0.2440** |
| 🥈 2nd | Oblevit | 0.0925 | 0.3268 |
| 🥉 3rd | 3reeq | 0.0938 | 0.2996 |
| 4th | Latent Narratives | 0.1050 | 0.3106 |
| 5th | Al-Warraq | 0.1142 | 0.3780 |
| 6th | Not Gemma | 0.1217 | 0.3063 |
| 7th | NAMAA-Qari | 0.1950 | 0.5194 |
| 8th | Fahras | 0.2269 | 0.5223 |
| — | Baseline | 0.3683 | 0.6905 |

---

## Training Methodology

Our model was trained with a multi-stage Supervised Fine-Tuning (SFT) curriculum.

1. **Data Augmentation**: The Muharaf enhancement dataset was converted to grayscale to match the visual complexity and tonal distribution of the Nakba competition data.
2. **Decoder-Only SFT**: We first trained the text decoder autoregressively on the structurally similar Muharaf dataset to condition the language modeling head.
3. **Full Encoder-Decoder Tuning**: We then unfroze the vision encoder and trained the full architecture on the Nakba dataset with differential learning rates, a key step that yielded a >5% improvement in WER over decoder-only tuning.
4. **Checkpoint Merging**: To stabilize predictions and maximize generalization, we merged our top-performing checkpoints (epoch 1 and epoch 5) using SLERP interpolation.

---

## Training Hyperparameters

All supervised experiments used the same standardized hyperparameters.

| Parameter | Value |
| :--- | :--- |
| **Hardware** | 2× NVIDIA H100 GPUs |
| **Base Model** | 3B-parameter Baseer |
| **Epochs** | 5 |
| **Optimizer** | AdamW |
| **Weight Decay** | 0.01 |
| **Learning Rate Schedule** | Cosine |
| **Batch Size** | 128 |
| **Max Sequence Length** | 1200 tokens |
| **Input Image Resolution** | 644 × 644 pixels |
| **Decoder-Only Learning Rate** | 1e-4 |
| **Encoder Learning Rate** | 9e-6 |
| **Decoder Learning Rate (Full Tuning)** | 1e-4 |

---

## Image Examples

The model works reliably on images from the Nakba dataset and visually similar historical manuscripts.

![image (1)](https://cdn-uploads.huggingface.co/production/uploads/65276c7911a8a521c91bc10f/MtU8b_IZ1_kbiwg3BISDg.jpeg)
![image (2)](https://cdn-uploads.huggingface.co/production/uploads/65276c7911a8a521c91bc10f/bmzC1F1rJz52ljDo0LbOY.jpeg)
![image (3)](https://cdn-uploads.huggingface.co/production/uploads/65276c7911a8a521c91bc10f/LNvoN4NkaVJ8zgUqzG8bm.jpeg)

---

## Merge Method

This model was merged using the [SLERP](https://en.wikipedia.org/wiki/Slerp) merge method.
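SLERP interpolates along the great-circle arc between the two checkpoints' parameter vectors rather than along the straight line used by plain weight averaging. The following NumPy sketch illustrates the idea only; it is not the exact mergekit implementation, and the toy `ckpt_ep1`/`ckpt_ep5` dicts stand in for real state dicts:

```python
import numpy as np

def slerp(t: float, v0: np.ndarray, v1: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two weight tensors."""
    u0, u1 = v0.ravel(), v1.ravel()
    # Cosine of the angle between the two (normalized) parameter vectors.
    cos = np.dot(u0, u1) / (np.linalg.norm(u0) * np.linalg.norm(u1) + eps)
    theta = np.arccos(np.clip(cos, -1.0, 1.0))
    if theta < eps:
        # Nearly parallel checkpoints: fall back to linear interpolation.
        return (1.0 - t) * v0 + t * v1
    w0 = np.sin((1.0 - t) * theta) / np.sin(theta)
    w1 = np.sin(t * theta) / np.sin(theta)
    return w0 * v0 + w1 * v1

# Toy stand-ins for the epoch-1 and epoch-5 state dicts; merge tensor-by-tensor
# at t = 0.5, matching the configuration used for this model.
ckpt_ep1 = {"layer.weight": np.eye(2)}
ckpt_ep5 = {"layer.weight": np.array([[0.0, 1.0], [1.0, 0.0]])}
merged = {name: slerp(0.5, ckpt_ep1[name], ckpt_ep5[name]) for name in ckpt_ep1}
```

At `t = 0.5` the two checkpoints contribute equally; unlike a plain average, SLERP stays on the arc between the weight vectors instead of shrinking toward their midpoint.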
### Models Merged

- `Baseer_Nakba_ep_1`
- `Baseer_Nakba_ep_5`

### Configuration

```yaml
merge_method: slerp
base_model: Baseer_Nakba_ep_1
models:
  - model: Baseer_Nakba_ep_1
  - model: Baseer_Nakba_ep_5
parameters:
  t:
    - value: 0.50
dtype: bfloat16
```

---

## Citation

If you use this model or find our work helpful, please consider citing our paper:

```bibtex
@inproceedings{misrajai2026nakba,
  title     = {Adapting Vision-Language Models for Historical Arabic Handwritten Text Recognition},
  author    = {Misraj AI},
  booktitle = {Nakba OCR Competition, NLP 2026},
  year      = {2026}
}
```

---

## Links

- 🤗 Model weights: [Misraj/Baseer__Nakba](https://huggingface.co/Misraj/Baseer__Nakba)
- 💻 Inference pipeline: [misraj-ai/Nakba-pipeline](https://github.com/misraj-ai/Nakba-pipeline)
- 🌐 Live demo: [baseerocr.com](https://baseerocr.com/)
- 📄 Competition: [Nakba Codabench](https://www.codabench.org/competitions/12591/)
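For context on the leaderboard numbers above: CER and WER are Levenshtein edit distances normalized by reference length, computed over characters and whitespace-separated words respectively. A minimal pure-Python sketch (the official Codabench scorer may apply additional text normalization):

```python
def levenshtein(ref, hyp):
    """Edit distance between two sequences via row-by-row dynamic programming."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            # Deletion, insertion, or (possibly free) substitution.
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = curr
    return prev[-1]

def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edits over characters, normalized by reference length."""
    return levenshtein(list(ref), list(hyp)) / max(len(ref), 1)

def wer(ref: str, hyp: str) -> float:
    """Word Error Rate: edits over whitespace-separated tokens."""
    ref_words = ref.split()
    return levenshtein(ref_words, hyp.split()) / max(len(ref_words), 1)
```

For example, `cer("abcd", "abxd")` is `0.25`: one substitution over a four-character reference.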