---
base_model: [Qwen2.5VL]
library_name: transformers
tags:
- mergekit
- merge
---
# Baseer-Nakba HTR: A State-of-the-Art VLM for Arabic Handwritten Text Recognition
## Overview
This repository contains the model weights and inference pipeline for our submission to the NAKBA NLP 2026 Arabic Handwritten Text Recognition (HTR) competition.
Our approach adapts the 3B-parameter [Baseer](https://arxiv.org/abs/2509.18174) Vision-Language Model (VLM) to effectively parse and recognize highly cursive, historical Arabic manuscripts. Through a progressive training pipeline, domain-matched data augmentation, and advanced checkpoint merging, this unified model mitigates the challenges of varying writer styles, age-related document degradation, and morphological complexity.
To try our Baseer model for general document extraction, please visit [Baseer](https://baseerocr.com/); **Baseer** is the state-of-the-art model for Arabic document extraction.
---
## πŸ† Competition Results
Our final model (**Misraj AI**) secured **1st place** on the official Nakba hidden test set [leaderboard](https://www.codabench.org/competitions/12591/).
| Rank | Team | CER | WER |
| :--- | :--- | :--- | :--- |
| 🥇 1st | **Misraj AI** | **0.0790** | **0.2440** |
| 🥈 2nd | Oblevit | 0.0925 | 0.3268 |
| 🥉 3rd | 3reeq | 0.0938 | 0.2996 |
| 4th | Latent Narratives | 0.1050 | 0.3106 |
| 5th | Al-Warraq | 0.1142 | 0.3780 |
| 6th | Not Gemma | 0.1217 | 0.3063 |
| 7th | NAMAA-Qari | 0.1950 | 0.5194 |
| 8th | Fahras | 0.2269 | 0.5223 |
| – | Baseline | 0.3683 | 0.6905 |
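Both leaderboard metrics are normalized edit distances: CER counts character-level edits against the reference transcription, WER counts edits over whitespace-separated tokens. A minimal pure-Python sketch of how such scores are computed (illustrative only, not the official competition scorer):

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance over two sequences,
    # counting insertions, deletions, and substitutions.
    prev = list(range(len(b) + 1))
    for i, xa in enumerate(a, 1):
        curr = [i]
        for j, xb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (xa != xb)))   # substitution
        prev = curr
    return prev[-1]

def cer(reference, hypothesis):
    """Character error rate: character edit distance / reference length."""
    return levenshtein(reference, hypothesis) / len(reference)

def wer(reference, hypothesis):
    """Word error rate: token edit distance / reference token count."""
    ref, hyp = reference.split(), hypothesis.split()
    return levenshtein(ref, hyp) / len(ref)
```

Lower is better for both; a CER of 0.0790 means roughly one character-level error per 12–13 reference characters.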
---
## Training Methodology
Our model was trained using a multi-stage Supervised Fine-Tuning (SFT) curriculum.
1. **Data Augmentation**: The Muharaf enhancement dataset was converted to grayscale to match the visual complexity and tonal distribution of the Nakba competition data.
2. **Decoder-Only SFT**: We first trained the text decoder autoregressively on the structurally similar Muharaf dataset to condition the language modeling head.
3. **Full Encoder-Decoder Tuning**: We subsequently unfroze the vision encoder and trained the full architecture on the Nakba dataset using differential learning rates, a key step that yielded a >5% improvement in WER over decoder-only tuning.
4. **Checkpoint Merging**: To stabilize predictions and maximize generalization, we merged our top-performing checkpoints (epoch 1 and epoch 5) using SLERP (spherical linear interpolation).
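The grayscale matching in step 1 can be sketched with a standard luma conversion (ITU-R BT.601 weights). The actual augmentation pipeline is not published, so the helper names below are illustrative assumptions:

```python
def to_grayscale(pixel):
    """Convert one (R, G, B) pixel to a single luma value (ITU-R BT.601 weights)."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)

def grayscale_image(pixels):
    """Apply the conversion to a 2-D grid of RGB pixels (list of rows)."""
    return [[to_grayscale(p) for p in row] for row in pixels]
```

In practice the same effect is typically achieved with an image library's grayscale mode; the point is that color images from an auxiliary corpus are flattened to a single channel so their tonal distribution resembles the scanned Nakba documents.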
---
## Training Hyperparameters
All supervised fine-tuning experiments used the standardized hyperparameters below.
| Parameter | Value |
| :--- | :--- |
| **Hardware** | 2× NVIDIA H100 GPUs |
| **Base Model** | 3B-parameter Baseer |
| **Epochs** | 5 |
| **Optimizer** | AdamW |
| **Weight Decay** | 0.01 |
| **Learning Rate Schedule** | Cosine |
| **Batch Size** | 128 |
| **Max Sequence Length** | 1200 tokens |
| **Input Image Resolution** | 644 × 644 pixels |
| **Decoder-Only Learning Rate** | 1e-4 |
| **Encoder Learning Rate** | 9e-6 |
| **Decoder Learning Rate (Full Tuning)** | 1e-4 |
---
## Image Examples
The model works reliably on images from the Nakba dataset and visually similar historical manuscripts.
![image (1)](https://cdn-uploads.huggingface.co/production/uploads/65276c7911a8a521c91bc10f/MtU8b_IZ1_kbiwg3BISDg.jpeg)
![image (2)](https://cdn-uploads.huggingface.co/production/uploads/65276c7911a8a521c91bc10f/bmzC1F1rJz52ljDo0LbOY.jpeg)
![image (3)](https://cdn-uploads.huggingface.co/production/uploads/65276c7911a8a521c91bc10f/LNvoN4NkaVJ8zgUqzG8bm.jpeg)
---
## Merge Method
This model was merged using the [SLERP](https://en.wikipedia.org/wiki/Slerp) merge method.
### Models Merged
- `Baseer_Nakba_ep_1`
- `Baseer_Nakba_ep_5`
### Configuration
```yaml
merge_method: slerp
base_model: Baseer_Nakba_ep_1
models:
- model: Baseer_Nakba_ep_1
- model: Baseer_Nakba_ep_5
parameters:
t:
- value: 0.50
dtype: bfloat16
```
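With `t: 0.50`, SLERP interpolates each weight tensor along the great circle between the two checkpoints, falling back to plain linear interpolation when the vectors are nearly parallel. A minimal sketch of the formula on flat vectors (mergekit applies this per tensor; this standalone version is for illustration):

```python
import math

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two weight vectors v0 and v1."""
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    dot = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    dot = max(-1.0, min(1.0, dot))       # clamp against floating-point drift
    theta = math.acos(dot)               # angle between the two vectors
    if abs(math.sin(theta)) < eps:       # nearly parallel: fall back to lerp
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    s0 = math.sin((1 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]
```

Unlike a plain average, SLERP preserves the norm profile along the interpolation path, which is why it is a popular choice for merging checkpoints of the same architecture.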
---
## Citation
If you use this model or find our work helpful, please consider citing our paper:
```bibtex
@inproceedings{misrajai2026nakba,
  title     = {Adapting Vision-Language Models for Historical Arabic Handwritten Text Recognition},
  author    = {Misraj AI},
  booktitle = {Nakba OCR Competition, NLP 2026},
  year      = {2026}
}
```
---
## Links
- 🤗 Model weights: [Misraj/Baseer__Nakba](https://huggingface.co/Misraj/Baseer__Nakba)
- 💻 Inference pipeline: [misraj-ai/Nakba-pipeline](https://github.com/misraj-ai/Nakba-pipeline)
- 🌐 Live demo: [baseerocr.com](https://baseerocr.com/)
- 📄 Competition: [Nakba Codabench](https://www.codabench.org/competitions/12591/)