---
base_model: [Qwen2.5VL]
library_name: transformers
tags:
- mergekit
- merge
---

# Baseer-Nakba HTR: A State-of-the-Art VLM for Arabic Handwritten Text Recognition

## Overview

This repository contains the model weights and inference pipeline for our submission to the NAKBA NLP 2026 Arabic Handwritten Text Recognition (HTR) competition. Our approach adapts the 3B-parameter [Baseer](https://arxiv.org/abs/2509.18174) Vision-Language Model (VLM) to parse and recognize highly cursive, historical Arabic manuscripts. Through a progressive training pipeline, domain-matched data augmentation, and checkpoint merging, the resulting unified model mitigates the challenges of varying writer styles, age-related document degradation, and morphological complexity.

To try Baseer for document extraction, visit [baseerocr.com](https://baseerocr.com/): **Baseer** is the state-of-the-art model for Arabic document extraction.

---

## 🏆 Competition Results

Our final model (**Misraj AI**) secured **1st place** on the official Nakba hidden test set [leaderboard](https://www.codabench.org/competitions/12591/).

| Rank | Team | CER | WER |
| :--- | :--- | :--- | :--- |
| 🥇 1st | **Misraj AI** | **0.0790** | **0.2440** |
| 🥈 2nd | Oblevit | 0.0925 | 0.3268 |
| 🥉 3rd | 3reeq | 0.0938 | 0.2996 |
| 4th | Latent Narratives | 0.1050 | 0.3106 |
| 5th | Al-Warraq | 0.1142 | 0.3780 |
| 6th | Not Gemma | 0.1217 | 0.3063 |
| 7th | NAMAA-Qari | 0.1950 | 0.5194 |
| 8th | Fahras | 0.2269 | 0.5223 |
| — | Baseline | 0.3683 | 0.6905 |

---

## Training Methodology

Our model was trained with a multi-stage Supervised Fine-Tuning (SFT) curriculum.

1. **Data Augmentation**: The Muharaf enhancement dataset was converted to grayscale to match the visual complexity and tonal distribution of the Nakba competition data.
2. **Decoder-Only SFT**: We first trained the text decoder autoregressively on the structurally similar Muharaf dataset to condition the language modeling head.
3. **Full Encoder-Decoder Tuning**: We then unfroze the vision encoder and trained the full architecture on the Nakba dataset with differential learning rates, a key step that yielded a >5% improvement in WER over decoder-only tuning.
4. **Checkpoint Merging**: To stabilize predictions and maximize generalization, we merged our top-performing checkpoints (epoch 1 and epoch 5) using SLERP interpolation.

---

## Training Hyperparameters

All supervised experiments used the same standardized hyperparameters.

| Parameter | Value |
| :--- | :--- |
| **Hardware** | 2× NVIDIA H100 GPUs |
| **Base Model** | 3B-parameter Baseer |
| **Epochs** | 5 |
| **Optimizer** | AdamW |
| **Weight Decay** | 0.01 |
| **Learning Rate Schedule** | Cosine |
| **Batch Size** | 128 |
| **Max Sequence Length** | 1200 tokens |
| **Input Image Resolution** | 644 × 644 pixels |
| **Decoder-Only Learning Rate** | 1e-4 |
| **Encoder Learning Rate** | 9e-6 |
| **Decoder Learning Rate (Full Tuning)** | 1e-4 |

---

## Image Examples

The model works reliably on images from the Nakba dataset and visually similar historical manuscripts.

![image (1)](https://cdn-uploads.huggingface.co/production/uploads/65276c7911a8a521c91bc10f/MtU8b_IZ1_kbiwg3BISDg.jpeg)
![image (2)](https://cdn-uploads.huggingface.co/production/uploads/65276c7911a8a521c91bc10f/bmzC1F1rJz52ljDo0LbOY.jpeg)
![image (3)](https://cdn-uploads.huggingface.co/production/uploads/65276c7911a8a521c91bc10f/LNvoN4NkaVJ8zgUqzG8bm.jpeg)

---

## Merge Method

This model was merged using the [SLERP](https://en.wikipedia.org/wiki/Slerp) merge method.
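SLERP interpolates along the great-circle arc between the two checkpoints' parameter vectors rather than along the straight line used by plain weight averaging. The following NumPy sketch illustrates the idea only; it is not the exact mergekit implementation, and the toy `ckpt_ep1`/`ckpt_ep5` dicts stand in for real state dicts:

```python
import numpy as np

def slerp(t: float, v0: np.ndarray, v1: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two weight tensors."""
    u0, u1 = v0.ravel(), v1.ravel()
    # Cosine of the angle between the two (normalized) parameter vectors.
    cos = np.dot(u0, u1) / (np.linalg.norm(u0) * np.linalg.norm(u1) + eps)
    theta = np.arccos(np.clip(cos, -1.0, 1.0))
    if theta < eps:
        # Nearly parallel checkpoints: fall back to linear interpolation.
        return (1.0 - t) * v0 + t * v1
    w0 = np.sin((1.0 - t) * theta) / np.sin(theta)
    w1 = np.sin(t * theta) / np.sin(theta)
    return w0 * v0 + w1 * v1

# Toy stand-ins for the epoch-1 and epoch-5 state dicts; merge tensor-by-tensor
# at t = 0.5, matching the configuration used for this model.
ckpt_ep1 = {"layer.weight": np.eye(2)}
ckpt_ep5 = {"layer.weight": np.array([[0.0, 1.0], [1.0, 0.0]])}
merged = {name: slerp(0.5, ckpt_ep1[name], ckpt_ep5[name]) for name in ckpt_ep1}
```

At `t = 0.5` the two checkpoints contribute equally; unlike a plain average, SLERP stays on the arc between the weight vectors instead of shrinking toward their midpoint.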
### Models Merged

- `Baseer_Nakba_ep_1`
- `Baseer_Nakba_ep_5`

### Configuration

```yaml
merge_method: slerp
base_model: Baseer_Nakba_ep_1
models:
  - model: Baseer_Nakba_ep_1
  - model: Baseer_Nakba_ep_5
parameters:
  t:
    - value: 0.50
dtype: bfloat16
```

---

## Citation

If you use this model or find our work helpful, please consider citing our paper:

```bibtex
@inproceedings{misrajai2026nakba,
  title     = {Adapting Vision-Language Models for Historical Arabic Handwritten Text Recognition},
  author    = {Misraj AI},
  booktitle = {Nakba OCR Competition, NLP 2026},
  year      = {2026}
}
```

---

## Links

- 🤗 Model weights: [Misraj/Baseer__Nakba](https://huggingface.co/Misraj/Baseer__Nakba)
- 💻 Inference pipeline: [misraj-ai/Nakba-pipeline](https://github.com/misraj-ai/Nakba-pipeline)
- 🌐 Live demo: [baseerocr.com](https://baseerocr.com/)
- 📄 Competition: [Nakba Codabench](https://www.codabench.org/competitions/12591/)
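For context on the leaderboard numbers above: CER and WER are Levenshtein edit distances normalized by reference length, computed over characters and whitespace-separated words respectively. A minimal pure-Python sketch (the official Codabench scorer may apply additional text normalization):

```python
def levenshtein(ref, hyp):
    """Edit distance between two sequences via row-by-row dynamic programming."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            # Deletion, insertion, or (possibly free) substitution.
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = curr
    return prev[-1]

def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edits over characters, normalized by reference length."""
    return levenshtein(list(ref), list(hyp)) / max(len(ref), 1)

def wer(ref: str, hyp: str) -> float:
    """Word Error Rate: edits over whitespace-separated tokens."""
    ref_words = ref.split()
    return levenshtein(ref_words, hyp.split()) / max(len(ref_words), 1)
```

For example, `cer("abcd", "abxd")` is `0.25`: one substitution over a four-character reference.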