muhammad0-0hreden committed on
Commit de11c1d · verified · 1 Parent(s): 7817d84

Update README.md

Files changed (1):
  1. README.md +77 -36
- mergekit
- merge
---

# Baseer-Nakba HTR: A State-of-the-Art VLM for Arabic Handwritten Text Recognition

## Overview

This repository contains the model weights and inference pipeline for our submission to the NAKBA NLP 2026 Arabic Handwritten Text Recognition (HTR) competition.

Our approach adapts the 3B-parameter [Baseer](https://arxiv.org/abs/2509.18174) Vision-Language Model (VLM) to effectively parse and recognize highly cursive, historical Arabic manuscripts. Through a progressive training pipeline, domain-matched data augmentation, and advanced checkpoint merging, this unified model mitigates the challenges of varying writer styles, age-related document degradation, and morphological complexity.

To try our Baseer model for document extraction, please visit [baseerocr.com](https://baseerocr.com/). **Baseer** is a state-of-the-art model for Arabic document extraction.

---

## 🏆 Competition Results

Our final model (**Misraj AI**) secured **1st place** on the official Nakba hidden test set [leaderboard](https://www.codabench.org/competitions/12591/).

| Rank | Team | CER | WER |
| :--- | :--- | :--- | :--- |
| 🥇 1st | **Misraj AI** | **0.0790** | **0.2440** |
| 🥈 2nd | Oblevit | 0.0925 | 0.3268 |
| 🥉 3rd | 3reeq | 0.0938 | 0.2996 |
| 4th | Latent Narratives | 0.1050 | 0.3106 |
| 5th | Al-Warraq | 0.1142 | 0.3780 |
| 6th | Not Gemma | 0.1217 | 0.3063 |
| 7th | NAMAA-Qari | 0.1950 | 0.5194 |
| 8th | Fahras | 0.2269 | 0.5223 |
| — | Baseline | 0.3683 | 0.6905 |
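For reference, the CER and WER scores above are Levenshtein edit distances normalized by reference length, computed at the character and word level respectively. A minimal, dependency-free sketch of how such metrics are typically computed (the competition's exact scoring script may differ, e.g. in text normalization):

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]

def cer(ref, hyp):
    """Character Error Rate: character-level edit distance / reference length."""
    return levenshtein(ref, hyp) / max(len(ref), 1)

def wer(ref, hyp):
    """Word Error Rate: word-level edit distance / reference word count."""
    r, h = ref.split(), hyp.split()
    return levenshtein(r, h) / max(len(r), 1)
```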

---

## Training Methodology

Our model was trained using a multi-stage Supervised Fine-Tuning (SFT) curriculum.

1. **Data Augmentation**: The Muharaf enhancement dataset was converted to grayscale to match the visual complexity and tonal distribution of the Nakba competition data.
2. **Decoder-Only SFT**: We first trained the text decoder autoregressively on the structurally similar Muharaf dataset to condition the language modeling head.
3. **Full Encoder-Decoder Tuning**: We subsequently unfroze the vision encoder and trained the full architecture on the Nakba dataset using differential learning rates, a key step that yielded a >5% improvement in WER over decoder-only tuning.
4. **Checkpoint Merging**: To stabilize predictions and maximize generalization, we merged our top-performing checkpoints (Epoch 1 and Epoch 5) using SLERP interpolation.
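The grayscale conversion in step 1 amounts to a standard luma transform applied per pixel. A minimal, dependency-free sketch, assuming ITU-R BT.601 weights (the card does not specify the exact conversion used):

```python
def to_grayscale(pixels):
    """Map RGB triples to single-channel values using BT.601 luma weights.

    This is what e.g. PIL's convert("L") does under the hood; here it is
    written out explicitly for a flat list of (r, g, b) tuples.
    """
    return [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in pixels]
```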

---

## Training Hyperparameters

All supervised experiments used standardized hyperparameters across configurations.

| Parameter | Value |
| :--- | :--- |
| **Hardware** | 2× NVIDIA H100 GPUs |
| **Base Model** | 3B-parameter Baseer |
| **Epochs** | 5 |
| **Optimizer** | AdamW |
| **Weight Decay** | 0.01 |
| **Learning Rate Schedule** | Cosine |
| **Batch Size** | 128 |
| **Max Sequence Length** | 1200 tokens |
| **Input Image Resolution** | 644 × 644 pixels |
| **Decoder-Only Learning Rate** | 1e-4 |
| **Encoder Learning Rate** | 9e-6 |
| **Decoder Learning Rate (Full Tuning)** | 1e-4 |
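The cosine schedule in the table anneals the learning rate from its base value toward a floor over training. A minimal sketch of the standard formula, assuming no warmup phase (the card does not specify one):

```python
import math

def cosine_lr(step, total_steps, base_lr, min_lr=0.0):
    """Cosine-annealed learning rate: base_lr at step 0, min_lr at total_steps."""
    progress = step / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

With `base_lr=1e-4` (the decoder learning rate above), the rate falls to half its base value at the midpoint of training.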

---

## Image Examples

The model works reliably on images from the Nakba dataset and visually similar historical manuscripts.

![image (1)](https://cdn-uploads.huggingface.co/production/uploads/65276c7911a8a521c91bc10f/MtU8b_IZ1_kbiwg3BISDg.jpeg)
![image (2)](https://cdn-uploads.huggingface.co/production/uploads/65276c7911a8a521c91bc10f/bmzC1F1rJz52ljDo0LbOY.jpeg)
![image (3)](https://cdn-uploads.huggingface.co/production/uploads/65276c7911a8a521c91bc10f/LNvoN4NkaVJ8zgUqzG8bm.jpeg)

---

## Merge Method

This model was merged using the [SLERP](https://en.wikipedia.org/wiki/Slerp) merge method.

### Models Merged

- `Baseer_Nakba_ep_1`
- `Baseer_Nakba_ep_5`

### Configuration

The following YAML configuration was used to produce this model:

```yaml
merge_method: slerp
base_model: Baseer_Nakba_ep_1
models:
  - model: Baseer_Nakba_ep_1
  - model: Baseer_Nakba_ep_5
parameters:
  t:
    - value: 0.50
dtype: bfloat16
```
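With two checkpoints and `t: 0.50`, SLERP interpolates each pair of flattened weight tensors along the arc between them rather than along the straight chord that plain averaging follows. A minimal sketch of the underlying formula (not the mergekit implementation, which handles per-layer schedules and edge cases):

```python
import math

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two weight vectors (plain lists)."""
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    dot = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    dot = max(-1.0, min(1.0, dot))   # clamp for numerical safety
    omega = math.acos(dot)           # angle between the two vectors
    if omega < eps:                  # nearly parallel: fall back to lerp
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    s0 = math.sin((1 - t) * omega) / math.sin(omega)
    s1 = math.sin(t * omega) / math.sin(omega)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]
```

At `t = 0.5` the result preserves the norm of unit-length inputs, which is the usual motivation for SLERP over naive weight averaging.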

---

## Citation

If you use this model or find our work helpful, please consider citing our paper:

```bibtex
@inproceedings{misrajai2026nakba,
  title     = {Adapting Vision-Language Models for Historical Arabic Handwritten Text Recognition},
  author    = {Misraj AI},
  booktitle = {Nakba OCR Competition, NLP 2026},
  year      = {2026}
}
```

---

## Links

- 🤗 Model weights: [Misraj/Baseer__Nakba](https://huggingface.co/Misraj/Baseer__Nakba)
- 💻 Inference pipeline: [misraj-ai/Nakba-pipeline](https://github.com/misraj-ai/Nakba-pipeline)
- 🌐 Live demo: [baseerocr.com](https://baseerocr.com/)
- 📄 Competition: [Nakba Codabench](https://www.codabench.org/competitions/12591/)