---
base_model: Qwen/Qwen3-VL-8B-Instruct
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:Qwen/Qwen3-VL-8B-Instruct
- lora
- sft
- transformers
- trl
---
# Model Card for ar-ms-baseline
## Model Summary
This model is the baseline system for the NAKBA NLP 2026: Arabic Manuscript Understanding Shared Task (Systems Track). It fine-tunes Qwen3-VL-8B-Instruct with LoRA to transcribe Arabic manuscript line images into text.
## Model Details
### Description
- **Model type:** Vision-language OCR/HTR model (LoRA-adapted)
- **Finetuned from model:** Qwen/Qwen3-VL-8B-Instruct
### Sources
- **Repository:** https://github.com/U4RASD/ar-ms-baseline
- **Shared Task:** https://acrps.ai/nakba-nlp-manu-understanding-2026
## Training Details
### Training Data
- NAKBA NLP 2026 Shared Task (Subtask 2) training split from the Omar Al-Saleh memoir collection.
- The dataset consists of line images paired with gold transcriptions.
### Training Procedure
- Supervised fine-tuning with LoRA adapters on Qwen/Qwen3-VL-8B-Instruct.
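As a quick illustration of what the LoRA adapters do (not the task's actual configuration, whose values live in `configs/default.json`), the effective weight is the frozen base weight plus a scaled low-rank product, `W + (alpha / r) * B @ A`. A minimal sketch on a tiny 2x2 matrix, with illustrative rank and scaling values:

```python
# Illustrative LoRA update: frozen base weight W combined with the
# trainable low-rank product B @ A, scaled by alpha / r.
r, alpha = 1, 2                    # rank and scaling (toy values, not the task config)
W = [[1.0, 0.0], [0.0, 1.0]]       # frozen base weight (d_out x d_in)
A = [[0.5, 0.5]]                   # trainable r x d_in matrix
B = [[1.0], [0.0]]                 # trainable d_out x r matrix

def matmul(X, Y):
    # plain nested-list matrix multiply
    return [[sum(x * Y[k][j] for k, x in enumerate(row))
             for j in range(len(Y[0]))] for row in X]

delta = matmul(B, A)               # d_out x d_in low-rank update
W_eff = [[w + (alpha / r) * d for w, d in zip(w_row, d_row)]
         for w_row, d_row in zip(W, delta)]
# W_eff == [[2.0, 1.0], [0.0, 1.0]]
```

Only `A` and `B` are trained, so the number of updated parameters is far smaller than the 8B base model.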
#### Training Hyperparameters
- **Config reference:** Hyperparameters are listed in `configs/default.json`
## Evaluation
### Testing Data, Factors & Metrics
#### Testing Data
- NAKBA NLP 2026 Shared Task (Subtask 2) released test set of line images.
#### Metrics
- **CER (Character Error Rate)**
- **WER (Word Error Rate)**
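Both metrics are edit distance normalized by reference length, computed over characters (CER) or whitespace-separated words (WER). A minimal sketch (the shared task's official scorer may differ in normalization details, e.g. Unicode or diacritic handling):

```python
def levenshtein(ref, hyp):
    # Classic dynamic-programming edit distance over two sequences,
    # keeping only the previous row of the DP table.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (r != h)))  # substitution
        prev = cur
    return prev[-1]

def cer(ref, hyp):
    # character-level edit distance / reference length
    return levenshtein(ref, hyp) / len(ref)

def wer(ref, hyp):
    # word-level edit distance / reference word count
    ref_words, hyp_words = ref.split(), hyp.split()
    return levenshtein(ref_words, hyp_words) / len(ref_words)

# cer("سلام", "سلم") == 0.25  (one deleted character out of four)
```

Lower is better for both; a CER of 0.2297 means roughly 23 character edits per 100 reference characters.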
### Results
On the released test set:
- CER: 0.2297
- WER: 0.4998
### Compute Infrastructure
- **Hardware:** NVIDIA H100 SXM
## Contact
- ar-ms@dohainstitute.edu.qa