Arabic-OCR-DeepSeek-OCR-2-3B

This repository contains the merged bfloat16 version of the DeepSeek-OCR-2 (3B) model, fine-tuned by loay for high-precision Optical Character Recognition (OCR) and structural layout analysis of Arabic text in images.

The model was created by fine-tuning unsloth/DeepSeek-OCR-2 with LoRA adapters, using the Unsloth library for training. The adapters were then merged back into the base model for easy deployment.

Model Details

Fine-tuned by: loay
Base Model: unsloth/DeepSeek-OCR-2
Fine-tuning Task: Arabic Optical Character Recognition, specialized for the "Free OCR" instruction.
Training Data: The model was trained on 35,921 images of real scanned Arabic books, each paired with a structured transcription. The transcriptions were generated using Gemini 2 Flash.
Hardware & Performance: Training was conducted on an NVIDIA H100 (80 GB VRAM) using native bfloat16 precision and Flash Attention 2 for quality and speed.
LoRA Configuration: The adapters use a rank of r=64 and lora_alpha=128 for high-capacity adaptation. The targeted modules include ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"].
Vision Strategy: The model uses high-resolution dynamic multi-patch cropping (crop_mode=True), so large pages of varying size are processed without the loss of text detail that downscaling would cause.
Output Format: The merged weights are stored in native bfloat16 precision.
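
The LoRA settings and the merge step described above can be illustrated with a minimal sketch. It is not the training or merge script used for this model. The dictionary keys mirror the argument names of peft's LoraConfig (r, lora_alpha, target_modules), the scaling factor assumes the standard LoRA formulation (lora_alpha / r, no rank-stabilized variant), and the helper names (lora_scaling, matmul, merge_lora) and the tiny 2x2 matrices are hypothetical, chosen only so the arithmetic is easy to follow.

```python
# Values taken from the Model Details list; key names follow peft's LoraConfig.
LORA_SETTINGS = {
    "r": 64,
    "lora_alpha": 128,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
}


def lora_scaling(r, lora_alpha):
    """Scaling applied to the low-rank update in standard LoRA (assumed here)."""
    return lora_alpha / r


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, scaling):
    """Return the merged weight W + scaling * (B @ A)."""
    delta = matmul(lora_b, lora_a)
    return [
        [w + scaling * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


scaling = lora_scaling(LORA_SETTINGS["r"], LORA_SETTINGS["lora_alpha"])

# Toy example: a 2x2 weight with a rank-1 adapter (hypothetical values,
# far smaller than any real projection matrix in the model).
toy_weight = [[1.0, 0.0], [0.0, 1.0]]
toy_a = [[0.5, 0.25]]    # shape (rank, in_features) = (1, 2)
toy_b = [[1.0], [2.0]]   # shape (out_features, rank) = (2, 1)

merged = merge_lora(toy_weight, toy_a, toy_b, scaling)
print(scaling)  # 2.0
print(merged)   # [[2.0, 0.5], [2.0, 2.0]]
```

After merging, the adapter matrices are no longer needed at inference time, which is why the merged checkpoint can be loaded like an ordinary model without any LoRA tooling.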