Arabic-OCR-DeepSeek-OCR-2-3B

This repository contains the bfloat16 merged version of the DeepSeek-OCR-2 (3B) Model, fine-tuned by loay for the specific task of performing high-precision Optical Character Recognition (OCR) and structural layout analysis on Arabic text from images.

The model was created by fine-tuning the unsloth/DeepSeek-OCR-2 model using LoRA adapters. The high-performance training was made possible by the Unsloth library, and the adapters were then merged back into the base model for easy deployment.

Model Details

  • Fine-tuned by: loay
  • Base Model: unsloth/DeepSeek-OCR-2
  • Fine-tuning Task: Arabic Optical Character Recognition, specialized for the "Free OCR" instruction.
  • Training Data: The model was trained on a high-quality dataset of 35,921 images of real Arabic scanned books and their corresponding structured transcriptions.
  • Hardware & Performance: Training was conducted on an NVIDIA H100 (80GB VRAM) leveraging native bfloat16 precision and Flash Attention 2 for optimal quality and speed.
  • LoRA Configuration: High-capacity adaptation was achieved using a rank of r=64 and lora_alpha=128. The targeted modules include ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"].
  • Vision Strategy: The model utilizes high-resolution dynamic multi-patch cropping (crop_mode=True) to handle varying large page sizes correctly without downscaling text artifacts.
  • Output Format: This is a native bfloat16 precision model.
Downloads last month
19
Safetensors
Model size
3B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for loay/Arabic-OCR-DeepSeek-OCR-2

Adapter
(2)
this model

Collection including loay/Arabic-OCR-DeepSeek-OCR-2