Open Arabic Vision, Layout & OCR Models by Loay
This collection hosts a series of Vision Language Models (VLMs) fine-tuned for Arabic Optical Character Recognition (OCR) and document processing.
This repository contains the merged bfloat16 version of the DeepSeek-OCR-2 (3B) model, fine-tuned by loay for high-precision Optical Character Recognition (OCR) and structural layout analysis of Arabic text in images.
The model was created by fine-tuning the unsloth/DeepSeek-OCR-2 model with LoRA adapters. Training used the Unsloth library, and the adapters were then merged back into the base model for easy deployment.
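The merge step can be illustrated with the standard LoRA formulation, in which a frozen weight matrix W receives a low-rank update B·A scaled by lora_alpha / r. The sketch below uses tiny hand-written matrices and plain Python. The helper names (`matmul`, `merge_lora`) and all toy values are assumptions for illustration, not code from this repository or from Unsloth.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, lora_a, lora_b, r, lora_alpha):
    """Return W + (lora_alpha / r) * (B @ A), the standard LoRA merge."""
    scale = lora_alpha / r
    delta = matmul(lora_b, lora_a)  # shape: (d_out, d_in)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


# Toy shapes (hypothetical): d_out = 2, d_in = 3, rank r = 1.
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0]]
A = [[0.5, 0.0, 1.0]]   # shape (r, d_in)
B = [[1.0], [2.0]]      # shape (d_out, r)
R, ALPHA = 1, 2

W_merged = merge_lora(W, A, B, r=R, lora_alpha=ALPHA)

# One input column vector of shape (d_in, 1).
x = [[1.0], [2.0], [3.0]]

# Adapter kept separate: W x + scale * B (A x)
base_out = matmul(W, x)
adapter_out = matmul(B, matmul(A, x))
y_adapter = [[b[0] + (ALPHA / R) * d[0]] for b, d in zip(base_out, adapter_out)]

# Adapter merged into the weights: W_merged x
y_merged = matmul(W_merged, x)

print(y_adapter, y_merged)
```

Both paths give the same output, which is why a merged checkpoint can be loaded and served like an ordinary model without carrying separate adapter weights.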
- Base model: unsloth/DeepSeek-OCR-2
- Precision and attention: bfloat16 precision and Flash Attention 2 for optimal quality and speed.
- LoRA configuration: r=64 and lora_alpha=128. The targeted modules include ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"].
- Cropping: crop_mode=True, to handle large pages of varying size correctly without downscaling artifacts in the text.
- Output: merged bfloat16 precision model.
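To make these settings concrete, here is a minimal sketch that records them in a plain dataclass and derives two quantities from them. The `FinetuneConfig` class, its field names, and the layer width of 1024 are hypothetical. Only the values themselves (base model, bfloat16, Flash Attention 2, r=64, lora_alpha=128, the target modules, crop_mode=True) come from the description above, and the scaling formula lora_alpha / r assumes the standard, non-rank-stabilized LoRA convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FinetuneConfig:
    """Hypothetical container for the settings listed in this model card."""
    base_model: str = "unsloth/DeepSeek-OCR-2"
    dtype: str = "bfloat16"
    attn_implementation: str = "flash_attention_2"
    r: int = 64
    lora_alpha: int = 128
    target_modules: tuple = (
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    )
    crop_mode: bool = True

    @property
    def lora_scale(self) -> float:
        """Factor applied to the low-rank update (standard LoRA)."""
        return self.lora_alpha / self.r

    def adapter_params(self, d_in: int, d_out: int) -> int:
        """Trainable parameters LoRA adds to one d_out x d_in linear layer."""
        return self.r * (d_in + d_out)


cfg = FinetuneConfig()
print(cfg.lora_scale)                             # scaling of the update
print(cfg.adapter_params(d_in=1024, d_out=1024))  # 1024 is an assumed width
```

With r=64 and lora_alpha=128 the update is scaled by 2.0. For a square layer of the assumed width 1024, the adapter adds 131,072 trainable parameters, compared with 1,048,576 in the full weight matrix.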