Open Arabic Vision, Layout & OCR Models by Loay
This collection hosts a series of Vision Language Models (VLMs) fine-tuned for Arabic Optical Character Recognition (OCR) and document processing.
This repository contains the float16 merged version of a Vision-Language Model (VLM) fine-tuned by loay for Optical Character Recognition (OCR) of Arabic text in images.

The model was created by fine-tuning unsloth/Qwen2.5-VL-7B-Instruct-bnb-4bit with LoRA adapters. Training was accelerated by the Unsloth library, and the adapters were then merged back into the base model for easy deployment.
This repository: float16-precision merged weights, ideal for inference on GPUs with sufficient VRAM (requires >14 GB).
Base model: Qwen/Qwen2.5-VL-7B-Instruct (fine-tuned via the 4-bit quantized unsloth/Qwen2.5-VL-7B-Instruct-bnb-4bit).
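Since the adapters are already merged, the model can be loaded like any other Qwen2.5-VL checkpoint with the transformers library. The sketch below shows the standard Qwen2.5-VL inference pattern; the `MODEL_ID` string and the OCR prompt are illustrative placeholders (not confirmed by this card), so substitute the actual repository name from the Hub.

```python
# Hypothetical repo id -- replace with the actual model name on the Hub.
MODEL_ID = "loay/arabic-ocr-qwen2.5-vl-7b-merged-fp16"


def build_ocr_messages(image_path, prompt="Extract the Arabic text from this image."):
    """Build a Qwen2.5-VL chat message list for a single-image OCR request.

    The prompt wording is an example; any Arabic OCR instruction works.
    """
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": prompt},
            ],
        }
    ]


def run_ocr(image_path):
    """Run OCR on one image. Requires transformers>=4.49 and a GPU with >14 GB VRAM."""
    import torch
    from PIL import Image
    from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        MODEL_ID, torch_dtype=torch.float16, device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(MODEL_ID)

    messages = build_ocr_messages(image_path)
    # Render the chat template to a prompt string, then tokenize text + image together.
    text = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    image = Image.open(image_path)
    inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

    out = model.generate(**inputs, max_new_tokens=512)
    # Strip the prompt tokens so only the generated transcription is decoded.
    trimmed = out[:, inputs["input_ids"].shape[1]:]
    return processor.batch_decode(trimmed, skip_special_tokens=True)[0]
```

`run_ocr("page.png")` would then return the transcribed Arabic text as a string. Because the float16 weights occupy roughly 14 GB on their own, `device_map="auto"` lets accelerate place layers across available devices if a single GPU is too small.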