# DeepSeek-OCR Fine-tuned — OCR

One-command setup to serve our fine-tuned DeepSeek-OCR model with LoRA adapters via FastAPI.
Two serving backends are available:

| Backend | Script | Inference | Best for |
|---|---|---|---|
| Unsloth | `setup_and_serve.sh` | ~27s/image | Compatibility, training workflows |
| vLLM | `setup_and_serve_vllm.sh` | ~2-3s/image | Production, low latency |
## Quick Start — vLLM (recommended)

```bash
git clone https://huggingface.co/shubhamingale/deepseek-ocr2-3b-lora
cd deepseek-ocr2-3b-lora
bash setup_and_serve_vllm.sh
```
This will:

- Install all dependencies (vLLM, FastAPI, etc.)
- Download the base model (`unsloth/DeepSeek-OCR`, ~6.7GB)
- Download the LoRA adapters from HuggingFace (~296MB)
- Start a FastAPI server on port 8000 with vLLM inference
For best performance, merge the LoRA weights before serving:

```bash
MERGE_LORA=1 bash setup_and_serve_vllm.sh
```
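Merging folds the low-rank adapter update into the base weights, so serving needs no extra adapter matmuls per layer. A minimal NumPy sketch of the arithmetic (the `merge_lora` helper and the toy rank-2 shapes are illustrative, not the repo's code; the actual fine-tune uses r=16, alpha=16, which gives a scale of 1):

```python
import numpy as np

def merge_lora(W, A, B, alpha=16, r=16):
    """Fold a LoRA update into the base weight: W' = W + (alpha/r) * B @ A."""
    return W + (alpha / r) * (B @ A)

# Toy dimensions: an 8x8 base weight with rank-2 adapters.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
A = rng.standard_normal((2, 8))   # down-projection (r x d_in)
B = rng.standard_normal((8, 2))   # up-projection (d_out x r)

W_merged = merge_lora(W, A, B, alpha=16, r=2)
# A forward pass now needs only W_merged, which is why MERGE_LORA=1 serves faster.
assert np.allclose(W_merged, W + 8.0 * (B @ A))
```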
## Quick Start — Unsloth (original)

```bash
git clone https://huggingface.co/shubhamingale/deepseek-ocr2-3b-lora
cd deepseek-ocr2-3b-lora
bash setup_and_serve.sh
```
## API Usage

Both backends expose identical endpoints.

### Upload an image file

```bash
curl -X POST http://localhost:8000/ocr \
  -F "file=@your_receipt.jpg"
```

### Send a base64 image

```bash
curl -X POST http://localhost:8000/ocr/base64 \
  -F "image_base64=$(base64 -w0 your_receipt.jpg)"
```

### Health check

```bash
curl http://localhost:8000/health
```
### Response format

```json
{
  "text": "extracted text from the image...",
  "status": "success"
}
```
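For programmatic clients, the base64 endpoint takes the same form field the `curl` example builds with `base64 -w0`. A sketch of the payload and response handling in Python (the helper names `build_base64_payload` and `extract_text` are ours, not part of the repo; only the `/ocr/base64` field name and the `text`/`status` response keys come from the API above):

```python
import base64

def build_base64_payload(image_bytes: bytes) -> dict:
    """Build the form field for POST /ocr/base64 (mirrors `base64 -w0`)."""
    return {"image_base64": base64.b64encode(image_bytes).decode("ascii")}

def extract_text(response_json: dict) -> str:
    """Pull the OCR text out of a successful response."""
    if response_json.get("status") != "success":
        raise RuntimeError(f"OCR failed: {response_json}")
    return response_json["text"]

# Posting is left to your HTTP client of choice, e.g. with requests:
#   requests.post("http://localhost:8000/ocr/base64",
#                 data=build_base64_payload(open("your_receipt.jpg", "rb").read()))

payload = build_base64_payload(b"fake-image-bytes")
assert base64.b64decode(payload["image_base64"]) == b"fake-image-bytes"
```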
## Configuration

### Shared

| Variable | Default | Description |
|---|---|---|
| `LORA_REPO` | `shubhamingale/deepseek-ocr2-3b-lora` | HF repo with LoRA adapters |
| `PORT` | `8000` | Server port |
| `IMAGE_SIZE` | `320` (vLLM) / `640` (Unsloth) | Max image dimension |
| `BASE_SIZE` | `512` (vLLM) / `1024` (Unsloth) | Base image size |
| `HF_TOKEN` | — | HF token (if adapter repo is private) |
### vLLM-only

| Variable | Default | Description |
|---|---|---|
| `MERGE_LORA` | `0` | Set to `1` to merge LoRA weights before serving |
| `MAX_MODEL_LEN` | `4096` | Max sequence length |
| `GPU_MEMORY_UTILIZATION` | `0.90` | Fraction of GPU memory to use |
| `MAX_LORA_RANK` | `16` | Max LoRA rank |
| `MAX_TOKENS` | `2048` | Max output tokens per request |
| `TEMPERATURE` | `0.0` | Sampling temperature |
| `DTYPE` | `half` | Model dtype (`half`, `bfloat16`, `auto`) |
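All of these settings are plain environment variables with defaults, so a serve script can read them with `os.environ.get`. A hypothetical sketch of that pattern (the `load_config` helper is ours; only the variable names and defaults come from the tables above, and `serve_vllm.py` may structure this differently):

```python
import os

def env_int(name: str, default: int) -> int:
    """Read an integer setting from the environment, falling back to a default."""
    return int(os.environ.get(name, default))

def env_float(name: str, default: float) -> float:
    return float(os.environ.get(name, default))

def load_config() -> dict:
    """Assemble server settings from env vars, mirroring the tables above."""
    return {
        "port": env_int("PORT", 8000),
        "max_model_len": env_int("MAX_MODEL_LEN", 4096),
        "gpu_memory_utilization": env_float("GPU_MEMORY_UTILIZATION", 0.90),
        "temperature": env_float("TEMPERATURE", 0.0),
        "merge_lora": os.environ.get("MERGE_LORA", "0") == "1",
    }

# With no overrides set, the defaults from the tables apply.
config = load_config()
```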
### Tuning image resolution

Lower resolution = faster inference, but may reduce OCR quality on small text. Experiment:

```bash
# Fast (default for vLLM)
IMAGE_SIZE=320 BASE_SIZE=512 python serve_vllm.py

# Balanced
IMAGE_SIZE=480 BASE_SIZE=768 python serve_vllm.py

# High quality (original Unsloth defaults)
IMAGE_SIZE=640 BASE_SIZE=1024 python serve_vllm.py
```
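Since `IMAGE_SIZE` caps the longest image dimension, clients can also downscale images before upload to cut payload size; anything larger gets reduced anyway. A sketch of the aspect-ratio-preserving fit (the `fit_within` helper is illustrative, not the server's exact resize logic):

```python
def fit_within(width: int, height: int, max_dim: int) -> tuple:
    """Scale (width, height) to fit inside max_dim x max_dim, keeping aspect ratio."""
    if max(width, height) <= max_dim:
        return width, height  # already small enough, leave untouched
    scale = max_dim / max(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))

# A 3000x2000 receipt scan targeted at IMAGE_SIZE=640:
print(fit_within(3000, 2000, 640))  # → (640, 427)
```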
## Requirements

- Python 3.10+
- NVIDIA GPU with >=16GB VRAM
- CUDA 11.8+
## Model Details

- Base model: `unsloth/DeepSeek-OCR` (3B params)
- Fine-tuned with: Unsloth + LoRA (r=16, alpha=16)
- LoRA adapters: `shubhamingale/deepseek-ocr2-3b-lora`