DeepSeek-OCR Fine-tuned — OCR

One-command setup to serve our fine-tuned DeepSeek-OCR model with LoRA adapters via FastAPI.

Two serving backends are available:

| Backend | Script | Inference | Best for |
|---------|--------|-----------|----------|
| Unsloth | `setup_and_serve.sh` | ~27s/image | Compatibility, training workflows |
| vLLM | `setup_and_serve_vllm.sh` | ~2-3s/image | Production, low latency |

Quick Start — vLLM (recommended)

git clone https://huggingface.co/shubhamingale/deepseek-ocr2-3b-lora
cd deepseek-ocr2-3b-lora
bash setup_and_serve_vllm.sh

This will:

  1. Install all dependencies (vLLM, FastAPI, etc.)
  2. Download the base model (unsloth/DeepSeek-OCR ~6.7GB)
  3. Download the LoRA adapters from HuggingFace (~296MB)
  4. Start a FastAPI server on port 8000 with vLLM inference

For best performance, merge the LoRA weights before serving:

MERGE_LORA=1 bash setup_and_serve_vllm.sh
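Merging folds the adapter deltas into the base weights, so vLLM serves a single plain checkpoint instead of applying LoRA at runtime. The setup script does this for you; the underlying idea can be sketched with `peft` (a minimal sketch, not the script's exact code; the model class and the `./merged-model` output path are assumptions):

```python
# Sketch of an offline LoRA merge, assuming a standard PEFT adapter layout.
from peft import PeftModel
from transformers import AutoModel

# Load the base model, then attach the adapters from the HF repo.
base = AutoModel.from_pretrained("unsloth/DeepSeek-OCR", trust_remote_code=True)
model = PeftModel.from_pretrained(base, "shubhamingale/deepseek-ocr2-3b-lora")

# Fold the LoRA deltas into the base weights and drop the adapter layers.
merged = model.merge_and_unload()

# Save a plain checkpoint that vLLM can load directly.
merged.save_pretrained("./merged-model")
```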

Quick Start — Unsloth (original)

git clone https://huggingface.co/shubhamingale/deepseek-ocr2-3b-lora
cd deepseek-ocr2-3b-lora
bash setup_and_serve.sh

API Usage

Both backends expose identical endpoints.

Upload an image file

curl -X POST http://localhost:8000/ocr \
  -F "file=@your_receipt.jpg"

Send a base64 image

curl -X POST http://localhost:8000/ocr/base64 \
  -F "image_base64=$(base64 -w0 your_receipt.jpg)"

Note: `-w0` (disable line wrapping) is GNU coreutils syntax; on macOS, use `base64 -i your_receipt.jpg` instead.

Health check

curl http://localhost:8000/health
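First startup downloads the base model and adapters, which can take several minutes, so a client should wait for `/health` before sending requests. A minimal polling helper (the URL and timeouts are illustrative defaults):

```python
import time
from urllib import error, request

def wait_for_health(url: str = "http://localhost:8000/health",
                    timeout: float = 300.0, interval: float = 2.0) -> bool:
    """Poll the health endpoint until it returns HTTP 200.

    Returns True once the server is healthy, False if the deadline passes.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (error.URLError, OSError):
            pass  # server not up yet; keep polling
        time.sleep(interval)
    return False
```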

Response format

{
  "text": "extracted text from the image...",
  "status": "success"
}
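The same endpoints are easy to call from Python using only the standard library. A small client sketch for the base64 endpoint (the server URL mirrors the default `PORT`; the function names are illustrative):

```python
import base64
import json
from pathlib import Path
from urllib import parse, request

SERVER = "http://localhost:8000"  # matches the default PORT

def encode_image(path: str) -> str:
    """Base64-encode an image file for the /ocr/base64 endpoint."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")

def ocr_base64(path: str, server: str = SERVER) -> dict:
    """POST a base64 image as form data and return the parsed JSON response."""
    body = parse.urlencode({"image_base64": encode_image(path)}).encode()
    with request.urlopen(f"{server}/ocr/base64", data=body) as resp:
        return json.loads(resp.read())

# Usage (with a server running):
# result = ocr_base64("your_receipt.jpg")
# print(result["text"])
```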

Configuration

Shared

| Variable | Default | Description |
|----------|---------|-------------|
| `LORA_REPO` | `shubhamingale/deepseek-ocr2-3b-lora` | HF repo with LoRA adapters |
| `PORT` | `8000` | Server port |
| `IMAGE_SIZE` | `320` (vLLM) / `640` (Unsloth) | Max image dimension |
| `BASE_SIZE` | `512` (vLLM) / `1024` (Unsloth) | Base image size |
| `HF_TOKEN` | (unset) | HF token (if the adapter repo is private) |

vLLM-only

| Variable | Default | Description |
|----------|---------|-------------|
| `MERGE_LORA` | `0` | Set to `1` to merge LoRA weights before serving |
| `MAX_MODEL_LEN` | `4096` | Max sequence length |
| `GPU_MEMORY_UTILIZATION` | `0.90` | Fraction of GPU memory to use |
| `MAX_LORA_RANK` | `16` | Max LoRA rank |
| `MAX_TOKENS` | `2048` | Max output tokens per request |
| `TEMPERATURE` | `0.0` | Sampling temperature |
| `DTYPE` | `half` | Model dtype (`half`, `bfloat16`, `auto`) |

Tuning image resolution

Lower resolution means faster inference but may reduce OCR quality on small text. Experiment:

# Fast (default for vLLM)
IMAGE_SIZE=320 BASE_SIZE=512 python serve_vllm.py

# Balanced
IMAGE_SIZE=480 BASE_SIZE=768 python serve_vllm.py

# High quality (original Unsloth defaults)
IMAGE_SIZE=640 BASE_SIZE=1024 python serve_vllm.py

Requirements

  • Python 3.10+
  • NVIDIA GPU with >=16GB VRAM
  • CUDA 11.8+
