# DeepSeek-OCR Fine-tuned — OCR

One-command setup to serve our fine-tuned DeepSeek-OCR model with LoRA adapters via FastAPI.
Two serving backends are available:

| Backend | Script | Inference | Best for |
|---|---|---|---|
| Unsloth | `setup_and_serve.sh` | ~27s/image | Compatibility, training workflows |
| vLLM | `setup_and_serve_vllm.sh` | ~2-3s/image | Production, low latency |
## Quick Start — vLLM (recommended)

```bash
git clone https://huggingface.co/shubhamingale/deepseek-ocr2-3b-lora
cd deepseek-ocr2-3b-lora
bash setup_and_serve_vllm.sh
```
This will:

- Install all dependencies (vLLM, FastAPI, etc.)
- Download the base model (`unsloth/DeepSeek-OCR`, ~6.7GB)
- Download the LoRA adapters from HuggingFace (~296MB)
- Start a FastAPI server on port 8000 with vLLM inference
For best performance, merge the LoRA weights before serving:

```bash
MERGE_LORA=1 bash setup_and_serve_vllm.sh
```
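Merging folds the low-rank adapter update into the base weights, so serving needs no extra adapter matmuls per layer. A minimal NumPy sketch of the arithmetic (the `merge_lora` helper and the toy rank-2 shapes are illustrative, not the repo's code; the actual fine-tune uses r=16, alpha=16, which gives a scale of 1):

```python
import numpy as np

def merge_lora(W, A, B, alpha=16, r=16):
    """Fold a LoRA update into the base weight: W' = W + (alpha/r) * B @ A."""
    return W + (alpha / r) * (B @ A)

# Toy dimensions: an 8x8 base weight with rank-2 adapters.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
A = rng.standard_normal((2, 8))   # down-projection (r x d_in)
B = rng.standard_normal((8, 2))   # up-projection (d_out x r)

W_merged = merge_lora(W, A, B, alpha=16, r=2)
# A forward pass now needs only W_merged, which is why MERGE_LORA=1 serves faster.
assert np.allclose(W_merged, W + 8.0 * (B @ A))
```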
## Quick Start — Unsloth (original)

```bash
git clone https://huggingface.co/shubhamingale/deepseek-ocr2-3b-lora
cd deepseek-ocr2-3b-lora
bash setup_and_serve.sh
```
## API Usage

Both backends expose identical endpoints.

### Upload an image file

```bash
curl -X POST http://localhost:8000/ocr \
  -F "file=@your_receipt.jpg"
```

### Send a base64 image

```bash
curl -X POST http://localhost:8000/ocr/base64 \
  -F "image_base64=$(base64 -w0 your_receipt.jpg)"
```

### Health check

```bash
curl http://localhost:8000/health
```
### Response format

```json
{
  "text": "extracted text from the image...",
  "status": "success"
}
```
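For programmatic clients, the base64 endpoint takes the same form field the `curl` example builds with `base64 -w0`. A sketch of the payload and response handling in Python (the helper names `build_base64_payload` and `extract_text` are ours, not part of the repo; only the `/ocr/base64` field name and the `text`/`status` response keys come from the API above):

```python
import base64

def build_base64_payload(image_bytes: bytes) -> dict:
    """Build the form field for POST /ocr/base64 (mirrors `base64 -w0`)."""
    return {"image_base64": base64.b64encode(image_bytes).decode("ascii")}

def extract_text(response_json: dict) -> str:
    """Pull the OCR text out of a successful response."""
    if response_json.get("status") != "success":
        raise RuntimeError(f"OCR failed: {response_json}")
    return response_json["text"]

# Posting is left to your HTTP client of choice, e.g. with requests:
#   requests.post("http://localhost:8000/ocr/base64",
#                 data=build_base64_payload(open("your_receipt.jpg", "rb").read()))

payload = build_base64_payload(b"fake-image-bytes")
assert base64.b64decode(payload["image_base64"]) == b"fake-image-bytes"
```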
## Configuration

### Shared

| Variable | Default | Description |
|---|---|---|
| `LORA_REPO` | `shubhamingale/deepseek-ocr2-3b-lora` | HF repo with LoRA adapters |
| `PORT` | `8000` | Server port |
| `IMAGE_SIZE` | `320` (vLLM) / `640` (Unsloth) | Max image dimension |
| `BASE_SIZE` | `512` (vLLM) / `1024` (Unsloth) | Base image size |
| `HF_TOKEN` | — | HF token (if adapter repo is private) |
### vLLM-only

| Variable | Default | Description |
|---|---|---|
| `MERGE_LORA` | `0` | Set to `1` to merge LoRA weights before serving |
| `MAX_MODEL_LEN` | `4096` | Max sequence length |
| `GPU_MEMORY_UTILIZATION` | `0.90` | Fraction of GPU memory to use |
| `MAX_LORA_RANK` | `16` | Max LoRA rank |
| `MAX_TOKENS` | `2048` | Max output tokens per request |
| `TEMPERATURE` | `0.0` | Sampling temperature |
| `DTYPE` | `half` | Model dtype (`half`, `bfloat16`, `auto`) |
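All of these settings are plain environment variables with defaults, so a serve script can read them with `os.environ.get`. A hypothetical sketch of that pattern (the `load_config` helper is ours; only the variable names and defaults come from the tables above, and `serve_vllm.py` may structure this differently):

```python
import os

def env_int(name: str, default: int) -> int:
    """Read an integer setting from the environment, falling back to a default."""
    return int(os.environ.get(name, default))

def env_float(name: str, default: float) -> float:
    return float(os.environ.get(name, default))

def load_config() -> dict:
    """Assemble server settings from env vars, mirroring the tables above."""
    return {
        "port": env_int("PORT", 8000),
        "max_model_len": env_int("MAX_MODEL_LEN", 4096),
        "gpu_memory_utilization": env_float("GPU_MEMORY_UTILIZATION", 0.90),
        "temperature": env_float("TEMPERATURE", 0.0),
        "merge_lora": os.environ.get("MERGE_LORA", "0") == "1",
    }

# With no overrides set, the defaults from the tables apply.
config = load_config()
```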
### Tuning image resolution

Lower resolution = faster inference, but may reduce OCR quality on small text. Experiment:

```bash
# Fast (default for vLLM)
IMAGE_SIZE=320 BASE_SIZE=512 python serve_vllm.py

# Balanced
IMAGE_SIZE=480 BASE_SIZE=768 python serve_vllm.py

# High quality (original Unsloth defaults)
IMAGE_SIZE=640 BASE_SIZE=1024 python serve_vllm.py
```
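Since `IMAGE_SIZE` caps the longest image dimension, clients can also downscale images before upload to cut payload size; anything larger gets reduced anyway. A sketch of the aspect-ratio-preserving fit (the `fit_within` helper is illustrative, not the server's exact resize logic):

```python
def fit_within(width: int, height: int, max_dim: int) -> tuple:
    """Scale (width, height) to fit inside max_dim x max_dim, keeping aspect ratio."""
    if max(width, height) <= max_dim:
        return width, height  # already small enough, leave untouched
    scale = max_dim / max(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))

# A 3000x2000 receipt scan targeted at IMAGE_SIZE=640:
print(fit_within(3000, 2000, 640))  # → (640, 427)
```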
## Requirements

- Python 3.10+
- NVIDIA GPU with >=16GB VRAM
- CUDA 11.8+
## Model Details

- Base model: `unsloth/DeepSeek-OCR` (3B params)
- Fine-tuned with: Unsloth + LoRA (r=16, alpha=16)
- LoRA adapters: `shubhamingale/deepseek-ocr2-3b-lora`