Apurba OCR MoE

FP8-quantized mixture-of-experts vision-language model for document layout OCR: bounding boxes, layout categories, and text extraction from images.

Serve with vLLM

vllm serve apurbaTech/apurba-ocr-moe \
  --host 0.0.0.0 \
  --port 8010 \
  --max-model-len 16384 \
  --gpu-memory-utilization 0.85 \
  --limit-mm-per-prompt '{"image": 1, "audio": 0}' \
  --mm-processor-kwargs '{"max_soft_tokens": 1120}' \
  --max-num-seqs 32 \
  --max-num-batched-tokens 8192

Recommended inference settings

Parameter Value
max_soft_tokens 1120
max_tokens 8192
temperature 0.0
Guided JSON schema enabled

Integration

For preprocess → vLLM call → post-process (pixel bboxes on origin image), see the Apurba OCR MoE application repository and docs/VLLM_INTEGRATION.md.

License

Subject to the license terms included with this checkpoint. Contact Apurba Tech for commercial use.

Downloads last month
19
Safetensors
Model size
27B params
Tensor type
BF16
·
F8_E4M3
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support