Apurba OCR MoE

FP8-quantized mixture-of-experts vision-language model for document layout OCR: bounding boxes, layout categories, and text extraction from images.

Serve with vLLM

vllm serve apurbaTech/apurba-ocr-moe \
  --host 0.0.0.0 \
  --port 8010 \
  --max-model-len 16384 \
  --gpu-memory-utilization 0.85 \
  --limit-mm-per-prompt '{"image": 1, "audio": 0}' \
  --mm-processor-kwargs '{"max_soft_tokens": 1120}' \
  --max-num-seqs 32 \
  --max-num-batched-tokens 8192

Recommended inference settings

Parameter	Value
`max_soft_tokens`	`1120`
`max_tokens`	`8192`
`temperature`	`0.0`
Guided JSON schema	enabled

Integration

For preprocess → vLLM call → post-process (pixel bboxes on origin image), see the Apurba OCR MoE application repository and docs/VLLM_INTEGRATION.md.

License

Subject to the license terms included with this checkpoint. Contact Apurba Tech for commercial use.

Downloads last month: 19

Safetensors

Model size

27B params

Tensor type

BF16

F8_E4M3

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support