Apurba OCR MoE
FP8-quantized mixture-of-experts vision-language model for document layout OCR: bounding boxes, layout categories, and text extraction from images.
Serve with vLLM
vllm serve apurbaTech/apurba-ocr-moe \
--host 0.0.0.0 \
--port 8010 \
--max-model-len 16384 \
--gpu-memory-utilization 0.85 \
--limit-mm-per-prompt '{"image": 1, "audio": 0}' \
--mm-processor-kwargs '{"max_soft_tokens": 1120}' \
--max-num-seqs 32 \
--max-num-batched-tokens 8192
Recommended inference settings
| Parameter | Value |
|---|---|
max_soft_tokens |
1120 |
max_tokens |
8192 |
temperature |
0.0 |
| Guided JSON schema | enabled |
Integration
For preprocess → vLLM call → post-process (pixel bboxes on origin image), see the Apurba OCR MoE application repository and docs/VLLM_INTEGRATION.md.
License
Subject to the license terms included with this checkpoint. Contact Apurba Tech for commercial use.
- Downloads last month
- 19
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support