	---
	tags:
	- document-parsing
	- ocr
	- pdf
	- parsebench
	- enterprise-documents
	license: apache-2.0
	language:
	- en
	- ar
	---

	# oi-OCR

	oi-OCR is Open Innovation AI's document-parsing tool. It extracts structured Markdown, layout, tables, and chart data from PDFs for downstream RAG ingestion, agentic workflows, and document understanding tasks.

	## ParseBench Results (April 2026)

	| Dimension | Score | Rank on the public leaderboard |
	|---|---:|---|
	| Charts | 78.48 | #1 of 47 |
	| Tables | 87.06 | #9 |
	| Content Faithfulness | 87.24 | #18 |
	| Semantic Formatting | 65.65 | #6 |
	| Visual Grounding | 68.71 | #6 (tied with Reducto) |
	| Overall (mean of 5) | 77.43 | #2 of 47 |
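
The Overall row can be reproduced from the five dimension scores in the table. The following is a minimal Python sketch, assuming "mean of 5" means an unweighted arithmetic mean rounded to two decimals; the variable names are illustrative and not part of oi-OCR or ParseBench.

```python
# Dimension scores copied from the ParseBench results table.
scores = {
    "Charts": 78.48,
    "Tables": 87.06,
    "Content Faithfulness": 87.24,
    "Semantic Formatting": 65.65,
    "Visual Grounding": 68.71,
}

# Assumption: "mean of 5" is an unweighted arithmetic mean of the five dimensions.
overall = sum(scores.values()) / len(scores)

print(f"Overall: {overall:.2f}")  # prints "Overall: 77.43"
```

The five scores sum to 387.14, so the mean is 77.428, which rounds to the reported 77.43.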

	Evaluated on the [ParseBench-Full](https://huggingface.co/datasets/llamaindex/ParseBench) suite: 2,037 single-page PDFs across chart, layout, table, and text groups.

	oi-OCR ranks #1 on the Charts dimension, ahead of LlamaParse Agentic (78.11), Reducto Agentic (73.40), OpenAI GPT-5.5 Reasoning Medium (65.53), Google Gemini 3 Flash Thinking High (64.79), and Anthropic Opus 4.7 (55.84).

	On Overall, only LlamaParse Agentic ranks higher.

	Structured eval data: [`.eval_results/parsebench.yaml`](./.eval_results/parsebench.yaml).

	## Evaluation methodology

	- Benchmark: [ParseBench-Full](https://huggingface.co/datasets/llamaindex/ParseBench) — 2,037 single-page PDFs from real enterprise documents (insurance, finance, government, scientific, etc.)
	- Evaluator: official [`parse-bench`](https://github.com/run-llama/ParseBench) CLI
	- Scoring mode: rule-only (`LLAMACLOUD_BENCH_LLM_NORMALIZATION=off`) — stricter than the leaderboard's default judge mode.
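
The rule-only setting above is an environment variable, so it has to be present in the environment of whatever process runs the evaluator. The following is a minimal Python sketch of launching a command with that variable set. The helper `run_rule_only` is hypothetical, and the command is supplied by the caller because the `parse-bench` CLI arguments are not documented in this README.

```python
import os
import subprocess
import sys


def run_rule_only(cmd: list[str]) -> subprocess.CompletedProcess:
    """Run `cmd` with LLM normalization switched off (hypothetical helper)."""
    # Copy the current environment so PATH and any credentials are preserved.
    env = dict(os.environ)
    # Variable name and value are taken from the scoring-mode bullet above.
    env["LLAMACLOUD_BENCH_LLM_NORMALIZATION"] = "off"
    # check=True raises CalledProcessError if the command exits non-zero.
    return subprocess.run(cmd, env=env, check=True, capture_output=True, text=True)


if __name__ == "__main__":
    # Stand-in child process that only echoes the variable back.
    # Replace `probe` with the real parse-bench invocation for an actual run.
    probe = [
        sys.executable,
        "-c",
        "import os; print(os.environ['LLAMACLOUD_BENCH_LLM_NORMALIZATION'])",
    ]
    print(run_rule_only(probe).stdout.strip())
```

Setting the variable on a copied environment, rather than mutating `os.environ`, keeps the rule-only mode scoped to the evaluation subprocess.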
	## Public leaderboard

	Full benchmark comparison across all 47 entries: [parsebench.ai](https://www.parsebench.ai/)

	## About

	[Open Innovation AI](https://openinnovation.ai/) builds enterprise AI tools for the GCC and beyond, with first-class English and Arabic document support.