| --- |
| tags: |
| - document-parsing |
| - ocr |
| - pdf |
| - parsebench |
| - enterprise-documents |
| license: apache-2.0 |
| language: |
| - en |
| - ar |
| --- |
| |
| # oi-OCR |
|
|
| **oi-OCR** is Open Innovation AI's document-parsing tool. It extracts structured Markdown, layout, tables, and chart data from PDFs for downstream RAG ingestion, agentic workflows, and document understanding tasks. |
|
|
| ## ParseBench results (April 2026) |
|
|
| | Dimension | Score | Rank on the public leaderboard | |
| |---|---:|---| |
| | **Charts** | **78.48** | **#1 of 47** | |
| | Tables | 87.06 | #9 | |
| | Content Faithfulness | 87.24 | #18 | |
| | Semantic Formatting | 65.65 | #6 | |
| | Visual Grounding | 68.71 | #6 (tied with Reducto) | |
| | **Overall (mean of 5)** | **77.43** | **#2 of 47** | |
|
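
The Overall row is the unweighted mean of the five dimension scores in the table. A minimal Python check, using only the values listed above (the variable names are illustrative, not part of any oi-OCR or ParseBench tooling):

```python
from statistics import mean

# Per-dimension scores copied from the results table.
scores = {
    "Charts": 78.48,
    "Tables": 87.06,
    "Content Faithfulness": 87.24,
    "Semantic Formatting": 65.65,
    "Visual Grounding": 68.71,
}

# Unweighted mean across the five dimensions, rounded to two decimals.
overall = round(mean(scores.values()), 2)
print(overall)  # 77.43
```

Because the mean is unweighted, a strong Charts or Tables score offsets a weaker Semantic Formatting score one for one.
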
|
| Evaluated on the [ParseBench-Full](https://huggingface.co/datasets/llamaindex/ParseBench) suite: 2,037 single-page PDFs across chart, layout, table, and text groups. |
|
|
| **oi-OCR is #1 on the Charts dimension**, ahead of LlamaParse Agentic (78.11), Reducto Agentic (73.40), OpenAI GPT-5.5 Reasoning Medium (65.53), Google Gemini 3 Flash Thinking High (64.79), and Anthropic Opus 4.7 (55.84). |
|
|
| On Overall, only LlamaParse Agentic ranks higher. |
|
|
| Structured eval data: [`.eval_results/parsebench.yaml`](./.eval_results/parsebench.yaml). |
|
|
| ## Evaluation methodology |
|
|
| - **Benchmark**: [ParseBench-Full](https://huggingface.co/datasets/llamaindex/ParseBench) — 2,037 single-page PDFs from real enterprise documents (insurance, finance, government, scientific, etc.) |
| - **Evaluator**: official [`parse-bench`](https://github.com/run-llama/ParseBench) CLI |
| - **Scoring mode**: rule-only (`LLAMACLOUD_BENCH_LLM_NORMALIZATION=off`) — stricter than the leaderboard's default judge mode. |
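
The rule-only setting above is an environment variable, so one way to apply it is to build the environment for the evaluator process explicitly. The sketch below only prepares that environment. The helper name `rule_only_env` is hypothetical, and the `parse-bench` command line itself is not shown because its arguments are not documented here:

```python
import os


def rule_only_env(base=None):
    """Return a copy of an environment mapping with LLM normalization off.

    `base` defaults to the current process environment. The input mapping
    is copied, never modified in place.
    """
    env = dict(os.environ if base is None else base)
    env["LLAMACLOUD_BENCH_LLM_NORMALIZATION"] = "off"
    return env


# Example with a small stand-in environment.
base = {"PATH": "/usr/bin"}
env = rule_only_env(base)
print(env["LLAMACLOUD_BENCH_LLM_NORMALIZATION"])  # off
```

The resulting mapping could then be passed as the `env` argument of `subprocess.run` when launching the evaluator, assuming the CLI reads the variable at start-up.
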
|
| ## Public leaderboard |
|
|
| Full benchmark comparison across all 47 entries: [parsebench.ai](https://www.parsebench.ai/) |
|
|
| ## About |
|
|
| [Open Innovation AI](https://openinnovation.ai/) builds enterprise AI tools for the GCC and beyond, with first-class English and Arabic document support. |
|
|