oi-OCR

oi-OCR is Open Innovation AI's document-parsing tool. It extracts structured Markdown, layout, tables, and chart data from PDFs for downstream RAG ingestion, agentic workflows, and document understanding tasks.

ParseBench Results (April 2026)

| Dimension | Score | Rank on the public leaderboard |
|---|---|---|
| Charts | 78.48 | #1 of 47 |
| Tables | 87.06 | #9 |
| Content Faithfulness | 87.24 | #18 |
| Semantic Formatting | 65.65 | #6 |
| Visual Grounding | 68.71 | #6 (tied with Reducto) |
| Overall (mean of 5) | 77.43 | #2 of 47 |
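
The Overall row is described as the mean of the five dimension scores. A minimal sketch of that arithmetic, assuming an unweighted mean rounded to two decimals (the function name `overall_score` is hypothetical and not part of any oi-OCR or ParseBench API):

```python
# Per-dimension scores copied from the results table above.
scores = {
    "Charts": 78.48,
    "Tables": 87.06,
    "Content Faithfulness": 87.24,
    "Semantic Formatting": 65.65,
    "Visual Grounding": 68.71,
}


def overall_score(dimension_scores):
    """Unweighted mean of the dimension scores, rounded to two decimals.

    Assumption: the leaderboard weights all five dimensions equally.
    """
    return round(sum(dimension_scores.values()) / len(dimension_scores), 2)


overall = overall_score(scores)
print(overall)  # prints 77.43
```

The five scores sum to 387.14, and 387.14 / 5 = 77.428, which rounds to the reported 77.43.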

Evaluated on the ParseBench-Full suite: 2,037 single-page PDFs across chart, layout, table, and text groups.

oi-OCR ranks #1 on the Charts dimension, ahead of LlamaParse Agentic (78.11), Reducto Agentic (73.40), OpenAI GPT-5.5 Reasoning Medium (65.53), Google Gemini 3 Flash Thinking High (64.79), and Anthropic Opus 4.7 (55.84).

On Overall, only LlamaParse Agentic ranks higher.

Structured eval data: .eval_results/parsebench.yaml.

Evaluation methodology

  • Benchmark: ParseBench-Full — 2,037 single-page PDFs from real enterprise documents (insurance, finance, government, scientific, etc.)
  • Evaluator: official parse-bench CLI
  • Scoring mode: rule-only (LLAMACLOUD_BENCH_LLM_NORMALIZATION=off) — stricter than the leaderboard's default judge mode.
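
Rule-only scoring is selected through the environment variable named above. One way to set it for a child process from Python might look like the sketch below. The helper name `rule_only_env` is hypothetical, and the parse-bench command line itself is not documented here, so it is deliberately left out:

```python
import os


def rule_only_env(base_env=None):
    """Return a copy of the environment with LLM normalization switched off.

    Only the variable name and value come from the methodology notes;
    the helper itself is an illustrative assumption.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["LLAMACLOUD_BENCH_LLM_NORMALIZATION"] = "off"
    return env


env = rule_only_env()
# Pass `env` as the env= argument of subprocess.run(...) when launching the
# evaluator; the exact parse-bench arguments are not specified in this card.
```

Copying the environment rather than mutating `os.environ` keeps the setting scoped to the evaluation run.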

Public leaderboard

Full benchmark comparison across all 47 entries: parsebench.ai

About

Open Innovation AI builds enterprise AI tools for the GCC and beyond, with first-class English and Arabic document support.
