---
title: Apparatus Ocr
emoji: 🥇
colorFrom: green
colorTo: indigo
sdk: gradio
sdk_version: 5.43.1
app_file: app.py
pinned: true
license: gpl-3.0
short_description: Benchmarking OCR of critical text editions
tags:
  - leaderboard
---
# OCR leaderboard
This Space is customized for a two-level OCR benchmark on a single critical-edition page.
Inputs and gold outputs live under `data/lloyd-jones-soph-170/`:

- `png/lloyd-jones-fullpage.png`: hard-task input
- `png/lloyd-jones-text.png`: easy-task text crop
- `png/lloyd-jones-apparatus.png`: easy-task apparatus crop
- `ocr/lloyd-jones-text.json`: gold main-text output
- `ocr/lloyd-jones-apparatus.json`: gold apparatus output
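As a minimal sketch of consuming one of the gold files: the loader below assumes the gold JSON stores the page text under a top-level `"text"` key, which is an assumption here, not the documented schema — adjust it to whatever the files actually contain.

```python
import json
from pathlib import Path

def load_gold_text(path) -> str:
    """Load a gold OCR JSON file and return its text.

    Assumes a top-level "text" key (hypothetical schema); falls back to
    stringifying the payload otherwise.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return data["text"] if isinstance(data, dict) and "text" in data else str(data)

# Guarded demo against the Space's data layout.
gold = Path("data/lloyd-jones-soph-170/ocr/lloyd-jones-text.json")
if gold.exists():
    print(load_gold_text(gold)[:200])
```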
The leaderboard expects result files in the following format:
```json
{
  "config": {
    "model_dtype": "torch.float16",
    "model_name": "org/model",
    "model_sha": "main"
  },
  "results": {
    "easy_levenshtein": {
      "score": 91.23
    },
    "easy_bleu": {
      "score": 84.56
    },
    "hard_levenshtein": {
      "score": 79.10
    },
    "hard_bleu": {
      "score": 70.42
    }
  }
}
```
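Result files in this shape can be assembled programmatically. The helper below is a sketch, not part of the Space's code; it just builds a dict matching the format above so a run script can `json.dump` it into the results directory.

```python
import json

def make_result(model_name, easy_lev, easy_bleu, hard_lev, hard_bleu,
                model_dtype="torch.float16", model_sha="main"):
    """Assemble a result payload in the leaderboard's expected shape."""
    return {
        "config": {
            "model_dtype": model_dtype,
            "model_name": model_name,
            "model_sha": model_sha,
        },
        "results": {
            "easy_levenshtein": {"score": easy_lev},
            "easy_bleu": {"score": easy_bleu},
            "hard_levenshtein": {"score": hard_lev},
            "hard_bleu": {"score": hard_bleu},
        },
    }

result = make_result("org/model", 91.23, 84.56, 79.10, 70.42)
print(json.dumps(result, indent=2))
```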
The Space is local-first:

- If HF backend datasets are configured via env vars, it syncs from them.
- Otherwise it reads seeded queue/results data from `data/leaderboard/`.
Useful files:

- `src/about.py`: task definitions and benchmark copy
- `src/evaluation/metrics.py`: local OCR metric helpers
- `src/evaluation/build_result.py`: CLI to turn predicted OCR JSON files into a leaderboard result JSON
- `src/evaluation/run_granite_pipeline.py`: end-to-end Granite Vision runner for the benchmark images
- `src/leaderboard/read_evals.py`: result ingestion
- `src/populate.py`: leaderboard and queue dataframe assembly
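For intuition on the Levenshtein-based scores, here is a self-contained sketch of the kind of helper `src/evaluation/metrics.py` could provide: character-level edit distance normalized into a 0–100 similarity. The exact normalization the Space uses is an assumption here.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance over characters."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,                 # deletion
                cur[j - 1] + 1,              # insertion
                prev[j - 1] + (ca != cb),    # substitution (free on match)
            ))
        prev = cur
    return prev[-1]

def levenshtein_score(pred: str, gold: str) -> float:
    """Similarity on a 0-100 scale; 100 means an exact match.

    Normalization by the longer string is an assumption, not necessarily
    what the Space's metrics module does.
    """
    if not pred and not gold:
        return 100.0
    dist = levenshtein_distance(pred, gold)
    return 100.0 * (1 - dist / max(len(pred), len(gold)))
```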
Example:
```bash
python -m src.evaluation.build_result \
  --model-name ibm-granite/granite-vision-3.3-2b \
  --easy-text path/to/easy-text.json \
  --easy-apparatus path/to/easy-apparatus.json \
  --hard-text path/to/hard-text.json \
  --hard-apparatus path/to/hard-apparatus.json \
  --output data/leaderboard/results/ibm-granite/results_2026-03-28T00-00-00Z.json
```
To run the first baseline model directly:
```bash
python -m src.evaluation.run_granite_pipeline \
  --model-name ibm-granite/granite-vision-3.3-2b \
  --output-dir data/leaderboard/runs/granite-vision-3.3-2b
```
That command writes:
- predicted OCR JSON files for the easy and hard tasks
- raw model responses for debugging
- `result.json` in leaderboard format
- `summary.json` with the four benchmark scores