---
title: Apparatus Ocr
emoji: 🥇
colorFrom: green
colorTo: indigo
sdk: gradio
app_file: app.py
pinned: true
license: gpl-3.0
short_description: 'Benchmarking OCR of critical text editions'
sdk_version: 5.43.1
tags:
- leaderboard
---
# OCR leaderboard
This Space hosts a two-level OCR benchmark on a single critical-edition page: an easy task on pre-cropped regions and a hard task on the full page.
Inputs and gold outputs live under `data/lloyd-jones-soph-170/`:
- `png/lloyd-jones-fullpage.png`: hard task input
- `png/lloyd-jones-text.png`: easy task text crop
- `png/lloyd-jones-apparatus.png`: easy task apparatus crop
- `ocr/lloyd-jones-text.json`: gold main-text output
- `ocr/lloyd-jones-apparatus.json`: gold apparatus output
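A minimal sketch of how a runner might address these files (the internal schema of the gold JSON files is not documented here, so they are loaded as-is):

```python
import json
from pathlib import Path

DATA = Path("data/lloyd-jones-soph-170")

# Task inputs: the full page for the hard task, two crops for the easy task.
IMAGES = {
    "hard": DATA / "png" / "lloyd-jones-fullpage.png",
    "easy_text": DATA / "png" / "lloyd-jones-text.png",
    "easy_apparatus": DATA / "png" / "lloyd-jones-apparatus.png",
}

def load_gold(name: str) -> dict:
    """Load a gold OCR JSON by stem, e.g. 'lloyd-jones-text'."""
    path = DATA / "ocr" / f"{name}.json"
    return json.loads(path.read_text(encoding="utf-8"))
```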
The leaderboard expects result files in the following format:
```json
{
  "config": {
    "model_dtype": "torch.float16",
    "model_name": "org/model",
    "model_sha": "main"
  },
  "results": {
    "easy_levenshtein": {
      "score": 91.23
    },
    "easy_bleu": {
      "score": 84.56
    },
    "hard_levenshtein": {
      "score": 79.10
    },
    "hard_bleu": {
      "score": 70.42
    }
  }
}
```
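Before submitting, a result file can be sanity-checked against this shape. The sketch below is an assumed validator, not the Space's actual ingestion logic (that lives in `src/leaderboard/read_evals.py`):

```python
import json

# The four metric keys the leaderboard format above requires.
REQUIRED_METRICS = {"easy_levenshtein", "easy_bleu", "hard_levenshtein", "hard_bleu"}

def validate_result(raw: str) -> dict:
    """Parse a result JSON string and check the expected structure."""
    data = json.loads(raw)
    for key in ("model_dtype", "model_name", "model_sha"):
        assert key in data["config"], f"missing config key: {key}"
    missing = REQUIRED_METRICS - data["results"].keys()
    assert not missing, f"missing metrics: {missing}"
    for metric in REQUIRED_METRICS:
        score = data["results"][metric]["score"]
        assert isinstance(score, (int, float)), f"non-numeric score for {metric}"
    return data
```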
The Space is local-first:
- If Hugging Face backend datasets are configured via environment variables, the Space syncs queue and results data from them.
- Otherwise it reads the seeded queue/results data from `data/leaderboard/`.
Useful files:
- `src/about.py`: task definitions and benchmark copy
- `src/evaluation/metrics.py`: local OCR metric helpers
- `src/evaluation/build_result.py`: CLI to turn predicted OCR JSON files into a leaderboard result JSON
- `src/evaluation/run_granite_pipeline.py`: end-to-end Granite Vision runner for the benchmark images
- `src/leaderboard/read_evals.py`: result ingestion
- `src/populate.py`: leaderboard and queue dataframe assembly
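For intuition, here is one plausible way to compute a 0–100 Levenshtein similarity score like the ones on the leaderboard; the Space's actual helpers are in `src/evaluation/metrics.py` and may differ in normalization:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance via a two-row dynamic-programming table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,              # deletion
                curr[j - 1] + 1,          # insertion
                prev[j - 1] + (ca != cb), # substitution (free if chars match)
            ))
        prev = curr
    return prev[-1]

def levenshtein_score(pred: str, gold: str) -> float:
    """Edit distance normalized by the longer string, scaled to 0-100."""
    if not pred and not gold:
        return 100.0
    denom = max(len(pred), len(gold))
    return 100.0 * (1 - levenshtein(pred, gold) / denom)
```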
Example:
```bash
python -m src.evaluation.build_result \
--model-name ibm-granite/granite-vision-3.3-2b \
--easy-text path/to/easy-text.json \
--easy-apparatus path/to/easy-apparatus.json \
--hard-text path/to/hard-text.json \
--hard-apparatus path/to/hard-apparatus.json \
--output data/leaderboard/results/ibm-granite/results_2026-03-28T00-00-00Z.json
```
To run the first baseline model directly:
```bash
python -m src.evaluation.run_granite_pipeline \
--model-name ibm-granite/granite-vision-3.3-2b \
--output-dir data/leaderboard/runs/granite-vision-3.3-2b
```
That command writes:
- predicted OCR JSON files for easy and hard tasks
- raw model responses for debugging
- `result.json` in leaderboard format
- `summary.json` with the four benchmark scores