---
title: Apparatus Ocr
emoji: 🥇
colorFrom: green
colorTo: indigo
sdk: gradio
app_file: app.py
pinned: true
license: gpl-3.0
short_description: Benchmarking OCR of critical text editions
sdk_version: 5.43.1
tags:
- leaderboard
---
# OCR leaderboard

This Space is customized for a two-level OCR benchmark on a single critical-edition page.

Inputs and gold outputs live under `data/lloyd-jones-soph-170/`:

- `png/lloyd-jones-fullpage.png`: hard-task input
- `png/lloyd-jones-text.png`: easy-task text crop
- `png/lloyd-jones-apparatus.png`: easy-task apparatus crop
- `ocr/lloyd-jones-text.json`: gold main-text output
- `ocr/lloyd-jones-apparatus.json`: gold apparatus output

The leaderboard expects result files in the following format:
```json
{
  "config": {
    "model_dtype": "torch.float16",
    "model_name": "org/model",
    "model_sha": "main"
  },
  "results": {
    "easy_levenshtein": {
      "score": 91.23
    },
    "easy_bleu": {
      "score": 84.56
    },
    "hard_levenshtein": {
      "score": 79.10
    },
    "hard_bleu": {
      "score": 70.42
    }
  }
}
```
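Before submitting, it can help to sanity-check a result file against this schema. The sketch below assumes only the JSON layout shown above (the `validate_result` helper is illustrative, not part of the Space's codebase):

```python
import json

# The four benchmark scores every result file must report.
EXPECTED_KEYS = {"easy_levenshtein", "easy_bleu", "hard_levenshtein", "hard_bleu"}


def validate_result(path: str) -> bool:
    """Check that a result file matches the leaderboard schema sketched above."""
    with open(path) as f:
        data = json.load(f)

    config = data.get("config", {})
    if "model_name" not in config:
        raise ValueError("config.model_name is required")

    results = data.get("results", {})
    missing = EXPECTED_KEYS - results.keys()
    if missing:
        raise ValueError(f"missing result keys: {sorted(missing)}")

    for key in EXPECTED_KEYS:
        score = results[key]["score"]
        if not 0.0 <= score <= 100.0:
            raise ValueError(f"{key} score out of range: {score}")
    return True
```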
The Space is local-first:

- If HF backend datasets are configured via env vars, it will sync from them.
- Otherwise it reads seeded queue/results data from `data/leaderboard/`.
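The fallback logic can be sketched as follows. The env var names `QUEUE_REPO` and `RESULTS_REPO` are placeholders for illustration only; check the Space's source for the actual configuration:

```python
import os
from pathlib import Path

# Seeded local data shipped with the repo.
LOCAL_DATA = Path("data/leaderboard")


def resolve_data_source() -> dict:
    """Prefer HF backend datasets when configured, else fall back to local seed data.

    QUEUE_REPO / RESULTS_REPO are hypothetical names, not the Space's real env vars.
    """
    queue_repo = os.environ.get("QUEUE_REPO")
    results_repo = os.environ.get("RESULTS_REPO")
    if queue_repo and results_repo:
        return {"mode": "hub", "queue": queue_repo, "results": results_repo}
    return {
        "mode": "local",
        "queue": LOCAL_DATA / "queue",
        "results": LOCAL_DATA / "results",
    }
```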
Useful files:

- `src/about.py`: task definitions and benchmark copy
- `src/evaluation/metrics.py`: local OCR metric helpers
- `src/evaluation/build_result.py`: CLI to turn predicted OCR JSON files into a leaderboard result JSON
- `src/evaluation/run_granite_pipeline.py`: end-to-end Granite Vision runner for the benchmark images
- `src/leaderboard/read_evals.py`: result ingestion
- `src/populate.py`: leaderboard and queue dataframe assembly
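As a rough illustration of the kind of metric involved, a Levenshtein-based similarity on the leaderboard's 0-100 scale can be computed like this. This is a generic sketch, not the actual code in `src/evaluation/metrics.py`:

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string for a smaller row buffer
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,                 # deletion
                curr[j - 1] + 1,             # insertion
                prev[j - 1] + (ca != cb),    # substitution (0 if chars match)
            ))
        prev = curr
    return prev[-1]


def levenshtein_score(prediction: str, reference: str) -> float:
    """Normalized similarity on a 0-100 scale, higher is better."""
    if not prediction and not reference:
        return 100.0
    dist = levenshtein_distance(prediction, reference)
    return 100.0 * (1.0 - dist / max(len(prediction), len(reference)))
```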
Example:

```bash
python -m src.evaluation.build_result \
  --model-name ibm-granite/granite-vision-3.3-2b \
  --easy-text path/to/easy-text.json \
  --easy-apparatus path/to/easy-apparatus.json \
  --hard-text path/to/hard-text.json \
  --hard-apparatus path/to/hard-apparatus.json \
  --output data/leaderboard/results/ibm-granite/results_2026-03-28T00-00-00Z.json
```
To run the first baseline model directly:

```bash
python -m src.evaluation.run_granite_pipeline \
  --model-name ibm-granite/granite-vision-3.3-2b \
  --output-dir data/leaderboard/runs/granite-vision-3.3-2b
```

That command writes:

- predicted OCR JSON files for the easy and hard tasks
- raw model responses for debugging
- `result.json` in leaderboard format
- `summary.json` with the four benchmark scores