---
title: Apparatus Ocr
emoji: 🥇
colorFrom: green
colorTo: indigo
sdk: gradio
app_file: app.py
pinned: true
license: gpl-3.0
short_description: 'Benchmarking OCR of critical text editions'
sdk_version: 5.43.1
tags:
  - leaderboard
---

# OCR leaderboard

This Space is customized for a two-level OCR benchmark on a single critical-edition page. Inputs and gold outputs live under `data/lloyd-jones-soph-170/`:

- `png/lloyd-jones-fullpage.png`: hard-task input
- `png/lloyd-jones-text.png`: easy-task text crop
- `png/lloyd-jones-apparatus.png`: easy-task apparatus crop
- `ocr/lloyd-jones-text.json`: gold main-text output
- `ocr/lloyd-jones-apparatus.json`: gold apparatus output

The leaderboard expects result files in the following format:

```json
{
  "config": {
    "model_dtype": "torch.float16",
    "model_name": "org/model",
    "model_sha": "main"
  },
  "results": {
    "easy_levenshtein": { "score": 91.23 },
    "easy_bleu": { "score": 84.56 },
    "hard_levenshtein": { "score": 79.10 },
    "hard_bleu": { "score": 70.42 }
  }
}
```

The Space is local-first:

- If HF backend datasets are configured via env vars, it will sync from them.
- Otherwise it reads seeded queue/results data from `data/leaderboard/`.
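The actual ingestion logic lives in `src/leaderboard/read_evals.py`; as a rough sketch of what a well-formed result file must contain (field and metric names taken from the example above, everything else assumed), a standalone checker might look like:

```python
"""Sketch of a result-file validator; not the Space's real ingestion code."""

EXPECTED_METRICS = {"easy_levenshtein", "easy_bleu", "hard_levenshtein", "hard_bleu"}
EXPECTED_CONFIG_KEYS = ("model_dtype", "model_name", "model_sha")


def validate_result(data: dict) -> dict:
    """Raise ValueError if `data` does not match the expected result schema."""
    config = data["config"]
    for key in EXPECTED_CONFIG_KEYS:
        if key not in config:
            raise ValueError(f"config is missing {key!r}")

    results = data["results"]
    missing = EXPECTED_METRICS - results.keys()
    if missing:
        raise ValueError(f"results is missing metrics: {sorted(missing)}")

    for name, entry in results.items():
        if not isinstance(entry.get("score"), (int, float)):
            raise ValueError(f"{name}: 'score' must be numeric")
    return data
```

Running such a check before dropping a file into `data/leaderboard/results/` catches schema mistakes before they surface as missing leaderboard rows.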
Useful files:

- `src/about.py`: task definitions and benchmark copy
- `src/evaluation/metrics.py`: local OCR metric helpers
- `src/evaluation/build_result.py`: CLI that turns predicted OCR JSON files into a leaderboard result JSON
- `src/evaluation/run_granite_pipeline.py`: end-to-end Granite Vision runner for the benchmark images
- `src/leaderboard/read_evals.py`: result ingestion
- `src/populate.py`: leaderboard and queue dataframe assembly

Example:

```bash
python -m src.evaluation.build_result \
  --model-name ibm-granite/granite-vision-3.3-2b \
  --easy-text path/to/easy-text.json \
  --easy-apparatus path/to/easy-apparatus.json \
  --hard-text path/to/hard-text.json \
  --hard-apparatus path/to/hard-apparatus.json \
  --output data/leaderboard/results/ibm-granite/results_2026-03-28T00-00-00Z.json
```

To run the first baseline model directly:

```bash
python -m src.evaluation.run_granite_pipeline \
  --model-name ibm-granite/granite-vision-3.3-2b \
  --output-dir data/leaderboard/runs/granite-vision-3.3-2b
```

That command writes:

- predicted OCR JSON files for the easy and hard tasks
- raw model responses for debugging
- `result.json` in leaderboard format
- `summary.json` with the four benchmark scores
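The canonical metric implementations are in `src/evaluation/metrics.py`; the exact normalization used there is not shown here, so the following is only a plausible sketch of a Levenshtein-based score on a 0–100 scale (similarity = 100 · (1 − edit distance / longer length)):

```python
"""Sketch of a normalized Levenshtein score; the repo's metrics.py may differ."""


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance via the standard DP recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def levenshtein_score(pred: str, gold: str) -> float:
    """Similarity in [0, 100]; 100 means an exact match."""
    if not pred and not gold:
        return 100.0
    dist = levenshtein(pred, gold)
    return 100.0 * (1.0 - dist / max(len(pred), len(gold)))
```

Comparing a predicted OCR transcription against the gold JSON text with a function like this reproduces the shape of the `*_levenshtein` entries in the result file.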