---
title: Apparatus Ocr
emoji: 🥇
colorFrom: green
colorTo: indigo
sdk: gradio
app_file: app.py
pinned: true
license: gpl-3.0
short_description: Benchmarking OCR of critical text editions
sdk_version: 5.43.1
tags:
- leaderboard
---
# OCR leaderboard

This Space is customized for a two-level OCR benchmark on a single critical-edition page.

Inputs and gold outputs live under `data/lloyd-jones-soph-170/`:

- `png/lloyd-jones-fullpage.png`: hard-task input
- `png/lloyd-jones-text.png`: easy-task text crop
- `png/lloyd-jones-apparatus.png`: easy-task apparatus crop
- `ocr/lloyd-jones-text.json`: gold main-text output
- `ocr/lloyd-jones-apparatus.json`: gold apparatus output

The leaderboard expects result files in the following format:
```json
{
  "config": {
    "model_dtype": "torch.float16",
    "model_name": "org/model",
    "model_sha": "main"
  },
  "results": {
    "easy_levenshtein": {
      "score": 91.23
    },
    "easy_bleu": {
      "score": 84.56
    },
    "hard_levenshtein": {
      "score": 79.10
    },
    "hard_bleu": {
      "score": 70.42
    }
  }
}
```
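Before submitting, it can help to sanity-check a result file against this schema. The sketch below assumes only the JSON layout shown above (the `validate_result` helper is illustrative, not part of the Space's codebase):

```python
import json

# The four benchmark scores every result file must report.
EXPECTED_KEYS = {"easy_levenshtein", "easy_bleu", "hard_levenshtein", "hard_bleu"}


def validate_result(path: str) -> bool:
    """Check that a result file matches the leaderboard schema sketched above."""
    with open(path) as f:
        data = json.load(f)

    config = data.get("config", {})
    if "model_name" not in config:
        raise ValueError("config.model_name is required")

    results = data.get("results", {})
    missing = EXPECTED_KEYS - results.keys()
    if missing:
        raise ValueError(f"missing result keys: {sorted(missing)}")

    for key in EXPECTED_KEYS:
        score = results[key]["score"]
        if not 0.0 <= score <= 100.0:
            raise ValueError(f"{key} score out of range: {score}")
    return True
```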
The Space is local-first:

- If HF backend datasets are configured via env vars, it will sync from them.
- Otherwise it reads seeded queue/results data from `data/leaderboard/`.
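The fallback logic can be sketched as follows. The env var names `QUEUE_REPO` and `RESULTS_REPO` are placeholders for illustration only; check the Space's source for the actual configuration:

```python
import os
from pathlib import Path

# Seeded local data shipped with the repo.
LOCAL_DATA = Path("data/leaderboard")


def resolve_data_source() -> dict:
    """Prefer HF backend datasets when configured, else fall back to local seed data.

    QUEUE_REPO / RESULTS_REPO are hypothetical names, not the Space's real env vars.
    """
    queue_repo = os.environ.get("QUEUE_REPO")
    results_repo = os.environ.get("RESULTS_REPO")
    if queue_repo and results_repo:
        return {"mode": "hub", "queue": queue_repo, "results": results_repo}
    return {
        "mode": "local",
        "queue": LOCAL_DATA / "queue",
        "results": LOCAL_DATA / "results",
    }
```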
Useful files:

- `src/about.py`: task definitions and benchmark copy
- `src/evaluation/metrics.py`: local OCR metric helpers
- `src/evaluation/build_result.py`: CLI to turn predicted OCR JSON files into a leaderboard result JSON
- `src/evaluation/run_granite_pipeline.py`: end-to-end Granite Vision runner for the benchmark images
- `src/leaderboard/read_evals.py`: result ingestion
- `src/populate.py`: leaderboard and queue dataframe assembly
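As a rough illustration of the kind of metric involved, a Levenshtein-based similarity on the leaderboard's 0-100 scale can be computed like this. This is a generic sketch, not the actual code in `src/evaluation/metrics.py`:

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string for a smaller row buffer
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,                 # deletion
                curr[j - 1] + 1,             # insertion
                prev[j - 1] + (ca != cb),    # substitution (0 if chars match)
            ))
        prev = curr
    return prev[-1]


def levenshtein_score(prediction: str, reference: str) -> float:
    """Normalized similarity on a 0-100 scale, higher is better."""
    if not prediction and not reference:
        return 100.0
    dist = levenshtein_distance(prediction, reference)
    return 100.0 * (1.0 - dist / max(len(prediction), len(reference)))
```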
Example:

```bash
python -m src.evaluation.build_result \
  --model-name ibm-granite/granite-vision-3.3-2b \
  --easy-text path/to/easy-text.json \
  --easy-apparatus path/to/easy-apparatus.json \
  --hard-text path/to/hard-text.json \
  --hard-apparatus path/to/hard-apparatus.json \
  --output data/leaderboard/results/ibm-granite/results_2026-03-28T00-00-00Z.json
```
To run the first baseline model directly:

```bash
python -m src.evaluation.run_granite_pipeline \
  --model-name ibm-granite/granite-vision-3.3-2b \
  --output-dir data/leaderboard/runs/granite-vision-3.3-2b
```

That command writes:

- predicted OCR JSON files for the easy and hard tasks
- raw model responses for debugging
- `result.json` in leaderboard format
- `summary.json` with the four benchmark scores