---
title: Apparatus Ocr
emoji: 🥇
colorFrom: green
colorTo: indigo
sdk: gradio
app_file: app.py
pinned: true
license: gpl-3.0
short_description: Benchmarking OCR of critical text editions
sdk_version: 5.43.1
tags:
  - leaderboard
---

# OCR leaderboard

This Space is a customized leaderboard for a two-level OCR benchmark (an easy and a hard task) on a single critical-edition page.

Inputs and gold outputs live under `data/lloyd-jones-soph-170/`:

- `png/lloyd-jones-fullpage.png`: hard-task input
- `png/lloyd-jones-text.png`: easy-task text crop
- `png/lloyd-jones-apparatus.png`: easy-task apparatus crop
- `ocr/lloyd-jones-text.json`: gold main-text output
- `ocr/lloyd-jones-apparatus.json`: gold apparatus output

The leaderboard expects result files in the following format:

```json
{
    "config": {
        "model_dtype": "torch.float16",
        "model_name": "org/model",
        "model_sha": "main"
    },
    "results": {
        "easy_levenshtein": {
            "score": 91.23
        },
        "easy_bleu": {
            "score": 84.56
        },
        "hard_levenshtein": {
            "score": 79.10
        },
        "hard_bleu": {
            "score": 70.42
        }
    }
}
```
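A small helper can assemble this structure programmatically. The function below is an illustrative sketch, not part of the repo; the default `model_sha` and `model_dtype` values mirror the example above:

```python
import json


def write_result(path, model_name, scores,
                 model_sha="main", model_dtype="torch.float16"):
    """Write a leaderboard result file in the schema shown above.

    `scores` maps metric names (e.g. "easy_levenshtein") to 0-100 floats.
    """
    payload = {
        "config": {
            "model_dtype": model_dtype,
            "model_name": model_name,
            "model_sha": model_sha,
        },
        # Each metric becomes a {"score": value} object, as in the example.
        "results": {metric: {"score": score} for metric, score in scores.items()},
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f, indent=4)
```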

The Space is local-first:

- If HF backend datasets are configured via environment variables, the Space syncs from them.
- Otherwise it reads the seeded queue/results data from `data/leaderboard/`.
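The fallback decision can be pictured roughly as below. The environment-variable names `QUEUE_REPO` and `RESULTS_REPO` are illustrative guesses, not necessarily the ones this Space actually reads — check the Space's own configuration for the real names:

```python
import os


def resolve_data_source():
    # Illustrative env var names for the backend HF datasets.
    queue_repo = os.getenv("QUEUE_REPO")
    results_repo = os.getenv("RESULTS_REPO")
    if queue_repo and results_repo:
        # Both backend datasets configured: sync from the Hugging Face Hub.
        return {"mode": "hf", "queue": queue_repo, "results": results_repo}
    # Otherwise fall back to the seeded local data.
    return {"mode": "local", "root": "data/leaderboard/"}
```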

Useful files:

- `src/about.py`: task definitions and benchmark copy
- `src/evaluation/metrics.py`: local OCR metric helpers
- `src/evaluation/build_result.py`: CLI that turns predicted OCR JSON files into a leaderboard result JSON
- `src/evaluation/run_granite_pipeline.py`: end-to-end Granite Vision runner for the benchmark images
- `src/leaderboard/read_evals.py`: result ingestion
- `src/populate.py`: leaderboard and queue dataframe assembly
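The actual scoring lives in `src/evaluation/metrics.py`. As a self-contained sketch, a normalized Levenshtein similarity on the leaderboard's 0-100 scale could look like the following — the normalization by the longer string is an assumption for illustration, not necessarily the repo's exact formula:

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance, row by row.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def levenshtein_score(pred: str, gold: str) -> float:
    # Similarity on a 0-100 scale: 100 means identical strings.
    if not pred and not gold:
        return 100.0
    dist = levenshtein(pred, gold)
    return 100.0 * (1 - dist / max(len(pred), len(gold)))
```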

Example:

```shell
python -m src.evaluation.build_result \
  --model-name ibm-granite/granite-vision-3.3-2b \
  --easy-text path/to/easy-text.json \
  --easy-apparatus path/to/easy-apparatus.json \
  --hard-text path/to/hard-text.json \
  --hard-apparatus path/to/hard-apparatus.json \
  --output data/leaderboard/results/ibm-granite/results_2026-03-28T00-00-00Z.json
```

To run the first baseline model directly:

```shell
python -m src.evaluation.run_granite_pipeline \
  --model-name ibm-granite/granite-vision-3.3-2b \
  --output-dir data/leaderboard/runs/granite-vision-3.3-2b
```

That command writes:

- predicted OCR JSON files for the easy and hard tasks
- raw model responses for debugging
- `result.json` in the leaderboard format
- `summary.json` with the four benchmark scores
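To inspect the scores programmatically, a reader like the one below works under the assumption that `summary.json` maps the four metric names directly to numeric scores; the real schema may nest them differently:

```python
import json

# The four benchmark metrics from the result format above.
BENCHMARK_METRICS = ("easy_levenshtein", "easy_bleu",
                     "hard_levenshtein", "hard_bleu")


def load_summary(path):
    # Assumed flat schema: {"easy_levenshtein": 91.23, ...}.
    with open(path, encoding="utf-8") as f:
        summary = json.load(f)
    # Keep only the four leaderboard scores, ignoring any extra keys.
    return {metric: summary[metric]
            for metric in BENCHMARK_METRICS if metric in summary}
```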