metadata
license: apache-2.0
tags:
  - image-classification
  - tibetan
  - uchen
  - ume
library_name: transformers
pipeline_tag: image-classification

Uchen vs Umê classifier (DINOv3 ViT-S)

Binary Tibetan script classifier: uchen (printed) vs ume (cursive).

Dataset (splits, Parquet, inference): openpecha/uchen-ume-classification-benchmark

Training preprocess (from config.yaml + train.py)

train.py builds three dataloaders with per-split preprocess (preprocess_for_split in common.py):

| Split | with_preprocess config | Effect in ScriptImageDataset.__getitem__ |
| --- | --- | --- |
| train | train_preprocess: center_crop_whole_page | Center crop before augment + DINO processor |
| val | val_preprocess: center_crop_whole_page | Center crop before DINO processor |
| test | test_preprocess: none | Full page: no crop, only DINO processor |
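The per-split behaviour listed above can be sketched in a few lines. This is only an illustration, not the code in common.py: the names `SPLIT_PREPROCESS`, `center_crop_box` and `crop_box_for_split` are hypothetical, and the crop fraction of 0.5 is an assumed value, since the actual crop geometry of center_crop_whole_page is not stated here.

```python
# Hypothetical mapping that mirrors the with_preprocess settings listed above.
SPLIT_PREPROCESS = {
    "train": "center_crop_whole_page",
    "val": "center_crop_whole_page",
    "test": "none",
}


def center_crop_box(width, height, fraction=0.5):
    """Return (left, top, right, bottom) for a centered crop.

    `fraction` is an assumed value; the real crop size is defined in the repo.
    """
    crop_w = max(1, int(width * fraction))
    crop_h = max(1, int(height * fraction))
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return (left, top, left + crop_w, top + crop_h)


def crop_box_for_split(split, width, height):
    """Pick the region of the page that reaches the DINO processor for a split."""
    mode = SPLIT_PREPROCESS[split]
    if mode == "center_crop_whole_page":
        return center_crop_box(width, height)
    if mode == "none":
        # Full page: the box covers the whole image.
        return (0, 0, width, height)
    raise ValueError(f"unknown preprocess mode: {mode}")


if __name__ == "__main__":
    for split in ("train", "val", "test"):
        print(split, crop_box_for_split(split, 800, 1200))
```

The returned tuple uses the (left, upper, right, lower) order that PIL's `Image.crop` expects, so the same box could be applied to a loaded page image. The point of the sketch is that train and val see only the central region while test sees the whole page.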

The high validation scores for with_preprocess (val F1 ~0.99) are therefore measured on cropped pages, while the test split evaluated during training uses full pages (test F1 ~0.51). This mismatch between val and test preprocessing is intentional in the code, not a bug.

Benchmark eval must use test_preprocess: none (same as the test split) unless you are deliberately measuring crop-to-crop generalization.

Recommended weights for full manuscript pages

without_preprocess/final_model.pt — trained without runtime crop on any split.

Results summary

Benchmark = 60 held-out images (30 uchen + 30 ume). Test = 867 images (work-stratified), full pages.

| Variant | Train/val preprocess | Test & benchmark eval preprocess | Test acc | Test macro-F1 | Benchmark acc | Benchmark macro-F1 | Benchmark AUC |
| --- | --- | --- | --- | --- | --- | --- | --- |
| without_preprocess/ | none | none (full page) | 80.7% | 0.708 | 85.0% | 0.848 | 0.970 |
| with_preprocess/ | center crop | none (full page) | 56.1% | 0.506 | 68.3% | 0.648 | 0.953 |
| with_preprocess | center crop | center crop at inference (not comparable to test) | | | 98.3% | 0.983 | |

The 98.3% benchmark number only appears if you center-crop at inference, which matches val but not how the model was evaluated on test during training.
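For readers who want to recompute the accuracy and macro-F1 columns from their own predictions, here is a minimal sketch in plain Python. It assumes string labels "uchen" and "ume" and is not the repo's evaluation code; the function names are hypothetical and the toy labels are made up for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_for_class(y_true, y_pred, cls):
    """F1 for one class, treating `cls` as the positive label."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred, classes=("uchen", "ume")):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


if __name__ == "__main__":
    # Toy labels, not benchmark data.
    y_true = ["uchen", "uchen", "uchen", "ume", "ume", "ume"]
    y_pred = ["uchen", "uchen", "ume", "ume", "ume", "ume"]
    print(f"acc={accuracy(y_true, y_pred):.3f} macro-F1={macro_f1(y_true, y_pred):.3f}")
```

With only 60 benchmark images, each misclassified page moves accuracy by about 1.7 percentage points, so small differences between variants on the benchmark should be read with that granularity in mind.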

Benchmark evaluation (60 images)

Fair eval — full pages (preprocess none, matches test_preprocess)

without_preprocess (recommended):

python inference_uchen_ume.py \
  --benchmark-dir benchmark \
  --weights without_preprocess/final_model.pt \
  --preprocess none

with_preprocess (same protocol as training test split):

python inference_uchen_ume.py \
  --benchmark-dir benchmark \
  --weights with_preprocess/final_model.pt \
  --preprocess none

From this repo:

python experiments/uchen_ume_binary/eval_benchmark.py \
  --checkpoint without_preprocess/final_model.pt --benchmark-dir benchmark/benchmark

python experiments/uchen_ume_binary/eval_benchmark.py \
  --checkpoint with_preprocess/final_model.pt --benchmark-dir benchmark/benchmark
# default test-preprocess is none — do NOT pass center_crop for fair comparison

Parquet dataset

openpecha/uchen-ume-classification-benchmark

from datasets import load_dataset

# Load the held-out benchmark split (60 images) from the Hub
bench = load_dataset("openpecha/uchen-ume-classification-benchmark", split="benchmark")

Load weights

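from huggingface_hub import hf_hub_download
import torch

# Download the recommended full-page checkpoint from the model repo
path = hf_hub_download("openpecha/uchen-ume-classifier", "without_preprocess/final_model.pt", repo_type="model")
# weights_only=False unpickles arbitrary Python objects, so only load checkpoints you trust
ckpt = torch.load(path, map_location="cpu", weights_only=False)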
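The layout of the loaded `ckpt` object is not documented here, so it is worth inspecting before building a model around it. The helper below is a hedged sketch: `extract_state_dict` and the candidate key names ("model_state_dict", "state_dict", "model") are assumptions based on common PyTorch conventions, not something confirmed for this checkpoint.

```python
def extract_state_dict(ckpt, candidate_keys=("model_state_dict", "state_dict", "model")):
    """Return the parameter mapping from a loaded checkpoint object.

    The candidate keys are guesses at common conventions; check
    `ckpt.keys()` on the real file to see what it contains.
    """
    if isinstance(ckpt, dict):
        for key in candidate_keys:
            if key in ckpt and isinstance(ckpt[key], dict):
                # Wrapped checkpoint: weights sit next to other training metadata.
                return ckpt[key]
        # No wrapper key found: assume the dict is already a state dict.
        return ckpt
    if hasattr(ckpt, "state_dict"):
        # A pickled module object rather than a dict of tensors.
        return ckpt.state_dict()
    raise TypeError(f"unsupported checkpoint type: {type(ckpt).__name__}")


if __name__ == "__main__":
    # Stand-in objects; with the real file, pass the result of torch.load instead.
    wrapped = {"epoch": 3, "model_state_dict": {"head.weight": [0.1, 0.2]}}
    print(sorted(extract_state_dict(wrapped)))
```

Printing the top-level keys first (for example `print(list(ckpt.keys()))` when `ckpt` is a dict) is a quick way to confirm which of these cases applies.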