---
license: apache-2.0
tags:
- image-classification
- tibetan
- uchen
- ume
library_name: transformers
pipeline_tag: image-classification
---
# Uchen vs Umê classifier (DINOv3 ViT-S)
Binary Tibetan script classifier: **uchen** (printed) vs **ume** (cursive).
**Dataset (splits, Parquet, inference):** [openpecha/uchen-ume-classification-benchmark](https://huggingface.co/datasets/openpecha/uchen-ume-classification-benchmark)
## Training preprocess (from `config.yaml` + `train.py`)
`train.py` builds **three** dataloaders with **per-split** preprocess (`preprocess_for_split` in `common.py`):
| Split | `with_preprocess` config | Effect in `ScriptImageDataset.__getitem__` |
|-------|--------------------------|------------------------------------------|
| **train** | `train_preprocess: center_crop_whole_page` | Center crop before augment + DINO processor |
| **val** | `val_preprocess: center_crop_whole_page` | Center crop before DINO processor |
| **test** | `test_preprocess: none` | **Full page** — no crop, only DINO processor |
The high **validation** scores for `with_preprocess` (val F1 ~0.99) are therefore measured on **cropped** pages, while the **test** split evaluated during training uses **full pages** (test F1 ~0.51). This mismatch is intentional in the code, not a bug.
**Benchmark eval must use `test_preprocess: none`** (same as the test split) unless you are deliberately measuring crop-to-crop generalization.
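The per-split behaviour in the table can be pictured with the minimal sketch below. It is not the code in `common.py`: the dictionary, the helper names, and the 80% crop fraction are hypothetical placeholders, because this card does not document how `center_crop_whole_page` chooses its crop. It only illustrates that train and val see a centered sub-region of the page, while test (and a fair benchmark run) sees the whole page.

```python
# Hypothetical sketch of per-split preprocess dispatch; NOT the real preprocess_for_split.
PREPROCESS_BY_SPLIT = {
    "train": "center_crop_whole_page",
    "val": "center_crop_whole_page",
    "test": "none",
}

# Assumed crop size: keep the central 80% of each side (placeholder value).
CROP_FRACTION = 0.8


def center_crop_box(width: int, height: int, fraction: float = CROP_FRACTION) -> tuple[int, int, int, int]:
    """Return a (left, top, right, bottom) box for a centered crop."""
    crop_w, crop_h = int(width * fraction), int(height * fraction)
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return (left, top, left + crop_w, top + crop_h)


def preprocess_box(split: str, width: int, height: int) -> tuple[int, int, int, int]:
    """Region of the page handed to the DINO processor for a given split."""
    mode = PREPROCESS_BY_SPLIT[split]
    if mode == "none":
        return (0, 0, width, height)  # full page, no crop
    if mode == "center_crop_whole_page":
        return center_crop_box(width, height)
    raise ValueError(f"unknown preprocess mode: {mode}")
```

With Pillow, the returned box can be passed directly to `Image.crop(box)`; for `test` it is the whole image, so nothing is cut away.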
## Recommended weights for full manuscript pages
**`without_preprocess/final_model.pt`**: trained without a runtime crop on any split, so train, val, and test all use full pages.
## Results summary
**Benchmark** = 60 held-out images (30 uchen + 30 ume). **Test** = 867 images (work-stratified), full pages.
| Variant | Train/val preprocess | Test & benchmark eval preprocess | Test acc | Test macro-F1 | Benchmark acc | Benchmark macro-F1 | Benchmark AUC |
|---------|---------------------|----------------------------------|----------|---------------|---------------|-------------------|---------------|
| **`without_preprocess/`** | none | **none** (full page) | **80.7%** | **0.708** | **85.0%** | **0.848** | 0.970 |
| **`with_preprocess/`** | center crop | **none** (full page) | 56.1% | 0.506 | **68.3%** | **0.648** | 0.953 |
| ~~with_preprocess~~ | center crop | ~~center crop at inference~~ *(not comparable to test)* | — | — | ~~98.3%~~ | ~~0.983~~ | — |
The ~~98.3%~~ benchmark number only appears if you **center-crop at inference**, which matches **val** but **not** how the model was evaluated on **test** during training.
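For reference, the three benchmark metrics can be computed as in the pure-Python sketch below. It is an illustration, not the evaluation script that produced the table; the label strings and treating `ume` as the positive class for AUC are assumptions. AUC only measures how well the scores rank the two classes, while accuracy and macro-F1 depend on the decision rule, so a high AUC next to a lower accuracy (as for `with_preprocess` on full pages) suggests the scores still separate the classes but the decision rule fits full pages poorly.

```python
# Illustrative metric helpers (pure Python); not the script that produced the table.
def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_class(y_true: list[str], y_pred: list[str], cls: str) -> float:
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred, classes=("uchen", "ume")) -> float:
    # Unweighted mean of per-class F1: each class counts equally regardless of size.
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


def roc_auc(y_true, scores, positive="ume") -> float:
    # Probability that a random positive scores higher than a random negative (ties count 0.5).
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: ranking is perfect, but one uchen page crosses a 0.5 threshold.
y_true = ["uchen", "uchen", "ume", "ume"]
y_pred = ["uchen", "ume", "ume", "ume"]
scores = [0.1, 0.6, 0.7, 0.9]  # hypothetical P(ume) per image
print(accuracy(y_true, y_pred))            # 0.75
print(round(macro_f1(y_true, y_pred), 3))  # 0.733
print(roc_auc(y_true, scores))             # 1.0
```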
## Benchmark evaluation (60 images)
### Fair eval: full pages (`--preprocess none`, matches `test_preprocess`)
**`without_preprocess` (recommended):**
```bash
python inference_uchen_ume.py \
--benchmark-dir benchmark \
--weights without_preprocess/final_model.pt \
--preprocess none
```
**`with_preprocess` (same protocol as the test split during training):**
```bash
python inference_uchen_ume.py \
--benchmark-dir benchmark \
--weights with_preprocess/final_model.pt \
--preprocess none
```
Alternatively, with this repo's `eval_benchmark.py` script:
```bash
python experiments/uchen_ume_binary/eval_benchmark.py \
--checkpoint without_preprocess/final_model.pt --benchmark-dir benchmark/benchmark
python experiments/uchen_ume_binary/eval_benchmark.py \
--checkpoint with_preprocess/final_model.pt --benchmark-dir benchmark/benchmark
# The default test-preprocess is none; do NOT pass center_crop for a fair comparison
```
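To run the fair protocol for both variants in one go, a small Python wrapper around `inference_uchen_ume.py` might look like this. The wrapper and its names are hypothetical; it only reuses the flags shown above and pins `--preprocess none` for every checkpoint, so a crop cannot slip into one of the runs.

```python
import shlex
import subprocess
import sys

# Both released variants, evaluated under the same full-page protocol.
VARIANTS = ("without_preprocess", "with_preprocess")


def build_command(variant: str, benchmark_dir: str = "benchmark") -> list[str]:
    """Command line for one checkpoint; preprocessing is always 'none' (full pages)."""
    return [
        sys.executable, "inference_uchen_ume.py",
        "--benchmark-dir", benchmark_dir,
        "--weights", f"{variant}/final_model.pt",
        "--preprocess", "none",
    ]


def run_all(benchmark_dir: str = "benchmark") -> None:
    """Evaluate every variant in turn, echoing each command before running it."""
    for variant in VARIANTS:
        cmd = build_command(variant, benchmark_dir)
        print("$", shlex.join(cmd))
        subprocess.run(cmd, check=True)  # stop on the first failing run
```

Call `run_all()` from the directory that contains `inference_uchen_ume.py` and the two checkpoint folders.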
## Parquet dataset
[openpecha/uchen-ume-classification-benchmark](https://huggingface.co/datasets/openpecha/uchen-ume-classification-benchmark)
```python
from datasets import load_dataset
bench = load_dataset("openpecha/uchen-ume-classification-benchmark", split="benchmark")
```
## Load weights
```python
from huggingface_hub import hf_hub_download
import torch
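# Download the recommended full-page checkpoint from the Hub (cached locally)
path = hf_hub_download("openpecha/uchen-ume-classifier", "without_preprocess/final_model.pt", repo_type="model")
# weights_only=False allows arbitrary pickled objects; only load checkpoints you trust
ckpt = torch.load(path, map_location="cpu", weights_only=False)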
```
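This card does not document what `final_model.pt` contains (for example, a bare state dict or a dict with extra entries). One hedged way to find out before wiring it into a model is to list its top-level entries; the helper below is a hypothetical convenience, not part of the release.

```python
def summarize_checkpoint(ckpt, max_items: int = 10) -> list[str]:
    """Return 'key: kind' lines describing the top level of a loaded checkpoint."""
    if not isinstance(ckpt, dict):
        return [type(ckpt).__name__]
    lines = []
    for key in list(ckpt)[:max_items]:
        value = ckpt[key]
        shape = getattr(value, "shape", None)  # tensor-like objects expose .shape
        if shape is not None:
            lines.append(f"{key}: tensor{tuple(shape)}")
        elif isinstance(value, dict):
            lines.append(f"{key}: dict with {len(value)} entries")
        else:
            lines.append(f"{key}: {type(value).__name__}")
    return lines
```

After the snippet above, `for line in summarize_checkpoint(ckpt): print(line)` shows whether the file holds tensors directly or nests them under further keys.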