---
license: apache-2.0
tags:
- image-classification
- tibetan
- uchen
- ume
library_name: transformers
pipeline_tag: image-classification
---

# Uchen vs Umê classifier (DINOv3 ViT-S)

Binary Tibetan script classifier: **uchen** (printed) vs **ume** (cursive).

**Dataset (splits, Parquet, inference):** [openpecha/uchen-ume-classification-benchmark](https://huggingface.co/datasets/openpecha/uchen-ume-classification-benchmark)

## Training preprocess (from `config.yaml` + `train.py`)

`train.py` builds **three** dataloaders, each with its own **per-split** preprocessing (selected by `preprocess_for_split` in `common.py`). For the `with_preprocess` config:

| Split | `with_preprocess` config | Effect in `ScriptImageDataset.__getitem__` |
|-------|--------------------------|--------------------------------------------|
| **train** | `train_preprocess: center_crop_whole_page` | Center crop, then augmentation and the DINO processor |
| **val** | `val_preprocess: center_crop_whole_page` | Center crop, then the DINO processor |
| **test** | `test_preprocess: none` | **Full page**: no crop, DINO processor only |

As a result, the high **validation** scores for `with_preprocess` (val F1 ~0.99) are measured on **cropped** pages, while the **test** split evaluated during training uses **full pages** (test F1 ~0.51). This split-dependent behavior is intentional in the code, not a bug.

**Benchmark evaluation must use `test_preprocess: none`** (the same as the test split) unless you are deliberately measuring crop-to-crop generalization.

## Recommended weights for full manuscript pages

**`without_preprocess/final_model.pt`**: trained without a runtime crop on any split.

## Results summary

**Benchmark** = 60 held-out images (30 uchen + 30 ume). **Test** = 867 images (work-stratified split), evaluated on full pages.
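The results table reports accuracy and macro-F1, where macro-F1 is the unweighted mean of the per-class F1 scores for uchen and ume. The minimal pure-Python sketch below shows both metrics and why they can diverge. It assumes labels and predictions are lists of the class-name strings `"uchen"` / `"ume"`. The function name `accuracy_and_macro_f1` is hypothetical, and the sketch is not the repo's evaluation code. It is intended to match scikit-learn's `f1_score(..., average="macro")` on binary labels.

```python
def accuracy_and_macro_f1(y_true, y_pred, classes=("uchen", "ume")):
    """Return (accuracy, macro-F1) for string labels. Illustrative sketch only."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    f1_scores = []
    for c in classes:
        # One-vs-rest counts for class c.
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)

    # Macro average: every class counts equally, regardless of support.
    return accuracy, sum(f1_scores) / len(f1_scores)


# A degenerate model that labels every benchmark page "uchen":
y_true = ["uchen"] * 30 + ["ume"] * 30
y_pred = ["uchen"] * 60
acc, f1 = accuracy_and_macro_f1(y_true, y_pred)
print(f"{acc:.3f} {f1:.3f}")  # 0.500 0.333
```

In this example, collapsing onto one class still yields 50% accuracy, but the missing class contributes an F1 of 0, so macro-F1 drops to 0.333. This is why macro-F1 can sit well below accuracy when a model favors one script.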
| Variant | Train/val preprocess | Test & benchmark eval preprocess | Test acc | Test macro-F1 | Benchmark acc | Benchmark macro-F1 | Benchmark AUC |
|---------|----------------------|----------------------------------|----------|---------------|---------------|--------------------|---------------|
| **`without_preprocess/`** | none | **none** (full page) | **80.7%** | **0.708** | **85.0%** | **0.848** | 0.970 |
| **`with_preprocess/`** | center crop | **none** (full page) | 56.1% | 0.506 | **68.3%** | **0.648** | 0.953 |
| ~~with_preprocess~~ | center crop | ~~center crop at inference~~ *(not comparable to test)* | n/a | n/a | ~~98.3%~~ | ~~0.983~~ | n/a |

The ~~98.3%~~ benchmark figure only appears if you **center-crop at inference**. That matches the **val** protocol but **not** the way the model was evaluated on the **test** split during training.

## Benchmark evaluation (60 images)

### Fair eval: full pages (`--preprocess none`, matching `test_preprocess`)

**`without_preprocess` (recommended):**

```bash
python inference_uchen_ume.py \
  --benchmark-dir benchmark \
  --weights without_preprocess/final_model.pt \
  --preprocess none
```

**`with_preprocess` (same protocol as the test split during training):**

```bash
python inference_uchen_ume.py \
  --benchmark-dir benchmark \
  --weights with_preprocess/final_model.pt \
  --preprocess none
```

From this repo:

```bash
python experiments/uchen_ume_binary/eval_benchmark.py \
  --checkpoint without_preprocess/final_model.pt --benchmark-dir benchmark/benchmark
python experiments/uchen_ume_binary/eval_benchmark.py \
  --checkpoint with_preprocess/final_model.pt --benchmark-dir benchmark/benchmark
# The default test-preprocess is none; do NOT pass center_crop for a fair comparison.
```

## Parquet dataset

[openpecha/uchen-ume-classification-benchmark](https://huggingface.co/datasets/openpecha/uchen-ume-classification-benchmark)

```python
from datasets import load_dataset

bench = load_dataset("openpecha/uchen-ume-classification-benchmark", split="benchmark")
```

## Load weights

```python
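from huggingface_hub import hf_hub_download
import torch

# Download the recommended full-page checkpoint from the Hub (cached locally).
path = hf_hub_download(
    "openpecha/uchen-ume-classifier",
    "without_preprocess/final_model.pt",
    repo_type="model",
)
# weights_only=False unpickles arbitrary Python objects, so only use it
# on checkpoints from a source you trust.
ckpt = torch.load(path, map_location="cpu", weights_only=False)
```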
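This card does not document what `final_model.pt` contains. It could be a bare `state_dict` or a dict wrapping one together with training metadata, so it can help to inspect the loaded object before building the DINOv3 ViT-S model around it. The helper below is a minimal sketch, and its name `summarize_checkpoint` is hypothetical. It relies only on duck typing, treating anything with a `.shape` as a tensor, and it makes no assumption about key names.

```python
from collections.abc import Mapping


def summarize_checkpoint(obj, max_items=50):
    """Return up to `max_items` lines describing a loaded checkpoint.

    Tensor-like values (anything with a `.shape`) are listed with their shape,
    nested mappings are walked recursively with dotted key paths, and any other
    value is listed by its type name.
    """
    lines = []

    def walk(node, path):
        if len(lines) >= max_items:
            return  # stop once the output limit is reached
        if hasattr(node, "shape"):
            lines.append(f"{path}: shape={tuple(node.shape)}")
        elif isinstance(node, Mapping):
            for key, value in node.items():
                walk(value, f"{path}.{key}" if path else str(key))
        else:
            lines.append(f"{path}: {type(node).__name__}")

    walk(obj, "")
    return lines
```

With `ckpt` loaded as above, `print("\n".join(summarize_checkpoint(ckpt)))` lists the first 50 entries. That is usually enough to tell whether the weights sit at the top level or under a wrapper key.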