---
license: apache-2.0
tags:
  - image-classification
  - tibetan
  - uchen
  - ume
library_name: transformers
pipeline_tag: image-classification
---

# Uchen vs Umê classifier (DINOv3 ViT-S)

Binary Tibetan script classifier: **uchen** (printed) vs **ume** (cursive).

**Dataset (splits, Parquet, inference):** [openpecha/uchen-ume-classification-benchmark](https://huggingface.co/datasets/openpecha/uchen-ume-classification-benchmark)

## Training preprocess (from `config.yaml` + `train.py`)

`train.py` builds **three** dataloaders, each with its own **per-split** preprocessing (`preprocess_for_split` in `common.py`):

| Split | `with_preprocess` config | Effect in `ScriptImageDataset.__getitem__` |
|-------|--------------------------|------------------------------------------|
| **train** | `train_preprocess: center_crop_whole_page` | Center crop, then augmentation, then DINO processor |
| **val** | `val_preprocess: center_crop_whole_page` | Center crop, then DINO processor |
| **test** | `test_preprocess: none` | **Full page**: no crop, DINO processor only |

The high **validation** score for `with_preprocess` (val F1 ~0.99) is therefore measured on **cropped** pages, while the **test** split evaluated during training uses **full pages** (test F1 ~0.51). This train/val vs test mismatch is intentional in the code, not a bug.

**Benchmark eval must use `test_preprocess: none`** (same as the test split) unless you are deliberately measuring crop-to-crop generalization.
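To make the mismatch concrete, here is a minimal sketch of how a split name could select a crop box. `SPLIT_PREPROCESS`, `crop_box`, `box_for_split`, and the crop fraction `frac=0.5` are hypothetical; the real `preprocess_for_split` in `common.py` may implement `center_crop_whole_page` differently. Boxes follow the `(left, upper, right, lower)` convention of Pillow's `Image.crop`.

```python
# Hypothetical sketch: split -> preprocess mode, mirroring the table above.
# Mode names come from config.yaml; the crop fraction is an assumption.
SPLIT_PREPROCESS = {
    "train": "center_crop_whole_page",
    "val": "center_crop_whole_page",
    "test": "none",
}


def crop_box(width, height, mode, frac=0.5):
    """Return a (left, upper, right, lower) box for the given preprocess mode."""
    if mode == "none":
        return (0, 0, width, height)  # full page, nothing removed
    if mode == "center_crop_whole_page":
        # Assumed behaviour: keep a centered region covering `frac` of each side.
        crop_w, crop_h = int(width * frac), int(height * frac)
        left, top = (width - crop_w) // 2, (height - crop_h) // 2
        return (left, top, left + crop_w, top + crop_h)
    raise ValueError(f"unknown preprocess mode: {mode!r}")


def box_for_split(split, width, height, frac=0.5):
    """Pick the crop box for a dataset split (train/val crop, test keeps the full page)."""
    return crop_box(width, height, SPLIT_PREPROCESS[split], frac)
```

For a Pillow image `page`, `page.crop(box_for_split("test", *page.size))` returns the whole page, which is the view the benchmark should be evaluated on.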

## Recommended weights for full manuscript pages

**`without_preprocess/final_model.pt`** — trained without runtime crop on any split.

## Results summary

**Benchmark** = 60 held-out images (30 uchen + 30 ume). **Test** = 867 images (work-stratified), full pages.

| Variant | Train/val preprocess | Test & benchmark eval preprocess | Test acc | Test macro-F1 | Benchmark acc | Benchmark macro-F1 | Benchmark AUC |
|---------|---------------------|----------------------------------|----------|---------------|---------------|-------------------|---------------|
| **`without_preprocess/`** | none | **none** (full page) | **80.7%** | **0.708** | **85.0%** | **0.848** | 0.970 |
| **`with_preprocess/`** | center crop | **none** (full page) | 56.1% | 0.506 | **68.3%** | **0.648** | 0.953 |
| ~~with_preprocess~~ | center crop | ~~center crop at inference~~ *(not comparable to test)* | — | — | ~~98.3%~~ | ~~0.983~~ | — |

The ~~98.3%~~ benchmark accuracy only appears if you **center-crop at inference**. That matches the **val** preprocessing but **not** the full-page protocol used for the **test** split during training, so it is not comparable to the other rows.
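The table reports accuracy, macro-F1, and AUC. A minimal, dependency-free sketch of those definitions is below; it is illustrative and not the code used by the evaluation scripts. The function names, the `("uchen", "ume")` label strings, and treating `ume` as the positive class for AUC (with `scores` as the predicted ume probability) are assumptions.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions equal to the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, classes=("uchen", "ume")):
    """Unweighted mean of per-class F1, with F1 = 2TP / (2TP + FP + FN)."""
    f1s = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


def roc_auc(y_true, scores, positive="ume"):
    """AUC as the probability a positive outranks a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

On the 60-image benchmark each prediction is worth 1/60 ≈ 1.7 accuracy points: 85.0% is 51/60 correct, 68.3% is 41/60, and 98.3% is 59/60. AUC is threshold-free, which is consistent with both full-page rows having high AUC despite very different accuracy.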

## Benchmark evaluation (60 images)

### Fair eval — full pages (`preprocess none`, matches `test_preprocess`)

**`without_preprocess` (recommended):**

```bash
python inference_uchen_ume.py \
  --benchmark-dir benchmark \
  --weights without_preprocess/final_model.pt \
  --preprocess none
```

**`with_preprocess` (same protocol as training test split):**

```bash
python inference_uchen_ume.py \
  --benchmark-dir benchmark \
  --weights with_preprocess/final_model.pt \
  --preprocess none
```

Alternatively, using the evaluation script from this repo:

```bash
python experiments/uchen_ume_binary/eval_benchmark.py \
  --checkpoint without_preprocess/final_model.pt --benchmark-dir benchmark/benchmark

python experiments/uchen_ume_binary/eval_benchmark.py \
  --checkpoint with_preprocess/final_model.pt --benchmark-dir benchmark/benchmark
# The default test preprocess is none; do NOT pass center_crop if you want a fair comparison
```
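To run the fair protocol for both checkpoints from Python, a small wrapper around the `inference_uchen_ume.py` commands can refuse cropped evaluation unless it is explicitly requested. `build_inference_cmd`, `run_fair_eval`, and `allow_crop` are hypothetical helpers; only the `--benchmark-dir`, `--weights`, and `--preprocess` flags come from this card.

```python
import shlex
import subprocess
import sys


def build_inference_cmd(weights, benchmark_dir="benchmark", preprocess="none",
                        allow_crop=False):
    """Build argv for inference_uchen_ume.py (hypothetical helper)."""
    if preprocess != "none" and not allow_crop:
        # Any cropping at inference departs from test_preprocess: none.
        raise ValueError("use preprocess='none' to match test_preprocess; "
                         "set allow_crop=True only for crop-to-crop experiments")
    return [sys.executable, "inference_uchen_ume.py",
            "--benchmark-dir", benchmark_dir,
            "--weights", weights,
            "--preprocess", preprocess]


def run_fair_eval(variants=("without_preprocess", "with_preprocess")):
    """Run the full-page benchmark eval for each checkpoint variant."""
    for variant in variants:
        cmd = build_inference_cmd(f"{variant}/final_model.pt")
        print("$", shlex.join(cmd))
        subprocess.run(cmd, check=True)  # raises if the script exits non-zero
```

Calling `run_fair_eval()` from the directory containing `inference_uchen_ume.py` and `benchmark/` runs both rows of the fair comparison.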

## Parquet dataset

[openpecha/uchen-ume-classification-benchmark](https://huggingface.co/datasets/openpecha/uchen-ume-classification-benchmark)

```python
from datasets import load_dataset
bench = load_dataset("openpecha/uchen-ume-classification-benchmark", split="benchmark")
```
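The results summary states that the benchmark holds 30 uchen and 30 ume images. A quick sanity check after loading could count labels per class. The column name `label`, and whether it is a `ClassLabel` with integer ids, are assumptions; inspect `bench.features` for the actual schema.

```python
from collections import Counter


def class_balance(labels, names=None):
    """Count examples per class.

    `names` maps integer ClassLabel ids to strings (for example
    bench.features["label"].names, if the column is a ClassLabel);
    leave it as None when the labels are already strings.
    """
    if names is not None:
        labels = [names[i] for i in labels]
    return dict(Counter(labels))

# Hypothetical usage, assuming a ClassLabel column named "label":
# class_balance(bench["label"], bench.features["label"].names)
```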

## Load weights

```python
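from huggingface_hub import hf_hub_download
import torch

# Download the recommended full-page checkpoint (cached locally after the first call)
path = hf_hub_download("openpecha/uchen-ume-classifier", "without_preprocess/final_model.pt", repo_type="model")
# weights_only=False unpickles arbitrary Python objects; only load checkpoints you trust
ckpt = torch.load(path, map_location="cpu", weights_only=False)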
```
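This card does not document what `final_model.pt` contains. Two common conventions are a raw `state_dict` or a dict that wraps one alongside training metadata. The sketch below normalizes both cases. The wrapper keys it tries (`model_state_dict`, `state_dict`, `model`) and the `extract_state_dict` helper are assumptions, not confirmed by this repo, so check `ckpt.keys()` first.

```python
def extract_state_dict(ckpt):
    """Return a flat parameter dict from a loaded checkpoint (hypothetical helper).

    Assumes ckpt is either a raw state_dict or a dict wrapping one under a
    common key; the key names tried here are guesses.
    """
    if not isinstance(ckpt, dict):
        raise TypeError(f"expected a dict checkpoint, got {type(ckpt).__name__}")
    for key in ("model_state_dict", "state_dict", "model"):
        inner = ckpt.get(key)
        if isinstance(inner, dict):
            ckpt = inner
            break
    # Strip the "module." prefix that nn.DataParallel / DDP add to parameter names.
    prefix = "module."
    return {(k[len(prefix):] if k.startswith(prefix) else k): v
            for k, v in ckpt.items()}
```

The result would then go to `model.load_state_dict(...)`, where `model` is built with the same DINOv3 ViT-S classifier architecture used in training (its construction is not specified in this card).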