---
language:
- bo
license: apache-2.0
tags:
- image-classification
- tibetan
- uchen
- ume
- script-classification
- dinov3
- fine-tuned
library_name: transformers
pipeline_tag: image-classification
base_model: facebook/dinov3-vits16-pretrain-lvd1689m
datasets:
- openpecha/uchen_ume_classification_dataset
metrics:
- f1
- accuracy
model-index:
- name: Uchen-Ume Classifier (DINOv3 ViT-S) — center crop
  results:
  - task:
      type: image-classification
      name: Tibetan Script Classification (center-crop whole page)
    dataset:
      name: openpecha/uchen-ume-classification-benchmark
      type: openpecha/uchen-ume-classification-benchmark
      split: test
    metrics:
    - name: Macro F1 (center crop)
      type: f1
      value: 0.983
    - name: Accuracy (center crop)
      type: accuracy
      value: 0.993
- name: Uchen-Ume Classifier (DINOv3 ViT-S) — full page
  results:
  - task:
      type: image-classification
      name: Tibetan Script Classification (full page)
    dataset:
      name: openpecha/uchen-ume-classification-benchmark
      type: openpecha/uchen-ume-classification-benchmark
      split: test
    metrics:
    - name: Macro F1 (full page)
      type: f1
      value: 0.708
    - name: Accuracy (full page)
      type: accuracy
      value: 0.807
---

# Uchen vs Umê Classifier (DINOv3 ViT-S)

Binary Tibetan script classifier: **Uchen** (དབུ་ཅན།, headed/printed script) vs **Umê** (དབུ་མེད།, headless/cursive script). Fine-tuned from [DINOv3 ViT-S](https://huggingface.co/facebook/dinov3-vits16-pretrain-lvd1689m) on ~10,000 manuscript scans from the [Buddhist Digital Resource Center](https://www.bdrc.io) (BDRC).

**Dataset:** [openpecha/uchen-ume-classification-dataset](https://huggingface.co/datasets/openpecha/uchen-ume-classification-dataset)

## Which checkpoint to use

Pick the variant that matches **how you preprocess at inference**:

| Your pipeline | Weights | Inference preprocess |
|---------------|---------|----------------------|
| **Center-crop whole page** (resize short edge → 224, center crop) | **`center_crop_all/final_model.pt`** | `--preprocess center_crop_whole_page` |
| **Raw full manuscript page** (no PIL crop before DINO) | **`without_preprocess/final_model.pt`** | `--preprocess none` |

**Do not** use `with_preprocess/`. It was trained and validated on center-cropped pages but evaluated on full-page test images. That train/test mismatch is why validation accuracy looked like ~99% while the test accuracy in its results JSON was only ~56%.
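
The center-crop row in the table above describes the preprocessing as "resize short edge → 224, center crop". A minimal sketch of that geometry with Pillow follows. The function name, the bicubic resampling, and the absence of any tensor conversion or normalization are assumptions made for illustration; `inference_uchen_ume.py` may differ in detail.

```python
from PIL import Image


def center_crop_whole_page(img, size=224):
    """Resize so the short edge equals `size`, then cut a centered square.

    Hypothetical helper: it mirrors the description in this card, not the
    repository's actual implementation.
    """
    w, h = img.size
    scale = size / min(w, h)
    # Guard against rounding the short edge down to size - 1.
    new_w = max(size, round(w * scale))
    new_h = max(size, round(h * scale))
    img = img.resize((new_w, new_h), Image.BICUBIC)

    left = (new_w - size) // 2
    top = (new_h - size) // 2
    return img.crop((left, top, left + size, top + size))


# A wide, pecha-shaped stand-in page (blank image, dimensions are made up).
page = Image.new("RGB", (1200, 300))
crop = center_crop_whole_page(page)
print(crop.size)  # (224, 224)
```

For a wide landscape page like the stand-in above, only the middle square of the page survives the crop, so the model sees text at a much larger scale than it would on the uncropped page.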

## Best results

Hub split: 9,110 train / 1,000 val / 851 test (work-stratified).

| Variant | Train preprocess | Val preprocess | Test preprocess | Test accuracy | Test macro-F1 | Best val macro-F1 |
|---------|-------|-----|-------------|:--------:|:-------------:|:-------------------:|
| **`center_crop_all/`** | center crop | center crop | **center crop** | **99.3%** | **0.983** | **0.996** |
| **`without_preprocess/`** | none | none | none (full page) | **80.7%** | **0.708** | 0.771 |

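### Test confusion matrices

Columns read true class→predicted class. The `center_crop_all/` counts sum to the 851 test pages; the `without_preprocess/` counts sum to 867.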

| Variant | uchen→uchen | uchen→ume | ume→uchen | ume→ume |
|---------|------------:|----------:|----------:|--------:|
| **`center_crop_all/`** | 94 | 3 | 3 | 751 |
| **`without_preprocess/`** | 97 | 2 | 165 | 603 |

See `confusion_matrix.json` and `confusion_matrix.png` in each variant folder on the Hub.
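
The reported test accuracy and macro-F1 can be re-derived from the counts in the table above. The sketch below does this in plain Python; the function name and argument order are illustrative and not part of the repository.

```python
def metrics_from_confusion(uchen_uchen, uchen_ume, ume_uchen, ume_ume):
    """Return (total pages, accuracy, macro-F1) for a 2x2 confusion matrix.

    Arguments follow the table columns: true class first, predicted second.
    """
    total = uchen_uchen + uchen_ume + ume_uchen + ume_ume
    accuracy = (uchen_uchen + ume_ume) / total

    # Per-class F1 = 2*TP / (2*TP + FP + FN).
    f1_uchen = 2 * uchen_uchen / (2 * uchen_uchen + ume_uchen + uchen_ume)
    f1_ume = 2 * ume_ume / (2 * ume_ume + uchen_ume + ume_uchen)

    # Macro-F1 is the unweighted mean, so the minority class counts equally.
    macro_f1 = (f1_uchen + f1_ume) / 2
    return total, accuracy, macro_f1


rows = {
    "center_crop_all/": (94, 3, 3, 751),
    "without_preprocess/": (97, 2, 165, 603),
}
for name, counts in rows.items():
    total, accuracy, macro_f1 = metrics_from_confusion(*counts)
    print(name, total, round(accuracy, 3), round(macro_f1, 3))
# center_crop_all/ 851 0.993 0.983
# without_preprocess/ 867 0.807 0.708
```

The full-page model's low macro-F1 comes almost entirely from the 165 Ume pages predicted as Uchen, which pulls Uchen precision down to 97 / 262.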

## Training data

| Class | Train | Validation | Test | Total |
|-------|------:|-----------:|-----:|------:|
| Uchen | ~3,124 | ~340 | ~290 | ~3,754 |
| Ume | ~5,986 | ~660 | ~561 | ~7,207 |
| **Total pages** | **9,110** | **1,000** | **851** | **10,961** |
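
The training counts above are imbalanced, with roughly two Ume pages for every Uchen page. This card lists WeightedRandomSampler plus class-weighted cross-entropy as the balancing strategy; the sketch below shows one common way to set both up in PyTorch. The inverse-frequency formula, the label order (0 = Uchen, 1 = Ume), and the helper name are assumptions, not the training script's actual code.

```python
import torch
from torch import nn
from torch.utils.data import WeightedRandomSampler

# Approximate training counts taken from the table above.
class_counts = {"uchen": 3124, "ume": 5986}
classes = ["uchen", "ume"]  # assumed label order: 0 = uchen, 1 = ume

total = sum(class_counts.values())
# Inverse-frequency weights, scaled so a perfectly balanced set gives 1.0 each.
class_weights = torch.tensor(
    [total / (len(classes) * class_counts[c]) for c in classes],
    dtype=torch.float32,
)

# Cross-entropy that penalises errors on the rarer class more heavily.
criterion = nn.CrossEntropyLoss(weight=class_weights)


def make_sampler(labels):
    """Hypothetical helper: draw each class about equally often per epoch."""
    sample_weights = torch.tensor(
        [1.0 / class_counts[classes[y]] for y in labels], dtype=torch.double
    )
    return WeightedRandomSampler(
        sample_weights, num_samples=len(labels), replacement=True
    )


# Toy label list standing in for the real training labels.
sampler = make_sampler([0, 1, 1, 1])
loss = criterion(torch.zeros(4, 2), torch.tensor([0, 1, 1, 1]))
```

Combining a balanced sampler with a weighted loss compensates for the imbalance twice, so in practice the two are usually tuned together rather than both set to full inverse frequency.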

Splits are partitioned at the **work level**: all pages from the same manuscript stay in a single split, so no work contributes pages to more than one of train, validation, and test.
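
A minimal sketch of a work-level split, assuming each page comes with a work identifier. The `(work_id, page_path)` layout, the IDs, and the split fractions are made up for illustration. The sketch only keeps works together; it does not reproduce whatever class stratification the actual split used.

```python
import random
from collections import defaultdict


def split_by_work(pages, val_frac=0.1, test_frac=0.1, seed=0):
    """Assign whole works to splits so no work spans two splits.

    `pages` is a list of (work_id, page_path) pairs (hypothetical layout).
    """
    by_work = defaultdict(list)
    for work_id, page in pages:
        by_work[work_id].append(page)

    work_ids = sorted(by_work)  # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(work_ids)

    n_test = max(1, round(len(work_ids) * test_frac))
    n_val = max(1, round(len(work_ids) * val_frac))
    groups = {
        "test": work_ids[:n_test],
        "validation": work_ids[n_test:n_test + n_val],
        "train": work_ids[n_test + n_val:],
    }
    return {
        name: [(w, p) for w in ids for p in by_work[w]]
        for name, ids in groups.items()
    }


# 20 made-up works with 5 pages each.
pages = [
    (f"W{i:03d}", f"W{i:03d}/page_{j:04d}.jpg")
    for i in range(20)
    for j in range(5)
]
splits = split_by_work(pages)
print({name: len(items) for name, items in splits.items()})
# {'test': 10, 'validation': 10, 'train': 80}
```

Splitting by work rather than by page prevents near-duplicate pages from the same manuscript from appearing in both training and evaluation data.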

## Architecture

- **Backbone:** DINOv3 ViT-S/16 (21M params)
- **Head:** LayerNorm → Dropout(0.1) → Linear(384, 128) → GELU → Dropout(0.1) → Linear(128, 2)
- **Stages:** A (head) → B (last 2 blocks) → C (last 4 blocks)
- **Balancing:** WeightedRandomSampler + class-weighted cross-entropy
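
The head and the staged unfreezing listed above can be sketched in PyTorch as follows. The layer order and sizes come from this card; the class name, the helper name, and the use of plain `nn.Linear` layers as stand-ins for the 12 ViT-S transformer blocks are assumptions, since how the real backbone exposes its blocks depends on the loading code.

```python
import torch
from torch import nn


class ClassifierHead(nn.Module):
    """LayerNorm -> Dropout -> Linear(384, 128) -> GELU -> Dropout -> Linear(128, 2)."""

    def __init__(self, embed_dim=384, hidden_dim=128, num_classes=2, p_drop=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.LayerNorm(embed_dim),
            nn.Dropout(p_drop),
            nn.Linear(embed_dim, hidden_dim),
            nn.GELU(),
            nn.Dropout(p_drop),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, features):
        # `features` is one pooled 384-d embedding per image.
        return self.net(features)


def unfreeze_last_blocks(blocks, n_last):
    """Make only the final `n_last` blocks trainable (hypothetical helper)."""
    start = len(blocks) - n_last
    for i, block in enumerate(blocks):
        for param in block.parameters():
            param.requires_grad = i >= start


# Stand-in for the backbone's transformer blocks, not the real DINOv3 modules.
blocks = nn.ModuleList([nn.Linear(384, 384) for _ in range(12)])

unfreeze_last_blocks(blocks, 0)  # stage A: train the head only
unfreeze_last_blocks(blocks, 2)  # stage B: head + last 2 blocks
unfreeze_last_blocks(blocks, 4)  # stage C: head + last 4 blocks

head = ClassifierHead().eval()
with torch.no_grad():
    logits = head(torch.randn(4, 384))
print(logits.shape)  # torch.Size([4, 2])
```

Unfreezing progressively lets the randomly initialised head settle before gradients reach the pretrained blocks.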

## Quick start

### Center-crop pipeline (recommended if you crop pages)

```python
from huggingface_hub import hf_hub_download
import torch

path = hf_hub_download(
    "openpecha/uchen-ume-classifier",
    "center_crop_all/final_model.pt",
    repo_type="model",
)
# weights_only=False unpickles arbitrary Python objects, so only load checkpoints you trust.
ckpt = torch.load(path, map_location="cpu", weights_only=False)
```

```bash
python inference_uchen_ume.py \
  --image page.jpg \
  --weights center_crop_all/final_model.pt \
  --preprocess center_crop_whole_page
```

### Full-page pipeline

```python
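from huggingface_hub import hf_hub_download

# Download the full-page checkpoint from the Hub (cached locally after the first call).
path = hf_hub_download(
    "openpecha/uchen-ume-classifier",
    "without_preprocess/final_model.pt",
    repo_type="model",
)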
```

```bash
python inference_uchen_ume.py \
  --image page.jpg \
  --weights without_preprocess/final_model.pt \
  --preprocess none
```

## Repo layout

```
center_crop_all/             ← center_crop_whole_page at inference (~99% test)
  final_model.pt
  model_card.json
  results.json               ← includes confusion_matrix
  confusion_matrix.json
  confusion_matrix.png
without_preprocess/          ← full pages (~81% test)
  final_model.pt
  model_card.json
  results.json
  confusion_matrix.json
  confusion_matrix.png
```

## Limitations

- **Preprocessing must match training.** A model trained on center crops reaches only ~56% accuracy on full-page input (the `with_preprocess/` result), and the full-page model expects uncropped input.
- Trained on BDRC digitised manuscripts; may underperform on photos or non-BDRC scans.
- **Access requirement:** DINOv3 is gated — accept [facebook/dinov3-vits16-pretrain-lvd1689m](https://huggingface.co/facebook/dinov3-vits16-pretrain-lvd1689m) and run `huggingface-cli login`.

## Citation

```bibtex
@misc{karma2026uchenume,
    title        = {Uchen-Ume Classifier: Binary Tibetan Script Classification with DINOv3},
    author       = {Karma Tashi and Elie Roux},
    year         = {2026},
    publisher    = {HuggingFace},
    url          = {https://huggingface.co/openpecha/uchen-ume-classifier},
    note         = {Funded by Khyentse Foundation. Images sourced from the Buddhist Digital Resource Center (BDRC).}
}
```

## Acknowledgements

Developed by **Dharmaduta** for the **[Buddhist Digital Resource Center](https://www.bdrc.io)** (BDRC) Etext Corpus project, with funding from the **Khyentse Foundation**. Annotation guidelines by **Pentsok Rtsang**.