---
language:
- bo
license: apache-2.0
tags:
- image-classification
- tibetan
- uchen
- ume
- script-classification
- dinov3
- fine-tuned
library_name: transformers
pipeline_tag: image-classification
base_model: facebook/dinov3-vits16-pretrain-lvd1689m
datasets:
- openpecha/uchen_ume_classification_dataset
metrics:
- f1
- accuracy
model-index:
- name: Uchen-Ume Classifier (DINOv3 ViT-S) — center crop
results:
- task:
type: image-classification
name: Tibetan Script Classification (center-crop whole page)
dataset:
name: openpecha/uchen-ume-classification-benchmark
type: openpecha/uchen-ume-classification-benchmark
split: test
metrics:
- name: Macro F1 (center crop)
type: f1
value: 0.983
- name: Accuracy (center crop)
type: accuracy
value: 0.993
- name: Uchen-Ume Classifier (DINOv3 ViT-S) — full page
results:
- task:
type: image-classification
name: Tibetan Script Classification (full page)
dataset:
name: openpecha/uchen-ume-classification-benchmark
type: openpecha/uchen-ume-classification-benchmark
split: test
metrics:
- name: Macro F1 (full page)
type: f1
value: 0.708
- name: Accuracy (full page)
type: accuracy
value: 0.807
---
# Uchen vs Umê Classifier (DINOv3 ViT-S)
Binary Tibetan script classifier: **Uchen** (དབུ་ཅན།, headed/printed script) vs **Umê** (དབུ་མེད།, headless/cursive script). Fine-tuned from [DINOv3 ViT-S](https://huggingface.co/facebook/dinov3-vits16-pretrain-lvd1689m) using a dataset of 10,961 manuscript page scans (9,110 for training) from the [Buddhist Digital Resource Center](https://www.bdrc.io) (BDRC).
**Dataset:** [openpecha/uchen-ume-classification-dataset](https://huggingface.co/datasets/openpecha/uchen-ume-classification-dataset)
## Which checkpoint to use
Pick the variant that matches **how you preprocess at inference**:
| Your pipeline | Weights | Inference preprocess |
|---------------|---------|----------------------|
| **Center-crop whole page** (resize short edge → 224, center crop) | **`center_crop_all/final_model.pt`** | `--preprocess center_crop_whole_page` |
| **Raw full manuscript page** (no PIL crop before DINO) | **`without_preprocess/final_model.pt`** | `--preprocess none` |
**Do not** use `with_preprocess/`. It was trained and validated on center-cropped pages but evaluated on full-page test images. That train/test mismatch is why validation accuracy looked like ~99% while the test results JSON reported only ~56%.
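To make the `center_crop_whole_page` row more concrete, the sketch below works out the geometry of "resize short edge → 224, then center crop" in plain Python. The function name `center_crop_geometry` and the rounding behaviour are assumptions for illustration only; the repository's inference script may round or interpolate differently.

```python
def center_crop_geometry(width: int, height: int, size: int = 224):
    """Return (resized_w, resized_h, crop_box) for short-edge resize + center crop.

    crop_box is (left, top, right, bottom) in resized-image pixel coordinates.
    Hypothetical helper, not part of the model repository.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    short = min(width, height)
    # Scale both edges so that the shorter one becomes exactly `size`.
    resized_w = max(size, round(width * size / short))
    resized_h = max(size, round(height * size / short))
    # Keep the central size x size window of the resized page.
    left = (resized_w - size) // 2
    top = (resized_h - size) // 2
    return resized_w, resized_h, (left, top, left + size, top + size)


# A wide, pecha-style page of 3000 x 600 pixels.
print(center_crop_geometry(3000, 600))
```

For a 3000×600 scan this gives a 1120×224 resized page and keeps columns 448 to 672, i.e. the central fifth of the page width. The two checkpoints therefore see quite different inputs, which is consistent with the warning about mixing preprocessing modes.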
## Best results
Hub split: 9,110 train / 1,000 val / 851 test (work-stratified).
| Variant | Train preprocess | Val preprocess | Test preprocess | Test acc | Test macro-F1 | Best val macro-F1 |
|---------|-------|-----|-------------|:--------:|:-------------:|:-------------------:|
| **`center_crop_all/`** | center crop | center crop | **center crop** | **99.3%** | **0.983** | **0.996** |
| **`without_preprocess/`** | none | none | none (full page) | **80.7%** | **0.708** | 0.771 |
### Test confusion matrices (851 pages)
| Variant | uchen→uchen | uchen→ume | ume→uchen | ume→ume |
|---------|------------:|----------:|----------:|--------:|
| **`center_crop_all/`** | 94 | 3 | 3 | 751 |
| **`without_preprocess/`** | 97 | 2 | 165 | 603 |
See `confusion_matrix.json` and `confusion_matrix.png` in each variant folder on the Hub.
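As a cross-check, accuracy and macro-F1 can be recomputed from the four counts in each row. The sketch below is a minimal pure-Python version; the function name is hypothetical, and it assumes the column labels read true→predicted (the result is the same under the opposite reading, because F1 is symmetric in false positives and false negatives).

```python
def metrics_from_confusion(uchen_uchen, uchen_ume, ume_uchen, ume_ume):
    """Return (total, accuracy, macro_f1) for a 2x2 confusion matrix.

    Arguments follow the table columns, read as true->predicted (an assumption).
    """
    total = uchen_uchen + uchen_ume + ume_uchen + ume_ume
    accuracy = (uchen_uchen + ume_ume) / total

    def f1(tp, fp, fn):
        # F1 = 2TP / (2TP + FP + FN); defined as 0 if the class never appears.
        denom = 2 * tp + fp + fn
        return 2 * tp / denom if denom else 0.0

    f1_uchen = f1(uchen_uchen, fp=ume_uchen, fn=uchen_ume)
    f1_ume = f1(ume_ume, fp=uchen_ume, fn=ume_uchen)
    return total, accuracy, (f1_uchen + f1_ume) / 2


# Counts copied from the table above.
rows = {
    "center_crop_all": (94, 3, 3, 751),
    "without_preprocess": (97, 2, 165, 603),
}
for name, counts in rows.items():
    total, acc, macro_f1 = metrics_from_confusion(*counts)
    print(f"{name}: n={total} acc={acc:.3f} macro_f1={macro_f1:.3f}")
```

These counts reproduce the reported 99.3% / 0.983 and 80.7% / 0.708. The `center_crop_all/` row sums to 851 pages, while the `without_preprocess/` row sums to 867.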
## Training data
| Class | Train | Validation | Test | Total |
|-------|------:|-----------:|-----:|------:|
| Uchen | ~3,124 | ~340 | ~290 | ~3,754 |
| Ume | ~5,986 | ~660 | ~561 | ~7,207 |
| **Total pages** | **9,110** | **1,000** | **851** | **10,961** |
Splits are partitioned at the **work level** — all pages from the same manuscript stay in one split only.
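A work-level partition can be sketched as follows: group pages by work, shuffle the works, and assign each whole work to one split. This is an illustration under stated assumptions, not the script used to build the dataset. The `(work_id, page_id)` pairs, the split fractions, and the function name are all hypothetical, and the sketch does not stratify by script class.

```python
import random
from collections import defaultdict


def split_by_work(pages, val_frac=0.1, test_frac=0.1, seed=0):
    """Assign whole works to train/val/test so that no work spans two splits.

    `pages` is an iterable of (work_id, page_id) pairs (hypothetical IDs).
    Fractions apply to the number of works, not the number of pages.
    """
    by_work = defaultdict(list)
    for work_id, page_id in pages:
        by_work[work_id].append(page_id)

    works = sorted(by_work)              # fixed order before the seeded shuffle
    random.Random(seed).shuffle(works)

    n_val = round(len(works) * val_frac)
    n_test = round(len(works) * test_frac)
    groups = {
        "val": works[:n_val],
        "test": works[n_val:n_val + n_test],
        "train": works[n_val + n_test:],
    }
    return {
        split: [(w, p) for w in ws for p in by_work[w]]
        for split, ws in groups.items()
    }


# Toy data: 20 works with 5 pages each.
pages = [(f"W{w:02d}", f"p{p}") for w in range(20) for p in range(5)]
splits = split_by_work(pages)
print({name: len(items) for name, items in splits.items()})
```

Splitting on works rather than pages keeps near-duplicate pages (same scribe, same paper, same scanner settings) from leaking between training and evaluation.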
## Architecture
- **Backbone:** DINOv3 ViT-S/16 (21M params)
- **Head:** LayerNorm → Dropout(0.1) → Linear(384, 128) → GELU → Dropout(0.1) → Linear(128, 2)
- **Stages:** A (head) → B (last 2 blocks) → C (last 4 blocks)
- **Balancing:** WeightedRandomSampler + class-weighted cross-entropy
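The card does not state how the balancing weights were computed. One common choice, shown here purely as an assumption, is inverse-frequency weighting based on the approximate train counts from the Training data table. In PyTorch, per-class weights of this kind would typically be passed to `torch.nn.CrossEntropyLoss(weight=...)` and per-sample weights to `torch.utils.data.WeightedRandomSampler`; the sketch below only does the arithmetic, in plain Python.

```python
# Approximate train counts from the Training data table.
train_counts = {"uchen": 3124, "ume": 5986}


def inverse_frequency_weights(counts):
    """Class weights n_total / (n_classes * n_class): rarer classes weigh more.

    Assumed formula for illustration; not confirmed by the model card.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


class_weights = inverse_frequency_weights(train_counts)

# Per-sample sampling weight: 1 / (size of the sample's class), so that each
# class contributes the same total weight and is drawn equally often on average.
sample_weight = {label: 1.0 / n for label, n in train_counts.items()}

print({label: round(w, 3) for label, w in class_weights.items()})
```

Under this assumption Uchen pages get a loss weight of about 1.46 and Umê pages about 0.76, reflecting the roughly 1:2 class ratio in the training split.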
## Quick start
### Center-crop pipeline (recommended if you crop pages)
```python
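import torch
from huggingface_hub import hf_hub_download

# Download the center-crop checkpoint from the Hub (cached after the first call).
path = hf_hub_download(
    "openpecha/uchen-ume-classifier",
    "center_crop_all/final_model.pt",
    repo_type="model",
)
# weights_only=False unpickles arbitrary Python objects: only load checkpoints you trust.
ckpt = torch.load(path, map_location="cpu", weights_only=False)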
```
```bash
python inference_uchen_ume.py \
  --image page.jpg \
  --weights center_crop_all/final_model.pt \
  --preprocess center_crop_whole_page
```
### Full-page pipeline
```python
from huggingface_hub import hf_hub_download

# Download the full-page checkpoint from the Hub.
path = hf_hub_download(
    "openpecha/uchen-ume-classifier",
    "without_preprocess/final_model.pt",
    repo_type="model",
)
```
```bash
python inference_uchen_ume.py \
  --image page.jpg \
  --weights without_preprocess/final_model.pt \
  --preprocess none
```
## Repo layout
```
center_crop_all/          ← center_crop_whole_page at inference (~99% test)
  final_model.pt
  model_card.json
  results.json            ← includes confusion_matrix
  confusion_matrix.json
  confusion_matrix.png
without_preprocess/       ← full pages (~81% test)
  final_model.pt
  model_card.json
  results.json
  confusion_matrix.json
  confusion_matrix.png
```
## Limitations
- **Preprocessing must match training.** A model trained on center crops reaches only ≈ 56% accuracy on full pages, and the full-page model expects uncropped input.
- Trained on BDRC digitised manuscripts; may underperform on photos or non-BDRC scans.
- **Access requirement:** DINOv3 is gated — accept [facebook/dinov3-vits16-pretrain-lvd1689m](https://huggingface.co/facebook/dinov3-vits16-pretrain-lvd1689m) and run `huggingface-cli login`.
## Citation
```bibtex
@misc{karma2026uchenume,
title = {Uchen-Ume Classifier: Binary Tibetan Script Classification with DINOv3},
author = {Karma Tashi and Elie Roux},
year = {2026},
publisher = {HuggingFace},
url = {https://huggingface.co/openpecha/uchen-ume-classifier},
note = {Funded by Khyentse Foundation. Images sourced from the Buddhist Digital Resource Center (BDRC).}
}
```
## Acknowledgements
Developed by **Dharmaduta** for the **[Buddhist Digital Resource Center](https://www.bdrc.io)** (BDRC) Etext Corpus project, with funding from the **Khyentse Foundation**. Annotation guidelines by **Pentsok Rtsang**.