karma689 committed on
Commit 444f57e · verified · 1 Parent(s): b9771ca

Update README.md

Files changed (1)
  1. README.md +146 -59
README.md CHANGED
@@ -1,97 +1,184 @@
1
  ---
 
 
2
  license: apache-2.0
3
  tags:
4
- - image-classification
5
- - tibetan
6
- - uchen
7
- - ume
 
 
 
8
  library_name: transformers
9
  pipeline_tag: image-classification
10
  ---
11
 
12
- # Uchen vs Umê classifier (DINOv3 ViT-S)
13
 
14
- Binary Tibetan script classifier: **uchen** (printed) vs **ume** (cursive).
15
 
16
- **Dataset (splits, Parquet, inference):** [openpecha/uchen-ume-classification-benchmark](https://huggingface.co/datasets/openpecha/uchen-ume-classification-benchmark)
17
 
18
- ## Training preprocess (from `config.yaml` + `train.py`)
19
 
20
- `train.py` builds **three** dataloaders with **per-split** preprocess (`preprocess_for_split` in `common.py`):
21
 
22
- | Split | `with_preprocess` config | Effect in `ScriptImageDataset.__getitem__` |
23
- |-------|--------------------------|------------------------------------------|
24
- | **train** | `train_preprocess: center_crop_whole_page` | Center crop before augment + DINO processor |
25
- | **val** | `val_preprocess: center_crop_whole_page` | Center crop before DINO processor |
26
- | **test** | `test_preprocess: none` | **Full page** — no crop, only DINO processor |
27
 
28
- So high **validation** scores for `with_preprocess` (val F1 ~0.99) are on **cropped** pages. **Test** during training uses **full pages** (test F1 ~0.51). That is intentional in the code, not a bug.
29
 
30
- **Benchmark eval must use `test_preprocess: none`** (same as the test split) unless you are deliberately measuring crop-to-crop generalization.
 
 
 
31
 
32
- ## Recommended weights for full manuscript pages
33
 
34
- **`without_preprocess/final_model.pt`** trained without runtime crop on any split.
35
 
36
- ## Results summary
 
 
 
 
37
 
38
- **Benchmark** = 60 held-out images (30 uchen + 30 ume). **Test** = 867 images (work-stratified), full pages.
39
 
40
- | Variant | Train/val preprocess | Test & benchmark eval preprocess | Test acc | Test macro-F1 | Benchmark acc | Benchmark macro-F1 | Benchmark AUC |
41
- |---------|---------------------|----------------------------------|----------|---------------|---------------|-------------------|---------------|
42
- | **`without_preprocess/`** | none | **none** (full page) | **80.7%** | **0.708** | **85.0%** | **0.848** | 0.970 |
43
- | **`with_preprocess/`** | center crop | **none** (full page) | 56.1% | 0.506 | **68.3%** | **0.648** | 0.953 |
44
- | ~~with_preprocess~~ | center crop | ~~center crop at inference~~ *(not comparable to test)* | — | — | ~~98.3%~~ | ~~0.983~~ | — |
45
 
46
- The ~~98.3%~~ benchmark number only appears if you **center-crop at inference**, which matches **val** but **not** how the model was evaluated on **test** during training.
47
 
48
- ## Benchmark evaluation (60 images)
49
 
50
- ### Fair eval — full pages (`preprocess none`, matches `test_preprocess`)
51
 
52
- **`without_preprocess` (recommended):**
 
 
 
 
53
 
54
- ```bash
55
- python inference_uchen_ume.py \
56
- --benchmark-dir benchmark \
57
- --weights without_preprocess/final_model.pt \
58
- --preprocess none
59
- ```
60
 
61
- **`with_preprocess` (same protocol as training test split):**
62
 
63
- ```bash
64
- python inference_uchen_ume.py \
65
- --benchmark-dir benchmark \
66
- --weights with_preprocess/final_model.pt \
67
- --preprocess none
68
- ```
69
 
70
- From this repo:
 
 
 
 
 
 
71
 
72
- ```bash
73
- python experiments/uchen_ume_binary/eval_benchmark.py \
74
- --checkpoint without_preprocess/final_model.pt --benchmark-dir benchmark/benchmark
75
 
76
- python experiments/uchen_ume_binary/eval_benchmark.py \
77
- --checkpoint with_preprocess/final_model.pt --benchmark-dir benchmark/benchmark
78
- # default test-preprocess is none — do NOT pass center_crop for fair comparison
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
79
  ```
80
 
81
- ## Parquet dataset
82
-
83
- [openpecha/uchen-ume-classification-benchmark](https://huggingface.co/datasets/openpecha/uchen-ume-classification-benchmark)
84
 
85
  ```python
86
  from datasets import load_dataset
87
- bench = load_dataset("openpecha/uchen-ume-classification-benchmark", split="benchmark")
 
 
 
 
88
  ```
89
 
90
- ## Load weights
 
 
91
 
92
- ```python
93
- from huggingface_hub import hf_hub_download
94
- import torch
95
- path = hf_hub_download("openpecha/uchen-ume-classifier", "without_preprocess/final_model.pt", repo_type="model")
96
- ckpt = torch.load(path, map_location="cpu", weights_only=False)
97
  ```
1
  ---
2
+ language:
3
+ - bo
4
  license: apache-2.0
5
  tags:
6
+ - image-classification
7
+ - tibetan
8
+ - uchen
9
+ - ume
10
+ - script-classification
11
+ - dinov3
12
+ - fine-tuned
13
  library_name: transformers
14
  pipeline_tag: image-classification
15
+ base_model: facebook/dinov3-vits16-pretrain-lvd1689m
16
+ datasets:
17
+ - openpecha/uchen-ume-classification-benchmark
18
+ metrics:
19
+ - f1
20
+ - accuracy
21
+ model-index:
+ - name: Uchen-Ume Classifier (DINOv3 ViT-S)
+   results:
+   - task:
+       type: image-classification
+       name: Tibetan Script Classification (Uchen vs Ume)
+     dataset:
+       name: openpecha/uchen-ume-classification-benchmark
+       type: openpecha/uchen-ume-classification-benchmark
+       split: test
+     metrics:
+     - name: Macro F1 (full page)
+       type: f1
+       value: 0.708
+     - name: Accuracy (full page)
+       type: accuracy
+       value: 0.807
38
  ---
39
 
40
+ # Uchen vs Umê Classifier (DINOv3 ViT-S)
41
 
42
+ Binary Tibetan script classifier: **Uchen** (དབུ་ཅན།, headed/printed script) vs **Umê** (དབུ་མེད།, headless/cursive script). Fine-tuned from [DINOv3 ViT-S](https://huggingface.co/facebook/dinov3-vits16-pretrain-lvd1689m) on ~10,000 manuscript scans from the [Buddhist Digital Resource Center](https://www.bdrc.io) (BDRC).
43
 
44
+ **Dataset:** [openpecha/uchen-ume-classification-benchmark](https://huggingface.co/datasets/openpecha/uchen-ume-classification-benchmark)
45
 
46
+ ## Recommended checkpoint
47
 
48
+ **Use `without_preprocess/final_model.pt`** for production. This model was trained and evaluated on full manuscript pages with no preprocessing, so the input it was trained on matches the input you deploy it on.
49
 
50
+ ## Results
 
 
 
 
51
 
52
+ Test set = 867 images, work-stratified split, no overlap with training works.
53
 
54
+ | Variant | Train/val preprocess | Test preprocess | Test acc | Test macro-F1 |
55
+ |---------|---------------------|-----------------|:--------:|:-------------:|
56
+ | **`without_preprocess/`** (recommended) | none | none (full page) | **80.7%** | **0.708** |
57
+ | `with_preprocess/` | center crop | none (full page) | 56.1% | 0.506 |
58
 
59
+ The `without_preprocess` variant is trained and tested on full pages, so there is no mismatch between training and inference. The `with_preprocess` variant achieves ~99% validation F1 on center-cropped images (matching its training distribution), but drops to 56.1% accuracy (0.506 macro-F1) when tested on full pages because the model has never seen uncropped input. This train/test mismatch makes it unsuitable for production, where raw manuscript images are the input.
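
Accuracy and macro-F1 can diverge noticeably when the two classes are imbalanced, which is why both are reported. The following is a minimal sketch of how the two metrics are computed from a confusion matrix. The counts are invented for illustration and are not the model's actual confusion matrix; the function name is hypothetical.

```python
def accuracy_and_macro_f1(confusion):
    """confusion[i][j] = number of samples with true class i predicted as class j."""
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n_classes))

    f1_scores = []
    for c in range(n_classes):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(n_classes)) - tp
        fn = sum(confusion[c]) - tp
        denom = 2 * tp + fp + fn
        # Per-class F1 = 2*TP / (2*TP + FP + FN); 0 if the class never appears.
        f1_scores.append(2 * tp / denom if denom else 0.0)

    # Macro-F1 is the unweighted mean of the per-class F1 scores.
    return correct / total, sum(f1_scores) / n_classes


# Hypothetical counts (rows: true uchen, true ume), NOT the reported results.
example = [[20, 30],
           [10, 140]]
acc, macro_f1 = accuracy_and_macro_f1(example)
print(f"accuracy={acc:.3f}  macro-F1={macro_f1:.3f}")
```

In this made-up example the majority class is classified well and the minority class poorly, so accuracy stays high while macro-F1, which weights both classes equally, is pulled down.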
60
 
61
+ ## Training data
62
 
63
+ | Class | Train | Validation | Test | Total |
64
+ |-------|------:|-----------:|-----:|------:|
65
+ | Uchen | ~3,124 | ~340 | ~290 | ~3,754 |
66
+ | Ume | ~5,986 | ~660 | ~561 | ~7,207 |
67
+ | **Total** | **9,110** | **1,000** | **851** | **10,961** |
68
 
69
+ **Uchen** includes: `uchen_sugthung`, `uchen_sugdring`, `uchen_sugring` (distinguished by descender length).
70
 
71
+ **Ume** includes: `petsuk`, `peri`, `tsegdrig`, `drudring`, `druring`, `druthung`, `drathung`, `khyuyig`, `tsumachug`, `yigchung`, `tsugchung`, `trinyig`, `dhumri`.
 
 
 
 
72
 
73
+ **Excluded:** `difficult`, `multi_scripts`, `non_tibetan`.
74
 
75
+ Splits are partitioned at the **work level**: all pages from the same manuscript (`W` prefix in the filename) stay in a single split.
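
A work-level split could be sketched as below. This is an illustration only, not the authors' split script: it assumes each filename begins with a work ID of the form `W` followed by alphanumerics and then an underscore, and the regex, split fractions, and function name are all hypothetical. It also does not stratify by class, which the real split may do.

```python
import random
import re
from collections import defaultdict

WORK_ID = re.compile(r"^(W[0-9A-Za-z]+)")  # assumed filename pattern


def split_by_work(filenames, val_frac=0.1, test_frac=0.1, seed=0):
    """Assign whole works to train/val/test so no work spans two splits."""
    by_work = defaultdict(list)
    for name in filenames:
        match = WORK_ID.match(name)
        if match is None:
            raise ValueError(f"no work ID in {name!r}")
        by_work[match.group(1)].append(name)

    # Shuffle work IDs (not pages) with a fixed seed for reproducibility.
    works = sorted(by_work)
    random.Random(seed).shuffle(works)

    n_val = max(1, round(len(works) * val_frac))
    n_test = max(1, round(len(works) * test_frac))
    groups = {
        "val": works[:n_val],
        "test": works[n_val:n_val + n_test],
        "train": works[n_val + n_test:],
    }
    return {split: [f for w in ws for f in by_work[w]] for split, ws in groups.items()}


# Synthetic filenames: 10 works with 5 pages each.
files = [f"W{w:03d}_{p:04d}.jpg" for w in range(10) for p in range(5)]
splits = split_by_work(files)
```

Because whole works are assigned to a split, near-duplicate pages from one manuscript cannot leak from training into validation or test.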
76
 
77
+ ## Architecture
78
 
79
+ - **Backbone:** DINOv3 ViT-S/16 (21M params, self-supervised pretraining on 1.7B images)
80
+ - **Head:** LayerNorm → Dropout(0.1) → Linear(384, 128) → GELU → Dropout(0.1) → Linear(128, 2)
81
+ - **Training:** Head only (backbone frozen), 20 epochs, lr=1e-3, AdamW with cosine schedule
82
+ - **Balancing:** WeightedRandomSampler + class-weighted cross-entropy loss
83
+ - **Augmentations:** Random rotation ±5°, brightness/contrast jitter ±20%, random crop scale 0.7–1.0, random erasing. No horizontal flip.
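
The balancing step can be illustrated with inverse-frequency class weights. This is a sketch under assumptions: the README does not state the exact weighting formula, so `N / (K * n_c)` is used here as one common choice, and the function name is hypothetical.

```python
def inverse_frequency_weights(class_counts):
    """Weight each class by N / (K * n_c), so rarer classes weigh more."""
    total = sum(class_counts.values())
    k = len(class_counts)
    return {label: total / (k * n) for label, n in class_counts.items()}


# Approximate training counts from the training data table.
counts = {"uchen": 3124, "ume": 5986}
class_weights = inverse_frequency_weights(counts)

# Per-sample sampling weights: every image gets the weight of its class.
labels = ["uchen", "ume", "ume", "uchen", "ume"]
sample_weights = [class_weights[y] for y in labels]
print(class_weights)
```

In PyTorch, per-sample weights like these could be passed to `torch.utils.data.WeightedRandomSampler(sample_weights, num_samples=len(sample_weights), replacement=True)`, and the per-class weights to `torch.nn.CrossEntropyLoss(weight=...)` as a tensor ordered by class index.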
84
 
85
+ ## Quick start
 
 
 
 
 
86
 
87
+ ### Load weights
88
 
89
+ ```python
+ from huggingface_hub import hf_hub_download
+ import torch
+
+ path = hf_hub_download(
+     "openpecha/uchen-ume-classifier",
+     "without_preprocess/final_model.pt",
+     repo_type="model",
+ )
+ ckpt = torch.load(path, map_location="cpu", weights_only=False)
+ ```
100
 
101
+ ### Classify an image
 
 
102
 
103
+ ```python
+ import torch
+ import torch.nn as nn
+ from PIL import Image
+ from transformers import AutoImageProcessor, AutoModel
+
+ class UchenUmeClassifier(nn.Module):
+     def __init__(self, model_id):
+         super().__init__()
+         self.backbone = AutoModel.from_pretrained(model_id)
+         h = self.backbone.config.hidden_size
+         self.head = nn.Sequential(
+             nn.LayerNorm(h), nn.Dropout(0.1),
+             nn.Linear(h, 128), nn.GELU(), nn.Dropout(0.1),
+             nn.Linear(128, 2),
+         )
+
+     def forward(self, pixel_values):
+         out = self.backbone(pixel_values=pixel_values)
+         # Classify from the first (CLS) token of the backbone output.
+         return self.head(out.last_hidden_state[:, 0, :])
+
+ MODEL_ID = "facebook/dinov3-vits16-pretrain-lvd1689m"
+ model = UchenUmeClassifier(MODEL_ID)
+ # `ckpt` is the checkpoint loaded in "Load weights" above.
+ model.load_state_dict(ckpt["model_state_dict"])
+ model.eval()
+
+ processor = AutoImageProcessor.from_pretrained(MODEL_ID)
+ img = Image.open("manuscript.jpg").convert("RGB")
+ inputs = processor(images=img, return_tensors="pt")
+
+ with torch.no_grad():
+     probs = torch.softmax(model(inputs["pixel_values"]), dim=1)[0]
+
+ # Index 0 is uchen, index 1 is ume.
+ label = "uchen" if probs[0] > probs[1] else "ume"
+ print(f"{label} ({probs.max():.1%})")
  ```
138
 
139
+ ### Load the dataset
 
 
140
 
141
  ```python
142
  from datasets import load_dataset
143
+
144
+ ds = load_dataset("openpecha/uchen-ume-classification-benchmark")
145
+ train = ds["train"] # 9,110 images
146
+ val = ds["validation"] # 1,000 images
147
+ test = ds["test"] # 851 images
148
  ```
149
 
150
+ ## Intended use
151
+
152
+ This model is **Level 1** of a hierarchical Tibetan script classification pipeline:
153
 
 
 
 
 
 
154
  ```
155
+ Manuscript image
+   → Level 1: Uchen vs Ume (this model)
+       ├── Uchen → Level 2: sugthung / sugdring / sugring
+       └── Ume → Level 2: druma / danyig / pedri / tsugdri / gyuyig
159
+ ```
160
+
161
+ ## Limitations
162
+
163
+ - Trained on BDRC digitised manuscripts. May underperform on photographs, modern prints, or non-BDRC scans.
164
+ - The DINOv3 processor squashes the 5:1 pecha aspect ratio to 224×224. The `without_preprocess` model is trained to handle this, but extreme aspect ratios may still degrade performance.
165
+ - Edge cases (partial head strokes, transitional styles, heavy damage) may produce low-confidence predictions.
166
+ - **Access requirement:** DINOv3 is gated. Request access at [facebook/dinov3-vits16-pretrain-lvd1689m](https://huggingface.co/facebook/dinov3-vits16-pretrain-lvd1689m) and run `huggingface-cli login` before use.
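
Since edge cases may yield low-confidence predictions, a caller might want to flag them for manual review rather than accept the top label. The helper below is a hypothetical sketch: the function name and the 0.7 threshold are assumptions to be tuned on validation data, and the label order (index 0 = uchen, index 1 = ume) follows the Quick start example.

```python
def label_with_threshold(probs, threshold=0.7, labels=("uchen", "ume")):
    """Return (label, confidence); label is "uncertain" below the threshold."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    if probs[best] < threshold:
        return "uncertain", probs[best]
    return labels[best], probs[best]


print(label_with_threshold([0.93, 0.07]))
print(label_with_threshold([0.55, 0.45]))
```

The `probs` argument can be the softmax output from the classifier converted to a plain list, for example with `probs.tolist()`.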
167
+
168
+ ## Citation
169
+
170
+ ```bibtex
171
+ @misc{karma2026uchenume,
172
+ title = {Uchen-Ume Classifier: Binary Tibetan Script Classification with DINOv3},
173
+ author = {Karma Tashi and Elie Roux},
174
+ year = {2026},
175
+ url = {https://huggingface.co/openpecha/uchen-ume-classifier},
176
+ note = {Fine-tuned on openpecha/uchen-ume-classification-benchmark.
177
+ Funded by Khyentse Foundation.
178
+ Images from the Buddhist Digital Resource Center (BDRC).}
179
+ }
180
+ ```
181
+
182
+ ## Acknowledgements
183
+
184
+ Developed by **Dharmaduta** for the **[Buddhist Digital Resource Center](https://www.bdrc.io)** (BDRC) Etext Corpus project, with funding from the **Khyentse Foundation**. Annotation guidelines by **Pentsok Rtsang**.