openpecha/uchen-ume-classifier

@@ -1,97 +1,184 @@
 ---
 license: apache-2.0
 tags:
-  - image-classification
-  - tibetan
-  - uchen
-  - ume
 library_name: transformers
 pipeline_tag: image-classification
 ---
-# Uchen vs Umê classifier (DINOv3 ViT-S)
-Binary Tibetan script classifier: **uchen** (printed) vs **ume** (cursive).
-**Dataset (splits, Parquet, inference):** [openpecha/uchen-ume-classification-benchmark](https://huggingface.co/datasets/openpecha/uchen-ume-classification-benchmark)
-## Training preprocess (from `config.yaml` + `train.py`)
-`train.py` builds **three** dataloaders with **per-split** preprocess (`preprocess_for_split` in `common.py`):
-| Split | `with_preprocess` config | Effect in `ScriptImageDataset.__getitem__` |
-|-------|--------------------------|------------------------------------------|
-| **train** | `train_preprocess: center_crop_whole_page` | Center crop before augment + DINO processor |
-| **val** | `val_preprocess: center_crop_whole_page` | Center crop before DINO processor |
-| **test** | `test_preprocess: none` | **Full page** — no crop, only DINO processor |
-So high **validation** scores for `with_preprocess` (val F1 ~0.99) are on **cropped** pages. **Test** during training uses **full pages** (test F1 ~0.51). That is intentional in the code, not a bug.
-**Benchmark eval must use `test_preprocess: none`** (same as the test split) unless you are deliberately measuring crop-to-crop generalization.
-## Recommended weights for full manuscript pages
-**`without_preprocess/final_model.pt`** — trained without runtime crop on any split.
-## Results summary
-**Benchmark** = 60 held-out images (30 uchen + 30 ume). **Test** = 867 images (work-stratified), full pages.
-| Variant | Train/val preprocess | Test & benchmark eval preprocess | Test acc | Test macro-F1 | Benchmark acc | Benchmark macro-F1 | Benchmark AUC |
-|---------|---------------------|----------------------------------|----------|---------------|---------------|-------------------|---------------|
-| **`without_preprocess/`** | none | **none** (full page) | **80.7%** | **0.708** | **85.0%** | **0.848** | 0.970 |
-| **`with_preprocess/`** | center crop | **none** (full page) | 56.1% | 0.506 | **68.3%** | **0.648** | 0.953 |
-| ~~with_preprocess~~ | center crop | ~~center crop at inference~~ *(not comparable to test)* | — | — | ~~98.3%~~ | ~~0.983~~ | — |
-The ~~98.3%~~ benchmark number only appears if you **center-crop at inference**, which matches **val** but **not** how the model was evaluated on **test** during training.
-## Benchmark evaluation (60 images)
-### Fair eval — full pages (`preprocess none`, matches `test_preprocess`)
-**`without_preprocess` (recommended):**
-```bash
-python inference_uchen_ume.py \
-  --benchmark-dir benchmark \
-  --weights without_preprocess/final_model.pt \
-  --preprocess none
-```
-**`with_preprocess` (same protocol as training test split):**
-```bash
-python inference_uchen_ume.py \
-  --benchmark-dir benchmark \
-  --weights with_preprocess/final_model.pt \
-  --preprocess none
-```
-From this repo:
-```bash
-python experiments/uchen_ume_binary/eval_benchmark.py \
-  --checkpoint without_preprocess/final_model.pt --benchmark-dir benchmark/benchmark
-python experiments/uchen_ume_binary/eval_benchmark.py \
-  --checkpoint with_preprocess/final_model.pt --benchmark-dir benchmark/benchmark
-# default test-preprocess is none — do NOT pass center_crop for fair comparison
 ```
-## Parquet dataset
-[openpecha/uchen-ume-classification-benchmark](https://huggingface.co/datasets/openpecha/uchen-ume-classification-benchmark)
 ```python
 from datasets import load_dataset
-bench = load_dataset("openpecha/uchen-ume-classification-benchmark", split="benchmark")
 ```
-## Load weights
-```python
-from huggingface_hub import hf_hub_download
-import torch
-path = hf_hub_download("openpecha/uchen-ume-classifier", "without_preprocess/final_model.pt", repo_type="model")
-ckpt = torch.load(path, map_location="cpu", weights_only=False)
 ```

 ---
+language:
+- bo
 license: apache-2.0
 tags:
+- image-classification
+- tibetan
+- uchen
+- ume
+- script-classification
+- dinov3
+- fine-tuned
 library_name: transformers
 pipeline_tag: image-classification
+base_model: facebook/dinov3-vits16-pretrain-lvd1689m
+datasets:
+- openpecha/uchen-ume-classification-benchmark
+metrics:
+- f1
+- accuracy
+model-index:
+- name: Uchen-Ume Classifier (DINOv3 ViT-S)
+  results:
+  - task:
+      type: image-classification
+      name: Tibetan Script Classification (Uchen vs Ume)
+    dataset:
+      name: openpecha/uchen-ume-classification-benchmark
+      type: openpecha/uchen-ume-classification-benchmark
+      split: test
+    metrics:
+    - name: Macro F1 (full page)
+      type: f1
+      value: 0.708
+    - name: Accuracy (full page)
+      type: accuracy
+      value: 0.807
 ---
+# Uchen vs Umê Classifier (DINOv3 ViT-S)
+Binary Tibetan script classifier: **Uchen** (དབུ་ཅན།, headed/printed script) vs **Umê** (དབུ་མེད།, headless/cursive script). Fine-tuned from [DINOv3 ViT-S](https://huggingface.co/facebook/dinov3-vits16-pretrain-lvd1689m) on ~10,000 manuscript scans from the [Buddhist Digital Resource Center](https://www.bdrc.io) (BDRC).
+**Dataset:** [openpecha/uchen-ume-classification-benchmark](https://huggingface.co/datasets/openpecha/uchen-ume-classification-benchmark)
+## Recommended checkpoint
+**Use `without_preprocess/final_model.pt`** for production. This checkpoint was trained and evaluated on full manuscript pages with no center crop, so the input it sees at inference matches what it saw during training.
+## Results
+Test set = 867 images, work-stratified split, no overlap with training works.
+| Variant | Train/val preprocess | Test preprocess | Test acc | Test macro-F1 |
+|---------|---------------------|-----------------|:--------:|:-------------:|
+| **`without_preprocess/`** (recommended) | none | none (full page) | **80.7%** | **0.708** |
+| `with_preprocess/` | center crop | none (full page) | 56.1% | 0.506 |
+The `without_preprocess` variant is trained and tested on full pages, so there is no mismatch between training and inference. The `with_preprocess` variant reaches ~0.99 validation F1 on center-cropped images (matching its training distribution), but falls to 56.1% accuracy (0.506 macro-F1) on full pages because it never saw uncropped input during training. This train–test mismatch makes it unsuitable for production, where raw manuscript images are the input.
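Accuracy and macro-F1 can diverge noticeably on an imbalanced test set, as they do in the table above. The following is a minimal pure-Python sketch of how the two metrics are computed. The function name `accuracy_and_macro_f1` and the toy labels are made up for illustration and are not taken from the benchmark or the training code.

```python
def accuracy_and_macro_f1(y_true, y_pred, classes=("uchen", "ume")):
    """Return (accuracy, macro-F1) for two equal-length label lists."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)

    f1_scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # Per-class F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        f1_scores.append(2 * tp / denom if denom else 0.0)

    # Macro-F1 is the unweighted mean of per-class F1, so the minority class counts equally.
    return accuracy, sum(f1_scores) / len(f1_scores)


# Toy example (made-up labels): 2 uchen pages, 8 ume pages,
# and a model that misses one of the two uchen pages.
y_true = ["uchen"] * 2 + ["ume"] * 8
y_pred = ["uchen"] + ["ume"] * 9
acc, macro_f1 = accuracy_and_macro_f1(y_true, y_pred)
print(f"accuracy={acc:.3f} macro_f1={macro_f1:.3f}")  # accuracy=0.900 macro_f1=0.804
```

A single error on the minority class costs little accuracy but pulls macro-F1 down sharply, which is why both numbers are reported.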
+## Training data
+| Class | Train | Validation | Test | Total |
+|-------|------:|-----------:|-----:|------:|
+| Uchen | ~3,124 | ~340 | ~290 | ~3,754 |
+| Ume | ~5,986 | ~660 | ~561 | ~7,207 |
+| **Total** | **9,110** | **1,000** | **851** | **10,961** |
+**Uchen** includes: `uchen_sugthung`, `uchen_sugdring`, `uchen_sugring` (distinguished by descender length).
+**Ume** includes: `petsuk`, `peri`, `tsegdrig`, `drudring`, `druring`, `druthung`, `drathung`, `khyuyig`, `tsumachug`, `yigchung`, `tsugchung`, `trinyig`, `dhumri`.
+**Excluded:** `difficult`, `multi_scripts`, `non_tibetan`.
+Splits are partitioned at the **work level** — all pages from the same manuscript (`W` prefix in the filename) stay in one split only.
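A work-level partition like the one described above might be sketched as follows. This is not the project's actual split script. The filename pattern (a leading `W` followed by alphanumerics), the helper names `work_id` and `split_by_work`, and the split fractions are all assumptions made for illustration.

```python
import random
import re
from collections import defaultdict


def work_id(filename):
    """Extract the work ID, assumed here to be a leading 'W' plus alphanumerics."""
    match = re.match(r"W[0-9A-Za-z]+", filename)
    if match is None:
        raise ValueError(f"no work ID found in {filename!r}")
    return match.group(0)


def split_by_work(filenames, val_frac=0.1, test_frac=0.1, seed=0):
    """Assign whole works to train/validation/test so no work spans two splits."""
    by_work = defaultdict(list)
    for name in filenames:
        by_work[work_id(name)].append(name)

    works = sorted(by_work)              # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(works)

    n_test = max(1, round(len(works) * test_frac))
    n_val = max(1, round(len(works) * val_frac))
    groups = {
        "test": works[:n_test],
        "validation": works[n_test:n_test + n_val],
        "train": works[n_test + n_val:],
    }
    return {split: [f for w in ws for f in by_work[w]] for split, ws in groups.items()}


# Hypothetical filenames: 10 works with 3 pages each.
files = [f"W{1000 + w}_{p:04d}.jpg" for w in range(10) for p in range(3)]
splits = split_by_work(files)
print({split: len(pages) for split, pages in splits.items()})
```

Note that the fractions here apply to works rather than pages, and this sketch does not balance classes across splits.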
+## Architecture
+- **Backbone:** DINOv3 ViT-S/16 (21M params, self-supervised pretraining on 1.7B images)
+- **Head:** LayerNorm → Dropout(0.1) → Linear(384, 128) → GELU → Dropout(0.1) → Linear(128, 2)
+- **Training:** Head only (backbone frozen), 20 epochs, lr=1e-3, AdamW with cosine schedule
+- **Balancing:** WeightedRandomSampler + class-weighted cross-entropy loss
+- **Augmentations:** Random rotation ±5°, brightness/contrast jitter ±20%, random crop scale 0.7–1.0, random erasing. No horizontal flip.
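The card does not say how the balancing weights are derived. One common choice is inverse class frequency, sketched below in pure Python as an assumption rather than a description of `train.py`. The helper name `inverse_frequency_weights` is hypothetical, and the counts are the approximate train-split figures from the table above.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return {class: weight} with weight = N / (num_classes * class_count)."""
    counts = Counter(labels)
    total = len(labels)
    return {c: total / (len(counts) * n) for c, n in counts.items()}


# Approximate train-split counts from the table above.
labels = ["uchen"] * 3124 + ["ume"] * 5986
class_weights = inverse_frequency_weights(labels)

# One weight per sample: the form a weighted sampler expects.
sample_weights = [class_weights[y] for y in labels]

print({c: round(w, 3) for c, w in class_weights.items()})
```

In PyTorch, `sample_weights` could be passed to `torch.utils.data.WeightedRandomSampler(sample_weights, num_samples=len(sample_weights), replacement=True)`, and the per-class weights to `torch.nn.CrossEntropyLoss(weight=...)` as a tensor ordered by class index.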
+## Quick start
+### Load weights
+```python
+from huggingface_hub import hf_hub_download
+import torch
+path = hf_hub_download(
+    "openpecha/uchen-ume-classifier",
+    "without_preprocess/final_model.pt",
+    repo_type="model"
+)
+# weights_only=False unpickles arbitrary Python objects, so only load checkpoints you trust.
+ckpt = torch.load(path, map_location="cpu", weights_only=False)
+```
+### Classify an image
+```python
+import torch
+import torch.nn as nn
+from PIL import Image
+from transformers import AutoImageProcessor, AutoModel
+class UchenUmeClassifier(nn.Module):
+    def __init__(self, model_id):
+        super().__init__()
+        self.backbone = AutoModel.from_pretrained(model_id)
+        h = self.backbone.config.hidden_size
+        self.head = nn.Sequential(
+            nn.LayerNorm(h), nn.Dropout(0.1),
+            nn.Linear(h, 128), nn.GELU(), nn.Dropout(0.1),
+            nn.Linear(128, 2),
+        )
+    def forward(self, pixel_values):
+        out = self.backbone(pixel_values=pixel_values)
+        # Classify from the CLS token (position 0 of the backbone output).
+        return self.head(out.last_hidden_state[:, 0, :])
+MODEL_ID = "facebook/dinov3-vits16-pretrain-lvd1689m"
+model = UchenUmeClassifier(MODEL_ID)
+# `ckpt` is the checkpoint dict loaded in the "Load weights" snippet above.
+model.load_state_dict(ckpt["model_state_dict"])
+model.eval()  # disable dropout for inference
+processor = AutoImageProcessor.from_pretrained(MODEL_ID)
+img = Image.open("manuscript.jpg").convert("RGB")
+inputs = processor(images=img, return_tensors="pt")
+with torch.no_grad():
+    probs = torch.softmax(model(inputs["pixel_values"]), dim=1)[0]
+# Class index 0 is uchen, index 1 is ume.
+label = "uchen" if probs[0] > probs[1] else "ume"
+print(f"{label} ({probs.max():.1%})")
 ```
+### Load the dataset
 ```python
 from datasets import load_dataset
+ds = load_dataset("openpecha/uchen-ume-classification-benchmark")
+train = ds["train"]       # 9,110 images
+val   = ds["validation"]  # 1,000 images
+test  = ds["test"]        #   851 images
 ```
+## Intended use
+This model is **Level 1** of a hierarchical Tibetan script classification pipeline:
 ```
+Manuscript image
+  → Level 1: Uchen vs Ume (this model)
+      ├── Uchen → Level 2: sugthung / sugdring / sugring
+      └── Ume   → Level 2: druma / danyig / pedri / tsugdri / gyuyig
+```
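The routing in the diagram above could be wired up roughly as follows. This is only a sketch: the `classify_hierarchical` function, the `min_confidence` threshold, and the stand-in classifiers are all hypothetical, and the Level 2 models are not part of this repository.

```python
from typing import Callable, Dict, Tuple

# A classifier maps an image (any object here) to (label, confidence).
Classifier = Callable[[object], Tuple[str, float]]


def classify_hierarchical(image, level1: Classifier, level2: Dict[str, Classifier],
                          min_confidence: float = 0.5):
    """Run Level 1, then route to the Level 2 classifier for that script family."""
    family, conf = level1(image)
    if conf < min_confidence or family not in level2:
        return family, None  # stop at Level 1 when unsure or no Level 2 model exists
    style, _ = level2[family](image)
    return family, style


# Stand-in classifiers for illustration only; real ones would wrap trained models.
level1 = lambda img: ("uchen", 0.93)
level2 = {
    "uchen": lambda img: ("sugthung", 0.71),
    "ume": lambda img: ("druma", 0.64),
}
result = classify_hierarchical("page.jpg", level1, level2)
print(result)  # ('uchen', 'sugthung')
```

Stopping at Level 1 for low-confidence pages is a design choice assumed here, so that a doubtful Uchen/Ume decision is not compounded by a Level 2 guess.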
+## Limitations
+- Trained on BDRC digitised manuscripts. May underperform on photographs, modern prints, or non-BDRC scans.
+- The DINOv3 processor resizes every page to 224×224, which squashes the 5:1 pecha aspect ratio into a square. The `without_preprocess` model is trained on pages squashed this way, but more extreme aspect ratios may still degrade performance.
+- Edge cases (partial head strokes, transitional styles, heavy damage) may produce low-confidence predictions.
+- **Access requirement:** DINOv3 is gated. Request access at [facebook/dinov3-vits16-pretrain-lvd1689m](https://huggingface.co/facebook/dinov3-vits16-pretrain-lvd1689m) and run `huggingface-cli login` before use.
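To make the aspect-ratio limitation concrete, the arithmetic below shows how unevenly the two axes are scaled when a 5:1 page is resized to a square, following the description above. The page size of 2000×400 pixels and the helper name `squash_factors` are hypothetical.

```python
def squash_factors(width, height, target=224):
    """Per-axis scale factors when a page is resized to target x target,
    ignoring its aspect ratio."""
    sx = target / width
    sy = target / height
    # sy / sx: how many times narrower glyphs become relative to their height.
    return sx, sy, sy / sx


sx, sy, distortion = squash_factors(2000, 400)  # hypothetical 5:1 page
print(f"x scale={sx:.3f}, y scale={sy:.3f}, horizontal compression={distortion:.1f}x")
```

For a 5:1 page the horizontal axis is compressed five times more than the vertical one, so wider pages lose proportionally more horizontal detail.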
+## Citation
+```bibtex
+@misc{karma2026uchenume,
+    title   = {Uchen-Ume Classifier: Binary Tibetan Script Classification with DINOv3},
+    author  = {Karma Tashi and Elie Roux},
+    year    = {2026},
+    url     = {https://huggingface.co/openpecha/uchen-ume-classifier},
+    note    = {Fine-tuned on openpecha/uchen-ume-classification-benchmark.
+               Funded by Khyentse Foundation.
+               Images from the Buddhist Digital Resource Center (BDRC).}
+}
+```
+## Acknowledgements
+Developed by **Dharmaduta** for the **[Buddhist Digital Resource Center](https://www.bdrc.io)** (BDRC) Etext Corpus project, with funding from the **Khyentse Foundation**. Annotation guidelines by **Pentsok Rtsang**.