openpecha/uchen-ume-classifier

@@ -1,40 +1,54 @@
 ---
 language:
-- bo
 license: apache-2.0
 tags:
-- image-classification
-- tibetan
-- uchen
-- ume
-- script-classification
-- dinov3
-- fine-tuned
 library_name: transformers
 pipeline_tag: image-classification
 base_model: facebook/dinov3-vits16-pretrain-lvd1689m
 datasets:
-- openpecha/uchen-ume-classification-benchmark
 metrics:
-- f1
-- accuracy
 model-index:
-- name: Uchen-Ume Classifier (DINOv3 ViT-S)
-  results:
-  - task:
-      type: image-classification
-      name: Tibetan Script Classification (Uchen vs Ume)
-    dataset:
-      name: openpecha/uchen-ume-classification-benchmark
-      type: openpecha/uchen-ume-classification-benchmark
-      split: test
-    metrics:
-    - name: Macro F1 (full page)
-      type: f1
-      value: 0.708
-    - name: Accuracy (full page)
-      type: accuracy
-      value: 0.807
 ---
 # Uchen vs Umê Classifier (DINOv3 ViT-S)
@@ -47,16 +61,25 @@ Binary Tibetan script classifier: **Uchen** (དབུ་ཅན།, headed/print
 **Use `without_preprocess/final_model.pt`** for production. This model was trained and evaluated on full manuscript pages with no preprocessing — what you get is what you deploy.
-## Results
-Test set = 867 images, work-stratified split, no overlap with training works.
-| Variant | Train/val preprocess | Test preprocess | Test acc | Test macro-F1 |
-|---------|---------------------|-----------------|:--------:|:-------------:|
-| **`without_preprocess/`** (recommended) | none | none (full page) | **80.7%** | **0.708** |
-| `with_preprocess/` | center crop | none (full page) | 56.1% | 0.506 |
-The `without_preprocess` variant is trained and tested on full pages — no mismatch between training and inference. The `with_preprocess` variant achieves ~99% validation F1 on center-cropped images (matching its training distribution), but drops to 56% when tested on full pages because the model has never seen uncropped input. This train–test mismatch makes it unsuitable for production where raw manuscript images are the input.
 ## Training data
@@ -64,7 +87,7 @@ The `without_preprocess` variant is trained and tested on full pages — no mism
 |-------|------:|-----------:|-----:|------:|
 | Uchen | ~3,124 | ~340 | ~290 | ~3,754 |
 | Ume | ~5,986 | ~660 | ~561 | ~7,207 |
-| **Total** | **9,110** | **1,000** | **851** | **10,961** |
 **Uchen** includes: `uchen_sugthung`, `uchen_sugdring`, `uchen_sugring` (distinguished by descender length).
@@ -72,6 +95,8 @@ The `without_preprocess` variant is trained and tested on full pages — no mism
 **Excluded:** `difficult`, `multi_scripts`, `non_tibetan`.
 Splits are partitioned at the **work level** — all pages from the same manuscript (`W` prefix in the filename) stay in one split only.
 ## Architecture
@@ -136,6 +161,17 @@ label = "uchen" if probs[0] > probs[1] else "ume"
 print(f"{label} ({probs.max():.1%})")
 ```
 ### Load the dataset
 ```python
@@ -145,6 +181,20 @@ ds = load_dataset("openpecha/uchen-ume-classification-benchmark")
 train = ds["train"]       # 9,110 images
 val   = ds["validation"]  # 1,000 images
 test  = ds["test"]        #   851 images
 ```
 ## Intended use
@@ -181,4 +231,4 @@ Manuscript image
 ## Acknowledgements
-Developed by **Dharmaduta** for the **[Buddhist Digital Resource Center](https://www.bdrc.io)** (BDRC) Etext Corpus project, with funding from the **Khyentse Foundation**. Annotation guidelines by **Pentsok Rtsang**.

 ---
 language:
+  - bo
 license: apache-2.0
 tags:
+  - image-classification
+  - tibetan
+  - uchen
+  - ume
+  - script-classification
+  - dinov3
+  - fine-tuned
 library_name: transformers
 pipeline_tag: image-classification
 base_model: facebook/dinov3-vits16-pretrain-lvd1689m
 datasets:
+  - openpecha/uchen-ume-classification-benchmark
 metrics:
+  - f1
+  - accuracy
 model-index:
+  - name: Uchen-Ume Classifier (DINOv3 ViT-S)
+    results:
+      - task:
+          type: image-classification
+          name: Tibetan Script Classification (Uchen vs Ume)
+        dataset:
+          name: openpecha/uchen-ume-classification-benchmark
+          type: openpecha/uchen-ume-classification-benchmark
+          split: test
+        metrics:
+          - name: Macro F1 (full page)
+            type: f1
+            value: 0.708
+          - name: Accuracy (full page)
+            type: accuracy
+            value: 0.807
+      - task:
+          type: image-classification
+          name: Held-out benchmark (60 pages, full page)
+        dataset:
+          name: openpecha/uchen-ume-classification-benchmark
+          type: openpecha/uchen-ume-classification-benchmark
+          split: benchmark
+        metrics:
+          - name: Macro F1 (full page)
+            type: f1
+            value: 0.848
+          - name: Accuracy (full page)
+            type: accuracy
+            value: 0.850
 ---
 # Uchen vs Umê Classifier (DINOv3 ViT-S)
 **Use `without_preprocess/final_model.pt`** for production. This model was trained and evaluated on full manuscript pages with no preprocessing — what you get is what you deploy.
+## Best results (full pages)
+Test set = 867 images, work-stratified split, no overlap with training works. Benchmark = 60 held-out pages (30 uchen / 30 ume), disjoint from train/val/test.
+| Variant | Split | Images | Accuracy | Macro-F1 | AUC |
+|------|-------|-------:|---------:|---------:|----:|
+| **`without_preprocess/`** (recommended) | Test | 867 | **80.7%** | **0.708** | 0.970 |
+| **`without_preprocess/`** (recommended) | Benchmark | 60 | **85.0%** | **0.848** | 0.970 |
+| `with_preprocess/` | Test | 867 | 56.1% | 0.506 | 0.969 |
+| `with_preprocess/` | Benchmark | 60 | 68.3% | 0.648 | 0.953 |
+### Variant comparison
+| Variant | Train/val preprocess | Test & benchmark preprocess | Test acc | Test macro-F1 | Benchmark acc | Benchmark macro-F1 |
+|---------|---------------------|-----------------------------|:--------:|:-------------:|:-------------:|:------------------:|
+| **`without_preprocess/`** | none | none (full page) | **80.7%** | **0.708** | **85.0%** | **0.848** |
+| `with_preprocess/` | center crop | none (full page) | 56.1% | 0.506 | 68.3% | 0.648 |
+The `without_preprocess` variant is trained and tested on full pages, so there is no mismatch between training and inference. The `with_preprocess` variant reaches ~99% validation F1 on center-cropped images (matching its training distribution), but its test accuracy drops to 56.1% on full pages because the model has never seen uncropped input. Do **not** report ~99% test scores obtained by center-cropping the test images at evaluation time.
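The gap between accuracy and macro-F1 in the tables above is what one would expect when the classes are imbalanced and most errors fall on the smaller class. The sketch below computes both metrics from a 2×2 confusion matrix. The counts are hypothetical and chosen only to illustrate the effect; they are not the model's actual confusion matrix. The label order (0 = uchen, 1 = ume) follows the inference snippet in this card.

```python
def binary_report(confusion):
    """Return (accuracy, macro_f1) for a 2x2 confusion matrix.

    confusion[true][pred] holds counts, with label 0 = uchen and 1 = ume.
    """
    total = sum(sum(row) for row in confusion)
    accuracy = (confusion[0][0] + confusion[1][1]) / total

    f1_scores = []
    for c in (0, 1):
        tp = confusion[c][c]          # correctly predicted as class c
        fp = confusion[1 - c][c]      # other class predicted as c
        fn = confusion[c][1 - c]      # class c predicted as the other class
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)

    # Macro-F1 is the unweighted mean of the per-class F1 scores.
    return accuracy, sum(f1_scores) / len(f1_scores)


# Hypothetical counts: 30 uchen pages, 70 ume pages, all errors on uchen.
accuracy, macro_f1 = binary_report([[10, 20], [0, 70]])
print(f"accuracy={accuracy:.3f} macro_f1={macro_f1:.4f}")
```

Because macro-F1 weights both classes equally, weak recall on the minority class lowers it much more than it lowers accuracy.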
 ## Training data
 |-------|------:|-----------:|-----:|------:|
 | Uchen | ~3,124 | ~340 | ~290 | ~3,754 |
 | Ume | ~5,986 | ~660 | ~561 | ~7,207 |
+| **Total pages** | **9,110** | **1,000** | **851** | **10,961** |
 **Uchen** includes: `uchen_sugthung`, `uchen_sugdring`, `uchen_sugring` (distinguished by descender length).
 **Excluded:** `difficult`, `multi_scripts`, `non_tibetan`.
+Benchmark pages (60) are excluded from train/val/test via the published split manifest.
 Splits are partitioned at the **work level** — all pages from the same manuscript (`W` prefix in the filename) stay in one split only.
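A work-level partition of this kind can be sketched as follows. This is not the script used to build the published splits. The filename layout (`W<id>_<page>.jpg`), the hash-based assignment, and the split fractions are all assumptions made for illustration.

```python
import hashlib
import re
from collections import defaultdict

# Assumed layout: the work ID is the leading "W..." token of the filename.
WORK_ID = re.compile(r"^(W[0-9A-Za-z]+)")


def work_id(filename):
    """Extract the work ID (hypothetical format) from a page filename."""
    match = WORK_ID.match(filename)
    if match is None:
        raise ValueError(f"no work ID in {filename!r}")
    return match.group(1)


def assign_split(work, val_frac=0.1, test_frac=0.1):
    """Map a work ID to a split deterministically via a hash in [0, 1)."""
    digest = hashlib.sha256(work.encode("utf-8")).hexdigest()
    bucket = (int(digest, 16) % 10_000) / 10_000
    if bucket < test_frac:
        return "test"
    if bucket < test_frac + val_frac:
        return "validation"
    return "train"


def split_by_work(filenames):
    """Group pages so that every page of a work lands in the same split."""
    splits = defaultdict(list)
    for name in filenames:
        splits[assign_split(work_id(name))].append(name)
    return dict(splits)


pages = ["W100_0001.jpg", "W100_0002.jpg", "W200_0001.jpg"]
print(split_by_work(pages))
```

Assigning by work ID rather than by page keeps near-duplicate pages from the same manuscript out of both training and evaluation data.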
 ## Architecture
 print(f"{label} ({probs.max():.1%})")
 ```
+### Benchmark inference (full pages)
+```bash
+pip install -r https://huggingface.co/datasets/openpecha/uchen-ume-classification-benchmark/raw/main/requirements-inference.txt
+curl -LO https://huggingface.co/datasets/openpecha/uchen-ume-classification-benchmark/raw/main/inference_uchen_ume.py
+python inference_uchen_ume.py \
+  --benchmark-json benchmark/benchmark_holdout.json \
+  --fetch-urls \
+  --weights without_preprocess/final_model.pt \
+  --preprocess none
+```
 ### Load the dataset
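 ```python
 from datasets import load_dataset
 ds = load_dataset("openpecha/uchen-ume-classification-benchmark")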
 train = ds["train"]       # 9,110 images
 val   = ds["validation"]  # 1,000 images
 test  = ds["test"]        #   851 images
+bench = ds["benchmark"]   #    60 images
+```
+## Repo layout
+```
+without_preprocess/   ← recommended (full-page test & benchmark)
+  final_model.pt
+  results.json
+  benchmark_eval_results.json
+with_preprocess/      ← center-crop train/val only; test on full pages
+  final_model.pt
+  results.json
+  benchmark_eval_results.json
 ```
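Given the layout above, individual files can be fetched from the Hub by their repo-relative path. The following is a minimal sketch. It assumes the repo ID is `openpecha/uchen-ume-classifier` (taken from the page header) and that `huggingface_hub` is installed. The helper names `artifact_path` and `download_artifact` are hypothetical.

```python
REPO_ID = "openpecha/uchen-ume-classifier"  # assumption: taken from the page header
VARIANTS = ("without_preprocess", "with_preprocess")
ARTIFACTS = ("final_model.pt", "results.json", "benchmark_eval_results.json")


def artifact_path(variant, artifact):
    """Build the repo-relative path for a file listed in the layout above."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    if artifact not in ARTIFACTS:
        raise ValueError(f"unknown artifact: {artifact!r}")
    return f"{variant}/{artifact}"


def download_artifact(variant, artifact):
    """Download one file from the Hub and return its local cache path."""
    # Imported lazily so the path helper works without huggingface_hub.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=artifact_path(variant, artifact))


print(artifact_path("without_preprocess", "final_model.pt"))
```

Calling `download_artifact("without_preprocess", "final_model.pt")` would then fetch the recommended weights, provided the files are present under those paths.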
 ## Intended use
 ## Acknowledgements
+Developed by **Dharmaduta** for the **[Buddhist Digital Resource Center](https://www.bdrc.io)** (BDRC) Etext Corpus project, with funding from the **Khyentse Foundation**. Annotation guidelines by **Pentsok Rtsang**.