karma689 committed on
Commit 8ada150 · verified · 1 Parent(s): 444f57e

Update README with best full-page results and benchmark holdout metrics
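
The accuracy and macro-F1 figures this change reports can diverge when classes are imbalanced, as with the roughly 1:2 Uchen/Ume ratio in the dataset. Below is a minimal sketch of how the two metrics are computed from label lists, using only the standard library. The function names and toy labels are illustrative assumptions and are not taken from the repository's evaluation code.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1, so each class counts equally."""
    scores = []
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels (hypothetical): 0 = uchen, 1 = ume.
y_true = [0, 0, 1, 1, 1, 1]
y_pred = [0, 1, 1, 1, 1, 1]
print(f"accuracy={accuracy(y_true, y_pred):.3f} macro_f1={macro_f1(y_true, y_pred):.3f}")
```

Because macro-F1 averages the per-class scores without weighting by class size, errors on the smaller class pull it down more than they pull down accuracy.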

Files changed (1)
  1. README.md +86 -36
README.md CHANGED
@@ -1,40 +1,54 @@
1
  ---
2
  language:
3
- - bo
4
  license: apache-2.0
5
  tags:
6
- - image-classification
7
- - tibetan
8
- - uchen
9
- - ume
10
- - script-classification
11
- - dinov3
12
- - fine-tuned
13
  library_name: transformers
14
  pipeline_tag: image-classification
15
  base_model: facebook/dinov3-vits16-pretrain-lvd1689m
16
  datasets:
17
- - openpecha/uchen-ume-classification-benchmark
18
  metrics:
19
- - f1
20
- - accuracy
21
  model-index:
22
- - name: Uchen-Ume Classifier (DINOv3 ViT-S)
23
-   results:
24
-   - task:
25
-       type: image-classification
26
-       name: Tibetan Script Classification (Uchen vs Ume)
27
-     dataset:
28
-       name: openpecha/uchen-ume-classification-benchmark
29
-       type: openpecha/uchen-ume-classification-benchmark
30
-       split: test
31
-     metrics:
32
-     - name: Macro F1 (full page)
33
-       type: f1
34
-       value: 0.708
35
-     - name: Accuracy (full page)
36
-       type: accuracy
37
-       value: 0.807
38
  ---
39
 
40
  # Uchen vs Umê Classifier (DINOv3 ViT-S)
@@ -47,16 +61,25 @@ Binary Tibetan script classifier: **Uchen** (དབུ་ཅན།, headed/print
47
 
48
  **Use `without_preprocess/final_model.pt`** for production. This model was trained and evaluated on full manuscript pages with no preprocessing — what you get is what you deploy.
49
 
50
- ## Results
51
 
52
- Test set = 867 images, work-stratified split, no overlap with training works.
53
 
54
- | Variant | Train/val preprocess | Test preprocess | Test acc | Test macro-F1 |
55
- |---------|---------------------|-----------------|:--------:|:-------------:|
56
- | **`without_preprocess/`** (recommended) | none | none (full page) | **80.7%** | **0.708** |
57
- | `with_preprocess/` | center crop | none (full page) | 56.1% | 0.506 |
58
 
59
- The `without_preprocess` variant is trained and tested on full pages — no mismatch between training and inference. The `with_preprocess` variant achieves ~99% validation F1 on center-cropped images (matching its training distribution), but drops to 56% when tested on full pages because the model has never seen uncropped input. This train–test mismatch makes it unsuitable for production where raw manuscript images are the input.
60
 
61
  ## Training data
62
 
@@ -64,7 +87,7 @@ The `without_preprocess` variant is trained and tested on full pages — no mism
64
  |-------|------:|-----------:|-----:|------:|
65
  | Uchen | ~3,124 | ~340 | ~290 | ~3,754 |
66
  | Ume | ~5,986 | ~660 | ~561 | ~7,207 |
67
- | **Total** | **9,110** | **1,000** | **851** | **10,961** |
68
 
69
  **Uchen** includes: `uchen_sugthung`, `uchen_sugdring`, `uchen_sugring` (distinguished by descender length).
70
 
@@ -72,6 +95,8 @@ The `without_preprocess` variant is trained and tested on full pages — no mism
72
 
73
  **Excluded:** `difficult`, `multi_scripts`, `non_tibetan`.
74
 
75
  Splits are partitioned at the **work level** — all pages from the same manuscript (`W` prefix in the filename) stay in one split only.
76
 
77
  ## Architecture
@@ -136,6 +161,17 @@ label = "uchen" if probs[0] > probs[1] else "ume"
136
  print(f"{label} ({probs.max():.1%})")
137
  ```
138
 
139
  ### Load the dataset
140
 
141
  ```python
@@ -145,6 +181,20 @@ ds = load_dataset("openpecha/uchen-ume-classification-benchmark")
145
  train = ds["train"] # 9,110 images
146
  val = ds["validation"] # 1,000 images
147
  test = ds["test"] # 851 images
148
  ```
149
 
150
  ## Intended use
@@ -181,4 +231,4 @@ Manuscript image
181
 
182
  ## Acknowledgements
183
 
184
- Developed by **Dharmaduta** for the **[Buddhist Digital Resource Center](https://www.bdrc.io)** (BDRC) Etext Corpus project, with funding from the **Khyentse Foundation**. Annotation guidelines by **Pentsok Rtsang**.
 
1
  ---
2
  language:
3
+ - bo
4
  license: apache-2.0
5
  tags:
6
+ - image-classification
7
+ - tibetan
8
+ - uchen
9
+ - ume
10
+ - script-classification
11
+ - dinov3
12
+ - fine-tuned
13
  library_name: transformers
14
  pipeline_tag: image-classification
15
  base_model: facebook/dinov3-vits16-pretrain-lvd1689m
16
  datasets:
17
+ - openpecha/uchen-ume-classification-benchmark
18
  metrics:
19
+ - f1
20
+ - accuracy
21
  model-index:
22
+ - name: Uchen-Ume Classifier (DINOv3 ViT-S)
23
+   results:
24
+   - task:
25
+       type: image-classification
26
+       name: Tibetan Script Classification (Uchen vs Ume)
27
+     dataset:
28
+       name: openpecha/uchen-ume-classification-benchmark
29
+       type: openpecha/uchen-ume-classification-benchmark
30
+       split: test
31
+     metrics:
32
+     - name: Macro F1 (full page)
33
+       type: f1
34
+       value: 0.708
35
+     - name: Accuracy (full page)
36
+       type: accuracy
37
+       value: 0.807
38
+   - task:
39
+       type: image-classification
40
+       name: Held-out benchmark (60 pages, full page)
41
+     dataset:
42
+       name: openpecha/uchen-ume-classification-benchmark
43
+       type: openpecha/uchen-ume-classification-benchmark
44
+       split: benchmark
45
+     metrics:
46
+     - name: Macro F1 (full page)
47
+       type: f1
48
+       value: 0.848
49
+     - name: Accuracy (full page)
50
+       type: accuracy
51
+       value: 0.850
52
  ---
53
 
54
  # Uchen vs Umê Classifier (DINOv3 ViT-S)
 
61
 
62
  **Use `without_preprocess/final_model.pt`** for production. This model was trained and evaluated on full manuscript pages with no preprocessing — what you get is what you deploy.
63
 
64
+ ## Best results (full pages)
65
 
66
+ Test set = 867 images, work-stratified split, no overlap with training works. Benchmark = 60 held-out pages (30 uchen / 30 ume), disjoint from train/val/test.
67
 
68
+ | Eval | Split | Images | Accuracy | Macro-F1 | AUC |
69
+ |------|-------|-------:|---------:|---------:|----:|
70
+ | **`without_preprocess/`** (recommended) | Test | 867 | **80.7%** | **0.708** | 0.970 |
71
+ | **`without_preprocess/`** (recommended) | Benchmark | 60 | **85.0%** | **0.848** | 0.970 |
72
+ | `with_preprocess/` | Test | 867 | 56.1% | 0.506 | 0.969 |
73
+ | `with_preprocess/` | Benchmark | 60 | 68.3% | 0.648 | 0.953 |
74
 
75
+ ### Variant comparison
76
+
77
+ | Variant | Train/val preprocess | Test & benchmark preprocess | Test acc | Test macro-F1 | Benchmark acc | Benchmark macro-F1 |
78
+ |---------|---------------------|-----------------------------|:--------:|:-------------:|:-------------:|:------------------:|
79
+ | **`without_preprocess/`** | none | none (full page) | **80.7%** | **0.708** | **85.0%** | **0.848** |
80
+ | `with_preprocess/` | center crop | none (full page) | 56.1% | 0.506 | 68.3% | 0.648 |
81
+
82
+ The `without_preprocess` variant is trained and tested on full pages, so there is no mismatch between training and inference. The `with_preprocess` variant achieves ~99% validation F1 on center-cropped images (matching its training distribution), but drops to 56.1% accuracy when tested on full pages because the model has never seen uncropped input. Do **not** report ~99% test scores from runs that center-crop the test images at evaluation time.
83
 
84
  ## Training data
85
 
 
87
  |-------|------:|-----------:|-----:|------:|
88
  | Uchen | ~3,124 | ~340 | ~290 | ~3,754 |
89
  | Ume | ~5,986 | ~660 | ~561 | ~7,207 |
90
+ | **Total pages** | **9,110** | **1,000** | **851** | **10,961** |
91
 
92
  **Uchen** includes: `uchen_sugthung`, `uchen_sugdring`, `uchen_sugring` (distinguished by descender length).
93
 
 
95
 
96
  **Excluded:** `difficult`, `multi_scripts`, `non_tibetan`.
97
 
98
+ Benchmark pages (60) are excluded from train/val/test via the published split manifest.
99
+
100
  Splits are partitioned at the **work level** — all pages from the same manuscript (`W` prefix in the filename) stay in one split only.
101
 
102
  ## Architecture
 
161
  print(f"{label} ({probs.max():.1%})")
162
  ```
163
 
164
+ ### Benchmark inference (full pages)
165
+
166
+ ```bash
167
+ pip install -r https://huggingface.co/datasets/openpecha/uchen-ume-classification-benchmark/raw/main/requirements-inference.txt
168
+ curl -sSLO https://huggingface.co/datasets/openpecha/uchen-ume-classification-benchmark/raw/main/inference_uchen_ume.py && python inference_uchen_ume.py \
169
+ --benchmark-json benchmark/benchmark_holdout.json \
170
+ --fetch-urls \
171
+ --weights without_preprocess/final_model.pt \
172
+ --preprocess none
173
+ ```
174
+
175
  ### Load the dataset
176
 
177
  ```python
 
181
  train = ds["train"] # 9,110 images
182
  val = ds["validation"] # 1,000 images
183
  test = ds["test"] # 851 images
184
+ bench = ds["benchmark"] # 60 images
185
+ ```
186
+
187
+ ## Repo layout
188
+
189
+ ```
190
+ without_preprocess/ ← recommended (full-page test & benchmark)
191
+   final_model.pt
192
+   results.json
193
+   benchmark_eval_results.json
194
+ with_preprocess/ ← center-crop train/val only; test on full pages
195
+   final_model.pt
196
+   results.json
197
+   benchmark_eval_results.json
198
  ```
199
 
200
  ## Intended use
 
231
 
232
  ## Acknowledgements
233
 
234
+ Developed by **Dharmaduta** for the **[Buddhist Digital Resource Center](https://www.bdrc.io)** (BDRC) Etext Corpus project, with funding from the **Khyentse Foundation**. Annotation guidelines by **Pentsok Rtsang**.
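
The work-level partitioning the README describes (all pages of one work stay in a single split) can be sketched as follows. This is an illustration under stated assumptions, not the project's actual split script: the filename pattern, the hash-based assignment, and the 10%/10% validation/test fractions are all hypothetical.

```python
import hashlib
import re
from collections import defaultdict

# Assumption: the work ID is a leading "W" followed by alphanumerics,
# e.g. "W1KG1234_0007.jpg" (hypothetical filename).
WORK_ID = re.compile(r"^(W[0-9A-Za-z]+)")


def work_id(filename: str) -> str:
    """Extract the work ID prefix from a page filename."""
    match = WORK_ID.match(filename)
    if match is None:
        raise ValueError(f"no work prefix in {filename!r}")
    return match.group(1)


def assign_split(work: str, val_frac: float = 0.1, test_frac: float = 0.1) -> str:
    """Map a work ID to a split; hashing keeps the choice stable across runs."""
    digest = hashlib.sha256(work.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    if bucket < test_frac:
        return "test"
    if bucket < test_frac + val_frac:
        return "validation"
    return "train"


def split_by_work(filenames):
    """Group pages into splits so that no work spans two splits."""
    splits = defaultdict(list)
    for name in filenames:
        splits[assign_split(work_id(name))].append(name)
    return dict(splits)
```

Assigning whole works rather than individual pages is what prevents near-duplicate pages of one manuscript from appearing in both training and evaluation data.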