---
license: apache-2.0
tags:
- image-classification
- tibetan
- uchen
- ume
library_name: transformers
pipeline_tag: image-classification
---

# Uchen vs Umê classifier (DINOv3 ViT-S)

Binary Tibetan script classifier: **uchen** (printed) vs **ume** (cursive).

**Dataset (splits, Parquet, inference):** [openpecha/uchen-ume-classification-benchmark](https://huggingface.co/datasets/openpecha/uchen-ume-classification-benchmark)

## Training preprocess (from `config.yaml` + `train.py`)

`train.py` builds **three** dataloaders, each with its own **per-split** preprocess (`preprocess_for_split` in `common.py`):

| Split | `with_preprocess` config | Effect in `ScriptImageDataset.__getitem__` |
|-------|--------------------------|------------------------------------------|
| **train** | `train_preprocess: center_crop_whole_page` | Center crop before augment + DINO processor |
| **val** | `val_preprocess: center_crop_whole_page` | Center crop before DINO processor |
| **test** | `test_preprocess: none` | **Full page**: no crop, only DINO processor |

So the high **validation** scores for `with_preprocess` (val F1 ~0.99) are measured on **cropped** pages, while **test** during training uses **full pages** (test F1 ~0.51). This mismatch is intentional in the code, not a bug.

**Benchmark eval must use `test_preprocess: none`** (same as the test split) unless you are deliberately measuring crop-to-crop generalization.

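
The per-split behaviour in the table can be sketched as a small dispatch on the split name. This is only an illustration under stated assumptions, not the code in `common.py`: the helper names `center_crop_box` and `crop_box_for_split` are hypothetical, and the crop fraction of 0.5 is a placeholder, since the actual crop size is not given here.

```python
from typing import Dict, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels

# Mirrors the `with_preprocess` column of the table above.
SPLIT_PREPROCESS: Dict[str, str] = {
    "train": "center_crop_whole_page",
    "val": "center_crop_whole_page",
    "test": "none",
}


def center_crop_box(width: int, height: int, fraction: float = 0.5) -> Box:
    """Box of a centered crop keeping `fraction` of each side.

    The 0.5 default is an assumption for illustration, not the repo's value.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    crop_w = max(1, round(width * fraction))
    crop_h = max(1, round(height * fraction))
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return left, top, left + crop_w, top + crop_h


def crop_box_for_split(split: str, width: int, height: int, fraction: float = 0.5) -> Box:
    """Return the region of the page the model would see for a given split."""
    mode = SPLIT_PREPROCESS[split]
    if mode == "none":
        return 0, 0, width, height  # full page, no crop
    if mode == "center_crop_whole_page":
        return center_crop_box(width, height, fraction)
    raise ValueError(f"unknown preprocess mode: {mode}")


full_page = crop_box_for_split("test", 1000, 400)
val_crop = crop_box_for_split("val", 1000, 400)
```

A box in this form can be passed directly to `PIL.Image.Image.crop`. The point of the sketch is that the same page yields different model inputs depending on the split, which is why val and test scores for `with_preprocess` are not comparable.
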
## Recommended weights for full manuscript pages

**`without_preprocess/final_model.pt`**: trained without runtime crop on any split.

## Results summary

**Benchmark** = 60 held-out images (30 uchen + 30 ume). **Test** = 867 images (work-stratified), full pages.

| Variant | Train/val preprocess | Test & benchmark eval preprocess | Test acc | Test macro-F1 | Benchmark acc | Benchmark macro-F1 | Benchmark AUC |
|---------|---------------------|----------------------------------|----------|---------------|---------------|-------------------|---------------|
| **`without_preprocess/`** | none | **none** (full page) | **80.7%** | **0.708** | **85.0%** | **0.848** | 0.970 |
| **`with_preprocess/`** | center crop | **none** (full page) | 56.1% | 0.506 | **68.3%** | **0.648** | 0.953 |
| ~~with_preprocess~~ | center crop | ~~center crop at inference~~ *(not comparable to test)* | n/a | n/a | ~~98.3%~~ | ~~0.983~~ | n/a |

The ~~98.3%~~ benchmark number only appears if you **center-crop at inference**, which matches **val** but **not** how the model was evaluated on **test** during training.

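
For readers comparing the accuracy and macro-F1 columns, the sketch below shows how both are derived from a binary confusion matrix. Macro-F1 is the unweighted mean of the per-class F1 scores, so it can sit well below accuracy when one class is handled much worse than the other. The helper names are hypothetical and the counts are illustrative only; they are not the confusion matrix of either checkpoint.

```python
from typing import Dict, Sequence, Tuple


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 if the class never occurs."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def binary_metrics(
    confusion: Dict[Tuple[str, str], int],
    classes: Sequence[str] = ("uchen", "ume"),
) -> Dict[str, float]:
    """`confusion[(true_label, predicted_label)]` holds image counts."""
    total = sum(confusion.values())
    if total == 0:
        raise ValueError("confusion matrix is empty")
    correct = sum(confusion.get((c, c), 0) for c in classes)
    f1_scores = []
    for c in classes:
        tp = confusion.get((c, c), 0)
        fp = sum(n for (t, p), n in confusion.items() if p == c and t != c)
        fn = sum(n for (t, p), n in confusion.items() if t == c and p != c)
        f1_scores.append(f1_from_counts(tp, fp, fn))
    return {"accuracy": correct / total, "macro_f1": sum(f1_scores) / len(f1_scores)}


# Illustrative counts only: NOT the measured results of this model.
toy_confusion = {
    ("uchen", "uchen"): 28,
    ("uchen", "ume"): 2,
    ("ume", "ume"): 25,
    ("ume", "uchen"): 5,
}
metrics = binary_metrics(toy_confusion)
```

On a balanced set such as the 30 + 30 benchmark, the two metrics tend to stay close, as they do in the table. The larger gap on the test split (80.7% accuracy vs 0.708 macro-F1) is consistent with uneven class sizes or uneven per-class performance there, although the table alone does not say which.
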
## Benchmark evaluation (60 images)

### Fair eval: full pages (`preprocess none`, matches `test_preprocess`)

**`without_preprocess` (recommended):**

```bash
python inference_uchen_ume.py \
  --benchmark-dir benchmark \
  --weights without_preprocess/final_model.pt \
  --preprocess none
```

**`with_preprocess` (same protocol as training test split):**

```bash
python inference_uchen_ume.py \
  --benchmark-dir benchmark \
  --weights with_preprocess/final_model.pt \
  --preprocess none
```

From this repo:

```bash
python experiments/uchen_ume_binary/eval_benchmark.py \
  --checkpoint without_preprocess/final_model.pt --benchmark-dir benchmark/benchmark
python experiments/uchen_ume_binary/eval_benchmark.py \
  --checkpoint with_preprocess/final_model.pt --benchmark-dir benchmark/benchmark
# default test-preprocess is none; do NOT pass center_crop for a fair comparison
```

## Parquet dataset

[openpecha/uchen-ume-classification-benchmark](https://huggingface.co/datasets/openpecha/uchen-ume-classification-benchmark)

```python
from datasets import load_dataset

bench = load_dataset("openpecha/uchen-ume-classification-benchmark", split="benchmark")
```

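
After loading, a quick class-balance check can confirm the 30 uchen + 30 ume composition of the benchmark. The sketch below assumes each row carries its class in a column named `label`; that column name is an assumption, so inspect `bench.features` for the actual schema first. `count_labels` is a hypothetical helper, and the toy rows stand in for the dataset so the example runs offline.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Mapping


def count_labels(rows: Iterable[Mapping[str, Any]], label_key: str = "label") -> Dict[Any, int]:
    """Count rows per class. `label_key` is an assumed column name."""
    return dict(Counter(row[label_key] for row in rows))


# Stand-in rows; with the real dataset you would pass `bench` instead.
toy_rows = [{"label": "uchen"}, {"label": "ume"}, {"label": "uchen"}]
counts = count_labels(toy_rows)
```

With the real data, `count_labels(bench)` would work the same way, since iterating a `datasets.Dataset` yields one dict per row.
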
## Load weights

```python
from huggingface_hub import hf_hub_download
import torch

path = hf_hub_download("openpecha/uchen-ume-classifier", "without_preprocess/final_model.pt", repo_type="model")
# weights_only=False unpickles arbitrary Python objects, so only load checkpoints you trust.
ckpt = torch.load(path, map_location="cpu", weights_only=False)
```

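
The layout of `final_model.pt` is not documented here, so it helps to inspect the loaded object before wiring it into a model. The helper below is a hypothetical sketch that makes no assumption about specific key names. It reports the type of each top-level entry, plus a shape for tensor-like values or a length for containers.

```python
from typing import Any, Dict


def summarize_checkpoint(ckpt: Any, max_items: int = 20) -> Dict[str, str]:
    """Describe the top-level structure of a loaded checkpoint object."""
    if not isinstance(ckpt, dict):
        # e.g. a pickled module or a bare list of tensors
        return {"<root>": type(ckpt).__name__}
    summary: Dict[str, str] = {}
    for key, value in list(ckpt.items())[:max_items]:
        desc = type(value).__name__
        shape = getattr(value, "shape", None)
        if shape is not None:
            desc += f" shape={tuple(shape)}"  # tensors and arrays
        elif isinstance(value, (dict, list, tuple)):
            desc += f" len={len(value)}"  # nested containers such as a state dict
        summary[str(key)] = desc
    return summary


# Stand-in object so the sketch runs without downloading the weights.
example = summarize_checkpoint({"weights": {"w": 1, "b": 2}, "epoch": 3})
```

Calling `summarize_checkpoint(ckpt)` on the object loaded above should show whether the file is a plain state dict or a wrapper holding the weights alongside other training state.
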