---
language:
- bo
license: apache-2.0
library_name: transformers
tags:
- image-classification
- dinov3
- tibetan
- script-classification
- paleography
- fine-tuned
- document-analysis
base_model: facebook/dinov3-vits16-pretrain-lvd1689m
datasets:
- openpecha/tibetan-script-images
metrics:
- f1
- accuracy
pipeline_tag: image-classification
model-index:
- name: Tibetan Script Classifier (DINOv3 ViT-S)
  results:
  - task:
      type: image-classification
      name: Tibetan Script Classification
    metrics:
    - name: Macro F1 (whole page)
      type: f1
      value: 0.512
    - name: Accuracy (whole page)
      type: accuracy
      value: 0.571
    - name: Macro F1 (CLAHE patches, page-level)
      type: f1
      value: 0.529
---
Tibetan Script Classifier (DINOv3)
This repository contains fine-tuned checkpoints that classify Tibetan manuscript page images into 18 script categories. The work was carried out to build automated paleographic identification tools for historical archives.
Project Information
- Project Name: The BDRC Etext Corpus
- Developed by: Dharmaduta
- Specifications provided by: Buddhist Digital Resource Center (BDRC)
- Funded by: Khyentse Foundation
- Core Model: DINOv3 ViT-S/16 (facebook/dinov3-vits16-pretrain-lvd1689m)
Evaluation Results
| Experiment | Evaluation Level | Macro F1 | Accuracy |
|---|---|---|---|
| whole_page | Image-level | 0.512 | 57.11% |
| patches_clahe | Page-level (Aggregated) | 0.529 | 52.61% |
| patches_color | Page-level (Aggregated) | 0.504 | 50.17% |
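Note: The whole_page model is recommended for general use. It has the highest accuracy (57.11%), its macro F1 (0.512) is within 0.02 of the best patch variant (0.529), and its inference pipeline is simpler because no patch extraction or aggregation is needed.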
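The patch experiments are scored at page level, which means per-patch predictions have to be combined into one label per page before macro F1 is computed. The card does not say which aggregation rule was used, so the sketch below assumes mean-probability pooling; `aggregate_page` and `macro_f1` are illustrative names, not functions from this repository.

```python
from typing import List, Sequence


def aggregate_page(patch_probs: Sequence[Sequence[float]]) -> int:
    """Return the class index with the highest mean probability over all patches.

    Assumption: mean pooling of per-patch probabilities. The actual rule used
    for the reported numbers is not documented in this card.
    """
    if not patch_probs:
        raise ValueError("a page needs at least one patch")
    num_classes = len(patch_probs[0])
    mean = [0.0] * num_classes
    for probs in patch_probs:
        if len(probs) != num_classes:
            raise ValueError("all patches must have the same number of classes")
        for c, p in enumerate(probs):
            mean[c] += p / len(patch_probs)
    return max(range(num_classes), key=mean.__getitem__)


def macro_f1(y_true: List[int], y_pred: List[int], num_classes: int) -> float:
    """Unweighted mean of per-class F1; every class counts equally.

    A class with no true and no predicted samples contributes 0 here.
    """
    scores = []
    for c in range(num_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / num_classes


# Toy page with three patches and three classes (made-up probabilities).
page = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.1, 0.8, 0.1]]
predicted = aggregate_page(page)
```

Because macro F1 weights rare and common classes equally, it can rank experiments differently from accuracy, as it does between whole_page and patches_clahe in the table.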
Label Set (18 Classes)
The model is trained to recognize the following scripts:
dhumri, difficult, drathung, drudring, druring, druthung, khyuyig, multi_scripts, non_tibetan, peri, petsuk, trinyig, tsegdrig, tsugchung, tsumachug, uchen_sugdring, uchen_sugthung, yigchung.
Preprocessing Variants
- whole_page: Short-edge resize to 224px followed by a 224×224 center crop.
- patches_color: Sliding-window 224×224 patches with 25% overlap.
- patches_clahe: Same patch layout as above, but with Contrast Limited Adaptive Histogram Equalization (CLAHE) applied to grayscale inputs to enhance script visibility.
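The geometry of these variants can be sketched in plain Python. This is a minimal illustration, not the repository's preprocessing code: the function names are hypothetical, the stride of 168 px follows from 224 px patches with 25% overlap, and the handling of the last window at the image border is an assumption. CLAHE itself is not covered here.

```python
from typing import List, Tuple

PATCH = 224      # patch and crop size in pixels (from the card)
OVERLAP = 0.25   # fraction shared by neighbouring patches (from the card)

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def center_crop_box(width: int, height: int, size: int = PATCH) -> Tuple[Tuple[int, int], Box]:
    """Short-edge resize to `size`, then a centered size x size crop.

    Returns the resized (width, height) and the crop box in resized coordinates.
    """
    scale = size / min(width, height)
    new_w = max(size, round(width * scale))
    new_h = max(size, round(height * scale))
    left = (new_w - size) // 2
    top = (new_h - size) // 2
    return (new_w, new_h), (left, top, left + size, top + size)


def window_starts(length: int, size: int = PATCH, overlap: float = OVERLAP) -> List[int]:
    """Start offsets of sliding windows along one axis."""
    stride = int(size * (1 - overlap))  # 168 for 224 px and 25% overlap
    if length <= size:
        return [0]  # images smaller than a patch would need padding or resizing
    starts = list(range(0, length - size + 1, stride))
    if starts[-1] != length - size:
        # Assumption: add a final window flush with the border so no pixels are lost.
        starts.append(length - size)
    return starts


def patch_boxes(width: int, height: int) -> List[Box]:
    """All patch boxes in row-major order."""
    return [
        (x, y, x + PATCH, y + PATCH)
        for y in window_starts(height)
        for x in window_starts(width)
    ]
```

For example, `center_crop_box(2000, 400)` resizes a wide page to 1120×224 and keeps only columns 448 to 672, so on elongated pages the whole_page variant sees just the central square, while the patch variants cover the full width.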
Training Recipe
Training used a three-stage progressive unfreezing schedule:
- Stage A (Head Only): 20 epochs, backbone frozen (LR: 1e-3).
- Stage B (Partial): 10 epochs, unfreezing the last 2 Transformer blocks (Backbone LR: 1e-5).
- Stage C (Full): 10 epochs, unfreezing the last 4 Transformer blocks (Backbone LR: 5e-6).
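The schedule above can be written down as data, which makes it easy to see which backbone blocks are trainable in each stage. This is a sketch under stated assumptions: the `Stage` class and `trainable_block_indices` are hypothetical, the backbone depth of 12 is the standard ViT-S depth rather than a value taken from this card, and the head learning rate for Stages B and C is left unset because the card does not state it.

```python
from dataclasses import dataclass
from typing import List, Optional

NUM_BLOCKS = 12  # assumption: standard ViT-S depth


@dataclass(frozen=True)
class Stage:
    name: str
    epochs: int
    unfrozen_blocks: int           # number of trailing Transformer blocks that train
    backbone_lr: Optional[float]   # None while the backbone is frozen
    head_lr: Optional[float]       # None where the card gives no value


STAGES = [
    Stage("A", epochs=20, unfrozen_blocks=0, backbone_lr=None, head_lr=1e-3),
    Stage("B", epochs=10, unfrozen_blocks=2, backbone_lr=1e-5, head_lr=None),
    Stage("C", epochs=10, unfrozen_blocks=4, backbone_lr=5e-6, head_lr=None),
]


def trainable_block_indices(stage: Stage, num_blocks: int = NUM_BLOCKS) -> List[int]:
    """Indices of the backbone blocks that receive gradients in this stage."""
    return list(range(num_blocks - stage.unfrozen_blocks, num_blocks))


total_epochs = sum(stage.epochs for stage in STAGES)
```

In a PyTorch training loop, the returned indices would typically be used to set `requires_grad` on the parameters of the matching blocks, with a separate optimizer parameter group for the backbone learning rate.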
Class-weighted cross-entropy loss was used to mitigate the strong class imbalance across script types.
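One common way to derive such class weights is inverse-frequency ("balanced") weighting, sketched below. The card does not specify the weighting formula that was actually used, so both the formula and the function name `balanced_class_weights` are assumptions for illustration.

```python
from collections import Counter
from typing import List, Sequence


def balanced_class_weights(labels: Sequence[int], num_classes: int) -> List[float]:
    """Inverse-frequency weights: w_c = N / (K * n_c).

    N is the number of samples, K the number of classes, n_c the count of
    class c. Classes with no samples get weight 0.0.
    """
    counts = Counter(labels)
    total = len(labels)
    weights = []
    for c in range(num_classes):
        n = counts.get(c, 0)
        weights.append(total / (num_classes * n) if n else 0.0)
    return weights


# Toy example: class 0 is four times as frequent as class 1.
toy_weights = balanced_class_weights([0] * 8 + [1] * 2, num_classes=2)
```

With PyTorch, a list like this can be passed as `torch.nn.CrossEntropyLoss(weight=torch.tensor(weights))`, so that errors on rare scripts cost more than errors on common ones.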
How to Use
Loading the Model
import torch
from finetune_dinov3 import DINOv3Classifier
# Load Stage B Whole Page Checkpoint
payload = torch.load("whole_page/final_model.pt", map_location="cpu")
model = DINOv3Classifier("facebook/dinov3-vits16-pretrain-lvd1689m", num_classes=18)
model.load_state_dict(payload["model_state_dict"])
model.eval()
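Once the model is loaded, its output still has to be mapped to script names. The sketch below turns a vector of 18 logits into ranked labels. It assumes that class indices follow the alphabetical order of the label list in this card and that the model's forward pass returns raw logits; neither is confirmed here, so check the label mapping shipped with the checkpoint before relying on it. `top_k` is an illustrative helper, not part of `finetune_dinov3`.

```python
import math
from typing import List, Sequence, Tuple

# Assumption: index order matches the alphabetical label list from this card.
LABELS = [
    "dhumri", "difficult", "drathung", "drudring", "druring", "druthung",
    "khyuyig", "multi_scripts", "non_tibetan", "peri", "petsuk", "trinyig",
    "tsegdrig", "tsugchung", "tsumachug", "uchen_sugdring", "uchen_sugthung",
    "yigchung",
]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a flat list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits: Sequence[float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k most probable (label, probability) pairs, best first."""
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    probs = softmax(logits)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(LABELS[i], probs[i]) for i in order]
```

If the model returns a tensor of shape `(1, 18)` for a single preprocessed image, a call such as `top_k(logits[0].tolist())` would give the three most likely scripts.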