Tibetan Script Classifier (DINOv3)

This repository contains fine-tuned checkpoints for identifying 18 distinct categories of Tibetan manuscript scripts. The work aims to provide automated paleographic identification tools for historical archives.

Project Information

Project Name: The BDRC Etext Corpus
Developed by: Dharmaduta
Specifications provided by: Buddhist Digital Resource Center (BDRC)
Funded by: Khyentse Foundation
Core Model: DINOv3 ViT-S/16 (facebook/dinov3-vits16-pretrain-lvd1689m)

Evaluation Results

Experiment	Evaluation Level	Macro F1	Accuracy
whole_page	Image-level	0.512	57.11%
patches_clahe	Page-level (Aggregated)	0.529	52.61%
patches_color	Page-level (Aggregated)	0.504	50.17%

Note: The whole_page model is recommended for general use: it has the highest accuracy (57.11%), a Macro F1 within 0.02 of the best patch-based variant (0.512 vs. 0.529), and a simpler inference pipeline.
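Macro F1 and accuracy can rank models differently, as in the table above: macro F1 averages the per-class F1 scores with equal weight, so rare scripts count as much as common ones, while accuracy is dominated by frequent classes. The sketch below illustrates the difference in plain Python. The toy labels are made up for illustration and are not drawn from this dataset, and the exact evaluation script used for the reported numbers is not shown here.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over classes seen in y_true or y_pred."""
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    f1_scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        f1_scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)

# Toy, imbalanced example: 8 pages of a common script, 2 of a rare one.
y_true = ["uchen_sugthung"] * 8 + ["khyuyig"] * 2
# A degenerate model that always predicts the common class.
y_pred = ["uchen_sugthung"] * 10

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

In this toy case the always-majority model keeps a high accuracy while its macro F1 drops sharply, because the rare class contributes an F1 of zero.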

Label Set (18 Classes)

The model is trained to recognize the following scripts: dhumri, difficult, drathung, drudring, druring, druthung, khyuyig, multi_scripts, non_tibetan, peri, petsuk, trinyig, tsegdrig, tsugchung, tsumachug, uchen_sugdring, uchen_sugthung, yigchung.
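For decoding predictions, the label set can be written as a pair of lookup tables. This is a minimal sketch: the label names come from the list above, but the index order is an assumption (alphabetical, which matches the order in which the labels are listed). The mapping stored with the checkpoint, if any, should take precedence.

```python
# Script labels exactly as listed in this model card.
LABELS = [
    "dhumri", "difficult", "drathung", "drudring", "druring", "druthung",
    "khyuyig", "multi_scripts", "non_tibetan", "peri", "petsuk", "trinyig",
    "tsegdrig", "tsugchung", "tsumachug", "uchen_sugdring", "uchen_sugthung",
    "yigchung",
]

# ASSUMPTION: class index i corresponds to the i-th label in the list above.
ID2LABEL = dict(enumerate(LABELS))
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}

print(len(LABELS), "classes")
print("index of 'peri':", LABEL2ID["peri"])
```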

Preprocessing Variants

whole_page: Short-edge resize to 224px followed by a 224×224 center crop.
patches_color: Sliding-window 224×224 patches with 25% overlap.
patches_clahe: Same patch layout as above, but with Contrast Limited Adaptive Histogram Equalization (CLAHE) applied to grayscale inputs to enhance script visibility.
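The geometry of the first two variants can be sketched in plain Python. The function names are hypothetical, and the handling of page edges (adding a final window flush with the border) is an assumption, since the original patching code is not shown here. With 224 px patches and 25% overlap, the stride works out to 168 px.

```python
def whole_page_geometry(width, height, size=224):
    """Return (resized_w, resized_h, crop_box) for short-edge resize + center crop."""
    scale = size / min(width, height)
    new_w = max(size, round(width * scale))
    new_h = max(size, round(height * scale))
    left = (new_w - size) // 2
    top = (new_h - size) // 2
    return new_w, new_h, (left, top, left + size, top + size)

def patch_boxes(width, height, patch=224, overlap=0.25):
    """Return (left, top, right, bottom) boxes for a sliding window with overlap."""
    stride = int(patch * (1 - overlap))  # 168 px for 224 px patches at 25% overlap

    def starts(length):
        # Pages smaller than one patch yield a single window at 0;
        # padding or resizing such pages is left to the caller.
        if length <= patch:
            return [0]
        positions = list(range(0, length - patch + 1, stride))
        # ASSUMPTION: add a last window flush with the edge so no pixels are dropped.
        if positions[-1] != length - patch:
            positions.append(length - patch)
        return positions

    return [(x, y, x + patch, y + patch)
            for y in starts(height) for x in starts(width)]

print(whole_page_geometry(1000, 500))
print(len(patch_boxes(600, 224)), "patches for a 600x224 page")
```

For the patches_clahe variant, OpenCV exposes CLAHE through cv2.createCLAHE(clipLimit=..., tileGridSize=...), whose apply() method takes a grayscale array. The clip limit and tile grid size used for these checkpoints are not stated here.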

Training Recipe

Training used a three-stage progressive unfreezing strategy:

Stage A (Head Only): 20 epochs, backbone frozen (LR: 1e-3).
Stage B (Partial): 10 epochs, unfreezing the last 2 Transformer blocks (Backbone LR: 1e-5).
Stage C (Full): 10 epochs, unfreezing the last 4 Transformer blocks (Backbone LR: 5e-6).
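The three stages above can be sketched in PyTorch as follows. This is an illustration under stated assumptions, not the training script itself: ToyViT is a hypothetical stand-in for the real backbone, the choice of AdamW is an assumption, and the head learning rate in Stages B and C is assumed to stay at 1e-3 because only the backbone rates are listed for those stages.

```python
import torch
from torch import nn

class ToyViT(nn.Module):
    """Hypothetical stand-in for a 12-block ViT backbone plus a linear head."""
    def __init__(self, dim=16, depth=12, num_classes=18):
        super().__init__()
        self.blocks = nn.ModuleList([nn.Linear(dim, dim) for _ in range(depth)])
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        for block in self.blocks:
            x = torch.relu(block(x))
        return self.head(x)

def configure_stage(model, n_unfrozen_blocks, head_lr, backbone_lr=None):
    """Freeze the backbone, unfreeze its last n blocks, build a two-group optimizer."""
    for p in model.blocks.parameters():
        p.requires_grad = False
    # Guard n == 0 explicitly: blocks[-0:] would select every block.
    unfrozen = list(model.blocks[-n_unfrozen_blocks:]) if n_unfrozen_blocks > 0 else []
    backbone_params = [p for block in unfrozen for p in block.parameters()]
    for p in backbone_params:
        p.requires_grad = True
    groups = [{"params": list(model.head.parameters()), "lr": head_lr}]
    if backbone_params:
        groups.append({"params": backbone_params, "lr": backbone_lr})
    return torch.optim.AdamW(groups)  # ASSUMPTION: optimizer choice

# (stage name, epochs, unfrozen blocks, backbone LR) as listed in the recipe.
STAGES = [("A", 20, 0, None), ("B", 10, 2, 1e-5), ("C", 10, 4, 5e-6)]

model = ToyViT()
for name, epochs, n_blocks, backbone_lr in STAGES:
    # ASSUMPTION: head LR stays at 1e-3 in every stage.
    optimizer = configure_stage(model, n_blocks, head_lr=1e-3, backbone_lr=backbone_lr)
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"Stage {name}: {epochs} epochs, {n_blocks} blocks unfrozen, "
          f"{trainable} trainable parameters")
    # ... run `epochs` epochs of training with `optimizer` here ...
```

Rebuilding the optimizer at each stage keeps the parameter groups in sync with what is trainable and lets the backbone use a much smaller learning rate than the head.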

Class-weighted cross-entropy loss was used to mitigate the strong class imbalance across script types.
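To make the idea concrete, the sketch below computes class weights and a weighted cross-entropy in plain Python. The weighting formula (N / (K * n_c), often called "balanced" weighting) is an assumption, since the scheme actually used for these checkpoints is not stated, and the label counts are toy values.

```python
import math
from collections import Counter

def inverse_frequency_weights(labels, num_classes):
    """Weight class c by N / (K * n_c), so rarer classes get larger weights."""
    counts = Counter(labels)
    total = len(labels)
    # Classes absent from `labels` get weight 0.0 to avoid division by zero.
    return [total / (num_classes * counts[c]) if counts[c] else 0.0
            for c in range(num_classes)]

def weighted_cross_entropy(logits, targets, weights):
    """Weighted mean of per-sample negative log-likelihoods.

    Each sample's loss is scaled by the weight of its true class, and the sum
    is normalized by the total weight of the samples in the batch.
    """
    num, den = 0.0, 0.0
    for row, y in zip(logits, targets):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        num += weights[y] * (log_z - row[y])
        den += weights[y]
    return num / den

# Toy training labels for 3 classes with a 6:3:1 imbalance.
train_labels = [0] * 6 + [1] * 3 + [2] * 1
weights = inverse_frequency_weights(train_labels, num_classes=3)
print("class weights:", [round(w, 3) for w in weights])

logits = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3], [0.1, 0.2, 0.4]]
targets = [0, 1, 2]
print("weighted loss:", round(weighted_cross_entropy(logits, targets, weights), 4))
```

In PyTorch, the same weights can be passed as a tensor to torch.nn.CrossEntropyLoss through its weight argument; with the default mean reduction it normalizes by the sum of the per-sample weights, as the function above does.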

How to Use

Loading the Model

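import torch
# finetune_dinov3 must be importable; it is not part of torch.
from finetune_dinov3 import DINOv3Classifier

# Load Stage B Whole Page Checkpoint
# map_location="cpu" loads the tensors onto the CPU regardless of where they were saved.
payload = torch.load("whole_page/final_model.pt", map_location="cpu")
# Rebuild the classifier on the same backbone with 18 output classes.
model = DINOv3Classifier("facebook/dinov3-vits16-pretrain-lvd1689m", num_classes=18)
# Restore the fine-tuned weights stored under the "model_state_dict" key.
model.load_state_dict(payload["model_state_dict"])
# Switch to inference mode (disables dropout and similar training-only behavior).
model.eval()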
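Once a model is loaded, a page-level prediction could look like the sketch below. It rests on several assumptions: that the classifier's forward pass returns a (N, num_classes) logits tensor, that inputs are already preprocessed 224×224 tensors, and that patch predictions are aggregated by averaging softmax probabilities (the aggregation rule behind the page-level scores is not stated here). DummyClassifier and the placeholder label names are hypothetical stand-ins so the example runs without the checkpoint.

```python
import torch
from torch import nn

@torch.no_grad()
def predict_page(model, batch, labels):
    """Classify one page from a (N, 3, 224, 224) batch of crops or patches."""
    model.eval()
    logits = model(batch)                  # ASSUMPTION: returns (N, num_classes) logits
    probs = torch.softmax(logits, dim=-1)  # per-crop class probabilities
    page_probs = probs.mean(dim=0)         # ASSUMPTION: average over patches
    idx = int(page_probs.argmax())
    return labels[idx], float(page_probs[idx])

class DummyClassifier(nn.Module):
    """Hypothetical stand-in for the loaded classifier."""
    def __init__(self, num_classes=18):
        super().__init__()
        self.fc = nn.Linear(3, num_classes)

    def forward(self, x):
        # Global average pool over height and width, then a linear layer.
        return self.fc(x.mean(dim=(2, 3)))

labels = [f"class_{i}" for i in range(18)]  # placeholder names, not the real label order
batch = torch.rand(5, 3, 224, 224)          # e.g. 5 patches from one page
label, confidence = predict_page(DummyClassifier(), batch, labels)
print(label, round(confidence, 3))
```

For the whole_page variant the same function applies with a batch of size 1, in which case the averaging step is a no-op.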

