---
language:
- bo
license: apache-2.0
library_name: transformers
tags:
- image-classification
- dinov3
- tibetan
- script-classification
- paleography
- fine-tuned
- document-analysis
base_model: facebook/dinov3-vits16-pretrain-lvd1689m
datasets:
- openpecha/tibetan-script-images
metrics:
- f1
- accuracy
pipeline_tag: image-classification
model-index:
- name: Tibetan Script Classifier (DINOv3 ViT-S)
  results:
  - task:
      type: image-classification
      name: Tibetan Script Classification
    metrics:
    - name: Macro F1 (whole page)
      type: f1
      value: 0.512
    - name: Accuracy (whole page)
      type: accuracy
      value: 0.571
    - name: Macro F1 (CLAHE patches, page-level)
      type: f1
      value: 0.529
---

# Tibetan Script Classifier (DINOv3)

This repository contains fine-tuned DINOv3 checkpoints for classifying Tibetan manuscript pages into 18 script categories. The models were developed to support automated paleographic identification in historical archives.

## Project Information
- **Project Name:** The BDRC Etext Corpus
- **Developed by:** Dharmaduta
- **Specifications provided by:** [Buddhist Digital Resource Center (BDRC)](https://www.bdrc.io)
- **Funded by:** Khyentse Foundation
- **Core Model:** DINOv3 ViT-S/16 (`facebook/dinov3-vits16-pretrain-lvd1689m`)

## Evaluation Results

| Experiment | Evaluation Level | Macro F1 | Accuracy |
| :--- | :--- | :---: | :---: |
| **whole_page** | Image-level | 0.512 | 57.11% |
| **patches_clahe** | Page-level (Aggregated) | **0.529** | 52.61% |
| **patches_color** | Page-level (Aggregated) | 0.504 | 50.17% |

*Note: The **whole_page** model is recommended for general use due to its balanced performance and simpler inference pipeline.*
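
For the patch-based experiments, per-patch predictions are aggregated into one page-level prediction. The exact aggregation rule is not documented in this card; the sketch below assumes one common choice, averaging softmax scores across all patches of a page.

```python
import torch

def page_prediction(patch_logits: torch.Tensor) -> int:
    """Aggregate patch logits of shape [num_patches, num_classes]
    into a single page-level class index.

    Mean-softmax aggregation is an assumption, not the documented rule.
    """
    page_scores = patch_logits.softmax(dim=-1).mean(dim=0)
    return int(page_scores.argmax())
```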

## Label Set (18 Classes)
The model is trained to recognize the following scripts:
`dhumri`, `difficult`, `drathung`, `drudring`, `druring`, `druthung`, `khyuyig`, `multi_scripts`, `non_tibetan`, `peri`, `petsuk`, `trinyig`, `tsegdrig`, `tsugchung`, `tsumachug`, `uchen_sugdring`, `uchen_sugthung`, `yigchung`.

## Preprocessing Variants
- **whole_page**: Short-edge resize to 224 px followed by a 224×224 center crop.
- **patches_color**: Sliding-window 224×224 patches with 25% overlap.
- **patches_clahe**: Same patch layout as above, but with **Contrast Limited Adaptive Histogram Equalization (CLAHE)** applied to grayscale inputs to enhance script visibility (see the sketch below).
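
A rough illustration of the patch variants, assuming OpenCV: 224×224 windows tiled with 25% overlap (stride 168 px), with CLAHE optionally applied first. The CLAHE parameters shown (`clipLimit=2.0`, `tileGridSize=(8, 8)`) are illustrative defaults, not the exact training values.

```python
import cv2

def extract_patches(page_bgr, size=224, overlap=0.25, use_clahe=False):
    """Tile a page image into overlapping square patches."""
    img = page_bgr
    if use_clahe:
        # patches_clahe: equalize the grayscale page before tiling
        gray = cv2.cvtColor(page_bgr, cv2.COLOR_BGR2GRAY)
        gray = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8)).apply(gray)
        img = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)
    stride = int(size * (1 - overlap))  # 168 px for 25% overlap
    h, w = img.shape[:2]
    patches = []
    for y in range(0, max(h - size, 0) + 1, stride):
        for x in range(0, max(w - size, 0) + 1, stride):
            patches.append(img[y:y + size, x:x + size])
    return patches
```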

## Training Recipe
Training was executed as a 3-stage progressive unfreezing schedule:
1. **Stage A (Head Only):** 20 epochs, backbone frozen (head LR: 1e-3).
2. **Stage B (Partial):** 10 epochs, unfreezing the last 2 Transformer blocks (backbone LR: 1e-5).
3. **Stage C (Full):** 10 epochs, unfreezing the last 4 Transformer blocks (backbone LR: 5e-6).

Class-weighted cross-entropy loss was used to mitigate the pronounced class imbalance across script types.
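
A minimal sketch of the staged schedule and the weighted loss. The module path `model.backbone.encoder.layer` and the inverse-frequency weighting are assumptions (adapt them to the actual `DINOv3Classifier` layout and training script); `train_one_stage` stands in for a hypothetical training loop.

```python
import torch
import torch.nn as nn

def unfreeze_last_blocks(model: nn.Module, n: int) -> None:
    """Freeze the backbone, then re-enable its last n Transformer blocks.
    The attribute path used here is an assumption about DINOv3Classifier."""
    for p in model.backbone.parameters():
        p.requires_grad = False
    if n > 0:
        for block in model.backbone.encoder.layer[-n:]:
            for p in block.parameters():
                p.requires_grad = True

def make_criterion(class_counts: torch.Tensor) -> nn.CrossEntropyLoss:
    """Class-weighted cross-entropy with inverse-frequency weights
    (one common weighting scheme, assumed here)."""
    weights = class_counts.sum() / (len(class_counts) * class_counts)
    return nn.CrossEntropyLoss(weight=weights)

def run_recipe(model, train_one_stage, class_counts):
    """Stages A-C from the recipe above; train_one_stage is hypothetical.
    Note: in Stage A the LR applies to the head, since the backbone is frozen."""
    criterion = make_criterion(class_counts)
    for n_blocks, epochs, lr in [(0, 20, 1e-3), (2, 10, 1e-5), (4, 10, 5e-6)]:
        unfreeze_last_blocks(model, n_blocks)
        train_one_stage(model, criterion, epochs=epochs, lr=lr)
```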

## How to Use

### Loading the Model
```python
import torch
from finetune_dinov3 import DINOv3Classifier

# Load the final whole_page checkpoint
payload = torch.load("whole_page/final_model.pt", map_location="cpu")
model = DINOv3Classifier("facebook/dinov3-vits16-pretrain-lvd1689m", num_classes=18)
model.load_state_dict(payload["model_state_dict"])
model.eval()
```
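
### Running Inference
A minimal inference sketch for the whole_page model, continuing from the loading snippet above. Preprocessing mirrors the recipe described earlier (short-edge resize to 224, center crop); the ImageNet normalization statistics and the alphabetical label order are assumptions to verify against the training configuration.

```python
from PIL import Image
from torchvision import transforms

# Assumed to match the training label order (alphabetical, as listed above).
LABELS = [
    "dhumri", "difficult", "drathung", "drudring", "druring", "druthung",
    "khyuyig", "multi_scripts", "non_tibetan", "peri", "petsuk", "trinyig",
    "tsegdrig", "tsugchung", "tsumachug", "uchen_sugdring",
    "uchen_sugthung", "yigchung",
]

# whole_page preprocessing: short-edge resize to 224, then a 224x224
# center crop. ImageNet mean/std are an assumption, not a documented value.
preprocess = transforms.Compose([
    transforms.Resize(224),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

image = Image.open("manuscript_page.jpg").convert("RGB")
batch = preprocess(image).unsqueeze(0)  # shape: [1, 3, 224, 224]

with torch.no_grad():
    logits = model(batch)  # assumes the classifier returns raw logits
probs = logits.softmax(dim=-1).squeeze(0)
print(LABELS[int(probs.argmax())], f"{float(probs.max()):.3f}")
```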