---
language:
- bo
license: apache-2.0
library_name: transformers
tags:
- image-classification
- dinov3
- tibetan
- script-classification
- paleography
- fine-tuned
- document-analysis
base_model: facebook/dinov3-vits16-pretrain-lvd1689m
datasets:
- openpecha/tibetan-script-images
metrics:
- f1
- accuracy
pipeline_tag: image-classification
model-index:
- name: Tibetan Script Classifier (DINOv3 ViT-S)
  results:
  - task:
      type: image-classification
      name: Tibetan Script Classification
    metrics:
    - name: Macro F1 (whole page)
      type: f1
      value: 0.512
    - name: Accuracy (whole page)
      type: accuracy
      value: 0.571
    - name: Macro F1 (CLAHE patches, page-level)
      type: f1
      value: 0.529
---

# Tibetan Script Classifier (DINOv3)

This repository contains fine-tuned DINOv3 checkpoints that classify images of Tibetan manuscripts into 18 script categories. The work aims to provide automated paleographic identification tools for historical archives.

## Project Information
- **Project Name:** The BDRC Etext Corpus
- **Developed by:** Dharmaduta
- **Specifications provided by:** [Buddhist Digital Resource Center (BDRC)](https://www.bdrc.io)
- **Funded by:** Khyentse Foundation
- **Core Model:** DINOv3 ViT-S/16 (`facebook/dinov3-vits16-pretrain-lvd1689m`)

## Evaluation Results

| Experiment | Evaluation Level | Macro F1 | Accuracy |
| :--- | :--- | :---: | :---: |
| **whole_page** | Image-level | 0.512 | 57.11% |
| **patches_clahe** | Page-level (Aggregated) | **0.529** | 52.61% |
| **patches_color** | Page-level (Aggregated) | 0.504 | 50.17% |

*Note: The **whole_page** model is recommended for general use: it has the highest accuracy, a macro F1 within 0.02 of the best variant, and the simplest inference pipeline (no patch extraction or aggregation).*
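
The two metrics in the table can rank models differently: **whole_page** has the highest accuracy, while **patches_clahe** has the highest macro F1. Macro F1 averages the per-class F1 scores with equal weight, so rare scripts count as much as common ones. The sketch below illustrates the effect on invented toy labels (not project data). The function names are illustrative, and a class with no true or predicted samples is assumed to score 0.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # Assumption: an undefined F1 (class never seen or predicted) counts as 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example: a classifier that always predicts the majority class.
toy_true = ["uchen_sugdring"] * 8 + ["peri"] * 2
toy_pred = ["uchen_sugdring"] * 10
toy_labels = ["uchen_sugdring", "peri"]

print(accuracy(toy_true, toy_pred))                        # 0.8
print(round(macro_f1(toy_true, toy_pred, toy_labels), 3))  # 0.444
```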

## Label Set (18 Classes)
The model predicts one of the following 18 labels:
`dhumri`, `difficult`, `drathung`, `drudring`, `druring`, `druthung`, `khyuyig`, `multi_scripts`, `non_tibetan`, `peri`, `petsuk`, `trinyig`, `tsegdrig`, `tsugchung`, `tsumachug`, `uchen_sugdring`, `uchen_sugthung`, `yigchung`.

## Preprocessing Variants
- **whole_page**: Short-edge resize to 224px followed by a 224×224 center crop.
- **patches_color**: Sliding-window 224×224 patches with 25% overlap (a stride of 168 px).
- **patches_clahe**: Same patch layout as **patches_color**, but with **Contrast Limited Adaptive Histogram Equalization (CLAHE)** applied to grayscale inputs to enhance script visibility.
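
As a rough illustration of the patch variants, the sketch below computes 224×224 window boxes with 25% overlap and averages per-patch probabilities into a single page-level prediction. The border handling (an extra window placed flush with the edge) and the mean-probability aggregation are assumptions, since this card does not state how either is done. The function names are hypothetical.

```python
def window_starts(length, patch=224, overlap=0.25):
    """Start offsets of sliding windows along one image axis."""
    stride = int(patch * (1 - overlap))  # 168 px for 224 px patches
    if length <= patch:
        # Assumption: images smaller than one patch need padding or resizing.
        return [0]
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] != length - patch:
        # Assumption: add one last window flush with the border.
        starts.append(length - patch)
    return starts


def patch_boxes(width, height, patch=224, overlap=0.25):
    """Boxes as (left, upper, right, lower), row by row."""
    return [
        (x, y, x + patch, y + patch)
        for y in window_starts(height, patch, overlap)
        for x in window_starts(width, patch, overlap)
    ]


def aggregate_page(patch_probs):
    """Average per-patch class probabilities; return (best index, mean probs)."""
    n = len(patch_probs)
    mean = [sum(column) / n for column in zip(*patch_probs)]
    best = max(range(len(mean)), key=mean.__getitem__)
    return best, mean
```

The boxes use the same `(left, upper, right, lower)` order as Pillow's `Image.crop`, so each one can be passed to it directly.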

## Training Recipe
Training used a three-stage progressive unfreezing schedule:
1. **Stage A (Head Only):** 20 epochs, backbone frozen (LR: 1e-3).
2. **Stage B (Partial):** 10 epochs, unfreezing the last 2 Transformer blocks (Backbone LR: 1e-5).
3. **Stage C (Full):** 10 epochs, unfreezing the last 4 Transformer blocks (Backbone LR: 5e-6).
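
The three stages above could be set up roughly as follows. This is a minimal sketch on a toy stand-in model, not the project's training script: `ToyClassifier`, `configure_stage`, the AdamW optimizer, and reusing the 1e-3 head learning rate in Stages B and C are all assumptions, as the card only gives the head LR for Stage A and the backbone LRs for Stages B and C.

```python
import torch
from torch import nn


class ToyClassifier(nn.Module):
    """Hypothetical stand-in: a stack of 'blocks' plus a linear head."""

    def __init__(self, dim=16, depth=12, num_classes=18):
        super().__init__()
        self.blocks = nn.ModuleList([nn.Linear(dim, dim) for _ in range(depth)])
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        for block in self.blocks:
            x = torch.relu(block(x))
        return self.head(x)


def configure_stage(model, n_unfrozen_blocks, head_lr, backbone_lr=None):
    """Freeze the backbone, unfreeze its last n blocks, and build an optimizer."""
    for p in model.blocks.parameters():
        p.requires_grad = False
    first_unfrozen = len(model.blocks) - n_unfrozen_blocks
    backbone_params = [
        p for block in list(model.blocks)[first_unfrozen:] for p in block.parameters()
    ]
    for p in backbone_params:
        p.requires_grad = True
    groups = [{"params": list(model.head.parameters()), "lr": head_lr}]
    if backbone_params:
        groups.append({"params": backbone_params, "lr": backbone_lr})
    return torch.optim.AdamW(groups)  # optimizer choice is an assumption


# (name, epochs, unfrozen blocks, head LR, backbone LR)
# Head LR for stages B and C is assumed, not taken from the card.
STAGES = [
    ("A", 20, 0, 1e-3, None),
    ("B", 10, 2, 1e-3, 1e-5),
    ("C", 10, 4, 1e-3, 5e-6),
]

model = ToyClassifier()
for name, epochs, n_blocks, head_lr, backbone_lr in STAGES:
    optimizer = configure_stage(model, n_blocks, head_lr, backbone_lr)
    # ... run `epochs` epochs of training with `optimizer` here ...
```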

Class-weighted cross-entropy loss was used to mitigate the strong class imbalance across script types.

## How to Use

### Loading the Model
```python
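import torch
from finetune_dinov3 import DINOv3Classifier  # custom class from the finetune_dinov3 module

# Load Stage B Whole Page Checkpoint
payload = torch.load("whole_page/final_model.pt", map_location="cpu")
# Rebuild the classifier on the DINOv3 ViT-S/16 backbone with 18 output classes
model = DINOv3Classifier("facebook/dinov3-vits16-pretrain-lvd1689m", num_classes=18)
# Restore the fine-tuned weights stored under "model_state_dict"
model.load_state_dict(payload["model_state_dict"])
model.eval()  # switch to inference mode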