---
language:
- bo
license: apache-2.0
library_name: transformers
tags:
- image-classification
- dinov3
- tibetan
- script-classification
- paleography
- fine-tuned
- document-analysis
base_model: facebook/dinov3-vits16-pretrain-lvd1689m
datasets:
- openpecha/tibetan-script-images
metrics:
- f1
- accuracy
pipeline_tag: image-classification
model-index:
- name: Tibetan Script Classifier (DINOv3 ViT-S)
  results:
  - task:
      type: image-classification
      name: Tibetan Script Classification
    metrics:
    - name: Macro F1 (whole page)
      type: f1
      value: 0.512
    - name: Accuracy (whole page)
      type: accuracy
      value: 0.571
    - name: Macro F1 (CLAHE patches, page-level)
      type: f1
      value: 0.529
---

# Tibetan Script Classifier (DINOv3)

This repository contains fine-tuned checkpoints for classifying Tibetan manuscript images into 18 script categories, including catch-all classes such as `multi_scripts`, `non_tibetan`, and `difficult`. The work aims to support automated paleographic identification in historical archives.

## Project Information
- **Project Name:** The BDRC Etext Corpus
- **Developed by:** Dharmaduta
- **Specifications provided by:** [Buddhist Digital Resource Center (BDRC)](https://www.bdrc.io)
- **Funded by:** Khyentse Foundation
- **Core Model:** DINOv3 ViT-S/16 (`facebook/dinov3-vits16-pretrain-lvd1689m`)

## Evaluation Results

| Experiment | Evaluation Level | Macro F1 | Accuracy |
| :--- | :--- | :---: | :---: |
| **whole_page** | Image-level | 0.512 | 57.11% |
| **patches_clahe** | Page-level (Aggregated) | **0.529** | 52.61% |
| **patches_color** | Page-level (Aggregated) | 0.504 | 50.17% |

*Note: The **whole_page** model is recommended for general use: it has the highest accuracy (57.11%) and the simplest inference pipeline, while **patches_clahe** reaches a slightly higher macro F1 (0.529 vs. 0.512).*
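
For the patch models, the predictions for all patches of a page are combined into a single page-level prediction before scoring. The card does not state the aggregation rule; the sketch below assumes mean-pooling of per-patch class probabilities followed by argmax (majority voting is another common option). It also shows macro F1, the unweighted mean of per-class F1 scores, which weights rare scripts as heavily as common ones and can therefore diverge from accuracy on an imbalanced label set. The helper names `aggregate_page`, `macro_f1`, and `accuracy` are hypothetical.

```python
def aggregate_page(patch_probs: list[list[float]]) -> int:
    """Assumed rule: mean-pool per-patch class probabilities, return the argmax class."""
    num_classes = len(patch_probs[0])
    mean = [sum(p[c] for p in patch_probs) / len(patch_probs) for c in range(num_classes)]
    return max(range(num_classes), key=lambda c: mean[c])


def macro_f1(y_true: list[int], y_pred: list[int], num_classes: int) -> float:
    """Unweighted mean over all num_classes of F1 = 2TP / (2TP + FP + FN).

    Classes absent from both lists contribute 0.0 to the average.
    """
    scores = []
    for c in range(num_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / num_classes


def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of pages whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Because macro F1 averages per-class scores, a model that does well on the most frequent scripts can score higher on accuracy while scoring lower on macro F1, which is the pattern visible between **whole_page** and **patches_clahe** in the table.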

## Label Set (18 Classes)
The model is trained to recognize the following scripts:
`dhumri`, `difficult`, `drathung`, `drudring`, `druring`, `druthung`, `khyuyig`, `multi_scripts`, `non_tibetan`, `peri`, `petsuk`, `trinyig`, `tsegdrig`, `tsugchung`, `tsumachug`, `uchen_sugdring`, `uchen_sugthung`, `yigchung`.

## Preprocessing Variants
- **whole_page**: Short-edge resize to 224px followed by a 224×224 center crop.
- **patches_color**: Sliding-window 224×224 patches with 25% overlap.
- **patches_clahe**: Same patch layout as above, but with **Contrast Limited Adaptive Histogram Equalization (CLAHE)** applied to grayscale inputs to enhance script visibility.
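
The geometry of the whole-page and patch variants can be sketched in plain Python. This is an illustrative sketch, not the pipeline used to train these checkpoints: the rounding, the center-crop offset, snapping a final window to the image edge, and the helper names `whole_page_box`, `patch_origins`, and `patch_boxes` are all assumptions. With 224 px patches and 25% overlap, the stride works out to 224 × 0.75 = 168 px.

```python
def whole_page_box(width: int, height: int, size: int = 224):
    """Short-edge resize to `size`, then the centered size x size crop box.

    Returns ((resized_w, resized_h), (left, top, right, bottom)).
    Rounding here is an assumption; real pipelines may truncate instead.
    """
    short = min(width, height)
    new_w = round(width * size / short)
    new_h = round(height * size / short)
    left = (new_w - size) // 2
    top = (new_h - size) // 2
    return (new_w, new_h), (left, top, left + size, top + size)


def patch_origins(length: int, size: int = 224, overlap: float = 0.25) -> list[int]:
    """Start offsets of sliding windows along one axis.

    Assumption: if the regular stride leaves a strip uncovered, one extra
    window is snapped to the far edge. Axes shorter than `size` yield a
    single window at 0 and would need padding before cropping.
    """
    stride = int(size * (1 - overlap))  # 168 px for 224 px windows at 25% overlap
    if length <= size:
        return [0]
    starts = list(range(0, length - size + 1, stride))
    if starts[-1] != length - size:
        starts.append(length - size)
    return starts


def patch_boxes(width: int, height: int, size: int = 224, overlap: float = 0.25):
    """All (left, top, right, bottom) patch boxes covering a width x height page."""
    return [
        (x, y, x + size, y + size)
        for y in patch_origins(height, size, overlap)
        for x in patch_origins(width, size, overlap)
    ]
```

For the CLAHE variant, OpenCV's `cv2.createCLAHE(clipLimit=..., tileGridSize=...)` returns an object whose `apply()` method equalizes an 8-bit single-channel image; the clip limit and tile grid used for this model are not stated here.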

## Training Recipe
Training used a three-stage progressive unfreezing schedule:
1. **Stage A (Head Only):** 20 epochs with the backbone frozen (head LR: 1e-3).
2. **Stage B (Partial):** 10 epochs, unfreezing the last 2 Transformer blocks (backbone LR: 1e-5).
3. **Stage C (Full):** 10 epochs, unfreezing the last 4 Transformer blocks (backbone LR: 5e-6).
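
The schedule above can be sketched in PyTorch as follows. This is a minimal illustration on a hypothetical `ToyViT` stand-in with 12 blocks (the depth of ViT-S), not the training code behind these checkpoints. The optimizer (AdamW), treating the Stage A rate of 1e-3 as the head learning rate and reusing it in Stages B and C, and the `configure_stage` helper are all assumptions; attribute names on the real DINOv3 backbone will differ.

```python
import torch
from torch import nn


class ToyViT(nn.Module):
    """Hypothetical stand-in for a ViT-S backbone plus classification head."""

    def __init__(self, dim: int = 16, depth: int = 12, num_classes: int = 18):
        super().__init__()
        self.blocks = nn.ModuleList(nn.Linear(dim, dim) for _ in range(depth))
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            x = block(x)
        return self.head(x)


# stage -> (epochs, trailing blocks to unfreeze, backbone LR), per the recipe above
STAGES = {
    "A": (20, 0, None),
    "B": (10, 2, 1e-5),
    "C": (10, 4, 5e-6),
}
HEAD_LR = 1e-3  # assumption: the Stage A rate, reused for the head in every stage


def configure_stage(model: ToyViT, stage: str):
    """Freeze all blocks except the last N and build per-group learning rates."""
    epochs, n_unfrozen, backbone_lr = STAGES[stage]
    first_trainable = len(model.blocks) - n_unfrozen
    for i, block in enumerate(model.blocks):
        for p in block.parameters():
            p.requires_grad = i >= first_trainable
    groups = [{"params": list(model.head.parameters()), "lr": HEAD_LR}]
    if n_unfrozen > 0:
        backbone_params = [p for b in model.blocks[first_trainable:] for p in b.parameters()]
        groups.append({"params": backbone_params, "lr": backbone_lr})
    optimizer = torch.optim.AdamW(groups)  # optimizer choice is an assumption
    return optimizer, epochs
```

A training loop would call `configure_stage(model, stage)` for `"A"`, `"B"`, and `"C"` in turn and train for the returned number of epochs. Rebuilding the optimizer per stage keeps frozen parameters out of it entirely and gives the head and the unfrozen blocks separate learning rates.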

A class-weighted cross-entropy loss was used to counter the strong class imbalance across script types.
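
The card does not say how the class weights were derived. One common choice, assumed here, is inverse-frequency ("balanced") weighting, w_c = N / (K · n_c), where N is the number of training samples, K the number of classes, and n_c the count of class c. The hypothetical `balanced_class_weights` helper below computes these weights in plain Python:

```python
from collections import Counter


def balanced_class_weights(labels: list[int], num_classes: int) -> list[float]:
    """Inverse-frequency weights w_c = N / (K * n_c) (an assumed scheme).

    Every class index in range(num_classes) must occur at least once;
    a missing class raises ValueError instead of dividing by zero.
    """
    counts = Counter(labels)
    missing = [c for c in range(num_classes) if counts[c] == 0]
    if missing:
        raise ValueError(f"classes with no samples: {missing}")
    n = len(labels)
    return [n / (num_classes * counts[c]) for c in range(num_classes)]


# Toy example: class 0 is three times as common as class 1,
# so class 1 receives a three times larger weight.
weights = balanced_class_weights([0, 0, 0, 1], num_classes=2)
print(weights)
```

The resulting list can be passed to PyTorch as `torch.nn.CrossEntropyLoss(weight=torch.tensor(weights, dtype=torch.float32))`, which scales each sample's loss by the weight of its target class, so errors on rare scripts cost more than errors on common ones.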

## How to Use

### Loading the Model
```python
import torch
from finetune_dinov3 import DINOv3Classifier

# Load Stage B Whole Page Checkpoint
payload = torch.load("whole_page/final_model.pt", map_location="cpu")
model = DINOv3Classifier("facebook/dinov3-vits16-pretrain-lvd1689m", num_classes=18)
model.load_state_dict(payload["model_state_dict"])
model.eval()