---
language:
- bo
license: apache-2.0
library_name: transformers
tags:
- image-classification
- dinov3
- tibetan
- script-classification
- paleography
- fine-tuned
- document-analysis
base_model: facebook/dinov3-vits16-pretrain-lvd1689m
datasets:
- openpecha/tibetan-script-images
metrics:
- f1
- accuracy
pipeline_tag: image-classification
model-index:
- name: Tibetan Script Classifier (DINOv3 ViT-S)
results:
- task:
type: image-classification
name: Tibetan Script Classification
metrics:
- name: Macro F1 (whole page)
type: f1
value: 0.512
- name: Accuracy (whole page)
type: accuracy
value: 0.571
- name: Macro F1 (CLAHE patches, page-level)
type: f1
value: 0.529
---
# Tibetan Script Classifier (DINOv3)
This repository contains fine-tuned checkpoints for identifying 18 distinct categories of Tibetan manuscript scripts. This research was conducted to develop automated paleographic identification tools for historical archives.
## Project Information
- **Project Name:** The BDRC Etext Corpus
- **Developed by:** Dharmaduta
- **Specifications provided by:** [Buddhist Digital Resource Center (BDRC)](https://www.bdrc.io)
- **Funded by:** Khyentse Foundation
- **Core Model:** DINOv3 ViT-S/16 (`facebook/dinov3-vits16-pretrain-lvd1689m`)
## Evaluation Results
| Experiment | Evaluation Level | Macro F1 | Accuracy |
| :--- | :--- | :---: | :---: |
| **whole_page** | Image-level | 0.512 | 57.11% |
| **patches_clahe** | Page-level (Aggregated) | **0.529** | 52.61% |
| **patches_color** | Page-level (Aggregated) | 0.504 | 50.17% |
*Note: The **whole_page** model is recommended for general use due to its balanced performance and simpler inference pipeline.*
## Label Set (18 Classes)
The model is trained to recognize the following scripts:
`dhumri`, `difficult`, `drathung`, `drudring`, `druring`, `druthung`, `khyuyig`, `multi_scripts`, `non_tibetan`, `peri`, `petsuk`, `trinyig`, `tsegdrig`, `tsugchung`, `tsumachug`, `uchen_sugdring`, `uchen_sugthung`, `yigchung`.
## Preprocessing Variants
- **whole_page**: Short-edge resize to 224px followed by a 224×224 center crop.
- **patches_color**: Sliding-window 224×224 patches with 25% overlap.
- **patches_clahe**: Same patch layout as above, but with **Contrast Limited Adaptive Histogram Equalization (CLAHE)** applied to grayscale inputs to enhance script visibility.
## Training Recipe
Training followed a three-stage progressive unfreezing strategy:
1. **Stage A (Head Only):** 20 epochs, backbone frozen (LR: 1e-3).
2. **Stage B (Partial):** 10 epochs, unfreezing the last 2 Transformer blocks (Backbone LR: 1e-5).
3. **Stage C (Full):** 10 epochs, unfreezing the last 4 Transformer blocks (Backbone LR: 5e-6).
Class-weighted cross-entropy loss was used to mitigate the pronounced class imbalance across script types.
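A minimal sketch of the class-weighted loss and staged learning rates, assuming inverse-frequency weights derived from hypothetical per-class counts; the modules and counts below are stand-ins for illustration, not the project's exact recipe:

```python
import torch
import torch.nn as nn

# Hypothetical per-class sample counts; real counts come from the dataset.
counts = torch.tensor([400.0, 50.0, 25.0, 125.0])
weights = counts.sum() / (len(counts) * counts)  # inverse-frequency weights
criterion = nn.CrossEntropyLoss(weight=weights)

# Stage-style optimizer with separate learning rates for the head and the
# unfrozen backbone tail (both modules are illustrative stand-ins).
head = nn.Linear(384, 4)
backbone_tail = nn.Linear(384, 384)
optimizer = torch.optim.AdamW([
    {"params": head.parameters(), "lr": 1e-3},   # head LR (Stage A)
    {"params": backbone_tail.parameters(), "lr": 1e-5},  # backbone LR (Stage B)
])

logits = torch.randn(8, 4)
targets = torch.randint(0, 4, (8,))
loss = criterion(logits, targets)
```

Rare classes receive proportionally larger weights, so errors on them contribute more to the loss.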
## How to Use
### Loading the Model
```python
import torch
from finetune_dinov3 import DINOv3Classifier
# Load the final whole_page checkpoint
payload = torch.load("whole_page/final_model.pt", map_location="cpu")
model = DINOv3Classifier("facebook/dinov3-vits16-pretrain-lvd1689m", num_classes=18)
model.load_state_dict(payload["model_state_dict"])
model.eval()
```