---
language:
- bo
license: apache-2.0
library_name: transformers
tags:
- image-classification
- dinov3
- tibetan
- script-classification
- paleography
- fine-tuned
- document-analysis
base_model: facebook/dinov3-vits16-pretrain-lvd1689m
datasets:
- openpecha/tibetan-script-images
metrics:
- f1
- accuracy
pipeline_tag: image-classification
model-index:
- name: Tibetan Script Classifier (DINOv3 ViT-S)
  results:
  - task:
      type: image-classification
      name: Tibetan Script Classification
    metrics:
    - name: Macro F1 (whole page)
      type: f1
      value: 0.512
    - name: Accuracy (whole page)
      type: accuracy
      value: 0.571
    - name: Macro F1 (CLAHE patches, page-level)
      type: f1
      value: 0.529
---

# Tibetan Script Classifier (DINOv3)

This repository contains fine-tuned checkpoints for classifying Tibetan manuscript page images into 18 script categories. The work was carried out to develop automated paleographic identification tools for historical archives.

## Project Information

- **Project Name:** The BDRC Etext Corpus
- **Developed by:** Dharmaduta
- **Specifications provided by:** [Buddhist Digital Resource Center (BDRC)](https://www.bdrc.io)
- **Funded by:** Khyentse Foundation
- **Core Model:** DINOv3 ViT-S/16 (`facebook/dinov3-vits16-pretrain-lvd1689m`)

## Evaluation Results

| Experiment | Evaluation Level | Macro F1 | Accuracy |
| :--- | :--- | :---: | :---: |
| **whole_page** | Image-level | 0.512 | 57.11% |
| **patches_clahe** | Page-level (Aggregated) | **0.529** | 52.61% |
| **patches_color** | Page-level (Aggregated) | 0.504 | 50.17% |

*Note: The **whole_page** model is recommended for general use: it has the highest accuracy, a macro F1 within 0.02 of the best patch variant, and a simpler inference pipeline.*

## Label Set (18 Classes)

The model is trained to recognize the following classes:

`dhumri`, `difficult`, `drathung`, `drudring`, `druring`, `druthung`, `khyuyig`, `multi_scripts`, `non_tibetan`, `peri`, `petsuk`, `trinyig`, `tsegdrig`, `tsugchung`, `tsumachug`, `uchen_sugdring`, `uchen_sugthung`, `yigchung`.

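The results table reports macro F1 alongside accuracy over these 18 classes. The sketch below may help when reading the two numbers together: it lists the labels as a Python constant and computes macro F1 by hand on a toy example. The names `LABELS`, `ID2LABEL`, and `macro_f1` are illustrative and not part of this repository, and the index-to-label order is an assumption (the alphabetical order shown above) that should be checked against the training configuration before use.

```python
from typing import Dict, List, Sequence

# The 18 class names from the label set, in the order listed in this card.
LABELS: List[str] = [
    "dhumri", "difficult", "drathung", "drudring", "druring", "druthung",
    "khyuyig", "multi_scripts", "non_tibetan", "peri", "petsuk", "trinyig",
    "tsegdrig", "tsugchung", "tsumachug", "uchen_sugdring", "uchen_sugthung",
    "yigchung",
]

# ASSUMPTION: class index i maps to LABELS[i]. The card does not state the
# index order used by the checkpoints, so verify this before decoding outputs.
ID2LABEL: Dict[int, str] = dict(enumerate(LABELS))


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str],
             labels: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over `labels`.

    A class with no true and no predicted samples contributes 0.
    """
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy, imbalanced example: always predicting the majority class.
y_true = ["uchen_sugthung"] * 8 + ["peri"] * 2
y_pred = ["uchen_sugthung"] * 10
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred, ["uchen_sugthung", "peri"])
```

In this toy case accuracy is 0.8 while macro F1 is about 0.44, because the minority class scores an F1 of 0 and every class counts equally. On an imbalanced label set, this is why the two metrics can rank models differently.
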
## Preprocessing Variants

- **whole_page**: Short-edge resize to 224 px, followed by a 224×224 center crop.
- **patches_color**: Sliding-window 224×224 patches with 25% overlap.
- **patches_clahe**: Same patch layout as **patches_color**, with **Contrast Limited Adaptive Histogram Equalization (CLAHE)** applied to grayscale inputs to enhance script visibility.

## Training Recipe

Training used a 3-stage progressive unfreezing strategy:

1. **Stage A (Head Only):** 20 epochs, backbone frozen (LR: 1e-3).
2. **Stage B (Partial):** 10 epochs, unfreezing the last 2 Transformer blocks (Backbone LR: 1e-5).
3. **Stage C (Full):** 10 epochs, unfreezing the last 4 Transformer blocks (Backbone LR: 5e-6).

Class-weighted cross-entropy loss was used to mitigate the strong class imbalance across script types.

## How to Use

### Loading the Model

```python
import torch

from finetune_dinov3 import DINOv3Classifier

# Load Stage B Whole Page Checkpoint
payload = torch.load("whole_page/final_model.pt", map_location="cpu")

# Rebuild the classifier on the same backbone, then restore the fine-tuned weights.
model = DINOv3Classifier("facebook/dinov3-vits16-pretrain-lvd1689m", num_classes=18)
model.load_state_dict(payload["model_state_dict"])

# Switch to inference mode.
model.eval()
```
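
To prepare inputs for a loaded model, the geometry of the preprocessing variants can be worked out with plain arithmetic. The following is a minimal sketch, not the repository's own preprocessing code: the helper names are hypothetical, the handling of page borders (one extra window flush with the edge) is an assumption, and so is the page-level aggregation rule (mean of per-patch class probabilities, then argmax), since this card only states that patch results are aggregated.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def whole_page_geometry(width: int, height: int,
                        size: int = 224) -> Tuple[Tuple[int, int], Box]:
    """Resized (width, height) after a short-edge resize, plus the center-crop box."""
    scale = size / min(width, height)
    new_w = max(size, round(width * scale))
    new_h = max(size, round(height * scale))
    left = (new_w - size) // 2
    top = (new_h - size) // 2
    return (new_w, new_h), (left, top, left + size, top + size)


def _starts(length: int, patch: int, stride: int) -> List[int]:
    """Window offsets along one axis (expects length >= patch)."""
    if length <= patch:
        return [0]
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] != length - patch:
        # ASSUMPTION: cover the border with one extra window flush with the edge.
        starts.append(length - patch)
    return starts


def patch_boxes(width: int, height: int, patch: int = 224,
                overlap: float = 0.25) -> List[Box]:
    """Sliding-window boxes; 25% overlap on 224 px patches gives a 168 px stride."""
    stride = int(patch * (1 - overlap))
    return [(x, y, x + patch, y + patch)
            for y in _starts(height, patch, stride)
            for x in _starts(width, patch, stride)]


def aggregate_page(patch_probs: Sequence[Sequence[float]]) -> int:
    """ASSUMED aggregation: average per-patch probabilities, return the argmax index."""
    n_classes = len(patch_probs[0])
    mean = [sum(p[c] for p in patch_probs) / len(patch_probs)
            for c in range(n_classes)]
    return max(range(n_classes), key=mean.__getitem__)


# A wide 2000×400 page: short-edge resize gives 1120×224.
resized, crop = whole_page_geometry(2000, 400)
boxes = patch_boxes(*resized)
```

For this example page, the whole-page crop keeps the central 224 of 1120 columns, while the patch layout yields seven 224×224 windows spanning the full width. The returned boxes use the `(left, top, right, bottom)` convention accepted by `PIL.Image.crop`.
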