---
library_name: ultralytics
task: object-detection
tags:
  - yolo
  - yolo26
  - tibetan
  - document-layout-analysis
  - object-detection
  - bounding-box
  - BDRC
language:
  - bo
license: cc0-1.0
datasets:
  - BDRC/TDLA-Training-Dataset
metrics:
  - mAP50
  - mAP50-95
  - precision
  - recall
model-index:
  - name: TDLA-YOLO26m
    results:
      - task:
          type: object-detection
          name: Object Detection
        dataset:
          type: BDRC/TDLA-Training-Dataset
          name: TDLA Training Dataset
          split: val
        metrics:
          - type: mAP50
            value: 0.982
            name: mAP@0.5
          - type: mAP50-95
            value: 0.799
            name: mAP@0.5:0.95
          - type: precision
            value: 0.966
            name: Precision
          - type: recall
            value: 0.970
            name: Recall
---

# TDLA-YOLO26m: Tibetan Document Layout Analysis

A fine-tuned **YOLO26m** object-detection model for **Tibetan Document Layout Analysis (TDLA)**. The model detects four layout classes in Tibetan document page images: **header**, **Text area**, **footnote**, and **footer**.

## Model Description

This model was fine-tuned from the Ultralytics YOLO26m pretrained checkpoint on the [BDRC/TDLA-Training-Dataset](https://huggingface.co/datasets/BDRC/TDLA-Training-Dataset), a YOLO-format bounding-box dataset of Tibetan document pages sourced from the Buddhist Digital Resource Center (BDRC) digital library.

| Property | Value |
| --- | --- |
| **Architecture** | YOLO26m |
| **Task** | Object Detection |
| **Image size** | 640 × 640 |
| **Number of classes** | 4 |
| **Training platform** | Ultralytics HUB |
| **Weights file** | `Tibetan_modern_book_Layout_detection.pt` |
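
The usage examples below load the weights from a local path. Here is a minimal sketch for downloading the file from the Hugging Face Hub first. It assumes the file is hosted in the `BDRC/Tibetan_Modern_Book_Layout_Detection_Model` repository under the filename listed above. The repository id is an assumption, so adjust it if the weights live elsewhere. `hf_hub_download` caches the file locally and returns its path. `download_weights` is a hypothetical helper name.

```python
# ASSUMPTION: the weights are hosted in this Hub repository; change REPO_ID if not.
REPO_ID = "BDRC/Tibetan_Modern_Book_Layout_Detection_Model"
WEIGHTS_FILENAME = "Tibetan_modern_book_Layout_detection.pt"


def download_weights(repo_id=REPO_ID, filename=WEIGHTS_FILENAME):
    """Download the checkpoint (or reuse the local cache) and return its file path."""
    # Lazy import so this module loads even where huggingface_hub is not installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=repo_id, filename=filename)


# Usage (requires `pip install huggingface_hub ultralytics`):
#   from ultralytics import YOLO
#   model = YOLO(download_weights())
#   print(model.names)  # class-ID-to-name mapping stored in the checkpoint
```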

## Classes

| ID | Class | Description |
| --- | --- | --- |
| 0 | header | Page header region |
| 1 | Text area | Main body text region |
| 2 | footnote | Footnote region |
| 3 | footer | Page footer region |
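
The dataset uses YOLO-format labels. In that format, each annotation line holds a class ID followed by the box's normalized center x, center y, width, and height. The sketch below assumes that standard Ultralytics label layout. It converts one such line into a class name and pixel-space corner coordinates, using the ID table above. `parse_yolo_label` and `CLASS_NAMES` are illustrative names. A loaded checkpoint's `model.names` remains the authoritative mapping.

```python
# Class IDs as listed in the Classes table (note the space in "Text area").
CLASS_NAMES = {0: "header", 1: "Text area", 2: "footnote", 3: "footer"}


def parse_yolo_label(line, img_w, img_h):
    """Convert one YOLO label line "cls cx cy w h" (normalized 0-1) to (name, pixel xyxy)."""
    cls_str, cx, cy, w, h = line.split()
    # Scale the normalized center and size to pixels.
    cx, w = float(cx) * img_w, float(w) * img_w
    cy, h = float(cy) * img_h, float(h) * img_h
    # Center/size -> top-left and bottom-right corners.
    box = (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
    return CLASS_NAMES[int(cls_str)], box
```

For example, `parse_yolo_label("1 0.5 0.5 0.8 0.6", 1000, 1500)` returns `("Text area", (100.0, 300.0, 900.0, 1200.0))`.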

## Performance

Evaluated on the validation split of the TDLA Training Dataset.

| Metric | Value |
| --- | --- |
| **Precision** | 0.966 |
| **Recall** | 0.970 |
| **mAP@0.5** | 0.982 |
| **mAP@0.5:0.95** | 0.799 |
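
The gap between mAP@0.5 (0.982) and mAP@0.5:0.95 (0.799) reflects how tightly a predicted box must overlap its ground-truth box. mAP@0.5 counts a detection as correct when the intersection over union (IoU) is at least 0.5. mAP@0.5:0.95 averages AP over ten thresholds from 0.50 to 0.95 in steps of 0.05. Boxes that are roughly right but not tight therefore lose credit at the stricter thresholds. The sketch below shows only the per-threshold IoU test, not the full precision-recall integration behind AP. The function names are illustrative.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The ten IoU thresholds averaged by mAP@0.5:0.95 (rounded to avoid float drift).
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def matched_thresholds(pred, gt):
    """IoU thresholds at which `pred` would count as a match for `gt`."""
    value = iou(pred, gt)
    return [t for t in IOU_THRESHOLDS if value >= t]
```

For ground truth `(0, 0, 100, 100)` and prediction `(0, 0, 100, 82)`, the IoU is 0.82. The prediction therefore matches at the seven thresholds 0.50 through 0.80 and misses at 0.85, 0.90, and 0.95.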

### Training Loss (final epoch)

| Loss Component | Train | Val |
| --- | --- | --- |
| Box loss | 0.515 | 0.643 |
| Classification loss | 0.218 | 0.276 |
| DFL loss | 0.003 | 0.004 |

## Training Details

### Dataset

- **Dataset:** [BDRC/TDLA-Training-Dataset](https://huggingface.co/datasets/BDRC/TDLA-Training-Dataset)
- **Train images:** 2,692
- **Val images:** 103
- **Test images:** 313
- **Total annotations:** 14,705
- **Train/Val split:** Iterative multi-label stratification (seed 42, 80/20 ratio)
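
Multi-label stratification matters here because one page usually carries several classes at once. A plain random split could therefore divide the rarer footnote class unevenly. The card does not say which implementation produced the split. What follows is a simplified, from-scratch sketch of the iterative stratification idea, assuming each page is represented by the set of class IDs it contains:

- The label with the fewest unassigned pages is processed first.
- Each of its pages goes to the fold that still needs the most examples of that label.
- Ties are broken by overall fold demand, then at random.

This is not the script used to build the TDLA split.

```python
import random


def iterative_stratification(label_sets, ratios=(0.8, 0.2), seed=42):
    """Assign each page to a fold so per-label proportions follow `ratios`.

    label_sets: one set of class IDs per page (the classes present on that page).
    Returns the fold index (0 = first ratio, 1 = second, ...) of every page.
    """
    rng = random.Random(seed)
    n_folds = len(ratios)
    labels = sorted(set().union(*label_sets))
    # Remaining demand per fold: overall, and separately for each label.
    want_total = [r * len(label_sets) for r in ratios]
    want_label = {lab: [r * sum(lab in s for s in label_sets) for r in ratios] for lab in labels}
    folds = [None] * len(label_sets)
    remaining = set(range(len(label_sets)))

    def assign(idx, fold):
        folds[idx] = fold
        remaining.discard(idx)
        want_total[fold] -= 1
        for lab in label_sets[idx]:
            want_label[lab][fold] -= 1

    while remaining:
        # Unassigned pages per label; the rarest label is handled first.
        pending = {lab: sorted(i for i in remaining if lab in label_sets[i]) for lab in labels}
        pending = {lab: idxs for lab, idxs in pending.items() if idxs}
        if not pending:
            # Pages with no labels go wherever overall demand is highest.
            for idx in sorted(remaining):
                assign(idx, max(range(n_folds), key=lambda f: want_total[f]))
            break
        lab = min(pending, key=lambda k: len(pending[k]))
        for idx in pending[lab]:
            demand = want_label[lab]
            cands = [f for f in range(n_folds) if demand[f] == max(demand)]
            if len(cands) > 1:
                # Tie: prefer the fold that still needs the most pages overall.
                top = max(want_total[f] for f in cands)
                cands = [f for f in cands if want_total[f] == top]
            assign(idx, rng.choice(cands))
    return folds
```

With `ratios=(0.8, 0.2)`, fold 0 plays the role of the training set and fold 1 the validation set.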

### Hyperparameters

| Parameter | Value |
| --- | --- |
| Epochs | 150 |
| Patience | 100 |
| Batch size | Auto (-1) |
| Image size | 640 |
| Optimizer | Auto (SGD) |
| Initial learning rate (lr0) | 0.01 |
| Final learning rate factor (lrf) | 0.01 |
| Momentum | 0.937 |
| Weight decay | 0.0005 |
| Warmup epochs | 3.0 |
| Warmup momentum | 0.8 |
| Warmup bias lr | 0.1 |
| AMP (mixed precision) | True |
| Pretrained | True |
| Deterministic | True |
| Seed | 0 |
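
With `lr0 = 0.01` and `lrf = 0.01`, the learning rate decays from 0.01 toward 0.01 × 0.01 = 0.0001 over the 150 epochs. The sketch below assumes Ultralytics' default linear decay. Cosine scheduling is not listed in the table, so it is assumed to be off. The sketch also ignores the 3-epoch warmup, during which momentum and the bias learning rate are ramped in from their warmup values. `linear_lr` is a hypothetical helper, not part of the Ultralytics API.

```python
def linear_lr(epoch, epochs=150, lr0=0.01, lrf=0.01):
    """Learning rate for a 0-based `epoch` under linear decay from lr0 to lr0 * lrf.

    ASSUMPTION: mirrors Ultralytics' linear schedule (cos_lr off); warmup is not modeled.
    """
    factor = max(1 - epoch / epochs, 0) * (1.0 - lrf) + lrf
    return lr0 * factor
```

`linear_lr(0)` gives 0.01, `linear_lr(75)` about 0.00505, and `linear_lr(150)` about 0.0001.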

### Loss Weights

| Component | Weight |
| --- | --- |
| Box | 7.5 |
| Classification | 0.5 |
| DFL | 1.5 |

### Augmentation

| Augmentation | Value |
| --- | --- |
| HSV-Hue | 0.015 |
| HSV-Saturation | 0.7 |
| HSV-Value | 0.4 |
| Translation | 0.1 |
| Scale | 0.5 |
| Flip left-right | 0.5 |
| Mosaic | 1.0 |
| Erasing | 0.4 |
| Close mosaic (last N epochs) | 10 |
| Auto augment | RandAugment |
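
The hyperparameter, loss-weight, and augmentation tables map onto Ultralytics training arguments. The model was trained on Ultralytics HUB, so a local run is not guaranteed to reproduce it exactly. The sketch below is one way to collect the listed values into a `model.train()` call. The dataset config `tdla.yaml` and the starting checkpoint `yolo26m.pt` are assumptions, since the card names neither file. `TRAIN_ARGS` and `train` are illustrative names.

```python
# ASSUMPTIONS: "tdla.yaml" is a hypothetical dataset config for the TDLA images/labels,
# and "yolo26m.pt" is assumed to be the pretrained starting checkpoint.
TRAIN_ARGS = dict(
    data="tdla.yaml",
    # Hyperparameters
    epochs=150, patience=100, batch=-1, imgsz=640,
    optimizer="auto", lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005,
    warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1,
    amp=True, pretrained=True, deterministic=True, seed=0,
    # Loss weights
    box=7.5, cls=0.5, dfl=1.5,
    # Augmentation
    hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, translate=0.1, scale=0.5, fliplr=0.5,
    mosaic=1.0, erasing=0.4, close_mosaic=10, auto_augment="randaugment",
)


def train(checkpoint="yolo26m.pt"):
    """Fine-tune `checkpoint` with the settings listed in this card (local run, not HUB)."""
    from ultralytics import YOLO  # imported lazily; requires `pip install ultralytics`

    model = YOLO(checkpoint)
    return model.train(**TRAIN_ARGS)
```

Calling `train()` requires Ultralytics and a dataset YAML that points at the TDLA images and labels.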
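
## Usage

### Inference with Ultralytics

```python
from ultralytics import YOLO

# Load the fine-tuned weights from a local file
model = YOLO("Tibetan_modern_book_Layout_detection.pt")

# Detect layout regions on one page image at the training resolution
results = model.predict("page_image.jpg", imgsz=640)

for result in results:
    for box in result.boxes:
        cls_id = int(box.cls)
        conf = float(box.conf)
        xyxy = box.xyxy[0].tolist()  # [x1, y1, x2, y2] in original-image pixels
        # result.names maps class IDs to the names in the Classes table
        print(f"Class: {result.names[cls_id]} ({cls_id}), Confidence: {conf:.3f}, Box: {xyxy}")
```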
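
### Batch Inference

```python
from ultralytics import YOLO
model = YOLO("Tibetan_modern_book_Layout_detection.pt")
# A directory path runs every image in it; stream=True yields one result per image instead
# of building a full list in memory. conf=0.25 drops boxes below 25% confidence.
for result in model.predict("path/to/images/", imgsz=640, conf=0.25, stream=True):
    print(result.path, len(result.boxes))
```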
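
Downstream tools often need detections in a plain, serializable form. The sketch below converts one page's detections into JSON-ready records keyed by class name. Its function names are illustrative. The commented usage assumes the Ultralytics `Results` attributes `path`, `names`, and `boxes` (with `cls`, `conf`, `xyxy`).

```python
import json


def detections_to_records(image_path, names, cls_ids, confs, boxes_xyxy):
    """Build one JSON-serializable dict per detected region on a page."""
    return [
        {
            "image": str(image_path),
            "label": names[int(cls_id)],  # e.g. "Text area"
            "confidence": round(float(conf), 4),
            "box_xyxy": [round(float(v), 1) for v in box],  # pixels: x1, y1, x2, y2
        }
        for cls_id, conf, box in zip(cls_ids, confs, boxes_xyxy)
    ]


def write_records(records, out_path):
    """Write records as UTF-8 JSON; ensure_ascii=False keeps non-ASCII text readable."""
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)


# Usage (assumes `model` is the loaded YOLO model):
#   records = []
#   for r in model.predict("path/to/images/", imgsz=640, conf=0.25, stream=True):
#       b = r.boxes
#       records += detections_to_records(r.path, r.names, b.cls.tolist(), b.conf.tolist(), b.xyxy.tolist())
#   write_records(records, "layout_detections.json")
```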

## Intended Use

This model is designed for automatic layout detection of modern Tibetan book pages. It can be used as a preprocessing step for:

- OCR pipelines on Tibetan documents
- Document digitization workflows
- Structured text extraction from scanned Tibetan texts
- Digital library cataloging and indexing
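
As an OCR preprocessing step, detected regions usually need to be put in reading order and cropped. The sketch below assumes a single-column page of horizontal Tibetan text, read top to bottom and left to right. It groups boxes whose top edges lie within a pixel tolerance into one row, then sorts each row left to right. Multi-column layouts would need a different ordering rule. `reading_order`, `crop_regions`, and the 20-pixel tolerance are illustrative choices, not part of the model.

```python
def reading_order(detections, row_tolerance=20.0):
    """Sort (label, confidence, (x1, y1, x2, y2)) detections top-to-bottom, then left-to-right.

    Boxes whose top edges are within `row_tolerance` pixels of the first box in the
    current row are treated as one row (e.g. a header split into two boxes).
    """
    ordered, row = [], []
    for det in sorted(detections, key=lambda d: d[2][1]):
        if row and det[2][1] - row[0][2][1] > row_tolerance:
            ordered.extend(sorted(row, key=lambda d: d[2][0]))
            row = []
        row.append(det)
    ordered.extend(sorted(row, key=lambda d: d[2][0]))
    return ordered


def crop_regions(image, detections, label="Text area"):
    """Crop regions of one class, in reading order, from a PIL-style image for OCR."""
    return [
        image.crop(tuple(int(round(v)) for v in box))
        for name, _conf, box in reading_order(detections)
        if name == label
    ]
```

Here `image` is expected to be a `PIL.Image.Image` or any object with a compatible `crop` method.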

## Limitations

- Trained primarily on modern Tibetan book layouts; performance on historical manuscripts, woodblock prints, or non-standard layouts may vary.
- Optimized for 640×640 input resolution; very high-resolution pages may benefit from tiling or higher `imgsz` values.
- The footnote class has fewer training samples (456 annotations) than the other classes, which may affect detection quality for that class.
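
One way to apply the tiling suggestion above is to cut a large page into overlapping 640 × 640 windows. The model then runs on each window, and the resulting boxes are shifted back by the window's offset. The sketch below covers only the window layout. Merging duplicate detections across overlaps (for example with non-maximum suppression) is left out. Because a **Text area** region can span most of a page, tiling may fragment it, so a larger `imgsz` may be the simpler option for whole-page regions. `tile_windows` and the 20% overlap are assumptions.

```python
def tile_windows(width, height, tile=640, overlap=0.2):
    """Return (x1, y1, x2, y2) windows of at most `tile` pixels covering a width x height page."""
    step = max(1, int(tile * (1 - overlap)))  # 512 px stride for the defaults

    def starts(length):
        if length <= tile:
            return [0]
        positions = list(range(0, length - tile, step))
        positions.append(length - tile)  # last window sits flush with the far edge
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]
```

A detection at `(bx1, by1, bx2, by2)` inside window `(x1, y1, ...)` maps back to page coordinates as `(bx1 + x1, by1 + y1, bx2 + x1, by2 + y1)`.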

## License

This model is released under the **CC0 1.0 Universal (Public Domain Dedication)**. You are free to copy, modify, and distribute the model, even for commercial purposes, without asking permission.

## Acknowledgements

The training dataset was developed by Dharmaduta, following specifications from the Buddhist Digital Resource Center (BDRC) for the BDRC Etext Corpus, with funding from the Khyentse Foundation.

## Citation

If you use this model, please cite the dataset:

```bibtex
@dataset{bdrc_tdla_2025,
  title   = {TDLA Training Dataset},
  author  = {{Buddhist Digital Resource Center (BDRC)}},
  year    = {2025},
  url     = {https://huggingface.co/datasets/BDRC/TDLA-Training-Dataset},
  license = {CC0-1.0}
}
```