---
library_name: ultralytics
task: object-detection
tags:
- yolo
- yolo26
- tibetan
- document-layout-analysis
- object-detection
- bounding-box
- BDRC
language:
- bo
license: cc0-1.0
datasets:
- BDRC/TDLA-Training-Dataset
metrics:
- mAP50
- mAP50-95
- precision
- recall
model-index:
- name: TDLA-YOLO26m
  results:
  - task:
      type: object-detection
      name: Object Detection
    dataset:
      type: BDRC/TDLA-Training-Dataset
      name: TDLA Training Dataset
      split: val
    metrics:
    - type: mAP50
      value: 0.982
      name: mAP@0.5
    - type: mAP50-95
      value: 0.799
      name: mAP@0.5:0.95
    - type: precision
      value: 0.966
      name: Precision
    - type: recall
      value: 0.970
      name: Recall
---

# TMBLD-YOLO26m: Tibetan Modern Book Layout Detection

A fine-tuned **YOLO26m** object-detection model for **Tibetan modern book layout detection**. The model detects four layout classes in page images of modern Tibetan books: **header**, **Text area**, **footnote**, and **footer**.

## Model Description

This model was fine-tuned from the Ultralytics YOLO26m pretrained checkpoint on the [BDRC/TDLA-Training-Dataset](https://huggingface.co/datasets/BDRC/TDLA-Training-Dataset), a YOLO-format bounding-box dataset of Tibetan document pages sourced from the Buddhist Digital Resource Center (BDRC) digital library.

| Property | Value |
| --- | --- |
| **Architecture** | YOLO26m |
| **Task** | Object Detection |
| **Image size** | 640 × 640 |
| **Number of classes** | 4 |
| **Training platform** | Ultralytics HUB |
| **Weights file** | `Tibetan_modern_book_Layout_detection.pt` |

## Classes

| ID | Class | Description |
| --- | --- | --- |
| 0 | header | Page header region |
| 1 | Text area | Main body text region |
| 2 | footnote | Footnote region |
| 3 | footer | Page footer region |

## Performance

Evaluated on the validation split of the TDLA Training Dataset.

| Metric | Value |
| --- | --- |
| **Precision** | 0.966 |
| **Recall** | 0.970 |
| **mAP@0.5** | 0.982 |
| **mAP@0.5:0.95** | 0.799 |

### Training Loss (final epoch)

| Loss Component | Train | Val |
| --- | --- | --- |
| Box loss | 0.515 | 0.643 |
| Classification loss | 0.218 | 0.276 |
| DFL loss | 0.003 | 0.004 |

## Training Details

### Dataset

- **Dataset:** [BDRC/TDLA-Training-Dataset](https://huggingface.co/datasets/BDRC/TDLA-Training-Dataset)
- **Train images:** 2,692
- **Val images:** 103
- **Test images:** 313
- **Total annotations:** 14,705
- **Train/Val split:** Iterative multi-label stratification (seed 42, 80/20 ratio)

### Hyperparameters

| Parameter | Value |
| --- | --- |
| Epochs | 150 |
| Patience | 100 |
| Batch size | Auto (-1) |
| Image size | 640 |
| Optimizer | Auto (SGD) |
| Initial learning rate (lr0) | 0.01 |
| Final learning rate factor (lrf) | 0.01 |
| Momentum | 0.937 |
| Weight decay | 0.0005 |
| Warmup epochs | 3.0 |
| Warmup momentum | 0.8 |
| Warmup bias lr | 0.1 |
| AMP (mixed precision) | True |
| Pretrained | True |
| Deterministic | True |
| Seed | 0 |

### Loss Weights

| Component | Weight |
| --- | --- |
| Box | 7.5 |
| Classification | 0.5 |
| DFL | 1.5 |

### Augmentation

| Augmentation | Value |
| --- | --- |
| HSV-Hue | 0.015 |
| HSV-Saturation | 0.7 |
| HSV-Value | 0.4 |
| Translation | 0.1 |
| Scale | 0.5 |
| Flip left-right | 0.5 |
| Mosaic | 1.0 |
| Erasing | 0.4 |
| Close mosaic (last N epochs) | 10 |
| Auto augment | RandAugment |

## Usage

### Inference with Ultralytics

```python
from ultralytics import YOLO

# Load the fine-tuned weights
model = YOLO("Tibetan_modern_book_Layout_detection.pt")

# Run inference on a single page image
results = model.predict("page_image.jpg", imgsz=640)

for result in results:
    boxes = result.boxes
    for box in boxes:
        cls_id = int(box.cls)        # class index (see the Classes table)
        conf = float(box.conf)       # confidence score
        xyxy = box.xyxy[0].tolist()  # [x1, y1, x2, y2] in pixels
        print(f"Class: {cls_id}, Confidence: {conf:.3f}, Box: {xyxy}")
```

### Batch Inference

```python
from ultralytics import YOLO

model = YOLO("Tibetan_modern_book_Layout_detection.pt")

# Passing a directory runs inference on every image it contains
results = model.predict("path/to/images/", imgsz=640, conf=0.25)
```

## Intended Use

This model is designed for automatic layout detection on pages of modern Tibetan books. It can be used as a preprocessing step for:

- OCR pipelines on Tibetan documents
- Document digitization workflows
- Structured text extraction from scanned Tibetan texts
- Digital library cataloging and indexing

## Limitations

- Trained primarily on modern Tibetan book layouts; performance on historical manuscripts, woodblock prints, or non-standard layouts may vary.
- Optimized for 640×640 input resolution; very high-resolution pages may benefit from tiling or higher `imgsz` values.
- The footnote class has fewer training samples (456 annotations) than the other classes, which may reduce detection quality for that class.

## License

This model is released under **CC0 1.0 Universal (Public Domain Dedication)**. You are free to copy, modify, and distribute the model, even for commercial purposes, without asking permission.

## Acknowledgements

This dataset was developed by Dharmaduta from specifications provided by the Buddhist Digital Resource Center (BDRC) for the BDRC Etext Corpus, with funding from the Khyentse Foundation.

## Citation

If you use this model, please cite it:

```bibtex
@software{bdrc_tmbld_yolo26m_2026,
  title   = {tmbld-YOLO26m: Tibetan Modern book layout detection Model},
  author  = {Buddhist Digital Resource Center (BDRC)},
  year    = {2026},
  url     = {https://huggingface.co/BDRC/TDLA-YOLO26m},
  license = {CC0-1.0}
}
```
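
## Example: Ordering Detections for OCR

The Intended Use section lists OCR preprocessing as a target application. The following is a minimal sketch of one way to prepare detections for that step: it filters out low-confidence regions and sorts the rest top-to-bottom. The `Region` container, the `order_regions` helper, and the top-to-bottom reading-order rule are illustrative assumptions, not part of this model or of Ultralytics; only the class ID mapping comes from the Classes table above. The sample detections are made up.

```python
from typing import List, NamedTuple, Tuple

# Class IDs as listed in the Classes table of this card.
CLASS_NAMES = {0: "header", 1: "Text area", 2: "footnote", 3: "footer"}


class Region(NamedTuple):
    """One detected layout region (hypothetical container, not an Ultralytics type)."""
    cls_id: int
    conf: float
    xyxy: Tuple[float, float, float, float]  # x1, y1, x2, y2 in pixels


def order_regions(regions: List[Region], min_conf: float = 0.25) -> List[Region]:
    """Drop regions below min_conf, then sort by top edge (y1), then left edge (x1)."""
    kept = [r for r in regions if r.conf >= min_conf]
    return sorted(kept, key=lambda r: (r.xyxy[1], r.xyxy[0]))


# Made-up detections for illustration only.
sample = [
    Region(3, 0.91, (40.0, 900.0, 600.0, 940.0)),
    Region(1, 0.97, (40.0, 80.0, 600.0, 760.0)),
    Region(0, 0.88, (40.0, 20.0, 600.0, 60.0)),
    Region(2, 0.12, (40.0, 780.0, 600.0, 880.0)),  # below threshold, dropped
]

ordered = order_regions(sample)
for region in ordered:
    print(CLASS_NAMES[region.cls_id], region.xyxy)
```

To feed real predictions into this helper, each box from the inference loop in the Usage section could be wrapped as `Region(int(box.cls), float(box.conf), tuple(box.xyxy[0].tolist()))`. A simple vertical sort is likely adequate for single-column pages; multi-column layouts would need a different ordering rule.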