| library_name: ultralytics | |
| task: object-detection | |
| tags: | |
| - yolo | |
| - yolo26 | |
| - tibetan | |
| - document-layout-analysis | |
| - object-detection | |
| - bounding-box | |
| - BDRC | |
| language: | |
| - bo | |
| license: cc0-1.0 | |
| datasets: | |
| - BDRC/TDLA-Training-Dataset | |
| metrics: | |
| - mAP50 | |
| - mAP50-95 | |
| - precision | |
| - recall | |
| model-index: | |
| - name: TDLA-YOLO26m | |
| results: | |
| - task: | |
| type: object-detection | |
| name: Object Detection | |
| dataset: | |
| type: BDRC/TDLA-Training-Dataset | |
| name: TDLA Training Dataset | |
| split: val | |
| metrics: | |
| - type: mAP50 | |
| value: 0.982 | |
| name: mAP@0.5 | |
| - type: mAP50-95 | |
| value: 0.799 | |
| name: mAP@0.5:0.95 | |
| - type: precision | |
| value: 0.966 | |
| name: Precision | |
| - type: recall | |
| value: 0.970 | |
| name: Recall | |
| # TMBLD-YOLO26m: Tibetan Modern Book Layout Detection | |
| A fine-tuned **YOLO26m** object-detection model for **Tibetan modern book layout detection**. The model detects four layout classes in page images of modern Tibetan books: **header**, **Text area**, **footnote**, and **footer**. | |
| ## Model Description | |
| This model was fine-tuned from the Ultralytics YOLO26m pretrained checkpoint on the [BDRC/TDLA-Training-Dataset](https://huggingface.co/datasets/BDRC/TDLA-Training-Dataset), a YOLO-format bounding-box dataset of Tibetan document pages sourced from the Buddhist Digital Resource Center (BDRC) digital library. | |
| | Property | Value | | |
| | --- | --- | | |
| | **Architecture** | YOLO26m | | |
| | **Task** | Object Detection | | |
| | **Image size** | 640 × 640 | | |
| | **Number of classes** | 4 | | |
| | **Training platform** | Ultralytics HUB | | |
| | **Weights file** | `Tibetan_modern_book_Layout_detection.pt` | | |
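A minimal sketch of fetching the weights file from the Hugging Face Hub and loading it with Ultralytics. The repository id and filename are the ones named on this card. It is an assumption that the file sits at the repository root, and `load_model` is a hypothetical helper name.

```python
# Repo id and weights filename as listed on this card.
# Assumption: the .pt file is stored at the root of the model repository.
REPO_ID = "BDRC/Tibetan_Modern_Book_Layout_Detection_Model"
WEIGHTS_FILE = "Tibetan_modern_book_Layout_detection.pt"


def load_model(repo_id: str = REPO_ID, filename: str = WEIGHTS_FILE):
    """Download the checkpoint (cached locally by huggingface_hub) and load it."""
    # Imported lazily so the constants above can be reused without these packages.
    from huggingface_hub import hf_hub_download
    from ultralytics import YOLO

    weights_path = hf_hub_download(repo_id=repo_id, filename=filename)
    return YOLO(weights_path)
```

Calling `load_model()` returns a `YOLO` object whose `names` attribute holds the class-id-to-name mapping stored in the checkpoint.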
| ## Classes | |
| | ID | Class | Description | | |
| | --- | --- | --- | | |
| | 0 | header | Page header region | | |
| | 1 | Text area | Main body text region | | |
| | 2 | footnote | Footnote region | | |
| | 3 | footer | Page footer region | | |
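For downstream code that only sees numeric class IDs, the table above can be mirrored as a small lookup. This is an illustrative sketch; `CLASS_NAMES` and `class_name` are hypothetical names, and the mapping stored in the checkpoint itself should be treated as authoritative.

```python
# Class ids and names copied from the table above.
CLASS_NAMES = {
    0: "header",
    1: "Text area",
    2: "footnote",
    3: "footer",
}


def class_name(cls_id: int) -> str:
    """Return the layout class name for a predicted id, or a marker if unknown."""
    return CLASS_NAMES.get(cls_id, f"unknown({cls_id})")


print(class_name(1))
```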
| ## Performance | |
| Evaluated on the validation split of the TDLA Training Dataset. | |
| | Metric | Value | | |
| | --- | --- | | |
| | **Precision** | 0.966 | | |
| | **Recall** | 0.970 | | |
| | **mAP@0.5** | 0.982 | | |
| | **mAP@0.5:0.95** | 0.799 | | |
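Precision and recall can be folded into a single F1 score as a quick summary. The sketch below applies the standard harmonic-mean formula to the reported values. Because those values are presumably aggregated over the four classes, the result is only an approximate summary, not a metric reported by the authors.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values taken from the performance table above.
f1 = f1_score(0.966, 0.970)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.968"
```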
| ### Training Loss (final epoch) | |
| | Loss Component | Train | Val | | |
| | --- | --- | --- | | |
| | Box loss | 0.515 | 0.643 | | |
| | Classification loss | 0.218 | 0.276 | | |
| | DFL loss | 0.003 | 0.004 | | |
| ## Training Details | |
| ### Dataset | |
| - **Dataset:** [BDRC/TDLA-Training-Dataset](https://huggingface.co/datasets/BDRC/TDLA-Training-Dataset) | |
| - **Train images:** 2,692 | |
| - **Val images:** 103 | |
| - **Test images:** 313 | |
| - **Total annotations:** 14,705 | |
| - **Train/Val split:** Iterative multi-label stratification (seed 42, 80/20 ratio) | |
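As a quick check on the figures above, the share of images in each split can be computed from the listed counts. This is a small arithmetic sketch; `SPLIT_COUNTS` and `split_shares` are hypothetical names.

```python
# Image counts copied from the list above.
SPLIT_COUNTS = {"train": 2692, "val": 103, "test": 313}


def split_shares(counts: dict) -> dict:
    """Return each split's fraction of the total image count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


shares = split_shares(SPLIT_COUNTS)
for name, share in shares.items():
    print(f"{name}: {SPLIT_COUNTS[name]} images ({share:.1%})")
```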
| ### Hyperparameters | |
| | Parameter | Value | | |
| | --- | --- | | |
| | Epochs | 150 | | |
| | Patience | 100 | | |
| | Batch size | Auto (-1) | | |
| | Image size | 640 | | |
| | Optimizer | Auto (SGD) | | |
| | Initial learning rate (lr0) | 0.01 | | |
| | Final learning rate factor (lrf) | 0.01 | | |
| | Momentum | 0.937 | | |
| | Weight decay | 0.0005 | | |
| | Warmup epochs | 3.0 | | |
| | Warmup momentum | 0.8 | | |
| | Warmup bias lr | 0.1 | | |
| | AMP (mixed precision) | True | | |
| | Pretrained | True | | |
| | Deterministic | True | | |
| | Seed | 0 | | |
| ### Loss Weights | |
| | Component | Weight | | |
| | --- | --- | | |
| | Box | 7.5 | | |
| | Classification | 0.5 | | |
| | DFL | 1.5 | | |
| ### Augmentation | |
| | Augmentation | Value | | |
| | --- | --- | | |
| | HSV-Hue | 0.015 | | |
| | HSV-Saturation | 0.7 | | |
| | HSV-Value | 0.4 | | |
| | Translation | 0.1 | | |
| | Scale | 0.5 | | |
| | Flip left-right | 0.5 | | |
| | Mosaic | 1.0 | | |
| | Erasing | 0.4 | | |
| | Close mosaic (last N epochs) | 10 | | |
| | Auto augment | RandAugment | | |
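The hyperparameter, loss-weight, and augmentation tables above map onto Ultralytics training arguments roughly as follows. The card states that training ran on Ultralytics HUB, so this is only a sketch of a comparable local run, not the authors' actual command. The base checkpoint name `yolo26m.pt` and the `data_yaml` path are assumptions.

```python
# Values copied from the tables above, keyed by Ultralytics training argument names.
TRAIN_ARGS = {
    # Schedule and optimizer
    "epochs": 150, "patience": 100, "batch": -1, "imgsz": 640,
    "optimizer": "auto", "lr0": 0.01, "lrf": 0.01,
    "momentum": 0.937, "weight_decay": 0.0005,
    "warmup_epochs": 3.0, "warmup_momentum": 0.8, "warmup_bias_lr": 0.1,
    "amp": True, "pretrained": True, "deterministic": True, "seed": 0,
    # Loss weights
    "box": 7.5, "cls": 0.5, "dfl": 1.5,
    # Augmentation
    "hsv_h": 0.015, "hsv_s": 0.7, "hsv_v": 0.4,
    "translate": 0.1, "scale": 0.5, "fliplr": 0.5,
    "mosaic": 1.0, "erasing": 0.4, "close_mosaic": 10,
    "auto_augment": "randaugment",
}


def train(data_yaml: str, base_weights: str = "yolo26m.pt"):
    """Fine-tune from a pretrained checkpoint with the settings above.

    data_yaml: path to a YOLO-format dataset YAML (hypothetical, not provided here).
    base_weights: assumed name of the pretrained YOLO26m checkpoint.
    """
    from ultralytics import YOLO  # imported lazily; needed only when training

    model = YOLO(base_weights)
    return model.train(data=data_yaml, **TRAIN_ARGS)
```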
| ## Usage | |
| ### Inference with Ultralytics | |
| ```python | |
| from ultralytics import YOLO | |
| # Load the fine-tuned weights (path to the downloaded .pt file) | |
| model = YOLO("Tibetan_modern_book_Layout_detection.pt") | |
| results = model.predict("page_image.jpg", imgsz=640) | |
| for result in results: | |
|     for box in result.boxes: | |
|         cls_id = int(box.cls) | |
|         name = result.names[cls_id]  # class name stored in the checkpoint | |
|         conf = float(box.conf) | |
|         xyxy = box.xyxy[0].tolist()  # [x1, y1, x2, y2] in pixels | |
|         print(f"Class: {name} ({cls_id}), Confidence: {conf:.3f}, Box: {xyxy}") | |
| ``` | |
| ### Batch Inference | |
| ```python | |
| from ultralytics import YOLO | |
| model = YOLO("Tibetan_modern_book_Layout_detection.pt") | |
| results = model.predict("path/to/images/", imgsz=640, conf=0.25) | |
| ``` | |
| ## Intended Use | |
| This model is designed for automatic layout detection of modern Tibetan book pages. It can be used as a preprocessing step for: | |
| - OCR pipelines on Tibetan documents | |
| - Document digitization workflows | |
| - Structured text extraction from scanned Tibetan texts | |
| - Digital library cataloging and indexing | |
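As an illustration of the OCR preprocessing use case, the sketch below filters detections down to the text-bearing classes and sorts them top to bottom before they would be cropped and passed to an OCR engine. The `Region` type, the `regions_for_ocr` helper, the choice of classes 1 and 2, and the 0.25 confidence threshold are all assumptions for the example, not part of the model or of Ultralytics.

```python
from typing import NamedTuple


class Region(NamedTuple):
    cls_id: int   # class id as in the Classes table
    conf: float   # detection confidence
    xyxy: tuple   # (x1, y1, x2, y2) in pixels


def regions_for_ocr(regions, keep_classes=(1, 2), min_conf=0.25):
    """Keep confident text-bearing regions, ordered top-to-bottom then left-to-right."""
    kept = [r for r in regions if r.cls_id in keep_classes and r.conf >= min_conf]
    return sorted(kept, key=lambda r: (r.xyxy[1], r.xyxy[0]))


# Made-up detections for a single page, in arbitrary order.
detections = [
    Region(3, 0.91, (40, 950, 600, 990)),   # footer
    Region(2, 0.80, (40, 800, 600, 930)),   # footnote
    Region(1, 0.97, (40, 80, 600, 780)),    # Text area
    Region(0, 0.93, (40, 20, 600, 60)),     # header
    Region(1, 0.10, (300, 400, 350, 450)),  # low-confidence box, dropped
]
ordered = regions_for_ocr(detections)
print([r.cls_id for r in ordered])  # prints [1, 2]
```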
| ## Limitations | |
| - Trained primarily on modern Tibetan book layouts; performance on historical manuscripts, woodblock prints, or non-standard layouts may vary. | |
| - Optimized for 640×640 input resolution; very high-resolution pages may benefit from tiling or higher `imgsz` values. | |
| - The footnote class has fewer training samples (456 annotations) compared to other classes, which may affect detection quality for that class. | |
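For the tiling option mentioned above, one generic approach is to cover the page with overlapping 640-pixel windows and run inference on each. The sketch below only computes the window coordinates. `tile_windows` and the 20% overlap are assumptions, and mapping tile-level boxes back to page coordinates and merging duplicates is not shown. Since layout regions such as the text area can span most of a page, a larger `imgsz` may be the simpler option in practice.

```python
def tile_windows(width: int, height: int, tile: int = 640, overlap: float = 0.2):
    """Return (x1, y1, x2, y2) windows that cover the image with the given overlap."""
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    stride = max(1, round(tile * (1 - overlap)))

    def starts(size: int):
        # A single window suffices when the image is no larger than the tile.
        if size <= tile:
            return [0]
        positions = list(range(0, size - tile, stride))
        positions.append(size - tile)  # final window sits flush with the far edge
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


windows = tile_windows(1000, 1400)
print(len(windows))  # prints 6
```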
| ## License | |
| This model is released under **CC0 1.0 Universal (Public Domain Dedication)**. You may copy, modify, and distribute it, including for commercial purposes, without asking permission. | |
| ## Acknowledgements | |
| This dataset was developed by Dharmaduta from specifications provided by the Buddhist Digital Resource Center (BDRC) for the BDRC Etext Corpus, with funding from the Khyentse Foundation. | |
| ## Citation | |
| If you use this model, please cite it as follows: | |
| ```bibtex | |
| @software{bdrc_tmbld_yolo26m_2026, | |
|   title   = {TMBLD-YOLO26m: Tibetan Modern Book Layout Detection Model}, | |
|   author  = {Buddhist Digital Resource Center (BDRC)}, | |
|   year    = {2026}, | |
|   url     = {https://huggingface.co/BDRC/TDLA-YOLO26m}, | |
|   license = {CC0-1.0} | |
| } | |
| ``` |