---
library_name: ultralytics
task: object-detection
tags:
- yolo
- yolo26
- tibetan
- document-layout-analysis
- object-detection
- bounding-box
- BDRC
language:
- bo
license: cc0-1.0
datasets:
- BDRC/TDLA-Training-Dataset
metrics:
- mAP50
- mAP50-95
- precision
- recall
model-index:
- name: TDLA-YOLO26m
  results:
  - task:
      type: object-detection
      name: Object Detection
    dataset:
      type: BDRC/TDLA-Training-Dataset
      name: TDLA Training Dataset
      split: val
    metrics:
    - type: mAP50
      value: 0.982
      name: mAP@0.5
    - type: mAP50-95
      value: 0.799
      name: mAP@0.5:0.95
    - type: precision
      value: 0.966
      name: Precision
    - type: recall
      value: 0.970
      name: Recall
---
# TMBLD-YOLO26m: Tibetan Modern Book Layout Detection
A fine-tuned **YOLO26m** object-detection model for **Tibetan modern book layout detection**. The model detects four layout classes in page images of modern Tibetan books: **header**, **Text area**, **footnote**, and **footer**.
## Model Description
This model was fine-tuned from the Ultralytics YOLO26m pretrained checkpoint on the [BDRC/TDLA-Training-Dataset](https://huggingface.co/datasets/BDRC/TDLA-Training-Dataset), a YOLO-format bounding-box dataset of Tibetan document pages sourced from the Buddhist Digital Resource Center (BDRC) digital library.
| Property | Value |
| --- | --- |
| **Architecture** | YOLO26m |
| **Task** | Object Detection |
| **Image size** | 640 × 640 |
| **Number of classes** | 4 |
| **Training platform** | Ultralytics HUB |
| **Weights file** | `Tibetan_modern_book_Layout_detection.pt` |
## Classes
| ID | Class | Description |
| --- | --- | --- |
| 0 | header | Page header region |
| 1 | Text area | Main body text region |
| 2 | footnote | Footnote region |
| 3 | footer | Page footer region |
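When post-processing predictions it can help to map numeric class IDs back to names. The lookup below is a minimal sketch that restates the table above; `label_for` is a hypothetical helper, not part of the released model. A loaded Ultralytics model normally exposes the same mapping as `model.names`, which should be preferred when available.

```python
# Class IDs and names copied from the Classes table of this model card.
CLASS_NAMES = {
    0: "header",
    1: "Text area",
    2: "footnote",
    3: "footer",
}


def label_for(cls_id: int) -> str:
    """Return the class name for a predicted ID (hypothetical helper)."""
    return CLASS_NAMES.get(cls_id, f"unknown({cls_id})")


if __name__ == "__main__":
    for cls_id in sorted(CLASS_NAMES):
        print(cls_id, label_for(cls_id))
```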
## Performance
Evaluated on the validation split of the TDLA Training Dataset.
| Metric | Value |
| --- | --- |
| **Precision** | 0.966 |
| **Recall** | 0.970 |
| **mAP@0.5** | 0.982 |
| **mAP@0.5:0.95** | 0.799 |
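The card does not report an F1 score, but one can be derived from the aggregate precision and recall above. This is a sketch of that arithmetic only; an F1 averaged per class could differ from this value.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Aggregate validation metrics taken from the table above.
precision = 0.966
recall = 0.970

f1 = f1_score(precision, recall)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.968"
```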
### Training Loss (final epoch)
| Loss Component | Train | Val |
| --- | --- | --- |
| Box loss | 0.515 | 0.643 |
| Classification loss | 0.218 | 0.276 |
| DFL loss | 0.003 | 0.004 |
## Training Details
### Dataset
- **Dataset:** [BDRC/TDLA-Training-Dataset](https://huggingface.co/datasets/BDRC/TDLA-Training-Dataset)
- **Train images:** 2,692
- **Val images:** 103
- **Test images:** 313
- **Total annotations:** 14,705
- **Train/Val split:** Iterative multi-label stratification (seed 42, 80/20 ratio)
### Hyperparameters
| Parameter | Value |
| --- | --- |
| Epochs | 150 |
| Patience | 100 |
| Batch size | Auto (-1) |
| Image size | 640 |
| Optimizer | Auto (SGD) |
| Initial learning rate (lr0) | 0.01 |
| Final learning rate factor (lrf) | 0.01 |
| Momentum | 0.937 |
| Weight decay | 0.0005 |
| Warmup epochs | 3.0 |
| Warmup momentum | 0.8 |
| Warmup bias lr | 0.1 |
| AMP (mixed precision) | True |
| Pretrained | True |
| Deterministic | True |
| Seed | 0 |
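To illustrate how `lr0` and `lrf` interact, the sketch below computes the per-epoch learning rate under a linear decay from `lr0` to `lr0 * lrf`. It assumes the Ultralytics default linear schedule (cosine scheduling disabled) and ignores the warmup phase; the card does not state which scheduler was used, and how the `auto` optimizer resolved the listed values is also not detailed, so treat the numbers as illustrative.

```python
def linear_lr(epoch: int, epochs: int = 150, lr0: float = 0.01, lrf: float = 0.01) -> float:
    """Learning rate at `epoch` for a linear decay from lr0 to lr0 * lrf.

    Assumption: linear schedule, no warmup. Defaults are the values
    listed in the Hyperparameters table.
    """
    factor = max(1 - epoch / epochs, 0) * (1.0 - lrf) + lrf
    return lr0 * factor


if __name__ == "__main__":
    for epoch in (0, 75, 150):
        print(f"epoch {epoch:3d}: lr = {linear_lr(epoch):.6f}")
```

Under these assumptions the rate starts at 0.01 and ends at 0.01 × 0.01 = 0.0001.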
### Loss Weights
| Component | Weight |
| --- | --- |
| Box | 7.5 |
| Classification | 0.5 |
| DFL | 1.5 |
### Augmentation
| Augmentation | Value |
| --- | --- |
| HSV-Hue | 0.015 |
| HSV-Saturation | 0.7 |
| HSV-Value | 0.4 |
| Translation | 0.1 |
| Scale | 0.5 |
| Flip left-right | 0.5 |
| Mosaic | 1.0 |
| Erasing | 0.4 |
| Close mosaic (last N epochs) | 10 |
| Auto augment | RandAugment |
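The hyperparameter, loss-weight, and augmentation tables above correspond to standard Ultralytics training arguments. The sketch below collects them into one dictionary that could be passed to `model.train`. The model was trained on Ultralytics HUB, so this is an approximate local reproduction and not the authors' exact procedure; the checkpoint name `yolo26m.pt` and the `data_yaml` path are assumptions.

```python
# Training arguments transcribed from the tables in this card.
TRAIN_ARGS = {
    # Hyperparameters
    "epochs": 150, "patience": 100, "batch": -1, "imgsz": 640,
    "optimizer": "auto", "lr0": 0.01, "lrf": 0.01,
    "momentum": 0.937, "weight_decay": 0.0005,
    "warmup_epochs": 3.0, "warmup_momentum": 0.8, "warmup_bias_lr": 0.1,
    "amp": True, "pretrained": True, "deterministic": True, "seed": 0,
    # Loss weights
    "box": 7.5, "cls": 0.5, "dfl": 1.5,
    # Augmentation
    "hsv_h": 0.015, "hsv_s": 0.7, "hsv_v": 0.4,
    "translate": 0.1, "scale": 0.5, "fliplr": 0.5,
    "mosaic": 1.0, "erasing": 0.4, "close_mosaic": 10,
    "auto_augment": "randaugment",
}


def train(data_yaml: str, weights: str = "yolo26m.pt"):
    """Fine-tune with the settings above.

    `weights` and `data_yaml` are assumed names: point them at the
    pretrained checkpoint and the dataset's YOLO-format YAML file.
    """
    from ultralytics import YOLO  # imported lazily; requires `pip install ultralytics`

    model = YOLO(weights)
    return model.train(data=data_yaml, **TRAIN_ARGS)
```

A call such as `train("path/to/dataset.yaml")` would start a run; results will not necessarily match the reported metrics, since hardware and library versions affect training.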
## Usage
### Inference with Ultralytics
```python
from ultralytics import YOLO

# Load the fine-tuned weights
model = YOLO("Tibetan_modern_book_Layout_detection.pt")

# Run inference on a single page image at the training resolution
results = model.predict("page_image.jpg", imgsz=640)

for result in results:
    for box in result.boxes:
        cls_id = int(box.cls)        # class index (0-3, see Classes)
        conf = float(box.conf)       # confidence score
        xyxy = box.xyxy[0].tolist()  # [x1, y1, x2, y2] in pixels
        print(f"Class: {cls_id}, Confidence: {conf:.3f}, Box: {xyxy}")
```
### Batch Inference
```python
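from ultralytics import YOLO

model = YOLO("Tibetan_modern_book_Layout_detection.pt")

# Passing a directory runs inference on every image it contains;
# detections below the confidence threshold (0.25) are discarded.
results = model.predict("path/to/images/", imgsz=640, conf=0.25)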
```
## Intended Use
This model is designed for automatic layout detection of modern Tibetan book pages. It can be used as a preprocessing step for:
- OCR pipelines on Tibetan documents
- Document digitization workflows
- Structured text extraction from scanned Tibetan texts
- Digital library cataloging and indexing
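As one example of the OCR preprocessing step, the sketch below selects the main text regions from a page's detections and orders them top to bottom, then left to right, so they can be cropped and passed to an OCR engine in sequence. The `Detection` type and the function name are hypothetical, and the sample boxes are made-up values for illustration. Multi-column pages would need a more careful ordering than this simple sort.

```python
from typing import List, NamedTuple, Tuple


class Detection(NamedTuple):
    """One predicted box (hypothetical container for model output)."""
    cls_id: int
    conf: float
    xyxy: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def text_regions_in_reading_order(
    detections: List[Detection], text_cls: int = 1, min_conf: float = 0.25
) -> List[Detection]:
    """Keep 'Text area' boxes above min_conf, sorted by top edge, then left edge."""
    kept = [d for d in detections if d.cls_id == text_cls and d.conf >= min_conf]
    return sorted(kept, key=lambda d: (d.xyxy[1], d.xyxy[0]))


if __name__ == "__main__":
    # Illustrative values only, not real model output.
    sample = [
        Detection(1, 0.91, (50, 400, 600, 780)),
        Detection(0, 0.88, (50, 10, 600, 40)),
        Detection(1, 0.95, (50, 60, 600, 380)),
        Detection(1, 0.10, (50, 800, 600, 820)),  # dropped: low confidence
    ]
    for det in text_regions_in_reading_order(sample):
        print(det.xyxy)
```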
## Limitations
- Trained primarily on modern Tibetan book layouts; performance on historical manuscripts, woodblock prints, or non-standard layouts may vary.
- Optimized for 640×640 input resolution; very high-resolution pages may benefit from tiling or higher `imgsz` values.
- The footnote class has fewer training samples (456 annotations) compared to other classes, which may affect detection quality for that class.
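For the tiling option mentioned above, the sketch below computes overlapping 640 × 640 windows that cover a large page. The function names and the 64-pixel overlap are assumptions chosen for illustration. Running the model on each window and merging detections across tiles (for example with non-maximum suppression) is not shown.

```python
from typing import List, Tuple


def tile_starts(length: int, tile: int = 640, overlap: int = 64) -> List[int]:
    """Start offsets along one axis so that tiles cover [0, length)."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be in [0, tile)")
    if length <= tile:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # final tile is flush with the far edge
    return starts


def tile_windows(
    width: int, height: int, tile: int = 640, overlap: int = 64
) -> List[Tuple[int, int, int, int]]:
    """Return (x1, y1, x2, y2) crop windows in row-major order."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_starts(height, tile, overlap)
        for x in tile_starts(width, tile, overlap)
    ]


if __name__ == "__main__":
    for window in tile_windows(1000, 640):
        print(window)
```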
## License
This model is released under **CC0 1.0 Universal (Public Domain Dedication)**. You are free to copy, modify, and distribute the model, including for commercial purposes, without asking permission.
## Acknowledgements
This dataset was developed by Dharmaduta from specifications provided by the Buddhist Digital Resource Center (BDRC) for the BDRC Etext Corpus, with funding from the Khyentse Foundation.
## Citation
If you use this model, please cite it as:
```bibtex
@software{bdrc_tmbld_yolo26m_2026,
  title   = {TMBLD-YOLO26m: Tibetan Modern Book Layout Detection Model},
  author  = {Buddhist Digital Resource Center (BDRC)},
  year    = {2026},
  url     = {https://huggingface.co/BDRC/TDLA-YOLO26m},
  license = {CC0-1.0}
}
```