---
library_name: ultralytics
task: object-detection
tags:
  - yolo
  - yolo26
  - tibetan
  - document-layout-analysis
  - object-detection
  - bounding-box
  - BDRC
language:
  - bo
license: cc0-1.0
datasets:
  - BDRC/TDLA-Training-Dataset
metrics:
  - mAP50
  - mAP50-95
  - precision
  - recall
model-index:
  - name: TDLA-YOLO26m
    results:
      - task:
          type: object-detection
          name: Object Detection
        dataset:
          type: BDRC/TDLA-Training-Dataset
          name: TDLA Training Dataset
          split: val
        metrics:
          - type: mAP50
            value: 0.982
            name: mAP@0.5
          - type: mAP50-95
            value: 0.799
            name: mAP@0.5:0.95
          - type: precision
            value: 0.966
            name: Precision
          - type: recall
            value: 0.970
            name: Recall
---

# TMBLD-YOLO26m: Tibetan Modern Book Layout Detection

A fine-tuned **YOLO26m** object-detection model for **Tibetan modern book layout detection**. It detects four layout classes in page images of modern Tibetan books: **header**, **Text area**, **footnote**, and **footer**.

## Model Description

This model was fine-tuned from the Ultralytics YOLO26m pretrained checkpoint on the [BDRC/TDLA-Training-Dataset](https://huggingface.co/datasets/BDRC/TDLA-Training-Dataset), a YOLO-format bounding-box dataset of Tibetan document pages sourced from the Buddhist Digital Resource Center (BDRC) digital library.

| Property | Value |
| --- | --- |
| **Architecture** | YOLO26m |
| **Task** | Object Detection |
| **Image size** | 640 × 640 |
| **Number of classes** | 4 |
| **Training platform** | Ultralytics HUB |
| **Weights file** | `Tibetan_modern_book_Layout_detection.pt` |

## Classes

| ID | Class | Description |
| --- | --- | --- |
| 0 | header | Page header region |
| 1 | Text area | Main body text region |
| 2 | footnote | Footnote region |
| 3 | footer | Page footer region |
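
For downstream code, the table above can be mirrored as a small lookup. This is a minimal sketch: `CLASS_NAMES` and `class_name` are illustrative names that are not part of the released model, and the mapping stored in the checkpoint itself (`model.names` in Ultralytics) should be treated as authoritative.

```python
# Illustrative lookup mirroring the class table; the checkpoint's own
# `model.names` mapping is authoritative if the two ever disagree.
CLASS_NAMES = {
    0: "header",
    1: "Text area",
    2: "footnote",
    3: "footer",
}


def class_name(cls_id: int) -> str:
    """Return the layout class name for a predicted class ID."""
    try:
        return CLASS_NAMES[cls_id]
    except KeyError:
        raise ValueError(f"unknown class ID: {cls_id}") from None


if __name__ == "__main__":
    print(class_name(1))
```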

## Performance

Evaluated on the validation split of the TDLA Training Dataset.

| Metric | Value |
| --- | --- |
| **Precision** | 0.966 |
| **Recall** | 0.970 |
| **mAP@0.5** | 0.982 |
| **mAP@0.5:0.95** | 0.799 |
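
Precision and recall from the table above can be combined into a single F1 score. F1 is not among the reported metrics; the snippet below is derived arithmetic only, not an additional evaluation result.

```python
# Precision and recall as reported in the table above.
precision = 0.966
recall = 0.970

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"F1 = {f1:.3f}")
```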

### Training Loss (final epoch)

| Loss Component | Train | Val |
| --- | --- | --- |
| Box loss | 0.515 | 0.643 |
| Classification loss | 0.218 | 0.276 |
| DFL loss | 0.003 | 0.004 |

## Training Details

### Dataset

- **Dataset:** [BDRC/TDLA-Training-Dataset](https://huggingface.co/datasets/BDRC/TDLA-Training-Dataset)
- **Train images:** 2,692
- **Val images:** 103
- **Test images:** 313
- **Total annotations:** 14,705
- **Train/Val split:** Iterative multi-label stratification (seed 42, 80/20 ratio)

### Hyperparameters

| Parameter | Value |
| --- | --- |
| Epochs | 150 |
| Patience | 100 |
| Batch size | Auto (-1) |
| Image size | 640 |
| Optimizer | Auto (SGD) |
| Initial learning rate (lr0) | 0.01 |
| Final learning rate factor (lrf) | 0.01 |
| Momentum | 0.937 |
| Weight decay | 0.0005 |
| Warmup epochs | 3.0 |
| Warmup momentum | 0.8 |
| Warmup bias lr | 0.1 |
| AMP (mixed precision) | True |
| Pretrained | True |
| Deterministic | True |
| Seed | 0 |
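
To show how `lr0` and `lrf` interact, the sketch below computes the learning rate per epoch under a linear decay from `lr0` to `lr0 * lrf`. Several assumptions apply: the card does not state whether linear or cosine scheduling was used, warmup is ignored, and with the `Auto` optimizer setting the trainer may choose its own initial learning rate. `lr_at` is an illustrative helper, not an Ultralytics function.

```python
LR0 = 0.01      # initial learning rate (lr0) from the table
LRF = 0.01      # final learning rate factor (lrf) from the table
EPOCHS = 150    # configured number of epochs from the table


def lr_at(epoch: int, lr0: float = LR0, lrf: float = LRF, epochs: int = EPOCHS) -> float:
    """Learning rate under an assumed linear decay from lr0 to lr0 * lrf (warmup ignored)."""
    factor = max(1 - epoch / epochs, 0) * (1.0 - lrf) + lrf
    return lr0 * factor


if __name__ == "__main__":
    for epoch in (0, 75, 150):
        print(epoch, lr_at(epoch))
```

Under these assumptions the rate starts at `lr0` and approaches `lr0 * lrf` at the end of training.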

### Loss Weights

| Component | Weight |
| --- | --- |
| Box | 7.5 |
| Classification | 0.5 |
| DFL | 1.5 |

### Augmentation

| Augmentation | Value |
| --- | --- |
| HSV-Hue | 0.015 |
| HSV-Saturation | 0.7 |
| HSV-Value | 0.4 |
| Translation | 0.1 |
| Scale | 0.5 |
| Flip left-right | 0.5 |
| Mosaic | 1.0 |
| Erasing | 0.4 |
| Close mosaic (last N epochs) | 10 |
| Auto augment | RandAugment |
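
The hyperparameter, loss-weight, and augmentation tables above map onto Ultralytics training arguments roughly as follows. This is a sketch for local reproduction, not the original training command (the model was trained on Ultralytics HUB). The dataset file `tdla.yaml` and the checkpoint name `yolo26m.pt` are hypothetical placeholders, and `train` is an illustrative wrapper.

```python
# Values copied from the Hyperparameters, Loss Weights, and Augmentation tables.
TRAIN_ARGS = {
    "epochs": 150,
    "patience": 100,
    "batch": -1,              # auto batch size
    "imgsz": 640,
    "optimizer": "auto",
    "lr0": 0.01,
    "lrf": 0.01,
    "momentum": 0.937,
    "weight_decay": 0.0005,
    "warmup_epochs": 3.0,
    "warmup_momentum": 0.8,
    "warmup_bias_lr": 0.1,
    "amp": True,
    "pretrained": True,
    "deterministic": True,
    "seed": 0,
    "box": 7.5,               # box loss weight
    "cls": 0.5,               # classification loss weight
    "dfl": 1.5,               # DFL loss weight
    "hsv_h": 0.015,
    "hsv_s": 0.7,
    "hsv_v": 0.4,
    "translate": 0.1,
    "scale": 0.5,
    "fliplr": 0.5,
    "mosaic": 1.0,
    "erasing": 0.4,
    "close_mosaic": 10,       # disable mosaic for the last 10 epochs
    "auto_augment": "randaugment",
}


def train(data_yaml: str = "tdla.yaml", weights: str = "yolo26m.pt"):
    """Fine-tune with the settings above; both default paths are hypothetical."""
    # Imported lazily so the configuration can be inspected without Ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights)
    return model.train(data=data_yaml, **TRAIN_ARGS)
```

Calling `train()` starts a run; nothing executes on import, so the dictionary can also be reused or logged on its own.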

## Usage

### Inference with Ultralytics

```python
from ultralytics import YOLO

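# Load the fine-tuned weights (path relative to the working directory)
model = YOLO("Tibetan_modern_book_Layout_detection.pt")

# Run inference on a single page image at the training resolution
results = model.predict("page_image.jpg", imgsz=640)

for result in results:
    for box in result.boxes:
        cls_id = int(box.cls[0])        # class index (0-3)
        label = result.names[cls_id]    # class name stored in the checkpoint
        conf = float(box.conf[0])       # confidence score
        xyxy = box.xyxy[0].tolist()     # [x1, y1, x2, y2] in pixels
        print(f"Class: {label} ({cls_id}), Confidence: {conf:.3f}, Box: {xyxy}")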
```

### Batch Inference

```python
from ultralytics import YOLO

model = YOLO("Tibetan_modern_book_Layout_detection.pt")

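# Passing a directory runs inference on every image it contains;
# conf=0.25 discards detections below 25% confidence.
results = model.predict("path/to/images/", imgsz=640, conf=0.25)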
```

## Intended Use

This model is designed for automatic layout detection of modern Tibetan book pages. It can be used as a preprocessing step for:

- OCR pipelines on Tibetan documents
- Document digitization workflows
- Structured text extraction from scanned Tibetan texts
- Digital library cataloging and indexing
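
As an illustration of the OCR preprocessing use case, detections can be filtered and sorted into a rough reading order before cropping. This is a sketch only: the `Region` type and the `reading_order` helper are hypothetical names, and sorting by the top edge is an assumed heuristic that suits single-column pages, not a method described by the model authors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """One detected layout region (hypothetical container, not an Ultralytics type)."""
    label: str
    conf: float
    x1: float
    y1: float
    x2: float
    y2: float


def reading_order(regions: list[Region], min_conf: float = 0.25) -> list[Region]:
    """Drop low-confidence regions and sort the rest top-to-bottom, then left-to-right."""
    kept = [r for r in regions if r.conf >= min_conf]
    return sorted(kept, key=lambda r: (r.y1, r.x1))


if __name__ == "__main__":
    # Made-up coordinates for demonstration, not real model output.
    page = [
        Region("footer", 0.91, 40, 980, 600, 1010),
        Region("Text area", 0.97, 40, 90, 600, 900),
        Region("header", 0.88, 40, 20, 600, 60),
        Region("footnote", 0.12, 40, 910, 600, 970),  # below threshold, dropped
    ]
    print([r.label for r in reading_order(page)])
```

Each `Region` could be filled from a prediction's class name, confidence, and `xyxy` box; multi-column layouts would need a more careful ordering rule.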

## Limitations

- Trained primarily on modern Tibetan book layouts; performance on historical manuscripts, woodblock prints, or non-standard layouts may vary.
- Optimized for 640×640 input resolution; very high-resolution pages may benefit from tiling or higher `imgsz` values.
- The footnote class has fewer training samples (456 annotations) compared to other classes, which may affect detection quality for that class.
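
The tiling suggestion above could be approached by splitting a large page into overlapping windows, running the model on each window, and shifting the resulting boxes back into page coordinates. The sketch below covers only the window computation. `tile_starts`, `tile_windows`, and the 64-pixel overlap are illustrative assumptions, and merging duplicate detections across tiles is left out.

```python
def tile_starts(length: int, tile: int, overlap: int) -> list[int]:
    """Start offsets of tiles that cover `length` pixels with the given overlap."""
    if tile <= overlap:
        raise ValueError("tile must be larger than overlap")
    if length <= tile:
        return [0]
    step = tile - overlap
    starts = list(range(0, length - tile, step))
    starts.append(length - tile)  # final tile sits flush with the far edge
    return starts


def tile_windows(width: int, height: int, tile: int = 640, overlap: int = 64):
    """Return (x1, y1, x2, y2) windows covering a width x height page, row by row."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_starts(height, tile, overlap)
        for x in tile_starts(width, tile, overlap)
    ]


if __name__ == "__main__":
    for window in tile_windows(1000, 640):
        print(window)
```

A box predicted inside a window at `(x1, y1)` maps back to the page by adding `x1` and `y1` to its coordinates.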

## License

This model is released under **CC0 1.0 Universal (Public Domain Dedication)**. You are free to copy, modify, and distribute the model, even for commercial purposes, without asking permission.

## Acknowledgements

This dataset was developed by Dharmaduta from specifications provided by the Buddhist Digital Resource Center (BDRC) for the BDRC Etext Corpus, with funding from the Khyentse Foundation.

## Citation

If you use this model, please cite it as follows:

```bibtex
@software{bdrc_tmbld_yolo26m_2026,
  title   = {TMBLD-YOLO26m: Tibetan Modern Book Layout Detection Model},
  author  = {Buddhist Digital Resource Center (BDRC)},
  year    = {2026},
  url     = {https://huggingface.co/BDRC/TDLA-YOLO26m},
  license = {CC0-1.0}
}
```