---
library_name: ultralytics
task: object-detection
tags:
  - yolo
  - yolo26
  - tibetan
  - document-layout-analysis
  - object-detection
  - bounding-box
  - BDRC
language:
  - bo
license: cc0-1.0
datasets:
  - BDRC/TDLA-Training-Dataset
metrics:
  - mAP50
  - mAP50-95
  - precision
  - recall
model-index:
  - name: TDLA-YOLO26m
    results:
      - task:
          type: object-detection
          name: Object Detection
        dataset:
          type: BDRC/TDLA-Training-Dataset
          name: TDLA Training Dataset
          split: val
        metrics:
          - type: mAP50
            value: 0.982
            name: mAP@0.5
          - type: mAP50-95
            value: 0.799
            name: mAP@0.5:0.95
          - type: precision
            value: 0.966
            name: Precision
          - type: recall
            value: 0.970
            name: Recall
---

# TMBLD-YOLO26m: Tibetan Modern Book Layout Detection

A fine-tuned **YOLO26m** object-detection model for **Tibetan modern book layout detection**. The model detects four layout classes in page images of modern Tibetan books: **header**, **Text area**, **footnote**, and **footer**.

## Model Description

This model was fine-tuned from the Ultralytics YOLO26m pretrained checkpoint on the [BDRC/TDLA-Training-Dataset](https://huggingface.co/datasets/BDRC/TDLA-Training-Dataset), a YOLO-format bounding-box dataset of Tibetan document pages sourced from the Buddhist Digital Resource Center (BDRC) digital library.

| Property | Value |
| --- | --- |
| **Architecture** | YOLO26m |
| **Task** | Object Detection |
| **Image size** | 640 × 640 |
| **Number of classes** | 4 |
| **Training platform** | Ultralytics HUB |
| **Weights file** | `Tibetan_modern_book_Layout_detection.pt` |

## Classes

| ID | Class | Description |
| --- | --- | --- |
| 0 | header | Page header region |
| 1 | Text area | Main body text region |
| 2 | footnote | Footnote region |
| 3 | footer | Page footer region |
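
A small lookup like the one below can make prediction output easier to read. It is only a sketch: the IDs and names are copied from the table above, and `class_name` is a hypothetical helper, not something shipped with the weights. A loaded Ultralytics model also exposes the same mapping as `model.names`.

```python
# Class IDs and names copied from the Classes table.
CLASS_NAMES = {0: "header", 1: "Text area", 2: "footnote", 3: "footer"}


def class_name(cls_id: int) -> str:
    """Return the layout class name for a predicted class ID."""
    return CLASS_NAMES.get(cls_id, f"unknown({cls_id})")


print(class_name(1))  # Text area
```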

## Performance

Evaluated on the validation split of the TDLA Training Dataset.

| Metric | Value |
| --- | --- |
| **Precision** | 0.966 |
| **Recall** | 0.970 |
| **mAP@0.5** | 0.982 |
| **mAP@0.5:0.95** | 0.799 |
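
For readers who prefer a single summary number, an F1 score can be derived from the precision and recall above. This figure is not reported by the authors; it is computed here from the aggregate values in the table, so it only approximates a per-class F1 average.

```python
# Precision and recall copied from the Performance table.
precision = 0.966
recall = 0.970

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f"F1 = {f1:.3f}")  # F1 = 0.968
```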

### Training Loss (final epoch)

| Loss Component | Train | Val |
| --- | --- | --- |
| Box loss | 0.515 | 0.643 |
| Classification loss | 0.218 | 0.276 |
| DFL loss | 0.003 | 0.004 |

## Training Details

### Dataset

- **Dataset:** [BDRC/TDLA-Training-Dataset](https://huggingface.co/datasets/BDRC/TDLA-Training-Dataset)
- **Train images:** 2,692
- **Val images:** 103
- **Test images:** 313
- **Total annotations:** 14,705
- **Train/Val split:** Iterative multi-label stratification (seed 42, 80/20 ratio)

### Hyperparameters

| Parameter | Value |
| --- | --- |
| Epochs | 150 |
| Patience | 100 |
| Batch size | Auto (-1) |
| Image size | 640 |
| Optimizer | Auto (SGD) |
| Initial learning rate (lr0) | 0.01 |
| Final learning rate factor (lrf) | 0.01 |
| Momentum | 0.937 |
| Weight decay | 0.0005 |
| Warmup epochs | 3.0 |
| Warmup momentum | 0.8 |
| Warmup bias lr | 0.1 |
| AMP (mixed precision) | True |
| Pretrained | True |
| Deterministic | True |
| Seed | 0 |
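
The `lr0` and `lrf` values together fix the end point of the learning-rate schedule: the final rate is `lr0 * lrf`. The sketch below assumes a linear decay over the listed 150 epochs, ignores the warmup phase, and assumes `lr0` is used unchanged by the automatically selected optimizer. None of this is confirmed by the card, and `linear_lr` is a hypothetical helper.

```python
def linear_lr(epoch: int, epochs: int = 150, lr0: float = 0.01, lrf: float = 0.01) -> float:
    """Learning rate at a given epoch under an assumed linear decay (no warmup)."""
    remaining = max(1.0 - epoch / epochs, 0.0)  # 1.0 at the start, 0.0 at the end
    return lr0 * (remaining * (1.0 - lrf) + lrf)


for epoch in (0, 75, 150):
    print(epoch, f"{linear_lr(epoch):.6f}")
```

Under these assumptions the rate starts at 0.01 and ends at 0.01 × 0.01 = 0.0001.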

### Loss Weights

| Component | Weight |
| --- | --- |
| Box | 7.5 |
| Classification | 0.5 |
| DFL | 1.5 |

### Augmentation

| Augmentation | Value |
| --- | --- |
| HSV-Hue | 0.015 |
| HSV-Saturation | 0.7 |
| HSV-Value | 0.4 |
| Translation | 0.1 |
| Scale | 0.5 |
| Flip left-right | 0.5 |
| Mosaic | 1.0 |
| Erasing | 0.4 |
| Close mosaic (last N epochs) | 10 |
| Auto augment | RandAugment |
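
The settings in the tables above map onto Ultralytics training arguments. The card states that training ran on Ultralytics HUB, so the following is only a sketch of how a comparable local run might be configured. The base checkpoint name `yolo26m.pt`, the `data_yaml` path, and the `train` helper are assumptions, not details taken from the original run.

```python
# Values copied from the Hyperparameters, Loss Weights, and Augmentation tables.
TRAIN_ARGS = {
    "epochs": 150, "patience": 100, "batch": -1, "imgsz": 640,
    "optimizer": "auto", "lr0": 0.01, "lrf": 0.01,
    "momentum": 0.937, "weight_decay": 0.0005,
    "warmup_epochs": 3.0, "warmup_momentum": 0.8, "warmup_bias_lr": 0.1,
    "amp": True, "pretrained": True, "deterministic": True, "seed": 0,
    # Loss gains
    "box": 7.5, "cls": 0.5, "dfl": 1.5,
    # Augmentation
    "hsv_h": 0.015, "hsv_s": 0.7, "hsv_v": 0.4,
    "translate": 0.1, "scale": 0.5, "fliplr": 0.5,
    "mosaic": 1.0, "erasing": 0.4, "close_mosaic": 10,
    "auto_augment": "randaugment",
}


def train(data_yaml: str, weights: str = "yolo26m.pt"):
    """Start a local training run (hypothetical helper; checkpoint name is assumed)."""
    from ultralytics import YOLO  # imported lazily so the config can be inspected without it

    model = YOLO(weights)
    return model.train(data=data_yaml, **TRAIN_ARGS)
```

Calling `train("path/to/data.yaml")` would start a run with these settings, where the YAML file is a dataset description you prepare yourself.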

## Usage

### Inference with Ultralytics

```python
from ultralytics import YOLO

model = YOLO("Tibetan_modern_book_Layout_detection.pt")

results = model.predict("page_image.jpg", imgsz=640)

for result in results:
    for box in result.boxes:
        cls_id = int(box.cls[0])     # predicted class ID
        conf = float(box.conf[0])    # confidence score
        xyxy = box.xyxy[0].tolist()  # [x1, y1, x2, y2] in pixels
        print(f"Class: {result.names[cls_id]}, Confidence: {conf:.3f}, Box: {xyxy}")
```

### Batch Inference

```python
from ultralytics import YOLO

model = YOLO("Tibetan_modern_book_Layout_detection.pt")

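# Passing a directory runs prediction on every image inside it
results = model.predict("path/to/images/", imgsz=640, conf=0.25)

for result in results:
    print(result.path, len(result.boxes))  # source image path and detection count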
```

## Intended Use

This model is designed for automatic layout detection of modern Tibetan book pages. It can be used as a preprocessing step for:

- OCR pipelines on Tibetan documents
- Document digitization workflows
- Structured text extraction from scanned Tibetan texts
- Digital library cataloging and indexing
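
As one illustration of the preprocessing role described above, detections can be grouped by class and ordered top to bottom before the regions are cropped and passed to an OCR engine. This is a minimal sketch: `group_regions` and the `(cls_id, conf, xyxy)` tuple format are hypothetical, and the sample boxes are made up for the example.

```python
from collections import defaultdict

# Class IDs and names copied from the Classes table.
CLASS_NAMES = {0: "header", 1: "Text area", 2: "footnote", 3: "footer"}


def group_regions(detections, min_conf=0.25):
    """Group (cls_id, conf, [x1, y1, x2, y2]) tuples by class name, top to bottom."""
    regions = defaultdict(list)
    for cls_id, conf, xyxy in detections:
        if conf >= min_conf:
            regions[CLASS_NAMES[cls_id]].append(xyxy)
    for boxes in regions.values():
        boxes.sort(key=lambda b: (b[1], b[0]))  # sort by top edge, then left edge
    return dict(regions)


# Made-up detections for a single page.
sample = [
    (1, 0.95, [50, 400, 590, 700]),
    (0, 0.90, [50, 20, 590, 60]),
    (1, 0.93, [50, 80, 590, 380]),
    (2, 0.10, [50, 720, 590, 780]),  # below min_conf, dropped
]
print(group_regions(sample))
```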

## Limitations

- Trained primarily on modern Tibetan book layouts; performance on historical manuscripts, woodblock prints, or non-standard layouts may vary.
- Optimized for 640×640 input resolution; very high-resolution pages may benefit from tiling or higher `imgsz` values.
- The footnote class has fewer training samples (456 annotations) than the other classes, which may reduce detection quality for that class.
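
The tiling idea mentioned above can be sketched as a function that computes overlapping crop windows for a large page. The tile size matches the 640-pixel training resolution, while the 64-pixel overlap and the helper names are assumptions. Mapping tile detections back to page coordinates and merging duplicates are not shown, and large regions such as the main text area may be split across tiles.

```python
def tile_starts(length: int, tile: int, stride: int) -> list[int]:
    """Start offsets along one axis so that tiles cover the full length."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile sits flush with the far edge
    return starts


def tile_windows(width: int, height: int, tile: int = 640, overlap: int = 64):
    """Return (x1, y1, x2, y2) crop windows covering a width x height page."""
    stride = tile - overlap
    xs = tile_starts(width, tile, stride)
    ys = tile_starts(height, tile, stride)
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in ys
        for x in xs
    ]


print(tile_windows(1000, 640))  # [(0, 0, 640, 640), (360, 0, 1000, 640)]
```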

## License

This model is released under **CC0 1.0 Universal (Public Domain Dedication)**. You are free to copy, modify, and distribute the model, even for commercial purposes, without asking permission.

## Acknowledgements

This dataset was developed by Dharmaduta from specifications provided by the Buddhist Digital Resource Center (BDRC) for the BDRC Etext Corpus, with funding from the Khyentse Foundation.

## Citation

If you use this model, please cite it:

```bibtex
@software{bdrc_tmbld_yolo26m_2026,
  title   = {TMBLD-YOLO26m: Tibetan Modern Book Layout Detection Model},
  author  = {Buddhist Digital Resource Center (BDRC)},
  year    = {2026},
  url     = {https://huggingface.co/BDRC/TDLA-YOLO26m},
  license = {CC0-1.0}
}
```