Commit 24dacee by kaldan (verified) · 1 Parent(s): f0700bd

Update README.md

  1. README.md +209 -0
README.md CHANGED
@@ -1,3 +1,212 @@
1
  ---
2
  license: cc0-1.0
3
  ---
1
  ---
2
+ library_name: ultralytics
3
+ task: object-detection
4
+ tags:
5
+ - yolo
6
+ - yolo26
7
+ - tibetan
8
+ - document-layout-analysis
9
+ - object-detection
10
+ - bounding-box
11
+ - BDRC
12
+ language:
13
+ - bo
14
  license: cc0-1.0
15
+ datasets:
16
+ - BDRC/TDLA-Training-Dataset
17
+ metrics:
18
+ - mAP50
19
+ - mAP50-95
20
+ - precision
21
+ - recall
22
+ model-index:
23
+ - name: TDLA-YOLO26m
24
+   results:
25
+   - task:
26
+       type: object-detection
27
+       name: Object Detection
28
+     dataset:
29
+       type: BDRC/TDLA-Training-Dataset
30
+       name: TDLA Training Dataset
31
+       split: val
32
+     metrics:
33
+     - type: mAP50
34
+       value: 0.982
35
+       name: mAP@0.5
36
+     - type: mAP50-95
37
+       value: 0.799
38
+       name: mAP@0.5:0.95
39
+     - type: precision
40
+       value: 0.966
41
+       name: Precision
42
+     - type: recall
43
+       value: 0.970
44
+       name: Recall
45
  ---
46
+
47
+ # TDLA-YOLO26m — Tibetan Document Layout Analysis
48
+
49
+ A fine-tuned **YOLO26m** object-detection model for **Tibetan Document Layout Analysis (TDLA)**. The model detects four layout classes in Tibetan document page images: **header**, **Text area**, **footnote**, and **footer**.
50
+
51
+ ## Model Description
52
+
53
+ This model was fine-tuned from the Ultralytics YOLO26m pretrained checkpoint on the [BDRC/TDLA-Training-Dataset](https://huggingface.co/datasets/BDRC/TDLA-Training-Dataset), a YOLO-format bounding-box dataset of Tibetan document pages sourced from the Buddhist Digital Resource Center (BDRC) digital library.
54
+
55
+ | Property | Value |
56
+ | --- | --- |
57
+ | **Architecture** | YOLO26m |
58
+ | **Task** | Object Detection |
59
+ | **Image size** | 640 × 640 |
60
+ | **Number of classes** | 4 |
61
+ | **Training platform** | Ultralytics HUB |
62
+ | **Weights file** | `Tibetan_modern_book_Layout_detection.pt` |
63
+
64
+ ## Classes
65
+
66
+ | ID | Class | Description |
67
+ | --- | --- | --- |
68
+ | 0 | header | Page header region |
69
+ | 1 | Text area | Main body text region |
70
+ | 2 | footnote | Footnote region |
71
+ | 3 | footer | Page footer region |
72
+
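When post-processing predictions, the numeric class IDs from the table above usually need to be turned back into names. The sketch below hard-codes the table as a dictionary; `CLASS_NAMES` and `class_name` are illustrative names, not part of the released weights. With Ultralytics the same mapping is normally available as `model.names`, so this standalone version is mainly useful where the model object is not at hand.

```python
# Class IDs and names copied from the Classes table of this model card.
CLASS_NAMES = {
    0: "header",
    1: "Text area",
    2: "footnote",
    3: "footer",
}


def class_name(cls_id: int) -> str:
    """Return the layout class name for a predicted class ID."""
    try:
        return CLASS_NAMES[cls_id]
    except KeyError:
        # Fail loudly on IDs outside the four documented classes.
        raise ValueError(f"unknown class ID: {cls_id}") from None


print(class_name(1))  # Text area
```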
73
+ ## Performance
74
+
75
+ The model was evaluated on the validation split of the TDLA Training Dataset.
76
+
77
+ | Metric | Value |
78
+ | --- | --- |
79
+ | **Precision** | 0.966 |
80
+ | **Recall** | 0.970 |
81
+ | **mAP@0.5** | 0.982 |
82
+ | **mAP@0.5:0.95** | 0.799 |
83
+
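Precision and recall can be combined into a single F1 score, the harmonic mean of the two. The card does not report F1, so the value below is only derived from the aggregate precision and recall in the table and may differ from a per-class or threshold-optimized F1. `f1_score` is an illustrative helper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Aggregate values taken from the Performance table.
f1 = f1_score(0.966, 0.970)
print(f"F1: {f1:.3f}")  # F1: 0.968
```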
84
+ ### Training Loss (final epoch)
85
+
86
+ | Loss Component | Train | Val |
87
+ | --- | --- | --- |
88
+ | Box loss | 0.515 | 0.643 |
89
+ | Classification loss | 0.218 | 0.276 |
90
+ | DFL loss | 0.003 | 0.004 |
91
+
92
+ ## Training Details
93
+
94
+ ### Dataset
95
+
96
+ - **Dataset:** [BDRC/TDLA-Training-Dataset](https://huggingface.co/datasets/BDRC/TDLA-Training-Dataset)
97
+ - **Train images:** 2,692
98
+ - **Val images:** 103
99
+ - **Test images:** 313
100
+ - **Total annotations:** 14,705
101
+ - **Train/Val split:** Iterative multi-label stratification (seed 42, 80/20 ratio)
102
+
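The image counts listed above can be converted into split proportions with a few lines of arithmetic. This sketch assumes the three counts describe disjoint splits of a single pool of images; the proportions are derived purely from the listed counts and are not a restatement of the nominal split ratio.

```python
# Image counts copied from the Dataset list of this model card.
split_counts = {"train": 2692, "val": 103, "test": 313}

total_images = sum(split_counts.values())
split_fractions = {name: count / total_images for name, count in split_counts.items()}

print(f"total images: {total_images}")
for name, fraction in split_fractions.items():
    print(f"{name}: {split_counts[name]} images ({fraction:.1%})")
```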
103
+ ### Hyperparameters
104
+
105
+ | Parameter | Value |
106
+ | --- | --- |
107
+ | Epochs | 150 |
108
+ | Patience | 100 |
109
+ | Batch size | Auto (-1) |
110
+ | Image size | 640 |
111
+ | Optimizer | Auto (SGD) |
112
+ | Initial learning rate (lr0) | 0.01 |
113
+ | Final learning rate factor (lrf) | 0.01 |
114
+ | Momentum | 0.937 |
115
+ | Weight decay | 0.0005 |
116
+ | Warmup epochs | 3.0 |
117
+ | Warmup momentum | 0.8 |
118
+ | Warmup bias lr | 0.1 |
119
+ | AMP (mixed precision) | True |
120
+ | Pretrained | True |
121
+ | Deterministic | True |
122
+ | Seed | 0 |
123
+
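To make the relationship between `lr0` and `lrf` concrete: `lrf` is a factor, so the final learning rate is `lr0 * lrf`. The sketch below assumes a linear decay over the 150 epochs and ignores the warmup phase; the card does not state which scheduler was used, and with automatic optimizer selection the effective values in the actual run may differ. `linear_lr` is an illustrative helper, not an Ultralytics function.

```python
def linear_lr(epoch: int, epochs: int = 150, lr0: float = 0.01, lrf: float = 0.01) -> float:
    """Learning rate at `epoch` under linear decay from lr0 down to lr0 * lrf."""
    # Clamp progress to [0, 1] so out-of-range epochs stay well defined.
    progress = min(max(epoch / epochs, 0.0), 1.0)
    factor = (1.0 - progress) * (1.0 - lrf) + lrf
    return lr0 * factor


for epoch in (0, 75, 150):
    print(epoch, f"{linear_lr(epoch):.6f}")
```

Under these assumptions the rate starts at 0.01 and ends at 0.01 × 0.01 = 0.0001.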
124
+ ### Loss Weights
125
+
126
+ | Component | Weight |
127
+ | --- | --- |
128
+ | Box | 7.5 |
129
+ | Classification | 0.5 |
130
+ | DFL | 1.5 |
131
+
132
+ ### Augmentation
133
+
134
+ | Augmentation | Value |
135
+ | --- | --- |
136
+ | HSV-Hue | 0.015 |
137
+ | HSV-Saturation | 0.7 |
138
+ | HSV-Value | 0.4 |
139
+ | Translation | 0.1 |
140
+ | Scale | 0.5 |
141
+ | Flip left-right | 0.5 |
142
+ | Mosaic | 1.0 |
143
+ | Erasing | 0.4 |
144
+ | Close mosaic (last N epochs) | 10 |
145
+ | Auto augment | RandAugment |
146
+
147
+ ## Usage
148
+
149
+ ### Inference with Ultralytics
150
+
151
+ ```python
152
+ from ultralytics import YOLO
153
+
154
+ model = YOLO("Tibetan_modern_book_Layout_detection.pt")
155
+
156
+ results = model.predict("page_image.jpg", imgsz=640)
157
+
158
+ for result in results:
159
+     boxes = result.boxes
160
+     for box in boxes:
161
+         cls_id = int(box.cls)
162
+         conf = float(box.conf)
163
+         xyxy = box.xyxy[0].tolist()
164
+         print(f"Class: {cls_id}, Confidence: {conf:.3f}, Box: {xyxy}")
165
+ ```
166
+
167
+ ### Batch Inference
168
+
169
+ ```python
170
+ from ultralytics import YOLO
171
+
172
+ model = YOLO("Tibetan_modern_book_Layout_detection.pt")
173
+
174
+ results = model.predict("path/to/images/", imgsz=640, conf=0.25)
175
+ ```
176
+
177
+ ## Intended Use
178
+
179
+ This model is designed for automatic layout detection of modern Tibetan book pages. It can be used as a preprocessing step for:
180
+
181
+ - OCR pipelines on Tibetan documents
182
+ - Document digitization workflows
183
+ - Structured text extraction from scanned Tibetan texts
184
+ - Digital library cataloging and indexing
185
+
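For the OCR use case, detected regions typically need to be filtered and put into reading order before they are cropped and passed on. The following is a minimal sketch under simple assumptions: single-column pages, ordering by the top edge of each box, and a confidence threshold of 0.25. The `Detection` type, the helper name, and the sample boxes are all made up for illustration.

```python
from typing import List, NamedTuple, Tuple


class Detection(NamedTuple):
    cls_id: int
    conf: float
    xyxy: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def regions_in_reading_order(detections: List[Detection], min_conf: float = 0.25) -> List[Detection]:
    """Drop low-confidence boxes, then sort top-to-bottom and left-to-right."""
    kept = [d for d in detections if d.conf >= min_conf]
    return sorted(kept, key=lambda d: (d.xyxy[1], d.xyxy[0]))


# Hypothetical detections for one page (not real model output).
detections = [
    Detection(3, 0.91, (40.0, 900.0, 600.0, 940.0)),  # footer
    Detection(1, 0.97, (40.0, 80.0, 600.0, 780.0)),   # Text area
    Detection(0, 0.88, (40.0, 20.0, 600.0, 60.0)),    # header
    Detection(2, 0.12, (40.0, 800.0, 600.0, 880.0)),  # footnote, below threshold
]

ordered = regions_in_reading_order(detections)
print([d.cls_id for d in ordered])  # [0, 1, 3]
```

Multi-column or otherwise complex layouts would need a more careful ordering rule than a sort on the top edge.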
186
+ ## Limitations
187
+
188
+ - Trained primarily on modern Tibetan book layouts; performance on historical manuscripts, woodblock prints, or non-standard layouts may vary.
189
+ - Optimized for 640×640 input resolution; very high-resolution pages may benefit from tiling or higher `imgsz` values.
190
+ - The footnote class has fewer training samples (456 annotations) than the other classes, which may reduce detection quality for that class.
191
+
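The tiling mentioned above can be sketched as a function that splits a large page into overlapping windows, each of which would then be passed to the model separately. The tile size of 640 matches the training resolution; the overlap of 64 pixels is an arbitrary assumed value, and `tile_windows` is an illustrative helper. Merging detections that span several tiles (for example with non-maximum suppression) is not shown.

```python
from typing import List, Tuple


def _starts(length: int, tile: int, stride: int) -> List[int]:
    """Start offsets such that tiles of size `tile` cover [0, length)."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # final tile sits flush with the far edge
    return starts


def tile_windows(width: int, height: int, tile: int = 640, overlap: int = 64) -> List[Tuple[int, int, int, int]]:
    """Return (x1, y1, x2, y2) windows covering a width x height image."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    stride = tile - overlap
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in _starts(height, tile, stride)
        for x in _starts(width, tile, stride)
    ]


windows = tile_windows(1000, 1500)
print(len(windows))  # 6
print(windows[0])    # (0, 0, 640, 640)
```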
192
+ ## License
193
+
194
+ This model is released under **CC0 1.0 Universal (Public Domain Dedication)**. You are free to copy, modify, and distribute the model, even for commercial purposes, without asking permission.
195
+
196
+ ## Acknowledgements
197
+
198
+ The training dataset was developed by Dharmaduta from specifications provided by the Buddhist Digital Resource Center (BDRC) for the BDRC Etext Corpus, with funding from the Khyentse Foundation.
199
+
200
+ ## Citation
201
+
202
+ If you use this model, please cite the dataset:
203
+
204
+ ```bibtex
205
+ @dataset{bdrc_tdla_2025,
206
+ title = {TDLA Training Dataset},
207
+ author = {Buddhist Digital Resource Center (BDRC)},
208
+ year = {2025},
209
+ url = {https://huggingface.co/datasets/BDRC/TDLA-Training-Dataset},
210
+ license = {CC0-1.0}
211
+ }
212
+ ```