	---
	library_name: ultralytics
	task: object-detection
	tags:
	- yolo
	- yolo26
	- tibetan
	- document-layout-analysis
	- object-detection
	- bounding-box
	- BDRC
	language:
	- bo
	license: cc0-1.0
	datasets:
	- BDRC/TDLA-Training-Dataset
	metrics:
	- mAP50
	- mAP50-95
	- precision
	- recall
	model-index:
	- name: TDLA-YOLO26m
	  results:
	  - task:
	      type: object-detection
	      name: Object Detection
	    dataset:
	      type: BDRC/TDLA-Training-Dataset
	      name: TDLA Training Dataset
	      split: val
	    metrics:
	    - type: mAP50
	      value: 0.982
	      name: mAP@0.5
	    - type: mAP50-95
	      value: 0.799
	      name: mAP@0.5:0.95
	    - type: precision
	      value: 0.966
	      name: Precision
	    - type: recall
	      value: 0.970
	      name: Recall
	---

	# TMBLD-YOLO26m: Tibetan Modern Book Layout Detection

	A fine-tuned YOLO26m object-detection model for Tibetan modern book layout detection. The model detects four layout classes in images of modern Tibetan book pages: header, Text area, footnote, and footer.

	## Model Description

	This model was fine-tuned from the Ultralytics YOLO26m pretrained checkpoint on the [BDRC/TDLA-Training-Dataset](https://huggingface.co/datasets/BDRC/TDLA-Training-Dataset), a YOLO-format bounding-box dataset of Tibetan document pages sourced from the Buddhist Digital Resource Center (BDRC) digital library.

	| Property | Value |
	| --- | --- |
	| Architecture | YOLO26m |
	| Task | Object Detection |
	| Image size | 640 × 640 |
	| Number of classes | 4 |
	| Training platform | Ultralytics HUB |
	| Weights file | `Tibetan_modern_book_Layout_detection.pt` |

	## Classes

	| ID | Class | Description |
	| --- | --- | --- |
	| 0 | header | Page header region |
	| 1 | Text area | Main body text region |
	| 2 | footnote | Footnote region |
	| 3 | footer | Page footer region |
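For downstream code it can help to keep the class table as a lookup. The following is a minimal sketch, assuming the IDs in the table match the order stored in the weights file. The names `CLASS_NAMES` and `class_name` are introduced here for illustration; the `names` mapping exposed by the loaded Ultralytics model should be treated as authoritative.

```python
# Class IDs and names copied from the table above.
# Assumption: this matches the mapping stored in the weights file.
CLASS_NAMES = {0: "header", 1: "Text area", 2: "footnote", 3: "footer"}


def class_name(cls_id: int) -> str:
    """Return the layout class name for a predicted class ID."""
    return CLASS_NAMES.get(cls_id, f"unknown({cls_id})")


print(class_name(1))
```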

	## Performance

	Evaluated on the validation split of the TDLA Training Dataset.

	| Metric | Value |
	| --- | --- |
	| Precision | 0.966 |
	| Recall | 0.970 |
	| mAP@0.5 | 0.982 |
	| mAP@0.5:0.95 | 0.799 |
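Precision and recall can be combined into a single F1 score, their harmonic mean. This card does not report F1, so the sketch below only derives it from the two values in the table and should be read as an illustration rather than an official metric.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall taken from the table above.
f1 = f1_score(0.966, 0.970)
print(f"F1 = {f1:.3f}")
```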

	### Training Loss (final epoch)

	| Loss Component | Train | Val |
	| --- | --- | --- |
	| Box loss | 0.515 | 0.643 |
	| Classification loss | 0.218 | 0.276 |
	| DFL loss | 0.003 | 0.004 |

	## Training Details

	### Dataset

	- Dataset: [BDRC/TDLA-Training-Dataset](https://huggingface.co/datasets/BDRC/TDLA-Training-Dataset)
	- Train images: 2,692
	- Val images: 103
	- Test images: 313
	- Total annotations: 14,705
	- Train/Val split: Iterative multi-label stratification (seed 42, 80/20 ratio)

	### Hyperparameters

	| Parameter | Value |
	| --- | --- |
	| Epochs | 150 |
	| Patience | 100 |
	| Batch size | Auto (-1) |
	| Image size | 640 |
	| Optimizer | Auto (SGD) |
	| Initial learning rate (lr0) | 0.01 |
	| Final learning rate factor (lrf) | 0.01 |
	| Momentum | 0.937 |
	| Weight decay | 0.0005 |
	| Warmup epochs | 3.0 |
	| Warmup momentum | 0.8 |
	| Warmup bias lr | 0.1 |
	| AMP (mixed precision) | True |
	| Pretrained | True |
	| Deterministic | True |
	| Seed | 0 |
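The model was trained on Ultralytics HUB, so no training script is part of this card. As a rough sketch of how the same settings might be passed to a local run, the table can be written as keyword arguments for the Ultralytics `train()` method. The checkpoint name `yolo26m.pt` and the `data_yaml` path are assumptions, not values taken from this card.

```python
# Hyperparameters copied from the table above, keyed by Ultralytics train() argument names.
TRAIN_ARGS = {
    "epochs": 150,
    "patience": 100,
    "batch": -1,  # -1 lets Ultralytics pick the batch size automatically
    "imgsz": 640,
    "optimizer": "auto",
    "lr0": 0.01,
    "lrf": 0.01,
    "momentum": 0.937,
    "weight_decay": 0.0005,
    "warmup_epochs": 3.0,
    "warmup_momentum": 0.8,
    "warmup_bias_lr": 0.1,
    "amp": True,
    "pretrained": True,
    "deterministic": True,
    "seed": 0,
}


def train(data_yaml: str, base_weights: str = "yolo26m.pt"):
    """Fine-tune with the settings above.

    Assumptions: `data_yaml` is a hypothetical path to a YOLO dataset config,
    and `base_weights` is the assumed file name of the pretrained checkpoint.
    """
    from ultralytics import YOLO  # imported lazily; requires the ultralytics package

    model = YOLO(base_weights)
    return model.train(data=data_yaml, **TRAIN_ARGS)
```

Calling `train("path/to/data.yaml")` would start a run; results may differ from the HUB run reported here.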

	### Loss Weights

	| Component | Weight |
	| --- | --- |
	| Box | 7.5 |
	| Classification | 0.5 |
	| DFL | 1.5 |
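These weights scale each loss component before the components are summed into the training objective. The snippet below is only an arithmetic sketch of that weighted sum. The component values are hypothetical, and this card does not state whether the logged losses in the Performance section are already scaled, so the sketch should not be applied to those numbers without checking.

```python
# Loss weights copied from the table above.
LOSS_WEIGHTS = {"box": 7.5, "cls": 0.5, "dfl": 1.5}


def weighted_total(components, weights=LOSS_WEIGHTS):
    """Weighted sum of unscaled loss components, keyed by component name."""
    return sum(weights[name] * value for name, value in components.items())


# Hypothetical unscaled component values, for illustration only.
example = {"box": 0.1, "cls": 0.4, "dfl": 0.2}
total = weighted_total(example)
print(f"total = {total:.2f}")
```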

	### Augmentation

	| Augmentation | Value |
	| --- | --- |
	| HSV-Hue | 0.015 |
	| HSV-Saturation | 0.7 |
	| HSV-Value | 0.4 |
	| Translation | 0.1 |
	| Scale | 0.5 |
	| Flip left-right | 0.5 |
	| Mosaic | 1.0 |
	| Erasing | 0.4 |
	| Close mosaic (last N epochs) | 10 |
	| Auto augment | RandAugment |
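For anyone reproducing the setup locally, the augmentation table maps onto Ultralytics training arguments roughly as follows. This is a sketch: the argument names are the ones Ultralytics uses for these settings, and the mapping from table rows to arguments is an assumption based on their labels.

```python
# Augmentation settings from the table above, keyed by Ultralytics argument names.
AUGMENT_ARGS = {
    "hsv_h": 0.015,  # hue jitter
    "hsv_s": 0.7,  # saturation jitter
    "hsv_v": 0.4,  # value (brightness) jitter
    "translate": 0.1,
    "scale": 0.5,
    "fliplr": 0.5,  # probability of a horizontal flip
    "mosaic": 1.0,  # probability of mosaic augmentation
    "erasing": 0.4,
    "close_mosaic": 10,  # disable mosaic for the last 10 epochs
    "auto_augment": "randaugment",
}

# Print the settings in a form that can be pasted into a train() call.
for key, value in AUGMENT_ARGS.items():
    print(f"{key}={value!r}")
```

The dictionary can be unpacked into a training call, for example `model.train(data=..., **AUGMENT_ARGS)`.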

	## Usage

	### Inference with Ultralytics

	```python
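	from ultralytics import YOLO

	# Load the fine-tuned weights.
	model = YOLO("Tibetan_modern_book_Layout_detection.pt")

	# Run detection at the training resolution.
	results = model.predict("page_image.jpg", imgsz=640)

	for result in results:
	    # Each result holds the detections for one input image.
	    for box in result.boxes:
	        cls_id = int(box.cls)  # predicted class ID (0-3)
	        conf = float(box.conf)  # confidence score
	        xyxy = box.xyxy[0].tolist()  # [x1, y1, x2, y2] in pixels
	        print(f"Class: {cls_id}, Confidence: {conf:.3f}, Box: {xyxy}")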
	```
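The values printed by the loop above can be turned into named regions in top-to-bottom order, which is often more convenient for later processing. The sketch below works on plain `(cls_id, conf, xyxy)` triples so it runs without the model. The `Region` type, the `to_regions` helper, and the sample detections are hypothetical, and the class names assume the IDs listed in the Classes section.

```python
from typing import NamedTuple


class Region(NamedTuple):
    label: str
    confidence: float
    box: tuple  # (x1, y1, x2, y2) in pixels


# Assumption: class IDs follow the Classes table in this card.
NAMES = {0: "header", 1: "Text area", 2: "footnote", 3: "footer"}


def to_regions(detections, min_conf=0.25):
    """Convert (cls_id, conf, xyxy) triples into Regions sorted top to bottom."""
    regions = [
        Region(NAMES.get(cls_id, str(cls_id)), conf, tuple(xyxy))
        for cls_id, conf, xyxy in detections
        if conf >= min_conf
    ]
    # Sort by the top edge (y1), then by the left edge (x1).
    return sorted(regions, key=lambda r: (r.box[1], r.box[0]))


# Hypothetical detections in the same shape as the values printed above.
sample = [
    (3, 0.91, [50.0, 900.0, 600.0, 950.0]),
    (0, 0.88, [50.0, 20.0, 600.0, 60.0]),
    (1, 0.97, [50.0, 80.0, 600.0, 880.0]),
    (2, 0.10, [50.0, 800.0, 600.0, 880.0]),  # dropped: below min_conf
]
for region in to_regions(sample):
    print(region.label, region.box)
```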

	### Batch Inference

	```python
	from ultralytics import YOLO

	model = YOLO("Tibetan_modern_book_Layout_detection.pt")

	results = model.predict("path/to/images/", imgsz=640, conf=0.25)
	```
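For large folders, holding every result in one list can use a lot of memory. A possible variant, sketched below, passes `stream=True` so that `predict()` returns a generator and images are processed one at a time. The helper names `count_classes` and `detect_folder` are hypothetical.

```python
from collections import Counter


def count_classes(class_ids, names):
    """Count detections per class name for one image."""
    return Counter(names[int(c)] for c in class_ids)


def detect_folder(weights, folder, conf=0.25):
    """Yield (image_path, per-class counts) for each image in a folder.

    With stream=True, predict() returns a generator, so results are
    handled one image at a time instead of being collected in a list.
    """
    from ultralytics import YOLO  # imported lazily; requires the ultralytics package

    model = YOLO(weights)
    for result in model.predict(folder, imgsz=640, conf=conf, stream=True):
        yield result.path, count_classes(result.boxes.cls.tolist(), result.names)
```

A call such as `for path, counts in detect_folder("Tibetan_modern_book_Layout_detection.pt", "path/to/images/"): print(path, dict(counts))` would print one summary line per page.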

	## Intended Use

	This model is designed for automatic layout detection of modern Tibetan book pages. It can be used as a preprocessing step for:

	- OCR pipelines on Tibetan documents
	- Document digitization workflows
	- Structured text extraction from scanned Tibetan texts
	- Digital library cataloging and indexing
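In an OCR pipeline, a detected region is typically cropped from the page before recognition. The following is a minimal sketch of computing a padded crop box that stays inside the image. The helper name `padded_crop_box` and the 8-pixel default padding are illustrative choices, not part of this model.

```python
import math


def padded_crop_box(xyxy, image_size, pad=8):
    """Expand a box by `pad` pixels on each side, clamped to the image bounds.

    xyxy is (x1, y1, x2, y2) in pixels; image_size is (width, height).
    """
    x1, y1, x2, y2 = xyxy
    width, height = image_size
    return (
        max(0, math.floor(x1) - pad),
        max(0, math.floor(y1) - pad),
        min(width, math.ceil(x2) + pad),
        min(height, math.ceil(y2) + pad),
    )


# Hypothetical box on a 640 x 960 page image.
print(padded_crop_box([4.2, 10.0, 630.7, 500.0], (640, 960)))
```

The returned tuple uses the (left, upper, right, lower) order expected by Pillow's `Image.crop`.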

	## Limitations

	- Trained primarily on modern Tibetan book layouts; performance on historical manuscripts, woodblock prints, or non-standard layouts may vary.
	- Optimized for 640×640 input resolution; very high-resolution pages may benefit from tiling or higher `imgsz` values.
	- The footnote class has fewer training samples (456 annotations) compared to other classes, which may affect detection quality for that class.
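The tiling mentioned above can be sketched as a function that splits a page into overlapping windows, each of which could be passed to the model separately. This is one possible scheme under assumed defaults (640-pixel tiles, 64-pixel overlap). Shifting the per-tile boxes back to page coordinates and merging duplicates in the overlap zones are not shown.

```python
def tile_windows(width, height, tile=640, overlap=64):
    """Return (x1, y1, x2, y2) windows that cover the image with overlap."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be in [0, tile)")
    step = tile - overlap

    def starts(length):
        # A single window is enough when the image fits inside one tile.
        if length <= tile:
            return [0]
        positions = list(range(0, length - tile, step))
        positions.append(length - tile)  # last tile sits flush with the far edge
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


# Hypothetical 1000 x 640 page: two horizontally overlapping tiles.
print(tile_windows(1000, 640))
```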

	## License

	This model is released under the CC0 1.0 Universal Public Domain Dedication. You are free to copy, modify, and distribute it, including for commercial purposes, without asking permission.

	## Acknowledgements

	This dataset was developed by Dharmaduta from specifications provided by the Buddhist Digital Resource Center (BDRC) for the BDRC Etext Corpus, with funding from the Khyentse Foundation.

	## Citation

	If you use this model, please cite it:

	```bibtex
	@software{bdrc_tmbld_yolo26m_2026,
	  title = {TMBLD-YOLO26m: Tibetan Modern Book Layout Detection Model},
	  author = {Buddhist Digital Resource Center (BDRC)},
	  year = {2026},
	  url = {https://huggingface.co/BDRC/TDLA-YOLO26m},
	  license = {CC0-1.0}
	}
	```