---
license: apache-2.0
---

**This is a work in progress.**

# Docling Layout Model

`docling-layout-heron` is the layout model of the [Docling project](https://github.com/docling-project/docling).

The model uses the [RT-DETRv2](https://github.com/lyuwenyu/RT-DETR/tree/main/rtdetrv2_pytorch) architecture and has been trained from scratch on a variety of document datasets.
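Since the checkpoint ships a standard `transformers` configuration, it should also resolve through the generic Auto classes. This is a minimal sketch, assuming the hosted `config.json` declares the `rtdetr_v2` model type:

```python
from transformers import AutoImageProcessor, AutoModelForObjectDetection

# Assumption: the repo's config declares model_type "rtdetr_v2", so the
# Auto classes dispatch to the RT-DETRv2 implementation.
processor = AutoImageProcessor.from_pretrained("ds4sd/docling-layout-heron")
model = AutoModelForObjectDetection.from_pretrained("ds4sd/docling-layout-heron")
print(type(model).__name__)  # expected: RTDetrV2ForObjectDetection
```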
# Inference code example

Prerequisites:

```bash
pip install transformers Pillow torch requests
```
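The RT-DETRv2 model classes are only available in recent `transformers` releases. If the imports below fail, upgrading may help; the `>=4.49` bound here is an assumption, so check the `transformers` release notes:

```bash
pip install --upgrade "transformers>=4.49"
```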

Prediction:

```python
import requests
import torch
from PIL import Image
from transformers import RTDetrImageProcessor, RTDetrV2ForObjectDetection

# Mapping from class index to layout label (17 classes)
classes_map = {
    0: "Caption",
    1: "Footnote",
    2: "Formula",
    3: "List-item",
    4: "Page-footer",
    5: "Page-header",
    6: "Picture",
    7: "Section-header",
    8: "Table",
    9: "Text",
    10: "Title",
    11: "Document Index",
    12: "Code",
    13: "Checkbox-Selected",
    14: "Checkbox-Unselected",
    15: "Form",
    16: "Key-Value Region",
}

image_url = "https://huggingface.co/spaces/ds4sd/SmolDocling-256M-Demo/resolve/main/example_images/annual_rep_14.png"
model_name = "ds4sd/docling-layout-heron"
threshold = 0.6  # minimum confidence score for a detection to be kept

# Download the example page image
image = Image.open(requests.get(image_url, stream=True).raw)
image = image.convert("RGB")

# Initialize the image processor and model
image_processor = RTDetrImageProcessor.from_pretrained(model_name)
model = RTDetrV2ForObjectDetection.from_pretrained(model_name)

# Run the prediction pipeline
inputs = image_processor(images=[image], return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
results = image_processor.post_process_object_detection(
    outputs,
    target_sizes=torch.tensor([image.size[::-1]]),  # (height, width)
    threshold=threshold,
)

# Print the detected layout regions
for result in results:
    for score, label_id, box in zip(
        result["scores"], result["labels"], result["boxes"]
    ):
        score = round(score.item(), 2)
        label = classes_map[label_id.item()]
        box = [round(i, 2) for i in box.tolist()]
        print(f"{label}:{score} {box}")
```
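To sanity-check the output, the detected regions can be drawn back onto the page. This is a minimal sketch using Pillow's `ImageDraw`, continuing from the variables above; the styling and the `layout_prediction.png` output path are arbitrary choices, not part of the model API:

```python
from PIL import ImageDraw

# Draw each detected region and its label onto a copy of the page image
annotated = image.copy()
draw = ImageDraw.Draw(annotated)
for result in results:
    for score, label_id, box in zip(
        result["scores"], result["labels"], result["boxes"]
    ):
        x0, y0, x1, y1 = box.tolist()
        draw.rectangle((x0, y0, x1, y1), outline="red", width=2)
        draw.text(
            (x0, max(y0 - 12, 0)),  # place the label just above the box
            f"{classes_map[label_id.item()]} {score.item():.2f}",
            fill="red",
        )
annotated.save("layout_prediction.png")
```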

# References

```
@techreport{Docling,
  author = {Deep Search Team},
  month = {8},
  title = {Docling Technical Report},
  url = {https://arxiv.org/abs/2408.09869v4},
  eprint = {2408.09869},
  doi = {10.48550/arXiv.2408.09869},
  version = {1.0.0},
  year = {2024}
}

@misc{lv2024rtdetrv2improvedbaselinebagoffreebies,
  title = {RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformer},
  author = {Wenyu Lv and Yian Zhao and Qinyao Chang and Kui Huang and Guanzhong Wang and Yi Liu},
  year = {2024},
  eprint = {2407.17140},
  archivePrefix = {arXiv},
  primaryClass = {cs.CV},
  url = {https://arxiv.org/abs/2407.17140}
}
```