	---
	license: gpl-3.0
	base_model:
	- naver-clova-ix/donut-base
	pipeline_tag: visual-document-retrieval
	---

	# HeR-T: Herbarium specimen label Recognition Transformer

	## 📃 Paper
	Application of computer vision to the automated extraction of metadata from natural history specimen labels: A case study on herbarium specimens (Under Review)

	## 💁 Authors
	Zacchigna, Jacopo; Liu, Weiwei; Pellegrino, Felice Andrea; Peron, Adriano; Roma-Marzio, Francesco; Peruzzi, Lorenzo; Martellos, Stefano

	## 🚀 Overview
	HeR-T (Herbarium specimen label Recognition Transformer) is a fine-tuned vision-language model for automated metadata extraction from natural history specimen labels, in particular herbarium specimen labels. It is built on Donut-base (naver-clova-ix/donut-base) and was fine-tuned on 55,089 herbarium specimen images from the Herbarium of the University of Pisa (international acronym PI).

	## 🔥 Features
	- Fine-tuned on specimen images from the Herbarium of the University of Pisa for automated metadata extraction from natural history specimen labels
	- Supports image inputs whose labels contain printed, handwritten, or mixed-format text
	- Evaluation: Tree Edit Distance (TED) based accuracy score, computed as max(0, 1−TED(pr, gt)/TED(φ, gt)), where gt, pr, and φ denote the ground-truth tree, the predicted tree, and the empty tree, respectively
	- Pre-trained weights are loaded from Donut-base (naver-clova-ix/donut-base)
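
The TED-based accuracy score listed above can be illustrated with a minimal sketch. This is not the evaluation code used for HeR-T. It assumes predictions and ground truth are nested dicts/lists of strings, uses unit costs for insert, delete, and relabel, sorts dictionary keys so that key order does not affect the score, and treats the empty tree φ as a bare root node. The helper names (`to_tree`, `forest_dist`, `ted`, `ted_accuracy`) and the field names in the usage lines are hypothetical.

```python
from functools import lru_cache


def to_tree(obj, label="<root>"):
    """Convert nested dict/list/scalar data into (label, children) tuples."""
    if isinstance(obj, dict):
        # Assumption: keys are sorted so that key order does not change the score.
        return (label, tuple(to_tree(v, k) for k, v in sorted(obj.items())))
    if isinstance(obj, list):
        return (label, tuple(to_tree(v, "<item>") for v in obj))
    # Scalars become a single leaf child holding the value as text.
    return (label, ((str(obj), ()),))


@lru_cache(maxsize=None)
def forest_dist(f, g):
    """Ordered forest edit distance with unit costs (rightmost-root recursion)."""
    if not f and not g:
        return 0
    if not g:
        _, kids = f[-1]
        return forest_dist(f[:-1] + kids, g) + 1  # delete rightmost root of f
    if not f:
        _, kids = g[-1]
        return forest_dist(f, g[:-1] + kids) + 1  # insert rightmost root of g
    (f_label, f_kids), (g_label, g_kids) = f[-1], g[-1]
    return min(
        forest_dist(f[:-1] + f_kids, g) + 1,  # delete
        forest_dist(f, g[:-1] + g_kids) + 1,  # insert
        # match the two rightmost roots, relabelling if the labels differ
        forest_dist(f_kids, g_kids) + forest_dist(f[:-1], g[:-1]) + (f_label != g_label),
    )


def ted(tree_a, tree_b):
    """Tree edit distance between two (label, children) trees."""
    return forest_dist((tree_a,), (tree_b,))


def ted_accuracy(pred, gt):
    """max(0, 1 - TED(pr, gt) / TED(phi, gt)), with phi taken as a bare root."""
    gt_tree = to_tree(gt)
    pred_tree = to_tree(pred)
    denom = ted(to_tree({}), gt_tree)
    if denom == 0:
        # Empty ground truth: score 1 only for an empty prediction.
        return 1.0 if ted(pred_tree, gt_tree) == 0 else 0.0
    return max(0.0, 1.0 - ted(pred_tree, gt_tree) / denom)


# Hypothetical label fields, for illustration only.
gt = {"collector": "Rossi", "date": "1901"}
pred = {"collector": "Rossi", "date": "1902"}
print(ted_accuracy(pred, gt))
```

In this sketch the ground-truth tree has four non-root nodes, so a prediction with a single wrong leaf loses one quarter of the score, while an empty prediction scores 0. The recursion is exponential in the worst case without memoization and is only meant for small label trees; a dedicated library implementation of the Zhang-Shasha algorithm would be the usual choice for larger evaluations.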