---
license: gpl-3.0
base_model:
- naver-clova-ix/donut-base
pipeline_tag: visual-document-retrieval
---

# HeR-T: Herbarium specimen label Recognition Transformer

## Paper
Application of computer vision to the automated extraction of metadata from natural history specimen labels: A case study on herbarium specimens (under review)

## Authors
Zacchigna, Jacopo; Liu, Weiwei; Pellegrino, Felice Andrea; Peron, Adriano; Roma-Marzio, Francesco; Peruzzi, Lorenzo; Martellos, Stefano

## Overview
HeR-T (Herbarium specimen label Recognition Transformer) is a fine-tuned vision-language model for automated metadata extraction from natural history specimen labels, in particular herbarium specimen labels. It builds on Donut-base and was fine-tuned on 55,089 herbarium specimen images from the Herbarium of the University of Pisa (international acronym PI).

## Features
- **Fine-tuned on** specimen images from the Herbarium of the University of Pisa for automated metadata extraction from natural history specimen labels
- **Supports** image inputs whose labels contain printed, handwritten, or mixed-format text
- **Evaluation**: Tree Edit Distance (TED) accuracy, computed as max(0, 1 − TED(pr, gt) / TED(φ, gt)), where gt, pr, and φ denote the ground-truth tree, the predicted tree, and the empty tree, respectively
- **Pre-trained weights** are loaded from Donut-base (naver-clova-ix/donut-base)
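
The TED accuracy formula in the list above can be illustrated with a minimal sketch. It is not the evaluation code used for HeR-T: it assumes unit costs for inserting, deleting, and relabelling a node, represents the empty tree φ as a bare root node, and uses hypothetical helper names (`to_tree`, `forest_distance`, `ted_accuracy`) and hypothetical field names. The actual evaluation may use different edit costs and a different tree construction.

```python
from functools import lru_cache

# A tree is (label, children), where children is a tuple of trees.
# A forest is a tuple of trees. Everything is hashable so results can be cached.


def to_tree(obj, label="<root>"):
    """Turn a (possibly nested) dict of metadata into a labelled tree.

    Assumption: keys become inner nodes and values become leaves;
    keys are sorted so the tree does not depend on dict ordering.
    """
    if isinstance(obj, dict):
        children = tuple(to_tree(v, k) for k, v in sorted(obj.items()))
    else:
        children = ((str(obj), ()),)
    return (label, children)


def size(forest):
    """Number of nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f1, f2):
    """Ordered tree edit distance between two forests (unit costs assumed)."""
    if not f1:
        return size(f2)  # insert every node of f2
    if not f2:
        return size(f1)  # delete every node of f1
    (l1, c1), (l2, c2) = f1[-1], f2[-1]
    return min(
        # delete the rightmost root of f1, promoting its children
        forest_distance(f1[:-1] + c1, f2) + 1,
        # insert the rightmost root of f2
        forest_distance(f1, f2[:-1] + c2) + 1,
        # match the two rightmost roots, relabelling if they differ
        forest_distance(c1, c2) + forest_distance(f1[:-1], f2[:-1]) + (l1 != l2),
    )


def ted_accuracy(pred, gt):
    """max(0, 1 - TED(pr, gt) / TED(phi, gt)) for dict-shaped predictions."""
    pr_tree, gt_tree, empty = to_tree(pred), to_tree(gt), to_tree({})
    denom = forest_distance((empty,), (gt_tree,))
    dist = forest_distance((pr_tree,), (gt_tree,))
    if denom == 0:  # ground truth is itself empty
        return 1.0 if dist == 0 else 0.0
    return max(0.0, 1.0 - dist / denom)


# Hypothetical label fields, for illustration only.
gt = {"genus": "Quercus", "species": "ilex"}
pred = {"genus": "Quercus", "species": "robur"}
print(ted_accuracy(pred, gt))  # one wrong leaf out of four non-root nodes
```

The clamp to zero matters when a prediction is further from the ground truth than the empty tree is, for example when it contains many spurious fields; without it the score would go negative.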