A MaxViT-T model trained to classify images of 162 butterfly and moth species that occur in Austria.
Model Details
A MaxViT-T model pre-trained on ImageNet-1K was fully fine-tuned, with all parameters trainable.
Model Description
- Developed by: Andreas Lindner, Friederike Barkmann
- Funded by:
  - Viel-Falter Butterfly Monitoring, which is financially supported by the Federal Ministry for Climate Action, Environment, Energy, Mobility, Innovation and Technology (BMK).
  - EuroCC Austria, which has received funding from the European High Performance Computing Joint Undertaking (JU) and Germany, Bulgaria, Austria, Croatia, Cyprus, Czech Republic, Denmark, Estonia, Finland, Greece, Hungary, Ireland, Italy, Lithuania, Latvia, Poland, Portugal, Romania, Slovenia, Spain, Sweden, France, Netherlands, Belgium, Luxembourg, Slovakia, Norway, Türkiye, Republic of North Macedonia, Iceland, Montenegro, Serbia under grant agreement No 101101903.
- License: MIT
- Finetuned from model: MaxViT-T pre-trained on ImageNet-1K
Uses
The model can be used to identify butterfly and moth species that occur in Austria. It can classify 162 species, 131 of which are butterflies.
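A classifier of this kind outputs one score (logit) per species, and an application typically shows the user a short ranked list rather than a single label. The following is a minimal sketch of that post-processing step in plain Python. It is not taken from the authors' code: the function name `top_k_species` and the placeholder class names are assumptions made for illustration, and the example uses three classes instead of the model's 162.

```python
import math


def top_k_species(logits, class_names, k=3):
    """Return the k most probable (species, probability) pairs."""
    if len(logits) != len(class_names):
        raise ValueError("one logit per class is required")
    # Numerically stable softmax: subtract the maximum before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort classes by probability, highest first, and keep the first k.
    ranked = sorted(zip(class_names, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Placeholder labels (hypothetical); a real label list would hold 162 species.
names = ["species_a", "species_b", "species_c"]
for species, prob in top_k_species([2.0, 0.5, -1.0], names, k=2):
    print(f"{species}: {prob:.3f}")
```

Showing several candidates with their probabilities lets a user compare similar-looking species instead of trusting a single prediction.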
Bias, Risks, and Limitations
The model does not cover all butterfly and moth species that occur in Austria. Of the roughly 210 butterfly species, 131 were included in model training; of the roughly 4000 moth species, only 31 were. In addition, not all butterfly and moth species can be identified from images alone.
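To make the coverage gap concrete, the species counts stated above can be converted into percentages. This is a small arithmetic sketch; the totals of 210 and 4000 are the approximate figures from the text, so the results are estimates, and the helper name `coverage_percent` is an assumption for illustration.

```python
def coverage_percent(covered, total):
    """Share of species covered by the model, in percent."""
    return 100.0 * covered / total


# Approximate totals as stated in the model card.
butterfly_coverage = coverage_percent(131, 210)
moth_coverage = coverage_percent(31, 4000)

print(f"butterflies: {butterfly_coverage:.1f} % of species covered")
print(f"moths: {moth_coverage:.1f} % of species covered")
```

Under these figures the model covers a clear majority of Austrian butterfly species but less than one percent of moth species, so a moth image is far more likely to show a species the model has never seen.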
Training Details
The first model version (a9ba52f) was trained on the EuroHPC supercomputer LUMI, hosted by CSC (Finland) and the LUMI consortium through a EuroHPC Regular Access call. The second model version (2371ff8) was trained on the EuroHPC supercomputer LEONARDO, hosted by CINECA (Italy) and the LEONARDO consortium, also through a EuroHPC Regular Access call.
Training was parallelized using the PyTorch DDP framework and the Hugging Face Accelerate library. More information on model training can be found in the publications below. Scripts are available on GitHub.
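In data-parallel training of this kind, each process holds a replica of the model and works on its own share of the dataset in every epoch. The sketch below illustrates only that sharding idea in plain Python, using a strided split with padding so that every process receives the same number of samples. It is neither the authors' training script nor the PyTorch implementation; the function name `shard_indices` and the example sizes are assumptions made for illustration.

```python
def shard_indices(num_samples, world_size, rank):
    """Sample indices handled by one process in one epoch (no shuffling)."""
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    indices = list(range(num_samples))
    # Pad by repeating indices from the start so the total is divisible
    # by the number of processes and all shards have equal length.
    pad = (-num_samples) % world_size
    indices += [i % num_samples for i in range(pad)]
    # Strided split: process r takes positions r, r + world_size, ...
    return indices[rank::world_size]


# Hypothetical setup: 10 samples distributed over 4 processes.
for rank in range(4):
    print(rank, shard_indices(10, 4, rank))
```

Equal shard lengths matter because all processes must run the same number of steps for their gradients to be synchronized after each one.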
Training Data
The model was trained on a dataset of over 500,000 images of butterflies and moths that were recorded in Austria. The images were taken all over Austria by users of the app "Schmetterlinge Österreichs" of the foundation "Blühendes Österreich". Images that showed more than one species, or that showed butterfly or moth eggs, larvae or pupae, were excluded from training. Species with fewer than 50 images were also excluded. The final dataset contains images of the adult life stage of 162 species (31 moth species and 131 butterfly species).
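The exclusion rules described above can be sketched as a simple filter. This is an illustration under stated assumptions, not the authors' preprocessing code: the record layout (dictionaries with the hypothetical keys `species` and `life_stage`) and the function name `select_training_records` are invented for this example. It also assumes that the 50-image threshold is applied after the other exclusions, which the text does not specify.

```python
from collections import Counter

MIN_IMAGES_PER_SPECIES = 50  # threshold stated in the text


def select_training_records(records, min_images=MIN_IMAGES_PER_SPECIES):
    """Keep single-species images of adults from sufficiently common species."""
    # Step 1: drop images with several species or with non-adult life stages.
    usable = [
        r for r in records
        if len(r["species"]) == 1 and r["life_stage"] == "adult"
    ]
    # Step 2: count the remaining images per species.
    counts = Counter(r["species"][0] for r in usable)
    # Step 3: drop species that have fewer than min_images images.
    return [r for r in usable if counts[r["species"][0]] >= min_images]


# Hypothetical records: one common species, one rare one, two excluded images.
records = (
    [{"species": ["species_a"], "life_stage": "adult"}] * 50
    + [{"species": ["species_b"], "life_stage": "adult"}] * 49
    + [{"species": ["species_a"], "life_stage": "larva"}]
    + [{"species": ["species_a", "species_b"], "life_stage": "adult"}]
)
kept = select_training_records(records)
print(len(kept), {r["species"][0] for r in kept})
```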
Citation
The first model version was trained in the context of a data paper that published the butterfly and moth image dataset used for training:
@Article{Barkmannetal2025a,
author={Barkmann, Friederike
and Lindner, Andreas
and W{\"u}rflinger, Ronald
and H{\"o}ttinger, Helmut
and R{\"u}disser, Johannes},
title={Machine learning training data: over 500,000 images of butterflies and moths (Lepidoptera) with species labels},
journal={Scientific Data},
year={2025},
month={Aug},
day={06},
volume={12},
number={1},
pages={1369},
abstract={Deep learning models can accelerate the processing of image-based biodiversity data and provide educational value by giving direct feedback to citizen scientists. However, the training of such models requires large amounts of labelled data and not all species are equally suited for identification from images alone. Most butterfly and many moth species (Lepidoptera) which play an important role as biodiversity indicators are well-suited for such approaches. This dataset contains over 540.000 images of 185 butterfly and moth species that occur in Austria. Images were collected by citizen scientists with the application ``Schmetterlinge {\"O}sterreichs'' and correct species identification was ensured by an experienced entomologist. The number of images per species ranges from one to nearly 30.000. Such a strong class imbalance is common in datasets of species records. The dataset is larger than other published dataset of butterfly and moth images and offers opportunities for the training and evaluation of machine learning models on the fine-grained classification task of species identification.},
issn={2052-4463},
doi={10.1038/s41597-025-05708-z},
url={https://doi.org/10.1038/s41597-025-05708-z}
}
Another publication is in preparation.