Butterfly & moth ID: wing classifier (BioCLIP 2.5-H + hierarchical head)

A fine-grained classifier that identifies Neotropical butterflies and nocturnal moths from wing photos, down to subspecies. It is the head used by the Butterfly ID inference Space and the AI Identifier web app.

The frozen BioCLIP 2.5-H backbone is not redistributed here. Get it from imageomics/bioclip-2.5-vith14. This repo holds the trained head and the wing-cropper.

Coverage

The label space spans Neotropical butterflies across all major families plus, as of this version, a substantial component of nocturnal moths. In total it covers 4,946 species / 7,292 subspecies-level leaves / 883 genera / 69 tribes / 24 subfamilies / 11 families:

  • Butterflies — Nymphalidae, Hesperiidae (skippers), Riodinidae (metalmarks), Lycaenidae, Pieridae, Papilionidae, plus the nocturnal Hedylidae and the day-flying Castniidae.
  • Nocturnal moths (new) — predominantly Sphingidae (hawkmoths), with Saturniidae, Geometridae, Notodontidae, Erebidae/Arctiinae, Noctuidae and others.

Hierarchical roll-up (genus → tribe → family) is currently complete for the butterflies. The moth genera are still being mapped into the higher-rank taxonomy, so moth predictions are most reliable at species / genus level for now; their family marginalisation is incomplete.
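
One way to handle the incomplete moth mapping when rolling probabilities up the ranks is sketched below. It sums each child's probability into its parent and pools any child without a known parent into an explicit bucket, so the missing mass stays visible instead of vanishing. The lookup tables, the `roll_up` helper and the `(unmapped)` label are illustrative assumptions, not names from this repo, and the probabilities are toy values.

```python
from collections import defaultdict

def roll_up(child_probs, parent_of, unmapped="(unmapped)"):
    """Sum child probabilities into their parents.

    Children missing from `parent_of` are pooled under `unmapped`
    so that no probability mass silently disappears.
    """
    parent_probs = defaultdict(float)
    for child, p in child_probs.items():
        parent_probs[parent_of.get(child, unmapped)] += p
    return dict(parent_probs)

# Toy genus-level probabilities (illustrative values, not model output).
genus_probs = {"Mechanitis": 0.55, "Melinaea": 0.25, "Xylophanes": 0.20}

# Hypothetical lookup tables; the hawkmoth genus Xylophanes is
# deliberately left out to mimic a moth genus that is not yet mapped.
genus_to_tribe = {"Mechanitis": "Ithomiini", "Melinaea": "Ithomiini"}
tribe_to_family = {"Ithomiini": "Nymphalidae"}

tribe_probs = roll_up(genus_probs, genus_to_tribe)
family_probs = roll_up(tribe_probs, tribe_to_family)
```

Here 80% of the mass reaches Nymphalidae and the remaining 20% stays in the unmapped bucket, which is a cue to read that prediction at genus or species level.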

Sampling is very uneven. Most butterfly data sits in the Ithomiini mimicry radiation (Sanger/Ikiam collection), and most moth data in Sphingidae. Sparsely represented groups (Hedylidae, Castniidae, many Hesperiidae/Lycaenidae, and the thinner moth families) are in scope but low-confidence, so treat confident calls outside the well-sampled systems with caution. Even within Ithomiini, Müllerian mimicry makes subspecies look-alikes genuinely hard to tell apart.

Files

File | What it is
head_hier.pt | The hierarchical CosineHead / ArcFace classifier head (feat_dim 1024). Holds leaves (7,292 taxa), rank_vocab (4,946 species / 883 genera), leaf2rank, head_state_dict.
wing_seg.pt | YOLO26-seg wing/insect segmenter used to crop the photo to the wings before BioCLIP (masks trained with SAM 3).
region_checklist.json | Per-region (side-of-Andes / country) species & subspecies checklist, for the optional geographic prior.
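
The exact form of the geographic prior is not described here, so the following is only a sketch of one plausible re-ranking scheme: species absent from the chosen region's checklist are down-weighted by a fixed factor and the distribution is renormalised. The checklist layout (region name mapped to a list of species), the `apply_region_prior` helper and the `off_list_weight` value are all assumptions for illustration; check the real region_checklist.json and inference.py for the actual schema and rule. The species and probabilities are toy values, not taken from the real checklist.

```python
def apply_region_prior(species_probs, allowed_species, off_list_weight=0.1):
    """Down-weight species absent from a regional checklist, then renormalise."""
    weighted = {
        sp: p * (1.0 if sp in allowed_species else off_list_weight)
        for sp, p in species_probs.items()
    }
    total = sum(weighted.values())
    if total == 0.0:
        # Nothing left to rank; fall back to the unmodified distribution.
        return dict(species_probs)
    return {sp: w / total for sp, w in weighted.items()}

# Hypothetical checklist layout: region name -> list of species names.
checklist = {"east_of_andes": ["Mechanitis messenoides", "Melinaea marsaeus"]}

# Toy species-level probabilities before the prior.
species_probs = {
    "Mechanitis messenoides": 0.40,
    "Mechanitis menapis": 0.45,
    "Melinaea marsaeus": 0.15,
}

reranked = apply_region_prior(species_probs, set(checklist["east_of_andes"]))
top_species = max(reranked, key=reranked.get)
```

A soft weight rather than a hard mask keeps off-list species recoverable, which matters when a checklist is incomplete or a specimen's locality is wrong.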

Pipeline

image → (YOLO wing-crop) → BioCLIP 2.5-H embed (frozen) → CosineHead → leaf probs → SUM-marginalise to species/genus → (optional geo prior re-rank)
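
The SUM-marginalise step can be illustrated with a small dependency-free sketch: leaf probabilities that share a species are added together, and species probabilities that share a genus are added again. The `marginalise` helper is hypothetical, and deriving the species and genus from the label text is an assumption made for the toy data; the real checkpoint ships its own leaf2rank mapping. The probabilities are illustrative, not model output.

```python
from collections import defaultdict

def marginalise(leaf_probs, leaf_to_group):
    """Sum the probabilities of all leaves that share a coarser label."""
    group_probs = defaultdict(float)
    for leaf, p in leaf_probs.items():
        group_probs[leaf_to_group[leaf]] += p
    return dict(group_probs)

# Toy leaf distribution over subspecies-level labels.
leaf_probs = {
    "Mechanitis polymnia casabranca": 0.30,
    "Mechanitis polymnia isthmia": 0.25,
    "Mechanitis lysimnia lysimnia": 0.35,
    "Melinaea ludovica": 0.10,
}

# Assumption for this toy data only: labels read "Genus species [subspecies]".
leaf_to_species = {leaf: " ".join(leaf.split()[:2]) for leaf in leaf_probs}
species_probs = marginalise(leaf_probs, leaf_to_species)

species_to_genus = {sp: sp.split()[0] for sp in species_probs}
genus_probs = marginalise(species_probs, species_to_genus)

top_leaf = max(leaf_probs, key=leaf_probs.get)
top_species = max(species_probs, key=species_probs.get)
```

In this toy case the single most likely leaf belongs to Mechanitis lysimnia, yet the most likely species is Mechanitis polymnia, because its two subspecies together outweigh it. That is why species and genus calls come from summed probabilities rather than from the top leaf alone.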

Usage

The full, runnable inference code lives in the Space (inference.py). Minimal sketch:

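import torch, open_clip
from PIL import Image

# weights_only=False unpickles arbitrary objects, so only load a checkpoint you trust
ck = torch.load("head_hier.pt", map_location="cpu", weights_only=False)
leaves = ck["leaves"]                  # 7292 finest-rank labels (subspecies where known)
model, _, preprocess = open_clip.create_model_and_transforms("hf-hub:imageomics/bioclip-2.5-vith14")
model.eval()
img = Image.open("wing.jpg").convert("RGB")   # placeholder path; ideally the wing-cropped photo
# build the CosineHead (s=30, LayerNorm) from ck["head_state_dict"] (see inference.py)
with torch.no_grad():                  # inference only, no gradients needed
    feat = model.encode_image(preprocess(img).unsqueeze(0)).float()
# probs = head(feat).softmax(1)  ->  argsort for the top taxa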
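
For readers who want to see what a cosine head with scale s=30 computes, here is a dependency-free sketch of the arithmetic on toy vectors. It assumes the LayerNorm is applied to the embedding before L2 normalisation and omits any learned affine parameters; the actual layer layout and state-dict keys live in inference.py and may differ. An ArcFace-style margin, if one was used, would only affect training, so it is left out. All function names and numbers below are illustrative.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalise a vector to zero mean and unit variance (no learned affine)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def l2_normalise(x):
    """Scale a vector to unit Euclidean length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(v * v for v in x)) or 1.0
    return [v / norm for v in x]

def cosine_head(feat, class_weights, scale=30.0):
    """Softmax over scaled cosine similarities between a feature and class weights."""
    f = l2_normalise(layer_norm(feat))
    logits = [
        scale * sum(a * b for a, b in zip(f, l2_normalise(w)))
        for w in class_weights
    ]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy 4-dimensional embedding and three class weight vectors.
feat = [0.9, 0.1, -0.2, 0.4]
class_weights = [
    [1.0, 0.0, -0.5, 0.2],
    [-1.0, 0.5, 0.3, 0.0],
    [0.0, 1.0, 0.0, -1.0],
]

probs = cosine_head(feat, class_weights)
# Class indices sorted from most to least probable.
ranking = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
```

Because both sides are unit length, each logit is bounded by the scale, so the scale controls how peaked the softmax can get.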

Performance

Out-of-fold top-1 accuracy on Sanger/Ikiam specimens (dorsal+ventral combined, with the side-of-Andes + Ecuador prior). These are measured on the butterfly collection, the deployment target:

Rank | Top-1
Subspecies | 84.0%
Species | 89.7%
Genus | 95.4%
Tribe | 96.8%
Subfamily | 98.9%
Family | 99.4%

Predictions are reliable from genus upward (≥95% top-1). Subspecies is the hard frontier because Müllerian mimicry produces look-alikes across species. The backbone is currently frozen and only the head is trained; backbone fine-tuning is the main planned lever for species/subspecies gains.

Training data

The subspecies-level taxonomic identifications were compiled from Butterflies of America, Sangay, Noreste, and Cotacachi, then expanded with additional Neotropical butterfly photo databases from across the region. For the nocturnal moths, they draw on specialist collections such as the Sphingidae Taxonomic Inventory and other hawkmoth / saturniid resources. Figure plates from taxonomic PDF monographs (e.g. Cock 2018, Hawk-moths of Trinidad) were re-extracted with layout-aware parsing (MinerU) so that each single-specimen figure carries the species and subspecies from its caption; composite/multi-specimen plates were excluded.

License

CC-BY-NC-4.0 (attribution, non-commercial). The non-commercial term reflects that the classifier's subspecies-level labels were compiled from third-party taxonomic databases (see Training data); please credit this work and respect those sources' terms.

Disclaimer

This is an AI suggestion, not a definitive identification. It is intended as a curation aid to flag uncertain or mislabelled identifications.
