# Near-perfect photo-ID of the Hula painted frog with zero-shot deep local-feature matching
Finetuned checkpoint for Hula painted frog (*Latonia nigriventer*) photo-ID.

Lineage: EfficientNetV2 -> miewid-msv3 -> miewid-msv3-latonia-1233.

Images are preprocessed as described in the preprint (a zoom-in center crop followed by a resize to 440×440; see the example below). See the preprint for full training details and the evaluation protocol.

This repository provides a single PyTorch checkpoint:
- `miewid-msv3-latonia-1233.pt`

Example (embed two images and compute their cosine similarity):

```python
import torch
from PIL import Image
from torchvision import transforms
from transformers import AutoModel


class ZoomCenterCrop:
    """Center-crop the image to 1/zoom of its shorter side."""

    def __init__(self, zoom=1.0):
        self.zoom = zoom

    def __call__(self, img):
        w, h = img.size
        m = int(min(h, w) / self.zoom)
        left = (w - m) // 2
        top = (h - m) // 2
        return img.crop((left, top, left + m, top + m))


preprocess = transforms.Compose([
    ZoomCenterCrop(zoom=2.0),
    transforms.Resize((440, 440)),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
])

# Load the base MiewID architecture, then the finetuned weights.
model = AutoModel.from_pretrained("conservationxlabs/miewid-msv3", trust_remote_code=True)
ckpt = torch.load("miewid-msv3-latonia-1233.pt", map_location="cpu")
model.load_state_dict(ckpt["model"], strict=True)
model.eval()


def embed(path):
    """Return an L2-normalized embedding for the image at `path`."""
    img = Image.open(path).convert("RGB")
    x = preprocess(img).unsqueeze(0)
    with torch.no_grad():
        emb = model(x)
    return emb / emb.norm(dim=1, keepdim=True)


e1 = embed("img1.jpg")
e2 = embed("img2.jpg")
cosine_sim = (e1 @ e2.T).item()
print(cosine_sim)
```
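For photo-ID you typically compare one new sighting against a gallery of known individuals rather than a single pair. A minimal sketch of that ranking step is below; it is not part of the model card, and the random tensors (with an illustrative embedding dimension) stand in for the L2-normalized outputs of `embed()` above.

```python
import torch

torch.manual_seed(0)


def l2_normalize(x):
    # Same normalization embed() applies to the model output.
    return x / x.norm(dim=1, keepdim=True)


# Stand-ins for real embeddings; 512 is an illustrative dimension.
gallery = l2_normalize(torch.randn(5, 512))  # 5 known individuals
query = l2_normalize(torch.randn(1, 512))    # one new sighting

# Cosine similarity of the query to every gallery entry, best match first.
sims = (query @ gallery.T).squeeze(0)
ranked = torch.argsort(sims, descending=True)
print("best match index:", ranked[0].item())
print("similarity:", sims[ranked[0]].item())
```

Because all embeddings are unit-length, the matrix product is exactly the cosine similarity, so sorting it gives the identity ranking directly.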
If you use this model, please cite:
Yesharim, M., Bina Perl, R. G., Roll, U., Gafny, S., Geffen, E., Ram, Y.
"Near-perfect photo-ID of the Hula painted frog with zero-shot deep local-feature matching."
arXiv:2601.08798 (2026). https://arxiv.org/abs/2601.08798
See LICENSE in this repository.