MiewID

MiewID is a multi-species individual animal re-identification model. Given a cropped photo of an animal it returns a 2,152-dimensional embedding vector that captures the animal's identity. Compare two embeddings with cosine similarity to decide whether they show the same individual.

It covers 64 terrestrial and aquatic species and runs in production on the Wildbook wildlife monitoring platform.

This repository ships both the PyTorch model (loadable via HuggingFace transformers) and an ONNX export for runtimes that can't pull in PyTorch.


What the paper found

Otarashvili et al. (2024) trained a single embedding network on 49 species, 37K individuals, and 225K expert-curated annotations and measured it against three baselines. The main findings:

Multispecies beats single-species. The joint model outperformed 49 separate models each trained on one species. The gap was largest for species with the least data; averaged across all species the multispecies model gained 12.5% top-1 accuracy.

Zero-shot on unseen species beats MegaDescriptor. Tested on 33 species never seen during training, MiewID beat MegaDescriptor‑L‑384 by an average of 19.2 percentage points per species.

Fine-tuning works with few examples. When only a handful of annotated individuals are available for a new species, fine-tuning the pretrained model consistently outperforms training from scratch. Incorporating the few examples directly into full multispecies retraining does even better.

Vision transformers didn't help. SwinV2‑B was 4.5% worse than EfficientNetV2‑M on this task.

The production model extends the paper's approach: it uses a GeM pooling layer and BatchNorm head (yielding 2,152‑dim embeddings instead of 2,048), operates at 440×440 resolution (vs 256 in the paper experiments), and covers 64 species.


How it works

Camera trap or uploaded photo → Detection (YOLO / MegaDetector) → Crop chip (440×440, normalised) → MiewID (EfficientNetV2‑M + GeM + BN) → Embedding (1, 2152), L2 norm = 1 → Cosine similarity against known‑individual gallery → Individual #A27

  1. A detector (YOLO, MegaDetector, etc.) finds animals and crops out each chip.
  2. Resize to 440×440 and apply ImageNet normalisation.
  3. MiewID maps the chip to a 2,152‑dimensional embedding.
  4. Cosine similarity against a database of known individuals returns the closest match.
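
As a rough illustration of step 4, the sketch below matches one query embedding against a small gallery with cosine similarity. The function name `best_match`, the individual IDs, and the random vectors standing in for real MiewID embeddings are all assumptions made for the example, not part of this repository.

```python
import numpy as np

def best_match(query, gallery, gallery_ids):
    """Return (id, score) of the gallery embedding most similar to `query`."""
    # Normalise defensively so the dot product equals cosine similarity
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    scores = g @ q                      # (N,) cosine similarities
    idx = int(np.argmax(scores))
    return gallery_ids[idx], float(scores[idx])

# Toy gallery: random vectors stand in for real MiewID embeddings
rng = np.random.default_rng(0)
gallery = rng.normal(size=(5, 2152)).astype(np.float32)
gallery_ids = ["A27", "B03", "C11", "D40", "E08"]   # hypothetical IDs

# A query that is a slightly perturbed copy of the first gallery entry
query = gallery[0] + 0.05 * rng.normal(size=2152).astype(np.float32)
match_id, score = best_match(query, gallery, gallery_ids)
print(match_id, round(score, 3))
```

With a real model output of shape (1, 2152), pass `embedding[0]` as the query. Whether the top score is high enough to count as a match is a separate decision that depends on the species and the gallery.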

Architecture

| Component | Detail |
| --- | --- |
| Backbone | EfficientNetV2‑M, ~51M parameters, ImageNet‑1K pretrained |
| Pooling | GeM (Generalised Mean Pooling, p=3) |
| Head | BatchNorm1d → L2‑normalised output |
| Loss (training) | Sub-center ArcFace (k=3 sub-centers), dynamic margins |
| Input | (B, 3, 440, 440) float32, ImageNet‑normalised |
| Output | (B, 2152) float32, unit L2 norm |
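
To make the pooling and output normalisation concrete, here is a minimal NumPy sketch of GeM pooling with p=3 followed by L2 normalisation. It is not the repository's implementation: the 14×14 spatial size of the feature map is an assumption, the random array only stands in for backbone output, and the BatchNorm1d step between pooling and normalisation is left out for brevity.

```python
import numpy as np

def gem_pool(features, p=3.0, eps=1e-6):
    """Generalised mean pooling over the spatial axes of a (B, C, H, W) array."""
    clipped = np.clip(features, eps, None)                  # keep values positive before the power
    return np.mean(clipped ** p, axis=(2, 3)) ** (1.0 / p)  # -> (B, C)

def l2_normalise(x, eps=1e-12):
    """Scale each row to unit L2 norm."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)

rng = np.random.default_rng(0)
# Hypothetical backbone output: 2 chips, 2152 channels, 14x14 spatial grid
feature_map = rng.random((2, 2152, 14, 14)).astype(np.float32)

pooled = gem_pool(feature_map)        # (2, 2152)
embedding = l2_normalise(pooled)      # (2, 2152), each row has norm 1
```

With p=1 GeM reduces to average pooling, and as p grows it approaches max pooling, so p=3 sits between the two and gives more weight to strongly activated locations.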

Loss design

The model was trained with sub-center ArcFace (k=3 sub-centers per class), which improves robustness to label noise by allowing each class to occupy a small region of embedding space rather than a single point. Combined with dynamic margins that adapt the angular penalty per class based on sample count, this handles the heavy class imbalance typical of wildlife data.
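
The following NumPy sketch shows one way the logits for sub-center ArcFace with per-class margins could be computed. It is a simplified illustration, not the training code: the margin rule in `dynamic_margins` (its constants and the -0.25 exponent) and the scale of 30 are assumptions, and the usual safeguards for angles close to π are omitted.

```python
import numpy as np

def dynamic_margins(class_counts, a=0.5, b=0.05):
    """Hypothetical rule: classes with fewer samples get a larger margin (radians)."""
    counts = np.asarray(class_counts, dtype=np.float64)
    return a * counts ** -0.25 + b

def subcenter_arcface_logits(emb, weights, labels, margins, scale=30.0):
    """
    emb:     (B, D) embeddings
    weights: (C, K, D) K sub-centers for each of C classes
    labels:  (B,) integer class ids
    margins: (C,) per-class angular margins in radians
    """
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=2, keepdims=True)
    # Cosine to every sub-center, then keep the closest sub-center per class
    cos = np.einsum("bd,ckd->bck", emb, w).max(axis=2)        # (B, C)
    # Add the class-specific margin to the angle of the true class only
    rows = np.arange(len(labels))
    theta = np.arccos(np.clip(cos[rows, labels], -1.0, 1.0))
    logits = cos.copy()
    logits[rows, labels] = np.cos(theta + margins[labels])
    return scale * logits

rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 16))              # toy batch
weights = rng.normal(size=(5, 3, 16))       # 5 classes, k=3 sub-centers
labels = np.array([0, 1, 2, 4])
margins = dynamic_margins([100, 50, 10, 5, 2])
logits = subcenter_arcface_logits(emb, weights, labels, margins)
```

The resulting logits would be passed to an ordinary softmax cross-entropy loss. Taking the maximum over sub-centers is what lets noisy or atypical samples of a class attach to a secondary center instead of pulling the main one.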

Data split

Individuals are split so that roughly half appear in both train and test (with different images in each) and the other half appear only in test. Evaluation uses a one-vs-all scheme on the test set: each annotation serves as a query and is matched against all other test annotations (excluding itself). This avoids the soft data leak that would come from using the training set as the reference gallery.
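
A split of this kind can be sketched in a few lines of standard-library Python. The function name, the 50/50 division of a shared individual's images, and the toy annotation IDs are assumptions made for illustration; the paper's exact procedure may differ.

```python
import random
from collections import defaultdict

def split_annotations(annotations, seed=0):
    """
    annotations: list of (annotation_id, individual_id) pairs.
    Returns (train, test) lists of annotation ids.
    """
    by_individual = defaultdict(list)
    for ann_id, ind_id in annotations:
        by_individual[ind_id].append(ann_id)

    rng = random.Random(seed)
    individuals = sorted(by_individual)
    rng.shuffle(individuals)
    half = len(individuals) // 2

    train, test = [], []
    for i, ind in enumerate(individuals):
        anns = by_individual[ind]
        if i < half and len(anns) >= 2:
            # Shared individual: different images go to train and test
            rng.shuffle(anns)
            cut = len(anns) // 2
            train.extend(anns[:cut])
            test.extend(anns[cut:])
        else:
            # Held-out individual: appears only in test
            test.extend(anns)
    return train, test

# Toy data: 10 individuals with 4 annotations each
annotations = [(f"ann{i}_{j}", f"ind{i}") for i in range(10) for j in range(4)]
train, test = split_annotations(annotations)
print(len(train), len(test))  # -> 10 30
```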


Training data

49 species, 59 source datasets, 37K individuals, 225K annotations. Sources fall into three groups:

Contributed through Wildbook — species experts manually curate individual identities on Wildbook-managed platforms. Each annotation carries a per-sighting ID verified by a human. Data partners include NOAA, Sarasota Dolphin Research Project, Botswana Predator Conservation Trust, ECOCEAN, Giraffe Conservation Foundation, Norwegian Orca Survey, Cascadia Research Collective, African Parks, and many others.

Public re-ID datasets — DogFaceNet, PrimFace, ChimpFace, MacaqueFaces, LemurFace, THoDBRL2015, SeaTurtleID, SealID, C-Tai, C-Zoo, Lomas Capuchin, wildlife-datasets.

Community science — the Happywhale Kaggle competition dataset covers multiple cetacean species, including blue whales, dusky dolphins, orcas, and spinner dolphins.

A public subset is available through LILA.science.

The full per-species breakdown is in DATA_SOURCES.md.

Viewpoints matter

For species where markings differ across the body, annotations from left and right views are treated as different individuals during training. For species identified by outline shape (e.g., dorsal fins), opposite-side views can match.
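
One simple way to express this rule is to fold the viewpoint into the training label for marking-based species only. The sketch below is hypothetical: the function, the species keys, and the label format are illustrative and do not come from the training code.

```python
def training_label(individual_id, viewpoint, species, outline_species):
    """Build the class label used for training from identity and viewpoint."""
    if species in outline_species:
        # Outline-based ID (e.g., dorsal fins): both sides share one class
        return individual_id
    # Marking-based ID: left and right sides become separate classes
    return f"{individual_id}:{viewpoint}"

OUTLINE_SPECIES = {"bottlenose_dolphin"}   # hypothetical species key

print(training_label("Z012", "left", "grevys_zebra", OUTLINE_SPECIES))        # -> Z012:left
print(training_label("Z012", "right", "grevys_zebra", OUTLINE_SPECIES))       # -> Z012:right
print(training_label("D301", "left", "bottlenose_dolphin", OUTLINE_SPECIES))  # -> D301
```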

Training recipe

The paper reports these hyperparameters (optimised with Optuna):

| Parameter | Value |
| --- | --- |
| Image size (experiments) | 256×256 |
| Image size (production) | 440×440 |
| Batch size | 112 |
| Warmup | 15 epochs, linear 1.5e‑5 → 1.5e‑3 |
| Decay | exponential, 0.8 per epoch |
| Augmentations | random colour sharpening, CLAHE, shift ±25%, scale ±20%, rotation ±15°, colour jitter |
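
The warmup and decay rows can be read as a single learning-rate schedule. The sketch below assumes the rate is updated once per epoch, that epochs are 0-indexed, and that decay starts right after the peak is reached at epoch 15; none of these details are confirmed by the source.

```python
def learning_rate(epoch, warmup_epochs=15, lr_start=1.5e-5, lr_max=1.5e-3, decay=0.8):
    """Learning rate for a 0-indexed epoch: linear warmup, then exponential decay."""
    if epoch < warmup_epochs:
        # Linear ramp from lr_start (epoch 0) towards lr_max (reached at warmup_epochs)
        return lr_start + (lr_max - lr_start) * epoch / warmup_epochs
    # Multiply by `decay` once for every epoch past the warmup
    return lr_max * decay ** (epoch - warmup_epochs)

for epoch in (0, 5, 15, 16, 20):
    print(epoch, f"{learning_rate(epoch):.2e}")
```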

Performance

Conservation X Labs evaluated MiewID as an individual-retrieval model using rank‑k accuracy. Selected results:

| Species | Rank‑1 | Species | Rank‑1 |
| --- | --- | --- | --- |
| Zebra (Grevy's) | 96.1% | Giraffe (Reticulated) | 98.8% |
| Cheetah | 70.8% | Lion | 93.2% |
| Leopard | 77.6% | Wild Dog | 86.1% |
| Humpback Whale | 70.3% | Orca | 86.0% |
| Whale Shark | 65.2% | Green Turtle | 89.0% |
| Bottlenose Dolphin | 92.4% | Spinner Dolphin | 98.8% |

The full 64‑species evaluation table (mAP, rank‑1 through rank‑20, data source) lives in DATA_SOURCES.md.
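
For readers who want to reproduce a rank‑k figure on their own embeddings, the sketch below computes rank‑k accuracy with every annotation queried against all the others. It is an illustrative implementation, not the evaluation script behind the table; in particular, it counts an individual with a single annotation as a miss, which may differ from how the published numbers treat such cases.

```python
import numpy as np

def rank_k_accuracy(embeddings, labels, k=1):
    """One-vs-all rank-k accuracy: each row queries all other rows."""
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = emb @ emb.T                          # pairwise cosine similarities
    np.fill_diagonal(sims, -np.inf)             # a query never matches itself
    top_k = np.argsort(-sims, axis=1)[:, :k]    # indices of the k nearest neighbours
    labels = np.asarray(labels)
    hits = (labels[top_k] == labels[:, None]).any(axis=1)
    return float(hits.mean())

# Toy example: two individuals with two annotations each
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
acc = rank_k_accuracy(emb, [0, 0, 1, 1], k=1)
print(acc)  # -> 1.0
```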


Quick start

PyTorch + Transformers

from transformers import AutoModel
from PIL import Image
import numpy as np
import torch

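# trust_remote_code=True lets transformers run the model code shipped in the repository
model = AutoModel.from_pretrained("james-burgess/miewid", trust_remote_code=True)
model.eval()  # inference mode, so BatchNorm uses its running statistics

# Load the chip as RGB and resize it to the model's input resolution
chip = Image.open("zebra_chip.jpg").convert("RGB").resize((440, 440))
tensor = torch.from_numpy(np.array(chip, dtype=np.float32))  # (440, 440, 3), values 0-255

# ImageNet normalisation on the 0-255 scale
tensor = (tensor - torch.tensor([0.485, 0.456, 0.406]) * 255) \
       / (torch.tensor([0.229, 0.224, 0.225]) * 255)
tensor = tensor.permute(2, 0, 1).unsqueeze(0)  # HWC -> (1, 3, 440, 440)

with torch.no_grad():
    embedding = model(tensor).numpy()
# -> (1, 2152)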

ONNX Runtime

Install the dependencies:

pip install onnxruntime huggingface_hub

Then run inference:

import numpy as np
import onnxruntime as ort
from huggingface_hub import hf_hub_download
from PIL import Image

model_path = hf_hub_download("james-burgess/miewid", "miewid.onnx")
session = ort.InferenceSession(
    model_path,
    providers=["CPUExecutionProvider"],
)
# To run on GPU, install onnxruntime-gpu and use ["CUDAExecutionProvider"]

MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD  = np.array([0.229, 0.224, 0.225], dtype=np.float32)

chip = Image.open("zebra_chip.jpg").convert("RGB")
chip = chip.resize((440, 440), Image.BILINEAR)
chip = np.array(chip, dtype=np.float32)
chip = (chip - MEAN * 255.0) / (STD * 255.0)
chip = np.transpose(chip, (2, 0, 1))
chip = np.expand_dims(chip, axis=0)

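# Read the input name from the graph instead of hard-coding it
input_name = session.get_inputs()[0].name
embedding = session.run(None, {input_name: chip})[0]
# -> (1, 2152)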

OpenCV DNN

import cv2
import numpy as np

MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32) * 255.0
STD  = np.array([0.229, 0.224, 0.225], dtype=np.float32) * 255.0

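# Path to a local copy of the ONNX file from this repository
net = cv2.dnn.readNetFromONNX("miewid.onnx")

chip = cv2.imread("zebra_chip.jpg")           # OpenCV loads images as BGR
chip = cv2.cvtColor(chip, cv2.COLOR_BGR2RGB)  # the model expects RGB
chip = cv2.resize(chip, (440, 440)).astype(np.float32)
chip = (chip - MEAN) / STD                    # ImageNet normalisation on the 0-255 scale

# HWC -> (1, 3, 440, 440)
net.setInput(np.transpose(chip, (2, 0, 1))[np.newaxis, ...])
embedding = net.forward()
# -> (1, 2152)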

Preprocessing

MiewID expects ImageNet normalisation:

pixel_normalised = (pixel − mean × 255) / (std × 255)
| | R | G | B |
| --- | --- | --- | --- |
| mean | 0.485 | 0.456 | 0.406 |
| std | 0.229 | 0.224 | 0.225 |

Order: resize → (chip − mean × 255) / (std × 255) → CHW transpose → batch dim.


Finetuning

finetune.ipynb adapts MiewID to a new species, region, or camera-trap setup. It covers:

  • Loading the base model and freezing the backbone
  • Attaching an ArcFace head for individual‑level training
  • Training loop with data augmentation
  • ONNX export of the finetuned model
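
The first three bullets can be sketched in PyTorch as follows. This is not the notebook's code: the tiny `backbone` is a stand-in for the pretrained model, `ArcFaceHead` is a plain single-center ArcFace head written for this example, and the margin, scale, learning rate, and toy data are all assumed values.

```python
import torch
from torch import nn
import torch.nn.functional as F

class ArcFaceHead(nn.Module):
    """Plain ArcFace head (one center per class, fixed margin)."""
    def __init__(self, dim, n_classes, margin=0.3, scale=30.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(n_classes, dim) * 0.01)
        self.margin, self.scale = margin, scale

    def forward(self, emb, labels):
        cos = F.linear(F.normalize(emb), F.normalize(self.weight))  # (B, n_classes)
        theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
        is_target = F.one_hot(labels, cos.shape[1]).bool()
        # Apply the angular margin to the true class only
        return self.scale * torch.where(is_target, torch.cos(theta + self.margin), cos)

torch.manual_seed(0)
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 32))  # stand-in, NOT MiewID
for param in backbone.parameters():
    param.requires_grad_(False)     # freeze the backbone
backbone.eval()

head = ArcFaceHead(dim=32, n_classes=4)
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)  # only the head is trained

images = torch.randn(8, 3, 8, 8)            # toy batch in place of real chips
labels = torch.randint(0, 4, (8,))

with torch.no_grad():                       # no gradients needed through the frozen part
    emb = backbone(images)
loss = F.cross_entropy(head(emb, labels), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Freezing the backbone keeps the pretrained features intact and makes each step cheap, which suits the few-example setting; unfreezing some layers later is a common follow-up when more data is available.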

Export

miewid.onnx was exported from the PyTorch checkpoint with torch.onnx.export() (opset 14). To recreate it:

python scripts/export.py --upload

License

MIT


Citation

If you use MiewID in your work, cite both the paper and the software:

@misc{otarashvili2024multispecies,
      title={Multispecies Animal Re-ID Using a Large Community-Curated Dataset},
      author={Lasha Otarashvili and Tamilselvan Subramanian and Jason Holmberg
              and J.J. Levenson and Charles V. Stewart},
      year={2024},
      eprint={2412.05602},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2412.05602},
}

@misc{WildMe2023,
      author={Otarashvili, Lasha and Holmberg, Jason and Abidi, Collin
              and Subramanian, Tamilselvan},
      title={MiewID},
      year={2024},
      publisher={Zenodo},
      doi={10.5281/zenodo.13647526},
      url={https://github.com/WildMeOrg/wbia-plugin-miew-id},
}

Credits

Conservation X Labs developed MiewID with the Wild Me community and data partners. Funding came from the Gordon and Betty Moore Foundation, the Bureau of Ocean Energy Management (BOEM), and the US National Science Foundation (Award 2118240).
