Bird Species Classifier (ResNet-50)

Fine-tuned ResNet-50 models for classifying North American bird species from cropped bird photographs.

Model Description

These models are ResNet-50 backbones pretrained on ImageNet V2, fine-tuned on the NABirds dataset augmented with Birdsnap and iNaturalist data. They are designed for use in a photography processing pipeline that first detects birds with YOLO, crops them at full resolution, then classifies the crop.

Architecture

Backbone: ResNet-50 (ImageNet V2 pretrained)
Pooling: Generalized Mean (GeM) pooling
Head: Sequential(Dropout(0.4), Linear(2048, num_classes))
Input size: 240x240 pixels, normalized with ImageNet mean/std
Preprocessing: ToTensor() + Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))

Training Strategy

Three-stage progressive unfreezing:

Stage	Unfrozen Layers	Purpose
1	FC head only	Learn species mapping on frozen backbone features
2	`layer4` + FC	Adapt high-level features
3	`layer3` + `layer4` + FC	Fine-tune mid-level features

Training was conducted using an automated research loop (Codex-driven) with 2-hour time budgets per experiment for the 98-species model and 4-10 hour budgets for the 404-species model.

Available Checkpoints

`subset98_combined/best.pt` — 98 Target Species

Metric	Value
Top-1 Test Accuracy	97.4%
Top-1 Val Accuracy	97.6%
Classes	98 target species
Training Data	NABirds + Birdsnap + iNaturalist (~38K training images)
Total Epochs	12
Training Time	2 hours
Peak Memory	589 MB
File Size	~91 MB

Best run: 20260319_074647_c9dbe6 — stage3 cap=6 + layer2 lr=1.5e-5

`base_combined/best.pt` — 404 Base Species

Metric	Value
Top-1 Test Accuracy	93.6%
Top-1 Val Accuracy	93.6%
Classes	404 NABirds base species (sex/morph variants collapsed)
Training Data	NABirds + Birdsnap + iNaturalist (~166K training images)
Total Epochs	20
Training Time	~9.6 hours
Peak Memory	898 MB
Batch Size	128
File Size	~98 MB

Best run: 20260319_234135_b8fe6e — bs=128 + stage lrs 3e-4/6e-5

Usage

With the Bird Photography Pipeline

git clone --branch MVP https://github.com/rkutyna/BirdBrained
cd BirdBrained
pip install -r requirements.txt
python download_models.py
streamlit run frontend/bird_gallery_frontend.py

Standalone Inference (PyTorch)

import torch
from torchvision import models, transforms
from PIL import Image

# Load checkpoint
state_dict = torch.load("subset98_combined/best.pt", map_location="cpu")

# Build model
model = models.resnet50()
model.fc = torch.nn.Sequential(
    torch.nn.Dropout(p=0.4),
    torch.nn.Linear(model.fc.in_features, 98),  # or 404 for base_combined
)
model.load_state_dict(state_dict)
model.eval()

# Preprocess a cropped bird image
transform = transforms.Compose([
    transforms.Resize((240, 240)),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

img = Image.open("bird_crop.jpg").convert("RGB")
input_tensor = transform(img).unsqueeze(0)

with torch.no_grad():
    logits = model(input_tensor)
    probs = torch.softmax(logits, dim=1)
    top5_probs, top5_indices = probs.topk(5)

Label names are provided in the repository as CSV files:

label_names.csv — 98 target species
label_names_nabirds_base_species.csv — 404 base species

Training Data

Dataset	Images	Species	Role
NABirds	~48K	555 specific / 404 base	Train + Val + Test
Birdsnap	~50K	~335 matched	Train only
iNaturalist	~70K	up to 280/species	Train only

Validation and test splits use NABirds data only (no external data leakage).

Limitations

Trained on North American bird species only (NABirds taxonomy).
Expects cropped bird images as input — not full scene photos. Use a bird detector (e.g., YOLO) to crop first.
The 98-species model covers only a curated subset; out-of-distribution species will be misclassified into the nearest known class.
Performance may degrade on heavily backlit, motion-blurred, or partially occluded subjects.

Citation

If you use these models, please cite the NABirds dataset:

@inproceedings{van2015building,
  title={Building a bird recognition app and large scale dataset with citizen scientists: The fine print in fine-grained dataset collection},
  author={Van Horn, Grant and Branson, Steve and Farrell, Ryan and Haber, Scott and Barry, Jessie and Ipeirotis, Panos and Perona, Pietro and Belongie, Serge},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={595--604},
  year={2015}
}

Downloads last month: -; Downloads are not tracked for this model. How to track