Bird Species Classifier (ResNet-50)
Fine-tuned ResNet-50 models for classifying North American bird species from cropped bird photographs.
Model Description
These models are ResNet-50 backbones pretrained on ImageNet V2, fine-tuned on the NABirds dataset augmented with Birdsnap and iNaturalist data. They are designed for use in a photography processing pipeline that first detects birds with YOLO, crops them at full resolution, then classifies the crop.
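The crop step of that pipeline can be sketched as below. This is a minimal illustration, assuming the detector reports a pixel-coordinate box `(x1, y1, x2, y2)`. The function name `crop_detection` and the `pad_frac` margin are hypothetical and are not taken from the pipeline itself.

```python
from PIL import Image


def crop_detection(img, box, pad_frac=0.1):
    """Crop a detected bird from the full-resolution image.

    box is (x1, y1, x2, y2) in pixels. pad_frac (a hypothetical parameter)
    adds a margin on each side, clamped to the image bounds.
    """
    x1, y1, x2, y2 = box
    pad_x = (x2 - x1) * pad_frac
    pad_y = (y2 - y1) * pad_frac
    left = max(0, int(round(x1 - pad_x)))
    top = max(0, int(round(y1 - pad_y)))
    right = min(img.width, int(round(x2 + pad_x)))
    bottom = min(img.height, int(round(y2 + pad_y)))
    return img.crop((left, top, right, bottom))


# A synthetic image stands in for a full-resolution photo.
photo = Image.new("RGB", (600, 400))
crop = crop_detection(photo, (100, 50, 200, 150))
```

Cropping from the original image rather than a downscaled detector input keeps as much feather detail as possible before the classifier resizes the crop.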
Architecture
- Backbone: ResNet-50 (ImageNet V2 pretrained)
- Pooling: Generalized Mean (GeM) pooling
- Head: Sequential(Dropout(0.4), Linear(2048, num_classes))
- Input size: 240x240 pixels, normalized with ImageNet mean/std
- Preprocessing: ToTensor() + Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))
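For readers unfamiliar with GeM pooling, here is a minimal sketch of one common formulation. The initial exponent `p=3.0`, the `eps` clamp, and the class name `GeM` are assumptions for illustration; the values used for the released checkpoints are not stated here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GeM(nn.Module):
    """Generalized Mean pooling: (mean(x^p))^(1/p) over the spatial dims."""

    def __init__(self, p=3.0, eps=1e-6):
        super().__init__()
        # Learnable exponent; the initial value is an assumption.
        self.p = nn.Parameter(torch.tensor(p))
        self.eps = eps

    def forward(self, x):
        # Clamp so that fractional powers stay well defined.
        x = x.clamp(min=self.eps).pow(self.p)
        return F.adaptive_avg_pool2d(x, 1).pow(1.0 / self.p)


# ResNet-50 produces a 2048-channel 8x8 feature map for a 240x240 input.
features = torch.rand(2, 2048, 8, 8)
pooled = GeM()(features)  # shape (2, 2048, 1, 1)
```

With `p=1` this reduces to average pooling, and larger `p` moves it toward max pooling. One way to use such a module with torchvision's ResNet-50 would be to assign it to `model.avgpool`, though whether the checkpoints were built that way is an assumption.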
Training Strategy
Three-stage progressive unfreezing:
| Stage | Unfrozen Layers | Purpose |
|---|---|---|
| 1 | FC head only | Learn species mapping on frozen backbone features |
| 2 | layer4 + FC | Adapt high-level features |
| 3 | layer3 + layer4 + FC | Fine-tune mid-level features |
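The staged unfreezing in the table can be sketched as follows. This is an illustration rather than the training code: it assumes torchvision's ResNet parameter naming (`layer3.`, `layer4.`, `fc.`), and the names `STAGE_PREFIXES` and `set_stage` are hypothetical.

```python
import torch.nn as nn

# Parameter-name prefixes left trainable at each stage (torchvision ResNet naming).
STAGE_PREFIXES = {
    1: ("fc.",),
    2: ("layer4.", "fc."),
    3: ("layer3.", "layer4.", "fc."),
}


def set_stage(model: nn.Module, stage: int) -> int:
    """Freeze everything except the given stage's layers.

    Returns the number of trainable parameter elements.
    """
    prefixes = STAGE_PREFIXES[stage]
    trainable = 0
    for name, param in model.named_parameters():
        # str.startswith accepts a tuple of prefixes.
        param.requires_grad = name.startswith(prefixes)
        if param.requires_grad:
            trainable += param.numel()
    return trainable


# Usage with a torchvision model (not executed here):
#   model = torchvision.models.resnet50()
#   set_stage(model, 2)  # only layer4 and fc receive gradients
```

After each stage change, the optimizer would typically be rebuilt over the parameters with `requires_grad=True` so that frozen weights carry no optimizer state.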
Training was conducted using an automated research loop (Codex-driven) with 2-hour time budgets per experiment for the 98-species model and 4-10 hour budgets for the 404-species model.
Available Checkpoints
subset98_combined/best.pt: 98 Target Species
| Metric | Value |
|---|---|
| Top-1 Test Accuracy | 97.4% |
| Top-1 Val Accuracy | 97.6% |
| Classes | 98 target species |
| Training Data | NABirds + Birdsnap + iNaturalist (~38K training images) |
| Total Epochs | 12 |
| Training Time | 2 hours |
| Peak Memory | 589 MB |
| File Size | ~91 MB |
Best run: 20260319_074647_c9dbe6 (stage3 cap=6 + layer2 lr=1.5e-5)
base_combined/best.pt: 404 Base Species
| Metric | Value |
|---|---|
| Top-1 Test Accuracy | 93.6% |
| Top-1 Val Accuracy | 93.6% |
| Classes | 404 NABirds base species (sex/morph variants collapsed) |
| Training Data | NABirds + Birdsnap + iNaturalist (~166K training images) |
| Total Epochs | 20 |
| Training Time | ~9.6 hours |
| Peak Memory | 898 MB |
| Batch Size | 128 |
| File Size | ~98 MB |
Best run: 20260319_234135_b8fe6e (bs=128 + stage lrs 3e-4/6e-5)
Usage
With the Bird Photography Pipeline
git clone --branch MVP https://github.com/rkutyna/BirdBrained
cd BirdBrained
pip install -r requirements.txt
python download_models.py
streamlit run frontend/bird_gallery_frontend.py
Standalone Inference (PyTorch)
import torch
from torchvision import models, transforms
from PIL import Image

# Load checkpoint
state_dict = torch.load("subset98_combined/best.pt", map_location="cpu")

# Build model
model = models.resnet50()
model.fc = torch.nn.Sequential(
    torch.nn.Dropout(p=0.4),
    torch.nn.Linear(model.fc.in_features, 98),  # or 404 for base_combined
)
model.load_state_dict(state_dict)
model.eval()

# Preprocess a cropped bird image
transform = transforms.Compose([
    transforms.Resize((240, 240)),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
img = Image.open("bird_crop.jpg").convert("RGB")
input_tensor = transform(img).unsqueeze(0)  # add batch dimension

# Run inference and take the five most probable classes
with torch.no_grad():
    logits = model(input_tensor)
    probs = torch.softmax(logits, dim=1)
    top5_probs, top5_indices = probs.topk(5)
Label names are provided in the repository as CSV files:
- label_names.csv: 98 target species
- label_names_nabirds_base_species.csv: 404 base species
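Mapping predicted indices back to species names could look like the sketch below. The CSV layout is not described here, so this assumes one class per row in class-index order. The `name_column` and `skip_header` parameters and both function names are hypothetical; adjust them to the actual files.

```python
import csv


def load_label_names(path, name_column=0, skip_header=False):
    """Read class names from a CSV, assuming row order matches class index."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = [row for row in csv.reader(f) if row]
    if skip_header:
        rows = rows[1:]
    return [row[name_column] for row in rows]


def decode_topk(indices, probs, names):
    """Pair each predicted class index with its name and probability."""
    return [(names[i], float(p)) for i, p in zip(indices, probs)]


# Usage with the inference outputs (not executed here):
#   names = load_label_names("label_names.csv")
#   decode_topk(top5_indices[0].tolist(), top5_probs[0].tolist(), names)
```

It is worth checking that `len(names)` matches the number of classes in the model head before trusting the decoded names.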
Training Data
| Dataset | Images | Species | Role |
|---|---|---|---|
| NABirds | ~48K | 555 specific / 404 base | Train + Val + Test |
| Birdsnap | ~50K | ~335 matched | Train only |
| iNaturalist | ~70K | up to 280 images per species | Train only |
Validation and test splits use NABirds data only (no external data leakage).
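The split rule above can be expressed as a small routing function. This is a sketch under assumed conventions: each record is a dict with hypothetical keys `source` and `split`, and the function name `assemble_splits` is illustrative.

```python
def assemble_splits(records):
    """Route records into train/val/test.

    Each record is a dict with hypothetical keys: "source" (dataset name)
    and "split" (the NABirds-assigned split, ignored for external data).
    """
    splits = {"train": [], "val": [], "test": []}
    for rec in records:
        if rec["source"] == "nabirds":
            splits[rec["split"]].append(rec)
        else:
            # External data (Birdsnap, iNaturalist) never reaches val/test.
            splits["train"].append(rec)
    return splits


records = [
    {"source": "nabirds", "split": "train"},
    {"source": "nabirds", "split": "val"},
    {"source": "nabirds", "split": "test"},
    {"source": "birdsnap", "split": "test"},
    {"source": "inaturalist", "split": "val"},
]
splits = assemble_splits(records)
```

Keeping evaluation on a single source means the reported accuracies measure performance on NABirds-style images only.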
Limitations
- Trained on North American bird species only (NABirds taxonomy).
- Expects cropped bird images as input, not full scene photos. Use a bird detector (e.g., YOLO) to crop first.
- The 98-species model covers only a curated subset; out-of-distribution species will be misclassified into the nearest known class.
- Performance may degrade on heavily backlit, motion-blurred, or partially occluded subjects.
Citation
If you use these models, please cite the NABirds dataset:
@inproceedings{van2015building,
title={Building a bird recognition app and large scale dataset with citizen scientists: The fine print in fine-grained dataset collection},
author={Van Horn, Grant and Branson, Steve and Farrell, Ryan and Haber, Scott and Barry, Jessie and Ipeirotis, Panos and Perona, Pietro and Belongie, Serge},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={595--604},
year={2015}
}