---
license: mit
tags:
- image-classification
- birds
- resnet
- pytorch
- wildlife
datasets:
- nabirds
- birdsnap
- inaturalist
pipeline_tag: image-classification
---
# Bird Species Classifier (ResNet-50)
Fine-tuned ResNet-50 models for classifying North American bird species from cropped bird photographs.
## Model Description
These models are ResNet-50 backbones pretrained on ImageNet V2, fine-tuned on the [NABirds](https://dl.allawnmilner.com/nabirds) dataset augmented with [Birdsnap](https://thomasberg.org/) and [iNaturalist](https://www.inaturalist.org/) data. They are designed for use in a photography processing pipeline that first detects birds with YOLO, crops them at full resolution, then classifies the crop.
### Architecture
- **Backbone**: ResNet-50 (ImageNet V2 pretrained)
- **Pooling**: Generalized Mean (GeM) pooling
- **Head**: `Sequential(Dropout(0.4), Linear(2048, num_classes))`
- **Input size**: 240×240 pixels, normalized with ImageNet mean/std
- **Preprocessing**: `ToTensor()` + `Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))`
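For reference, GeM pooling computes the average of feature activations raised to a power `p`, then takes the `p`-th root; `p = 1` recovers average pooling and large `p` approaches max pooling. A minimal sketch (the learnable `p` and its initial value of 3.0 are common conventions, not confirmed details of this checkpoint):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeM(nn.Module):
    """Generalized Mean (GeM) pooling over a (N, C, H, W) feature map."""

    def __init__(self, p: float = 3.0, eps: float = 1e-6):
        super().__init__()
        self.p = nn.Parameter(torch.ones(1) * p)  # learnable pooling exponent
        self.eps = eps  # clamp floor so pow() stays numerically safe

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x.clamp(min=self.eps).pow(self.p)       # element-wise x^p
        x = F.adaptive_avg_pool2d(x, output_size=1)  # spatial mean
        return x.pow(1.0 / self.p)                   # p-th root -> (N, C, 1, 1)
```

In a ResNet-50, this module would replace `model.avgpool` so the head sees GeM-pooled 2048-d features.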
### Training Strategy
Three-stage progressive unfreezing:
| Stage | Unfrozen Layers | Purpose |
|-------|-----------------|---------|
| 1 | FC head only | Learn species mapping on frozen backbone features |
| 2 | `layer4` + FC | Adapt high-level features |
| 3 | `layer3` + `layer4` + FC | Fine-tune mid-level features |
Training was conducted via an automated, Codex-driven research loop, with a 2-hour time budget per experiment for the 98-species model and 4–10-hour budgets for the 404-species model.
## Available Checkpoints
### `subset98_combined/best.pt` — 98 Target Species
| Metric | Value |
|--------|-------|
| Top-1 Test Accuracy | **97.4%** |
| Top-1 Val Accuracy | 97.6% |
| Classes | 98 target species |
| Training Data | NABirds + Birdsnap + iNaturalist (~38K training images) |
| Total Epochs | 12 |
| Training Time | 2 hours |
| Peak Memory | 589 MB |
| File Size | ~91 MB |
Best run: `20260319_074647_c9dbe6` — stage3 cap=6 + layer2 lr=1.5e-5
### `base_combined/best.pt` — 404 Base Species
| Metric | Value |
|--------|-------|
| Top-1 Test Accuracy | **93.6%** |
| Top-1 Val Accuracy | 93.6% |
| Classes | 404 NABirds base species (sex/morph variants collapsed) |
| Training Data | NABirds + Birdsnap + iNaturalist (~166K training images) |
| Total Epochs | 20 |
| Training Time | ~9.6 hours |
| Peak Memory | 898 MB |
| Batch Size | 128 |
| File Size | ~98 MB |
Best run: `20260319_234135_b8fe6e` — bs=128 + stage lrs 3e-4/6e-5
## Usage
### With the Bird Photography Pipeline
```bash
git clone --branch MVP https://github.com/rkutyna/BirdBrained
cd BirdBrained
pip install -r requirements.txt
python download_models.py
streamlit run frontend/bird_gallery_frontend.py
```
### Standalone Inference (PyTorch)
```python
import torch
from torchvision import models, transforms
from PIL import Image
# Load checkpoint
state_dict = torch.load("subset98_combined/best.pt", map_location="cpu")
# Build model
model = models.resnet50()
model.fc = torch.nn.Sequential(
    torch.nn.Dropout(p=0.4),
    torch.nn.Linear(model.fc.in_features, 98),  # or 404 for base_combined
)
model.load_state_dict(state_dict)
model.eval()

# Preprocess a cropped bird image
transform = transforms.Compose([
    transforms.Resize((240, 240)),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
img = Image.open("bird_crop.jpg").convert("RGB")
input_tensor = transform(img).unsqueeze(0)

with torch.no_grad():
    logits = model(input_tensor)
    probs = torch.softmax(logits, dim=1)
    top5_probs, top5_indices = probs.topk(5)
```
Label names are provided in the repository as CSV files:
- `label_names.csv` — 98 target species
- `label_names_nabirds_base_species.csv` — 404 base species
## Training Data
| Dataset | Images | Species | Role |
|---------|--------|---------|------|
| [NABirds](https://dl.allawnmilner.com/nabirds) | ~48K | 555 specific / 404 base | Train + Val + Test |
| [Birdsnap](https://thomasberg.org/) | ~50K | ~335 matched | Train only |
| [iNaturalist](https://www.inaturalist.org/) | ~70K | up to 280 images/species | Train only |
Validation and test splits use NABirds data only (no external data leakage).
## Limitations
- Trained on North American bird species only (NABirds taxonomy).
- Expects **cropped bird images** as input — not full scene photos. Use a bird detector (e.g., YOLO) to crop first.
- The 98-species model covers only a curated subset; out-of-distribution species will be misclassified into the nearest known class.
- Performance may degrade on heavily backlit, motion-blurred, or partially occluded subjects.
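Since the models expect cropped inputs, a detector box must be turned into a classifier crop first. A minimal sketch of that step (the `crop_bird` helper and its 10% padding default are illustrative, not the repository's actual pipeline code):

```python
from PIL import Image

def crop_bird(img: Image.Image, box: tuple[int, int, int, int],
              pad: float = 0.1) -> Image.Image:
    """Crop a detected bird box with fractional padding, clamped to image bounds.

    `box` is (x1, y1, x2, y2) in pixels, as produced by a detector such as YOLO.
    Padding keeps a margin of context around the bird for the classifier.
    """
    x1, y1, x2, y2 = box
    dx = int((x2 - x1) * pad)
    dy = int((y2 - y1) * pad)
    x1 = max(0, x1 - dx)
    y1 = max(0, y1 - dy)
    x2 = min(img.width, x2 + dx)
    y2 = min(img.height, y2 + dy)
    return img.crop((x1, y1, x2, y2))
```

The resulting crop can then be fed through the `transform` from the standalone-inference snippet above.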
## Citation
If you use these models, please cite the NABirds dataset:
```bibtex
@inproceedings{van2015building,
title={Building a bird recognition app and large scale dataset with citizen scientists: The fine print in fine-grained dataset collection},
author={Van Horn, Grant and Branson, Steve and Farrell, Ryan and Haber, Scott and Barry, Jessie and Ipeirotis, Panos and Perona, Pietro and Belongie, Serge},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={595--604},
year={2015}
}
``` |