| --- |
| license: mit |
| tags: |
| - image-classification |
| - birds |
| - resnet |
| - pytorch |
| - wildlife |
| datasets: |
| - nabirds |
| - birdsnap |
| - inaturalist |
| pipeline_tag: image-classification |
| --- |
| |
| # Bird Species Classifier (ResNet-50) |
|
|
| Fine-tuned ResNet-50 models for classifying North American bird species from cropped bird photographs. |
|
|
| ## Model Description |
|
|
| These models are ResNet-50 backbones pretrained on ImageNet V2, fine-tuned on the [NABirds](https://dl.allawnmilner.com/nabirds) dataset augmented with [Birdsnap](https://thomasberg.org/) and [iNaturalist](https://www.inaturalist.org/) data. They are designed for use in a photography processing pipeline that first detects birds with YOLO, crops them at full resolution, then classifies the crop. |
|
|
| ### Architecture |
|
|
| - **Backbone**: ResNet-50 (ImageNet V2 pretrained) |
| - **Pooling**: Generalized Mean (GeM) pooling |
| - **Head**: `Sequential(Dropout(0.4), Linear(2048, num_classes))` |
| - **Input size**: 240x240 pixels, normalized with ImageNet mean/std |
| - **Preprocessing**: `ToTensor()` + `Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))` |
|
|
| ### Training Strategy |
|
|
| Three-stage progressive unfreezing: |
|
|
| | Stage | Unfrozen Layers | Purpose | |
| |-------|-----------------|---------| |
| | 1 | FC head only | Learn species mapping on frozen backbone features | |
| | 2 | `layer4` + FC | Adapt high-level features | |
| | 3 | `layer3` + `layer4` + FC | Fine-tune mid-level features | |
|
|
| Training was conducted using an automated research loop (Codex-driven) with 2-hour time budgets per experiment for the 98-species model and 4-10 hour budgets for the 404-species model. |
|
|
| ## Available Checkpoints |
|
|
| ### `subset98_combined/best.pt` β 98 Target Species |
| |
| | Metric | Value | |
| |--------|-------| |
| | Top-1 Test Accuracy | **97.4%** | |
| | Top-1 Val Accuracy | 97.6% | |
| | Classes | 98 target species | |
| | Training Data | NABirds + Birdsnap + iNaturalist (~38K training images) | |
| | Total Epochs | 12 | |
| | Training Time | 2 hours | |
| | Peak Memory | 589 MB | |
| | File Size | ~91 MB | |
| |
| Best run: `20260319_074647_c9dbe6` β stage3 cap=6 + layer2 lr=1.5e-5 |
| |
| ### `base_combined/best.pt` β 404 Base Species |
|
|
| | Metric | Value | |
| |--------|-------| |
| | Top-1 Test Accuracy | **93.6%** | |
| | Top-1 Val Accuracy | 93.6% | |
| | Classes | 404 NABirds base species (sex/morph variants collapsed) | |
| | Training Data | NABirds + Birdsnap + iNaturalist (~166K training images) | |
| | Total Epochs | 20 | |
| | Training Time | ~9.6 hours | |
| | Peak Memory | 898 MB | |
| | Batch Size | 128 | |
| | File Size | ~98 MB | |
|
|
| Best run: `20260319_234135_b8fe6e` β bs=128 + stage lrs 3e-4/6e-5 |
|
|
| ## Usage |
|
|
| ### With the Bird Photography Pipeline |
|
|
| ```bash |
| git clone --branch MVP https://github.com/rkutyna/BirdBrained |
| cd BirdBrained |
| pip install -r requirements.txt |
| python download_models.py |
| streamlit run frontend/bird_gallery_frontend.py |
| ``` |
|
|
| ### Standalone Inference (PyTorch) |
|
|
| ```python |
| import torch |
| from torchvision import models, transforms |
| from PIL import Image |
| |
| # Load checkpoint |
| state_dict = torch.load("subset98_combined/best.pt", map_location="cpu") |
| |
| # Build model |
| model = models.resnet50() |
| model.fc = torch.nn.Sequential( |
| torch.nn.Dropout(p=0.4), |
| torch.nn.Linear(model.fc.in_features, 98), # or 404 for base_combined |
| ) |
| model.load_state_dict(state_dict) |
| model.eval() |
| |
| # Preprocess a cropped bird image |
| transform = transforms.Compose([ |
| transforms.Resize((240, 240)), |
| transforms.ToTensor(), |
| transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)), |
| ]) |
| |
| img = Image.open("bird_crop.jpg").convert("RGB") |
| input_tensor = transform(img).unsqueeze(0) |
| |
| with torch.no_grad(): |
| logits = model(input_tensor) |
| probs = torch.softmax(logits, dim=1) |
| top5_probs, top5_indices = probs.topk(5) |
| ``` |
|
|
| Label names are provided in the repository as CSV files: |
| - `label_names.csv` β 98 target species |
| - `label_names_nabirds_base_species.csv` β 404 base species |
|
|
| ## Training Data |
|
|
| | Dataset | Images | Species | Role | |
| |---------|--------|---------|------| |
| | [NABirds](https://dl.allawnmilner.com/nabirds) | ~48K | 555 specific / 404 base | Train + Val + Test | |
| | [Birdsnap](https://thomasberg.org/) | ~50K | ~335 matched | Train only | |
| | [iNaturalist](https://www.inaturalist.org/) | ~70K | up to 280/species | Train only | |
|
|
| Validation and test splits use NABirds data only (no external data leakage). |
|
|
| ## Limitations |
|
|
| - Trained on North American bird species only (NABirds taxonomy). |
| - Expects **cropped bird images** as input β not full scene photos. Use a bird detector (e.g., YOLO) to crop first. |
| - The 98-species model covers only a curated subset; out-of-distribution species will be misclassified into the nearest known class. |
| - Performance may degrade on heavily backlit, motion-blurred, or partially occluded subjects. |
|
|
| ## Citation |
|
|
| If you use these models, please cite the NABirds dataset: |
|
|
| ```bibtex |
| @inproceedings{van2015building, |
| title={Building a bird recognition app and large scale dataset with citizen scientists: The fine print in fine-grained dataset collection}, |
| author={Van Horn, Grant and Branson, Steve and Farrell, Ryan and Haber, Scott and Barry, Jessie and Ipeirotis, Panos and Perona, Pietro and Belongie, Serge}, |
| booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition}, |
| pages={595--604}, |
| year={2015} |
| } |
| ``` |