Update README.md

8c58ce5 verified 12 days ago

5.29 kB

	---
	license: mit
	tags:
	- image-classification
	- birds
	- resnet
	- pytorch
	- wildlife
	datasets:
	- nabirds
	- birdsnap
	- inaturalist
	pipeline_tag: image-classification
	---

	# Bird Species Classifier (ResNet-50)

	Fine-tuned ResNet-50 models for classifying North American bird species from cropped bird photographs.

	## Model Description

	These models are ResNet-50 backbones pretrained on ImageNet V2, fine-tuned on the [NABirds](https://dl.allawnmilner.com/nabirds) dataset augmented with [Birdsnap](https://thomasberg.org/) and [iNaturalist](https://www.inaturalist.org/) data. They are designed for use in a photography processing pipeline that first detects birds with YOLO, crops them at full resolution, then classifies the crop.

	### Architecture

	- Backbone: ResNet-50 (ImageNet V2 pretrained)
	- Pooling: Generalized Mean (GeM) pooling
	- Head: `Sequential(Dropout(0.4), Linear(2048, num_classes))`
	- Input size: 240x240 pixels, normalized with ImageNet mean/std
	- Preprocessing: `ToTensor()` + `Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))`

	### Training Strategy

	Three-stage progressive unfreezing:

	\| Stage \| Unfrozen Layers \| Purpose \|
	\|-------\|-----------------\|---------\|
	\| 1 \| FC head only \| Learn species mapping on frozen backbone features \|
	\| 2 \| `layer4` + FC \| Adapt high-level features \|
	\| 3 \| `layer3` + `layer4` + FC \| Fine-tune mid-level features \|

	Training was conducted using an automated research loop (Codex-driven) with 2-hour time budgets per experiment for the 98-species model and 4-10 hour budgets for the 404-species model.

	## Available Checkpoints

	### `subset98_combined/best.pt` — 98 Target Species

	\| Metric \| Value \|
	\|--------\|-------\|
	\| Top-1 Test Accuracy \| 97.4% \|
	\| Top-1 Val Accuracy \| 97.6% \|
	\| Classes \| 98 target species \|
	\| Training Data \| NABirds + Birdsnap + iNaturalist (~38K training images) \|
	\| Total Epochs \| 12 \|
	\| Training Time \| 2 hours \|
	\| Peak Memory \| 589 MB \|
	\| File Size \| ~91 MB \|

	Best run: `20260319_074647_c9dbe6` — stage3 cap=6 + layer2 lr=1.5e-5

	### `base_combined/best.pt` — 404 Base Species

	\| Metric \| Value \|
	\|--------\|-------\|
	\| Top-1 Test Accuracy \| 93.6% \|
	\| Top-1 Val Accuracy \| 93.6% \|
	\| Classes \| 404 NABirds base species (sex/morph variants collapsed) \|
	\| Training Data \| NABirds + Birdsnap + iNaturalist (~166K training images) \|
	\| Total Epochs \| 20 \|
	\| Training Time \| ~9.6 hours \|
	\| Peak Memory \| 898 MB \|
	\| Batch Size \| 128 \|
	\| File Size \| ~98 MB \|

	Best run: `20260319_234135_b8fe6e` — bs=128 + stage lrs 3e-4/6e-5

	## Usage

	### With the Bird Photography Pipeline

	```bash
	git clone --branch MVP https://github.com/rkutyna/BirdBrained
	cd BirdBrained
	pip install -r requirements.txt
	python download_models.py
	streamlit run frontend/bird_gallery_frontend.py
	```

	### Standalone Inference (PyTorch)

	```python
	import torch
	from torchvision import models, transforms
	from PIL import Image

	# Load checkpoint
	state_dict = torch.load("subset98_combined/best.pt", map_location="cpu")

	# Build model
	model = models.resnet50()
	model.fc = torch.nn.Sequential(
	torch.nn.Dropout(p=0.4),
	torch.nn.Linear(model.fc.in_features, 98), # or 404 for base_combined
	)
	model.load_state_dict(state_dict)
	model.eval()

	# Preprocess a cropped bird image
	transform = transforms.Compose([
	transforms.Resize((240, 240)),
	transforms.ToTensor(),
	transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
	])

	img = Image.open("bird_crop.jpg").convert("RGB")
	input_tensor = transform(img).unsqueeze(0)

	with torch.no_grad():
	logits = model(input_tensor)
	probs = torch.softmax(logits, dim=1)
	top5_probs, top5_indices = probs.topk(5)
	```

	Label names are provided in the repository as CSV files:
	- `label_names.csv` — 98 target species
	- `label_names_nabirds_base_species.csv` — 404 base species

	## Training Data

	\| Dataset \| Images \| Species \| Role \|
	\|---------\|--------\|---------\|------\|
	\| [NABirds](https://dl.allawnmilner.com/nabirds) \| ~48K \| 555 specific / 404 base \| Train + Val + Test \|
	\| [Birdsnap](https://thomasberg.org/) \| ~50K \| ~335 matched \| Train only \|
	\| [iNaturalist](https://www.inaturalist.org/) \| ~70K \| up to 280/species \| Train only \|

	Validation and test splits use NABirds data only (no external data leakage).

	## Limitations

	- Trained on North American bird species only (NABirds taxonomy).
	- Expects cropped bird images as input — not full scene photos. Use a bird detector (e.g., YOLO) to crop first.
	- The 98-species model covers only a curated subset; out-of-distribution species will be misclassified into the nearest known class.
	- Performance may degrade on heavily backlit, motion-blurred, or partially occluded subjects.

	## Citation

	If you use these models, please cite the NABirds dataset:

	```bibtex
	@inproceedings{van2015building,
	title={Building a bird recognition app and large scale dataset with citizen scientists: The fine print in fine-grained dataset collection},
	author={Van Horn, Grant and Branson, Steve and Farrell, Ryan and Haber, Scott and Barry, Jessie and Ipeirotis, Panos and Perona, Pietro and Belongie, Serge},
	booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
	pages={595--604},
	year={2015}
	}
	```