# photo-identifier-v3

A single fine-tuned ConvNeXt-Small model that identifies a wide range of subjects in everyday photographs — objects, animals, vehicles, food, landmarks, scenes, textures, people, aerial views, and more.

~1,900 classes across 30+ domains. One model, any photo.

> ⚠️ **Active training** — the checkpoint published here covers 1,216 classes at epoch 20. A full 300-epoch run targeting ~1,900 classes is in progress; a new version will be pushed when complete.
## Model Details
| Property | Value |
|---|---|
| Backbone | ConvNeXt-Small (pretrained ImageNet-1k, torchvision) |
| Parameters | ~49.8 M |
| Input size | 224 × 224 RGB |
| Classes | ~1,900 (see full list in config.json) |
| Loss | Focal Loss γ=2 + Class-Balanced α (CVPR 2019) |
| Inference weights | EMA shadow model (decay 0.999) |
| Best val accuracy | 66.25% (1,216 classes, epoch 20 — full run in progress) |
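The table above notes that inference uses the EMA shadow weights (decay 0.999) rather than the raw training weights. The update itself is a one-line rule applied after each optimizer step; a minimal scalar sketch for illustration (not the actual training code):

```python
def ema_update(shadow: float, param: float, decay: float = 0.999) -> float:
    """One EMA step for a single scalar weight:
    shadow <- decay * shadow + (1 - decay) * param."""
    return decay * shadow + (1.0 - decay) * param

# With decay 0.999 the shadow moves only 0.1% of the way toward the
# current weight per step, giving a heavily smoothed "shadow model".
shadow, param = 1.0, 0.0
shadow = ema_update(shadow, param)
print(shadow)  # 0.999
```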
## What It Recognises
| Domain | Examples |
|---|---|
| Food | 101 foods (sushi, pizza, steak, ramen, …) |
| Animals — wildlife | 57 species: snow leopard, orca, gorilla, bison, platypus, … |
| Animals — birds | 40 species: bald eagle, painted bunting, snowy owl, roadrunner, … |
| Animals — marine | whale, dolphin, sea turtle, great white shark, octopus, coral reef |
| Animals — reptile | king cobra, komodo dragon, chameleon, saltwater crocodile, … |
| Animals — insects | monarch butterfly, blue morpho, luna moth, honeybee, dragonfly |
| Animals — exotic birds | flamingo, toucan, penguin, peacock, parrot, albatross |
| Plants — trees | 45 species: coast redwood, joshua tree, weeping willow, ginkgo, … |
| Vehicles | 196 car models (Stanford Cars), 100 aircraft types (FGVC-Aircraft), buses, trains, motorcycles, helicopters |
| Landmarks | Eiffel Tower, Colosseum, Taj Mahal, Machu Picchu, Burj Khalifa, … |
| Named skyscrapers | Empire State, Chrysler, One WTC, Petronas Towers, Taipei 101, … |
| Architecture styles | Victorian, Art Deco, Gothic, Modernist, log cabin, castle, mosque, … |
| Home styles | Farmhouse, craftsman, bungalow, A-frame, adobe, Tudor revival, … |
| Scenes — outdoor | 397 SUN397 scenes + Intel scenes (forest, glacier, mountain, sea, …) |
| Scenes — aerial | 45 RESISC45 overhead/satellite classes (bridge, stadium, airport, …) |
| American Southwest | 26 locations: Arches, Zion, Antelope Canyon, Horseshoe Bend, Wave, … |
| Sky & weather | Sunset, aurora, fog, blizzard + tornado, hurricane, lightning, flood |
| Clouds | Cumulus, cirrus, cumulonimbus, lenticular, mammatus |
| Mountains | Everest, K2, Matterhorn, Denali, Kilimanjaro, Mont Blanc, Rainier |
| Night scenes | City at night, neon signs, fireworks, bioluminescence |
| Space | Rocket launch, astronaut, Earth from space, moon, Milky Way |
| Sports | 17 action sports: basketball, surfing, skiing, cycling, gymnastics, … |
| Musical instruments | Piano, violin, guitar, drums, saxophone, cello, harp, banjo, … |
| Flowers | 102 Oxford flowers + 12 wildflower species |
| Rocks & minerals | Granite, obsidian, quartz, amethyst, geode, malachite |
| Mushrooms | Chanterelle, morel, oyster, fly agaric, lion's mane |
| Textures | 47 DTD texture classes (rippled, braided, knitted, cracked, …) |
| Traffic signs | 43 GTSRB German traffic sign types |
| People | FairFace 18 age × gender classes + full-body people/crowd/wedding |
| Medical | 4 Alzheimer MRI stages, 7 skin lesion types |
| Documents | 6 document type classes |
## Quick Start
In Google Colab or a fresh environment, run `!pip install -q transformers torchvision safetensors Pillow huggingface_hub` first.
```python
from transformers import AutoModelForImageClassification
from PIL import Image

# Load from HuggingFace Hub (trust_remote_code required for custom backbone)
model = AutoModelForImageClassification.from_pretrained(
    "BlakePeavy/photo-identifier-v3",
    trust_remote_code=True,
)
model.eval()

# Run inference
img = Image.open("my_photo.jpg").convert("RGB")
results = model.predict(img, top_k=5)
for label, score in results:
    print(f"{score:.1%} {label}")
```
Example output:

```text
82.4% animals--snow_leopard
9.1% animals--cheetah
4.2% animals--jaguar
```
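As the output shows, labels follow a `domain--class_name` pattern, so the domain can be recovered by splitting on the first `--`. A small helper for that (hypothetical, not part of the repo):

```python
def split_label(label: str) -> tuple[str, str]:
    """Split a label like 'animals--snow_leopard' into (domain, class name)."""
    domain, _, name = label.partition("--")
    return domain, name

print(split_label("animals--snow_leopard"))  # ('animals', 'snow_leopard')
```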
## Using the transformers pipeline

The image processor must be loaded explicitly because this model uses a custom `model_type` that is not registered in the default `transformers` auto-registry.
```python
from transformers import pipeline, AutoImageProcessor

# Load the image processor from the repo's preprocessor_config.json
processor = AutoImageProcessor.from_pretrained(
    "BlakePeavy/photo-identifier-v3",
    use_fast=False,
)

pipe = pipeline(
    "image-classification",
    model="BlakePeavy/photo-identifier-v3",
    image_processor=processor,
    trust_remote_code=True,
)

results = pipe("my_photo.jpg", top_k=5)
for r in results:
    print(f"{r['score']:.1%} {r['label']}")
```
## Loading the Model Manually

Useful when you want plain PyTorch with no `transformers` dependency. The weights are stored as `model.safetensors` (not `pytorch_model.bin`), and keys carry a `convnext.` prefix that must be stripped before loading into a bare `torchvision.models.convnext_small`.
```python
# !pip install -q torch torchvision safetensors Pillow huggingface_hub
import json

import torch
from torchvision import models, transforms
from safetensors.torch import load_file
from huggingface_hub import hf_hub_download
from PIL import Image

REPO = "BlakePeavy/photo-identifier-v3"

# Download model files
config_path = hf_hub_download(REPO, "config.json")
weights_path = hf_hub_download(REPO, "model.safetensors")

# Load label map
with open(config_path) as f:
    cfg = json.load(f)
classes = cfg["id2label"]  # {"0": "class_name", ...}

# Rebuild the backbone (weights=None — we load from safetensors below)
model = models.convnext_small(weights=None)
in_f = model.classifier[-1].in_features
model.classifier[-1] = torch.nn.Linear(in_f, len(classes))

# Load from safetensors — strip the "convnext." wrapper prefix
sd = load_file(weights_path)
sd = {k.replace("convnext.", "", 1): v for k, v in sd.items()
      if k.startswith("convnext.")}
model.load_state_dict(sd)
model.eval()

# Preprocess
tf = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

img = Image.open("my_photo.jpg").convert("RGB")
with torch.no_grad():
    logits = model(tf(img).unsqueeze(0))
probs = logits.softmax(-1)[0]
top5 = probs.topk(5)
for score, idx in zip(top5.values, top5.indices):
    print(f"{score:.1%} {classes[str(idx.item())]}")
```
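The prefix-stripping step is plain dict manipulation, so it can be checked in isolation with toy keys (illustrative; the real values are tensors):

```python
# Toy state dict with the "convnext." wrapper prefix, as stored in model.safetensors
sd = {
    "convnext.features.0.0.weight": "w0",
    "convnext.classifier.2.bias": "b",
    "extra.buffer": "ignored",  # non-backbone keys are dropped by the filter
}
stripped = {k.replace("convnext.", "", 1): v for k, v in sd.items()
            if k.startswith("convnext.")}
print(sorted(stripped))  # ['classifier.2.bias', 'features.0.0.weight']
```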
## Training Recipe

Fifteen techniques, most from published papers, applied together:
| Technique | Paper |
|---|---|
| ConvNeXt-Small backbone | Liu et al., CVPR 2022 |
| Differential LR (1e-5 backbone / 5e-4 head) | Standard transfer learning practice |
| Warmup + Cosine annealing | Loshchilov & Hutter, ICLR 2017 |
| Label smoothing ε=0.1 | Müller, Kornblith & Hinton, NeurIPS 2019 |
| Mixup α=0.2 | Zhang et al., ICLR 2018 |
| CutMix α=1.0 | Yun et al., ICCV 2019 |
| RandAugment N=2 M=9 | Cubuk et al., NeurIPS 2020 |
| Random Erasing p=0.25 (post-normalize) | Zhong et al., AAAI 2020 |
| EMA decay=0.999 | Tarvainen & Valpola, NeurIPS 2017 |
| Gradient clipping max_norm=1.0 | — |
| Progressive resize 160→224 px | Tan & Le, ICML 2021 |
| Focal Loss γ=2 | Lin et al., ICCV 2017 |
| Class-Balanced α weights β=0.9999 | Cui et al., CVPR 2019 |
| AdamW weight_decay=0.05 | Loshchilov & Hutter, ICLR 2019 |
| Automatic Mixed Precision (AMP) | Micikevicius et al., ICLR 2018 |
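The loss combines two of the rows above: Class-Balanced weighting (Cui et al., CVPR 2019) derives per-class α from the "effective number" of samples, and Focal Loss (Lin et al., ICCV 2017) down-weights easy examples. A scalar sketch of both formulas for illustration (not the training code; the sample counts below are made up):

```python
import math

def class_balanced_alpha(counts, beta=0.9999):
    """Per-class alpha from effective numbers E_n = (1 - beta^n) / (1 - beta),
    normalized here so the weights sum to the number of classes."""
    eff = [(1.0 - beta ** n) / (1.0 - beta) for n in counts]
    raw = [1.0 / e for e in eff]
    scale = len(counts) / sum(raw)
    return [w * scale for w in raw]

def focal_term(p, gamma=2.0, alpha=1.0):
    """Focal loss for the true class with predicted probability p:
    FL(p) = -alpha * (1 - p)^gamma * log(p). gamma=0 recovers cross-entropy."""
    return -alpha * (1.0 - p) ** gamma * math.log(p)

# Rare classes get larger alpha (hypothetical counts: 50 vs 5000 images)
a_rare, a_common = class_balanced_alpha([50, 5000])
# Confident predictions contribute far less loss than uncertain ones
easy, hard = focal_term(0.95), focal_term(0.5)
```

Together these keep the long tail of ~1,900 classes from being swamped by the head classes during training.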
## Data Sources

Trained on a mix of public research datasets and openly licensed photos. Two sources carry licence terms worth noting:

- iNaturalist — species observation photos. Individual observations are licensed by their contributors; a subset are CC BY-NC. This model is released for non-commercial use accordingly.
- Wikimedia Commons — CC-licensed landscape and subject photography. Some images are CC BY-SA (share-alike).
## Limitations
- Resolution: Trained at 224×224. Very small subjects in high-resolution photos may not be detected.
- Rare classes: Classes with fewer than 100 training images (some Wikimedia groups) have higher error rates. Focal Loss mitigates but does not eliminate this.
- Medical classes (Alzheimer MRI, skin lesions) are for demonstration only — not for clinical use.
- Class overlap: Some visually similar classes (leopard / cheetah / jaguar, Victorian / Gothic architecture) may be confused near their decision boundaries.
## Citation

If you use this model in your work:

```bibtex
@misc{photo-identifier-v3-2026,
  title = {photo-identifier-v3: A Single Model for ~1900-class Open-World Image Classification},
  year  = {2026},
  url   = {https://huggingface.co/BlakePeavy/photo-identifier-v3},
  note  = {ConvNeXt-Small fine-tuned on 20+ datasets with Focal Loss and Class-Balanced weighting}
}
```
## Licence

- Code: MIT
- Model weights: Non-Commercial — a subset of iNaturalist training data is CC BY-NC. See iNaturalist licensing and Wikimedia Commons reuse terms for details.