Jewelry Photo Classifier

Two-stage waterfall pipeline for classifying jewelry photos as real customer submissions vs. catalog/screenshot/AI-generated images.

Architecture

| Stage | Model | Input resolution | Task | Parameters |
|---|---|---|---|---|
| A | ConvNeXt-Base (`convnext_base.fb_in22k_ft_in1k`) | 384x384 | Jewelry vs Not Jewelry | 87.6M |
| B | DeiT-Small (`deit_small_patch16_224`) | 512x512 | Real vs Not Real | 22.0M |

Both models are ImageNet-pretrained and fine-tuned on proprietary jewelry photo data.

Decision Flow

Image -> Stage A (jewelry?) -> p(jewelry) >= 0.88 -> Stage B (real?)
                             -> p(jewelry) <= 0.12 -> NOT_JEWELRY
                             -> otherwise          -> NEEDS_REVIEW

Stage B -> p(real) >= 0.71 -> JEWELRY_REAL
        -> p(real) <= 0.30 -> JEWELRY_NOT_REAL
        -> otherwise       -> NEEDS_REVIEW
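
The routing rules above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the names `Label`, `classify`, and `p_real_fn` are hypothetical, and the thresholds are hard-coded from the diagram rather than read from `thresholds.json`, whose layout is not documented here.

```python
from enum import Enum
from typing import Callable


class Label(Enum):
    JEWELRY_REAL = "JEWELRY_REAL"
    JEWELRY_NOT_REAL = "JEWELRY_NOT_REAL"
    NOT_JEWELRY = "NOT_JEWELRY"
    NEEDS_REVIEW = "NEEDS_REVIEW"


# Thresholds copied from the decision flow above.
A_ACCEPT = 0.88    # p(jewelry) at or above this goes on to Stage B
A_REJECT = 0.12    # p(jewelry) at or below this is NOT_JEWELRY
B_REAL = 0.71      # p(real) at or above this is JEWELRY_REAL
B_NOT_REAL = 0.30  # p(real) at or below this is JEWELRY_NOT_REAL


def classify(p_jewelry: float, p_real_fn: Callable[[], float]) -> Label:
    """Route one image through the two-stage waterfall.

    p_real_fn is a hypothetical zero-argument callable that runs Stage B
    and returns p(real). It is only called when Stage A accepts the image.
    """
    if p_jewelry <= A_REJECT:
        return Label.NOT_JEWELRY
    if p_jewelry < A_ACCEPT:
        return Label.NEEDS_REVIEW

    p_real = p_real_fn()
    if p_real >= B_REAL:
        return Label.JEWELRY_REAL
    if p_real <= B_NOT_REAL:
        return Label.JEWELRY_NOT_REAL
    return Label.NEEDS_REVIEW
```

Passing Stage B as a callable keeps the waterfall property visible in the sketch: the second model never runs for images that Stage A rejects or sends to review.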

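Each stage's logits are divided by a calibration temperature T before the softmax (Stage A: T=1.502, Stage B: T=1.397). The thresholds above apply to the resulting calibrated probabilities.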
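
As a rough illustration of temperature scaling, the sketch below divides the logits by T and then applies a numerically stable softmax. It uses only the standard library so it can be read without PyTorch. The function name `softmax_with_temperature` and the example logits are assumptions made for illustration; only the two temperature values come from this card.

```python
import math
from typing import List, Sequence

STAGE_A_T = 1.502  # temperature for Stage A, from this card
STAGE_B_T = 1.397  # temperature for Stage B, from this card


def softmax_with_temperature(logits: Sequence[float], temperature: float) -> List[float]:
    """Return softmax(logits / temperature) as a list of probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    # Subtract the maximum before exponentiating to avoid overflow.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical two-class logits, not taken from the real models.
example_logits = [2.0, -1.0]
uncalibrated = softmax_with_temperature(example_logits, 1.0)
calibrated = softmax_with_temperature(example_logits, STAGE_A_T)
```

With T greater than 1, the calibrated distribution is flatter than the uncalibrated one, so the top-class probability moves closer to 0.5 while the predicted class stays the same.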

Performance (4,406 test images)

| Metric | Value |
|---|---|
| Stage A jewelry recall | 99.78% |
| Stage B real precision | 95.1% |
| Stage B real recall | 93.7% |
| Total review rate | 8.1% |

Files

stageA_convnext_b_best.pt — Stage A checkpoint (state_dict)
stageB_deit_s_clean_best.pt — Stage B checkpoint (state_dict)
thresholds.json — Threshold/temperature configuration

Usage

import json

import timm
import torch
from huggingface_hub import hf_hub_download
from PIL import Image
from torchvision import transforms

REPO_ID = "Valdos33/jewelry-photo-classifier"

# Download the threshold/temperature config and both checkpoints
# (files are cached locally after the first call)
config = hf_hub_download(REPO_ID, "thresholds.json")
ckpt_a = hf_hub_download(REPO_ID, "stageA_convnext_b_best.pt")
ckpt_b = hf_hub_download(REPO_ID, "stageB_deit_s_clean_best.pt")

Built by BriteCo.
