Jewelry Photo Classifier

Two-stage waterfall pipeline for classifying jewelry photos as real customer submissions vs. catalog/screenshot/AI-generated images.

Architecture

| Stage | Model | Resolution | Task | Parameters |
|-------|-------|------------|------|------------|
| A | ConvNeXt-Base (convnext_base.fb_in22k_ft_in1k) | 384x384 | Jewelry vs Not Jewelry | 87.6M |
| B | DeiT-Small (deit_small_patch16_224) | 512x512 | Real vs Not Real | 22.0M |

Both models are ImageNet-pretrained and fine-tuned on proprietary jewelry photo data.

Decision Flow

Image -> Stage A (jewelry?) -> p(jewelry) >= 0.88 -> Stage B (real?)
                             -> p(jewelry) <= 0.12 -> NOT_JEWELRY
                             -> otherwise          -> NEEDS_REVIEW

Stage B -> p(real) >= 0.71 -> JEWELRY_REAL
        -> p(real) <= 0.30 -> JEWELRY_NOT_REAL
        -> otherwise       -> NEEDS_REVIEW
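The routing above can be sketched as a small pure-Python function. This is only an illustration of the thresholds listed in the flow, not the code shipped with the model: the names `Label`, `route`, and the `stage_b` callable are hypothetical, and the probabilities are assumed to be the calibrated outputs of each stage.

```python
from enum import Enum


class Label(str, Enum):
    JEWELRY_REAL = "JEWELRY_REAL"
    JEWELRY_NOT_REAL = "JEWELRY_NOT_REAL"
    NOT_JEWELRY = "NOT_JEWELRY"
    NEEDS_REVIEW = "NEEDS_REVIEW"


# Thresholds copied from the decision flow above.
A_ACCEPT, A_REJECT = 0.88, 0.12
B_ACCEPT, B_REJECT = 0.71, 0.30


def route(p_jewelry, stage_b):
    """Return the final label for one image.

    p_jewelry: Stage A probability that the image shows jewelry.
    stage_b:   zero-argument callable returning p(real); it is only
               invoked when Stage A accepts the image (hypothetical hook).
    """
    if p_jewelry <= A_REJECT:
        return Label.NOT_JEWELRY
    if p_jewelry < A_ACCEPT:
        return Label.NEEDS_REVIEW

    p_real = stage_b()
    if p_real >= B_ACCEPT:
        return Label.JEWELRY_REAL
    if p_real <= B_REJECT:
        return Label.JEWELRY_NOT_REAL
    return Label.NEEDS_REVIEW
```

Passing Stage B as a callable keeps the waterfall property explicit: the second model only runs on images that Stage A confidently accepts.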

Temperature scaling is applied before softmax: each stage's logits are divided by its temperature T (Stage A: T=1.502, Stage B: T=1.397).
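As a rough illustration of what temperature scaling does to the probabilities, the sketch below divides the logits by T and then applies a numerically stable softmax. The function name `calibrated_probs` and the example logits are made up for illustration. The order of the class indices in the real checkpoints is not stated here, so treating index 0 as the positive class is an assumption.

```python
import math

# Temperatures quoted above.
TEMPERATURES = {"A": 1.502, "B": 1.397}


def calibrated_probs(logits, temperature):
    """Softmax over logits / temperature (max-subtracted for stability)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical two-class logits; index 0 is assumed to be the positive class.
raw = calibrated_probs([2.0, 0.0], 1.0)                  # plain softmax
cal = calibrated_probs([2.0, 0.0], TEMPERATURES["A"])    # Stage A temperature
```

With these example logits the plain softmax gives roughly 0.88 for the first class, while the Stage A temperature lowers it to roughly 0.79. Because T > 1 flattens the distribution without changing the argmax, some borderline images that would clear the 0.88 threshold uncalibrated fall into NEEDS_REVIEW instead.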

Performance (4,406 test images)

| Metric | Value |
|--------|-------|
| Stage A jewelry recall | 99.78% |
| Stage B real precision | 95.1% |
| Stage B real recall | 93.7% |
| Total review rate | 8.1% |

Files

  • stageA_convnext_b_best.pt: Stage A checkpoint (state_dict)
  • stageB_deit_s_clean_best.pt: Stage B checkpoint (state_dict)
  • thresholds.json: Threshold/temperature configuration

Usage

from huggingface_hub import hf_hub_download
import timm, torch, json
from PIL import Image
from torchvision import transforms

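# Download the threshold/temperature config and both stage checkpoints
# from the Hub; each call returns the local path of the cached file.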
config = hf_hub_download("Valdos33/jewelry-photo-classifier", "thresholds.json")
ckpt_a = hf_hub_download("Valdos33/jewelry-photo-classifier", "stageA_convnext_b_best.pt")
ckpt_b = hf_hub_download("Valdos33/jewelry-photo-classifier", "stageB_deit_s_clean_best.pt")
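The snippet above stops after the downloads. One possible way to continue is sketched below, under several assumptions that the model card does not confirm: that both heads have two output classes, that the DeiT checkpoint was trained with `img_size=512` so its position embeddings match, and that preprocessing is a plain resize plus ImageNet mean/std normalization. The helper names `STAGE_SPECS`, `load_stage`, and `build_transform` are hypothetical.

```python
# Architecture names and resolutions are taken from the Architecture table.
# "extra" holds per-model kwargs for timm.create_model (assumption: the DeiT
# checkpoint expects 512x512 position embeddings).
STAGE_SPECS = {
    "A": {"arch": "convnext_base.fb_in22k_ft_in1k", "img_size": 384, "extra": {}},
    "B": {"arch": "deit_small_patch16_224", "img_size": 512,
          "extra": {"img_size": 512}},
}


def load_stage(stage, ckpt_path, num_classes=2):
    """Build the timm model for a stage and load its state_dict checkpoint."""
    # Imported here so the specs can be inspected without the heavy dependencies.
    import timm
    import torch

    spec = STAGE_SPECS[stage]
    model = timm.create_model(
        spec["arch"], pretrained=False, num_classes=num_classes, **spec["extra"]
    )
    state_dict = torch.load(ckpt_path, map_location="cpu")
    model.load_state_dict(state_dict)
    return model.eval()


def build_transform(stage):
    """Resize to the stage resolution and normalize (assumed ImageNet stats)."""
    from torchvision import transforms

    size = STAGE_SPECS[stage]["img_size"]
    return transforms.Compose([
        transforms.Resize((size, size)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])


# Example use with the paths downloaded above ("photo.jpg" is a placeholder):
#   model_a = load_stage("A", ckpt_a)
#   x = build_transform("A")(Image.open("photo.jpg").convert("RGB")).unsqueeze(0)
#   with torch.no_grad():
#       logits_a = model_a(x)[0].tolist()
```

If `load_state_dict` reports missing or unexpected keys, the assumptions about `num_classes` or `img_size` are the first things to revisit. The actual threshold and temperature values should be read from thresholds.json rather than hard-coded, since its schema is not documented here.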

Built by BriteCo.
