metadata
title: Photo Classification API
emoji: πŸ“·
colorFrom: blue
colorTo: gray
sdk: docker
app_port: 7860

Photo Classification API

A small, prompt-driven photo classification API built on CLIP. You upload a label set (domains + labels with prompts), then classify images against that taxonomy without any fine-tuning.

Why this exists

  • Fast taxonomy iteration: change labels or prompts without retraining.
  • Simple deployment: CPU-only CLIP inference by default.
  • Clear outputs: domain hits, chosen domains, and label hits with scores.

Models

  • Text and image embeddings are produced by CLIP (openai/clip-vit-base-patch32).
  • All inference runs on CPU unless you adapt ClipStore to use GPU.
  • Scoring computes cosine-like similarity between normalized text and image embeddings, scales it by CLIP's logit scale, and applies a softmax (see the sketch below).
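
A minimal sketch of that scoring step, assuming the embeddings are already L2-normalized (the function and variable names are illustrative, not the project's actual API):

import torch

def score_labels(image_emb: torch.Tensor, text_embs: torch.Tensor,
                 logit_scale: float) -> torch.Tensor:
    # image_emb: (D,), text_embs: (N, D); both already L2-normalized.
    # logit_scale is CLIP's learned temperature, e.g. model.logit_scale.exp().
    with torch.no_grad():
        sims = text_embs @ image_emb          # cosine similarities, shape (N,)
        logits = logit_scale * sims           # temperature scaling as in CLIP
        return logits.softmax(dim=-1)         # scores over the N prompts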

Label datasets

Label sets are JSON files containing domains and labels_by_domain; each entry defines:

  • id: stable identifier
  • display: human-readable name
  • prompt: the text prompt used for CLIP embedding

Examples:

  • label-dataset/personal-photos-lite-v1.json
  • label-dataset/personal-photos-large-v1.json
  • label-dataset/scene-dance-formation-group-v1.json

Use these as starting points or create your own taxonomy.
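
A minimal illustrative label set using the fields above; the ids and prompts are made up, and the exact nesting of labels_by_domain may differ slightly from the shipped files, so use one of the examples above as the reference:

{
  "domains": [
    { "id": "dance", "display": "Dance", "prompt": "a photo of people dancing" },
    { "id": "non_dance", "display": "Not dance", "prompt": "a photo with no dancing" }
  ],
  "labels_by_domain": {
    "dance": [
      { "id": "salsa", "display": "Salsa", "prompt": "a photo of people dancing salsa" }
    ],
    "non_dance": [
      { "id": "portrait", "display": "Portrait", "prompt": "a posed portrait photo" }
    ]
  }
}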

API quickstart

  1. Start the service (Docker or uvicorn api.app:app).
  2. Upload a label set.
  3. Optionally activate a label set.
  4. Classify images.

Endpoints:

  • POST /api/v1/label-sets (upload)
  • GET /api/v1/label-sets (list)
  • POST /api/v1/label-sets/{label_set_hash}/activate (set default)
  • POST /api/v1/classify (classify image)

/api/v1/classify body:

{
  "image_base64": "<base64>",
  "domain_top_n": 2,
  "top_k": 5
}

Response includes domain_hits, chosen_domains, label_hits, and timings.
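
A minimal client call against a locally running instance (port 7860 per the Space config above); the file name is illustrative and any HTTP client works, requests is shown here:

import base64
import requests

with open("photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("ascii")

resp = requests.post(
    "http://localhost:7860/api/v1/classify",
    json={"image_base64": image_b64, "domain_top_n": 2, "top_k": 5},
)
resp.raise_for_status()
result = resp.json()
print(result["chosen_domains"])
print(result["label_hits"])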

Recommended runtime config

Default params for stable results:

{
  "domain_top_n": 1,
  "top_k": 3
}

Guard policy example:

  • Use domain_top_n = 3, top_k = 4.
  • If non_dance appears in chosen_domains and its top label score is clearly higher than the best dance label score, treat the photo as "not dancing" and skip dance-style conclusions.
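
A sketch of that guard over the classify response; the hit field names (domain_id, score) and the 0.15 margin are assumptions for illustration, not part of the documented schema:

def is_not_dancing(result: dict, margin: float = 0.15) -> bool:
    # 'result' is the /api/v1/classify response body as a dict.
    if "non_dance" not in result["chosen_domains"]:
        return False
    best = {}  # best label score per domain
    for hit in result["label_hits"]:
        best[hit["domain_id"]] = max(best.get(hit["domain_id"], 0.0), hit["score"])
    return best.get("non_dance", 0.0) >= best.get("dance", 0.0) + margin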

Architecture

Domain-first layout (URL-centric):

  • label_sets/: label set API + schemas + registry + hash.
  • classify/: classify API + schemas + two-stage classifier + results + banks.
  • model/: CLIP store and embedding encoding.
  • common/: settings, logging, deps, image IO, middleware.
  • ui/: splash + page templates.

Coding rules (deeper)

Separation of concerns:

  • API layer does IO only: parse input, validate, load image, call use-case, return typed DTOs.
  • Use-case layer owns business logic: the two-stage classification and result shaping.
  • Model layer owns ML specifics: CLIP loading, text/image encoding, and logit scaling.
  • Data layer owns taxonomy inputs: JSON label sets, hashing, and embedding banks.

API vs use-case vs torch:

  • API should not import torch or transformers; it deals in base64, Pydantic models, and HTTP.
  • Use-case should not depend on HTTP; it accepts a bank + image and returns a typed result.
  • Torch code lives in ClipStore only; the rest of the code treats embeddings as opaque tensors.

Typed outputs and clean steps:

  • Each step returns a typed value (LabelSetBank, ClassificationResult, Hit lists).
  • Keep operations in small pure functions or methods that express a single step (see the sketch after this list):
    • validate input
    • load/normalize image
    • encode image
    • score domains
    • merge label banks
    • score labels
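
A sketch of what typed, single-step helpers can look like; the field and function names are illustrative, not the actual contents of classify/:

from dataclasses import dataclass

@dataclass
class Hit:
    label_id: str
    score: float

def choose_domains(domain_hits: list[Hit], domain_top_n: int) -> list[str]:
    # Stage 1 -> stage 2 handoff: keep the ids of the top-N domains by score.
    ranked = sorted(domain_hits, key=lambda h: h.score, reverse=True)
    return [h.label_id for h in ranked[:domain_top_n]]

def top_k_hits(label_hits: list[Hit], top_k: int) -> list[Hit]:
    # Final shaping step: keep only the best-scoring labels.
    return sorted(label_hits, key=lambda h: h.score, reverse=True)[:top_k]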

Dataset usage:

  • Label sets are data, not code. Changes to taxonomy are done in JSON files or user payloads.
  • Stable hashes (label_set_hash) are derived from canonical JSON for reproducibility.
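
The idea in a short sketch; the real derivation lives in label_sets/hash.py and may canonicalize differently:

import hashlib
import json

def label_set_hash(label_set: dict) -> str:
    # Hash the canonical JSON form so the same taxonomy always maps to the same id.
    canonical = json.dumps(label_set, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()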

Avoid old PyTorch habits:

  • No training loops, optimizers, or manual grad handling; this is inference-only.
  • Use torch.no_grad() and normalized embeddings for stable cosine-like comparisons (sketched below).
  • Keep tensors on the same device; ClipStore owns device placement.
  • Prefer small, readable tensor ops over complex pipelines.
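
A minimal sketch of that pattern using the transformers CLIPModel API; this is not the actual ClipStore code:

import torch

def encode_image(model, pixel_values: torch.Tensor) -> torch.Tensor:
    # Inference only: no autograd bookkeeping, unit-norm output so downstream
    # dot products behave like cosine similarity.
    with torch.no_grad():
        emb = model.get_image_features(pixel_values=pixel_values)
    return emb / emb.norm(dim=-1, keepdim=True)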

Error handling and HTTP boundaries:

  • Decode/validate base64 and size limits in image_io, not inside ML code.
  • Convert internal errors to HTTP responses at the boundary (e.g., 400/404/413); see the sketch below.
  • Log JSON events with request IDs for traceability.
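
A sketch of that boundary translation in FastAPI terms; the registry lookup is hypothetical, the status-code mapping is the point:

from fastapi import HTTPException

def get_bank_or_404(registry, label_set_hash: str):
    # Boundary translation: an unknown hash becomes a 404 response
    # instead of leaking an internal error to the client.
    bank = registry.get(label_set_hash)
    if bank is None:
        raise HTTPException(status_code=404, detail=f"unknown label set: {label_set_hash}")
    return bank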

Tests

  • Fast, deterministic tests use fakes for classifier and store (tests/fakes.py).
  • Integration test optionally loads real CLIP (tests/test_integration_real_clip.py).
  • Run:
    • pytest -q
    • pytest -q -m integration

Eval scripts

Use the lightweight evaluator via photo-eval to run a label set against local images and capture timings:

uv run photo-eval single \
  --label-set label-dataset/personal-photos-lite-v1.json \
  --images /path/to/images \
  --out-dir data_results \
  --summary

Output CSV files are timestamped (UTC) in data_results/.

Run evals against a remote Space by passing --api:

uv run photo-eval single \
  --api https://esandorfi-photoclassification.hf.space \
  --label-set label-dataset/personal-photos-lite-v1.json \
  --images /path/to/images \
  --summary

Makefile shortcuts:

  • make eval-photo
  • make eval-dance
  • make eval-photo-matrix
  • make eval-dance-matrix

See src/eval/README.md for the eval CLI reference and API endpoints.

Matrix eval (multiple label sets against the same images):

uv run photo-eval matrix \
  --label-sets "label-dataset/personal-photos-*.json" \
  --images data_eval/photos/normalized \
  --out-dir data_results \
  --summary

Eval datasets (download schema)

We use a simple, reproducible layout for evaluation datasets created by photo-eval prep:

data_eval/
  photos/
    raw/         # downloaded originals
    normalized/  # normalized JPEGs
  dance/
    raw/         # downloaded originals
    normalized/  # normalized JPEGs

Download and normalize (recommended):

uv run photo-eval prep --out data_eval --target photos --n 50 --normalize
uv run photo-eval prep --out data_eval --target dance --n 50 --normalize

Reset existing files and start fresh:

uv run photo-eval prep --out data_eval --target photos --n 50 --normalize --reset

Normalize your own folder into the same schema:

uv run photo-eval prep --normalize-only --in-dir /path/to/images --out data_eval/photos

Project layout

Note: the root app.py is a lightweight HF Spaces placeholder that imports api.app:app.

.
β”œβ”€β”€ Dockerfile
β”œβ”€β”€ app.py
β”œβ”€β”€ requirements.txt
└── src
    β”œβ”€β”€ api
    β”‚   β”œβ”€β”€ app.py
    β”‚   β”œβ”€β”€ app_factory.py
    β”‚   β”œβ”€β”€ common
    β”‚   β”‚   β”œβ”€β”€ deps.py
    β”‚   β”‚   β”œβ”€β”€ image_io.py
    β”‚   β”‚   β”œβ”€β”€ logging.py
    β”‚   β”‚   β”œβ”€β”€ middleware.py
    β”‚   β”‚   └── settings.py
    β”‚   β”œβ”€β”€ classify
    β”‚   β”‚   β”œβ”€β”€ banks.py
    β”‚   β”‚   β”œβ”€β”€ results.py
    β”‚   β”‚   β”œβ”€β”€ router.py
    β”‚   β”‚   β”œβ”€β”€ schemas.py
    β”‚   β”‚   └── service.py
    β”‚   β”œβ”€β”€ label_sets
    β”‚   β”‚   β”œβ”€β”€ hash.py
    β”‚   β”‚   β”œβ”€β”€ registry.py
    β”‚   β”‚   β”œβ”€β”€ router.py
    β”‚   β”‚   └── schemas.py
    β”‚   β”œβ”€β”€ model
    β”‚   β”‚   └── clip_store.py
    β”‚   └── ui
    β”‚       β”œβ”€β”€ page-banner.html
    β”‚       β”œβ”€β”€ page.html
    β”‚       └── splash.html
    └── eval
        β”œβ”€β”€ README.md
        β”œβ”€β”€ cli.py
        β”œβ”€β”€ classify_dataset.py
        β”œβ”€β”€ common.py
        β”œβ”€β”€ dataset_prep.py
        └── eval_matrix.py

Credits

Emmanuel Sandorfi / Knowledge at Lighton

01.2026