---
title: Photo Classification API
emoji: 📷
colorFrom: blue
colorTo: gray
sdk: docker
app_port: 7860
---
# Photo Classification API
A small, prompt-driven photo classification API built on CLIP. You upload a label set (domains + labels with prompts), then classify images against that taxonomy without any fine-tuning.
## Why this exists
- Fast taxonomy iteration: change labels or prompts without retraining.
- Simple deployment: CPU-only CLIP inference by default.
- Clear outputs: domain hits, chosen domains, and label hits with scores.
## Models

- Text and image embeddings are produced by CLIP (`openai/clip-vit-base-patch32`).
- All inference runs on CPU unless you adapt `ClipStore` to use GPU.
- Scoring uses cosine-like similarity with CLIP's logit scale and a softmax.
## Label datasets

Label sets are JSON files with `domains` and `labels_by_domain`, each item defining:

- `id`: stable identifier
- `display`: human-readable name
- `prompt`: the text prompt used for CLIP embedding

Examples:

- `label-dataset/personal-photos-lite-v1.json`
- `label-dataset/personal-photos-large-v1.json`
- `label-dataset/scene-dance-formation-group-v1.json`

Use these as starting points or create your own taxonomy.
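For illustration, a minimal label set built from the fields above might look like this (the exact nesting is an assumption; treat the shipped example files as the authoritative schema):

```json
{
  "domains": [
    { "id": "dance", "display": "Dance", "prompt": "a photo of people dancing" },
    { "id": "non_dance", "display": "Not dance", "prompt": "a photo with no dancing" }
  ],
  "labels_by_domain": {
    "dance": [
      { "id": "ballet", "display": "Ballet", "prompt": "a photo of a ballet performance" }
    ],
    "non_dance": []
  }
}
```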
## API quickstart

- Start the service (Docker or `uvicorn api.app:app`).
- Upload a label set.
- Optionally activate a label set.
- Classify images.

Endpoints:

- `POST /api/v1/label-sets` (upload)
- `GET /api/v1/label-sets` (list)
- `POST /api/v1/label-sets/{label_set_hash}/activate` (set default)
- `POST /api/v1/classify` (classify image)
`/api/v1/classify` body:

```json
{
  "image_base64": "<base64>",
  "domain_top_n": 2,
  "top_k": 5
}
```
Response includes `domain_hits`, `chosen_domains`, `label_hits`, and timings.
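A request body for `/api/v1/classify` can be built with the standard library alone; this is a minimal sketch (any HTTP client can then POST it):

```python
import base64
import json

def build_classify_payload(image_bytes: bytes, domain_top_n: int = 2, top_k: int = 5) -> dict:
    """Build the JSON body for POST /api/v1/classify from raw image bytes."""
    return {
        "image_base64": base64.b64encode(image_bytes).decode("ascii"),
        "domain_top_n": domain_top_n,
        "top_k": top_k,
    }

payload = build_classify_payload(b"fake image bytes", top_k=3)
body = json.dumps(payload)
# POST with any HTTP client, e.g.:
#   requests.post("http://localhost:7860/api/v1/classify", json=payload)
print(sorted(payload.keys()))  # ['domain_top_n', 'image_base64', 'top_k']
```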
## Recommended runtime config

Default params for stable results:

```json
{
  "domain_top_n": 1,
  "top_k": 3
}
```
Guard policy example:

- Use `domain_top_n = 3`, `top_k = 4`.
- If `non_dance` appears in `chosen_domains` and its top label score is high vs `dance`, treat the photo as "not dancing" and skip dance-style conclusions.
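The guard above can be expressed as one small predicate over the response fields; this sketch is illustrative (the `margin` threshold and function name are assumptions, not part of the API):

```python
def is_not_dancing(
    chosen_domains: list[str],
    top_scores: dict[str, float],
    margin: float = 0.15,
) -> bool:
    """Guard: treat the photo as "not dancing" when non_dance was chosen
    and its top label score beats dance by at least `margin`."""
    if "non_dance" not in chosen_domains:
        return False
    return top_scores.get("non_dance", 0.0) - top_scores.get("dance", 0.0) >= margin

# non_dance chosen and well ahead of dance -> skip dance-style conclusions.
print(is_not_dancing(["non_dance", "dance"], {"non_dance": 0.72, "dance": 0.31}))  # True
```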
## Architecture

Domain-first layout (URL-centric):

- `label_sets/`: label set API + schemas + registry + hash.
- `classify/`: classify API + schemas + two-stage classifier + results + banks.
- `model/`: CLIP store and embedding encoding.
- `common/`: settings, logging, deps, image IO, middleware.
- `ui/`: splash + page templates.
## Coding rules (deeper)
Separation of concerns:
- API layer does IO only: parse input, validate, load image, call use-case, return typed DTOs.
- Use-case layer owns business logic: the two-stage classification and result shaping.
- Model layer owns ML specifics: CLIP loading, text/image encoding, and logit scaling.
- Data layer owns taxonomy inputs: JSON label sets, hashing, and embedding banks.
API vs use-case vs torch:
- API should not import torch or transformers; it deals in base64, Pydantic models, and HTTP.
- Use-case should not depend on HTTP; it accepts a bank + image and returns a typed result.
- Torch code lives in `ClipStore` only; the rest of the code treats embeddings as opaque tensors.
Typed outputs and clean steps:

- Each step returns a typed value (`LabelSetBank`, `ClassificationResult`, `Hit` lists).
- Keep operations in small pure functions or methods that express a single step:
  - validate input
  - load/normalize image
  - encode image
  - score domains
  - merge label banks
  - score labels
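The steps above can be sketched as a chain of small typed functions. This is a toy illustration of the shape, not the project's actual code: the type names follow the conventions listed, while the scoring stub is a plain dot product:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hit:
    label_id: str
    score: float

@dataclass(frozen=True)
class ClassificationResult:
    chosen_domains: list[str]
    label_hits: list[Hit]

def score_domains(image_emb: list[float], domain_embs: dict[str, list[float]]) -> list[Hit]:
    """One pure step: score every domain prompt embedding against the image embedding."""
    hits = [
        Hit(name, sum(a * b for a, b in zip(image_emb, emb)))
        for name, emb in domain_embs.items()
    ]
    return sorted(hits, key=lambda h: h.score, reverse=True)

def classify(image_emb, domain_embs, domain_top_n: int = 1) -> ClassificationResult:
    """Compose the steps and return a typed result, never raw dicts."""
    hits = score_domains(image_emb, domain_embs)
    return ClassificationResult([h.label_id for h in hits[:domain_top_n]], hits)

result = classify([1.0, 0.0], {"dance": [0.9, 0.1], "non_dance": [0.2, 0.8]})
print(result.chosen_domains)  # ['dance']
```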
Dataset usage:
- Label sets are data, not code. Changes to taxonomy are done in JSON files or user payloads.
- Stable hashes (`label_set_hash`) are derived from canonical JSON for reproducibility.
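One common way to derive such a hash is to serialize with sorted keys and compact separators, then SHA-256 the bytes; the project's exact canonicalization lives in `label_sets/hash.py` and may differ in detail:

```python
import hashlib
import json

def label_set_hash(label_set: dict) -> str:
    """Hash a label set via canonical JSON: sorted keys, no whitespace."""
    canonical = json.dumps(label_set, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Key order does not change the hash, so the same taxonomy always maps to
# the same label_set_hash regardless of how the JSON was written.
a = label_set_hash({"domains": ["dance"], "labels_by_domain": {}})
b = label_set_hash({"labels_by_domain": {}, "domains": ["dance"]})
print(a == b)  # True
```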
Avoid old PyTorch habits:
- No training loops, optimizers, or manual grad handling; this is inference-only.
- Use `torch.no_grad()` and normalized embeddings for stable cosine-like comparisons.
- Keep tensors on the same device; `ClipStore` owns device placement.
- Prefer small, readable tensor ops over complex pipelines.
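The scoring recipe mentioned above (L2-normalize, scale by CLIP's logit scale, softmax) can be sketched in NumPy; the real implementation lives in `ClipStore` on torch tensors, and the logit scale of 100 here is just CLIP's typical learned value, not read from the model:

```python
import numpy as np

def clip_scores(image_emb: np.ndarray, text_embs: np.ndarray, logit_scale: float = 100.0) -> np.ndarray:
    """Cosine similarity via normalized embeddings, scaled and softmaxed."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = logit_scale * (txt @ img)   # cosine sims, scaled like CLIP logits
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()

probs = clip_scores(np.array([1.0, 0.0]), np.array([[0.9, 0.1], [0.1, 0.9]]))
print(probs.argmax())  # 0 — the first prompt is closest to the image embedding
```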
Error handling and HTTP boundaries:
- Decode/validate base64 and size limits in `image_io`, not inside ML code.
- Convert internal errors to HTTP responses at the boundary (e.g., 400/404/413).
- Log JSON events with request IDs for traceability.
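A minimal sketch of that boundary mapping, kept framework-free for clarity (the exception names are hypothetical; the real app would raise HTTP errors via its web framework at the router layer):

```python
class ImageTooLarge(ValueError): ...
class LabelSetNotFound(KeyError): ...

def to_http_status(exc: Exception) -> int:
    """Map internal errors to HTTP status codes at the API boundary only."""
    if isinstance(exc, ImageTooLarge):
        return 413  # payload too large (checked before the generic ValueError case)
    if isinstance(exc, LabelSetNotFound):
        return 404  # unknown label_set_hash
    if isinstance(exc, ValueError):
        return 400  # bad input, e.g. invalid base64
    return 500      # anything unexpected

print(to_http_status(ImageTooLarge()))  # 413
```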
## Tests

- Fast, deterministic tests use fakes for classifier and store (`tests/fakes.py`).
- Integration test optionally loads real CLIP (`tests/test_integration_real_clip.py`).
- Run: `pytest -q` or `pytest -q -m integration`.
## Eval scripts
Use the lightweight evaluator via `photo-eval` to run a label set against local images and capture timings:
```shell
uv run photo-eval single \
  --label-set label-dataset/personal-photos-lite-v1.json \
  --images /path/to/images \
  --out-dir data_results \
  --summary
```
Output CSV files are timestamped (UTC) in `data_results/`.
Run evals against a remote Space by setting `--api`:
```shell
uv run photo-eval single \
  --api https://esandorfi-photoclassification.hf.space \
  --label-set label-dataset/personal-photos-lite-v1.json \
  --images /path/to/images \
  --summary
```
Makefile shortcuts:

- `make eval-photo`
- `make eval-dance`
- `make eval-photo-matrix`
- `make eval-dance-matrix`
See `src/eval/README.md` for the eval CLI reference and API endpoints.
Matrix eval (multiple label sets against the same images):
```shell
uv run photo-eval matrix \
  --label-sets "label-dataset/personal-photos-*.json" \
  --images data_eval/photos/normalized \
  --out-dir data_results \
  --summary
```
## Eval datasets (download schema)

We use a simple, reproducible layout for evaluation datasets created by `photo-eval prep`:

```
data_eval/
  photos/
    raw/         # downloaded originals
    normalized/  # normalized JPEGs
  dance/
    raw/         # downloaded originals
    normalized/  # normalized JPEGs
```
Download and normalize (recommended):

```shell
uv run photo-eval prep --out data_eval --target photos --n 50 --normalize
uv run photo-eval prep --out data_eval --target dance --n 50 --normalize
```

Reset existing files and start fresh:

```shell
uv run photo-eval prep --out data_eval --target photos --n 50 --normalize --reset
```

Normalize your own folder into the same schema:

```shell
uv run photo-eval prep --normalize-only --in-dir /path/to/images --out data_eval/photos
```
## Project layout

Note: root `app.py` is a lightweight HF Spaces placeholder that imports `api.app:app`.
```
.
├── Dockerfile
├── app.py
├── requirements.txt
└── src
    ├── api
    │   ├── app.py
    │   ├── app_factory.py
    │   ├── common
    │   │   ├── deps.py
    │   │   ├── image_io.py
    │   │   ├── logging.py
    │   │   ├── middleware.py
    │   │   └── settings.py
    │   ├── classify
    │   │   ├── banks.py
    │   │   ├── results.py
    │   │   ├── router.py
    │   │   ├── schemas.py
    │   │   └── service.py
    │   ├── label_sets
    │   │   ├── hash.py
    │   │   ├── registry.py
    │   │   ├── router.py
    │   │   └── schemas.py
    │   ├── model
    │   │   └── clip_store.py
    │   └── ui
    │       ├── page-banner.html
    │       ├── page.html
    │       └── splash.html
    └── eval
        ├── README.md
        ├── cli.py
        ├── classify_dataset.py
        ├── common.py
        ├── dataset_prep.py
        └── eval_matrix.py
```
## Credits
Emmanuel Sandorfi / Knowledge at Lighton
01.2026