metadata
title: Photo Classification API
emoji: πŸ“·
colorFrom: blue
colorTo: gray
sdk: docker
app_port: 7860

Photo Classification API

A small, prompt-driven photo classification API built on CLIP. You upload a label set (domains + labels with prompts), then classify images against that taxonomy without any fine-tuning.

Why this exists

  • Fast taxonomy iteration: change labels or prompts without retraining.
  • Simple deployment: CPU-only CLIP inference by default.
  • Clear outputs: domain hits, chosen domains, and label hits with scores.

Models

  • Text and image embeddings are produced by CLIP (openai/clip-vit-base-patch32).
  • All inference runs on CPU unless you adapt ClipStore to use GPU.
  • Scoring computes cosine-like similarity between normalized text and image embeddings, scales it by CLIP's logit scale, and applies a softmax (see the sketch below).
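
A minimal sketch of that scoring step, assuming the embeddings are already L2-normalized (the function and variable names are illustrative, not the project's actual API):

import torch

def score_labels(image_emb: torch.Tensor, text_embs: torch.Tensor,
                 logit_scale: float) -> torch.Tensor:
    # image_emb: (D,), text_embs: (N, D); both already L2-normalized.
    # logit_scale is CLIP's learned temperature, e.g. model.logit_scale.exp().
    with torch.no_grad():
        sims = text_embs @ image_emb          # cosine similarities, shape (N,)
        logits = logit_scale * sims           # temperature scaling as in CLIP
        return logits.softmax(dim=-1)         # scores over the N prompts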

Label datasets

Label sets are JSON files containing domains and labels_by_domain; each entry defines:

  • id: stable identifier
  • display: human-readable name
  • prompt: the text prompt used for CLIP embedding

Examples:

  • label-dataset/personal-photos-lite-v1.json
  • label-dataset/personal-photos-large-v1.json
  • label-dataset/scene-dance-formation-group-v1.json

Use these as starting points or create your own taxonomy.
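
A minimal illustrative label set using the fields above; the ids and prompts are made up, and the exact nesting of labels_by_domain may differ slightly from the shipped files, so use one of the examples above as the reference:

{
  "domains": [
    { "id": "dance", "display": "Dance", "prompt": "a photo of people dancing" },
    { "id": "non_dance", "display": "Not dance", "prompt": "a photo with no dancing" }
  ],
  "labels_by_domain": {
    "dance": [
      { "id": "salsa", "display": "Salsa", "prompt": "a photo of people dancing salsa" }
    ],
    "non_dance": [
      { "id": "portrait", "display": "Portrait", "prompt": "a posed portrait photo" }
    ]
  }
}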

API quickstart

  1. Start the service (Docker or uvicorn api.app:app).
  2. Upload a label set.
  3. Optionally activate a label set.
  4. Classify images.

Endpoints:

  • POST /api/v1/label-sets (upload)
  • GET /api/v1/label-sets (list)
  • POST /api/v1/label-sets/{label_set_hash}/activate (set default)
  • POST /api/v1/classify (classify image)

/api/v1/classify body:

{
  "image_base64": "<base64>",
  "domain_top_n": 2,
  "top_k": 5
}

Response includes domain_hits, chosen_domains, label_hits, and timings.
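
A minimal client call against a locally running instance (port 7860 per the Space config above); the file name is illustrative and any HTTP client works, requests is shown here:

import base64
import requests

with open("photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("ascii")

resp = requests.post(
    "http://localhost:7860/api/v1/classify",
    json={"image_base64": image_b64, "domain_top_n": 2, "top_k": 5},
)
resp.raise_for_status()
result = resp.json()
print(result["chosen_domains"])
print(result["label_hits"])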

Recommended runtime config

Default params for stable results:

{
  "domain_top_n": 1,
  "top_k": 3
}

Guard policy example:

  • Use domain_top_n = 3, top_k = 4.
  • If non_dance appears in chosen_domains and its top label score is clearly higher than the best dance label score, treat the photo as "not dancing" and skip dance-style conclusions.
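
A sketch of that guard over the classify response; the hit field names (domain_id, score) and the 0.15 margin are assumptions for illustration, not part of the documented schema:

def is_not_dancing(result: dict, margin: float = 0.15) -> bool:
    # 'result' is the /api/v1/classify response body as a dict.
    if "non_dance" not in result["chosen_domains"]:
        return False
    best = {}  # best label score per domain
    for hit in result["label_hits"]:
        best[hit["domain_id"]] = max(best.get(hit["domain_id"], 0.0), hit["score"])
    return best.get("non_dance", 0.0) >= best.get("dance", 0.0) + margin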

Architecture

Domain-first layout (URL-centric):

  • label_sets/: label set API + schemas + registry + hash.
  • classify/: classify API + schemas + two-stage classifier + results + banks.
  • model/: CLIP store and embedding encoding.
  • common/: settings, logging, deps, image IO, middleware.
  • ui/: splash + page templates.

Coding rules (deeper)

Separation of concerns:

  • API layer does IO only: parse input, validate, load image, call use-case, return typed DTOs.
  • Use-case layer owns business logic: the two-stage classification and result shaping.
  • Model layer owns ML specifics: CLIP loading, text/image encoding, and logit scaling.
  • Data layer owns taxonomy inputs: JSON label sets, hashing, and embedding banks.

API vs use-case vs torch:

  • API should not import torch or transformers; it deals in base64, Pydantic models, and HTTP.
  • Use-case should not depend on HTTP; it accepts a bank + image and returns a typed result.
  • Torch code lives in ClipStore only; the rest of the code treats embeddings as opaque tensors.

Typed outputs and clean steps:

  • Each step returns a typed value (LabelSetBank, ClassificationResult, Hit lists).
  • Keep operations in small pure functions or methods that express a single step (see the sketch after this list):
    • validate input
    • load/normalize image
    • encode image
    • score domains
    • merge label banks
    • score labels
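
A sketch of what typed, single-step helpers can look like; the field and function names are illustrative, not the actual contents of classify/:

from dataclasses import dataclass

@dataclass
class Hit:
    label_id: str
    score: float

def choose_domains(domain_hits: list[Hit], domain_top_n: int) -> list[str]:
    # Stage 1 -> stage 2 handoff: keep the ids of the top-N domains by score.
    ranked = sorted(domain_hits, key=lambda h: h.score, reverse=True)
    return [h.label_id for h in ranked[:domain_top_n]]

def top_k_hits(label_hits: list[Hit], top_k: int) -> list[Hit]:
    # Final shaping step: keep only the best-scoring labels.
    return sorted(label_hits, key=lambda h: h.score, reverse=True)[:top_k]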

Dataset usage:

  • Label sets are data, not code. Changes to taxonomy are done in JSON files or user payloads.
  • Stable hashes (label_set_hash) are derived from canonical JSON for reproducibility.
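
The idea in a short sketch; the real derivation lives in label_sets/hash.py and may canonicalize differently:

import hashlib
import json

def label_set_hash(label_set: dict) -> str:
    # Hash the canonical JSON form so the same taxonomy always maps to the same id.
    canonical = json.dumps(label_set, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()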

Avoid old PyTorch habits:

  • No training loops, optimizers, or manual grad handling; this is inference-only.
  • Use torch.no_grad() and normalized embeddings for stable cosine-like comparisons (sketched below).
  • Keep tensors on the same device; ClipStore owns device placement.
  • Prefer small, readable tensor ops over complex pipelines.
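
A minimal sketch of that pattern using the transformers CLIPModel API; this is not the actual ClipStore code:

import torch

def encode_image(model, pixel_values: torch.Tensor) -> torch.Tensor:
    # Inference only: no autograd bookkeeping, unit-norm output so downstream
    # dot products behave like cosine similarity.
    with torch.no_grad():
        emb = model.get_image_features(pixel_values=pixel_values)
    return emb / emb.norm(dim=-1, keepdim=True)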

Error handling and HTTP boundaries:

  • Decode/validate base64 and size limits in image_io, not inside ML code.
  • Convert internal errors to HTTP responses at the boundary (e.g., 400/404/413); see the sketch below.
  • Log JSON events with request IDs for traceability.
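
A sketch of that boundary translation in FastAPI terms; the registry lookup is hypothetical, the status-code mapping is the point:

from fastapi import HTTPException

def get_bank_or_404(registry, label_set_hash: str):
    # Boundary translation: an unknown hash becomes a 404 response
    # instead of leaking an internal error to the client.
    bank = registry.get(label_set_hash)
    if bank is None:
        raise HTTPException(status_code=404, detail=f"unknown label set: {label_set_hash}")
    return bank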

Tests

  • Fast, deterministic tests use fakes for classifier and store (tests/fakes.py).
  • Integration test optionally loads real CLIP (tests/test_integration_real_clip.py).
  • Run:
    • pytest -q
    • pytest -q -m integration

Eval scripts

Use the lightweight evaluator via photo-eval to run a label set against local images and capture timings:

uv run photo-eval single \
  --label-set label-dataset/personal-photos-lite-v1.json \
  --images /path/to/images \
  --out-dir data_results \
  --summary

Output CSV files are timestamped (UTC) in data_results/.

Run evals against a remote Space by passing --api:

uv run photo-eval single \
  --api https://esandorfi-photoclassification.hf.space \
  --label-set label-dataset/personal-photos-lite-v1.json \
  --images /path/to/images \
  --summary

Makefile shortcuts:

  • make eval-photo
  • make eval-dance
  • make eval-photo-matrix
  • make eval-dance-matrix

See src/eval/README.md for the eval CLI reference and API endpoints.

Matrix eval (multiple label sets against the same images):

uv run photo-eval matrix \
  --label-sets "label-dataset/personal-photos-*.json" \
  --images data_eval/photos/normalized \
  --out-dir data_results \
  --summary

Eval datasets (download schema)

We use a simple, reproducible layout for evaluation datasets created by photo-eval prep:

data_eval/
  photos/
    raw/         # downloaded originals
    normalized/  # normalized JPEGs
  dance/
    raw/         # downloaded originals
    normalized/  # normalized JPEGs

Download and normalize (recommended):

uv run photo-eval prep --out data_eval --target photos --n 50 --normalize
uv run photo-eval prep --out data_eval --target dance --n 50 --normalize

Reset existing files and start fresh:

uv run photo-eval prep --out data_eval --target photos --n 50 --normalize --reset

Normalize your own folder into the same schema:

uv run photo-eval prep --normalize-only --in-dir /path/to/images --out data_eval/photos

Project layout

Note: the root app.py is a lightweight HF Spaces placeholder that imports api.app:app.

.
β”œβ”€β”€ Dockerfile
β”œβ”€β”€ app.py
β”œβ”€β”€ requirements.txt
└── src
    β”œβ”€β”€ api
    β”‚   β”œβ”€β”€ app.py
    β”‚   β”œβ”€β”€ app_factory.py
    β”‚   β”œβ”€β”€ common
    β”‚   β”‚   β”œβ”€β”€ deps.py
    β”‚   β”‚   β”œβ”€β”€ image_io.py
    β”‚   β”‚   β”œβ”€β”€ logging.py
    β”‚   β”‚   β”œβ”€β”€ middleware.py
    β”‚   β”‚   └── settings.py
    β”‚   β”œβ”€β”€ classify
    β”‚   β”‚   β”œβ”€β”€ banks.py
    β”‚   β”‚   β”œβ”€β”€ results.py
    β”‚   β”‚   β”œβ”€β”€ router.py
    β”‚   β”‚   β”œβ”€β”€ schemas.py
    β”‚   β”‚   └── service.py
    β”‚   β”œβ”€β”€ label_sets
    β”‚   β”‚   β”œβ”€β”€ hash.py
    β”‚   β”‚   β”œβ”€β”€ registry.py
    β”‚   β”‚   β”œβ”€β”€ router.py
    β”‚   β”‚   └── schemas.py
    β”‚   β”œβ”€β”€ model
    β”‚   β”‚   └── clip_store.py
    β”‚   └── ui
    β”‚       β”œβ”€β”€ page-banner.html
    β”‚       β”œβ”€β”€ page.html
    β”‚       └── splash.html
    └── eval
        β”œβ”€β”€ README.md
        β”œβ”€β”€ cli.py
        β”œβ”€β”€ classify_dataset.py
        β”œβ”€β”€ common.py
        β”œβ”€β”€ dataset_prep.py
        └── eval_matrix.py

Credits

Emmanuel Sandorfi / Knowledge at Lighton

01.2026