Remove _reproducer; scripts will live in PR description instead

Files changed (3)

fashion-clip-vit-b-p32/_reproducer/README.md +0 -97
fashion-clip-vit-b-p32/_reproducer/reproduce.sh +0 -300
fashion-clip-vit-b-p32/_reproducer/verify_onnx.py +0 -199

fashion-clip-vit-b-p32/_reproducer/README.md DELETED

@@ -1,97 +0,0 @@
-# FashionCLIP ONNX export for Typesense
-Builds an ONNX-format copy of [patrickjohncyh/fashion-clip](https://huggingface.co/patrickjohncyh/fashion-clip) packaged in the layout Typesense expects, and verifies it locally before uploading to [typesense/models-moved](https://huggingface.co/typesense/models-moved).
-FashionCLIP is a CLIP-ViT-B/32 fine-tuned on Farfetch fashion images. Drop-in replacement for `ts/clip-vit-b-p32` when the corpus is fashion/apparel.
-## Why this exists
-Typesense ships first-party support for OpenAI CLIP ViT-B/32. FashionCLIP uses the identical architecture but is fine-tuned on fashion data, which materially improves retrieval on apparel datasets. Hosting it on `typesense/models-moved` lets Cloud users select it with `model_name: "ts/fashion-clip-vit-b-p32"`.
-Because the architecture matches CLIP ViT-B/32 exactly, the `vocab.txt` (CLIP BPE merges), `clip_tokenizer.onnx`, and `clip_image_processor.onnx` files are reused from `clip-vit-b-p32`. Only `model.onnx` and the regenerated `config.json` are new.
-## Files in this directory
-| File | Purpose |
-|---|---|
-| `verify_onnx.py` | Exports nothing: runs the staged `model.onnx` and `clip_image_processor.onnx` through onnxruntime, compares the outputs against the HF transformers reference (cosine similarity must exceed 0.999), and validates Typesense's exact I/O contract (input/output tensor names, shapes, independence of text/image towers). |
-| `export.py` | Re-runnable export script: downloads FashionCLIP, exports `model.onnx` via Optimum, pulls reusable artifacts from the existing Typesense CLIP repo, writes `config.json` with current MD5s. |
-| `reproduce.sh` | Full local Docker test: mounts the staged model into a Typesense 29.0 container, creates a `fashion-items` collection with an image embedding field, indexes 5 fashion images, runs text>image and image>image searches. |
-| `out-staging/fashion-clip-vit-b-p32/` | The exact 5 files to upload to HuggingFace under `typesense/models-moved/fashion-clip-vit-b-p32/`. |
-| `fashion-clip-src/` | Cache of the upstream HF snapshot. Gitignored. |
-## What ships in the HF bundle
-```
-fashion-clip-vit-b-p32/
-├── model.onnx                 # FashionCLIP weights (~578 MB)
-├── clip_tokenizer.onnx        # Reused from clip-vit-b-p32 (1.3 MB)
-├── clip_image_processor.onnx  # Reused from clip-vit-b-p32 (~4 KB)
-├── vocab.txt                  # CLIP BPE merges, reused (3.0 MB)
-└── config.json                # Typesense metadata + MD5 checksums
-```
-The HF README at `typesense/models-moved/fashion-clip-vit-b-p32/README.md` should credit `patrickjohncyh/fashion-clip` upstream.
-## Verification status
-Run from this directory:
-```powershell
-# 1. ONNX runtime sanity (no Docker required)
-.venv/Scripts/python.exe verify_onnx.py
-# 2. Live Typesense Docker test (requires working Docker)
-bash reproduce.sh
-```
-**ONNX sanity (verify_onnx.py)** confirms:
-- `model.onnx` has the inputs Typesense uses (`input_ids`, `pixel_values`, `attention_mask`)
-- `model.onnx` outputs `text_embeds` and `image_embeds` at `[B, 512]`
-- The first 2D output Typesense's auto-detect would pick is `text_embeds` (the dynamic-dim `logits_per_*` outputs get skipped, matching `clip-vit-b-p32` behavior)
-- `clip_image_processor.onnx` accepts raw image bytes and returns `(1, 3, 224, 224)`
-- ONNX outputs match the HF transformers reference for both text and image (cosine similarity above the script's 0.999 threshold)
-- The text tower is independent of `pixel_values` (Typesense's text path uses dummy 0.5 pixels)
-- The image tower is independent of `input_ids` (Typesense's image path uses dummy `[[0]]` ids)
-**Live Docker test (reproduce.sh)** confirms:
-- Typesense loads `fashion-clip-vit-b-p32` from the local models dir without errors
-- The CLIP tokenizer + image processor work end to end
-- Top result for `"a red dress"` is the red cocktail dress
-- Top result for `"comfortable shoes"` is the white sneakers
-- Image>image with `red-dress.jpg` puts the red dress first
-## Upload to HuggingFace (via PR)
-We don't have direct write access to `typesense/models-moved`, so we open a community PR. The `hf` CLI (huggingface_hub >= 0.34) is the supported tool.
-```bash
-hf auth login   # paste a token from https://huggingface.co/settings/tokens
-hf upload typesense/models-moved \
-  ./out-staging/fashion-clip-vit-b-p32 \
-  fashion-clip-vit-b-p32 \
-  --repo-type model \
-  --create-pr \
-  --commit-message "Add fashion-clip-vit-b-p32 (ONNX of patrickjohncyh/fashion-clip)" \
-  --commit-description "FashionCLIP fine-tuned on Farfetch; CLIP-ViT-B/32 architecture so vocab.txt and clip_image_processor.onnx are reused byte-for-byte from clip-vit-b-p32."
-```
-The PR appears under https://huggingface.co/typesense/models-moved/discussions, where a repo maintainer reviews and merges it.
-After the merge, Typesense Cloud users reference the model in a field definition as:
-```json
-{
-  "name": "embedding",
-  "type": "float[]",
-  "embed": {
-    "from": ["image"],
-    "model_config": {"model_name": "ts/fashion-clip-vit-b-p32"}
-  }
-}
-```
-## License notes
-- FashionCLIP is released under MIT (per the upstream repo card). The ONNX export inherits that license. The HF README on `typesense/models-moved` should preserve the citation:
-  > Chia, P.J., Attanasio, G., Bianchi, F., Terragni, S., Magalhães, A.R., Goncalves, D., Greco, C., Tagliabue, J. *Contrastive language and vision learning of general fashion concepts*, Scientific Reports 12, 18958 (2022).
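
The README above says `config.json` carries MD5 checksums for the shipped files, and `verify_onnx.py` prints one for `model.onnx`. A minimal sketch of computing a digest for every file in the staging directory could look like the following. `md5_of_file` and `md5_manifest` are hypothetical helper names, not part of the deleted scripts, and the sketch deliberately does not guess at the key names `config.json` uses.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_manifest(directory):
    """Map each regular file name in `directory` to its MD5 hex digest."""
    return {
        p.name: md5_of_file(p)
        for p in sorted(Path(directory).iterdir())
        if p.is_file()
    }
```

Calling `md5_manifest("out-staging/fashion-clip-vit-b-p32")` would return one digest per staged file, which can then be compared by hand against the values recorded in `config.json`.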

fashion-clip-vit-b-p32/_reproducer/reproduce.sh DELETED

@@ -1,300 +0,0 @@
-#!/bin/bash
-# Reproducer: verify a self-exported FashionCLIP ONNX model works end to end in
-# a local Typesense container before pushing it to typesense/models-moved.
-#
-# What this script does:
-#   1. Spins up a Typesense Docker container on port 8108.
-#   2. Copies ./out-staging/fashion-clip-vit-b-p32/ into the mounted data dir's models/
-#      folder so Typesense treats it as a LOCAL model (no namespace prefix, no download).
-#   3. Creates a collection with an image embedding field that points at the model.
-#   4. Indexes a tiny set of fashion images.
-#   5. Runs a text>image search ("a red dress") and an image>image search to
-#      confirm the model returns sensible top results.
-#
-# Notes:
-#   - This validates the same load path that Typesense uses for ts/clip-vit-b-p32.
-#   - Once the files are uploaded to https://huggingface.co/typesense/models-moved
-#     under fashion-clip-vit-b-p32/, Typesense Cloud users can reference it as
-#     model_name: "ts/fashion-clip-vit-b-p32". The local-mode test here is
-#     functionally equivalent: same code path, same config, same I/O.
-set -e
-# On Git Bash / MSYS2 (Windows), `/data` gets auto-translated to
-# `C:/Program Files/Git/data` which breaks Docker volume mounts and CLI args.
-# Disable that translation for the whole script.
-export MSYS_NO_PATHCONV=1
-export MSYS2_ARG_CONV_EXCL='*'
-### Configuration ##############################################################
-TYPESENSE_API_KEY=xyz
-PORT=8108
-TYPESENSE_HOST=http://localhost:${PORT}
-CONTAINER_NAME=typesense-fashion-clip-onnx
-TYPESENSE_VERSION=29.0
-SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
-STAGING_DIR="${SCRIPT_DIR}/out-staging/fashion-clip-vit-b-p32"
-DATA_DIR="${SCRIPT_DIR}/typesense-data-${CONTAINER_NAME}"
-if [ ! -f "${STAGING_DIR}/model.onnx" ]; then
-  echo "ERROR: staging dir missing model.onnx; run the export step first (export.py)"
-  exit 1
-fi
-# Python image helper below uses this.
-export SCRIPT_DIR
-### Cleanup ####################################################################
-cleanup() {
-  echo ""
-  echo "=== Cleanup ==="
-  docker stop ${CONTAINER_NAME} 2>/dev/null || true
-  docker rm ${CONTAINER_NAME} 2>/dev/null || true
-  # Container ran as root and wrote into the mounted data dir; on Windows the
-  # host user can't `rm` those files. Use a throwaway root container to wipe
-  # the dir contents from inside the same UID, then drop the now-empty dir.
-  if [ -d "${DATA_DIR}" ]; then
-    if command -v cygpath > /dev/null 2>&1; then
-      _w=$(cygpath -w "${DATA_DIR}")
-    else
-      _w="${DATA_DIR}"
-    fi
-    docker run --rm -v "${_w}:/data" alpine:3 sh -c "rm -rf /data/* /data/.??*" 2>/dev/null || true
-    rmdir "${DATA_DIR}" 2>/dev/null || true
-  fi
-  echo "Cleanup complete"
-}
-trap cleanup EXIT
-### Setup Typesense ############################################################
-echo "=== Setting up Typesense ${TYPESENSE_VERSION} ==="
-docker stop ${CONTAINER_NAME} 2>/dev/null || true
-docker rm ${CONTAINER_NAME} 2>/dev/null || true
-mkdir -p "${DATA_DIR}/models"
-# Place the model files under the Typesense data dir at models/fashion-clip-vit-b-p32/
-# (no ts_ prefix => loaded as a local model rather than a public download).
-cp -R "${STAGING_DIR}" "${DATA_DIR}/models/"
-# Docker on Windows wants a Windows-style host path for -v mounts.
-if command -v cygpath > /dev/null 2>&1; then
-  WIN_DATA_DIR=$(cygpath -w "${DATA_DIR}")
-else
-  WIN_DATA_DIR="${DATA_DIR}"
-fi
-docker run -d \
-  --name ${CONTAINER_NAME} \
-  -p ${PORT}:8108 \
-  -v "${WIN_DATA_DIR}:/data" \
-  typesense/typesense:${TYPESENSE_VERSION} \
-  --data-dir=/data \
-  --api-key=${TYPESENSE_API_KEY} \
-  --enable-cors
-echo "Waiting for Typesense to be ready..."
-# /health returns {"ok": true} only once Raft has elected a leader. Poll for the
-# JSON body rather than just exit code so we don't race with leader election.
-for _ in $(seq 1 60); do
-  body=$(curl -s "${TYPESENSE_HOST}/health" 2>/dev/null || true)
-  if echo "$body" | grep -q '"ok":true'; then break; fi
-  sleep 1
-done
-# Belt and suspenders: also confirm /debug reports "state":1 (means Raft is ready
-# enough to accept writes). Up to v29, /health alone can lie under fast restart.
-for _ in $(seq 1 30); do
-  debug_body=$(curl -s -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" "${TYPESENSE_HOST}/debug" 2>/dev/null || true)
-  if echo "$debug_body" | grep -q '"state":1'; then break; fi
-  sleep 1
-done
-echo "Typesense is ready"
-wait_for_collection() {
-  local collection=$1
-  local max_wait=${2:-30}
-  local count=0
-  while [ $count -lt $max_wait ]; do
-    if curl -s -o /dev/null -w "%{http_code}" "${TYPESENSE_HOST}/collections/${collection}" \
-      -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" 2>/dev/null | grep -q "200"; then
-      return 0
-    fi
-    sleep 1
-    count=$((count + 1))
-  done
-  echo "WARNING: Collection '${collection}' not ready after ${max_wait}s"
-  return 1
-}
-### Create collection ##########################################################
-echo ""
-echo "=== Creating fashion-items collection ==="
-CREATE_RESP=$(curl -s "${TYPESENSE_HOST}/collections" \
-  -X POST \
-  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
-  -H "Content-Type: application/json" \
-  -d '{
-    "name": "fashion-items",
-    "fields": [
-      {"name": "title",  "type": "string"},
-      {"name": "image",  "type": "image", "store": false},
-      {
-        "name": "embedding",
-        "type": "float[]",
-        "embed": {
-          "from": ["image"],
-          "model_config": {"model_name": "fashion-clip-vit-b-p32"}
-        }
-      }
-    ]
-  }')
-echo "create response: $CREATE_RESP"
-if ! echo "$CREATE_RESP" | jq -e '.name' > /dev/null; then
-  echo ""
-  echo "Collection creation FAILED. Recent Typesense logs:"
-  docker logs --tail 80 ${CONTAINER_NAME}
-  exit 1
-fi
-wait_for_collection "fashion-items"
-### Index a small fashion dataset ##############################################
-echo ""
-echo "=== Preparing sample fashion images ==="
-mkdir -p "${SCRIPT_DIR}/sample-images"
-# Python on Windows can't resolve /c/Users style paths, convert to C:\... form.
-if command -v cygpath > /dev/null 2>&1; then
-  WIN_SCRIPT_DIR=$(cygpath -w "${SCRIPT_DIR}")
-else
-  WIN_SCRIPT_DIR="${SCRIPT_DIR}"
-fi
-export SCRIPT_DIR="${WIN_SCRIPT_DIR}"
-# Use the project venv's Python to fetch real photos (urllib, with a sane User-Agent),
-# falling back to Pillow-drawn solid-color placeholders that still exercise the embedder.
-PY="${SCRIPT_DIR}/.venv/Scripts/python.exe"
-"$PY" - <<'PYEOF'
-import os, urllib.request
-from PIL import Image, ImageDraw
-OUT = os.path.join(os.environ['SCRIPT_DIR'], 'sample-images')
-os.makedirs(OUT, exist_ok=True)
-ITEMS = [
-    ('red-dress',        (200,  20,  40), 'https://images.pexels.com/photos/985635/pexels-photo-985635.jpeg?w=400'),
-    ('blue-jeans',       ( 30,  60, 140), 'https://images.pexels.com/photos/52518/jeans-pants-blue-pocket-52518.jpeg?w=400'),
-    ('white-sneakers',   (240, 240, 240), 'https://images.pexels.com/photos/1102776/pexels-photo-1102776.jpeg?w=400'),
-    ('leather-handbag',  (110,  60,  30), 'https://images.pexels.com/photos/1152077/pexels-photo-1152077.jpeg?w=400'),
-    ('wool-sweater',     (180, 140,  90), 'https://images.pexels.com/photos/1721934/pexels-photo-1721934.jpeg?w=400'),
-]
-for name, color, url in ITEMS:
-    path = os.path.join(OUT, name + '.jpg')
-    if os.path.exists(path) and os.path.getsize(path) > 1024:
-        continue
-    try:
-        req = urllib.request.Request(url, headers={'User-Agent':'Mozilla/5.0'})
-        with urllib.request.urlopen(req, timeout=20) as r:
-            data = r.read()
-        with open(path,'wb') as f:
-            f.write(data)
-        print(f'  fetched {name}: {len(data)} bytes')
-    except Exception as e:
-        # placeholder
-        img = Image.new('RGB', (400, 400), color)
-        d = ImageDraw.Draw(img)
-        d.text((10, 10), name, fill=(255,255,255) if sum(color)<400 else (0,0,0))
-        img.save(path, 'JPEG', quality=85)
-        print(f'  placeholder {name} (fetch failed: {e})')
-PYEOF
-echo ""
-echo "=== Indexing fashion items ==="
-build_doc() {
-  local id=$1 title=$2 file=$3
-  local b64
-  b64=$(base64 -w0 "${SCRIPT_DIR}/sample-images/${file}.jpg")
-  printf '{"id":"%s","title":"%s","image":"%s"}\n' "$id" "$title" "$b64"
-}
-{
-  build_doc 1 "Red cocktail dress"        red-dress
-  build_doc 2 "Blue denim jeans"          blue-jeans
-  build_doc 3 "White leather sneakers"    white-sneakers
-  build_doc 4 "Brown leather handbag"     leather-handbag
-  build_doc 5 "Knit wool sweater"         wool-sweater
-} | curl -s "${TYPESENSE_HOST}/collections/fashion-items/documents/import?action=create" \
-    -X POST \
-    -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
-    -H "Content-Type: text/plain" \
-    --data-binary @-
-echo ""
-### Text>image search ##########################################################
-echo "=== Text>image search: 'a red dress' ==="
-curl -s "${TYPESENSE_HOST}/multi_search" \
-  -X POST \
-  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
-  -d '{
-    "searches": [
-      {
-        "collection": "fashion-items",
-        "q": "a red dress",
-        "query_by": "embedding",
-        "prefix": false,
-        "include_fields": "id,title",
-        "per_page": 3
-      }
-    ]
-  }' | jq '.results[0].hits[] | {id: .document.id, title: .document.title, distance: .vector_distance}'
-echo ""
-echo "=== Text>image search: 'comfortable shoes' ==="
-curl -s "${TYPESENSE_HOST}/multi_search" \
-  -X POST \
-  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
-  -d '{
-    "searches": [
-      {
-        "collection": "fashion-items",
-        "q": "comfortable shoes",
-        "query_by": "embedding",
-        "prefix": false,
-        "include_fields": "id,title",
-        "per_page": 3
-      }
-    ]
-  }' | jq '.results[0].hits[] | {id: .document.id, title: .document.title, distance: .vector_distance}'
-### Image>image search #########################################################
-echo ""
-echo "=== Image>image search using red-dress.jpg ==="
-QUERY_IMG_B64=$(base64 -w0 "${SCRIPT_DIR}/sample-images/red-dress.jpg")
-curl -s "${TYPESENSE_HOST}/multi_search" \
-  -X POST \
-  -H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
-  -H "Content-Type: application/json" \
-  -d "$(jq -n --arg img "$QUERY_IMG_B64" '{
-    searches: [{
-      collection: "fashion-items",
-      q: "*",
-      vector_query: ("embedding:([], queries: [\"" + $img + "\"], k: 3)"),
-      include_fields: "id,title"
-    }]
-  }')" | jq '.results[0].hits[] | {id: .document.id, title: .document.title, distance: .vector_distance}'
-### Container logs #############################################################
-echo ""
-echo "=== Typesense Logs (last 40 lines) ==="
-docker logs --tail 40 ${CONTAINER_NAME}
-echo ""
-echo "=== VERIFICATION ==="
-echo "Expect text query 'a red dress' to put id=1 (Red cocktail dress) first."
-echo "Expect text query 'comfortable shoes' to put id=3 (White leather sneakers) first."
-echo "Expect image>image query with red-dress.jpg to put id=1 first."
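
The `build_doc` helper in `reproduce.sh` assembles each JSONL line with `printf`, which is fine for the five fixed titles but would emit invalid JSON if a title ever contained a quote or backslash. A hedged Python sketch of the same step, using `json.dumps` for escaping, is shown below. The field names `id`, `title`, and `image` come from the collection schema in the script; the function names `build_doc` and `build_import_body` are illustrative only.

```python
import base64
import json


def build_doc(doc_id, title, image_bytes):
    """Return one JSON document line with the image base64-encoded."""
    return json.dumps({
        "id": str(doc_id),
        "title": title,
        "image": base64.b64encode(image_bytes).decode("ascii"),
    })


def build_import_body(items):
    """Join (id, title, image_bytes) tuples into a newline-delimited payload."""
    return "\n".join(build_doc(*item) for item in items) + "\n"
```

The returned string has one document per line, the same shape the script pipes to the `documents/import` endpoint with `--data-binary @-`.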

fashion-clip-vit-b-p32/_reproducer/verify_onnx.py DELETED

@@ -1,199 +0,0 @@
-"""
-End-to-end sanity test for the FashionCLIP ONNX export.
-Replicates the exact I/O contract Typesense uses:
-- model.onnx text path: input_ids (int64) + pixel_values (float32 [1,3,224,224] dummy 0.5) + attention_mask
-                        -> read 2D [-1, 512] output (text_embeds)
-- model.onnx image path: input_ids dummy + pixel_values + attention_mask
-                         -> read image_embeds [B, 512]
-- clip_image_processor.onnx: image bytes (uint8 [N]) -> last_hidden_state [1,3,224,224]
-Also cross-checks ONNX outputs against the original HF transformers FashionCLIP forward pass
-to confirm the export preserves semantics (cosine similarity > 0.999).
-"""
-import io
-import os
-import sys
-import hashlib
-import numpy as np
-import onnxruntime as ort
-from PIL import Image
-import requests
-from transformers import CLIPModel, CLIPProcessor
-import torch
-STAGING = os.path.join(os.path.dirname(__file__), "out-staging", "fashion-clip-vit-b-p32")
-SRC = os.path.join(os.path.dirname(__file__), "fashion-clip-src")
-MODEL_ONNX = os.path.join(STAGING, "model.onnx")
-PROCESSOR_ONNX = os.path.join(STAGING, "clip_image_processor.onnx")
-SAMPLE_IMAGE_URL = "http://images.cocodataset.org/val2017/000000039769.jpg"  # two cats
-SAMPLE_TEXTS = ["a photo of a cat", "a red dress", "blue denim jeans"]
-def fail(msg):
-    print(f"FAIL: {msg}")
-    sys.exit(1)
-def passed(msg):
-    print(f"OK  : {msg}")
-def get_sample_image_bytes():
-    cache = os.path.join(os.path.dirname(__file__), "sample.jpg")
-    if not os.path.exists(cache):
-        r = requests.get(SAMPLE_IMAGE_URL, timeout=30)
-        r.raise_for_status()
-        with open(cache, "wb") as f:
-            f.write(r.content)
-    return open(cache, "rb").read()
-def main():
-    print("=== Inspecting model.onnx ===")
-    sess = ort.InferenceSession(MODEL_ONNX, providers=["CPUExecutionProvider"])
-    in_names = [i.name for i in sess.get_inputs()]
-    out_names = [o.name for o in sess.get_outputs()]
-    print("inputs :", in_names)
-    print("outputs:", out_names)
-    assert "input_ids" in in_names, "missing input_ids"
-    assert "pixel_values" in in_names, "missing pixel_values"
-    assert "attention_mask" in in_names, "missing attention_mask"
-    assert "text_embeds" in out_names, "missing text_embeds"
-    assert "image_embeds" in out_names, "missing image_embeds"
-    # Typesense scans outputs and picks the first 2D [-1, N>0] one as the embedding output.
-    # logits_per_* have both dims dynamic, so they get skipped; text_embeds should be the pick.
-    typesense_chosen = None
-    for o in sess.get_outputs():
-        shp = o.shape
-        if len(shp) == 2 and shp[0] in (None, "text_batch_size", "image_batch_size", -1) and isinstance(shp[1], int) and shp[1] > 0:
-            typesense_chosen = (o.name, shp)
-            break
-    print("typesense would pick:", typesense_chosen)
-    if typesense_chosen is None or typesense_chosen[0] != "text_embeds":
-        fail(f"expected text_embeds to be the first 2D embedding output, got {typesense_chosen}")
-    passed("model.onnx I/O matches Typesense expectations")
-    print()
-    print("=== Inspecting clip_image_processor.onnx ===")
-    from onnxruntime_extensions import get_library_path
-    proc_opts = ort.SessionOptions()
-    proc_opts.register_custom_ops_library(get_library_path())
-    proc_sess = ort.InferenceSession(PROCESSOR_ONNX, sess_options=proc_opts, providers=["CPUExecutionProvider"])
-    pin = [(i.name, i.type, i.shape) for i in proc_sess.get_inputs()]
-    pout = [(o.name, o.type, o.shape) for o in proc_sess.get_outputs()]
-    print("processor inputs :", pin)
-    print("processor outputs:", pout)
-    assert pin[0][0] == "image", "processor expects input named 'image'"
-    assert pout[0][0] == "last_hidden_state", "processor expects output 'last_hidden_state'"
-    passed("clip_image_processor.onnx I/O matches Typesense expectations")
-    print()
-    print("=== Running image processor on a real image ===")
-    img_bytes = get_sample_image_bytes()
-    img_arr = np.frombuffer(img_bytes, dtype=np.uint8)
-    proc_out = proc_sess.run(["last_hidden_state"], {"image": img_arr})[0]
-    print("processor output shape:", proc_out.shape, "dtype:", proc_out.dtype)
-    if proc_out.shape != (1, 3, 224, 224):
-        fail(f"expected (1,3,224,224), got {proc_out.shape}")
-    passed("image processor returns (1,3,224,224)")
-    print()
-    print("=== ONNX vs HF transformers parity check ===")
-    # Build reference using the original transformers model.
-    hf_model = CLIPModel.from_pretrained(SRC).eval()
-    hf_processor = CLIPProcessor.from_pretrained(SRC)
-    pil = Image.open(io.BytesIO(img_bytes)).convert("RGB")
-    hf_inputs = hf_processor(text=SAMPLE_TEXTS, images=pil, return_tensors="pt", padding=True)
-    with torch.no_grad():
-        hf_out = hf_model(**hf_inputs)
-    hf_text = hf_out.text_embeds.numpy()
-    hf_image = hf_out.image_embeds.numpy()
-    # Now run the same inputs through ONNX model.onnx.
-    # Note: ONNX export needs the same pixel_values as HF.
-    pixel_values = hf_inputs["pixel_values"].numpy()
-    input_ids = hf_inputs["input_ids"].numpy().astype(np.int64)
-    attention_mask = hf_inputs["attention_mask"].numpy().astype(np.int64)
-    onnx_out = sess.run(
-        ["text_embeds", "image_embeds"],
-        {
-            "input_ids": input_ids,
-            "pixel_values": pixel_values,
-            "attention_mask": attention_mask,
-        },
-    )
-    onnx_text, onnx_image = onnx_out
-    def cosine(a, b):
-        return float(np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b)))
-    for i, txt in enumerate(SAMPLE_TEXTS):
-        c = cosine(hf_text[i], onnx_text[i])
-        print(f"  text[{i}] '{txt}': cosine(HF, ONNX) = {c:.6f}")
-        if c < 0.999:
-            fail(f"text_embeds parity too low: {c}")
-    c_img = cosine(hf_image[0], onnx_image[0])
-    print(f"  image: cosine(HF, ONNX) = {c_img:.6f}")
-    if c_img < 0.999:
-        fail(f"image_embeds parity too low: {c_img}")
-    passed("ONNX text+image embeddings match HF reference (cosine > 0.999)")
-    print()
-    print("=== Typesense-style text path (dummy pixel_values = 0.5) ===")
-    # Typesense's text_embedder fills pixel_values with 0.5 for text-only queries.
-    # The text_embeds output should not depend on pixel_values (the towers are independent),
-    # so this should match the real-image text embedding.
-    dummy_pixels = np.full((1, 3, 224, 224), 0.5, dtype=np.float32)
-    typesense_text = sess.run(
-        ["text_embeds"],
-        {
-            "input_ids": input_ids[:1],
-            "attention_mask": attention_mask[:1],
-            "pixel_values": dummy_pixels,
-        },
-    )[0]
-    c = cosine(typesense_text[0], onnx_text[0])
-    print(f"  typesense text path vs onnx text path: cosine = {c:.6f}")
-    if c < 0.9999:
-        fail(f"text path differs when pixel_values change - towers shouldn't be coupled: {c}")
-    passed("text_embeds is independent of pixel_values (Typesense text path safe)")
-    print()
-    print("=== Typesense-style image embed path (dummy input_ids) ===")
-    # image_embedder.cpp passes input_ids shape [1,1] = [[0]] alongside pixel_values.
-    dummy_ids = np.array([[0]], dtype=np.int64)
-    dummy_mask = np.array([[1]], dtype=np.int64)
-    typesense_image = sess.run(
-        ["image_embeds"],
-        {
-            "input_ids": dummy_ids,
-            "pixel_values": pixel_values[:1],
-            "attention_mask": dummy_mask,
-        },
-    )[0]
-    c = cosine(typesense_image[0], onnx_image[0])
-    print(f"  typesense image path vs reference: cosine = {c:.6f}")
-    if c < 0.9999:
-        fail(f"image_embeds differs when input_ids change - towers shouldn't be coupled: {c}")
-    passed("image_embeds is independent of input_ids (Typesense image path safe)")
-    print()
-    print("=== Computing MD5 of model.onnx for config.json ===")
-    h = hashlib.md5()
-    with open(MODEL_ONNX, "rb") as f:
-        for chunk in iter(lambda: f.read(1 << 20), b""):
-            h.update(chunk)
-    print("model.onnx md5:", h.hexdigest())
-    print()
-    print("ALL CHECKS PASSED")
-if __name__ == "__main__":
-    main()
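
The output-selection check in `verify_onnx.py` (first 2D output with a dynamic batch dimension and a fixed positive embedding width) can be isolated as a pure function over `(name, shape)` pairs, which makes it testable without loading the 578 MB model. This is a sketch of the script's approximation, not of Typesense's actual implementation. `pick_embedding_output` is a hypothetical name, and treating any string, `None`, or `-1` as "dynamic" is an assumption that is slightly broader than the script's explicit list of dimension names.

```python
def pick_embedding_output(outputs):
    """Return the name of the first output shaped [dynamic, N > 0], else None.

    `outputs` is an iterable of (name, shape) pairs, where each shape is a
    list whose entries are ints (fixed dims) or str/None/-1 (dynamic dims).
    """
    for name, shape in outputs:
        if len(shape) != 2:
            continue
        batch_dim, embed_dim = shape
        batch_is_dynamic = (
            batch_dim is None or batch_dim == -1 or isinstance(batch_dim, str)
        )
        embed_is_fixed = isinstance(embed_dim, int) and embed_dim > 0
        if batch_is_dynamic and embed_is_fixed:
            return name
    return None
```

With a live session, the pairs could be built as `[(o.name, o.shape) for o in sess.get_outputs()]`. Returning `None` rather than raising lets the caller report a readable failure when no output qualifies.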