Add fashion-clip-vit-b-p32 (ONNX of patrickjohncyh/fashion-clip)

FashionCLIP is CLIP fine-tuned on the Farfetch fashion catalog. It uses the same CLIP-ViT-B/32 architecture as clip-vit-b-p32, so clip_image_processor.onnx and vocab.txt are reused byte-for-byte. Verified end-to-end against a local Typesense 29.0 container: the model loads, and text-image and image-image search return the expected top results.
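
For reviewers who want to repeat the end-to-end check, the following is a minimal sketch of the request payloads involved, not the exact script used for this PR. The collection name `products`, the `title` field, and the helper names are hypothetical. The `image` field type and the `vector_query` syntax follow the Typesense image-search documentation as I understand it, so confirm them against the docs for the server version you run.

```python
import json

# Model name taken from the README added in this PR.
MODEL_NAME = "ts/fashion-clip-vit-b-p32"


def build_collection_schema(collection="products"):
    """Return a collection schema with an image field and an auto-embedding field."""
    return {
        "name": collection,  # hypothetical collection name
        "fields": [
            {"name": "title", "type": "string"},  # hypothetical metadata field
            # Base64-encoded image input; store=False is assumed to keep the raw image off disk.
            {"name": "image", "type": "image", "store": False},
            {
                "name": "embedding",
                "type": "float[]",
                "embed": {
                    "from": ["image"],
                    "model_config": {"model_name": MODEL_NAME},
                },
            },
        ],
    }


def text_to_image_query(text):
    """Search parameters for a text-image query against the embedding field."""
    return {"q": text, "query_by": "embedding"}


def image_to_image_query(doc_id):
    """Search parameters for finding images similar to an already indexed document."""
    return {"q": "*", "vector_query": f"embedding:([], id:{doc_id})"}


if __name__ == "__main__":
    print(json.dumps(build_collection_schema(), indent=2))
```

The schema would be sent to `POST /collections` and the query dictionaries to the collection's search endpoint. The sketch only builds the payloads, so it can be reviewed without a running server.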

Files changed (6)

fashion-clip-vit-b-p32/README.md +57 -0
fashion-clip-vit-b-p32/clip_image_processor.onnx +3 -0
fashion-clip-vit-b-p32/clip_tokenizer.onnx +3 -0
fashion-clip-vit-b-p32/config.json +9 -0
fashion-clip-vit-b-p32/model.onnx +3 -0
fashion-clip-vit-b-p32/vocab.txt +0 -0

fashion-clip-vit-b-p32/README.md ADDED

	@@ -0,0 +1,57 @@

+---
+license: mit
+tags:
+  - clip
+  - fashion
+  - onnx
+  - typesense
+base_model: patrickjohncyh/fashion-clip
+---
+# fashion-clip-vit-b-p32
+ONNX export of [patrickjohncyh/fashion-clip](https://huggingface.co/patrickjohncyh/fashion-clip) packaged for use with [Typesense](https://typesense.org)'s built-in image search.
+FashionCLIP is a CLIP-ViT-B/32 model fine-tuned by Patrick John Chia et al. on Farfetch's fashion catalog. It produces materially better retrieval than the generic OpenAI CLIP on apparel and fashion product corpora.
+## Usage in Typesense
+```json
+{
+  "name": "embedding",
+  "type": "float[]",
+  "embed": {
+    "from": ["image"],
+    "model_config": {"model_name": "ts/fashion-clip-vit-b-p32"}
+  }
+}
+```
+The architecture is identical to `clip-vit-b-p32`; the `clip_tokenizer.onnx`, `clip_image_processor.onnx`, and `vocab.txt` files are reused byte-for-byte. Only the model weights differ.
+## Files
+| File | Notes |
+|---|---|
+| `model.onnx` | FashionCLIP weights exported via Optimum (`zero-shot-image-classification` task, opset 11). Inputs: `input_ids`, `pixel_values`, `attention_mask`. Outputs include `text_embeds` and `image_embeds`, both `[B, 512]`. |
+| `clip_tokenizer.onnx` | Reused from `clip-vit-b-p32`. |
+| `clip_image_processor.onnx` | Reused from `clip-vit-b-p32`. Uses `onnxruntime-extensions` `DecodeImage` custom op. |
+| `vocab.txt` | CLIP BPE merges. Reused from `clip-vit-b-p32`. |
+| `config.json` | Typesense model metadata with MD5 checksums. |
+## Citation
+```bibtex
+@article{Chia2022FashionCLIP,
+  title    = {Contrastive language and vision learning of general fashion concepts},
+  author   = {Chia, Patrick John and Attanasio, Giuseppe and Bianchi, Federico and Terragni, Silvia and Magalh\~aes, Ana Rita and Goncalves, Diogo and Greco, Ciro and Tagliabue, Jacopo},
+  journal  = {Scientific Reports},
+  volume   = {12},
+  number   = {1},
+  pages    = {18958},
+  year     = {2022},
+  doi      = {10.1038/s41598-022-23052-9}
+}
+```
+License: MIT (inherited from `patrickjohncyh/fashion-clip`).

fashion-clip-vit-b-p32/clip_image_processor.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e81ca63e25a954c12508a99f6a18428acabce41c61dd0da66389ccb0297f51cb
+size 4025

fashion-clip-vit-b-p32/clip_tokenizer.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:188b28e19884e7f72739908d072c7f3f685069780b3c8227fc6ee1204b38e702
+size 1387327

fashion-clip-vit-b-p32/config.json ADDED

	@@ -0,0 +1,9 @@

+{
+  "vocab_file_name": "vocab.txt",
+  "vocab_md5": "f2b3f051f999e708058e5ad997d927a3",
+  "model_type": "clip",
+  "model_md5": "f627bc60603567158dd6b606df801c14",
+  "image_processor_md5": "c0083116bec05061fdc372c5bd1e0d19",
+  "image_processor_file_name": "clip_image_processor.onnx",
+  "image_embedder": true
+}
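
The MD5 values in `config.json` can be checked against the files on disk before merging. The sketch below is one way to do that with the standard library. It assumes `model_md5` refers to `model.onnx` (the config does not name that file explicitly), and the function names are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model_dir(model_dir, model_file_name="model.onnx"):
    """Map each file referenced by config.json to True if its MD5 matches."""
    model_dir = Path(model_dir)
    config = json.loads((model_dir / "config.json").read_text())
    expected = {
        # Assumption: model_md5 is the checksum of model.onnx.
        model_file_name: config["model_md5"],
        config["vocab_file_name"]: config["vocab_md5"],
        config["image_processor_file_name"]: config["image_processor_md5"],
    }
    return {name: md5_of(model_dir / name) == md5 for name, md5 in expected.items()}
```

Calling `verify_model_dir("fashion-clip-vit-b-p32")` on a checkout with the LFS objects pulled should return `True` for every entry. A `False` value points at a file that differs from what the config declares.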

fashion-clip-vit-b-p32/model.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:374074040582148fd7a0ab2a550d1ec834bfb6f6a258d7aaa04d80383acccca2
+size 605760566
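
The `.onnx` entries in this diff are Git LFS pointers, so the diff shows only a SHA-256 `oid` and a byte `size` rather than the binary. A downloaded file can be compared against its pointer with a sketch like the one below. The function names are hypothetical, and the pointer text is expected without the leading `+` diff markers.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse Git LFS pointer text into its sha256 digest and size in bytes."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return {"sha256": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at path has the size and SHA-256 the pointer declares."""
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and digest.hexdigest() == pointer["sha256"]
```

For example, parsing the three pointer lines shown above for `model.onnx` and passing the result to `matches_pointer("model.onnx", pointer)` checks that the local file is the 605760566-byte object the PR refers to.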

fashion-clip-vit-b-p32/vocab.txt ADDED

The diff for this file is too large to render.