Add fashion-clip-vit-b-p32 (ONNX of patrickjohncyh/fashion-clip)

FashionCLIP fine-tuned on the Farfetch fashion catalog. It uses the same CLIP-ViT-B/32 architecture as clip-vit-b-p32, so clip_tokenizer.onnx, clip_image_processor.onnx, and vocab.txt are reused byte-for-byte. Verified end-to-end against a local Typesense 29.0 container: the model loads, and text-to-image and image-to-image search return the expected top results.

fashion-clip-vit-b-p32/README.md
ADDED

---
license: mit
tags:
- clip
- fashion
- onnx
- typesense
base_model: patrickjohncyh/fashion-clip
---

# fashion-clip-vit-b-p32

ONNX export of [patrickjohncyh/fashion-clip](https://huggingface.co/patrickjohncyh/fashion-clip), packaged for use with [Typesense](https://typesense.org)'s built-in image search.

FashionCLIP is a CLIP-ViT-B/32 model fine-tuned by Patrick John Chia et al. on Farfetch's fashion catalog. In the authors' evaluation, it outperforms the generic OpenAI CLIP on retrieval over apparel and fashion product corpora.

## Usage in Typesense

Add an embedding field to the collection schema that is computed from the document's `image` field:

```json
{
  "name": "embedding",
  "type": "float[]",
  "embed": {
    "from": ["image"],
    "model_config": {"model_name": "ts/fashion-clip-vit-b-p32"}
  }
}
```
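
For context, the field above sits in a collection schema next to the field it embeds from. The sketch below builds such a schema, a document, and search parameters as plain Python dicts, without making any network calls. It assumes the `image` field type (with `store: false`) that Typesense documents for image search and the `vector_query` "similar to document" syntax; the collection name `products`, the `name` field, and all helper names are hypothetical.

```python
import base64
import json

MODEL_NAME = "ts/fashion-clip-vit-b-p32"


def build_schema(collection: str = "products") -> dict:
    """Collection schema: a raw image field plus an auto-embedded vector field."""
    return {
        "name": collection,
        "fields": [
            {"name": "name", "type": "string"},
            # Base64-encoded image; store=False is assumed to keep it out of the stored document.
            {"name": "image", "type": "image", "store": False},
            {
                "name": "embedding",
                "type": "float[]",
                "embed": {
                    "from": ["image"],
                    "model_config": {"model_name": MODEL_NAME},
                },
            },
        ],
    }


def build_document(doc_id: str, name: str, image_bytes: bytes) -> dict:
    """Document payload: the image travels as a base64 string."""
    return {
        "id": doc_id,
        "name": name,
        "image": base64.b64encode(image_bytes).decode("ascii"),
    }


def text_search_params(query: str) -> dict:
    """Text-to-image search: the query text is embedded by the model (assumed behaviour)."""
    return {"q": query, "query_by": "embedding"}


def similar_image_params(doc_id: str) -> dict:
    """Image-to-image search: nearest neighbours of an already-indexed document's vector."""
    return {"q": "*", "vector_query": f"embedding:([], id: {doc_id})"}


if __name__ == "__main__":
    print(json.dumps(build_schema(), indent=2))
    print(text_search_params("red leather ankle boots"))
```

Keeping the payloads as plain dicts leaves the choice of HTTP client or official Typesense client library open.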

The architecture is identical to `clip-vit-b-p32`; the `clip_tokenizer.onnx`, `clip_image_processor.onnx`, and `vocab.txt` files are reused byte-for-byte. Only the model weights differ.
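
One way to confirm the byte-for-byte claim on a local copy is to hash the shared files in both model directories and compare digests. A minimal sketch, assuming both models have been downloaded into sibling directories; the helper names are hypothetical.

```python
import hashlib
from pathlib import Path

SHARED_FILES = ("clip_tokenizer.onnx", "clip_image_processor.onnx", "vocab.txt")


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_shared_files(dir_a: Path, dir_b: Path) -> dict:
    """Map each shared file name to True if both copies are byte-identical."""
    return {
        name: sha256_of(dir_a / name) == sha256_of(dir_b / name)
        for name in SHARED_FILES
    }
```

For example, `compare_shared_files(Path("clip-vit-b-p32"), Path("fashion-clip-vit-b-p32"))` on those (hypothetical) local paths should map every shared file to `True` if the files really are reused unchanged.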

## Files

| File | Notes |
|---|---|
| `model.onnx` | FashionCLIP weights exported via Optimum (`zero-shot-image-classification` task, opset 11). Inputs: `input_ids`, `pixel_values`, `attention_mask`. Outputs include `text_embeds` and `image_embeds`, both `[B, 512]`, where `B` is the batch size. |
| `clip_tokenizer.onnx` | Reused from `clip-vit-b-p32`. |
| `clip_image_processor.onnx` | Reused from `clip-vit-b-p32`. Uses the `onnxruntime-extensions` `DecodeImage` custom op. |
| `vocab.txt` | CLIP BPE merges. Reused from `clip-vit-b-p32`. |
| `config.json` | Typesense model metadata, including MD5 checksums for the model, vocabulary, and image processor files. |
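
Because `text_embeds` and `image_embeds` share one 512-dimensional space, text-to-image and image-to-image retrieval both come down to comparing vectors. The sketch below ranks candidate image embeddings against a query embedding by cosine similarity, the usual metric for CLIP-style embeddings. It assumes the vectors were already produced by the model; the toy 3-dimensional vectors stand in for real 512-dimensional rows, and this is not a description of how Typesense indexes vectors internally.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float], candidates: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (candidate index, score) pairs, most similar first."""
    scored = [(i, cosine_similarity(query, c)) for i, c in enumerate(candidates)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for one text_embeds row and several image_embeds rows.
query = [1.0, 0.0, 0.0]
images = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [-1.0, 0.0, 0.0]]
print(rank_by_similarity(query, images))
```

If the embeddings are already unit-length, the cosine similarity reduces to a plain dot product, which is cheaper to compute at scale.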

## Citation

```bibtex
@article{Chia2022FashionCLIP,
  title   = {Contrastive language and vision learning of general fashion concepts},
  author  = {Chia, Patrick John and Attanasio, Giuseppe and Bianchi, Federico and Terragni, Silvia and Magalh{\~a}es, Ana Rita and Goncalves, Diogo and Greco, Ciro and Tagliabue, Jacopo},
  journal = {Scientific Reports},
  volume  = {12},
  number  = {1},
  pages   = {18958},
  year    = {2022},
  doi     = {10.1038/s41598-022-23052-9}
}
```

License: MIT (inherited from `patrickjohncyh/fashion-clip`).

fashion-clip-vit-b-p32/clip_image_processor.onnx
ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:e81ca63e25a954c12508a99f6a18428acabce41c61dd0da66389ccb0297f51cb
size 4025
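
The three lines above are a Git LFS pointer rather than the ONNX file itself: the actual bytes are stored separately and identified by the SHA-256 `oid` and the byte `size`. A minimal sketch for checking a downloaded file against such a pointer; it parses only the `key value` lines shown here, and the function names and the local path in the usage note are hypothetical.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Split each non-empty 'key value' line of a Git LFS pointer into a dict entry."""
    fields = {}
    for line in text.splitlines():
        if line.strip():
            key, _, value = line.partition(" ")
            fields[key] = value
    return fields


def matches_pointer(pointer_text: str, blob_path: Path) -> bool:
    """True if the file's size and SHA-256 digest match the pointer's size and oid."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected_hex = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    if blob_path.stat().st_size != int(fields["size"]):
        return False  # cheap check first: rejects truncated downloads without hashing
    digest = hashlib.sha256()
    with blob_path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex


POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:e81ca63e25a954c12508a99f6a18428acabce41c61dd0da66389ccb0297f51cb
size 4025
"""
```

For example, `matches_pointer(POINTER, Path("clip_image_processor.onnx"))` on a (hypothetical) local download.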

fashion-clip-vit-b-p32/clip_tokenizer.onnx
ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:188b28e19884e7f72739908d072c7f3f685069780b3c8227fc6ee1204b38e702
size 1387327

fashion-clip-vit-b-p32/config.json
ADDED
{
  "vocab_file_name": "vocab.txt",
  "vocab_md5": "f2b3f051f999e708058e5ad997d927a3",
  "model_type": "clip",
  "model_md5": "f627bc60603567158dd6b606df801c14",
  "image_processor_md5": "c0083116bec05061fdc372c5bd1e0d19",
  "image_processor_file_name": "clip_image_processor.onnx",
  "image_embedder": true
}
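
`config.json` records MD5 checksums for the vocabulary, the image processor, and the model. A minimal sketch for verifying a local download against it; it assumes the weights live in `model.onnx` (the config names the vocabulary and image-processor files but not the model file), and the helper names are hypothetical. How Typesense itself uses these checksums is not covered here.

```python
import hashlib
import json
from pathlib import Path

# checksum key -> key holding the file name (None: fall back to DEFAULT_MODEL_FILE)
CHECKS = {
    "vocab_md5": "vocab_file_name",
    "image_processor_md5": "image_processor_file_name",
    "model_md5": None,
}
DEFAULT_MODEL_FILE = "model.onnx"  # assumption: config.json does not name the model file


def md5_of(path: Path) -> str:
    """MD5 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model_dir(model_dir: Path) -> dict:
    """Map each checked file name to True if its MD5 matches config.json."""
    config = json.loads((model_dir / "config.json").read_text())
    results = {}
    for md5_key, name_key in CHECKS.items():
        if md5_key not in config:
            continue  # skip checksums this config does not declare
        file_name = config[name_key] if name_key else DEFAULT_MODEL_FILE
        results[file_name] = md5_of(model_dir / file_name) == config[md5_key]
    return results
```

For example, `verify_model_dir(Path("fashion-clip-vit-b-p32"))` on a (hypothetical) local checkout should report `True` for all three files.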

fashion-clip-vit-b-p32/model.onnx
ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:374074040582148fd7a0ab2a550d1ec834bfb6f6a258d7aaa04d80383acccca2
size 605760566

fashion-clip-vit-b-p32/vocab.txt
ADDED
The diff for this file is too large to render.