Add fashion-clip-vit-b-p32 (ONNX of patrickjohncyh/fashion-clip)

#16
fashion-clip-vit-b-p32/README.md ADDED
@@ -0,0 +1,57 @@
+ ---
+ license: mit
+ tags:
+ - clip
+ - fashion
+ - onnx
+ - typesense
+ base_model: patrickjohncyh/fashion-clip
+ ---
+
+ # fashion-clip-vit-b-p32
+
+ ONNX export of [patrickjohncyh/fashion-clip](https://huggingface.co/patrickjohncyh/fashion-clip) packaged for use with [Typesense](https://typesense.org)'s built-in image search.
+
+ FashionCLIP is a CLIP-ViT-B/32 model fine-tuned by Patrick John Chia et al. on Farfetch's fashion catalog. It produces materially better retrieval than the generic OpenAI CLIP on apparel and fashion product corpora.
+
+ ## Usage in Typesense
+
+ ```json
+ {
+   "name": "embedding",
+   "type": "float[]",
+   "embed": {
+     "from": ["image"],
+     "model_config": {"model_name": "ts/fashion-clip-vit-b-p32"}
+   }
+ }
+ ```
+
+ The architecture is identical to `clip-vit-b-p32`; the `clip_tokenizer.onnx`, `clip_image_processor.onnx`, and `vocab.txt` files are reused byte-for-byte. Only the model weights differ.
+
+ ## Files
+
+ | File | Notes |
+ |---|---|
+ | `model.onnx` | FashionCLIP weights exported via Optimum (`zero-shot-image-classification` task, opset 11). Inputs: `input_ids`, `pixel_values`, `attention_mask`. Outputs include `text_embeds` and `image_embeds`, both `[B, 512]`. |
+ | `clip_tokenizer.onnx` | Reused from `clip-vit-b-p32`. |
+ | `clip_image_processor.onnx` | Reused from `clip-vit-b-p32`. Uses `onnxruntime-extensions` `DecodeImage` custom op. |
+ | `vocab.txt` | CLIP BPE merges. Reused from `clip-vit-b-p32`. |
+ | `config.json` | Typesense model metadata with MD5 checksums. |
+
+ ## Citation
+
+ ```bibtex
+ @article{Chia2022FashionCLIP,
+   title = {Contrastive language and vision learning of general fashion concepts},
+   author = {Chia, Patrick John and Attanasio, Giuseppe and Bianchi, Federico and Terragni, Silvia and Magalh\~aes, Ana Rita and Goncalves, Diogo and Greco, Ciro and Tagliabue, Jacopo},
+   journal = {Scientific Reports},
+   volume = {12},
+   number = {1},
+   pages = {18958},
+   year = {2022},
+   doi = {10.1038/s41598-022-23052-9}
+ }
+ ```
+
+ License: MIT (inherited from `patrickjohncyh/fashion-clip`).
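The README's usage snippet shows only the embedding field. As a rough sketch of how that field might sit inside a full collection schema, the Python below builds the schema as a plain dictionary and prints it as JSON. The collection name `products`, the `name` field, and the `image` field definition (`"type": "image"`, `"store": false`) are assumptions for illustration and are not part of this PR; the model name `ts/fashion-clip-vit-b-p32` is taken from the README and presumably resolves only once the model is published.

```python
import json


def build_schema(collection_name: str = "products") -> dict:
    """Return a hypothetical collection schema that embeds an image field."""
    return {
        "name": collection_name,  # assumed collection name
        "fields": [
            {"name": "name", "type": "string"},  # assumed extra field
            # Assumed image source field; "store": False asks that the raw
            # image not be kept on disk after it has been embedded.
            {"name": "image", "type": "image", "store": False},
            # Embedding field copied from the README in this PR.
            {
                "name": "embedding",
                "type": "float[]",
                "embed": {
                    "from": ["image"],
                    "model_config": {"model_name": "ts/fashion-clip-vit-b-p32"},
                },
            },
        ],
    }


schema = build_schema()
print(json.dumps(schema, indent=2))
```

The printed JSON could then be sent as the body of a collection-creation request; the exact request is left out here because it depends on the deployment.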
fashion-clip-vit-b-p32/clip_image_processor.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e81ca63e25a954c12508a99f6a18428acabce41c61dd0da66389ccb0297f51cb
+ size 4025
fashion-clip-vit-b-p32/clip_tokenizer.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:188b28e19884e7f72739908d072c7f3f685069780b3c8227fc6ee1204b38e702
+ size 1387327
fashion-clip-vit-b-p32/config.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "vocab_file_name": "vocab.txt",
+   "vocab_md5": "f2b3f051f999e708058e5ad997d927a3",
+   "model_type": "clip",
+   "model_md5": "f627bc60603567158dd6b606df801c14",
+   "image_processor_md5": "c0083116bec05061fdc372c5bd1e0d19",
+   "image_processor_file_name": "clip_image_processor.onnx",
+   "image_embedder": true
+ }
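Since `config.json` records an MD5 checksum for each file, a reviewer may want to confirm that downloaded files match before merging. The following is a minimal sketch of such a check, not Typesense's own verification logic, which this PR does not show. The helper names `md5_of_file` and `verify_model_dir` are hypothetical, and the model file name `model.onnx` is assumed from the README because `config.json` does not name it.

```python
import hashlib
import json
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through MD5 so large model files are not loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model_dir(model_dir: Path, model_file_name: str = "model.onnx") -> dict:
    """Compare each file's MD5 with the value in config.json.

    Returns a mapping of file name to True (match) or False (mismatch).
    """
    config = json.loads((model_dir / "config.json").read_text())
    expected = {
        config["vocab_file_name"]: config["vocab_md5"],
        config["image_processor_file_name"]: config["image_processor_md5"],
        model_file_name: config["model_md5"],  # file name assumed, see lead-in
    }
    return {
        name: md5_of_file(model_dir / name) == checksum
        for name, checksum in expected.items()
    }
```

Calling `verify_model_dir(Path("fashion-clip-vit-b-p32"))` on a local checkout with the LFS objects pulled would report which of the three files agree with the recorded checksums.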
fashion-clip-vit-b-p32/model.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:374074040582148fd7a0ab2a550d1ec834bfb6f6a258d7aaa04d80383acccca2
+ size 605760566
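The three `.onnx` entries in this diff are Git LFS pointer files rather than the binaries themselves: each records a spec version, a SHA-256 object ID, and a byte size. A small sketch of how a pointer could be parsed and a downloaded file checked against it is given below; the function names `parse_lfs_pointer` and `file_matches_pointer` are hypothetical, and the pointer text is copied from the `model.onnx` entry above.

```python
import hashlib
from pathlib import Path

# Pointer text for model.onnx, copied from this diff.
MODEL_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:374074040582148fd7a0ab2a550d1ec834bfb6f6a258d7aaa04d80383acccca2
size 605760566
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a pointer file into a dictionary."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def file_matches_pointer(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Check a local file's size and SHA-256 against a parsed pointer."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported hash algorithm: {pointer['algo']}")
    if path.stat().st_size != pointer["size"]:
        return False  # cheap size check before hashing the whole file
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(MODEL_POINTER)
print(pointer["size"], pointer["digest"][:12])
```

The size comparison runs first so that a truncated download of the roughly 606 MB model is rejected without hashing it.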
fashion-clip-vit-b-p32/vocab.txt ADDED
The diff for this file is too large to render. See raw diff