Slep/CondViT-B16-cat

Simon Lepage committed on Apr 8, 2024

Commit 3190ebb · 1 parent: 29f03f1

Add short README.md

Files changed (1)

README.md +46 -0

---
license: mit
---

# Conditional ViT - B/16 - Categories

- Introduced in [Weakly-Supervised Conditional Embedding for Referred Visual Search](https://arxiv.org/abs/2306.02928)
- [Training Data](https://huggingface.co/datasets/Slep/LAION-RVS-Fashion)
- [Training Code](https://github.com/Simon-Lepage/CondViT-LRVSF)
- [Demo](https://huggingface.co/spaces/Slep/CondViT-LRVSF-Demo)

## General Info

Model fine-tuned from CLIP ViT-B/16 on LRVSF at 224×224 resolution. The conditioning categories are the following:

- Bags
- Feet
- Hands
- Head
- Lower Body
- Neck
- Outwear
- Upper Body
- Waist
- Whole Body

For research use only.
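
The category is passed to the processor as a plain string. As a minimal sketch, assuming the processor expects the names spelled exactly as in the list above, user input could be mapped to a canonical category before calling it. The `CATEGORIES` tuple and the `validate_category` helper are hypothetical and not part of the released code.

```python
# Hypothetical helper, not part of the CondViT release. The names are copied from
# this model card and assumed to be the exact strings the processor expects.
CATEGORIES = (
    "Bags",
    "Feet",
    "Hands",
    "Head",
    "Lower Body",
    "Neck",
    "Outwear",
    "Upper Body",
    "Waist",
    "Whole Body",
)

# Lookup table from lowercase name to canonical spelling
_BY_LOWER = {name.lower(): name for name in CATEGORIES}


def validate_category(name: str) -> str:
    """Return the canonical spelling of a conditioning category.

    Matching ignores letter case and surrounding whitespace; unknown names raise ValueError.
    """
    key = name.strip().lower()
    if key not in _BY_LOWER:
        raise ValueError(f"Unknown category {name!r}; expected one of {', '.join(CATEGORIES)}")
    return _BY_LOWER[key]


print(validate_category("upper body"))  # prints: Upper Body
```

Case-insensitive matching is only a convenience for user input; comparing `name` directly against `CATEGORIES` gives strict matching instead.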
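
## How to Use

```python
from PIL import Image
import requests
import torch
from transformers import AutoModel, AutoProcessor

# CondViT is not a built-in Transformers architecture: its model and processor
# classes are loaded from the Hub repository, which requires trust_remote_code=True
model = AutoModel.from_pretrained("Slep/CondViT-B16-cat", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("Slep/CondViT-B16-cat", trust_remote_code=True)

# Example query image from the LAION-RVS-Fashion dataset
url = "https://huggingface.co/datasets/Slep/LAION-RVS-Fashion/resolve/main/assets/108856.0.jpg"
img = Image.open(requests.get(url, stream=True).raw)
cat = "Bags"  # conditioning category, one of the categories listed above

# Preprocess the image and category, then compute the conditional embedding
inputs = processor(images=[img], categories=[cat])
with torch.no_grad():  # inference only, no gradients needed
    raw_embedding = model(**inputs)

# L2-normalize so that dot products between embeddings are cosine similarities
normalized_embedding = torch.nn.functional.normalize(raw_embedding, dim=-1)
```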
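
Because the embeddings are L2-normalized, cosine similarity between a query and a gallery item reduces to a dot product. The following is a minimal sketch of top-k retrieval, assuming the gallery images have already been encoded into normalized embeddings of the same dimension and stacked into an `(N, D)` tensor. The `top_k_matches` function, the embedding dimension of 512 and the random tensors standing in for real embeddings are all hypothetical choices for the demo; with real data, `query` would be the `normalized_embedding` computed above.

```python
import torch


def top_k_matches(query: torch.Tensor, gallery: torch.Tensor, k: int = 5):
    """Rank gallery items by cosine similarity to a single query (hypothetical helper).

    query:   (D,) or (1, D) L2-normalized embedding
    gallery: (N, D) L2-normalized embeddings
    Returns (scores, indices) of the k most similar gallery items, best first.
    """
    query = query.reshape(-1)      # flatten (1, D) to (D,)
    scores = gallery @ query       # (N,) dot products, equal to cosine similarities
    k = min(k, gallery.shape[0])   # never request more items than the gallery holds
    return torch.topk(scores, k)   # sorted in descending order by default


# Stand-in data: random vectors replace real CondViT embeddings (D = 512 chosen arbitrarily)
torch.manual_seed(0)
gallery = torch.nn.functional.normalize(torch.randn(100, 512), dim=-1)
query = torch.nn.functional.normalize(gallery[42] + 0.01 * torch.randn(512), dim=-1)

scores, indices = top_k_matches(query, gallery, k=3)
print(indices[0].item())  # prints: 42 (the slightly perturbed copy of item 42 ranks first)
```

A plain matrix product is enough for small galleries; for large collections, an approximate nearest-neighbor index would usually replace the exhaustive `topk`.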