---
language: en
tags:
- fashion
- clip
- multimodal
- image-search
- text-search
- embeddings
- contrastive-learning
license: mit
datasets:
- custom
metrics:
- accuracy
- cosine-similarity
library_name: transformers
---

# GAP-CLIP: Guaranteed Attribute Positioning in CLIP Embeddings

This model is part of the GAP-CLIP project for fashion search with guaranteed attribute positioning.

## Model Description

GAP-CLIP is a multimodal search model for fashion that combines three fixed embedding subspaces:

- **Color embeddings** (16 dimensions): specialized for color representation
- **Hierarchy embeddings** (64 dimensions): specialized for category classification
- **General CLIP embeddings** (432 dimensions): general visual-semantic understanding

**Total embedding size**: 512 dimensions (16 + 64 + 432)

## Quick Start

```python
import torch
from transformers import CLIPProcessor, CLIPModel

# Load the GAP-CLIP weights and the matching base processor
model = CLIPModel.from_pretrained("Leacb4/gap-clip")
processor = CLIPProcessor.from_pretrained("laion/CLIP-ViT-B-32-laion2B-s34B-b79K")

# Encode a text query
text = "red dress"
inputs = processor(text=[text], return_tensors="pt", padding=True)
with torch.no_grad():
    text_features = model.get_text_features(**inputs)  # shape: (1, 512)

# Extract the attribute subspaces at their guaranteed positions
color_emb = text_features[:, :16]        # Color dimensions
hierarchy_emb = text_features[:, 16:80]  # Hierarchy dimensions
general_emb = text_features[:, 80:]      # General CLIP dimensions
```

For encoding images and ranking them against a text query, see the example sketches at the end of this card.

## Citation

```bibtex
@misc{gap-clip-2024,
  title={GAP-CLIP: Guaranteed Attribute Positioning in CLIP Embeddings for Fashion Search},
  author={Sarfati, Lea Attia},
  year={2024},
  url={https://huggingface.co/Leacb4/gap-clip}
}
```

## License

MIT License - see the LICENSE file for details.
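
## Example: Encoding Images (Sketch)

The Quick Start above encodes only text. The sketch below shows the image side, assuming GAP-CLIP exposes the standard `CLIPModel` image tower and that image embeddings use the same fixed subspace layout (16 color / 64 hierarchy / 432 general) as text embeddings; the file path `dress.jpg` is a placeholder.

```python
from PIL import Image
import torch
from transformers import CLIPProcessor, CLIPModel

model = CLIPModel.from_pretrained("Leacb4/gap-clip")
processor = CLIPProcessor.from_pretrained("laion/CLIP-ViT-B-32-laion2B-s34B-b79K")

# Placeholder path: substitute your own catalog image
image = Image.open("dress.jpg")
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    image_features = model.get_image_features(**inputs)  # shape: (1, 512)

# Assumption: image embeddings share the text embeddings' subspace layout
color_emb = image_features[:, :16]
hierarchy_emb = image_features[:, 16:80]
general_emb = image_features[:, 80:]
```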
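
## Example: Subspace-Weighted Retrieval (Sketch)

Because attribute positions are guaranteed, similarity can be computed per subspace and re-weighted per query. The `subspace_similarity` helper below is illustrative, not part of the released API, and the random tensors stand in for embeddings produced by the snippets above.

```python
import torch
import torch.nn.functional as F

# Fixed subspace positions from the model description
COLOR, HIERARCHY, GENERAL = slice(0, 16), slice(16, 80), slice(80, 512)

def subspace_similarity(query, gallery, w_color=1.0, w_hierarchy=1.0, w_general=1.0):
    """Weighted sum of per-subspace cosine similarities.

    query:   (1, 512) text embedding
    gallery: (N, 512) image embeddings
    Returns: (N,) scores, higher means more similar.
    """
    total = w_color + w_hierarchy + w_general
    return (
        w_color * F.cosine_similarity(query[:, COLOR], gallery[:, COLOR])
        + w_hierarchy * F.cosine_similarity(query[:, HIERARCHY], gallery[:, HIERARCHY])
        + w_general * F.cosine_similarity(query[:, GENERAL], gallery[:, GENERAL])
    ) / total

# Stand-ins for a "red dress" text embedding and 8 gallery image embeddings
query = torch.randn(1, 512)
gallery = torch.randn(8, 512)

# Doubling w_color biases the ranking toward color matches
scores = subspace_similarity(query, gallery, w_color=2.0)
print(scores.topk(k=3).indices)
```

Raising `w_color` shifts the ranking toward color agreement without re-encoding anything; that per-attribute control is the point of reserving fixed dimensions for each attribute.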