---
language: en
tags:
  - fashion
  - clip
  - multimodal
  - image-search
  - text-search
  - embeddings
  - contrastive-learning
license: mit
datasets:
  - custom
metrics:
  - accuracy
  - cosine-similarity
library_name: transformers
---

GAP-CLIP: Guaranteed Attribute Positioning in CLIP Embeddings

This model is part of the GAP-CLIP project for fashion search with guaranteed attribute positioning.

Model Description

GAP-CLIP is a multimodal search model for fashion whose embeddings combine three subspaces at fixed positions:

  • Color embeddings (16 dimensions, indices 0-15): Specialized for color representation
  • Hierarchy embeddings (64 dimensions, indices 16-79): Specialized for category classification
  • General CLIP embeddings (432 dimensions, indices 80-511): General visual-semantic understanding

Total embedding size: 512 dimensions
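Because the three subspaces are contiguous and appear in the order listed above, their index ranges follow from the sizes alone (16 + 64 + 432 = 512). The sketch below illustrates this, assuming the color, hierarchy, general ordering used by the Quick Start slices. `SUBSPACE_SIZES`, `subspace_slices` and `split_embedding` are hypothetical helpers, not part of the GAP-CLIP release, and the example only needs PyTorch.

```python
import torch

# Hypothetical layout table mirroring the sizes listed above (order matters):
# color (16) + hierarchy (64) + general CLIP (432) = 512.
SUBSPACE_SIZES = {"color": 16, "hierarchy": 64, "general": 432}


def subspace_slices(sizes: dict[str, int]) -> dict[str, slice]:
    """Turn ordered subspace sizes into contiguous index ranges."""
    slices, start = {}, 0
    for name, size in sizes.items():
        slices[name] = slice(start, start + size)
        start += size
    return slices


SLICES = subspace_slices(SUBSPACE_SIZES)
EMBED_DIM = sum(SUBSPACE_SIZES.values())  # 512


def split_embedding(features: torch.Tensor) -> dict[str, torch.Tensor]:
    """Split a (..., 512) embedding tensor into its named subspaces."""
    if features.shape[-1] != EMBED_DIM:
        raise ValueError(f"expected last dim {EMBED_DIM}, got {features.shape[-1]}")
    return {name: features[..., s] for name, s in SLICES.items()}


# Usage with a random batch standing in for real model output
parts = split_embedding(torch.randn(2, 512))
print({name: tuple(t.shape) for name, t in parts.items()})
# {'color': (2, 16), 'hierarchy': (2, 64), 'general': (2, 432)}
```

Deriving the slices from one size table keeps the color, hierarchy and general boundaries from drifting apart if they are hard-coded in several places; for example, `split_embedding(text_features)["color"]` returns the same values as `text_features[:, :16]`.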

Quick Start

from transformers import CLIPModel, CLIPProcessor
import torch

# Load the GAP-CLIP weights; tokenization and preprocessing use the LAION CLIP ViT-B/32 processor
model = CLIPModel.from_pretrained("Leacb4/gap-clip")
processor = CLIPProcessor.from_pretrained("laion/CLIP-ViT-B-32-laion2B-s34B-b79K")

# Encode a text query (no gradients are needed for inference)
text = "red dress"
inputs = processor(text=[text], return_tensors="pt", padding=True)
with torch.no_grad():
    text_features = model.get_text_features(**inputs)  # shape: (1, 512)

# Extract the fixed-position subspaces
color_emb = text_features[:, :16]  # Color dimensions 0-15
hierarchy_emb = text_features[:, 16:80]  # Hierarchy dimensions 16-79
general_emb = text_features[:, 80:]  # General CLIP dimensions 80-511
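Since each attribute occupies a known slice, a query can be compared with candidates on a single attribute, such as color. The sketch below assumes that cosine similarity within one subspace is a meaningful per-attribute score; the model card does not specify a scoring rule. `subspace_similarity` is a hypothetical helper, and the toy vectors are synthetic so the example runs without downloading the model.

```python
import torch
import torch.nn.functional as F


def subspace_similarity(
    query: torch.Tensor, candidates: torch.Tensor, start: int, end: int
) -> torch.Tensor:
    """Cosine similarity between a query and N candidates on dims [start, end).

    query: shape (D,) or (1, D); candidates: shape (N, D). Returns shape (N,).
    Hypothetical helper: each vector is re-normalized within the slice, so
    values outside [start, end) have no effect on the score.
    """
    q = F.normalize(query.reshape(1, -1)[:, start:end], dim=-1)  # (1, k)
    c = F.normalize(candidates[:, start:end], dim=-1)  # (N, k)
    return (c @ q.T).squeeze(-1)  # (N,)


# Toy example with synthetic 512-d vectors (no model download needed)
query = torch.zeros(512)
query[0] = 1.0  # color subspace points along dim 0
query[100] = 5.0  # general subspace value, ignored by the color score

candidates = torch.zeros(2, 512)
candidates[0, 0] = 2.0  # same color direction, different magnitude
candidates[1, 1] = 1.0  # orthogonal color direction
candidates[1, 100] = 5.0  # matches the query outside the color subspace

color_scores = subspace_similarity(query, candidates, 0, 16)
print(color_scores)  # tensor([1., 0.])
```

With the Quick Start tensors, `subspace_similarity(text_features[0], candidate_features, 0, 16)` would score a hypothetical `(N, 512)` tensor `candidate_features` of embeddings from the same model by color alone. Normalizing inside the slice is the design choice that keeps large values in the hierarchy or general dimensions from leaking into the color score.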

Citation

@misc{gap-clip-2024,
  title={GAP-CLIP: Guaranteed Attribute Positioning in CLIP Embeddings for Fashion Search},
  author={Sarfati, Lea Attia},
  year={2024},
  url={https://huggingface.co/Leacb4/gap-clip}
}

License

MIT License. See the LICENSE file for details.