---
language: en
tags:
- fashion
- clip
- multimodal
- image-search
- text-search
- embeddings
- contrastive-learning
license: mit
datasets:
- custom
metrics:
- accuracy
- cosine-similarity
library_name: transformers
---

# GAP-CLIP: Guaranteed Attribute Positioning in CLIP Embeddings

This model is part of the GAP-CLIP project for fashion search with guaranteed attribute positioning.

## Model Description

GAP-CLIP is a multimodal search model for fashion that combines three fixed embedding subspaces:

- **Color embeddings** (16 dimensions): specialized for color representation
- **Hierarchy embeddings** (64 dimensions): specialized for category classification
- **General CLIP embeddings** (432 dimensions): general visual-semantic understanding

**Total embedding size**: 512 dimensions (16 + 64 + 432)

## Quick Start

```python
import torch
from transformers import CLIPProcessor, CLIPModel

# Load the GAP-CLIP weights and the matching base processor
model = CLIPModel.from_pretrained("Leacb4/gap-clip")
processor = CLIPProcessor.from_pretrained("laion/CLIP-ViT-B-32-laion2B-s34B-b79K")

# Encode a text query
text = "red dress"
inputs = processor(text=[text], return_tensors="pt", padding=True)
with torch.no_grad():
    text_features = model.get_text_features(**inputs)  # shape: (1, 512)

# Extract the attribute subspaces at their guaranteed positions
color_emb = text_features[:, :16]        # Color dimensions
hierarchy_emb = text_features[:, 16:80]  # Hierarchy dimensions
general_emb = text_features[:, 80:]      # General CLIP dimensions
```

For encoding images and ranking them against a text query, see the example sketches at the end of this card.

## Citation

```bibtex
@misc{gap-clip-2024,
  title={GAP-CLIP: Guaranteed Attribute Positioning in CLIP Embeddings for Fashion Search},
  author={Sarfati, Lea Attia},
  year={2024},
  url={https://huggingface.co/Leacb4/gap-clip}
}
```

## License

MIT License - see the LICENSE file for details.
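
## Example: Encoding Images (Sketch)

The Quick Start above encodes only text. The sketch below shows the image side, assuming GAP-CLIP exposes the standard `CLIPModel` image tower and that image embeddings use the same fixed subspace layout (16 color / 64 hierarchy / 432 general) as text embeddings; the file path `dress.jpg` is a placeholder.

```python
from PIL import Image
import torch
from transformers import CLIPProcessor, CLIPModel

model = CLIPModel.from_pretrained("Leacb4/gap-clip")
processor = CLIPProcessor.from_pretrained("laion/CLIP-ViT-B-32-laion2B-s34B-b79K")

# Placeholder path: substitute your own catalog image
image = Image.open("dress.jpg")
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    image_features = model.get_image_features(**inputs)  # shape: (1, 512)

# Assumption: image embeddings share the text embeddings' subspace layout
color_emb = image_features[:, :16]
hierarchy_emb = image_features[:, 16:80]
general_emb = image_features[:, 80:]
```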
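
## Example: Subspace-Weighted Retrieval (Sketch)

Because attribute positions are guaranteed, similarity can be computed per subspace and re-weighted per query. The `subspace_similarity` helper below is illustrative, not part of the released API, and the random tensors stand in for embeddings produced by the snippets above.

```python
import torch
import torch.nn.functional as F

# Fixed subspace positions from the model description
COLOR, HIERARCHY, GENERAL = slice(0, 16), slice(16, 80), slice(80, 512)

def subspace_similarity(query, gallery, w_color=1.0, w_hierarchy=1.0, w_general=1.0):
    """Weighted sum of per-subspace cosine similarities.

    query:   (1, 512) text embedding
    gallery: (N, 512) image embeddings
    Returns: (N,) scores, higher means more similar.
    """
    total = w_color + w_hierarchy + w_general
    return (
        w_color * F.cosine_similarity(query[:, COLOR], gallery[:, COLOR])
        + w_hierarchy * F.cosine_similarity(query[:, HIERARCHY], gallery[:, HIERARCHY])
        + w_general * F.cosine_similarity(query[:, GENERAL], gallery[:, GENERAL])
    ) / total

# Stand-ins for a "red dress" text embedding and 8 gallery image embeddings
query = torch.randn(1, 512)
gallery = torch.randn(8, 512)

# Doubling w_color biases the ranking toward color matches
scores = subspace_similarity(query, gallery, w_color=2.0)
print(scores.topk(k=3).indices)
```

Raising `w_color` shifts the ranking toward color agreement without re-encoding anything; that per-attribute control is the point of reserving fixed dimensions for each attribute.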