---
language: en
tags:
- fashion
- clip
- multimodal
- image-search
- text-search
- embeddings
- contrastive-learning
license: mit
datasets:
- custom
metrics:
- accuracy
- cosine-similarity
library_name: transformers
---

# GAP-CLIP: Guaranteed Attribute Positioning in CLIP Embeddings

This model is part of the GAP-CLIP project for fashion search with guaranteed attribute positioning.

## Model Description

GAP-CLIP is a multi-modal search model for fashion that combines:

- **Color embeddings** (16 dimensions): Specialized for color representation
- **Hierarchy embeddings** (64 dimensions): Specialized for category classification
- **General CLIP embeddings** (432 dimensions): General visual-semantic understanding

**Total embedding size**: 512 dimensions
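
As a rough illustration of this layout, the sketch below splits one 512-dimensional vector into the three subspaces. The `SUBSPACES` table and the `split_embedding` helper are hypothetical names, not part of the released code, and the index ranges assume the subspaces are stored contiguously in the order listed above (color, then hierarchy, then general).

```python
# Hypothetical layout table: assumes the subspaces are contiguous and
# ordered as color, hierarchy, general.
SUBSPACES = {
    "color": (0, 16),
    "hierarchy": (16, 80),
    "general": (80, 512),
}


def split_embedding(vector):
    """Split one 512-dimensional embedding (a sequence of floats) by subspace."""
    if len(vector) != 512:
        raise ValueError(f"expected 512 dimensions, got {len(vector)}")
    return {name: list(vector[start:end]) for name, (start, end) in SUBSPACES.items()}


# A zero vector stands in for a real embedding here.
parts = split_embedding([0.0] * 512)
sizes = {name: len(values) for name, values in parts.items()}
print(sizes)  # {'color': 16, 'hierarchy': 64, 'general': 432}
```

The three sizes add up to 512, matching the total embedding size stated above.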

## Quick Start

```python
from transformers import CLIPProcessor, CLIPModel
import torch

# Load model and processor
model = CLIPModel.from_pretrained("Leacb4/gap-clip")
processor = CLIPProcessor.from_pretrained("laion/CLIP-ViT-B-32-laion2B-s34B-b79K")
model.eval()  # inference mode

# Process text
text = "red dress"
inputs = processor(text=[text], return_tensors="pt", padding=True)
with torch.no_grad():  # no gradients needed for feature extraction
    text_features = model.get_text_features(**inputs)

# Extract subspaces
color_emb = text_features[:, :16]        # Color dimensions (indices 0-15)
hierarchy_emb = text_features[:, 16:80]  # Hierarchy dimensions (indices 16-79)
general_emb = text_features[:, 80:]      # General CLIP dimensions (indices 80-511)
```
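
Because each attribute occupies a fixed slice, two embeddings can be compared one subspace at a time, for instance to check color agreement independently of category. The following is a minimal sketch, assuming the embeddings have been converted to plain Python lists (for example with `text_features[0].tolist()`). `cosine` and `subspace_similarity` are hypothetical helpers, and the toy vectors are made up for illustration; they are not model outputs.

```python
import math

# Slice boundaries taken from the Quick Start example.
SLICES = {"color": slice(0, 16), "hierarchy": slice(16, 80), "general": slice(80, 512)}


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def subspace_similarity(emb_a, emb_b):
    """Return the cosine similarity of two embeddings within each subspace."""
    return {name: cosine(emb_a[s], emb_b[s]) for name, s in SLICES.items()}


# Toy vectors (not model outputs): identical color slice, opposite hierarchy slice.
emb_a = [1.0] * 16 + [1.0] * 64 + [0.5] * 432
emb_b = [1.0] * 16 + [-1.0] * 64 + [0.5] * 432
scores = subspace_similarity(emb_a, emb_b)
print(scores)
```

Weighting these per-subspace scores is one possible way to bias a search toward color or category; the choice of weights is left to the application.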

## Citation

```bibtex
@misc{gap-clip-2024,
  title={GAP-CLIP: Guaranteed Attribute Positioning in CLIP Embeddings for Fashion Search},
  author={Sarfati, Lea Attia},
  year={2024},
  url={https://huggingface.co/Leacb4/gap-clip}
}
```

## License

MIT License - See LICENSE file for details.