How to use bareethul/outfit-vibe-clip with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("zero-shot-image-classification", model="bareethul/outfit-vibe-clip")
pipe(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png",
    candidate_labels=["animals", "humans", "landscape"],
)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForZeroShotImageClassification

processor = AutoProcessor.from_pretrained("bareethul/outfit-vibe-clip")
model = AutoModelForZeroShotImageClassification.from_pretrained("bareethul/outfit-vibe-clip")
```
Model Card - CLIP Zero-Shot
Overview
The CLIP (ViT-B/32) model is used off-the-shelf for zero-shot vibe matching.
It maps user-entered movie-review text and outfit images into a shared embedding space and ranks outfits by cosine similarity (vibe alignment).
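The ranking step described above can be sketched as follows. This is a minimal illustration, not the demo app's actual code: the function names `cosine_similarity` and `rank_outfits` and the outfit ids are hypothetical, and the toy 3-D vectors stand in for the 512-D embeddings that CLIP would produce (in Transformers, for example via `CLIPModel.get_text_features` and `CLIPModel.get_image_features`).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_outfits(text_embedding, outfit_embeddings):
    """Return (outfit_id, score) pairs sorted from best to worst match.

    `outfit_embeddings` maps a hypothetical outfit id to its image embedding.
    """
    scores = {
        outfit_id: cosine_similarity(text_embedding, embedding)
        for outfit_id, embedding in outfit_embeddings.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy vectors standing in for real CLIP embeddings (assumed values).
review = [1.0, 0.0, 0.0]
outfits = {
    "outfit_a": [0.0, 1.0, 0.0],  # orthogonal to the review
    "outfit_b": [2.0, 0.0, 0.0],  # same direction, different length
    "outfit_c": [1.0, 1.0, 0.0],  # 45 degrees away from the review
}
ranking = rank_outfits(review, outfits)
print([outfit_id for outfit_id, _ in ranking])
```

Because cosine similarity ignores vector length, `outfit_b` scores as a perfect match even though its embedding is twice as long as the review's; only direction in the shared space matters.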
Model Details
| Field | Description |
|---|---|
| Developed by | Bareethul Kader & Nada Khan |
| Framework | Hugging Face Transformers |
| Base Model | openai/clip-vit-base-patch32 |
| Repository | bareethulk/Forma |
| License | MIT (OpenAI CLIP) |
Intended Use
Direct Use
- Zero-shot text–image matching for outfit recommendations.
- Core engine of the Gradio demo app.
Out-of-Scope Use
- Fine-grained classification of specific fashion styles; the model is not fine-tuned for this.
- Settings where biases inherited from large-scale web data could cause harm.
Dataset
Evaluated on nadakandrew/closet_multimodal_v1, which provides paired image–text inputs for vibe ranking.
Evaluation Setup
- Mode: Zero-shot classification + ranking
- Metric Space: Cosine similarity (512-D)
- Results:
  - Accuracy: 91%
  - Precision@5: 1.00
  - NDCG@5: 0.96
  - MRR: 0.95
Interpretation: Zero-shot CLIP outperforms the trained ResNet18 (48%) by a large margin, which suggests that pre-trained vision–language models are well suited to vibe alignment.
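For readers unfamiliar with the ranking metrics reported above, the sketch below shows one common way to compute Precision@k, NDCG@k, and MRR. The card does not state how its numbers were computed, so this is an assumption-laden illustration: relevance is treated as binary (1 = relevant, 0 = not), the function names are hypothetical, and the example rankings are made up rather than taken from the evaluation.

```python
import math


def precision_at_k(relevance, k):
    """Fraction of the top-k ranked items that are relevant."""
    return sum(relevance[:k]) / k


def ndcg_at_k(relevance, k):
    """Normalized discounted cumulative gain over the top-k items."""
    def dcg(rels):
        # Position 0 is rank 1, so the discount is log2(rank + 1).
        return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(rels))

    ideal = dcg(sorted(relevance, reverse=True)[:k])
    return dcg(relevance[:k]) / ideal if ideal > 0 else 0.0


def mean_reciprocal_rank(rankings):
    """Average of 1 / (rank of the first relevant item) across queries."""
    total = 0.0
    for relevance in rankings:
        for rank, rel in enumerate(relevance, start=1):
            if rel:
                total += 1.0 / rank
                break
    return total / len(rankings)


# Made-up relevance lists, one per query, in ranked order (assumed data).
example_rankings = [
    [1, 1, 0, 1, 0, 0],  # first relevant item at rank 1
    [0, 1, 1, 0, 0, 1],  # first relevant item at rank 2
]
print(precision_at_k(example_rankings[0], 5))
print(ndcg_at_k(example_rankings[0], 5))
print(mean_reciprocal_rank(example_rankings))
```

Here the ideal ordering for NDCG is taken from the relevance labels of the ranked list itself; other conventions normalize against every relevant item in the dataset, which can give lower scores.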
Limitations / Ethical Notes
- May reproduce biases from web data.
- Does not capture deep emotional context behind reviews.
- Research / educational use only.