---
emoji: 🛍️
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: "6.1.0"
app_file: app.py
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
https://drive.google.com/file/d/1_o4C3NW0L0udTcxgU3sNuyhXDIcE2-8u/view?usp=drivesdk

# CLIP Image Recommender (Stanford Online Products)

- This project implements an image-based recommendation system using pretrained CLIP embeddings.
- Users upload an image, which is converted into a vector representation by the same CLIP model applied to the dataset images.
- The system then retrieves the Top-3 most visually similar items from the dataset, based on similarity in a shared embedding space.
## Application Demo

The application is deployed as a Hugging Face Space using Gradio, providing a simple, interactive interface for image-based recommendations.
## Dataset

- Source: JamieSJS/stanford-online-products (Hugging Face)
- Modality: Images
- Working subset: 3,000 randomly sampled images
- The subset was selected to ensure computational efficiency while preserving visual diversity and reproducibility.
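The reproducible sampling step above can be sketched as follows. This is a hypothetical illustration: the seed value and the full-dataset size used here are assumptions, not values documented in the repo.

```python
import numpy as np

# Hypothetical sketch of reproducible subsampling. FULL_SIZE and SEED are
# placeholders -- the repo's actual seed and dataset size are not stated.
FULL_SIZE = 120_000  # stand-in for the full Stanford Online Products image count
SEED = 42            # assumed seed for reproducibility

rng = np.random.default_rng(SEED)
# Draw 3,000 distinct indices; sorting makes the artifact easier to inspect.
sampled = np.sort(rng.choice(FULL_SIZE, size=3000, replace=False))
np.save("sampled_indices_3000.npy", sampled)  # same artifact name as the repo
```

Saving the indices (rather than the images) keeps the artifact tiny while letting anyone rebuild the exact same subset from the source dataset.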
## Method

- Image embeddings were precomputed for a random subset of 3,000 product images using a pretrained CLIP image encoder.
- All embeddings were normalized to enable consistent similarity comparisons.
- At inference time, a user-provided image is embedded using the same CLIP model.
- The user image embedding is compared against the dataset embeddings, and the Top-3 most visually similar items are retrieved.
- Results are displayed through a user-friendly Gradio interface as an image gallery.
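The normalization and Top-3 retrieval steps above reduce to a dot product over unit vectors. A minimal sketch, using random vectors as stand-ins for the real CLIP embeddings (the actual app loads them from the precomputed parquet file):

```python
import numpy as np

# Random stand-ins for the precomputed CLIP image embeddings (3,000 x 512
# here; the real dimensionality depends on the CLIP variant used).
rng = np.random.default_rng(0)
dataset_emb = rng.standard_normal((3000, 512)).astype(np.float32)
# L2-normalize so a plain dot product equals cosine similarity.
dataset_emb /= np.linalg.norm(dataset_emb, axis=1, keepdims=True)

def top_k(query_emb, k=3):
    """Return indices of the k dataset items most similar to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    sims = dataset_emb @ q        # cosine similarities against all items
    return np.argsort(-sims)[:k]  # highest similarity first

query = rng.standard_normal(512).astype(np.float32)
top3 = top_k(query)
```

Because the dataset embeddings are normalized once at precompute time, each query costs a single matrix–vector product, which is fast enough at 3,000 items without any approximate-nearest-neighbor index.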
## Example Recommendation Output

- The retrieved items share dominant visual attributes such as color, texture, and overall appearance, demonstrating the effectiveness of CLIP embeddings for visual similarity.
## Hybrid Image & Text Search

- In addition to image-only search, the system supports hybrid queries combining both image and text inputs.
- CLIP embeds both modalities into a shared representation space, allowing visual and textual signals to be jointly considered during retrieval.
- This behavior highlights CLIP's strength in capturing appearance-based similarity rather than strict semantic categories.
- Such flexibility is particularly valuable in discovery-oriented recommendation systems, where visual style and inspiration matter more than exact category matching.
- When higher semantic precision is required, incorporating structured metadata (such as product category or attributes) could further refine the recommendations.
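One common way to combine the two modalities is a weighted blend of the normalized image and text embeddings. A sketch under that assumption; the 50/50 weight and the `hybrid_query` helper are illustrative choices, not necessarily what the app does:

```python
import numpy as np

def hybrid_query(image_emb, text_emb, text_weight=0.5):
    """Blend L2-normalized image and text embeddings into one query vector.

    text_weight=0.5 is an illustrative default, not the app's actual value.
    """
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_emb / np.linalg.norm(text_emb)
    q = (1.0 - text_weight) * img + text_weight * txt
    return q / np.linalg.norm(q)  # renormalize so retrieval stays cosine-based

# Toy vectors stand in for CLIP image/text encoder outputs.
rng = np.random.default_rng(1)
q = hybrid_query(rng.standard_normal(512), rng.standard_normal(512))
```

Because CLIP places both modalities in one space, the blended vector can be fed to the same Top-3 retrieval as an image-only query; raising `text_weight` shifts results toward the textual description.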
## Files in the Repo

- `app.py` – Gradio application code
- `clip_embeddings_3000.parquet` – Precomputed, normalized image embeddings
- `sampled_indices_3000.npy` – Indices of the sampled subset (for reproducibility)
## How to Use

1. Upload an image (and optionally provide a short text description).
2. The system returns the three most visually similar products from the dataset.