---
emoji: 🛍️
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: 6.1.0
app_file: app.py
pinned: false
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
https://drive.google.com/file/d/1_o4C3NW0L0udTcxgU3sNuyhXDIcE2-8u/view?usp=drivesdk
CLIP Image Recommender (Stanford Online Products) -
- This project implements an image-based recommendation system using pretrained CLIP embeddings.
- Users upload an image, which is encoded into an embedding with the same pretrained CLIP model used for the dataset images.
- The system then retrieves the Top-3 most visually similar items from the dataset based on similarity in a shared embedding space.
Application Demo -
The application is deployed as a Hugging Face Space using Gradio, providing a simple and interactive interface for image-based recommendations.
Dataset -
- Source: JamieSJS/stanford-online-products (Hugging Face)
- Modality: Images
- Working subset: 3,000 randomly sampled images
- The subset was selected to ensure computational efficiency while preserving visual diversity and reproducibility.
Method -
- Image embeddings were precomputed for a random subset of 3,000 product images using a pretrained CLIP image encoder.
- All embeddings were normalized to enable consistent similarity comparisons.
- At inference time, a user-provided image is embedded using the same CLIP model.
- The system compares the user image embedding to the dataset embeddings and retrieves the Top-3 most visually similar items.
- Results are displayed through a user-friendly Gradio interface as an image gallery.
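The retrieval step above can be sketched in a few lines of NumPy. Because both the query embedding and the dataset embeddings are L2-normalized, cosine similarity reduces to a dot product; the function and array names below are illustrative, not taken from app.py.

```python
import numpy as np

def top_k_similar(query_emb: np.ndarray, dataset_embs: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k most similar dataset items.

    Assumes the query embedding and every row of dataset_embs are
    L2-normalized, so the dot product equals cosine similarity.
    """
    sims = dataset_embs @ query_emb   # (N,) cosine similarities
    return np.argsort(-sims)[:k]      # top-k indices, most similar first

# Toy example with four unit vectors in 2-D:
embs = np.array([[1.0, 0.0], [0.0, 1.0],
                 [0.6, 0.8], [-1.0, 0.0]])
query = np.array([1.0, 0.0])
print(top_k_similar(query, embs))  # → [0 2 1]
```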
Example Recommendation Output -
- The retrieved items share dominant visual attributes such as color, texture, and overall appearance, demonstrating the effectiveness of CLIP embeddings for visual similarity.
Hybrid Image & Text Search -
- In addition to image-only search, the system supports hybrid queries combining both image and text inputs. CLIP embeds both modalities into a shared representation space, allowing visual and textual signals to be jointly considered during retrieval.
- This behavior highlights CLIP’s strength in capturing appearance-based similarity rather than strict semantic categories.
- Such flexibility is particularly valuable in discovery-oriented recommendation systems, where visual style and inspiration are more important than exact category matching.
- When higher semantic precision is required, incorporating structured metadata (such as product category or attributes) could further refine the recommendations.
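One simple way to realize such a hybrid query is to blend the normalized image and text embeddings and renormalize before retrieval. This is a sketch under that assumption, not necessarily how app.py combines the modalities; `alpha` is an illustrative mixing weight.

```python
import numpy as np

def hybrid_query(image_emb: np.ndarray, text_emb: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Blend L2-normalized image and text embeddings into one query vector.

    alpha weights the image signal, (1 - alpha) the text signal.
    The result is renormalized so dot-product retrieval still works.
    """
    q = alpha * image_emb + (1.0 - alpha) * text_emb
    return q / np.linalg.norm(q)

img = np.array([1.0, 0.0])
txt = np.array([0.0, 1.0])
print(np.round(hybrid_query(img, txt), 3))  # equal weighting → [0.707 0.707]
```

The blended vector can then be passed to the same Top-3 retrieval used for image-only queries.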
Files in the Repo -
- app.py – Gradio application code
- clip_embeddings_3000.parquet – Precomputed normalized image embeddings
- sampled_indices_3000.npy – Indices of the sampled subset (for reproducibility)
How to Use -
- Upload an image (and optionally provide a short text description).
- The system returns the three most visually similar products from the dataset.


