
---
emoji: 🛍️
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: 6.1.0
app_file: app.py
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

https://drive.google.com/file/d/1_o4C3NW0L0udTcxgU3sNuyhXDIcE2-8u/view?usp=drivesdk

CLIP Image Recommender (Stanford Online Products) -

  • This project implements an image-based recommendation system using pretrained CLIP embeddings.
  • Users upload an image, which is converted into a vector representation using the same CLIP model applied to the dataset images.
  • The system then retrieves the Top-3 most visually similar items from the dataset by comparing embeddings in this shared space.

Application Demo -

*(Screenshot: the Gradio interface of the deployed Space.)*

The application is deployed as a Hugging Face Space using Gradio, providing a simple and interactive interface for image-based recommendations.

Dataset -

  • Source: JamieSJS/stanford-online-products (Hugging Face)
  • Modality: Images
  • Working subset: 3,000 randomly sampled images
  • The subset was selected to ensure computational efficiency while preserving visual diversity and reproducibility.
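The reproducible sampling described above could be sketched as follows. The total dataset size (120,053 images in Stanford Online Products) and the seed value are illustrative assumptions, not values taken from this repo:

```python
import numpy as np

def sample_indices(dataset_size: int, n: int = 3000, seed: int = 42) -> np.ndarray:
    """Pick a reproducible random subset of dataset indices."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(dataset_size, size=n, replace=False)  # no duplicates
    idx.sort()
    return idx

# Assumed dataset size for Stanford Online Products; seed is illustrative.
indices = sample_indices(120_053)
np.save("sampled_indices_3000.npy", indices)  # same filename as in this repo
```

Fixing the seed is what makes the subset reproducible: anyone re-running the script recovers exactly the same 3,000 images.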

Method -

  • Image embeddings were precomputed for a random subset of 3,000 product images using a pretrained CLIP image encoder.
  • All embeddings were L2-normalized, so similarity can be computed as a simple dot product (cosine similarity).
  • At inference time, a user-provided image is embedded using the same CLIP model.
  • The system compares the user image embedding to the dataset embeddings and retrieves the Top-3 most visually similar items.
  • Results are displayed through a user-friendly Gradio interface as an image gallery.
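The retrieval step above can be sketched in plain NumPy. The function name and array shapes are illustrative assumptions; in the actual app the dataset embeddings come from `clip_embeddings_3000.parquet`:

```python
import numpy as np

def top_k_similar(query_emb: np.ndarray, dataset_embs: np.ndarray, k: int = 3):
    """Return the indices and scores of the k nearest dataset embeddings.

    Both inputs are assumed to come from the same CLIP image encoder; after
    L2 normalization, the dot product equals cosine similarity.
    """
    q = query_emb / np.linalg.norm(query_emb)
    d = dataset_embs / np.linalg.norm(dataset_embs, axis=1, keepdims=True)
    scores = d @ q                 # cosine similarity per dataset item
    top = np.argsort(-scores)[:k]  # best k, highest score first
    return top, scores[top]
```

Because the dataset embeddings are precomputed, only the single query embedding and one matrix-vector product are needed at inference time.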

Example Recommendation Output -

*(Screenshot: a query image and its Top-3 retrieved products.)*

  • The retrieved items share dominant visual attributes such as color, texture, and overall appearance, demonstrating the effectiveness of CLIP embeddings for visual similarity.

Hybrid Image & Text Search -

  • In addition to image-only search, the system supports hybrid queries combining both image and text inputs. CLIP embeds both modalities into a shared representation space, allowing visual and textual signals to be jointly considered during retrieval.

*(Screenshot: results for a hybrid image-and-text query.)*

  • This behavior highlights CLIP’s strength in capturing appearance-based similarity rather than strict semantic categories.
  • Such flexibility is particularly valuable in discovery-oriented recommendation systems, where visual style and inspiration are more important than exact category matching.
  • When higher semantic precision is required, incorporating structured metadata (such as product category or attributes) could further refine the recommendations.
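One common way to form such a hybrid query is a weighted average of the normalized image and text embeddings. The weighted-blend scheme and the `alpha` parameter below are assumptions about the approach, not necessarily the Space's exact implementation:

```python
import numpy as np

def hybrid_query(image_emb: np.ndarray, text_emb: np.ndarray,
                 alpha: float = 0.5) -> np.ndarray:
    """Blend image and text embeddings into a single query vector.

    alpha weights the image signal, (1 - alpha) the text signal. This
    weighted-average scheme is an assumption; the Space may combine the
    modalities differently.
    """
    i = image_emb / np.linalg.norm(image_emb)
    t = text_emb / np.linalg.norm(text_emb)
    q = alpha * i + (1 - alpha) * t
    return q / np.linalg.norm(q)  # renormalize so retrieval stays cosine-based
```

With `alpha = 1.0` the query reduces to pure image search; lowering `alpha` lets the text description pull results toward the described attributes.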

Files in the Repo -

  • app.py – Gradio application code
  • clip_embeddings_3000.parquet – Precomputed normalized image embeddings
  • sampled_indices_3000.npy – Indices of the sampled subset (for reproducibility)

How to Use -

  • Upload an image (and optionally provide a short text description).
  • The system returns the three most visually similar products from the dataset.