---
emoji: 🛍️
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: "6.1.0"
app_file: app.py
pinned: false
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
https://drive.google.com/file/d/1_o4C3NW0L0udTcxgU3sNuyhXDIcE2-8u/view?usp=drivesdk
# CLIP Image Recommender (Stanford Online Products)
- This project implements an image-based recommendation system using pretrained CLIP embeddings.
- Users upload an image, which is embedded into a vector with the same pretrained CLIP image encoder used for the dataset images.
- The system then retrieves the Top-3 most visually similar items from the dataset based on similarity in a shared embedding space.
## Application Demo
![image](https://cdn-uploads.huggingface.co/production/uploads/690cf480f5e17706452c5d7c/PiXE9v-Rc5OIv5NT9saGX.png)
The application is deployed as a Hugging Face Space using Gradio, providing a simple and interactive interface for image-based recommendations.
## Dataset
- Source: JamieSJS/stanford-online-products (Hugging Face)
- Modality: Images
- Working subset: 3,000 randomly sampled images
- The subset was selected to ensure computational efficiency while preserving visual diversity and reproducibility.
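A fixed-seed draw is a minimal sketch of how such a reproducible subset could be built; the seed value (42) is an assumption, not necessarily the one used for the repo's `sampled_indices_3000.npy`:

```python
import numpy as np

# Hypothetical reconstruction of the subset selection: draw 3,000 indices
# without replacement, with a fixed seed so the same sample can be rebuilt.
# The seed (42) is an assumption; 120,053 is the size of the full
# Stanford Online Products dataset.
rng = np.random.default_rng(42)
sampled = rng.choice(120_053, size=3_000, replace=False)
np.save("sampled_indices_3000.npy", sampled)  # matches the repo artifact name
```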
## Method
- Image embeddings were precomputed for a random subset of 3,000 product images using a pretrained CLIP image encoder.
- All embeddings were normalized to enable consistent similarity comparisons.
- At inference time, a user-provided image is embedded using the same CLIP model.
- The system compares the user image embedding to the dataset embeddings and retrieves the Top-3 most visually similar items.
- Results are displayed through a user-friendly Gradio interface as an image gallery.
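The retrieval step above can be sketched with plain NumPy, assuming the dataset embeddings are already L2-normalized (as described) so a dot product equals cosine similarity; the CLIP encoding itself is omitted here:

```python
import numpy as np

def top_k_similar(query_emb: np.ndarray, db_embs: np.ndarray, k: int = 3):
    """Return indices and scores of the k most similar dataset items.

    Assumes db_embs rows are L2-normalized; the query is normalized here.
    """
    query_emb = query_emb / np.linalg.norm(query_emb)
    scores = db_embs @ query_emb      # cosine similarity per dataset item
    top = np.argsort(-scores)[:k]     # indices of the k best matches
    return top, scores[top]

# Toy example: 5 random "dataset" vectors; query with vector 2 itself.
db = np.random.default_rng(0).normal(size=(5, 512))
db /= np.linalg.norm(db, axis=1, keepdims=True)
idx, sc = top_k_similar(db[2], db)
# idx[0] == 2: the identical vector is the best match, with score 1.0
```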
## Example Recommendation Output
![image](https://cdn-uploads.huggingface.co/production/uploads/690cf480f5e17706452c5d7c/eXQx-Y6Sp3JJeNFVvReUh.png)
- The retrieved items share dominant visual attributes such as color, texture, and overall appearance, demonstrating the effectiveness of CLIP embeddings for visual similarity.
## Hybrid Image & Text Search
- In addition to image-only search, the system supports hybrid queries combining both image and text inputs.
- CLIP embeds both modalities into a shared representation space, allowing visual and textual signals to be jointly considered during retrieval.
![image](https://cdn-uploads.huggingface.co/production/uploads/690cf480f5e17706452c5d7c/VXru_oqZNwsLBDzYM7kih.png)
- This behavior highlights CLIP’s strength in capturing appearance-based similarity rather than strict semantic categories.
- Such flexibility is particularly valuable in discovery-oriented recommendation systems, where visual style and inspiration are more important than exact category matching.
- When higher semantic precision is required, incorporating structured metadata (such as product category or attributes) could further refine the recommendations.
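One common way to combine the two modalities is a weighted average of the normalized embeddings, sketched below; the blending rule and the `alpha` weight are assumptions for illustration, not necessarily what `app.py` implements:

```python
import numpy as np

def hybrid_query(img_emb: np.ndarray, txt_emb: np.ndarray,
                 alpha: float = 0.5) -> np.ndarray:
    """Blend image and text embeddings into a single query vector.

    This works because CLIP places both modalities in one shared space;
    `alpha` (an assumed parameter) trades visual against textual influence.
    """
    img_emb = img_emb / np.linalg.norm(img_emb)
    txt_emb = txt_emb / np.linalg.norm(txt_emb)
    q = alpha * img_emb + (1 - alpha) * txt_emb
    return q / np.linalg.norm(q)   # renormalize so dot products stay cosine
```

With `alpha=1.0` the query reduces to pure image search; with `alpha=0.0`, pure text search.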
## Files in the Repo
- app.py – Gradio application code
- clip_embeddings_3000.parquet – Precomputed normalized image embeddings
- sampled_indices_3000.npy – Indices of the sampled subset (for reproducibility)
## How to Use
- Upload an image (and optionally provide a short text description).
- The system returns the three most visually similar products from the dataset.