	---
	title: MiniEmbed Product Matcher
	emoji: ""
	colorFrom: blue
	colorTo: indigo
	pinned: false
	license: mit
	library_name: generic
	tags:
	- embeddings
	- product-matching
	---

	# MiniEmbed: Product Matching Model

	This model shares the MiniEmbed architecture but is trained from scratch exclusively for high-accuracy product matching (entity resolution).

	It is designed to determine whether two product listings, which often differ in title, specifications, or formatting, refer to the exact same physical item.

	## Use Case
	E-commerce Product Matching & Entity Resolution

	This model is trained to solve the "Same Product, Different Description" problem in e-commerce:

	* Marketplace Aggregation: Unifying listings from Amazon, Walmart, and eBay into a single catalog.
	* Competitor Analysis: Matching your inventory against competitors to track pricing.
	* Data Cleaning: Removing duplicates in databases where titles vary slightly (e.g., "Nike Air Max" vs "Nike Men's Air Max Shoe").

	Example Challenges Handled:
	* Variations: "iPhone 14 128GB" vs "Apple iPhone 14 Midnight 128GB"
	* Missing Attributes: "Sony Headphones" vs "Sony WH-1000XM5 Noise Canceling Headphones"
	* Formatting Differences: "5-Pack T-Shirts" vs "T-Shirt (Pack of 5)"
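Deduplicating a catalog, as in the Data Cleaning case, usually comes down to linking every pair of listings whose score clears a threshold and then collecting the connected groups. The sketch below is hypothetical and not part of this repository: `cluster_duplicates`, `token_jaccard`, and the `0.5` threshold are illustrative. `token_jaccard` is a crude lexical stand-in for the model's score, used only so the example runs without downloading weights.

```python
import re
from itertools import combinations
from typing import Callable, Dict, List


def token_jaccard(a: str, b: str) -> float:
    """Crude lexical stand-in for model.similarity (illustration only)."""
    ta = set(re.findall(r"[a-z0-9]+", a.lower()))
    tb = set(re.findall(r"[a-z0-9]+", b.lower()))
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def cluster_duplicates(
    titles: List[str],
    similarity: Callable[[str, str], float],
    threshold: float = 0.5,  # hypothetical cutoff, not a tuned value
) -> List[List[str]]:
    """Group titles linked (transitively) by pairwise score >= threshold, via union-find."""
    parent = list(range(len(titles)))

    def find(i: int) -> int:
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All-pairs comparison: O(n^2) calls, fine for small catalogs only.
    for i, j in combinations(range(len(titles)), 2):
        if similarity(titles[i], titles[j]) >= threshold:
            parent[find(i)] = find(j)

    groups: Dict[int, List[str]] = {}
    for i, title in enumerate(titles):
        groups.setdefault(find(i), []).append(title)
    return list(groups.values())


listings = [
    "Nike Air Max",
    "Nike Air Max Shoe",
    "Sony WH-1000XM5 Headphones",
    "Sony WH-1000XM5 Noise Canceling Headphones",
]
for group in cluster_duplicates(listings, token_jaccard):
    print(group)
```

To use the learned score instead, pass `model.similarity` (see Usage) as the `similarity` argument; the threshold would then need to be chosen on labeled match/non-match pairs. Because the loop compares every pair, large catalogs usually need a blocking or nearest-neighbor step first.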

	## Interactive Demo

	This repository includes a Streamlit app to demonstrate the matching capability.

	To run locally:

	```bash
	pip install -r requirements.txt
	streamlit run demo.py
	```

	## Model Architecture

	* Type: Transformer Bi-Encoder (BERT-style)
	* Parameters: ~10.8M (Mini)
	* Dimensions: 256
	* Max Sequence Length: 128 tokens
	* Format: `SafeTensors` (Hugging Face ready)
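In a bi-encoder, each title is encoded on its own into a fixed-size vector (256 dimensions here), and the match score comes from comparing the two vectors. The NumPy sketch below illustrates only that final stage, under assumptions this card does not confirm: that token outputs are mean-pooled over non-padding positions and that the score is cosine similarity. Random arrays stand in for real encoder outputs, and `mean_pool` and `l2_normalize` are hypothetical helper names.

```python
import numpy as np

DIM = 256      # embedding size stated in the card
MAX_LEN = 128  # max sequence length stated in the card


def mean_pool(token_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors, ignoring padding (assumed pooling strategy)."""
    mask = attention_mask[..., None].astype(token_states.dtype)  # (batch, seq, 1)
    summed = (token_states * mask).sum(axis=1)                    # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                # (batch, 1)
    return summed / counts


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so a dot product equals cosine similarity."""
    return x / np.clip(np.linalg.norm(x, axis=-1, keepdims=True), 1e-12, None)


rng = np.random.default_rng(0)
# Stand-in for encoder output: 2 titles, padded to MAX_LEN tokens.
states = rng.standard_normal((2, MAX_LEN, DIM)).astype(np.float32)
mask = np.zeros((2, MAX_LEN), dtype=np.int64)
mask[0, :12] = 1  # first title: 12 real tokens
mask[1, :5] = 1   # second title: 5 real tokens

emb = l2_normalize(mean_pool(states, mask))  # (2, 256), unit-length rows
cosine = float(emb[0] @ emb[1])              # cosine similarity of the two titles
print(emb.shape, round(cosine, 4))
```

Because the two titles never attend to each other, every catalog listing can be embedded once and reused, and each comparison is then just a dot product. This is what makes bi-encoders practical for large-scale matching.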

	## Usage

	Since this is a custom model, you need to download the code and weights from the Hub:

	```python
	from huggingface_hub import snapshot_download
	import sys

	# 1. Download model (one-time)
	model_dir = snapshot_download("surazbhandari/miniembed-product")

	# 2. Add to path so we can import 'src'
	sys.path.insert(0, model_dir)

	# 3. Load Model
	from src.inference import EmbeddingInference
	model = EmbeddingInference.from_pretrained(model_dir)

	# Define two product titles
	product_a = "Sony WH-1000XM5 Wireless Noise Canceling Headphones, Black"
	product_b = "Sony WH1000XM5/B Headphones"

	# Calculate similarity (0 to 1)
	score = model.similarity(product_a, product_b)

	print(f"Similarity: {score:.4f}")
	```
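To match an inventory against a competitor catalog, as in the Competitor Analysis use case, each product can be paired with its highest-scoring listing, keeping only pairs above a cutoff. The sketch below is hypothetical: `best_matches`, `toy_similarity`, and the `0.8` threshold are illustrative, not tuned values for this model. `best_matches` accepts any `similarity(a, b)` callable, so `model.similarity` from the snippet above can be passed in; a toy scorer is used here so the example runs on its own.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple


def best_matches(
    inventory: List[str],
    competitor: List[str],
    similarity: Callable[[str, str], float],
    threshold: float = 0.8,  # hypothetical cutoff, not a tuned value
) -> Dict[str, Optional[Tuple[str, float]]]:
    """For each inventory title, return (best competitor title, score), or None if below threshold."""
    results: Dict[str, Optional[Tuple[str, float]]] = {}
    for item in inventory:
        scored = [(other, similarity(item, other)) for other in competitor]
        best = max(scored, key=lambda pair: pair[1], default=None)
        results[item] = best if best is not None and best[1] >= threshold else None
    return results


def toy_similarity(a: str, b: str) -> float:
    """Toy stand-in: share of a's alphanumeric tokens that also occur in b."""
    ta = set(re.findall(r"[a-z0-9]+", a.lower()))
    tb = set(re.findall(r"[a-z0-9]+", b.lower()))
    return len(ta & tb) / len(ta) if ta else 0.0


mine = ["Sony WH-1000XM5 Headphones", "Nike Air Max"]
theirs = ["Sony WH-1000XM5 Noise Canceling Headphones, Black", "Adidas Ultraboost"]
for item, match in best_matches(mine, theirs, toy_similarity).items():
    print(item, "->", match)
```

With the real model, this loop makes `len(inventory) * len(competitor)` calls to `model.similarity`. For large catalogs it is usually cheaper to embed each title once and compare vectors, if the inference class exposes an embedding method (this card does not document one).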

	## Automated Sync

	This repository is automatically synced to Hugging Face Spaces via GitHub Actions.


	## License

	MIT