MiniEmbed: Product Matching Model

This model uses the same MiniEmbed architecture, trained from scratch exclusively for high-accuracy product matching (entity resolution).

It is designed to determine if two product listings—often with different titles, specifications, or formatting—refer to the exact same physical item.

Use Case

E-commerce Product Matching & Entity Resolution

This model is trained to solve the "Same Product, Different Description" problem in e-commerce:

  • Marketplace Aggregation: Unifying listings from Amazon, Walmart, and eBay into a single catalog.
  • Competitor Analysis: Matching your inventory against competitors to track pricing.
  • Data Cleaning: Removing duplicates in databases where titles vary slightly (e.g., "Nike Air Max" vs "Nike Men's Air Max Shoe").

Example Challenges Handled:

  • Variations: "iPhone 14 128GB" vs "Apple iPhone 14 Midnight 128GB"
  • Missing Attributes: "Sony Headphones" vs "Sony WH-1000XM5 Noise Canceling Headphones"
  • Formatting Differences: "5-Pack T-Shirts" vs "T-Shirt (Pack of 5)"

Interactive Demo

This repository includes a Streamlit app to demonstrate the matching capability.

To run locally:

pip install -r requirements.txt
streamlit run demo.py

Model Architecture

  • Type: Transformer Bi-Encoder (BERT-style)
  • Parameters: ~10.8M (Mini)
  • Dimensions: 256
  • Max Sequence Length: 128 tokens
  • Format: SafeTensors (Hugging Face ready)

Usage

Since this is a custom model, you need to download the code and weights from the Hub:

from huggingface_hub import snapshot_download
import sys

# 1. Download model (one-time)
model_dir = snapshot_download("surazbhandari/miniembed-product")

# 2. Add to path so we can import 'src'
sys.path.insert(0, model_dir)

# 3. Load Model
from src.inference import EmbeddingInference
model = EmbeddingInference.from_pretrained(model_dir)

# Define two product titles
product_a = "Sony WH-1000XM5 Wireless Noise Canceling Headphones, Black"
product_b = "Sony WH1000XM5/B Headphones"

# Calculate similarity (0 to 1)
score = model.similarity(product_a, product_b)

print(f"Similarity: {score:.4f}")

Automated Sync

This repository is automatically synced to Hugging Face Spaces via GitHub Actions.

MIT

Downloads last month
52
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support