--- title: MiniEmbed Product Matcher emoji: "" colorFrom: blue colorTo: indigo pinned: false license: mit library_name: generic tags: - embeddings - product-matching --- # MiniEmbed: Product Matching Model This model uses the same **MiniEmbed** architecture, trained **from scratch** exclusively for **high-accuracy product matching** (entity resolution). It is designed to determine if two product listings—often with different titles, specifications, or formatting—refer to the **exact same physical item**. ## Use Case **E-commerce Product Matching & Entity Resolution** This model is trained to solve the "Same Product, Different Description" problem in e-commerce: * **Marketplace Aggregation**: Unifying listings from Amazon, Walmart, and eBay into a single catalog. * **Competitor Analysis**: Matching your inventory against competitors to track pricing. * **Data Cleaning**: Removing duplicates in databases where titles vary slightly (e.g., "Nike Air Max" vs "Nike Men's Air Max Shoe"). **Example Challenges Handled:** * **Variations**: "iPhone 14 128GB" vs "Apple iPhone 14 Midnight 128GB" * **Missing Attributes**: "Sony Headphones" vs "Sony WH-1000XM5 Noise Canceling Headphones" * **Formatting Differences**: "5-Pack T-Shirts" vs "T-Shirt (Pack of 5)" ## Interactive Demo This repository includes a **Streamlit** app to demonstrate the matching capability. To run locally: ```bash pip install -r requirements.txt streamlit run demo.py ``` ## Model Architecture * **Type**: Transformer Bi-Encoder (BERT-style) * **Parameters**: ~10.8M (Mini) * **Dimensions**: 256 * **Max Sequence Length**: 128 tokens * **Format**: `SafeTensors` (Hugging Face ready) ## Usage Since this is a custom model, you need to download the code and weights from the Hub: ```python from huggingface_hub import snapshot_download import sys # 1. Download model (one-time) model_dir = snapshot_download("surazbhandari/miniembed-product") # 2. Add to path so we can import 'src' sys.path.insert(0, model_dir) # 3. Load Model from src.inference import EmbeddingInference model = EmbeddingInference.from_pretrained(model_dir) # Define two product titles product_a = "Sony WH-1000XM5 Wireless Noise Canceling Headphones, Black" product_b = "Sony WH1000XM5/B Headphones" # Calculate similarity (0 to 1) score = model.similarity(product_a, product_b) print(f"Similarity: {score:.4f}") ``` ## Automated Sync This repository is automatically synced to Hugging Face Spaces via GitHub Actions. MIT