---
title: MiniEmbed Product Matcher
emoji: ""
colorFrom: blue
colorTo: indigo
pinned: false
license: mit
library_name: generic
tags:
- embeddings
- product-matching
---

# MiniEmbed: Product Matching Model

This model uses the same **MiniEmbed** architecture, but it is trained **from scratch** exclusively for **high-accuracy product matching** (entity resolution).

It is designed to determine whether two product listings, often with different titles, specifications, or formatting, refer to the **exact same physical item**.

## Use Case
**E-commerce Product Matching & Entity Resolution**

This model is trained to solve the "Same Product, Different Description" problem in e-commerce:

*   **Marketplace Aggregation**: Unifying listings from Amazon, Walmart, and eBay into a single catalog.
*   **Competitor Analysis**: Matching your inventory against competitors to track pricing.
*   **Data Cleaning**: Removing duplicates in databases where titles vary slightly (e.g., "Nike Air Max" vs "Nike Men's Air Max Shoe").

**Example Challenges Handled:**
*   **Variations**: "iPhone 14 128GB" vs "Apple iPhone 14 Midnight 128GB"
*   **Missing Attributes**: "Sony Headphones" vs "Sony WH-1000XM5 Noise Canceling Headphones"
*   **Formatting Differences**: "5-Pack T-Shirts" vs "T-Shirt (Pack of 5)"

## Interactive Demo

This repository includes a **Streamlit** app to demonstrate the matching capability.

To run locally:

```bash
pip install -r requirements.txt
streamlit run demo.py
```

## Model Architecture

*   **Type**: Transformer Bi-Encoder (BERT-style)
*   **Parameters**: ~10.8M (Mini)
*   **Dimensions**: 256
*   **Max Sequence Length**: 128 tokens
*   **Format**: `SafeTensors` (Hugging Face ready)
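
In a bi-encoder, each listing is encoded on its own into a fixed-size vector, and the two vectors are then compared. The sketch below illustrates that comparison step with cosine similarity over 256-dimensional vectors. It is an assumption that this model scores pairs with cosine similarity (the card does not say), and the vectors here are random stand-ins, not real model output.

```python
import math
import random

EMBED_DIM = 256  # matches the "Dimensions" figure listed above


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as having no similarity to anything
    return dot / (norm_a * norm_b)


# Random stand-ins for encoder output (hypothetical, for illustration only).
rng = random.Random(0)
vec_a = [rng.gauss(0, 1) for _ in range(EMBED_DIM)]
vec_b = [x + rng.gauss(0, 0.1) for x in vec_a]       # slightly perturbed copy of vec_a
vec_c = [rng.gauss(0, 1) for _ in range(EMBED_DIM)]  # unrelated vector

near_score = cosine_similarity(vec_a, vec_b)
far_score = cosine_similarity(vec_a, vec_c)
print(f"near: {near_score:.4f}, far: {far_score:.4f}")
```

Because each listing is encoded independently, embeddings for a catalog can be computed once and reused across many comparisons, which is the usual reason to choose a bi-encoder for large-scale matching.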

## Usage

Since this is a custom model, you need to download the code and weights from the Hub:

```python
from huggingface_hub import snapshot_download
import sys

# 1. Download model (one-time)
model_dir = snapshot_download("surazbhandari/miniembed-product")

# 2. Add to path so we can import 'src'
sys.path.insert(0, model_dir)

# 3. Load Model
from src.inference import EmbeddingInference
model = EmbeddingInference.from_pretrained(model_dir)

# Define two product titles
product_a = "Sony WH-1000XM5 Wireless Noise Canceling Headphones, Black"
product_b = "Sony WH1000XM5/B Headphones"

# Calculate similarity (0 to 1)
score = model.similarity(product_a, product_b)

print(f"Similarity: {score:.4f}")
```
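
For the data-cleaning use case, pairwise scores can be turned into duplicate groups by applying a threshold. The following is a minimal sketch, not part of this repository: `group_duplicates`, `token_overlap`, and the threshold value are hypothetical names and numbers chosen for illustration. `token_overlap` is a toy scorer that keeps the example self-contained; in practice a scoring function such as `model.similarity` from the snippet above could be passed in instead, with a threshold tuned on labeled pairs.

```python
from typing import Callable, List


def token_overlap(a: str, b: str) -> float:
    """Toy stand-in scorer: Jaccard overlap of lowercase tokens (NOT the model)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def group_duplicates(
    titles: List[str],
    score_fn: Callable[[str, str], float],
    threshold: float,
) -> List[List[str]]:
    """Greedily assign each title to the first group whose first member scores >= threshold."""
    groups: List[List[str]] = []
    for title in titles:
        for group in groups:
            if score_fn(group[0], title) >= threshold:
                group.append(title)
                break
        else:
            # No existing group matched, so this title starts a new group.
            groups.append([title])
    return groups


listings = [
    "Nike Air Max",
    "Nike Men's Air Max Shoe",
    "Sony WH-1000XM5 Headphones",
]

# The 0.5 threshold is arbitrary and suits only the toy scorer used here.
groups = group_duplicates(listings, token_overlap, threshold=0.5)
for group in groups:
    print(group)
```

Greedy grouping compares each title only against the first member of each group, so it is simple and order-dependent; it is meant to show the thresholding idea rather than a production clustering strategy.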

## Automated Sync

This repository is automatically synced to Hugging Face Spaces via GitHub Actions.


## License

MIT