MartinB77 committed on
Commit 7866e9f · verified · 1 Parent(s): 8f345ea

Upload README.md

Files changed (1)
  1. README.md +64 -0
README.md ADDED
@@ -0,0 +1,64 @@
# Amazon Product Classifier (Balanced B2)

This is a fine-tuned DistilBERT model for **multi-class classification** of product titles into Amazon-like product categories.
The model is based on `distilbert-base-uncased` and was trained on a **balanced subset** of the Amazon Products dataset.

## 🧠 Model Architecture

- Base: `distilbert-base-uncased` (6-layer, 768 hidden size)
- Classification Head: 2 dense layers with dropout + ReLU
- Output: softmax over 19 product categories

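To make the output layer concrete, here is a minimal pure-Python sketch of a softmax over 19 category scores. The logit values are invented for illustration and do not come from the model; only the category count is taken from the list above.

```python
import math

NUM_CATEGORIES = 19  # number of output classes stated above


def softmax(logits):
    """Convert raw classifier scores (logits) into probabilities that sum to 1."""
    # Subtract the maximum logit first for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits: one score per category, with index 3 clearly ahead.
logits = [0.0] * NUM_CATEGORIES
logits[3] = 4.0

probs = softmax(logits)
predicted_index = max(range(NUM_CATEGORIES), key=probs.__getitem__)
print(predicted_index, round(probs[predicted_index], 2))
```

The predicted category is the index with the highest probability, and that probability is what the pipeline reports as `score`.
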
## 📊 Training Data

The model was trained on a balanced subset (≈40k samples) of the [Amazon Products Dataset](https://www.kaggle.com/datasets/lokeshparab/amazon-products-dataset), which contains product titles and their corresponding categories.

Preprocessing included:
- Removing empty/missing titles
- Keeping top-level categories only
- Balancing the dataset to avoid category bias

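The three preprocessing steps could look roughly like the sketch below. This is not the script used to build the training set: the `(title, category)` row shape, the `" > "` level separator, the `per_category` cap, and the downsampling strategy are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def balance_titles(rows, per_category, seed=0):
    """rows: iterable of (title, category_path) pairs; returns a balanced list of pairs."""
    by_category = defaultdict(list)
    for title, category in rows:
        # Step 1: drop empty or missing titles.
        if not title or not title.strip():
            continue
        # Step 2: keep the top-level category only
        # (assumes " > " separates category levels, which is hypothetical).
        top_level = category.split(" > ")[0].strip().lower()
        by_category[top_level].append((title.strip(), top_level))

    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    balanced = []
    for top_level in sorted(by_category):
        items = by_category[top_level]
        # Step 3: downsample so no category contributes more than per_category rows.
        k = min(per_category, len(items))
        balanced.extend(rng.sample(items, k))
    return balanced


rows = [
    ("Steel pan", "Home & Kitchen > Cookware"),
    ("", "Appliances"),
    ("Kettle", "Home & Kitchen > Kettles"),
    ("Blender", "Home & Kitchen"),
    ("Fridge", "Appliances > Refrigerators"),
]
print(balance_titles(rows, per_category=2))
```

Downsampling caps large categories but does not enlarge small ones, so a category with fewer than `per_category` rows stays under-represented in this sketch.
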
## Example Categories

- beauty & health
- home & kitchen
- tv, audio & cameras
- computers & accessories
- clothing & accessories
- appliances
- sports & fitness
- grocery & gourmet foods
- ... (total 19)

## 🧪 Example Usage (Python)

```python
from transformers import pipeline

# Replace the placeholder below with the model's actual Hub repository ID.
classifier = pipeline("text-classification", model="your-username/product-classifier-model-B2")

result = classifier("Smartwatch with heart rate monitor and GPS tracking")
print(result)
# Example output: [{'label': 'stores', 'score': 0.94}]
```

## 🚀 Intended Use

The model is designed to help developers quickly classify product titles into e-commerce categories, useful for:

- Auto-tagging items in online stores
- Cleaning and organizing product catalogs
- Building recommendation engines (in combination with embeddings)

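As a rough sketch of the auto-tagging use case, the helper below attaches a predicted category to each title and flags low-confidence predictions for manual review. `tag_catalog`, `min_score`, and `fake_classifier` are hypothetical names introduced here. The stand-in classifier only mimics the list-of-dicts shape that a `transformers` text-classification pipeline returns for a list of strings, so the example runs without downloading a model.

```python
def tag_catalog(titles, classifier, min_score=0.5):
    """Attach a predicted category to each title, flagging low-confidence predictions."""
    # Called with a list of strings, a text-classification pipeline returns
    # one {'label': ..., 'score': ...} dict per input.
    predictions = classifier(titles)
    tagged = []
    for title, pred in zip(titles, predictions):
        tagged.append({
            "title": title,
            "category": pred["label"],
            "score": pred["score"],
            # min_score is a hypothetical cutoff, not a tuned value.
            "needs_review": pred["score"] < min_score,
        })
    return tagged


def fake_classifier(titles):
    """Stand-in for the real pipeline: short titles get a low, made-up score."""
    return [
        {"label": "appliances", "score": 0.9 if len(t.split()) > 2 else 0.3}
        for t in titles
    ]


for row in tag_catalog(["Electric kettle 1.5 litre", "Kettle"], fake_classifier):
    print(row)
```

In real use, pass the `classifier` object created by `pipeline(...)` in place of `fake_classifier`. A sensible `min_score` would need to be chosen on held-out data.
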
## 📌 Limitations

- English-only (fine-tuned from `distilbert-base-uncased`, which was pretrained on English text)
- May not perform well on very short or ambiguous product names
- Not suitable for legal/medical/financial applications

## 📄 License & Source

- Model: MIT License
- Training Data: [Amazon Products Dataset](https://www.kaggle.com/datasets/lokeshparab/amazon-products-dataset) on Kaggle
  (check license and attribution requirements on the Kaggle page)