MartinB77 committed on
Commit d217128 · verified · 1 Parent(s): 7866e9f

Update README.md

Files changed (1)
  1. README.md +83 -64
README.md CHANGED
@@ -1,64 +1,83 @@
- # 🏍️ Amazon Product Classifier (Balanced B2)
-
- This is a fine-tuned DistilBERT model for **multi-class classification** of product titles into Amazon-like product categories.
- The model is based on `distilbert-base-uncased` and was trained on a **balanced subset** of the Amazon Products dataset.
-
- ## 🧠 Model Architecture
-
- - Base: `distilbert-base-uncased` (6-layer, 768 hidden size)
- - Classification Head: 2 dense layers with dropout + ReLU
- - Output: softmax over 19 product categories
-
- ## 📊 Training Data
-
- The model was trained on a balanced subset (≈40k samples) of the [Amazon Products Dataset](https://www.kaggle.com/datasets/lokeshparab/amazon-products-dataset), which contains product titles and their corresponding categories.
-
- Preprocessing included:
- - Removing empty/missing titles
- - Keeping top-level categories only
- - Balancing the dataset to avoid category bias
-
- ## 🍿 Example Categories
-
- - beauty & health
- - home & kitchen
- - tv, audio & cameras
- - computers & accessories
- - clothing & accessories
- - appliances
- - sports & fitness
- - grocery & gourmet foods
- - ... (total 19)
-
- ## 🧪 Example Usage (Python)
-
- ```python
- from transformers import pipeline
-
- classifier = pipeline("text-classification", model="your-username/product-classifier-model-B2")
-
- result = classifier("Smartwatch with heart rate monitor and GPS tracking")
- print(result)
- # [{'label': 'stores', 'score': 0.94}]
- ```
-
- ## 🚀 Intended Use
-
- The model is designed to help developers quickly classify product titles into e-commerce categories, useful for:
-
- - Auto-tagging items in online stores
- - Cleaning and organizing product catalogs
- - Building recommendation engines (in combination with embeddings)
-
- ## 📌 Limitations
-
- - English-only (trained on `distilbert-base-uncased`)
- - May not perform well on very short or ambiguous product names
- - Not suitable for legal/medical/financial applications
-
- ## 📄 License & Source
-
- - Model: MIT License
- - Training Data: [Amazon Products Dataset](https://www.kaggle.com/datasets/lokeshparab/amazon-products-dataset) on Kaggle
- (check license and attribution requirements on Kaggle page)
-
+ ---
+ language: en
+ license: apache-2.0
+ tags:
+ - product-classification
+ - transformers
+ - pytorch
+ - distilbert
+ datasets:
+ - lokeshparab/amazon-products-dataset
+ model-index:
+ - name: Product Classifier B2
+ results: []
+ ---
+
+ # Product Classifier B2
+
+ This model predicts product categories from the product's name or description...
+
+ # 🏍️ Amazon Product Classifier (Balanced B2)
+
+ This is a fine-tuned DistilBERT model for **multi-class classification** of product titles into Amazon-like product categories.
+ The model is based on `distilbert-base-uncased` and was trained on a **balanced subset** of the Amazon Products dataset.
+
+ ## 🧠 Model Architecture
+
+ - Base: `distilbert-base-uncased` (6-layer, 768 hidden size)
+ - Classification Head: 2 dense layers with dropout + ReLU
+ - Output: softmax over 19 product categories
+
+ ## 📊 Training Data
+
+ The model was trained on a balanced subset (≈40k samples) of the [Amazon Products Dataset](https://www.kaggle.com/datasets/lokeshparab/amazon-products-dataset), which contains product titles and their corresponding categories.
+
+ Preprocessing included:
+ - Removing empty/missing titles
+ - Keeping top-level categories only
+ - Balancing the dataset to avoid category bias
+
+ ## 🍿 Example Categories
+
+ - beauty & health
+ - home & kitchen
+ - tv, audio & cameras
+ - computers & accessories
+ - clothing & accessories
+ - appliances
+ - sports & fitness
+ - grocery & gourmet foods
+ - ... (total 19)
+
+ ## 🧪 Example Usage (Python)
+
+ ```python
+ from transformers import pipeline
+
+ classifier = pipeline("text-classification", model="your-username/product-classifier-model-B2")
+
+ result = classifier("Smartwatch with heart rate monitor and GPS tracking")
+ print(result)
+ # [{'label': 'stores', 'score': 0.94}]
+ ```
+
+ ## 🚀 Intended Use
+
+ The model is designed to help developers quickly classify product titles into e-commerce categories, useful for:
+
+ - Auto-tagging items in online stores
+ - Cleaning and organizing product catalogs
+ - Building recommendation engines (in combination with embeddings)
+
+ ## 📌 Limitations
+
+ - English-only (trained on `distilbert-base-uncased`)
+ - May not perform well on very short or ambiguous product names
+ - Not suitable for legal/medical/financial applications
+
+ ## 📄 License & Source
+
+ - Model: MIT License
+ - Training Data: [Amazon Products Dataset](https://www.kaggle.com/datasets/lokeshparab/amazon-products-dataset) on Kaggle
+ (check license and attribution requirements on Kaggle page)
+
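
The README lists two preprocessing steps, removing empty/missing titles and balancing the dataset, without showing how they were done. The sketch below is one plausible way to do both using only the standard library. The function name `balance_by_category`, the `per_class` cap, the fixed seed, and the choice of random downsampling are all assumptions for illustration, not the author's actual pipeline. Keeping only top-level categories is left out because the dataset's column layout is not described here.

```python
import random
from collections import defaultdict


def balance_by_category(rows, per_class, seed=0):
    """Drop rows with empty titles, then cap each category at `per_class` rows.

    `rows` is an iterable of (title, category) pairs. Hypothetical helper:
    the README does not say how its balanced subset was actually built.
    """
    by_category = defaultdict(list)
    for title, category in rows:
        # Skip missing or whitespace-only titles.
        if title is None or not title.strip():
            continue
        by_category[category].append(title.strip())

    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    balanced = []
    for category in sorted(by_category):
        titles = by_category[category]
        # Sample without replacement; categories at or below the cap are kept whole.
        chosen = titles if len(titles) <= per_class else rng.sample(titles, per_class)
        balanced.extend((title, category) for title in chosen)
    return balanced


# Toy rows standing in for (title, category) pairs from the dataset.
rows = [
    ("Steel frying pan", "home & kitchen"),
    ("", "home & kitchen"),
    ("Ceramic mug set", "home & kitchen"),
    ("Glass storage jars", "home & kitchen"),
    ("Yoga mat", "sports & fitness"),
    (None, "sports & fitness"),
    ("Resistance bands", "sports & fitness"),
]
balanced = balance_by_category(rows, per_class=2)
```

Downsampling to a fixed cap is only one way to balance classes. Oversampling rare categories or weighting the loss are alternatives the README neither confirms nor rules out.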