---
language: en
license: apache-2.0
tags:
  - product-classification
  - transformers
  - pytorch
  - distilbert
datasets:
  - lokeshparab/amazon-products-dataset
model-index:
  - name: Product Classifier B2
    results: []
---

# Product Classifier B2

This model predicts product categories from a product's name or description...

# 🏍️ Amazon Product Classifier (Balanced B2)

This is a fine-tuned DistilBERT model for **multi-class classification** of product titles into Amazon-like product categories.  
The model is based on `distilbert-base-uncased` and was trained on a **balanced subset** of the Amazon Products dataset.

## 🧠 Model Architecture

- Base: `distilbert-base-uncased` (6-layer, 768 hidden size)
- Classification Head: 2 dense layers with dropout + ReLU
- Output: softmax over 19 product categories
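
To make the output layer concrete, here is a minimal pure-Python sketch of how a softmax turns 19 raw scores (logits) into a probability distribution and a predicted class index. The logits are made-up numbers, not outputs of this model, and `softmax` is a helper written only for illustration.

```python
import math

NUM_CATEGORIES = 19  # number of output classes stated above


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the max improves numerical stability without changing the result.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for illustration only: class 3 gets the highest raw score.
example_logits = [0.0] * NUM_CATEGORIES
example_logits[3] = 4.0

probs = softmax(example_logits)
predicted_index = max(range(NUM_CATEGORIES), key=lambda i: probs[i])
print(predicted_index, round(sum(probs), 6))
```

In practice the `transformers` pipeline performs this step internally and maps the winning index to a label name.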

## 📊 Training Data

The model was trained on a balanced subset (≈40k samples) of the [Amazon Products Dataset](https://www.kaggle.com/datasets/lokeshparab/amazon-products-dataset), which contains product titles and their corresponding categories.

Preprocessing included:
- Removing empty/missing titles
- Keeping top-level categories only
- Balancing the dataset to avoid category bias
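
The preprocessing steps above could be sketched roughly as follows. This is not the training script: the field names `title` and `category`, the `|`-separated category format, and downsampling every category to the size of the smallest one are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def balance_by_category(rows, seed=0):
    """Drop rows without a title, then downsample every category to the
    size of the smallest one so all categories are equally represented."""
    by_category = defaultdict(list)
    for row in rows:
        title = (row.get("title") or "").strip()
        if not title:
            continue  # remove empty/missing titles
        # Assumed format "top | sub | ..."; keep only the top-level category.
        top_level = row["category"].split("|")[0].strip().lower()
        by_category[top_level].append({"title": title, "category": top_level})

    if not by_category:
        return []

    target = min(len(items) for items in by_category.values())
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    balanced = []
    for items in by_category.values():
        balanced.extend(rng.sample(items, target))
    return balanced


# Hypothetical rows, not taken from the real dataset.
rows = [
    {"title": "Steel frying pan", "category": "home & kitchen | cookware"},
    {"title": "Ceramic mug", "category": "home & kitchen | drinkware"},
    {"title": "Blender 500W", "category": "home & kitchen | appliances"},
    {"title": "Yoga mat", "category": "sports & fitness | yoga"},
    {"title": "", "category": "sports & fitness | running"},
    {"title": None, "category": "appliances | fans"},
]
balanced = balance_by_category(rows)
print(len(balanced))
```

Downsampling is only one way to balance classes; oversampling rare categories or using class weights in the loss are common alternatives.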

## 🍿 Example Categories

- beauty & health  
- home & kitchen  
- tv, audio & cameras  
- computers & accessories  
- clothing & accessories  
- appliances  
- sports & fitness  
- grocery & gourmet foods  
- ... (total 19)

## 🧪 Example Usage (Python)

```python
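from transformers import pipeline

# Replace the placeholder repo ID with the actual Hub path of this model.
classifier = pipeline("text-classification", model="your-username/product-classifier-model-B2")

# The pipeline returns a list with one dict (top label and its score) per input.
result = classifier("Smartwatch with heart rate monitor and GPS tracking")
print(result)
# Example output format:
# [{'label': 'stores', 'score': 0.94}]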
```

## 🚀 Intended Use

The model is designed to help developers classify product titles into e-commerce categories. Typical uses include:

- Auto-tagging items in online stores
- Cleaning and organizing product catalogs
- Building recommendation engines (in combination with embeddings)

## 📌 Limitations

- English-only (fine-tuned from `distilbert-base-uncased`, an English-language model)
- May not perform well on very short or ambiguous product names
- Not suitable for legal/medical/financial applications
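
One way to soften the short or ambiguous title limitation is to reject low-confidence predictions instead of trusting the top label blindly. The sketch below assumes predictions shaped like the `text-classification` pipeline output (a list of dicts with `label` and `score`); the helper `pick_label`, the `0.5` threshold, and the `"uncertain"` fallback are hypothetical choices, not part of this model.

```python
def pick_label(predictions, min_score=0.5, fallback="uncertain"):
    """Return the top label, or `fallback` when the best score is too low."""
    if not predictions:
        return fallback
    best = max(predictions, key=lambda p: p["score"])
    return best["label"] if best["score"] >= min_score else fallback


# Hand-written stand-ins for classifier output; not real model predictions.
confident = [{"label": "appliances", "score": 0.91}]
ambiguous = [{"label": "home & kitchen", "score": 0.34}]

print(pick_label(confident))
print(pick_label(ambiguous))
```

A suitable threshold depends on the catalog, so it is worth tuning on a held-out set of labeled titles.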

## 📄 License & Source

- Model: MIT License  
- Training Data: [Amazon Products Dataset](https://www.kaggle.com/datasets/lokeshparab/amazon-products-dataset) on Kaggle  
  (check the license and attribution requirements on the Kaggle page)