---
language: en
license: apache-2.0
tags:
- product-classification
- transformers
- pytorch
- distilbert
datasets:
- lokeshparab/amazon-products-dataset
model-index:
- name: Product Classifier B2
  results: []
---
# Product Classifier B2
This model predicts product categories from a product's name or description...
# 🏍️ Amazon Product Classifier (Balanced B2)
This is a fine-tuned DistilBERT model for **multi-class classification** of product titles into Amazon-like product categories.
The model is based on `distilbert-base-uncased` and was trained on a **balanced subset** of the Amazon Products dataset.
## 🧠 Model Architecture
- Base: `distilbert-base-uncased` (6-layer, 768 hidden size)
- Classification Head: 2 dense layers with dropout + ReLU
- Output: softmax over 19 product categories
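To illustrate the output step, here is a minimal sketch of how a softmax turns 19 raw scores (logits) into class probabilities. It uses only the standard library, and the logit values are made up for illustration; it is not the model's actual forward pass.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the max keeps exp() numerically stable without changing the result.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a 19-way classifier (values are invented).
logits = [0.0] * 19
logits[3] = 4.0  # pretend class index 3 scored highest

probs = softmax(logits)
# The predicted class is the index with the highest probability.
predicted = max(range(len(probs)), key=probs.__getitem__)
print(predicted, round(probs[predicted], 3))
```

The highest-probability index is what the `text-classification` pipeline reports as `label`, with the probability as `score`.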
## 📊 Training Data
The model was trained on a balanced subset (≈40k samples) of the [Amazon Products Dataset](https://www.kaggle.com/datasets/lokeshparab/amazon-products-dataset), which contains product titles and their corresponding categories.
Preprocessing included:
- Removing empty/missing titles
- Keeping top-level categories only
- Balancing the dataset to avoid category bias
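The preprocessing steps above could look roughly like the sketch below. The model card does not say how the balancing was done, so random downsampling to the smallest category is an assumption here, as are the `title` and `category` field names and the sample rows. The `category` field is assumed to already hold the top-level category.

```python
import random
from collections import defaultdict

def clean_and_balance(rows, seed=0):
    """Drop rows without a title, then downsample each category to the smallest one."""
    by_category = defaultdict(list)
    for row in rows:
        title = (row.get("title") or "").strip()
        if not title:
            continue  # remove empty/missing titles
        by_category[row["category"]].append(
            {"title": title, "category": row["category"]}
        )

    if not by_category:
        return []

    # Assumption: balance by downsampling to the size of the rarest category.
    target = min(len(items) for items in by_category.values())
    rng = random.Random(seed)  # fixed seed for reproducible sampling
    balanced = []
    for category in sorted(by_category):
        balanced.extend(rng.sample(by_category[category], target))
    rng.shuffle(balanced)
    return balanced

# Hypothetical rows, invented for illustration.
rows = [
    {"title": "Steel kettle 1.5 L", "category": "home & kitchen"},
    {"title": "Non-stick frying pan", "category": "home & kitchen"},
    {"title": "  ", "category": "home & kitchen"},
    {"title": "Cotton T-shirt", "category": "clothing & accessories"},
    {"title": None, "category": "appliances"},
]
balanced = clean_and_balance(rows)
print(len(balanced))
```

Downsampling discards data from large categories; oversampling or class weights are alternatives if the rarest category is very small.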
## 🍿 Example Categories
- beauty & health
- home & kitchen
- tv, audio & cameras
- computers & accessories
- clothing & accessories
- appliances
- sports & fitness
- grocery & gourmet foods
- ... (total 19)
## 🧪 Example Usage (Python)
```python
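from transformers import pipeline

# "your-username/..." is a placeholder; replace it with the actual model repo ID.
classifier = pipeline("text-classification", model="your-username/product-classifier-model-B2")

# A single input string returns a list with one {'label', 'score'} dict.
result = classifier("Smartwatch with heart rate monitor and GPS tracking")
print(result)
# Example output: [{'label': 'stores', 'score': 0.94}]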
```
## 🚀 Intended Use
The model is designed to help developers quickly classify product titles into e-commerce categories, useful for:
- Auto-tagging items in online stores
- Cleaning and organizing product catalogs
- Building recommendation engines (in combination with embeddings)
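For auto-tagging, one possible pattern is to accept a predicted label only when its score clears a threshold and leave the rest for manual review. The sketch below assumes pipeline-style output (`[{'label': ..., 'score': ...}]`). The `auto_tag` helper, the `0.8` threshold, and the stand-in classifier are all hypothetical, chosen so the example runs offline.

```python
def auto_tag(titles, classify, min_score=0.8):
    """Return (title, label) pairs; label is None when the score is below min_score."""
    tagged = []
    for title in titles:
        top = classify(title)[0]  # pipeline-style output: [{'label': ..., 'score': ...}]
        label = top["label"] if top["score"] >= min_score else None
        tagged.append((title, label))
    return tagged

# Stand-in for a real classifier; labels and scores are invented.
def fake_classify(title):
    if "kettle" in title.lower():
        return [{"label": "home & kitchen", "score": 0.93}]
    return [{"label": "appliances", "score": 0.41}]

tags = auto_tag(["Steel kettle 1.5 L", "Set"], fake_classify)
print(tags)
```

In real use, `classify` would be the `pipeline(...)` object from the usage example, and the threshold would need tuning on held-out data.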
## 📌 Limitations
- English-only (fine-tuned from `distilbert-base-uncased`, an English-language model)
- May not perform well on very short or ambiguous product names
- Not suitable for legal/medical/financial applications
## 📄 License & Source
- Model: Apache 2.0 (as declared by `license: apache-2.0` in the model card metadata)
- Training Data: [Amazon Products Dataset](https://www.kaggle.com/datasets/lokeshparab/amazon-products-dataset) on Kaggle
(check license and attribution requirements on Kaggle page)