---
language: en
license: apache-2.0
tags:
- product-classification
- transformers
- pytorch
- distilbert
datasets:
- lokeshparab/amazon-products-dataset
model-index:
- name: Product Classifier B2
  results: []
---

# Product Classifier B2

This model predicts product categories based on a product's name or description...

# 🏍️ Amazon Product Classifier (Balanced B2)

This is a fine-tuned DistilBERT model for **multi-class classification** of product titles into Amazon-like product categories. The model is based on `distilbert-base-uncased` and was trained on a **balanced subset** of the Amazon Products dataset.

## 🧠 Model Architecture

- Base: `distilbert-base-uncased` (6 layers, hidden size 768)
- Classification head: 2 dense layers with dropout + ReLU
- Output: softmax over 19 product categories
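
To show how these pieces fit together, here is a minimal NumPy sketch of a forward pass through the classification head. It assumes the head follows the layout of the standard `DistilBertForSequenceClassification` head (a 768→768 dense layer, ReLU, dropout, then a 768→19 output layer applied to the `[CLS]` vector). The random weights and the names `head_forward` and `softmax` are hypothetical stand-ins, not the trained parameters or the library's code.

```python
import numpy as np

HIDDEN = 768      # DistilBERT hidden size
NUM_LABELS = 19   # number of product categories

rng = np.random.default_rng(0)

# Hypothetical random weights standing in for the trained head parameters.
W1 = rng.normal(0.0, 0.02, size=(HIDDEN, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0.0, 0.02, size=(HIDDEN, NUM_LABELS))
b2 = np.zeros(NUM_LABELS)


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def head_forward(cls_vector: np.ndarray, dropout_p: float = 0.0, train: bool = False) -> np.ndarray:
    """Map [CLS] embeddings of shape (batch, 768) to category probabilities (batch, 19)."""
    h = cls_vector @ W1 + b1           # first dense layer
    h = np.maximum(h, 0.0)             # ReLU
    if train and dropout_p > 0.0:      # inverted dropout, applied only during training
        mask = rng.random(h.shape) >= dropout_p
        h = h * mask / (1.0 - dropout_p)
    logits = h @ W2 + b2               # second dense layer: one logit per category
    return softmax(logits)


# Usage: a batch of two fake [CLS] vectors.
probs = head_forward(rng.normal(size=(2, HIDDEN)))
print(probs.shape)          # (2, 19)
print(probs.sum(axis=1))    # each row sums to 1, up to floating-point error
```

At inference time dropout is disabled, so the predicted category is the arg-max of each row. In PyTorch the same structure would be two `nn.Linear` layers with `nn.ReLU` and `nn.Dropout` between them.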

## 📊 Training Data

The model was trained on a balanced subset (≈40k samples) of the [Amazon Products Dataset](https://www.kaggle.com/datasets/lokeshparab/amazon-products-dataset), which contains product titles and their corresponding categories.

Preprocessing included:

- Removing empty/missing titles
- Keeping top-level categories only
- Balancing the dataset to avoid category bias
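
The three preprocessing steps can be sketched in plain Python as below. This is a hypothetical reconstruction, not the author's script: it assumes each record is a dict with a `name` (title) field and a `main_category` (top-level category) field, and since the exact balancing strategy is not documented, it assumes random downsampling of every category to the size of the smallest one, with an optional cap. With ≈40k samples over 19 categories, an exact balance works out to roughly 2,100 titles per category.

```python
import random
from collections import Counter, defaultdict


def _is_text(value) -> bool:
    """True for non-empty strings (after stripping whitespace)."""
    return isinstance(value, str) and value.strip() != ""


def preprocess(records, per_class=None, seed=42):
    """Return a shuffled list of (title, top_level_category) pairs with equal class sizes.

    Assumes each record is a dict with hypothetical keys "name" (product title)
    and "main_category" (top-level category); any sub-category field is ignored.
    """
    # 1. Remove records with an empty or missing title (or label).
    cleaned = [r for r in records if _is_text(r.get("name")) and _is_text(r.get("main_category"))]

    # 2. Keep only the top-level category as the label.
    by_label = defaultdict(list)
    for r in cleaned:
        by_label[r["main_category"].strip()].append(r["name"].strip())
    if not by_label:
        return []

    # 3. Balance: downsample every category to the smallest class size (or the cap).
    target = min(len(titles) for titles in by_label.values())
    if per_class is not None:
        target = min(target, per_class)

    rng = random.Random(seed)
    balanced = [
        (title, label)
        for label, titles in sorted(by_label.items())
        for title in rng.sample(titles, target)
    ]
    rng.shuffle(balanced)
    return balanced


# Usage with a few made-up rows.
rows = [
    {"name": "Stainless steel frying pan", "main_category": "home & kitchen", "sub_category": "kitchen & dining"},
    {"name": "Non-stick saucepan", "main_category": "home & kitchen"},
    {"name": "Vitamin C face serum", "main_category": "beauty & health"},
    {"name": "", "main_category": "beauty & health"},   # dropped: empty title
    {"name": None, "main_category": "appliances"},      # dropped: missing title
]
data = preprocess(rows)
print(Counter(label for _, label in data))  # one title per category: the smallest class has one title
```

Downsampling discards titles from the larger categories; upsampling the smaller ones is a common alternative when data is scarce.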

## 🍿 Example Categories

- beauty & health
- home & kitchen
- tv, audio & cameras
- computers & accessories
- clothing & accessories
- appliances
- sports & fitness
- grocery & gourmet foods
- ... (total 19)

## 🧪 Example Usage (Python)

```python
from transformers import pipeline

# Replace the placeholder with the actual Hub repository ID of this model.
classifier = pipeline("text-classification", model="your-username/product-classifier-model-B2")

result = classifier("Smartwatch with heart rate monitor and GPS tracking")
print(result)
# [{'label': 'stores', 'score': 0.94}]
```

## 🚀 Intended Use

The model is designed to help developers quickly classify product titles into e-commerce categories. Typical uses include:

- Auto-tagging items in online stores
- Cleaning and organizing product catalogs
- Building recommendation engines (in combination with embeddings)
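
For auto-tagging, one common pattern is to accept the top prediction only when the model is confident and to route everything else to manual review. The sketch below post-processes predictions in the `{'label': ..., 'score': ...}` shape shown in the usage example, assuming one dict per title in the same order (the shape the pipeline returns for a list of inputs with default settings). The function name `auto_tag`, the 0.8 threshold, and the `"needs-review"` tag are illustrative assumptions, and the scores in the usage example are made up.

```python
from typing import Dict, List, Tuple

REVIEW_LABEL = "needs-review"  # hypothetical tag for low-confidence items


def auto_tag(
    titles: List[str],
    predictions: List[Dict[str, object]],
    threshold: float = 0.8,
) -> List[Tuple[str, str]]:
    """Pair each title with its predicted category, or REVIEW_LABEL if the score is too low.

    `predictions` is assumed to hold one {"label": ..., "score": ...} dict per title,
    in the same order as `titles`.
    """
    if len(titles) != len(predictions):
        raise ValueError("titles and predictions must have the same length")
    tagged = []
    for title, pred in zip(titles, predictions):
        label = pred["label"] if pred["score"] >= threshold else REVIEW_LABEL
        tagged.append((title, label))
    return tagged


# Usage with hand-written predictions (in practice: predictions = classifier(titles)).
titles = ["Stainless steel frying pan", "Gift set"]
predictions = [
    {"label": "home & kitchen", "score": 0.97},
    {"label": "beauty & health", "score": 0.41},
]
for title, tag in auto_tag(titles, predictions):
    print(f"{title!r} -> {tag}")
# 'Stainless steel frying pan' -> home & kitchen
# 'Gift set' -> needs-review
```

A threshold like this also offers a simple guard against the short or ambiguous titles listed under Limitations, at the cost of sending more items to review; a suitable value depends on the catalog.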

## 📌 Limitations

- English-only (the base model, `distilbert-base-uncased`, was pretrained on English text)
- May not perform well on very short or ambiguous product names
- Not suitable for legal/medical/financial applications

## 📄 License & Source

- Model: MIT License
- Training data: [Amazon Products Dataset](https://www.kaggle.com/datasets/lokeshparab/amazon-products-dataset) on Kaggle (check the license and attribution requirements on the Kaggle page)