metadata
language: en
license: apache-2.0
tags:
- product-classification
- transformers
- pytorch
- distilbert
datasets:
- lokeshparab/amazon-products-dataset
model-index:
- name: Product Classifier B2
results: []
Product Classifier B2
This model predicts product categories based on the product's name or description...
🏍️ Amazon Product Classifier (Balanced B2)
This is a fine-tuned DistilBERT model for multi-class classification of product titles into Amazon-like product categories.
The model is based on distilbert-base-uncased and was trained on a balanced subset of the Amazon Products dataset.
🧠 Model Architecture
- Base: distilbert-base-uncased (6-layer, 768 hidden size)
- Classification Head: 2 dense layers with dropout + ReLU
- Output: softmax over 19 product categories
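To illustrate the output stage described above, here is a minimal sketch of how a vector of 19 raw scores (logits) turns into category probabilities. It uses only the standard library; the logit values and the chosen index are made up for illustration and do not come from the actual model.

```python
import math

NUM_CATEGORIES = 19  # number of output classes stated in this model card


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing on large logits.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits: every category scores 0.0 except index 3.
logits = [0.0] * NUM_CATEGORIES
logits[3] = 4.0

probs = softmax(logits)
# The predicted category is the index with the highest probability.
predicted_index = max(range(NUM_CATEGORIES), key=lambda i: probs[i])
print(predicted_index, round(probs[predicted_index], 3))
```

The score returned alongside a predicted label is presumably the largest of these probabilities, so it should be read as relative confidence among the 19 categories rather than as a calibrated accuracy estimate.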
📊 Training Data
The model was trained on a balanced subset (≈40k samples) of the Amazon Products Dataset, which contains product titles and their corresponding categories.
Preprocessing included:
- Removing empty/missing titles
- Keeping top-level categories only
- Balancing the dataset to avoid category bias
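The card does not state how the balancing was done. One common approach is to downsample every category to the size of the smallest one, and the sketch below assumes that approach. The function name `balance_by_category`, the `(title, category)` row layout, and the sample rows are all hypothetical.

```python
import random
from collections import defaultdict


def balance_by_category(rows, seed=0):
    """Drop rows without a title, then downsample each category
    to the size of the smallest remaining category."""
    by_category = defaultdict(list)
    for title, category in rows:
        # Skip empty, whitespace-only, or missing (None) titles.
        if title and title.strip():
            by_category[category].append((title, category))

    # Every category is cut down to the smallest category's size.
    target = min(len(items) for items in by_category.values())

    rng = random.Random(seed)  # fixed seed for a reproducible subset
    balanced = []
    for category in sorted(by_category):
        balanced.extend(rng.sample(by_category[category], target))
    return balanced


# Hypothetical rows: (title, top-level category)
rows = [
    ("Stainless steel frying pan", "home & kitchen"),
    ("Non-stick saucepan set", "home & kitchen"),
    ("", "home & kitchen"),
    ("Ceramic dinner plates", "home & kitchen"),
    ("Vitamin C face serum", "beauty & health"),
    (None, "beauty & health"),
    ("Aloe vera gel", "beauty & health"),
]

balanced = balance_by_category(rows)
print(len(balanced))
```

Downsampling discards data from the larger categories. Oversampling or class-weighted loss are alternatives when the smallest category is very small.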
🍿 Example Categories
- beauty & health
- home & kitchen
- tv, audio & cameras
- computers & accessories
- clothing & accessories
- appliances
- sports & fitness
- grocery & gourmet foods
- ... (total 19)
🧪 Example Usage (Python)
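from transformers import pipeline

# Replace the placeholder with the model's actual repository ID on the Hub.
classifier = pipeline("text-classification", model="your-username/product-classifier-model-B2")

# The pipeline returns a list with one dict (top label and its score) per input text.
result = classifier("Smartwatch with heart rate monitor and GPS tracking")
print(result)
# [{'label': 'stores', 'score': 0.94}]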
🚀 Intended Use
The model is designed to help developers quickly classify product titles into e-commerce categories, useful for:
- Auto-tagging items in online stores
- Cleaning and organizing product catalogs
- Building recommendation engines (in combination with embeddings)
📌 Limitations
- English-only (built on distilbert-base-uncased)
- May not perform well on very short or ambiguous product names
- Not suitable for legal/medical/financial applications
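Because short or ambiguous titles may be misclassified, one possible mitigation is to accept a prediction only when its score clears a threshold and to route everything else to manual review. The sketch below works on the list-of-dicts format shown in the usage example. The helper name `label_or_fallback`, the 0.5 threshold, the `"needs review"` fallback, and the second sample prediction are assumptions chosen for illustration, not values tuned for this model.

```python
def label_or_fallback(predictions, min_score=0.5, fallback="needs review"):
    """Return the top label if its score reaches min_score, else the fallback.

    `predictions` is a list of {'label': str, 'score': float} dicts,
    the shape produced by a text-classification pipeline for one input.
    """
    top = max(predictions, key=lambda p: p["score"])
    return top["label"] if top["score"] >= min_score else fallback


# Sample outputs (the second one is invented to show the fallback path).
confident = label_or_fallback([{"label": "stores", "score": 0.94}])
uncertain = label_or_fallback([{"label": "appliances", "score": 0.31}])

print(confident)
print(uncertain)
```

A suitable threshold depends on the catalog and should be chosen by checking scores against a labelled validation sample.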
📄 License & Source
- Model: MIT License
- Training Data: Amazon Products Dataset on Kaggle
(check license and attribution requirements on Kaggle page)