metadata
language: en
license: apache-2.0
tags:
  - product-classification
  - transformers
  - pytorch
  - distilbert
datasets:
  - lokeshparab/amazon-products-dataset
model-index:
  - name: Product Classifier B2
    results: []

Product Classifier B2

This model predicts product categories from a product's name or description...

🏍️ Amazon Product Classifier (Balanced B2)

This is a fine-tuned DistilBERT model for multi-class classification of product titles into Amazon-like product categories.
The model is based on distilbert-base-uncased and was trained on a balanced subset of the Amazon Products dataset.

🧠 Model Architecture

  • Base: distilbert-base-uncased (6 transformer layers, hidden size 768)
  • Classification Head: 2 dense layers with dropout + ReLU
  • Output: softmax over 19 product categories
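To make the output layer concrete, here is a minimal, dependency-free sketch of how a softmax turns 19 raw scores (logits) into category probabilities. The `softmax` helper and the logit values are illustrative assumptions, not code or numbers taken from the model itself.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the max improves numerical stability without changing the result.
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for the 19 categories (illustrative values only).
logits = [0.0] * 19
logits[3] = 4.0  # pretend the classification head strongly favors index 3

probs = softmax(logits)
# The predicted category is the index with the highest probability.
predicted_index = max(range(len(probs)), key=probs.__getitem__)
```

The `score` field returned by the pipeline corresponds to the probability of the top category after this step.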

📊 Training Data

The model was trained on a balanced subset (≈40k samples) of the Amazon Products Dataset, which contains product titles and their corresponding categories.

Preprocessing included:

  • Removing empty/missing titles
  • Keeping top-level categories only
  • Balancing the dataset to avoid category bias
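The preprocessing steps above could be sketched roughly as follows. The card does not say how the balancing was done, so this example assumes simple downsampling of every category to the same size; the function name `balance_by_category`, its parameters, and the `(title, category)` row layout are hypothetical, and the category is assumed to already be the top-level one.

```python
import random
from collections import defaultdict

def balance_by_category(rows, per_category=None, seed=0):
    """Drop rows with empty titles, then downsample each category to one size.

    rows: iterable of (title, category) pairs.
    per_category: target count per category; defaults to the smallest category.
    """
    buckets = defaultdict(list)
    for title, category in rows:
        # Skip missing or whitespace-only titles.
        if title and title.strip():
            buckets[category].append((title.strip(), category))

    if not buckets:
        return []

    smallest = min(len(items) for items in buckets.values())
    target = smallest if per_category is None else min(per_category, smallest)

    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    balanced = []
    for category in sorted(buckets):
        balanced.extend(rng.sample(buckets[category], target))
    rng.shuffle(balanced)
    return balanced
```

Downsampling is only one option; oversampling rare categories or using class weights during training would address the same bias in a different way.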

🍿 Example Categories

  • beauty & health
  • home & kitchen
  • tv, audio & cameras
  • computers & accessories
  • clothing & accessories
  • appliances
  • sports & fitness
  • grocery & gourmet foods
  • ... (total 19)

🧪 Example Usage (Python)

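from transformers import pipeline

# Replace the placeholder model ID with the actual Hub repository of this model.
classifier = pipeline("text-classification", model="your-username/product-classifier-model-B2")

# A single input string returns a list containing one {'label', 'score'} dict.
result = classifier("Smartwatch with heart rate monitor and GPS tracking")
print(result)
# Example output:
# [{'label': 'stores', 'score': 0.94}]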

🚀 Intended Use

The model helps developers classify product titles into e-commerce categories. Typical uses include:

  • Auto-tagging items in online stores
  • Cleaning and organizing product catalogs
  • Building recommendation engines (in combination with embeddings)
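As a rough illustration of the auto-tagging and catalog-cleaning use cases, the sketch below groups titles by their predicted category. It assumes a `classify` callable that takes a list of strings and returns one `{'label': ..., 'score': ...}` dict per input, which is how the text-classification pipeline behaves with its default settings; `group_by_predicted_category` and `fake_classify` are hypothetical names, and the stand-in classifier exists only so the example runs without downloading a model.

```python
from collections import defaultdict

def group_by_predicted_category(titles, classify):
    """Group product titles under the top label returned by `classify`."""
    predictions = classify(titles)
    grouped = defaultdict(list)
    for title, pred in zip(titles, predictions):
        grouped[pred["label"]].append((title, pred["score"]))
    return dict(grouped)

# Stand-in classifier (not the real model): a keyword rule with a fixed score.
def fake_classify(titles):
    return [
        {"label": "appliances" if "fridge" in t.lower() else "home & kitchen",
         "score": 0.9}
        for t in titles
    ]

catalog = ["Double-door fridge 300L", "Non-stick frying pan"]
tags = group_by_predicted_category(catalog, fake_classify)
```

In real use, the pipeline object from the usage example would be passed in place of `fake_classify`.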

📌 Limitations

  • English-only (fine-tuned from distilbert-base-uncased, which was pretrained on lowercased English text)
  • May not perform well on very short or ambiguous product names
  • Not suitable for legal/medical/financial applications
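One way to work around the weakness on short or ambiguous names is to route uncertain predictions to a person instead of trusting them. The sketch below is an assumption about how that could look; the helper `needs_review` and its thresholds (`min_score`, `min_words`) are hypothetical and untuned.

```python
def needs_review(title, prediction, min_score=0.6, min_words=2):
    """Return True when a prediction should be checked by a person.

    prediction: a {'label': ..., 'score': ...} dict as returned by the pipeline.
    min_score and min_words are illustrative thresholds, not tuned values.
    """
    too_short = len(title.split()) < min_words
    low_confidence = prediction["score"] < min_score
    return too_short or low_confidence
```

Sensible thresholds depend on the catalog, so they are best chosen by checking a labeled sample of real titles.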

📄 License & Source

  • Model: Apache 2.0 License
  • Training Data: Amazon Products Dataset on Kaggle
    (check license and attribution requirements on Kaggle page)