XLM-RoBERTa E-Commerce Product Classifier
Model Description
This model is a fine-tuned version of FacebookAI/xlm-roberta-base for multi-class product classification in e-commerce applications. It classifies product descriptions into 32 distinct categories with 90.1% accuracy.
The model was trained on ~32,000 synthetic English product descriptions covering major e-commerce categories, featuring realistic seller variations (professional retailers, individual sellers, resellers, and minimal listings).
Key Features
- ✅ 32 product categories covering major e-commerce segments
- ✅ 90.1% test accuracy with balanced performance across categories
- ✅ Robust to real-world variations: handles typos, abbreviations, casual language
- ✅ Fast inference: ~50-100 samples/second on CPU, 200+ on GPU
- ✅ Production-ready: trained with best practices, comprehensive evaluation
Intended Use
Primary Use Cases
- E-commerce platforms: Automatic product categorization for listings
- Marketplaces: Category suggestion for sellers
- Search & recommendation: Improve product discovery and filtering
- Content moderation: Detect miscategorized products
- Data quality: Clean and standardize product catalogs
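For the content-moderation use case, one simple approach is to flag listings where the model confidently disagrees with the seller's chosen category. The sketch below illustrates the decision logic only; the `flag_miscategorized` helper and the 0.8 confidence threshold are assumptions for illustration, not part of this model.

```python
# Sketch: flag listings whose seller-chosen category disagrees with the
# model's prediction. The 0.8 threshold is an illustrative assumption.

def flag_miscategorized(seller_category: str, predicted_label: str,
                        predicted_score: float, threshold: float = 0.8) -> bool:
    """Return True when the model confidently disagrees with the seller."""
    return predicted_label != seller_category and predicted_score >= threshold

# Example with a hypothetical prediction dict, shaped like pipeline output:
prediction = {"label": "electronics", "score": 0.9876}
print(flag_miscategorized("toys_games", prediction["label"], prediction["score"]))
# True: the model is confident the listing belongs in 'electronics'
```

In practice the threshold would be tuned per category, since precision varies widely (see Per-Category Performance below).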
Out-of-Scope Use
- ❌ Non-English product descriptions (model trained on English only)
- ❌ Fine-grained product attributes (color, size, brand) - use attribute extraction models
- ❌ Product images - use vision models instead
- ❌ Categories outside the 32 predefined classes
Performance
Test Set Results
| Metric | Score |
|---|---|
| Accuracy | 90.14% |
| F1 Score (Weighted) | 90.00% |
| F1 Score (Macro) | 88.55% |
| Precision (Weighted) | 90.34% |
| Recall (Weighted) | 90.14% |
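The gap between macro F1 (88.55%) and weighted F1 (90.00%) follows from how the two averages treat small classes: macro averages per-class F1 equally, while weighted scales each class by its share of test samples. A toy sketch (the scores below are illustrative, not this model's actual per-class values):

```python
# Sketch: macro F1 weights every class equally; weighted F1 scales by
# class support, so a weak small class drags macro down much more.

def macro_f1(f1s):
    return sum(f1s) / len(f1s)

def weighted_f1(f1s, supports):
    total = sum(supports)
    return sum(f * s for f, s in zip(f1s, supports)) / total

f1s      = [0.99, 0.95, 0.70]   # two strong classes, one weak
supports = [150, 150, 8]        # the weak class is also the smallest

print(round(macro_f1(f1s), 3))               # 0.88
print(round(weighted_f1(f1s, supports), 3))  # 0.963
```

The same effect appears in this model's results: low-support, low-F1 categories pull macro F1 below the weighted score.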
Training Dynamics
Training Curves:
The model demonstrates excellent convergence with:
- Training loss: Smooth decrease from 3.5 to 0.3
- Validation loss: Stable at ~0.36 (no overfitting)
- F1 Score: Steady improvement from 0.75 to 0.90+ over 3 epochs
Per-Category Performance
Top Performing Categories (F1 > 0.95):
| Category | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| pet_supplies | 100.0% | 99.3% | 99.7% | 151 |
| bedding_bath | 98.7% | 98.7% | 98.7% | 151 |
| baby_maternity | 96.8% | 99.3% | 98.0% | 151 |
| home_decor_lighting | 96.8% | 99.3% | 98.0% | 151 |
| books_media | 97.4% | 98.0% | 97.7% | 150 |
| grocery_food | 98.0% | 96.7% | 97.3% | 150 |
Categories Needing Improvement (F1 < 0.80):
| Category | Precision | Recall | F1-Score | Issue |
|---|---|---|---|---|
| fashion_accessories | 71.4% | 69.5% | 70.5% | Overlaps with fashion_clothing |
| electronics | 79.0% | 75.7% | 77.3% | Confused with computers_networking |
| small_appliances | 85.5% | 70.2% | 77.1% | Confused with large_appliances, kitchen_dining |
| shoes_footwear | 67.5% | 89.7% | 77.1% | High recall, low precision |
Confusion Matrix Analysis
Key Observations:
- Strong diagonal: Most categories classified correctly (dark blue diagonal)
- Minimal confusion: Very few off-diagonal cells (light blue)
- Related categories show expected overlap:
  - electronics ↔ computers_networking (related domains)
  - fashion_accessories ↔ fashion_clothing (semantic overlap)
  - small_appliances ↔ kitchen_dining (contextual similarity)
Confusion patterns make semantic sense - errors occur between genuinely similar categories.
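The off-diagonal inspection described above can be reproduced from (true, predicted) label pairs without any plotting library. The pairs below are made up for illustration:

```python
from collections import Counter

# Sketch: tally a confusion matrix from (true, predicted) label pairs to
# surface overlaps like electronics -> computers_networking.
pairs = [
    ("electronics", "electronics"),
    ("electronics", "computers_networking"),  # the overlap described above
    ("pet_supplies", "pet_supplies"),
    ("fashion_accessories", "fashion_clothing"),
]

confusion = Counter(pairs)  # keys are (true, predicted) tuples

# Off-diagonal entries are the misclassifications worth inspecting:
errors = {k: v for k, v in confusion.items() if k[0] != k[1]}
print(errors)
```

Sorting `errors` by count gives a quick ranked list of the category pairs most in need of disambiguation.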
Training Details
Training Data
- Dataset: Lezh1n/ecommerce-product-classification-by-categories
- Total samples: 31,851
- Train split: 22,283 samples (70%)
- Validation split: 4,760 samples (15%)
- Test split: 4,808 samples (15%)
- Distribution: Stratified sampling, ~1,000 samples per category
Training Procedure
Hyperparameters:
```json
{
  "model": "FacebookAI/xlm-roberta-base",
  "num_labels": 32,
  "max_length": 256,
  "batch_size": 16,
  "learning_rate": 2e-5,
  "num_epochs": 3,
  "warmup_ratio": 0.1,
  "weight_decay": 0.01,
  "optimizer": "AdamW",
  "lr_scheduler": "linear with warmup"
}
```
Training Environment:
- Hardware: Google Colab T4 GPU (16GB VRAM)
- Training time: ~45 minutes
- Mixed precision: FP16 for faster training
- Total steps: ~4,200
- Evaluation frequency: Every 500 steps
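The "~4,200 total steps" figure is consistent with the split sizes and hyperparameters above; a quick sanity check (assuming one optimizer step per batch, no gradient accumulation):

```python
import math

# Sketch: derive total and warmup steps from the training setup above
# (22,283 train samples, batch size 16, 3 epochs, warmup_ratio 0.1).
train_samples, batch_size, epochs, warmup_ratio = 22_283, 16, 3, 0.1

steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs
warmup_steps = int(total_steps * warmup_ratio)

print(steps_per_epoch, total_steps, warmup_steps)  # 1393 4179 417
```

So roughly 4,179 steps, matching the reported ~4,200, with about 417 warmup steps under the linear-with-warmup schedule.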
Regularization:
- Weight decay: 0.01
- Dropout: 0.1 (XLM-RoBERTa default)
- Early stopping: Best model based on F1 score
How to Use
Quick Start
```python
from transformers import pipeline

# Load classifier
classifier = pipeline(
    "text-classification",
    model="Lezh1n/xlm-roberta-ecommerce-classifier"
)

# Classify products
result = classifier("Sony WH-1000XM5 wireless headphones noise cancelling")
print(result)
# Output: [{'label': 'electronics', 'score': 0.9876}]
```
Batch Classification
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model and tokenizer
model_name = "Lezh1n/xlm-roberta-ecommerce-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Prepare texts
texts = [
    "iPhone 15 Pro 256GB unlocked",
    "Men's running shoes Nike Air Max",
    "Samsung 4K Smart TV 55 inch"
]

# Tokenize
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# Predict
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_classes = torch.argmax(predictions, dim=-1)

# Decode
for text, pred_id in zip(texts, predicted_classes):
    label = model.config.id2label[pred_id.item()]
    print(f"{text[:50]}... → {label}")
```
Top-K Predictions
```python
from transformers import pipeline

# Get top 3 predictions with confidence scores
classifier = pipeline("text-classification", model="Lezh1n/xlm-roberta-ecommerce-classifier")

result = classifier(
    "Sony wireless headphones",
    top_k=3
)

for pred in result:
    print(f"{pred['label']}: {pred['score']:.2%}")

# Output:
# electronics: 95.23%
# computers_networking: 3.12%
# mobile_phones_tablets: 1.15%
```
API Deployment Example
```python
from fastapi import FastAPI
from transformers import pipeline

app = FastAPI()
classifier = pipeline("text-classification", model="Lezh1n/xlm-roberta-ecommerce-classifier")

@app.post("/classify")
async def classify_product(text: str, top_k: int = 1):
    results = classifier(text, top_k=top_k)
    return {"predictions": results}

# Run: uvicorn app:app --host 0.0.0.0 --port 8000
```
Categories
The model classifies products into the following 32 categories:
| Category | F1-Score | Category | F1-Score |
|---|---|---|---|
| arts_crafts | 94.98% | jewelry | 89.80% |
| automotive_motorcycle | 96.39% | kitchen_dining | 86.10% |
| baby_maternity | 98.04% | large_appliances | 87.26% |
| bags_luggage | 92.31% | mobile_phones_tablets | 95.42% |
| beauty_personal_care | 95.36% | musical_instruments | 95.65% |
| bedding_bath | 98.68% | pet_supplies | 99.67% |
| books_media | 97.67% | shoes_footwear | 77.06% |
| computers_networking | 92.01% | small_appliances | 77.09% |
| electronics | 77.30% | software_digital_goods | 93.73% |
| fashion_accessories | 70.47% | sports_outdoors | 71.04% |
| fashion_clothing | 80.40% | stationery_office_supplies | 94.16% |
| garden_outdoor_living | 96.71% | tools_hardware | 78.85% |
| grocery_food | 97.32% | toys_games | 92.67% |
| health_wellness | 81.32% | video_games_gaming | 93.47% |
| home_decor_lighting | 98.04% | watches | 94.16% |
| home_furniture | 94.08% | industrial_commercial | 94.92% |
Limitations
Known Issues
- Fashion Categories: Lower F1 scores for `fashion_accessories` (70.5%) and `shoes_footwear` (77.1%) due to semantic overlap with `fashion_clothing`
- Electronics vs Computers: Some confusion between `electronics` and `computers_networking`, as both are technology products
- Appliance Categories: `small_appliances` and `large_appliances` show overlap with `kitchen_dining`
- Sports & Health: `sports_outdoors` (71.0%) and `health_wellness` (81.3%) show confusion due to overlapping products (e.g., fitness equipment)
- Noise Category: The "none" category has only 8 test samples (0.2% of the dataset) and shows 25% recall due to insufficient training data
Recommendations for Improvement
- Collect more data for underperforming categories
- Hierarchical classification for fashion (parent: fashion → children: clothing, accessories, shoes)
- Multi-label classification for products that fit multiple categories
- Add product attributes (brand, price range) as additional features
- Balanced sampling to ensure equal representation
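The hierarchical-classification idea can be prototyped without retraining by collapsing fine labels into parent categories before scoring, so fashion-internal confusions stop counting as errors at the parent level. The `PARENT` mapping below is a hypothetical taxonomy for illustration only:

```python
# Sketch: evaluate accuracy at a hypothetical parent-category level.
# Labels absent from PARENT are treated as their own parent.
PARENT = {
    "fashion_clothing": "fashion",
    "fashion_accessories": "fashion",
    "shoes_footwear": "fashion",
    "electronics": "technology",
    "computers_networking": "technology",
}

def parent_accuracy(true_labels, predicted_labels):
    hits = sum(
        PARENT.get(t, t) == PARENT.get(p, p)
        for t, p in zip(true_labels, predicted_labels)
    )
    return hits / len(true_labels)

# A fashion_accessories -> fashion_clothing error is correct at parent level:
print(parent_accuracy(["fashion_accessories", "electronics"],
                      ["fashion_clothing", "toys_games"]))  # 0.5
```

If parent-level accuracy is much higher than flat accuracy, a two-stage classifier (parent first, then child) is likely to help.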
Bias and Fairness
Dataset Bias
- Synthetic data: Generated descriptions may not capture all real-world variations
- Seller persona bias: Distribution (40% individual, 30% reseller, 15% professional, 15% minimal) reflects common marketplace patterns but may not represent all platforms
- Language: English-only - not suitable for multilingual e-commerce
Mitigation Strategies
- Stratified sampling ensures balanced category representation
- Multiple seller personas provide variation in writing styles
- Regular evaluation on real-world data recommended
Environmental Impact
- Hardware: Google Colab T4 GPU
- Training time: 45 minutes
- Estimated CO₂ emissions: ~0.02 kg CO₂eq (using the ML CO₂ Impact calculator)
- Considerations: Using pre-trained XLM-RoBERTa reduces environmental cost vs training from scratch
Citation
If you use this model, please cite:
```bibtex
@misc{xlm-roberta-ecommerce-2025,
  author = {Your Name},
  title = {XLM-RoBERTa E-Commerce Product Classifier},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/Lezh1n/xlm-roberta-ecommerce-classifier}}
}
```
Acknowledgments
- Base model: FacebookAI/xlm-roberta-base
- Dataset: Custom synthetic e-commerce product descriptions
- Framework: Hugging Face Transformers
- Training: Google Colab
License
This model is released under the MIT License.
The base model (XLM-RoBERTa) is licensed under MIT. See the original model card for details.