smart-category-detector-v1
Lightweight Transformer-based classifier for short text categorization (60 categories)
smart-category-detector-v1 is a multi-class text classification model designed to predict one of 60 categories from short text inputs such as:
- product titles
- marketplace listings
- event announcements
- restaurant menu items with prices
Built on DistilBERT, a smaller distilled version of BERT, the model is intended for fast inference in practical categorization tasks.
Model Details
| Field | Value |
|---|---|
| Developer | Salim Al Sazu |
| Hugging Face | salimalsazu |
| Model Type | DistilBERT (fine-tuned) |
| Base Model | distilbert-base-uncased |
| Task | Multi-class Text Classification |
| Categories | 60 |
| Training Samples | ~600,000 |
| Language | English |
| License | MIT |
Training Dataset
The model was trained on a curated dataset of approximately 600,000 samples containing short text entries mapped to one of 60 categories.
Dataset Structure
| Column | Description |
|---|---|
| text | Short input text |
| category | Target classification label |
Examples (text -> category):
Samsung Galaxy S24 Ultra 512GB Mobile -> smartphones
Dhaka International Book Fair 2026 -> book_fair
Jamboo Burger Tk 220 -> burgers
Lenovo ThinkPad X1 Carbon Laptop -> laptops
Chicken Biryani Tk 180 -> biryani
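The `text`/`category` pairs have to be turned into integer label ids before a classifier head can be trained on them. The sketch below is a minimal, stdlib-only illustration using the five examples above; the actual training file format and the label order stored in the released model are not documented here, so the ids it produces are hypothetical. The `label2id`/`id2label` names mirror the fields of the same name in a Hugging Face model config.

```python
# Minimal sketch: map (text, category) rows to integer label ids.
# The rows are the examples from this card; the real dataset format is an assumption.
rows = [
    ("Samsung Galaxy S24 Ultra 512GB Mobile", "smartphones"),
    ("Dhaka International Book Fair 2026", "book_fair"),
    ("Jamboo Burger Tk 220", "burgers"),
    ("Lenovo ThinkPad X1 Carbon Laptop", "laptops"),
    ("Chicken Biryani Tk 180", "biryani"),
]

# Sort the distinct labels so the id assignment is deterministic.
labels = sorted({category for _, category in rows})
label2id = {label: i for i, label in enumerate(labels)}
id2label = {i: label for label, i in label2id.items()}

# Replace each category string with its integer id.
encoded = [(text, label2id[category]) for text, category in rows]

print(label2id)
print(encoded[0])
```

With the full dataset the same code would yield 60 ids; sorting is just one way to make the mapping reproducible across runs.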
Dataset Distribution
The dataset is divided into three main groups, each containing 20 categories.
Event Categories (20, ~200,000 samples)
sports
music
tech_conference
education_seminar
business_summit
startup_pitch
job_fair
art_exhibition
cultural_festival
religious_event
political_rally
charity_event
workshop
webinar
networking_event
book_fair
food_festival
fashion_show
award_ceremony
hackathon
Product Categories (20, ~200,000 samples)
electronics
smartphones
laptops
fashion_clothing
shoes
beauty_cosmetics
grocery_food
furniture
home_appliances
kitchen_items
sports_equipment
books
toys
baby_products
health_supplements
automotive
gaming
jewelry
office_supplies
pet_products
Restaurant / Menu Categories (20, ~200,000 samples)
burgers
pizza
sandwich_wraps
fries_sides
fried_snacks
street_food
biryani
rice_dishes
noodles_pasta
curries
bbq_grill
seafood_dishes
breakfast_items
soups_salads
cakes_pastries
ice_cream
traditional_sweets
coffee_tea
soft_drinks
shakes_smoothies
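Because the 60 labels fall into three groups of 20, a downstream application may want to know which group a prediction belongs to (for example, to route event announcements and menu items differently). The sketch below is a hypothetical helper built from the lists above; `CATEGORY_GROUPS`, `LABEL_TO_GROUP` and `group_of` are illustrative names, and it assumes the model returns these exact label strings.

```python
# Hypothetical helper: the 60 labels listed above, grouped as in this card.
CATEGORY_GROUPS = {
    "event": [
        "sports", "music", "tech_conference", "education_seminar",
        "business_summit", "startup_pitch", "job_fair", "art_exhibition",
        "cultural_festival", "religious_event", "political_rally",
        "charity_event", "workshop", "webinar", "networking_event",
        "book_fair", "food_festival", "fashion_show", "award_ceremony",
        "hackathon",
    ],
    "product": [
        "electronics", "smartphones", "laptops", "fashion_clothing", "shoes",
        "beauty_cosmetics", "grocery_food", "furniture", "home_appliances",
        "kitchen_items", "sports_equipment", "books", "toys", "baby_products",
        "health_supplements", "automotive", "gaming", "jewelry",
        "office_supplies", "pet_products",
    ],
    "restaurant_menu": [
        "burgers", "pizza", "sandwich_wraps", "fries_sides", "fried_snacks",
        "street_food", "biryani", "rice_dishes", "noodles_pasta", "curries",
        "bbq_grill", "seafood_dishes", "breakfast_items", "soups_salads",
        "cakes_pastries", "ice_cream", "traditional_sweets", "coffee_tea",
        "soft_drinks", "shakes_smoothies",
    ],
}

# Invert the grouping into a label -> group lookup table.
LABEL_TO_GROUP = {
    label: group for group, members in CATEGORY_GROUPS.items() for label in members
}


def group_of(label: str) -> str:
    """Return the group of a predicted label; raises KeyError for unknown labels."""
    return LABEL_TO_GROUP[label]


# Sanity checks: three groups of 20, and no label appears in two groups.
assert all(len(members) == 20 for members in CATEGORY_GROUPS.values())
assert len(LABEL_TO_GROUP) == 60

print(group_of("book_fair"))  # prints "event"
```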
Example Predictions
| Input | Prediction |
|---|---|
| Samsung Galaxy S24 Ultra 512GB Mobile | smartphones |
| Dhaka International Book Fair 2026 | book_fair |
| Jamboo Burger Tk 220 | burgers |
| HP Pavilion RTX 4060 Gaming Laptop | laptops |
| Beef Burger Combo Tk 350 | burgers |
Quick Start
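```python
from transformers import pipeline

# Load the model from the Hugging Face Hub; top_k=5 returns the
# five highest-scoring labels for each input instead of only the best one.
clf = pipeline(
    "text-classification",
    model="salimalsazu/smart-category-detector-v1",
    top_k=5,
)

print(clf("Samsung Galaxy S24 Ultra 512GB Mobile"))
print(clf("Dhaka International Book Fair 2026"))
print(clf("Jamboo Burger Tk 220"))
```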
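The pipeline returns `{"label": ..., "score": ...}` dictionaries, but depending on the transformers version and how `top_k` is passed, a single string input may come back as a flat list or wrapped in one extra list. The sketch below is a hypothetical helper (`top_prediction` is not part of transformers) that accepts either shape and returns the best label. It is exercised on hand-written, made-up scores rather than real model output, so it runs without downloading the model.

```python
from typing import Any


def top_prediction(result: Any) -> tuple[str, float]:
    """Return (label, score) for the highest-scoring entry.

    Accepts a flat list of {"label", "score"} dicts, or that list wrapped
    in one extra list, since pipeline output nesting may vary.
    """
    # Unwrap one level of nesting, e.g. [[{...}, {...}]] -> [{...}, {...}].
    if result and isinstance(result[0], list):
        result = result[0]
    best = max(result, key=lambda item: item["score"])
    return best["label"], best["score"]


# Hand-written example in the pipeline's label/score format (made-up scores).
sample = [[
    {"label": "smartphones", "score": 0.91},
    {"label": "electronics", "score": 0.06},
    {"label": "laptops", "score": 0.01},
]]
print(top_prediction(sample))
```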
Limitations
- Works best with short text
- Designed primarily for English text
- Mixed-language inputs may reduce accuracy
- Limited to 60 predefined categories
- Inputs that belong to none of the 60 categories are still assigned one of them (typically the closest match), because the model has no "unknown" class
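Until an "unknown" label exists, one common workaround is to reject predictions whose top score falls below a cut-off. The sketch below assumes pipeline-style `{"label", "score"}` dictionaries; `UNKNOWN_LABEL`, `MIN_SCORE` and `label_or_unknown` are hypothetical names, and the 0.5 threshold is an arbitrary placeholder that would need tuning on held-out data. Classifier scores are not necessarily well calibrated, so this is a heuristic and not a substitute for a trained "unknown" class.

```python
UNKNOWN_LABEL = "unknown"  # hypothetical: the released model has no such label
MIN_SCORE = 0.5            # hypothetical cut-off; tune on held-out data


def label_or_unknown(predictions: list[dict], min_score: float = MIN_SCORE) -> str:
    """Return the top label, or UNKNOWN_LABEL if its score is below min_score."""
    best = max(predictions, key=lambda item: item["score"])
    return best["label"] if best["score"] >= min_score else UNKNOWN_LABEL


# Hand-written, made-up scores (not real model output).
confident = [{"label": "burgers", "score": 0.88}, {"label": "pizza", "score": 0.07}]
uncertain = [{"label": "toys", "score": 0.31}, {"label": "gaming", "score": 0.29}]

print(label_or_unknown(confident))  # prints "burgers"
print(label_or_unknown(uncertain))  # prints "unknown"
```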
Future Improvements
Potential improvements include:
- adding real-world marketplace data
- improving Bangla language support
- increasing spelling robustness
- adding an "unknown" label
- publishing benchmark evaluation metrics
Citation
Salim Al Sazu. smart-category-detector-v1. Hugging Face model salimalsazu/smart-category-detector-v1.