smart-category-detector-v1

Lightweight Transformer-based classifier for short text categorization (60 categories)

smart-category-detector-v1 is a multi-class text classification model designed to predict one of 60 categories from short text inputs such as:

  • product titles
  • marketplace listings
  • event announcements
  • restaurant menu items with prices

The model is fine-tuned from DistilBERT, a distilled and smaller variant of BERT, which keeps inference fast enough for practical categorization tasks.


Model Details

Field              Value
Developer          Salim Al Sazu
Hugging Face       salimalsazu
Model Type         DistilBERT Fine-tuned
Base Model         distilbert-base-uncased
Task               Multi-class Text Classification
Categories         60
Training Samples   ~600,000
Language           English
License            MIT


Training Dataset

The model was trained on a curated dataset of approximately 600,000 samples containing short text entries mapped to one of 60 categories.

Dataset Structure

Column     Description
text       Short input text
category   Target classification label

Example:

Samsung Galaxy S24 Ultra 512GB Mobile -> smartphones
Dhaka International Book Fair 2026 -> book_fair
Jamboo Burger Tk 220 -> burgers
Lenovo ThinkPad X1 Carbon Laptop -> laptops
Chicken Biryani Tk 180 -> biryani
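
As a minimal sketch of the two-column layout above, the snippet below parses a few rows and builds the label-to-id mapping that a classification head needs. The CSV layout, the helper names `load_rows` and `build_label_maps`, and the sorted-label id assignment are illustrative assumptions; the dataset's actual file format and label ordering are not stated in this card.

```python
import csv
import io

# Hypothetical CSV excerpt using the two documented columns: text, category.
SAMPLE_CSV = """text,category
Samsung Galaxy S24 Ultra 512GB Mobile,smartphones
Dhaka International Book Fair 2026,book_fair
Jamboo Burger Tk 220,burgers
Lenovo ThinkPad X1 Carbon Laptop,laptops
Chicken Biryani Tk 180,biryani
"""


def load_rows(csv_text):
    """Parse CSV text into a list of (text, category) pairs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["text"].strip(), row["category"].strip()) for row in reader]


def build_label_maps(rows):
    """Assign integer ids to labels in sorted order (an assumed convention)."""
    labels = sorted({category for _, category in rows})
    label2id = {label: i for i, label in enumerate(labels)}
    id2label = {i: label for label, i in label2id.items()}
    return label2id, id2label


rows = load_rows(SAMPLE_CSV)
label2id, id2label = build_label_maps(rows)
print(label2id)
```

Sorting the labels before numbering them keeps the mapping reproducible across runs, which matters if the same ids are reused between training and inference.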


Dataset Distribution

The dataset is divided into three main groups, each containing 20 categories.

Event Categories (20) --- ~200,000 samples

sports
music
tech_conference
education_seminar
business_summit
startup_pitch
job_fair
art_exhibition
cultural_festival
religious_event
political_rally
charity_event
workshop
webinar
networking_event
book_fair
food_festival
fashion_show
award_ceremony
hackathon


Product Categories (20) --- ~200,000 samples

electronics
smartphones
laptops
fashion_clothing
shoes
beauty_cosmetics
grocery_food
furniture
home_appliances
kitchen_items
sports_equipment
books
toys
baby_products
health_supplements
automotive
gaming
jewelry
office_supplies
pet_products


Restaurant / Menu Categories (20) --- ~200,000 samples

burgers
pizza
sandwich_wraps
fries_sides
fried_snacks
street_food
biryani
rice_dishes
noodles_pasta
curries
bbq_grill
seafood_dishes
breakfast_items
soups_salads
cakes_pastries
ice_cream
traditional_sweets
coffee_tea
soft_drinks
shakes_smoothies


Example Predictions

Input                                    Prediction
Samsung Galaxy S24 Ultra 512GB Mobile    smartphones
Dhaka International Book Fair 2026       book_fair
Jamboo Burger Tk 220                     burgers
HP Pavilion RTX 4060 Gaming Laptop       laptops
Beef Burger Combo Tk 350                 burgers


Quick Start

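from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub.
# top_k=5 asks for the five highest-scoring categories per input.
clf = pipeline(
    "text-classification",
    model="salimalsazu/smart-category-detector-v1",
    top_k=5,
)

# Each call prints the ranked labels together with their scores.
print(clf("Samsung Galaxy S24 Ultra 512GB Mobile"))
print(clf("Dhaka International Book Fair 2026"))
print(clf("Jamboo Burger Tk 220"))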
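
With `top_k` set, the pipeline returns ranked `{"label", "score"}` dictionaries rather than a single answer, and depending on the `transformers` version and on whether a string or a list was passed, the result for one input may be wrapped in an extra list. The helper below is a small sketch for reading the top label from one input's result; `flatten_result` and `top_prediction` are hypothetical names, and the scores are made up to show the shape, not real model output.

```python
def flatten_result(result):
    """Unwrap nested lists until reaching a flat list of label/score dicts."""
    while result and isinstance(result[0], list):
        result = result[0]
    return result


def top_prediction(result):
    """Return (label, score) for the highest-scoring candidate of one input."""
    candidates = flatten_result(result)
    best = max(candidates, key=lambda item: item["score"])
    return best["label"], best["score"]


# Made-up scores, shaped like the pipeline result for a single input.
fake_result = [[
    {"label": "smartphones", "score": 0.91},
    {"label": "electronics", "score": 0.06},
    {"label": "laptops", "score": 0.02},
]]

label, score = top_prediction(fake_result)
print(label, score)
```

The helper is meant for the result of a single input; for a batch, apply it to each element of the returned list.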

Limitations

  • Works best with short text
  • Designed primarily for English text
  • Mixed-language inputs may reduce accuracy
  • Limited to 60 predefined categories
  • Inputs that belong to none of the 60 categories are still assigned the closest label, because the model has no "unknown" class
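
One common workaround for the lack of an "unknown" class is to reject low-confidence predictions on the caller's side. The sketch below assumes a flat list of `{"label", "score"}` dictionaries for one input; the function name `classify_with_fallback`, the `0.5` threshold, and the sample scores are illustrative assumptions, and a real threshold would need tuning on held-out data.

```python
def classify_with_fallback(candidates, threshold=0.5, fallback="unknown"):
    """Return the top label, or `fallback` if the top score is below `threshold`."""
    if not candidates:
        return fallback
    best = max(candidates, key=lambda item: item["score"])
    return best["label"] if best["score"] >= threshold else fallback


# Made-up scores for illustration only.
confident = [
    {"label": "burgers", "score": 0.88},
    {"label": "sandwich_wraps", "score": 0.07},
]
uncertain = [
    {"label": "automotive", "score": 0.21},
    {"label": "electronics", "score": 0.19},
]

print(classify_with_fallback(confident))
print(classify_with_fallback(uncertain))
```

Softmax scores are not a calibrated measure of certainty and can be high even for out-of-scope text, so a threshold reduces but does not eliminate wrong assignments.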

Future Improvements

Potential improvements include:

  • adding real-world marketplace data
  • improving Bangla language support
  • increasing spelling robustness
  • adding an "unknown" label
  • publishing benchmark evaluation metrics
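
For the benchmark item, accuracy and macro-averaged F1 are typical metrics for a balanced multi-class task like this one. The following is a dependency-free sketch of both; the function names and the tiny label lists are made up for illustration and are not results from this model.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label gets a false positive
            fn[t] += 1  # true label gets a false negative
    f1_scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        f1_scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy labels, not real evaluation data.
y_true = ["burgers", "pizza", "laptops", "laptops"]
y_pred = ["burgers", "burgers", "laptops", "smartphones"]

print(accuracy(y_true, y_pred))
print(macro_f1(y_true, y_pred))
```

Macro averaging weights all 60 categories equally, so a category the model handles poorly shows up in the score even when overall accuracy is high.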

Citation

Salim Al Sazu
smart-category-detector-v1
