🎯 RoBERTa Clickbait Classifier

A clickbait detection model built on RoBERTa-base (125M parameters) and fine-tuned on three public English datasets that were combined and deduplicated.

🚀 Quick Start

from transformers import pipeline

classifier = pipeline("text-classification", model="ENTUM-AI/roberta-clickbait-classifier")

# Clickbait
result = classifier("You Won't BELIEVE What This Celebrity Did Next!")
print(result)  # [{'label': 'Clickbait', 'score': 0.99...}]

# Non-Clickbait
result = classifier("Federal Reserve raises interest rates by 0.25 percentage points")
print(result)  # [{'label': 'Non-Clickbait', 'score': 0.99...}]

Model Details

  • Architecture: RoBERTa-base (125M parameters)
  • Task: Binary text classification
  • Labels: Clickbait (1), Non-Clickbait (0)
  • Language: English
  • License: Apache 2.0
  • Max input length: 128 tokens
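
To illustrate how the label ids above relate to the pipeline output, here is a minimal sketch. It assumes the classification head emits two logits indexed by label id (0 = Non-Clickbait, 1 = Clickbait), which is the usual layout for binary sequence classification but is not stated in this card. The logits in the example are made up, and `decode` is a hypothetical helper name.

```python
import math

# Label ids as listed in Model Details: 0 = Non-Clickbait, 1 = Clickbait.
ID2LABEL = {0: "Non-Clickbait", 1: "Clickbait"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Turn a pair of logits into a pipeline-style prediction dict."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": ID2LABEL[best], "score": probs[best]}


# Hypothetical logits for illustration, not real model output.
print(decode([-2.0, 3.0]))
```

Because there are only two classes, the probability of the other class is simply one minus the reported score.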

📊 Training Data

Three public English clickbait datasets, combined and deduplicated:

  • christinacdl/Clickbait_New: 58.6K samples from multiple sources
  • marksverdhei/clickbait_title_classification: 32K samples (Chakraborty et al., ASONAM 2016)
  • contemmcm/clickbait: 26K samples

After deduplication and balancing: ~48K samples (train/val/test split 85/10/5).
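
The deduplicate, balance, and split steps could look roughly like the sketch below. This is not the authors' preprocessing code: the normalization rule (lowercasing and collapsing whitespace), balancing by downsampling to the smaller class, and the function name `prepare` are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def prepare(rows, seed=0, ratios=(0.85, 0.10, 0.05)):
    """rows: iterable of (text, label) pairs with label in {0, 1}.

    Returns (train, val, test) lists. Expects at least one row per class.
    """
    # 1. Deduplicate on a normalized form of the text (assumed rule).
    seen, unique = set(), []
    for text, label in rows:
        key = " ".join(text.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append((text, label))

    # 2. Balance by downsampling every class to the smallest class size.
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in unique:
        by_label[row[1]].append(row)
    n = min(len(v) for v in by_label.values())
    balanced = [r for v in by_label.values() for r in rng.sample(v, n)]

    # 3. Shuffle, then cut into train/val/test; test takes the remainder.
    rng.shuffle(balanced)
    n_train = round(len(balanced) * ratios[0])
    n_val = round(len(balanced) * ratios[1])
    train = balanced[:n_train]
    val = balanced[n_train:n_train + n_val]
    test = balanced[n_train + n_val:]
    return train, val, test
```

Fixing the seed keeps the split reproducible, which matters when comparing runs that use early stopping on the validation set.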

⚙️ Training

Fine-tuned with the Hugging Face Trainer using the AdamW optimizer, a linear learning-rate schedule with warmup, and early stopping on F1 score.
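
For readers unfamiliar with the schedule, the sketch below shows the shape of a linear learning-rate schedule with warmup: the rate rises linearly from zero during warmup, then decays linearly back to zero. The base rate, warmup length, and total step count are hypothetical; this card does not state the values actually used.

```python
def linear_warmup_lr(step, base_lr, warmup_steps, total_steps):
    """Learning rate at `step`: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical settings; the real hyperparameters are not given in this card.
BASE_LR, WARMUP, TOTAL = 2e-5, 100, 1000
for s in (0, 50, 100, 550, 1000):
    print(s, linear_warmup_lr(s, BASE_LR, WARMUP, TOTAL))
```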

💡 Use Cases

  • News aggregators: filter low-quality clickbait articles
  • Social media: content moderation and feed quality scoring
  • Browser extensions: warn users about clickbait headlines
  • Email filters: detect clickbait-style subject lines
  • Content platforms: automated content quality assessment
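
Most of these use cases reduce to thresholding the classifier's score. A minimal sketch follows, assuming a `classify` callable that behaves like the pipeline in Quick Start (one string in, a one-element list of `{'label', 'score'}` dicts out). The function name `split_by_clickbait`, the 0.9 threshold, and the stand-in classifier are hypothetical and only there so the example runs without downloading the model.

```python
from typing import Callable, Iterable


def split_by_clickbait(
    headlines: Iterable[str],
    classify: Callable[[str], list],
    threshold: float = 0.9,
):
    """Partition headlines into (kept, flagged) lists."""
    kept, flagged = [], []
    for text in headlines:
        pred = classify(text)[0]
        # Flag only confident clickbait predictions; everything else is kept.
        if pred["label"] == "Clickbait" and pred["score"] >= threshold:
            flagged.append(text)
        else:
            kept.append(text)
    return kept, flagged


# Stand-in classifier (keyword match), NOT the real model.
def fake_classify(text):
    is_bait = "believe" in text.lower()
    label = "Clickbait" if is_bait else "Non-Clickbait"
    return [{"label": label, "score": 0.95}]


kept, flagged = split_by_clickbait(
    [
        "You Won't BELIEVE What This Celebrity Did Next!",
        "Federal Reserve raises interest rates by 0.25 percentage points",
    ],
    fake_classify,
)
print("kept:", kept)
print("flagged:", flagged)
```

To use the real model, pass the `classifier` object from Quick Start as `classify`. A higher threshold trades recall for fewer false alarms, which may suit user-facing warnings better than background scoring.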

⚠️ Limitations

  • English only
  • Optimized for short texts (headlines, titles, tweets); longer texts will be truncated to 128 tokens
  • Reflects patterns and biases present in the training data sources