A clickbait detection model built on RoBERTa-base (125M parameters), fine-tuned on multiple combined and deduplicated English datasets.

How to use ENTUM-AI/roberta-clickbait-classifier with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-classification", model="ENTUM-AI/roberta-clickbait-classifier")
```

```python
# Or load the tokenizer and model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("ENTUM-AI/roberta-clickbait-classifier")
model = AutoModelForSequenceClassification.from_pretrained("ENTUM-AI/roberta-clickbait-classifier")
```
```python
from transformers import pipeline

classifier = pipeline("text-classification", model="ENTUM-AI/roberta-clickbait-classifier")

# Clickbait
result = classifier("You Won't BELIEVE What This Celebrity Did Next!")
print(result)  # [{'label': 'Clickbait', 'score': 0.99...}]

# Non-Clickbait
result = classifier("Federal Reserve raises interest rates by 0.25 percentage points")
print(result)  # [{'label': 'Non-Clickbait', 'score': 0.99...}]
```
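Each pipeline call returns a list with one dict per input, carrying a `label` string and a `score` float. For downstream filtering it can help to reduce that to a boolean; a minimal sketch, where the helper name and the 0.5 threshold are illustrative choices, not part of the model card:

```python
def is_clickbait(prediction, threshold=0.5):
    """Map one pipeline prediction dict to a boolean.

    `prediction` is a single element of the pipeline output, e.g.
    {'label': 'Clickbait', 'score': 0.99}. The threshold is an
    illustrative default, not a value documented by the model authors.
    """
    return prediction["label"] == "Clickbait" and prediction["score"] >= threshold


# Simulated pipeline outputs, matching the shapes printed above:
assert is_clickbait({"label": "Clickbait", "score": 0.99})
assert not is_clickbait({"label": "Non-Clickbait", "score": 0.99})
```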
| Property | Value |
|---|---|
| Architecture | RoBERTa-base (125M parameters) |
| Task | Binary text classification |
| Labels | Clickbait (1), Non-Clickbait (0) |
| Language | English |
| License | Apache 2.0 |
| Max input length | 128 tokens |
Three public English clickbait datasets, combined and deduplicated:
| Dataset | Size and source |
|---|---|
| christinacdl/Clickbait_New | 58.6K samples from multiple sources |
| marksverdhei/clickbait_title_classification | 32K samples (Chakraborty et al., ASONAM 2016) |
| contemmcm/clickbait | 26K samples |
After deduplication and balancing: ~48K samples (train/val/test split 85/10/5).
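The combine, deduplicate, and balance step can be sketched in plain Python. The normalization rule (case-fold and strip before comparing titles) and the downsample-to-minority-class strategy are assumptions for illustration; the model authors do not document their exact procedure:

```python
import random


def dedupe_and_balance(samples, seed=0):
    """Combine labeled (title, label) pairs, drop duplicate titles,
    then downsample the majority class to the minority-class size.

    `samples` is an iterable of (title, label) with label in {0, 1}.
    """
    seen, unique = set(), []
    for title, label in samples:
        key = title.casefold().strip()  # assumed normalization rule
        if key not in seen:
            seen.add(key)
            unique.append((title, label))

    # Split by label, then downsample the larger class.
    by_label = {0: [], 1: []}
    for title, label in unique:
        by_label[label].append((title, label))
    n = min(len(by_label[0]), len(by_label[1]))

    rng = random.Random(seed)
    balanced = rng.sample(by_label[0], n) + rng.sample(by_label[1], n)
    rng.shuffle(balanced)
    return balanced
```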
Fine-tuned with the Hugging Face Trainer using a linear learning-rate schedule with warmup, the AdamW optimizer, and early stopping on validation F1.
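Early stopping on F1 implies a metric function supplied to the Trainer. A minimal sketch of binary F1 for the positive (Clickbait = 1) class, written without external libraries so it stands alone; the function name and the two-logit layout are assumptions:

```python
def binary_f1(predictions, labels):
    """F1 for the positive (Clickbait = 1) class.

    `predictions` are per-example logit pairs [non_clickbait, clickbait];
    `labels` are gold 0/1 ints.
    """
    preds = [1 if p[1] > p[0] else 0 for p in predictions]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

With the Trainer, a function like this would be wrapped in a `compute_metrics` callable returning `{"f1": ...}`, so that `metric_for_best_model="f1"` can drive early stopping.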