
	---
	language:
	- en
	license: apache-2.0
	tags:
	- text-classification
	- distilbert
	- clickbait
	- moderation
	datasets:
	- marksverdhei/clickbait_title_classification
	metrics:
	- accuracy
	- f1
	- precision
	- recall
	---

	# Clickbait Classifier 🎣

	This model is a fine-tuned version of `distilbert-base-uncased` trained to classify text (news headlines, article titles, video names) into two categories: Clickbait and Non-Clickbait.

	It is intended for filtering out sensationalist headlines and for use as a quality signal in content recommendation systems.

	## Intended Use

	The primary goal of this model is to automatically detect clickbait titles to help users and platforms prioritize high-quality informative content over misleading or exaggerated headlines.

	- Input: Raw English text (headlines, titles, tweets).
	- Output: A binary classification label (`Clickbait` or `Non-Clickbait`) with a confidence score.

	## Training Data

	The model was fine-tuned on the `marksverdhei/clickbait_title_classification` dataset (the dataset listed in the metadata above and used for evaluation below), which contains a balanced collection of headlines explicitly labeled as clickbait (e.g., from Buzzfeed, Upworthy) and non-clickbait (e.g., from Reuters, The New York Times).

	## Performance Metrics

	The model achieved the following results on the `marksverdhei/clickbait_title_classification` validation set:

	- Accuracy: `0.9864` (98.6%)
	- F1 Score: `0.9862` (98.6%)
	- Precision: `0.9867` (98.7%)
	- Recall: `0.9857` (98.6%)
	- Evaluation Loss: `0.0488`
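
As a quick consistency check, F1 is the harmonic mean of precision and recall, so the reported F1 can be recomputed from the two values above. This is a minimal sketch that assumes all three metrics refer to the same class and averaging mode, which the card does not state.

```python
# Values copied from the metrics list above.
precision = 0.9867
recall = 0.9857

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"Recomputed F1: {f1:.4f}")
```

Under that assumption, the recomputed value agrees with the reported F1 to four decimal places.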

	## Training Constraints & Hyperparameters

	The model was trained under the following conditions:
	- Base Architecture: `distilbert-base-uncased` (chosen for speed and efficiency)
	- Maximum Sequence Length: 128
	- Learning Rate: 2e-05
	- Batch Size: 64
	- Precision: Mixed Precision (fp16)
	- Early Stopping: patience=3
	- Epochs: 3
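
The settings above can be written down as a small configuration, together with a sketch of how a patience-based stopping rule behaves. The key names mirror `transformers.TrainingArguments` parameters, but how the original run was actually configured is an assumption, and `stopping_point` and the loss values are hypothetical, for illustration only.

```python
import math

# Values copied from the list above. The key names follow
# transformers.TrainingArguments parameters; this mapping is an assumption,
# not something the model card confirms.
training_config = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 64,  # assumes a single device
    "num_train_epochs": 3,
    "fp16": True,
}
MAX_SEQ_LENGTH = 128
PATIENCE = 3


def stopping_point(eval_losses, patience=PATIENCE):
    """Return the 1-based evaluation index at which training would stop.

    Training stops once `patience` consecutive evaluations fail to improve
    on the best loss seen so far. Returns None if that never happens.
    """
    best = math.inf
    stale = 0
    for index, loss in enumerate(eval_losses, start=1):
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                return index
    return None


# Made-up evaluation losses, not taken from the actual training run.
example_losses = [0.10, 0.06, 0.05, 0.055, 0.06, 0.07, 0.04]
stop_at = stopping_point(example_losses)
print(stop_at)
```

Under this rule, a run with only three evaluations can never exhaust a patience of 3, so the setting only has an effect if evaluation ran more often than once per epoch.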

	## Usage 🚀

	You can easily integrate this model into your applications using the Hugging Face `transformers` library pipeline:

	```python
	from transformers import pipeline

	# Load the clickbait classifier
	classifier = pipeline("text-classification", model="ENTUM-AI/distilbert-clickbait-classifier")

	# Test with a sensational headline
	text_1 = "10 Bizarre Facts About Apples That Will BLOW YOUR MIND! 🍎🤯"
	result_1 = classifier(text_1)
	print(f"Text: '{text_1}'\nPrediction: {result_1}\n")

	# Test with a normal news headline
	text_2 = "Apple releases new quarterly earnings report showing 5% growth."
	result_2 = classifier(text_2)
	print(f"Text: '{text_2}'\nPrediction: {result_2}")
	```

	## Expected Output Format

	```python
	[{'label': 'Clickbait', 'score': 0.9921}]
	```
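
A minimal sketch of consuming this output format, assuming the label strings are exactly `Clickbait` and `Non-Clickbait` (worth confirming against the model's `id2label` config). The helper names, the threshold, and the sample predictions are hypothetical; the predictions are hand-written in the documented format, not real model output.

```python
from typing import Dict, List

CLICKBAIT_LABEL = "Clickbait"  # assumed label string, taken from the example output


def is_clickbait(prediction: Dict[str, object], threshold: float = 0.5) -> bool:
    """True if the prediction is labeled clickbait with at least `threshold` confidence."""
    return prediction["label"] == CLICKBAIT_LABEL and float(prediction["score"]) >= threshold


def filter_headlines(headlines: List[str], predictions: List[dict], threshold: float = 0.9) -> List[str]:
    """Keep only the headlines that are not confidently flagged as clickbait."""
    return [h for h, p in zip(headlines, predictions) if not is_clickbait(p, threshold)]


# Hand-written predictions in the documented format (one dict per headline).
sample_headlines = [
    "10 Bizarre Facts About Apples That Will BLOW YOUR MIND!",
    "Apple releases new quarterly earnings report showing 5% growth.",
]
sample_predictions = [
    {"label": "Clickbait", "score": 0.9921},
    {"label": "Non-Clickbait", "score": 0.9875},
]

kept = filter_headlines(sample_headlines, sample_predictions)
print(kept)
```

A higher threshold trades recall for fewer false positives, so only headlines the model is confident about are dropped.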

	## Potential Applications
	- 📰 News Aggregators: Filter out low-quality clickbait articles.
	- 📱 Social Media Feeds: Demote clickbait posts in recommendation algorithms.
	- ✉️ Email Spam Filters: Detect clickbait-style subject lines in promotional emails.
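
As an illustration of the feed-demotion use case, the sketch below lowers the ranking score of items that the classifier confidently flags as clickbait. The `relevance` field, the `penalty` factor, the threshold, and the sample feed are all hypothetical; a real recommender would supply its own scores and tuning.

```python
CLICKBAIT_LABEL = "Clickbait"  # assumed label string


def demote_clickbait(items, threshold=0.9, penalty=0.5):
    """Return items sorted by relevance after penalizing confident clickbait.

    Each item is a dict with a 'relevance' score and a 'prediction' dict in
    the classifier's output format ({'label': ..., 'score': ...}).
    """
    def adjusted_score(item):
        pred = item["prediction"]
        confident = pred["label"] == CLICKBAIT_LABEL and pred["score"] >= threshold
        return item["relevance"] * (penalty if confident else 1.0)

    return sorted(items, key=adjusted_score, reverse=True)


# Hypothetical feed with hand-written predictions (not real model output).
feed = [
    {"title": "You Won't Believe What Happened Next",
     "relevance": 0.9, "prediction": {"label": "Clickbait", "score": 0.99}},
    {"title": "Central bank holds interest rates steady",
     "relevance": 0.7, "prediction": {"label": "Non-Clickbait", "score": 0.98}},
    {"title": "Five things to know about the new budget",
     "relevance": 0.5, "prediction": {"label": "Clickbait", "score": 0.60}},
]

ranked_titles = [item["title"] for item in demote_clickbait(feed)]
print(ranked_titles)
```

Demoting rather than removing keeps borderline items visible, which limits the cost of a false positive.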