# XLM-RoBERTa Clickbait Detector

## Model Overview
This model is a fine-tuned version of christinacdl/XLM_RoBERTa-Clickbait-Detection-new trained to classify headlines into Clickbait and Legitimate News categories.
The model achieves the following performance on its evaluation set:
| Metric | Value |
|---|---|
| Accuracy | 99.90% |
| F1-Score | 0.9990 |
| Validation Loss | 0.0068 |
## Model Details
- Model Type: Sequence Classification (Binary)
- Base Model: XLM-RoBERTa (Cross-lingual RoBERTa)
- Language: English (with multilingual capabilities via XLM-RoBERTa)
- Task: Clickbait Detection
- Output Classes: 2 (Clickbait, Legitimate News)
- Model Size: ~270M parameters
- License: MIT
## Intended Uses

Primary Use Cases:
- Automated clickbait detection in news feeds and social media
- Browser extensions and plugins for user warnings
- News aggregator platforms for content filtering
- Content moderation systems for social platforms
- Media analytics and trend detection
Intended Audience:
- News organizations and publishers
- Social media platforms
- Content moderation teams
- Researchers studying misinformation
- Browser extension developers
## Limitations
Model-Specific Limitations:
- Language Scope: Optimized for English headlines. Although built on XLM-RoBERTa, which supports 100+ languages, performance on non-English content may vary significantly
- Domain Bias: Trained on news and media headlines; may not generalize well to other domains (scientific papers, technical blogs, legal documents)
- Context Dependency: Classifies headlines in isolation without full article context
- Emerging Patterns: May struggle with new or evolving clickbait tactics not present in training data
- Sarcasm & Irony: Can be challenged by figurative language and subtle linguistic tricks
Recommendations:
- Use primarily for English-language headlines
- Validate on domain-specific data before production deployment
- Combine with contextual analysis for edge cases
- Monitor performance on new clickbait patterns
- Consider ensemble approaches for critical applications
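The ensemble recommendation above can be sketched in a few lines of plain Python. The phrase list, weight, and threshold below are illustrative placeholders, not values used or validated by this model:

```python
# Hypothetical ensemble sketch: combine a model's clickbait probability
# with a simple keyword heuristic. The phrase list, weight, and threshold
# are illustrative placeholders, not tuned values from this model card.
CLICKBAIT_PHRASES = ("you won't believe", "this one trick", "click here")

def heuristic_score(headline: str) -> float:
    """Return 1.0 if a known clickbait phrase appears, else 0.0."""
    text = headline.lower()
    return 1.0 if any(p in text for p in CLICKBAIT_PHRASES) else 0.0

def ensemble_is_clickbait(model_prob: float, headline: str,
                          weight: float = 0.7, threshold: float = 0.5) -> bool:
    """Weighted average of model probability and keyword heuristic."""
    score = weight * model_prob + (1 - weight) * heuristic_score(headline)
    return score >= threshold

print(ensemble_is_clickbait(0.95, "You Won't Believe What Happened Next!"))  # True
print(ensemble_is_clickbait(0.30, "UN releases new climate report"))         # False
```

In a real deployment, the weight and threshold would be tuned on held-out, domain-specific data rather than fixed as here.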
## Training and Evaluation Data

### Dataset Information
- Dataset Type: News headlines with clickbait binary labels
- Language: English
- Train/Eval Split: Not specified
- Preprocessing: Standard tokenization via XLM-RoBERTa tokenizer
### Data Characteristics
- Headlines from news sources and social media
- Binary labels: Clickbait (0) and Legitimate News (1)
- Diverse linguistic patterns and sensationalism levels
- Representative of modern digital media language
## Training Procedure

### Training Hyperparameters
| Parameter | Value |
|---|---|
| Base Model | christinacdl/XLM_RoBERTa-Clickbait-Detection-new |
| Learning Rate | 2e-05 |
| Train Batch Size | 32 |
| Eval Batch Size | 32 |
| Gradient Accumulation Steps | 2 |
| Effective Batch Size | 64 |
| Epochs | 2 |
| Optimizer | AdamW (Fused) |
| Optimizer Betas | (0.9, 0.999) |
| Optimizer Epsilon | 1e-08 |
| LR Scheduler | Linear warmup |
| Mixed Precision | Native AMP (FP16) |
| Random Seed | 42 |
### Training Optimization Strategy
- Mixed Precision Training: FP16 with Native AMP for memory efficiency
- Gradient Accumulation: 2 steps to simulate larger batch size (64) with memory constraints
- Optimizer: AdamW Fused implementation for faster computation
- Learning Rate Schedule: Linear warmup followed by linear decay
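As a sketch only, the hyperparameter table translates into a Hugging Face `TrainingArguments` configuration roughly like the following. The `output_dir` is a placeholder and the warmup ratio is an assumption, since the card does not report warmup steps:

```python
from transformers import TrainingArguments

# Sketch of a TrainingArguments setup matching the hyperparameter table above.
# output_dir is a placeholder; warmup_ratio is assumed (not reported in the card).
training_args = TrainingArguments(
    output_dir="./clickbait-xlmr",        # placeholder path
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=2,        # effective batch size 64
    num_train_epochs=2,
    lr_scheduler_type="linear",
    warmup_ratio=0.06,                    # assumption; warmup steps not reported
    optim="adamw_torch_fused",            # fused AdamW implementation
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    fp16=True,                            # native AMP mixed precision
    seed=42,
)
```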
### Training Results
| Epoch | Training Loss | Step | Validation Loss | Accuracy | F1 Score |
|---|---|---|---|---|---|
| 1.0 | — | 400 | 0.0067 | 0.9984 | 0.9984 |
| 2.0 | 0.0167 | 800 | 0.0068 | 0.9990 | 0.9990 |
Key Observations:
- Rapid convergence to near-perfect accuracy
- Minimal overfitting (validation loss stable across epochs)
- F1-Score indicates well-balanced precision and recall
- Peak performance achieved at epoch 2
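For reference, the reported F1 of 0.9990 is the harmonic mean of precision and recall. A minimal computation from confusion-matrix counts (the tp/fp/fn values below are illustrative, not taken from this model's evaluation set):

```python
# Precision, recall, and F1 from confusion-matrix counts.
# The counts used below are illustrative only.
def f1_score(tp: int, fp: int, fn: int) -> float:
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

print(round(f1_score(tp=999, fp=1, fn=1), 4))  # 0.999
```

An F1 near 1.0 only when both false positives and false negatives are rare is what supports the "balanced precision and recall" observation above.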
## Framework Versions
| Library | Version |
|---|---|
| Transformers | 4.57.3 |
| PyTorch | 2.9.0+cu126 |
| Datasets | 4.0.0 |
| Tokenizers | 0.22.2 |
## How to Use

### Basic Usage
```python
from transformers import pipeline

# Load the model
classifier = pipeline(
    "text-classification",
    model="kesavanguru/XLM_roberta_finetuned",
)

# Classify a headline
headline = "You Won't Believe What Happened Next! Click Here!"
result = classifier(headline)
print(result)
# Example output: [{'label': 'LABEL_0', 'score': 0.9998}]
```
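The pipeline returns generic `LABEL_0` / `LABEL_1` names unless `id2label` is set in the model config. A small helper (assuming the 0 = Clickbait, 1 = Legitimate News convention stated in this card) makes the output readable:

```python
# Map the pipeline's generic labels to readable names, assuming the
# label convention stated in this card: 0 = Clickbait, 1 = Legitimate News.
LABEL_NAMES = {"LABEL_0": "Clickbait", "LABEL_1": "Legitimate News"}

def readable(result: list) -> list:
    """Rewrite pipeline output with human-readable label names."""
    return [
        {"label": LABEL_NAMES.get(r["label"], r["label"]), "score": r["score"]}
        for r in result
    ]

print(readable([{"label": "LABEL_0", "score": 0.9998}]))
# [{'label': 'Clickbait', 'score': 0.9998}]
```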
### Advanced Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "kesavanguru/XLM_roberta_finetuned"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Batch classification
headlines = [
    "Scientists Make Shocking Discovery - You Won't Believe!",
    "New Climate Study Released by UN Scientists",
    "This One Trick Will Change Your Life Forever",
]

inputs = tokenizer(headlines, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():  # inference only; no gradients needed
    outputs = model(**inputs)
predictions = torch.argmax(outputs.logits, dim=-1)

for headline, pred in zip(headlines, predictions):
    label = "Clickbait" if pred.item() == 0 else "Legitimate"
    print(f"{headline} -> {label}")
```
## Model Architecture

```text
XLM-RoBERTa Base (~270M parameters)
        ↓
[CLS] Token Representation
        ↓
Sequence Classification Head
        ↓
Binary Output (Softmax)
```
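The final softmax step in the diagram can be sketched in plain Python: the classification head produces two logits, which softmax turns into a probability per class (the logit values below are illustrative, not real model outputs):

```python
import math

# Softmax over two logits, as in the binary classification head above.
# The logit values passed in are illustrative only.
def softmax2(logit_clickbait: float, logit_legit: float):
    m = max(logit_clickbait, logit_legit)  # subtract max for numerical stability
    e0 = math.exp(logit_clickbait - m)
    e1 = math.exp(logit_legit - m)
    total = e0 + e1
    return e0 / total, e1 / total

p_clickbait, p_legit = softmax2(4.2, -3.1)
print(round(p_clickbait, 4))  # 0.9993
```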
## Performance Analysis
- Accuracy: 99.90% - Excellent for binary classification
- F1-Score: 0.9990 - Indicates balanced precision and recall
- Loss: 0.0068 - Very low validation loss, minimal overfitting
- Training Efficiency: 2 epochs sufficient for convergence
## Contributing
Contributions, issues, and feature requests are welcome!
To contribute:
- Open an issue to discuss proposed changes
- Submit a pull request with improvements
- Share feedback on model performance
## Citation

If you use this model in your research or application, please cite:

```bibtex
@misc{xlm_roberta_clickbait_2024,
  title        = {XLM-RoBERTa Fine-tuned for Clickbait Detection},
  author       = {Kesavanguru},
  year         = {2024},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/kesavanguru/XLM_roberta_finetuned}}
}
```
## License
This model is licensed under the MIT License. See LICENSE file for details.
## Acknowledgments

- Built on XLM-RoBERTa by Facebook AI
- Base model: christinacdl/XLM_RoBERTa-Clickbait-Detection-new
- Developed with the Hugging Face Transformers library
Model Card Updated: January 2026 | Last Training: 2 epochs | Status: Production Ready
Developed by Kesavanguru