# BERT Spanish Clickbait Classifier
Fine-tuned BERT model for detecting clickbait in Spanish news articles.
The model performs binary text classification to determine whether a news item uses clickbait techniques or not.
## Model Details

### Model Description
This model is a fine-tuned version of dccuchile/bert-base-spanish-wwm-cased, adapted for the task of clickbait detection in Spanish news articles.
The classification is based on the title of the news, allowing the model to capture both lexical and contextual cues commonly associated with clickbait content.
- **Developed by:** Julen Neila
- **Shared by:** Julen Neila
- **Model type:** Transformer-based text classifier (BERT)
- **Language(s):** Spanish
- **License:** Apache 2.0
- **Finetuned from model:** dccuchile/bert-base-spanish-wwm-cased
### Model Sources

- **Base model:** dccuchile/bert-base-spanish-wwm-cased
- **Framework:** Hugging Face Transformers
## Uses

### Direct Use
The model can be directly used to:
- Detect clickbait in Spanish news headlines and articles
- Support media analysis and journalism studies
- Assist in content moderation and media monitoring pipelines
### Downstream Use
The model can be integrated into:
- News aggregation systems
- Media bias and clickbait analysis
- Academic NLP research projects
- Larger information extraction or classification pipelines
### Out-of-Scope Use
The model is not recommended for:
- Social media posts or informal text
- Non-Spanish content
- Legal, medical, or high-stakes decision-making systems
## Bias, Risks, and Limitations
- The model reflects biases present in the training data.
- It may underperform on very short texts or headlines without sufficient context.
- It may not generalize well to domains outside traditional digital journalism.
### Recommendations
Users should be aware of these limitations and avoid deploying the model in high-impact decision-making contexts without additional validation.
## How to Get Started with the Model
```python
from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub
classifier = pipeline(
    "text-classification",
    model="JJNeila/bert-spanish-clickbait-oss",
    tokenizer="JJNeila/bert-spanish-clickbait-oss",
)

classifier("Estados Unidos entrena a 25.000 militares (1.400 españoles) para defender el este de Europa")
```
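The pipeline returns raw label ids and scores. A small helper like the one below can translate them into readable verdicts; note that the `LABEL_0`/`LABEL_1` mapping is an assumption based on the label scheme in the Training Data section, so check the `id2label` entry in the model's `config.json` before relying on it:

```python
# Hypothetical helper: map a raw pipeline result to a readable verdict.
# Assumption (verify against config.json): LABEL_0 = non-clickbait, LABEL_1 = clickbait.
ID2LABEL = {"LABEL_0": "non-clickbait", "LABEL_1": "clickbait"}

def interpret(prediction: dict) -> str:
    """Turn one pipeline result dict into a human-readable string."""
    name = ID2LABEL.get(prediction["label"], prediction["label"])
    return f"{name} (score={prediction['score']:.2f})"

# Example with a mocked pipeline result:
print(interpret({"label": "LABEL_1", "score": 0.97}))  # → clickbait (score=0.97)
```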
## Training Details
### Training Data
The model was trained on a curated dataset of Spanish news articles annotated for clickbait presence.
- **Size:** ~3,163 labeled samples
- **Labels:**
- `0` → Non-clickbait
- `1` → Clickbait
The input format used during training was:
*title*
### Training Procedure
#### Preprocessing
- Removal of unlabeled samples
- Concatenation of title and article text
- Tokenization using the base BERT Spanish tokenizer
- Maximum sequence length: **512 tokens**
#### Training Hyperparameters
- **Training regime:** fp16 mixed precision
- **Optimizer:** AdamW
- **Learning rate:** 2e-5
- **Batch size:** 8
- **Epochs:** 3
- **Weight decay:** 0.01
- **Evaluation metric for model selection:** eval_loss
- **Early stopping:** enabled via `EarlyStoppingCallback`
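The hyperparameters above can be expressed as a `Trainer` configuration. This is a sketch under the assumptions listed (the published card does not include the training script); argument names follow the Hugging Face `TrainingArguments` API, where `eval_strategy` was called `evaluation_strategy` in older releases:

```python
from transformers import TrainingArguments, EarlyStoppingCallback

# Sketch of the hyperparameters listed above, not the original training script.
training_args = TrainingArguments(
    output_dir="bert-spanish-clickbait",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    num_train_epochs=3,
    weight_decay=0.01,
    fp16=True,                         # fp16 mixed-precision regime
    eval_strategy="epoch",             # `evaluation_strategy` in older versions
    save_strategy="epoch",
    load_best_model_at_end=True,       # checkpoint selection by the metric below
    metric_for_best_model="eval_loss",
)

# Patience value is an assumption; the card only states the callback was used.
early_stopping = EarlyStoppingCallback(early_stopping_patience=2)
```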
#### Speeds, Sizes, Times
- **Training time:** ~0.5 hours
- **Hardware:** NVIDIA T4 GPU
- **Final model size:** ~440 MB
## Evaluation
### Testing Data, Factors & Metrics
#### Testing Data
A held-out validation set (20%) stratified by class labels.
#### Metrics
Because the classes are imbalanced, performance is reported with the following metrics rather than accuracy alone:
- Accuracy
- Precision
- Recall
- F1-score
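These metrics follow directly from the confusion-matrix counts, taking label `1` (clickbait) as the positive class. The helper below is a minimal pure-Python sketch of the binary definitions, for illustration only:

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

# Toy labels (0 = non-clickbait, 1 = clickbait), not the real evaluation data:
print(binary_metrics([0, 0, 1, 1, 1, 0, 1, 0], [0, 1, 1, 1, 0, 0, 1, 0]))
```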
### Results
| Metric | Value |
|-----------|-------|
| Accuracy | 0.86 |
| Precision | 0.84 |
| Recall | 0.88 |
| F1-score | 0.86 |
#### Summary
The model achieves a strong balance between precision and recall, making it particularly effective at identifying clickbait content without excessive false positives.
---
## Environmental Impact
- **Hardware Type:** NVIDIA T4 GPU
- **Hours used:** ~0.5 hours
- **Cloud Provider:** Google Colab
- **Compute Region:** Europe
- **Carbon Emitted:** Not explicitly measured
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
## Technical Specifications
### Model Architecture and Objective
- **Architecture:** BERT-base (12 layers, ~110M parameters)
- **Objective:** Binary cross-entropy loss for text classification
### Compute Infrastructure
#### Hardware
- NVIDIA T4 GPU (16 GB VRAM)
#### Software
- Python 3.12
- PyTorch
- Transformers
- Hugging Face Datasets
## Citation

**BibTeX:**

```bibtex
@misc{neila2025clickbait,
  title={BERT Spanish Clickbait Classifier},
  author={Neila, Julen},
  year={2025},
  publisher={Hugging Face}
}
```
## Model Card Authors
**Julen Neila Garcia**
## Model Card Contact
https://huggingface.co/JJNeila