Fake News Detection - BERT (LIAR Dataset)
This model is a fine-tuned version of bert-base-uncased for fake news detection.
Model Description
- Base Model: BERT (bert-base-uncased)
- Task: Binary text classification (Real vs Fake news)
- Training Dataset: LIAR dataset (converted to binary classification)
- Performance: 60.86% accuracy on the LIAR test set; 80% accuracy on Reuters/BBC-style real news articles
Training Details
- Training Dataset: LIAR dataset (fact-checked political statements)
- Epochs: 3
- Batch Size: 16
- Learning Rate: 1e-5
- Max Sequence Length: 128
Performance
Test Set Results
- Accuracy: 60.86%
- F1 Score: 71.80%
- Precision: 60.54%
- Recall: 88.22%
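As a quick consistency check, the reported F1 score can be recomputed from the reported precision and recall. The sketch below uses only the numbers listed above; the helper name `f1_from_precision_recall` is made up for illustration. Recall being much higher than precision suggests the model predicts the positive class often, although which class was treated as positive during evaluation is not stated in this card.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the test set results above.
precision = 0.6054
recall = 0.8822

f1 = f1_from_precision_recall(precision, recall)
print(f"F1: {f1:.4f}")  # prints "F1: 0.7180", matching the reported 71.80%
```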
Real News Detection
- Reuters/BBC Style Accuracy: 80%
- Classifies most Reuters/BBC-style articles from major outlets as REAL
Usage
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "naheelkk/fake-news-bert-liar"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Prediction function
def predict_fake_news(text):
    inputs = tokenizer(text, truncation=True, max_length=128,
                       padding=True, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    logits = outputs.logits
    probs = torch.softmax(logits, dim=-1)
    prediction = torch.argmax(probs, dim=-1).item()
    confidence = probs.max().item()
    label = "REAL" if prediction == 1 else "FAKE"
    return label, confidence

# Example usage
text = "The Federal Reserve announced an interest rate increase today."
prediction, confidence = predict_fake_news(text)
print(f"Prediction: {prediction} (Confidence: {confidence:.3f})")
Training Data
The model was trained on the LIAR dataset, which contains fact-checked political statements:
- True statements (true, mostly-true, half-true) → labeled as REAL
- False statements (barely-true, false, pants-on-fire) → labeled as FAKE
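The binary conversion described above can be sketched as a simple lookup. This is an illustrative reconstruction, not the author's preprocessing script: the names `LIAR_TO_BINARY` and `to_binary` are hypothetical, the integer ids follow the usage code (1 = REAL, 0 = FAKE), and accepting the shorter spelling `pants-fire` is an assumption about how some copies of the dataset write that label.

```python
REAL, FAKE = 1, 0  # matches the usage code, where prediction == 1 means REAL

# Six-way LIAR labels collapsed to two classes, as listed above.
LIAR_TO_BINARY = {
    "true": REAL,
    "mostly-true": REAL,
    "half-true": REAL,
    "barely-true": FAKE,
    "false": FAKE,
    "pants-on-fire": FAKE,
    "pants-fire": FAKE,  # assumption: alternate spelling in some dataset copies
}


def to_binary(label: str) -> int:
    """Map a six-way LIAR label to 1 (REAL) or 0 (FAKE)."""
    key = label.strip().lower()
    if key not in LIAR_TO_BINARY:
        raise ValueError(f"Unknown LIAR label: {label!r}")
    return LIAR_TO_BINARY[key]


print(to_binary("half-true"))    # 1
print(to_binary("barely-true"))  # 0
```

Note that under this mapping "half-true" counts as REAL and "barely-true" as FAKE, so statements near the middle of the original scale end up on opposite sides of the binary boundary.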
Limitations
- Primarily trained on political statements
- May have bias towards certain topics or writing styles
- Confidence scores below 0.55 should be treated as uncertain
- Best performance on English news articles
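One way to act on the 0.55 confidence guideline is to wrap the model output in a small post-processing step. The sketch below is a minimal example under stated assumptions: `flag_uncertain` is a hypothetical helper, and it expects a `(label, confidence)` pair shaped like the return value of `predict_fake_news` in the usage section.

```python
def flag_uncertain(label: str, confidence: float, threshold: float = 0.55) -> str:
    """Return the label, or "UNCERTAIN" when confidence is below the threshold."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {confidence}")
    return label if confidence >= threshold else "UNCERTAIN"


print(flag_uncertain("REAL", 0.91))  # REAL
print(flag_uncertain("FAKE", 0.52))  # UNCERTAIN
```

Because the confidence is the larger of two softmax probabilities, it can never fall below 0.5, so this threshold only flags predictions where the two classes are almost tied.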
Ethical Considerations
This model should be used as a tool to assist human fact-checkers, not replace them. Always verify important information through multiple reliable sources.
Citation
If you use this model, please consider citing the LIAR dataset:
@inproceedings{wang2017liar,
title={"Liar, Liar Pants on Fire": A New Benchmark Dataset for Fake News Detection},
author={Wang, William Yang},
booktitle={Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics},
year={2017}
}