Fake News Detection - BERT (LIAR Dataset)

This model is a fine-tuned version of bert-base-uncased for fake news detection.

Model Description

  • Base Model: BERT (bert-base-uncased)
  • Task: Binary text classification (Real vs Fake news)
  • Training Dataset: LIAR dataset (converted to binary classification)
  • Performance: 60.86% accuracy on the LIAR test set; 80% accuracy on a separate Reuters/BBC-style real news check

Training Details

  • Training Dataset: LIAR dataset (fact-checked political statements)
  • Epochs: 3
  • Batch Size: 16
  • Learning Rate: 1e-5
  • Max Sequence Length: 128
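
As a rough sketch, the hyperparameters above could be collected into keyword arguments for the Hugging Face Trainer API. The original training script is not shown in this card, so reading "Batch Size" as a per-device value is an assumption, and the names `training_kwargs` and `tokenizer_kwargs` are hypothetical.

```python
# Hypothetical grouping of the hyperparameters listed above.
# Key names follow transformers.TrainingArguments and the tokenizer call;
# the actual training script is not part of this card.
training_kwargs = {
    "num_train_epochs": 3,
    "per_device_train_batch_size": 16,  # assumes "Batch Size" means per device
    "learning_rate": 1e-5,
}
tokenizer_kwargs = {
    "truncation": True,
    "max_length": 128,  # Max Sequence Length
}
```

These dictionaries could then be unpacked as `TrainingArguments(output_dir="out", **training_kwargs)` and `tokenizer(texts, **tokenizer_kwargs)`, where `output_dir="out"` is a placeholder.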

Performance

Test Set Results

  • Accuracy: 60.86%
  • F1 Score: 71.80%
  • Precision: 60.54%
  • Recall: 88.22%
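
The reported F1 score can be checked against the precision and recall above, since F1 is their harmonic mean. The sketch below assumes all three figures refer to the same positive class; the function name `f1_from_precision_recall` is illustrative.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported in the test set results
f1 = f1_from_precision_recall(0.6054, 0.8822)
print(f"F1: {f1:.2%}")  # prints "F1: 71.80%"
```

Recall being much higher than precision suggests the model predicts the positive class more often than the data warrants, which is worth keeping in mind when reading the accuracy figure.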

Real News Detection

  • Reuters/BBC Style Accuracy: 80%
  • Labels most legitimate news articles from major outlets as REAL

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "naheelkk/fake-news-bert-liar"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Prediction function
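def predict_fake_news(text):
    """Classify a single text and return (label, confidence)."""
    # Tokenize with the same 128-token limit used in training
    inputs = tokenizer(text, truncation=True, max_length=128,
                       padding=True, return_tensors="pt")

    # Inference only, so gradients are not needed
    with torch.no_grad():
        logits = model(**inputs).logits

    # Softmax turns the two logits into class probabilities
    probs = torch.softmax(logits, dim=-1)
    prediction = torch.argmax(probs, dim=-1).item()
    confidence = probs.max().item()

    # Class index 1 is REAL, class index 0 is FAKE
    label = "REAL" if prediction == 1 else "FAKE"
    return label, confidence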

# Example usage
text = "The Federal Reserve announced an interest rate increase today."
prediction, confidence = predict_fake_news(text)
print(f"Prediction: {prediction} (Confidence: {confidence:.3f})")

Training Data

The model was trained on the LIAR dataset, which contains fact-checked political statements:

  • True statements (true, mostly-true, half-true) → labeled as REAL
  • False statements (barely-true, false, pants-on-fire) → labeled as FAKE
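
The mapping above can be sketched as a small helper. The integer ids (REAL = 1, FAKE = 0) follow the usage snippet in this card; the names `REAL_LABELS`, `FAKE_LABELS`, and `to_binary_label` are hypothetical, and the preprocessing code actually used for training is not shown here.

```python
# Six LIAR labels collapsed to two classes, following the mapping above.
REAL_LABELS = {"true", "mostly-true", "half-true"}
FAKE_LABELS = {"barely-true", "false", "pants-on-fire"}

def to_binary_label(liar_label: str) -> int:
    """Return 1 for REAL and 0 for FAKE; reject unknown labels."""
    label = liar_label.strip().lower()
    if label in REAL_LABELS:
        return 1
    if label in FAKE_LABELS:
        return 0
    raise ValueError(f"Unknown LIAR label: {liar_label!r}")
```

Raising on unknown labels, rather than defaulting to one class, keeps malformed rows from silently skewing the class balance.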

Limitations

  • Primarily trained on political statements
  • May have bias towards certain topics or writing styles
  • Confidence scores below 0.55 should be treated as uncertain
  • Best performance on English news articles
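
One way to act on the 0.55 confidence guideline is to wrap the prediction in a small post-processing step. This is a minimal sketch; the `UNCERTAIN` label and the name `apply_uncertainty` are assumptions for illustration, not part of the model's output.

```python
UNCERTAIN_THRESHOLD = 0.55  # threshold taken from the limitation above

def apply_uncertainty(label: str, confidence: float,
                      threshold: float = UNCERTAIN_THRESHOLD) -> str:
    """Return "UNCERTAIN" when confidence falls below the threshold."""
    return label if confidence >= threshold else "UNCERTAIN"
```

With the prediction function from the Usage section, this could be called as `apply_uncertainty(*predict_fake_news(text))`.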

Ethical Considerations

This model should be used as a tool to assist human fact-checkers, not replace them. Always verify important information through multiple reliable sources.

Citation

If you use this model, please consider citing the LIAR dataset:

@inproceedings{wang2017liar,
  title={"Liar, Liar Pants on Fire": A New Benchmark Dataset for Fake News Detection},
  author={Wang, William Yang},
  booktitle={Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},
  year={2017}
}