Fake News Detection - BERT (LIAR Dataset)

This model is a fine-tuned version of bert-base-uncased for fake news detection.

Model Description

Base Model: BERT (bert-base-uncased)
Task: Binary text classification (Real vs Fake news)
Training Dataset: LIAR dataset (converted to binary classification)
Performance: 60.86% accuracy on the test set; 80% accuracy on Reuters/BBC-style real news articles

Training Details

Training Dataset: LIAR dataset (fact-checked political statements)
Epochs: 3
Batch Size: 16
Learning Rate: 1e-5
Max Sequence Length: 128
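The hyperparameters above can be gathered in one place and used to estimate the training budget. This is a hypothetical sketch: `FineTuneConfig` is an illustrative name, the card does not say which training loop was used, and the step count assumes a single device with no gradient accumulation. If the Hugging Face `Trainer` was used, these values would correspond to the `num_train_epochs`, `per_device_train_batch_size` and `learning_rate` fields of `TrainingArguments`, with `max_length=128` passed to the tokenizer.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters listed in this card (hypothetical container class)."""
    base_model: str = "bert-base-uncased"
    epochs: int = 3
    batch_size: int = 16
    learning_rate: float = 1e-5
    max_seq_length: int = 128

    def steps_per_epoch(self, num_examples: int) -> int:
        # A final partial batch still costs one optimizer step.
        return math.ceil(num_examples / self.batch_size)

    def total_steps(self, num_examples: int) -> int:
        # Assumes one device and no gradient accumulation.
        return self.epochs * self.steps_per_epoch(num_examples)


config = FineTuneConfig()
# 10,000 is an illustrative dataset size, not the actual LIAR training split size.
print(config.total_steps(10_000))  # 3 * ceil(10000 / 16) = 3 * 625 = 1875
```

Swap in the real number of training statements to get the actual step count.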

Performance

Test Set Results

Accuracy: 60.86%
F1 Score: 71.80%
Precision: 60.54%
Recall: 88.22%
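F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R), so the reported figures can be cross-checked: 2 × 0.6054 × 0.8822 / (0.6054 + 0.8822) ≈ 0.7180. The helpers below (hypothetical names, standard metric definitions) compute all four metrics from confusion-matrix counts for whichever class is treated as positive; the card does not state which class that is.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def metrics_from_counts(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall and F1 for the positive class of a 2x2 confusion matrix."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


# Cross-check: the reported precision and recall reproduce the reported F1.
print(f"{f1_score(0.6054, 0.8822):.4f}")  # 0.7180
```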

Real News Detection

Reuters/BBC-style accuracy: 80% (legitimate articles from major outlets correctly classified as REAL)

Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "naheelkk/fake-news-bert-liar"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Prediction function
def predict_fake_news(text):
    inputs = tokenizer(text, truncation=True, max_length=128,
                       padding=True, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.softmax(logits, dim=-1)

        prediction = torch.argmax(probs, dim=-1).item()
        confidence = probs.max().item()

        label = "REAL" if prediction == 1 else "FAKE"
        return label, confidence

# Example usage
text = "The Federal Reserve announced an interest rate increase today."
prediction, confidence = predict_fake_news(text)
print(f"Prediction: {prediction} (Confidence: {confidence:.3f})")
```

Training Data

The model was trained on the LIAR dataset, which contains fact-checked political statements:

True statements (true, mostly-true, half-true) → labeled as REAL
False statements (barely-true, false, pants-on-fire) → labeled as FAKE
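A minimal sketch of this six-to-two label conversion, assuming the class indices used in the Usage code (1 = REAL, 0 = FAKE). The constant and function names are hypothetical. The last rating is also written `pants-fire` in some copies of the dataset, so both spellings are accepted.

```python
# Six-way LIAR truthfulness ratings grouped into the two classes listed above.
REAL_RATINGS = {"true", "mostly-true", "half-true"}
FAKE_RATINGS = {"barely-true", "false", "pants-on-fire", "pants-fire"}

# Assumed class indices, matching the usage example (1 -> REAL, 0 -> FAKE).
LABEL_TO_ID = {"FAKE": 0, "REAL": 1}


def to_binary(rating: str) -> int:
    """Map a LIAR rating string to a binary class id; raise on unknown ratings."""
    rating = rating.strip().lower()
    if rating in REAL_RATINGS:
        return LABEL_TO_ID["REAL"]
    if rating in FAKE_RATINGS:
        return LABEL_TO_ID["FAKE"]
    raise ValueError(f"unknown LIAR rating: {rating!r}")


print([to_binary(r) for r in ["true", "half-true", "pants-fire", "false"]])  # [1, 1, 0, 0]
```

Raising on unknown ratings, rather than defaulting to one class, keeps typos in the label column from silently skewing the class balance.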

Limitations

Primarily trained on political statements
May have bias towards certain topics or writing styles
Confidence scores below 0.55 should be treated as uncertain
Best performance on English news articles
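The 0.55 cut-off can be applied on top of `predict_fake_news` from the Usage section. Because the model has two classes, the top softmax probability is always at least 0.5, so this threshold flags predictions whose confidence falls between 0.50 and 0.55. `label_with_uncertainty` is a hypothetical helper name, not part of the model's API.

```python
UNCERTAIN_THRESHOLD = 0.55  # from the Limitations list above


def label_with_uncertainty(label: str, confidence: float,
                           threshold: float = UNCERTAIN_THRESHOLD) -> str:
    """Return the label, or "UNCERTAIN" when confidence is below the threshold."""
    return label if confidence >= threshold else "UNCERTAIN"


print(label_with_uncertainty("FAKE", 0.52))  # UNCERTAIN
print(label_with_uncertainty("REAL", 0.91))  # REAL
```

With the model loaded, this composes directly: `label_with_uncertainty(*predict_fake_news(text))`.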

Ethical Considerations

This model should be used as a tool to assist human fact-checkers, not replace them. Always verify important information through multiple reliable sources.

Citation

If you use this model, please consider citing the LIAR dataset:

@inproceedings{wang2017liar,
  title={LIAR: A benchmark dataset for fake news detection},
  author={Wang, William Yang},
  booktitle={Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics},
  year={2017}
}
