Model Card for Fake News Detection Model

Model Summary

This is a fine-tuned RoBERTa model for fake news detection. It classifies news articles as real or fake based on their textual content. The model was trained on a labeled dataset of real and fake news articles collected from various sources.

Model Details

Model Description

Developed by: abd8433
Finetuned from: RoBERTa-base (roberta-base)
Language: English
Model type: Transformer-based text classification model
License: MIT
Intended Use: Fake news detection on social media and news websites

Model Sources

Repository: Hugging Face Model Hub
Paper (if applicable): N/A
Demo (if applicable): N/A

Uses

Direct Use

This model can be used to detect whether a given news article is real or fake.
It can be integrated into fact-checking platforms, misinformation detection systems, and social media moderation tools.

Downstream Use

Can be further fine-tuned on domain-specific fake news datasets.
Useful for media companies, journalists, and researchers studying misinformation.

Out-of-Scope Use

This model is not designed for generating news content.
It may not work well for languages other than English.
Not suitable for fact-checking complex claims requiring external knowledge.

Bias, Risks, and Limitations

Risks

The model may be biased towards certain topics, sources, or writing styles based on the dataset used for training.
There is a risk of false positives (real news misclassified as fake) and false negatives (fake news misclassified as real).
Model performance can degrade on out-of-distribution samples.

Recommendations

Users should not rely solely on this model for determining truthfulness.
It is recommended to use human verification and cross-check information from multiple sources.
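
One way to keep a human in the loop is to use the model only to order a review queue rather than to make final decisions. The sketch below assumes predictions have already been mapped to "real"/"fake" labels with a confidence score; the function name `review_priority` and the 0.9 threshold are hypothetical, not part of this model.

```python
def review_priority(label: str, score: float, min_confidence: float = 0.9) -> str:
    """Assign a human-review priority to one model prediction.

    Every article still goes to a reviewer; the model only orders the queue.
    `min_confidence` is a hypothetical threshold, not a value from this card.
    """
    if score < min_confidence:
        return "medium"  # uncertain predictions need a careful look either way
    if label == "fake":
        return "high"    # confident "fake" calls are checked first
    return "low"         # confident "real" calls are checked last, but still checked
```

For example, `review_priority("fake", 0.95)` returns "high", while `review_priority("real", 0.6)` returns "medium".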

How to Use the Model
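
A minimal sketch of loading the model with the Hugging Face `transformers` text-classification pipeline. It assumes the Hub model ID is `abd8433/TRAK-fake-Detection-roberta` (taken from the repository name on the model page) and that the label names follow the default `LABEL_0`/`LABEL_1` scheme; the mapping of those labels to "real" and "fake" is a hypothetical assumption, so check the model's `id2label` config before relying on it.

```python
# Hypothetical label mapping: verify against the model's config.id2label.
LABEL_NAMES = {"LABEL_0": "real", "LABEL_1": "fake"}


def load_classifier(model_id: str = "abd8433/TRAK-fake-Detection-roberta"):
    """Load a text-classification pipeline (downloads the model on first use)."""
    from transformers import pipeline  # imported lazily so to_verdict works without it

    return pipeline("text-classification", model=model_id)


def to_verdict(prediction: dict) -> tuple[str, float]:
    """Turn one pipeline result like {"label": "LABEL_1", "score": 0.97} into (name, score)."""
    # Unknown labels (e.g. if the config already uses readable names) pass through unchanged.
    label = LABEL_NAMES.get(prediction["label"], prediction["label"])
    return label, float(prediction["score"])
```

Usage: `clf = load_classifier()`, then `to_verdict(clf("Full article text ...", truncation=True)[0])`. Passing `truncation=True` keeps long articles within RoBERTa's 512-token input limit. If inputs were cleaned during training (see Preprocessing below), applying the same cleaning at inference time presumably gives results closer to the reported metrics.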

Training Details

Training Data

The model was trained on a dataset of news articles labeled as real or fake, drawn from both reputable news sources and misinformation websites.

Training Procedure

Preprocessing:
- Tokenization using RobertaTokenizerFast
- Removal of stop words and punctuation
- Converting text to lowercase
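
The cleaning steps above can be sketched in plain Python. The card does not say which stop-word list was used, so `STOP_WORDS` below is a small illustrative subset (an assumption); a fuller list such as scikit-learn's `ENGLISH_STOP_WORDS` could be swapped in. Only ASCII punctuation from `string.punctuation` is removed.

```python
import string

# Illustrative stop-word subset (assumption): the actual list used in training is not stated.
STOP_WORDS = frozenset({
    "a", "an", "and", "or", "the", "of", "to", "in", "is", "was",
    "on", "for", "with", "at", "by", "it", "this", "that",
})

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(text: str) -> str:
    """Lowercase the text, strip punctuation, and drop stop words."""
    lowered = text.lower()
    no_punct = lowered.translate(_PUNCT_TABLE)
    tokens = [tok for tok in no_punct.split() if tok not in STOP_WORDS]
    return " ".join(tokens)
```

For example, `clean_text("The Senate PASSED the bill, again!")` returns "senate passed bill again". The cleaned string would then go to `RobertaTokenizerFast.from_pretrained("roberta-base")(cleaned, truncation=True)`. Because RoBERTa's byte-level BPE tokenizer is case-sensitive and handles punctuation itself, the same cleaning should be applied at inference time to match the training inputs.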
Training Configuration:
- Model: RoBERTa-base (roberta-base)
- Optimizer: AdamW
- Batch size: 16
- Epochs: 3
- Learning rate: 2e-5
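
The configuration above maps onto keyword arguments of `transformers.TrainingArguments`; the sketch below records it as a small dataclass. The `output_dir` value and the choice of `optim="adamw_torch"` are assumptions (the card only says "AdamW"), and the step count assumes a single device with no gradient accumulation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters listed in this card."""
    learning_rate: float = 2e-5
    batch_size: int = 16
    epochs: int = 3

    def training_arguments_kwargs(self, output_dir: str) -> dict:
        """Keyword arguments suitable for transformers.TrainingArguments(**kwargs)."""
        return {
            "output_dir": output_dir,
            "learning_rate": self.learning_rate,
            "per_device_train_batch_size": self.batch_size,
            "num_train_epochs": self.epochs,
            "optim": "adamw_torch",  # AdamW; the exact implementation is an assumption
        }

    def total_optimizer_steps(self, num_train_examples: int) -> int:
        """Optimizer steps for one device, no gradient accumulation, last batch kept."""
        return math.ceil(num_train_examples / self.batch_size) * self.epochs
```

With a hypothetical training set of 1,000 articles, `FineTuneConfig().total_optimizer_steps(1000)` gives ceil(1000 / 16) × 3 = 63 × 3 = 189 steps.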

Compute Resources

Hardware: NVIDIA Tesla T4 (Google Colab)
Training Time: ~2 hours

Evaluation

Testing Data

The model was evaluated on a held-out test set of 10,000 news articles.

Metrics

The model was evaluated using accuracy, F1 score, precision, and recall.

Results

Metric	Score
Accuracy	92%
F1 Score	90%
Precision	91%
Recall	89%
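
As a consistency check, the F1 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below computes all four metrics from confusion-matrix counts (treating "fake" as the positive class, which is an assumption) and confirms that the reported precision and recall give the reported F1.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall, and F1 from binary confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Reported precision 91% and recall 89%: 2 * 0.91 * 0.89 / 1.80 ≈ 0.8999, i.e. the reported 90% F1.
reported_f1 = f1_from_precision_recall(0.91, 0.89)
```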

Environmental Impact

Hardware Used: NVIDIA Tesla T4
Total Compute Time: ~2 hours
Carbon Emissions: Estimated using the ML Impact Calculator
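
No emissions figure is reported above. Calculators of this kind estimate emissions as power draw × runtime × the carbon intensity of the electricity grid. The sketch below applies that formula; the 70 W figure is the NVIDIA Tesla T4's rated board power, while the carbon-intensity value in the example is a hypothetical placeholder, not a measured value for this training run. The estimate ignores CPU, memory, and data-center overhead.

```python
def estimate_emissions_kg(power_watts: float, hours: float, kg_co2eq_per_kwh: float) -> float:
    """Rough CO2-equivalent estimate: energy (kWh) times grid carbon intensity."""
    energy_kwh = power_watts / 1000 * hours
    return energy_kwh * kg_co2eq_per_kwh
```

For example, a T4 at 70 W for ~2 hours uses about 0.14 kWh; with a hypothetical grid intensity of 0.4 kg CO2eq/kWh, `estimate_emissions_kg(70, 2, 0.4)` gives roughly 0.056 kg CO2eq.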

Technical Specifications

Model Architecture

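The model is based on RoBERTa, a transformer encoder that keeps the BERT architecture but improves the pretraining procedure (more data, larger batches, dynamic masking, and no next-sentence-prediction objective). A classification head on top of the encoder outputs the real/fake prediction.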

Dependencies

- transformers
- torch
- datasets
- scikit-learn


Weights format: Safetensors
Model size: 67M params
Tensor type: F32

Model ID: abd8433/TRAK-fake-Detection-roberta