Text Classification
Transformers
Safetensors
distilbert
fake-news-detection
NLP
classification
RoBERTA
text-embeddings-inference
How to use abd8433/TRAK-fake-Detection-roberta with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-classification", model="abd8433/TRAK-fake-Detection-roberta")

# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("abd8433/TRAK-fake-Detection-roberta")
model = AutoModelForSequenceClassification.from_pretrained("abd8433/TRAK-fake-Detection-roberta")
```
Model Card for Fake News Detection Model
Model Summary
This is a fine-tuned RoBERTa model for fake news detection. It classifies news articles as either real or fake based on their text. The model was trained on a labeled dataset of true and false news articles collected from various sources.
Model Details
Model Description
- Developed by: abd8433
- Finetuned from: `Roberta-base-uncased`
- Language: English
- Model type: Transformer-based text classification model
- License: MIT
- Intended Use: Fake news detection on social media and news websites
Model Sources
- Repository: Hugging Face Model Hub
- Paper (if applicable): N/A
- Demo (if applicable): N/A
Uses
Direct Use
- This model can be used to detect whether a given news article is real or fake.
- It can be integrated into fact-checking platforms, misinformation detection systems, and social media moderation tools.
Downstream Use
- Can be further fine-tuned on domain-specific fake news datasets.
- Useful for media companies, journalists, and researchers studying misinformation.
Out-of-Scope Use
- This model is not designed for generating news content.
- It may not work well for languages other than English.
- Not suitable for fact-checking complex claims requiring external knowledge.
Bias, Risks, and Limitations
Risks
- The model may be biased towards certain topics, sources, or writing styles based on the dataset used for training.
- There is a possibility of false positives (real news misclassified as fake) or false negatives (fake news classified as real).
- Model performance can degrade on out-of-distribution samples.
Recommendations
- Users should not rely solely on this model for determining truthfulness.
- It is recommended to use human verification and cross-check information from multiple sources.
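One way to put these recommendations into practice is to send low-confidence predictions to a human reviewer instead of acting on them automatically. The sketch below assumes predictions arrive as dictionaries with `label` and `score` keys, the shape a Transformers text-classification pipeline returns. The function name `split_for_review` and the 0.90 threshold are hypothetical and are not part of this model card.

```python
from typing import Dict, List, Tuple

# Hypothetical cut-off, not taken from the model card; tune it on validation data.
REVIEW_THRESHOLD = 0.90


def split_for_review(
    predictions: List[Dict], threshold: float = REVIEW_THRESHOLD
) -> Tuple[List[Dict], List[Dict]]:
    """Split predictions into (confident, needs_review) by model score."""
    confident = [p for p in predictions if p["score"] >= threshold]
    needs_review = [p for p in predictions if p["score"] < threshold]
    return confident, needs_review
```

A high score is not proof of truthfulness, so items in the confident group would still benefit from spot checks against independent sources.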
How to Use the Model
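A minimal inference sketch, assuming the checkpoint loads through the standard Transformers `text-classification` pipeline. The helper names `load_pipeline` and `classify_articles` are illustrative. This card does not document the label strings the model returns, so the code passes them through unchanged; check `id2label` in the model's configuration before mapping them to real or fake.

```python
from typing import Callable, Dict, List

MODEL_ID = "abd8433/TRAK-fake-Detection-roberta"


def load_pipeline():
    """Build the text-classification pipeline (downloads the checkpoint on first use)."""
    # Imported lazily so classify_articles can be used with any compatible callable.
    from transformers import pipeline

    return pipeline("text-classification", model=MODEL_ID)


def classify_articles(texts: List[str], pipe: Callable) -> List[Dict]:
    """Pair each input text with the label and score returned by the pipeline."""
    # truncation=True is forwarded to the tokenizer so long articles fit the model's input limit.
    predictions = pipe(texts, truncation=True)
    return [
        {"text": text, "label": pred["label"], "score": float(pred["score"])}
        for text, pred in zip(texts, predictions)
    ]


# Usage:
#   pipe = load_pipeline()
#   for row in classify_articles(["Example headline to check."], pipe):
#       print(row["label"], round(row["score"], 3))
```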
Training Details
Training Data
The model was trained on a dataset consisting of news articles labeled as real or fake. The dataset includes information from reputable sources and misinformation websites.
Training Procedure
Preprocessing:
- Tokenization using `RobertaTokenizerFast`
- Removal of stop words and punctuation
- Converting text to lowercase
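The text-cleaning steps listed above could look roughly like the sketch below. It is an illustration, not the author's pipeline: the stop-word list is a small hypothetical sample (the card does not say which list was used), and the order of operations (lowercase, strip punctuation, drop stop words, then tokenize) is an assumption.

```python
import string

# Hypothetical stop-word sample for illustration; the card does not name the list used.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "on"}

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(text: str) -> str:
    """Lowercase the text, remove punctuation, and drop stop words."""
    lowered = text.lower()
    no_punct = lowered.translate(_PUNCT_TABLE)
    tokens = [tok for tok in no_punct.split() if tok not in STOP_WORDS]
    return " ".join(tokens)


# The cleaned string would then be passed to the tokenizer, for example:
#   tokenizer(clean_text(article), truncation=True)
```

Inference inputs should go through the same cleaning as the training data; otherwise the model sees text that differs from what it was fine-tuned on.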
Training Configuration:
- Model: `Roberta-base-uncased`
- Optimizer: AdamW
- Batch size: 16
- Epochs: 3
- Learning rate: 2e-5
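The hyperparameters above can be collected in one place and mapped onto keyword names that `transformers.TrainingArguments` accepts. This is a sketch under assumptions: the card does not say whether the `Trainer` API was used, and `FineTuneConfig`, `to_trainer_kwargs`, and the `./results` output directory are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters as reported in this model card."""

    base_model: str = "Roberta-base-uncased"  # name as written in the card
    optimizer: str = "AdamW"
    batch_size: int = 16
    epochs: int = 3
    learning_rate: float = 2e-5


def to_trainer_kwargs(cfg: FineTuneConfig, output_dir: str = "./results") -> dict:
    """Map the config onto TrainingArguments keyword names (output_dir is hypothetical)."""
    return {
        "output_dir": output_dir,
        "per_device_train_batch_size": cfg.batch_size,
        "num_train_epochs": cfg.epochs,
        "learning_rate": cfg.learning_rate,
    }


# Usage, assuming the Trainer API:
#   from transformers import TrainingArguments
#   args = TrainingArguments(**to_trainer_kwargs(FineTuneConfig()))
```

The optimizer is not passed explicitly because the `Trainer` defaults to an AdamW variant.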
Compute Resources
- Hardware: NVIDIA Tesla T4 (Google Colab)
- Training Time: ~2 hours
Evaluation
Testing Data
- The model was evaluated on a held-out test set of 10,000 news articles.
Metrics
- Accuracy: 92%
- F1 Score: 90%
- Precision: 91%
- Recall: 89%
Results
| Metric | Score |
|---|---|
| Accuracy | 92% |
| F1 Score | 90% |
| Precision | 91% |
| Recall | 89% |
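For readers reproducing these numbers, the sketch below shows how the four metrics are derived from binary predictions. The card does not state which class was treated as positive, so the `positive=1` default is an assumption, and the function names are illustrative.

```python
from typing import Dict, Sequence


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def binary_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }
```

As a consistency check, `round(f1_from(0.91, 0.89), 2)` evaluates to `0.9`, which agrees with the reported F1 score.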
Environmental Impact
- Hardware Used: NVIDIA Tesla T4
- Total Compute Time: ~2 hours
- Carbon Emissions: Estimated using the ML Impact Calculator
Technical Specifications
Model Architecture
- The model is based on RoBERTa, a transformer encoder, with a sequence classification head that outputs the real/fake label.
Dependencies
- transformers
- torch
- datasets
- scikit-learn