---
library_name: transformers
tags: [fake-news-detection, NLP, classification, transformers, DistilBERT]
---

# Model Card for Fake News Detection Model

## Model Summary

This is a fine-tuned DistilBERT model for **fake news detection**. It classifies news articles as either **real** or **fake** based on their textual content. The model was trained on a labeled dataset of true and false news articles collected from various sources.

## Model Details

### Model Description

- **Finetuned from:** `distilbert-base-uncased`
- **Language:** English
- **Model type:** Transformer-based text classification model
- **License:** MIT
- **Intended Use:** Fake news detection on social media and news websites

### Model Sources

- **Repository:** [Hugging Face Model Hub](https://huggingface.co/your-model-id)
- **Paper (if applicable):** N/A
- **Demo (if applicable):** N/A

## Uses

### Direct Use

- This model can be used to detect whether a given news article is **real or fake**.
- It can be integrated into fact-checking platforms, misinformation detection systems, and social media moderation tools.

### Downstream Use

- Can be further fine-tuned on domain-specific fake news datasets.
- Useful for media companies, journalists, and researchers studying misinformation.

### Out-of-Scope Use

- This model is **not designed for generating news content**.
- It may not work well for languages other than English.
- Not suitable for fact-checking complex claims requiring external knowledge.

## Bias, Risks, and Limitations

### Risks

- The model may be biased towards certain topics, sources, or writing styles represented in the training dataset.
- There is a possibility of **false positives (real news misclassified as fake)** and **false negatives (fake news misclassified as real)**.
- Model performance can degrade on out-of-distribution samples.

### Recommendations

- Users should **not rely solely** on this model for determining truthfulness.
- It is recommended to **use human verification** and **cross-check information** from multiple sources.

## How to Use the Model

You can load the model with `transformers` and run inference as shown below:

```python
from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification
import torch

tokenizer = DistilBertTokenizerFast.from_pretrained("your-model-id")
model = DistilBertForSequenceClassification.from_pretrained("your-model-id")
model.eval()

def predict(text):
    # Tokenize and truncate to the model's 512-token limit.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    probs = torch.nn.functional.softmax(outputs.logits, dim=-1)
    # Label index 1 corresponds to the "fake" class, index 0 to "real".
    return "Fake News" if torch.argmax(probs).item() == 1 else "Real News"

text = "Breaking: Scientists discover a new element!"
print(predict(text))
```
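
Alternatively, the same checkpoint can be loaded through the high-level `pipeline` API. A minimal sketch, assuming `your-model-id` is the published repository id (the returned label names depend on the `id2label` mapping stored in the model config):

```python
from transformers import pipeline

# Text-classification pipeline over the fine-tuned checkpoint;
# "your-model-id" is a placeholder for the actual repository id.
classifier = pipeline("text-classification", model="your-model-id")

result = classifier("Breaking: Scientists discover a new element!")
print(result)  # e.g. [{'label': 'LABEL_1', 'score': 0.97}]
```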

## Training Details

### Training Data

The model was trained on a dataset of **news articles labeled as real or fake**, drawn from reputable sources as well as misinformation websites.

### Training Procedure

- **Preprocessing:**
  - Tokenization using `DistilBertTokenizerFast`
  - Removal of stop words and punctuation
  - Converting text to lowercase

- **Training Configuration:**
  - **Model:** `distilbert-base-uncased`
  - **Optimizer:** AdamW
  - **Batch size:** 16
  - **Epochs:** 3
  - **Learning rate:** 2e-5
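
The exact training script is not included in this card; the following is a minimal sketch of a Hugging Face `Trainer` setup mirroring the configuration above (AdamW is the `Trainer` default optimizer). The two-example dataset and the output directory name are placeholders, not the actual training setup.

```python
from datasets import Dataset
from transformers import (
    DistilBertForSequenceClassification,
    DistilBertTokenizerFast,
    Trainer,
    TrainingArguments,
)

# Base checkpoint with a binary classification head (0 = real, 1 = fake).
tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased")
model = DistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

# Placeholder dataset; replace with the full labeled news corpus.
raw = Dataset.from_dict({
    "text": ["An example real article.", "An example fake article."],
    "label": [0, 1],
})
tokenized = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
)

# Hyperparameters as reported above: batch size 16, 3 epochs, learning rate 2e-5.
args = TrainingArguments(
    output_dir="fake-news-distilbert",  # assumed name
    per_device_train_batch_size=16,
    num_train_epochs=3,
    learning_rate=2e-5,
)

trainer = Trainer(model=model, args=args, train_dataset=tokenized, tokenizer=tokenizer)
trainer.train()
```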

### Compute Resources

- **Hardware:** NVIDIA Tesla T4 (Google Colab)
- **Training Time:** ~2 hours

## Evaluation

### Testing Data

- The model was evaluated on a held-out test set of **10,000 news articles**.

### Metrics

The model was evaluated on accuracy, F1 score, precision, and recall.

### Results

| Metric    | Score |
|-----------|-------|
| Accuracy  | 92%   |
| F1 Score  | 90%   |
| Precision | 91%   |
| Recall    | 89%   |
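
These scores can be reproduced with `scikit-learn`. A minimal sketch, assuming `y_true` holds the gold labels and `y_pred` the model's predictions on the test set, with 1 = fake and 0 = real (the short arrays below are placeholders, not the actual test data):

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Placeholder labels standing in for the 10,000-article test set.
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

print("Accuracy :", accuracy_score(y_true, y_pred))
print("F1 Score :", f1_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
```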

## Environmental Impact

- **Hardware Used:** NVIDIA Tesla T4
- **Total Compute Time:** ~2 hours
- **Carbon Emissions:** Estimated using the [ML Impact Calculator](https://mlco2.github.io/impact#compute)

## Technical Specifications

### Model Architecture

- The model is based on **DistilBERT**, a lightweight transformer architecture that reduces computation while retaining accuracy.
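
For reference, the key architecture hyperparameters can be read directly from the checkpoint configuration. A minimal sketch (the repository id is a placeholder; the commented values are those of `distilbert-base-uncased`):

```python
from transformers import DistilBertForSequenceClassification

# Load the fine-tuned checkpoint; "your-model-id" is a placeholder.
model = DistilBertForSequenceClassification.from_pretrained("your-model-id")

print(model.config.n_layers)  # 6 transformer layers
print(model.config.dim)       # 768 hidden dimensions
print(model.config.n_heads)   # 12 attention heads
print(f"{model.num_parameters():,} parameters")  # roughly 67M including the classification head
```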

### Dependencies

- `transformers`
- `torch`
- `datasets`
- `scikit-learn`
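
These packages can typically be installed with `pip install transformers torch datasets scikit-learn`.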