---
library_name: transformers
tags: [fake-news-detection, NLP, classification, transformers, DistilBERT]
---
# Model Card for Fake News Detection Model
## Model Summary
This is a fine-tuned DistilBERT model for **fake news detection**. It classifies news articles as either **real** or **fake** based on their textual content. The model was trained on a labeled dataset of real and fake news articles collected from various sources.
## Model Details
### Model Description
- **Finetuned from:** `distilbert-base-uncased`
- **Language:** English
- **Model type:** Transformer-based text classification model
- **License:** MIT
- **Intended Use:** Fake news detection on social media and news websites
### Model Sources
- **Repository:** [Hugging Face Model Hub](https://huggingface.co/your-model-id)
- **Paper (if applicable):** N/A
- **Demo (if applicable):** N/A
## Uses
### Direct Use
- This model can be used to detect whether a given news article is **real or fake**.
- It can be integrated into fact-checking platforms, misinformation detection systems, and social media moderation tools.
### Downstream Use
- Can be further fine-tuned on domain-specific fake news datasets.
- Useful for media companies, journalists, and researchers studying misinformation.
### Out-of-Scope Use
- This model is **not designed for generating news content**.
- It may not work well for languages other than English.
- Not suitable for fact-checking complex claims requiring external knowledge.
## Bias, Risks, and Limitations
### Risks
- The model may be biased towards certain topics, sources, or writing styles based on the dataset used for training.
- There is a possibility of **false positives (real news misclassified as fake)** or **false negatives (fake news classified as real)**.
- Model performance can degrade on out-of-distribution samples.
### Recommendations
- Users should **not rely solely** on this model for determining truthfulness.
- It is recommended to **use human verification** and **cross-check information** from multiple sources.
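One way to operationalize these recommendations is to defer low-confidence predictions to a human reviewer instead of auto-labeling them. A minimal sketch of such routing logic, assuming label 1 means fake (as in the inference example below); the 0.9 threshold is an illustrative assumption, not a tuned value:

```python
def route_prediction(probs, threshold=0.9):
    """Given class probabilities [p_real, p_fake], return a label,
    or defer to a human reviewer when confidence is below the threshold.
    The 0.9 default is an illustrative assumption, not a tuned value."""
    confidence = max(probs)
    if confidence < threshold:
        return "needs-human-review"
    return "fake" if probs.index(confidence) == 1 else "real"
```

Articles routed to `"needs-human-review"` can then be queued for manual fact-checking and cross-referencing against other sources.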
## How to Use the Model
You can load the model using `transformers` and use it for inference as shown below:
```python
from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification
import torch

tokenizer = DistilBertTokenizerFast.from_pretrained("your-model-id")
model = DistilBertForSequenceClassification.from_pretrained("your-model-id")
model.eval()

def predict(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    probs = torch.nn.functional.softmax(outputs.logits, dim=-1)
    return "Fake News" if torch.argmax(probs).item() == 1 else "Real News"

text = "Breaking: Scientists discover a new element!"
print(predict(text))
```
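For scoring many articles at once, it is usually more efficient to batch inputs through the tokenizer and model. A sketch, assuming `model` and `tokenizer` are the objects loaded above; the batch size and the 0.5 decision threshold are illustrative assumptions:

```python
import torch

def predict_batch(texts, model, tokenizer, batch_size=32):
    """Classify a list of articles; returns (label, fake_probability) pairs."""
    results = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        inputs = tokenizer(batch, return_tensors="pt", truncation=True,
                           padding=True, max_length=512)
        with torch.no_grad():
            probs = torch.softmax(model(**inputs).logits, dim=-1)
        # Column 1 holds the probability of the "fake" class.
        for p_fake in probs[:, 1].tolist():
            results.append(("Fake News" if p_fake >= 0.5 else "Real News", p_fake))
    return results
```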
## Training Details
### Training Data
The model was trained on a dataset consisting of **news articles labeled as real or fake**. The dataset includes information from reputable sources and misinformation websites.
### Training Procedure
- **Preprocessing:**
- Tokenization using `DistilBertTokenizerFast`
- Removal of stop words and punctuation
- Converting text to lowercase
- **Training Configuration:**
- **Model:** `distilbert-base-uncased`
- **Optimizer:** AdamW
- **Batch size:** 16
- **Epochs:** 3
- **Learning rate:** 2e-5
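As a rough sketch, the configuration above corresponds to a standard PyTorch fine-tuning loop. The helper below is illustrative, not the card's actual training script; `model` and `train_loader` stand in for the DistilBERT classifier and a `DataLoader` over the tokenized dataset:

```python
import torch

def fine_tune(model, train_loader, epochs=3, lr=2e-5):
    """Fine-tune with AdamW at the hyperparameters listed above.
    Assumes each batch dict includes `labels`, so the model returns a loss."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for batch in train_loader:
            optimizer.zero_grad()
            loss = model(**batch).loss  # cross-entropy computed internally
            loss.backward()
            optimizer.step()
    return model
```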
### Compute Resources
- **Hardware:** NVIDIA Tesla T4 (Google Colab)
- **Training Time:** ~2 hours
## Evaluation
### Testing Data
- The model was evaluated on a held-out test set of **10,000 news articles**.
### Metrics
Performance was measured using **accuracy, precision, recall, and F1 score**; the scores on the held-out test set are tabulated under Results below.
### Results
| Metric | Score |
|----------|-------|
| Accuracy | 92% |
| F1 Score | 90% |
| Precision | 91% |
| Recall | 89% |
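Metrics like these can be computed with `scikit-learn` (listed under Dependencies). A sketch with illustrative label arrays, where 1 denotes fake:

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Illustrative ground-truth and predicted labels (1 = fake, 0 = real),
# not the actual test-set outputs behind the table above.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1]

print(f"Accuracy:  {accuracy_score(y_true, y_pred):.2f}")
print(f"Precision: {precision_score(y_true, y_pred):.2f}")
print(f"Recall:    {recall_score(y_true, y_pred):.2f}")
print(f"F1 Score:  {f1_score(y_true, y_pred):.2f}")
```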
## Environmental Impact
- **Hardware Used:** NVIDIA Tesla T4
- **Total Compute Time:** ~2 hours
- **Carbon Emissions:** Estimated using the [ML Impact Calculator](https://mlco2.github.io/impact#compute)
## Technical Specifications
### Model Architecture
- The model is based on **DistilBERT**, a lightweight transformer architecture that reduces computation while retaining accuracy.
### Dependencies
- `transformers`
- `torch`
- `datasets`
- `scikit-learn`