---
library_name: transformers
tags: [fake-news-detection, NLP, classification, transformers, DistilBERT]
---

# Model Card for Fake News Detection Model

## Model Summary

This is a fine-tuned DistilBERT model for **fake news detection**. It classifies news articles as either **real** or **fake** based on their textual content. The model was trained on a labeled dataset of real and fake news articles collected from a variety of sources.

## Model Details

### Model Description

- **Finetuned from:** `distilbert-base-uncased`
- **Language:** English
- **Model type:** Transformer-based text classification model
- **License:** MIT
- **Intended Use:** Fake news detection on social media and news websites

### Model Sources

- **Repository:** [Hugging Face Model Hub](https://huggingface.co/your-model-id)
- **Paper (if applicable):** N/A
- **Demo (if applicable):** N/A

## Uses

### Direct Use

- This model can be used to detect whether a given news article is **real or fake**.
- It can be integrated into fact-checking platforms, misinformation detection systems, and social media moderation tools.

### Downstream Use

- Can be further fine-tuned on domain-specific fake news datasets.
- Useful for media companies, journalists, and researchers studying misinformation.

### Out-of-Scope Use

- This model is **not designed for generating news content**.
- It may not work well for languages other than English.
- Not suitable for fact-checking complex claims requiring external knowledge.

## Bias, Risks, and Limitations

### Risks

- The model may be biased towards certain topics, sources, or writing styles based on the dataset used for training.
- There is a possibility of **false positives (real news misclassified as fake)** or **false negatives (fake news classified as real)**.
- Model performance can degrade on out-of-distribution samples.

### Recommendations

- Users should **not rely solely** on this model for determining truthfulness.
- It is recommended to **use human verification** and **cross-check information** from multiple sources.
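A human-in-the-loop setup can be sketched as a simple confidence triage. The thresholds, labels, and routing function below are hypothetical illustrations, not part of the released model:

```python
def route(prob_fake, low=0.35, high=0.65):
    """Hypothetical triage: act automatically only on confident predictions,
    and send borderline articles to a human reviewer."""
    if prob_fake >= high:
        return "flag-for-review-queue"   # likely fake: escalate
    if prob_fake <= low:
        return "auto-approve"            # likely real: pass through
    return "human-review"                # uncertain: require a person

print(route(0.90))  # flag-for-review-queue
print(route(0.50))  # human-review
print(route(0.10))  # auto-approve
```

The exact thresholds should be tuned on a validation set against the cost of each error type.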

## How to Use the Model

You can load the model using `transformers` and use it for inference as shown below:

```python
from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification
import torch

tokenizer = DistilBertTokenizerFast.from_pretrained("your-model-id")
model = DistilBertForSequenceClassification.from_pretrained("your-model-id")
model.eval()  # disable dropout for inference

def predict(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
    with torch.no_grad():  # no gradients needed at inference time
        outputs = model(**inputs)
    probs = torch.nn.functional.softmax(outputs.logits, dim=-1)
    # Index 1 is taken to be the "fake" class; check model.config.id2label to confirm
    return "Fake News" if torch.argmax(probs) == 1 else "Real News"

text = "Breaking: Scientists discover a new element!"
print(predict(text))
```

## Training Details

### Training Data

The model was trained on a dataset consisting of **news articles labeled as real or fake**. The dataset includes information from reputable sources and misinformation websites.

### Training Procedure

- **Preprocessing:**
  - Tokenization using `DistilBertTokenizerFast`
  - Removal of stop words and punctuation
  - Converting text to lowercase
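The cleanup steps above can be sketched in plain Python. The stop-word list here is a small illustrative subset, not the one used in training; note also that the uncased tokenizer lowercases on its own, so the explicit lowercasing simply mirrors the card's preprocessing list:

```python
import string

# Illustrative subset only -- the training pipeline's stop-word list may differ
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "and"}

def preprocess(text):
    text = text.lower()                                            # lowercase
    text = text.translate(str.maketrans("", "", string.punctuation))  # strip punctuation
    return " ".join(w for w in text.split() if w not in STOP_WORDS)   # drop stop words

print(preprocess("Breaking: Scientists discover a new element!"))
# → breaking scientists discover new element
```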

- **Training Configuration:**
  - **Model:** `distilbert-base-uncased`
  - **Optimizer:** AdamW
  - **Batch size:** 16
  - **Epochs:** 3
  - **Learning rate:** 2e-5
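The configuration above maps onto Hugging Face `TrainingArguments` roughly as follows. This is a sketch: `output_dir` is a placeholder, any omitted settings use library defaults, and `Trainer` uses AdamW by default:

```python
from transformers import TrainingArguments

# Hypothetical settings mirroring the card's configuration
args = TrainingArguments(
    output_dir="fake-news-distilbert",   # placeholder checkpoint directory
    per_device_train_batch_size=16,
    num_train_epochs=3,
    learning_rate=2e-5,
)
```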

### Compute Resources

- **Hardware:** NVIDIA Tesla T4 (Google Colab)
- **Training Time:** ~2 hours

## Evaluation

### Testing Data

- The model was evaluated on a held-out test set of **10,000 news articles**.

### Metrics

The model is evaluated with **accuracy**, **precision**, **recall**, and **F1 score**; the scores are listed under Results below.

### Results

| Metric   | Score |
|----------|-------|
| Accuracy | 92%   |
| F1 Score | 90%   |
| Precision | 91%  |
| Recall   | 89%   |
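These scores follow the standard definitions over the confusion matrix. With toy confusion counts (hypothetical numbers for illustration, not the actual 10,000-article test set), they can be computed as:

```python
# Toy confusion counts for the "fake" (positive) class -- illustrative only
tp, fp, fn, tn = 45, 4, 5, 46

accuracy  = (tp + tn) / (tp + fp + fn + tn)   # fraction of all correct predictions
precision = tp / (tp + fp)                    # of articles flagged fake, how many were fake
recall    = tp / (tp + fn)                    # of fake articles, how many were caught
f1        = 2 * precision * recall / (precision + recall)  # harmonic mean

print(round(accuracy, 2), round(precision, 2), round(recall, 2), round(f1, 2))
# → 0.91 0.92 0.9 0.91
```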

## Environmental Impact

- **Hardware Used:** NVIDIA Tesla T4
- **Total Compute Time:** ~2 hours
- **Carbon Emissions:** Estimated using the [ML Impact Calculator](https://mlco2.github.io/impact#compute)
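A back-of-the-envelope estimate in the calculator's spirit multiplies energy use by grid carbon intensity. The ~70 W T4 board power is its rated TDP; the 0.4 kgCO2e/kWh intensity is an assumed grid average, not a measured value:

```python
# Rough CO2e estimate: energy (kWh) x grid carbon intensity (kgCO2e/kWh)
gpu_power_kw = 0.07      # NVIDIA T4 board power is ~70 W
training_hours = 2       # total compute time from the card
carbon_intensity = 0.4   # assumed grid average, kgCO2e/kWh

emissions_kg = gpu_power_kw * training_hours * carbon_intensity
print(f"~{emissions_kg:.3f} kg CO2e")  # → ~0.056 kg CO2e
```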

## Technical Specifications

### Model Architecture

- The model is based on **DistilBERT**, a lightweight transformer architecture that reduces computation while retaining accuracy.

### Dependencies

- `transformers`
- `torch`
- `datasets`
- `scikit-learn`