Commit 7d78d9d (verified) by RahulGanapathy · Parent: 9500e4e

Update README.md

Files changed (1): README.md (+144 −167)
 
---
library_name: transformers
tags: [fake-news-detection, NLP, classification, transformers, DistilBERT]
---

# Model Card for Fake News Detection Model

## Model Summary

This is a fine-tuned DistilBERT model for **fake news detection**. It classifies news articles as either **real** or **fake** based on their textual content. The model was trained on a labeled dataset of true and false news articles collected from various sources.

## Model Details

### Model Description

- **Fine-tuned from:** `distilbert-base-uncased`
- **Language:** English
- **Model type:** Transformer-based text classification model
- **License:** MIT
- **Intended Use:** Fake news detection on social media and news websites

### Model Sources

- **Repository:** [Hugging Face Model Hub](https://huggingface.co/your-model-id)
- **Paper:** N/A
- **Demo:** N/A

## Uses

### Direct Use

- Detecting whether a given news article is **real or fake**.
- Integration into fact-checking platforms, misinformation detection systems, and social media moderation tools.

### Downstream Use

- Can be further fine-tuned on domain-specific fake news datasets.
- Useful for media companies, journalists, and researchers studying misinformation.

### Out-of-Scope Use

- This model is **not designed for generating news content**.
- It may not perform well on languages other than English.
- It is not suitable for fact-checking complex claims that require external knowledge.

## Bias, Risks, and Limitations

### Risks

- The model may be biased toward certain topics, sources, or writing styles represented in the training data.
- **False positives** (real news misclassified as fake) and **false negatives** (fake news classified as real) are both possible.
- Performance can degrade on out-of-distribution samples.

### Recommendations

- Users should **not rely solely** on this model to determine truthfulness.
- Combine the model's predictions with **human verification** and **cross-checking** against multiple sources.

## How to Use the Model

You can load the model with `transformers` and run inference as shown below:

```python
from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification
import torch

tokenizer = DistilBertTokenizerFast.from_pretrained("your-model-id")
model = DistilBertForSequenceClassification.from_pretrained("your-model-id")
model.eval()  # inference mode

def predict(text):
    # Tokenize, truncating to DistilBERT's 512-token limit.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    probs = torch.nn.functional.softmax(outputs.logits, dim=-1)
    return "Fake News" if torch.argmax(probs) == 1 else "Real News"

text = "Breaking: Scientists discover a new element!"
print(predict(text))
```
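
The `softmax`/`argmax` step above is the entire decision rule, so it can be sanity-checked in isolation. A minimal sketch with hand-made logits (the two-column layout and the index-1 = fake convention follow the snippet above; the actual label order is a property of the checkpoint and should be confirmed):

```python
import torch

# Hand-made logits for a single example: [real_score, fake_score].
# The label order is an assumption; check model.config.id2label on a real checkpoint.
logits = torch.tensor([[0.5, 2.5]])
probs = torch.nn.functional.softmax(logits, dim=-1)

print(round(probs.sum().item(), 6))  # softmax probabilities sum to 1.0
print(torch.argmax(probs).item())    # 1 -> "Fake News" under the convention above
```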

## Training Details

### Training Data

The model was trained on a dataset of **news articles labeled as real or fake**, drawn from both reputable sources and known misinformation websites.

### Training Procedure

- **Preprocessing:**
  - Tokenization using `DistilBertTokenizerFast`
  - Removal of stop words and punctuation
  - Converting text to lowercase

- **Training Configuration:**
  - **Model:** `distilbert-base-uncased`
  - **Optimizer:** AdamW
  - **Batch size:** 16
  - **Epochs:** 3
  - **Learning rate:** 2e-5
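
The preprocessing steps listed above can be sketched as follows (a minimal illustration; the exact stop-word list used in training is not documented, so a small sample list stands in for it):

```python
import string

# Hypothetical sample; the stop-word list used in training is not documented.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in"}

def preprocess(text):
    # Lowercase the text.
    text = text.lower()
    # Strip punctuation.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Drop stop words.
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens)

print(preprocess("Breaking: Scientists discover a new element!"))
# breaking scientists discover new element
```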
99
+
100
+ ### Compute Resources
101
+
102
+ - **Hardware:** NVIDIA Tesla T4 (Google Colab)
103
+ - **Training Time:** ~2 hours
104
+
105
+ ## Evaluation
106
+
107
+ ### Testing Data
108
+
109
+ - The model was evaluated on a held-out test set of **10,000 news articles**.
110
+
111
+ ### Metrics
112
+
113
+ - **Accuracy:** 92%
114
+ - **F1 Score:** 90%
115
+ - **Precision:** 91%
116
+ - **Recall:** 89%
117
+
118
+ ### Results
119
+
120
+ | Metric | Score |
121
+ |----------|-------|
122
+ | Accuracy | 92% |
123
+ | F1 Score | 90% |
124
+ | Precision | 91% |
125
+ | Recall | 89% |
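
The reported metrics can be computed from model predictions with `scikit-learn` (a sketch on toy labels; the actual test-set predictions are not distributed with the model):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Toy labels for illustration (1 = fake, 0 = real); substitute real test-set outputs.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1]

print(f"Accuracy:  {accuracy_score(y_true, y_pred):.2f}")
print(f"Precision: {precision_score(y_true, y_pred):.2f}")
print(f"Recall:    {recall_score(y_true, y_pred):.2f}")
print(f"F1 score:  {f1_score(y_true, y_pred):.2f}")
```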
126
+
127
+ ## Environmental Impact
128
+
129
+ - **Hardware Used:** NVIDIA Tesla T4
130
+ - **Total Compute Time:** ~2 hours
131
+ - **Carbon Emissions:** Estimated using the [ML Impact Calculator](https://mlco2.github.io/impact#compute)
132
+
133
+ ## Technical Specifications
134
+
135
+ ### Model Architecture
136
+
137
+ - The model is based on **DistilBERT**, a lightweight transformer architecture that reduces computation while retaining accuracy.
138
+
139
+ ### Dependencies
140
+
141
+ - `transformers`
142
+ - `torch`
143
+ - `datasets`
144
+ - `scikit-learn`