---
library_name: transformers
tags: [fake-news-detection, NLP, classification, transformers, DistilBERT]
---
# Model Card for Fake News Detection Model
## Model Summary
This is a fine-tuned DistilBERT model for **fake news detection**. It classifies news articles as either **real** or **fake** based on their textual content. The model was trained on a labeled dataset of real and fake news articles collected from various sources.
## Model Details
### Model Description
- **Finetuned from:** `distilbert-base-uncased`
- **Language:** English
- **Model type:** Transformer-based text classification model
- **License:** MIT
- **Intended Use:** Fake news detection on social media and news websites
### Model Sources
- **Repository:** [Hugging Face Model Hub](https://huggingface.co/your-model-id)
- **Paper (if applicable):** N/A
- **Demo (if applicable):** N/A
## Uses
### Direct Use
- This model can be used to detect whether a given news article is **real or fake**.
- It can be integrated into fact-checking platforms, misinformation detection systems, and social media moderation tools.
### Downstream Use
- Can be further fine-tuned on domain-specific fake news datasets.
- Useful for media companies, journalists, and researchers studying misinformation.
### Out-of-Scope Use
- This model is **not designed for generating news content**.
- It may not work well for languages other than English.
- Not suitable for fact-checking complex claims requiring external knowledge.
## Bias, Risks, and Limitations
### Risks
- The model may be biased towards certain topics, sources, or writing styles based on the dataset used for training.
- There is a possibility of **false positives (real news misclassified as fake)** or **false negatives (fake news classified as real)**.
- Model performance can degrade on out-of-distribution samples.
### Recommendations
- Users should **not rely solely** on this model for determining truthfulness.
- It is recommended to **use human verification** and **cross-check information** from multiple sources.
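One way to operationalize these recommendations is to defer low-confidence predictions to a human reviewer instead of auto-labeling them. A minimal sketch of such routing logic, assuming label 1 means fake (as in the inference example below); the 0.9 threshold is an illustrative assumption, not a tuned value:

```python
def route_prediction(probs, threshold=0.9):
    """Given class probabilities [p_real, p_fake], return a label,
    or defer to a human reviewer when confidence is below the threshold.
    The 0.9 default is an illustrative assumption, not a tuned value."""
    confidence = max(probs)
    if confidence < threshold:
        return "needs-human-review"
    return "fake" if probs.index(confidence) == 1 else "real"
```

Articles routed to `"needs-human-review"` can then be queued for manual fact-checking and cross-referencing against other sources.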
## How to Use the Model
You can load the model using `transformers` and use it for inference as shown below:
```python
from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification
import torch

tokenizer = DistilBertTokenizerFast.from_pretrained("your-model-id")
model = DistilBertForSequenceClassification.from_pretrained("your-model-id")
model.eval()

def predict(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    probs = torch.nn.functional.softmax(outputs.logits, dim=-1)
    return "Fake News" if torch.argmax(probs).item() == 1 else "Real News"

text = "Breaking: Scientists discover a new element!"
print(predict(text))
```
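For scoring many articles at once, it is usually more efficient to batch inputs through the tokenizer and model. A sketch, assuming `model` and `tokenizer` are the objects loaded above; the batch size and the 0.5 decision threshold are illustrative assumptions:

```python
import torch

def predict_batch(texts, model, tokenizer, batch_size=32):
    """Classify a list of articles; returns (label, fake_probability) pairs."""
    results = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        inputs = tokenizer(batch, return_tensors="pt", truncation=True,
                           padding=True, max_length=512)
        with torch.no_grad():
            probs = torch.softmax(model(**inputs).logits, dim=-1)
        # Column 1 holds the probability of the "fake" class.
        for p_fake in probs[:, 1].tolist():
            results.append(("Fake News" if p_fake >= 0.5 else "Real News", p_fake))
    return results
```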
## Training Details
### Training Data
The model was trained on a dataset consisting of **news articles labeled as real or fake**. The dataset includes information from reputable sources and misinformation websites.
### Training Procedure
- **Preprocessing:**
- Tokenization using `DistilBertTokenizerFast`
- Removal of stop words and punctuation
- Converting text to lowercase
- **Training Configuration:**
- **Model:** `distilbert-base-uncased`
- **Optimizer:** AdamW
- **Batch size:** 16
- **Epochs:** 3
- **Learning rate:** 2e-5
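As a rough sketch, the configuration above corresponds to a standard PyTorch fine-tuning loop. The helper below is illustrative, not the card's actual training script; `model` and `train_loader` stand in for the DistilBERT classifier and a `DataLoader` over the tokenized dataset:

```python
import torch

def fine_tune(model, train_loader, epochs=3, lr=2e-5):
    """Fine-tune with AdamW at the hyperparameters listed above.
    Assumes each batch dict includes `labels`, so the model returns a loss."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for batch in train_loader:
            optimizer.zero_grad()
            loss = model(**batch).loss  # cross-entropy computed internally
            loss.backward()
            optimizer.step()
    return model
```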
### Compute Resources
- **Hardware:** NVIDIA Tesla T4 (Google Colab)
- **Training Time:** ~2 hours
## Evaluation
### Testing Data
- The model was evaluated on a held-out test set of **10,000 news articles**.
### Metrics
Performance was measured using **accuracy, precision, recall, and F1 score**; the scores on the held-out test set are tabulated under Results below.
### Results
| Metric | Score |
|----------|-------|
| Accuracy | 92% |
| F1 Score | 90% |
| Precision | 91% |
| Recall | 89% |
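Metrics like these can be computed with `scikit-learn` (listed under Dependencies). A sketch with illustrative label arrays, where 1 denotes fake:

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Illustrative ground-truth and predicted labels (1 = fake, 0 = real),
# not the actual test-set outputs behind the table above.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1]

print(f"Accuracy:  {accuracy_score(y_true, y_pred):.2f}")
print(f"Precision: {precision_score(y_true, y_pred):.2f}")
print(f"Recall:    {recall_score(y_true, y_pred):.2f}")
print(f"F1 Score:  {f1_score(y_true, y_pred):.2f}")
```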
## Environmental Impact
- **Hardware Used:** NVIDIA Tesla T4
- **Total Compute Time:** ~2 hours
- **Carbon Emissions:** Estimated using the [ML Impact Calculator](https://mlco2.github.io/impact#compute)
## Technical Specifications
### Model Architecture
- The model is based on **DistilBERT**, a lightweight transformer architecture that reduces computation while retaining accuracy.
### Dependencies
- `transformers`
- `torch`
- `datasets`
- `scikit-learn`