---
library_name: transformers
tags: [fake-news-detection, NLP, classification, transformers, DistilBERT]
---

# Model Card for Fake News Detection Model

## Model Summary

This is a fine-tuned DistilBERT model for fake news detection. It classifies news articles as either real or fake based on textual content. The model has been trained on a labeled dataset consisting of true and false news articles collected from various sources.

## Model Details

### Model Description

- Fine-tuned from: `distilbert-base-uncased`
- Language: English
- Model type: Transformer-based text classification model
- License: MIT
- Intended Use: Fake news detection on social media and news websites

### Model Sources

- Repository: [Hugging Face Model Hub](https://huggingface.co/your-model-id)
- Paper (if applicable): N/A
- Demo (if applicable): N/A

## Uses

### Direct Use

- This model can be used to detect whether a given news article is real or fake.
- It can be integrated into fact-checking platforms, misinformation detection systems, and social media moderation tools.

### Downstream Use

- Can be further fine-tuned on domain-specific fake news datasets.
- Useful for media companies, journalists, and researchers studying misinformation.

### Out-of-Scope Use

- This model is not designed for generating news content.
- It may not work well for languages other than English.
- Not suitable for fact-checking complex claims that require external knowledge; the model judges only the text it is given.

## Bias, Risks, and Limitations

### Risks

- The model may be biased towards certain topics, sources, or writing styles based on the dataset used for training.
- There is a possibility of false positives (real news misclassified as fake) or false negatives (fake news classified as real).
- Model performance can degrade on out-of-distribution samples.

### Recommendations

- Users should not rely solely on this model for determining truthfulness.
- It is recommended to use human verification and cross-check information from multiple sources.
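One way to follow these recommendations in a moderation pipeline is to treat the model's output as a triage signal rather than a verdict. The sketch below assumes you already have the softmax probability of label index 1, which the usage example in this card treats as fake. The function name `route`, the action names, and the 0.9 / 0.1 thresholds are all hypothetical and would need tuning on held-out data.

```python
# Hypothetical thresholds; tune them on held-out data for your own use case.
FAKE_THRESHOLD = 0.9
REAL_THRESHOLD = 0.1


def route(prob_fake: float,
          fake_threshold: float = FAKE_THRESHOLD,
          real_threshold: float = REAL_THRESHOLD) -> str:
    """Turn the model's fake-class probability into a triage decision."""
    if not 0.0 <= prob_fake <= 1.0:
        raise ValueError("prob_fake must lie in [0, 1]")
    if prob_fake >= fake_threshold:
        return "likely_fake"   # prioritize for human fact-checking
    if prob_fake <= real_threshold:
        return "likely_real"   # lower priority, not proof of accuracy
    return "human_review"      # model is uncertain; defer to a person


for p in (0.97, 0.55, 0.03):
    print(p, route(p))
```

Even "likely_fake" items should still be verified by a person; the thresholds only decide what gets reviewed first.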

## How to Use the Model

You can load the model using `transformers` and use it for inference as shown below:

```python
from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification
import torch

tokenizer = DistilBertTokenizerFast.from_pretrained("your-model-id")
model = DistilBertForSequenceClassification.from_pretrained("your-model-id")
model.eval()  # inference mode: disables dropout

def predict(text):
    # Tokenize, truncating to DistilBERT's 512-token input limit
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
    with torch.no_grad():  # gradients are not needed for inference
        outputs = model(**inputs)
    probs = torch.nn.functional.softmax(outputs.logits, dim=-1)
    # Label index 1 is treated as fake news, index 0 as real news
    return "Fake News" if torch.argmax(probs, dim=-1).item() == 1 else "Real News"

text = "Breaking: Scientists discover a new element!"
print(predict(text))
```
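The `predict` function above applies softmax before taking the argmax. Because softmax is monotonic, the argmax is the same on the raw logits; the probabilities are only needed if you also want a confidence score. The dependency-free sketch below mirrors that post-processing for one example's logits, using the same index convention as `predict` (1 = fake, 0 = real). The logits `[0.3, 2.1]` are made up, and `interpret` and `LABELS` are illustrative names, not part of the model's API.

```python
import math

# Mirrors the index convention used in predict(): 1 = fake, 0 = real
LABELS = {0: "Real News", 1: "Fake News"}


def softmax(logits):
    # Subtract the largest logit first so exp() cannot overflow
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def interpret(logits):
    """Return (label, probability of that label) for one example's logits."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[idx], probs[idx]


label, confidence = interpret([0.3, 2.1])
print(label, round(confidence, 3))  # prints: Fake News 0.858
```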

## Training Details

### Training Data

The model was trained on a dataset of news articles labeled as real or fake, drawn from both reputable news sources and misinformation websites.

### Training Procedure

- Preprocessing:
  - Tokenization using `DistilBertTokenizerFast`
  - Removal of stop words and punctuation
  - Conversion of text to lowercase

- Training Configuration:
  - Model: `distilbert-base-uncased`
  - Optimizer: AdamW
  - Batch size: 16
  - Epochs: 3
  - Learning rate: 2e-5
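The stop-word list and the exact order of the preprocessing steps are not published. As a minimal sketch, assuming the cleaning runs on the raw string before it reaches the tokenizer, the steps could look like the following; `STOP_WORDS` is a small hypothetical list and `preprocess` is an illustrative name. The `distilbert-base-uncased` tokenizer already lowercases its input, so the explicit lowercasing here mainly matters for the stop-word lookup.

```python
import string

# Hypothetical stop-word list; the one used in training is not specified.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "on"}

# Translation table that deletes every ASCII punctuation character
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text: str) -> str:
    """Lowercase, strip ASCII punctuation, and drop stop words."""
    text = text.lower().translate(_PUNCT_TABLE)
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    return " ".join(tokens)


print(preprocess("Breaking: Scientists discover a new element!"))
# prints: breaking scientists discover new element
```

If the training text was cleaned this way, running the same function on inputs before `predict` would keep inference inputs consistent with training.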

### Compute Resources

- Hardware: NVIDIA Tesla T4 (Google Colab)
- Training Time: ~2 hours

## Evaluation

### Testing Data

- The model was evaluated on a held-out test set of 10,000 news articles.

### Metrics

- Accuracy: 92%
- F1 Score: 90%
- Precision: 91%
- Recall: 89%
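These metrics all derive from the counts of true positives, false positives, and false negatives. The dependency-free sketch below assumes a binary task with fake (label 1) as the positive class; which class and averaging scheme the reported figures use is not stated. scikit-learn, listed under the dependencies, provides the same quantities through `sklearn.metrics.accuracy_score`, `precision_score`, `recall_score`, and `f1_score`. The last lines check that the reported numbers are mutually consistent: F1 is the harmonic mean of precision and recall.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 with `positive` (fake) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    if not pairs:
        raise ValueError("need at least one prediction")
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


print(binary_metrics([1, 0, 1, 1], [1, 0, 0, 1]))

# Reported precision 0.91 and recall 0.89 imply an F1 of about 0.90
p, r = 0.91, 0.89
print(round(2 * p * r / (p + r), 2))  # prints: 0.9
```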

### Results

| Metric    | Score |
|-----------|-------|
| Accuracy  | 92%   |
| F1 Score  | 90%   |
| Precision | 91%   |
| Recall    | 89%   |
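The size of the test set also bounds how precise the reported accuracy is. As a sketch, assuming the 10,000 test articles are independent and using the normal-approximation (Wald) interval, a 95% confidence interval around 92% accuracy spans roughly ±0.5 percentage points. `wald_interval` is an illustrative helper; for accuracies close to 0 or 1, a Wilson interval is the safer choice.

```python
import math


def wald_interval(accuracy: float, n: int, z: float = 1.96):
    """Normal-approximation confidence interval for an accuracy measured on n examples."""
    half_width = z * math.sqrt(accuracy * (1 - accuracy) / n)
    return accuracy - half_width, accuracy + half_width


low, high = wald_interval(0.92, 10_000)
print(f"{low:.4f} to {high:.4f}")  # prints: 0.9147 to 0.9253
```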

## Environmental Impact

- Hardware Used: NVIDIA Tesla T4
- Total Compute Time: ~2 hours
- Carbon Emissions: Estimated using the [ML Impact Calculator](https://mlco2.github.io/impact#compute)
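As an illustration of the arithmetic behind such estimates (energy drawn, multiplied by the carbon intensity of the electricity grid), the sketch below combines the T4's 70 W rated board power with the ~2 hours of compute listed above. The grid intensity of 0.4 kg CO2e/kWh is a placeholder assumption because the Colab region is not stated, and the sketch ignores CPU, memory, and data-center overhead, so its output is not a measured figure for this model. `estimate_emissions_kg` is an illustrative name.

```python
def estimate_emissions_kg(power_watts: float, hours: float, kg_co2e_per_kwh: float) -> float:
    """Energy in kWh multiplied by grid carbon intensity (kg CO2e per kWh)."""
    energy_kwh = power_watts * hours / 1000
    return energy_kwh * kg_co2e_per_kwh


# 70 W: T4 rated board power; 0.4 kg CO2e/kWh: placeholder grid intensity
print(round(estimate_emissions_kg(70, 2, 0.4), 3))  # prints: 0.056
```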

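## Technical Specifications

### Model Architecture

- The model is based on DistilBERT, a distilled version of BERT with 6 Transformer layers instead of 12. According to its authors, DistilBERT is about 40% smaller and 60% faster than BERT-base while retaining about 97% of its language-understanding performance.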
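To put the model's size in numbers, the sketch below counts parameters from the default `distilbert-base-uncased` configuration (30,522-token vocabulary, 512 positions, hidden size 768, feed-forward size 3,072, 6 layers) and adds the two-layer classification head that `DistilBertForSequenceClassification` places on top for two labels. The helper names are illustrative; with the `transformers` model loaded, `sum(p.numel() for p in model.parameters())` should give the same total.

```python
def linear(n_in: int, n_out: int) -> int:
    return n_in * n_out + n_out  # weight matrix + bias vector


def distilbert_params(vocab=30522, max_pos=512, dim=768, hidden=3072, layers=6):
    layer_norm = 2 * dim  # scale and shift vectors
    # Token + position embeddings, then a LayerNorm (no token-type embeddings)
    embeddings = vocab * dim + max_pos * dim + layer_norm
    attention = 4 * linear(dim, dim)  # query, key, value, output projections
    ffn = linear(dim, hidden) + linear(hidden, dim)
    block = attention + layer_norm + ffn + layer_norm
    return embeddings + layers * block


base = distilbert_params()
# Classification head: 768x768 pre-classifier plus 768x2 classifier
with_head = base + linear(768, 768) + linear(768, 2)
print(base, with_head)  # prints: 66362880 66955010
```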

### Dependencies

- `transformers`
- `torch`
- `datasets`
- `scikit-learn`
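Note that `scikit-learn` is installed under that name but imported as `sklearn`. A small, hypothetical helper like the one below can confirm that an environment has every dependency before running the examples; it only checks that each module can be found, not its version.

```python
import importlib.util

# Map each pip package name to the module name it is imported as
REQUIRED = {
    "transformers": "transformers",
    "torch": "torch",
    "datasets": "datasets",
    "scikit-learn": "sklearn",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose top-level module cannot be found."""
    return [pkg for pkg, module in required.items()
            if importlib.util.find_spec(module) is None]


missing = missing_packages()
print("Missing:", ", ".join(missing) if missing else "none")
```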