---
title: Fake News Detection
emoji: πŸ“°
colorFrom: blue
colorTo: indigo
sdk: streamlit
sdk_version: "1.31.1"
app_file: app.py
pinned: false
---

# Hybrid Fake News Detection Model

A hybrid deep learning model for fake news detection using BERT and BiLSTM with attention mechanism. This project was developed as part of the Data Mining Laboratory course under the guidance of Dr. Kirti Kumari.

## Project Overview

This project implements a fake news detection system that combines BERT (Bidirectional Encoder Representations from Transformers) with a BiLSTM (Bidirectional Long Short-Term Memory) and an attention mechanism. The model identifies fake news articles by analyzing their textual content and linguistic patterns.

## Data and Model Files

The project uses the following datasets and model files:

### Datasets
- Raw and processed datasets are available at: [Data Files](https://drive.google.com/drive/folders/1uFtWVEjqupSGV7_6sYAxPG52Je1MAigh?usp=sharing)
  - Contains both raw and processed versions of the datasets
  - Includes LIAR and Kaggle Fake News datasets
  - Preprocessed versions ready for training

### Model Files
- Trained model checkpoints are available at: [Model Files](https://drive.google.com/drive/folders/1d1EXjLlYof56yEa9F6qFDPKqO359vnRw?usp=sharing)
  - Contains saved model weights
  - Includes best model checkpoints
  - Model evaluation results

## Project Structure

```
.
β”œβ”€β”€ data/
β”‚   β”œβ”€β”€ raw/           # Raw datasets
β”‚   └── processed/     # Processed data
β”œβ”€β”€ models/
β”‚   β”œβ”€β”€ saved/        # Saved model checkpoints
β”‚   └── checkpoints/  # Training checkpoints
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ config/       # Configuration files
β”‚   β”œβ”€β”€ data/         # Data processing modules
β”‚   β”œβ”€β”€ models/       # Model architecture
β”‚   β”œβ”€β”€ utils/        # Utility functions
β”‚   └── visualization/# Visualization modules
β”œβ”€β”€ tests/            # Unit tests
β”œβ”€β”€ notebooks/        # Jupyter notebooks
└── visualizations/   # Generated plots and graphs
```

## Features

- Hybrid architecture combining BERT and BiLSTM
- Attention mechanism for better interpretability
- Comprehensive text preprocessing pipeline
- Support for multiple feature extraction methods
- Early stopping and model checkpointing
- Detailed evaluation metrics
- Interactive visualizations of model performance
- Support for multiple datasets (LIAR, Kaggle Fake News)
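
Among the features above, the early-stopping criterion is simple enough to sketch directly. The following is an illustrative patience check over per-epoch validation losses, not the project's actual implementation:

```python
def should_stop(val_losses, patience=3):
    """Return True when the best validation loss is at least `patience` epochs old."""
    if not val_losses:
        return False
    best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__)
    return (len(val_losses) - 1) - best_epoch >= patience

# Best loss was at epoch 1; three epochs have passed without improvement.
print(should_stop([0.90, 0.80, 0.85, 0.86, 0.87], patience=3))  # True
```

Checkpointing typically pairs with this: the model is saved whenever `best_epoch` advances, so the checkpoint on disk always corresponds to the lowest validation loss.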

## Installation

1. Clone the repository:
```bash
git clone https://github.com/yourusername/fake-news-detection.git
cd fake-news-detection
```

2. Create a virtual environment:
```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

3. Install dependencies:
```bash
pip install -r requirements.txt
```

## Usage

1. Download the required files:
   - Download datasets from the [Data Files](https://drive.google.com/drive/folders/1uFtWVEjqupSGV7_6sYAxPG52Je1MAigh?usp=sharing) link
   - Download pre-trained models from the [Model Files](https://drive.google.com/drive/folders/1d1EXjLlYof56yEa9F6qFDPKqO359vnRw?usp=sharing) link
   - Place the files in their respective directories as shown in the project structure

2. Prepare your dataset:
   - Place your dataset in the `data/raw` directory
   - The dataset should have at least two columns: 'text' and 'label'
   - Supported formats: CSV, TSV
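
A quick way to confirm a file matches the expected layout before training is to check for the required columns. This is a standalone sketch using only the standard library (the training code itself may validate differently):

```python
import csv
import io

def validate_dataset(fh, delimiter=","):
    """Check that a CSV/TSV stream exposes the required 'text' and 'label' columns.

    Returns the number of data rows; raises ValueError if a column is missing.
    Use delimiter="\t" for TSV files.
    """
    reader = csv.DictReader(fh, delimiter=delimiter)
    missing = {"text", "label"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing required column(s): {sorted(missing)}")
    return sum(1 for _ in reader)

sample = io.StringIO("text,label\nSome headline,1\nAnother headline,0\n")
print(validate_dataset(sample))  # 2
```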

3. Train the model:
```bash
python src/train.py
```

4. Model evaluation metrics and visualizations will be generated in the `visualizations` directory

## Model Architecture

The model combines:
- BERT for contextual embeddings
- BiLSTM for sequence modeling
- Attention mechanism for weighting the most informative tokens
- Classification head for final prediction

### Key Components:
- **BERT Layer**: Extracts contextual word embeddings
- **BiLSTM Layer**: Captures sequential patterns
- **Attention Layer**: Identifies important text segments
- **Classification Head**: Makes final prediction
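
To make the attention step concrete, here is a minimal NumPy sketch of additive attention pooling over BiLSTM outputs. The shapes and parameter names (`W`, `v`) are illustrative, not the project's actual code: each timestep t is scored as v · tanh(W h_t), the scores are softmax-normalized into weights, and the weighted sum of hidden states forms the context vector passed to the classification head.

```python
import numpy as np

def additive_attention(hidden, W, v):
    """Pool BiLSTM hidden states into a single context vector.

    hidden: (seq_len, dim) array of per-token hidden states
    W:      (dim, dim) projection matrix
    v:      (dim,) scoring vector
    Returns (context, weights) where weights sum to 1 over the sequence.
    """
    scores = np.tanh(hidden @ W) @ v        # (seq_len,) one score per token
    scores -= scores.max()                  # subtract max for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    context = weights @ hidden              # (dim,) weighted sum of hidden states
    return context, weights
```

The returned weights double as an interpretability signal: tokens with high attention weight are the ones the classifier leaned on.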

## Configuration

Key parameters can be modified in `src/config/config.py`:
- Model hyperparameters
- Training parameters
- Data processing settings
- Feature extraction options
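
A configuration module of this kind is often a flat set of constants or a dataclass. The sketch below uses hypothetical names and values to illustrate the shape; the actual fields in `src/config/config.py` may differ:

```python
from dataclasses import dataclass

@dataclass
class Config:
    # Model hyperparameters (illustrative values, not the project's defaults)
    bert_model: str = "bert-base-uncased"
    lstm_hidden: int = 128
    dropout: float = 0.3
    # Training parameters
    batch_size: int = 32
    learning_rate: float = 2e-5
    epochs: int = 10
    patience: int = 3          # early-stopping patience
    # Data processing settings
    max_seq_len: int = 256

config = Config()
```

Keeping every tunable in one dataclass makes experiments reproducible: overriding a field (e.g. `Config(batch_size=16)`) is explicit and self-documenting.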

## Performance Metrics

The model is evaluated using:
- Accuracy
- Precision
- Recall
- F1 Score
- Confusion Matrix
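
All five metrics derive from the four confusion-matrix counts (TP, TN, FP, FN). A dependency-free sketch of the computation for binary labels:

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, F1, and confusion matrix for 0/1 labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "confusion": [[tn, fp], [fn, tp]],  # rows: true class, cols: predicted
    }
```

In practice `sklearn.metrics` provides the same numbers; the point here is only to show how each metric relates to the confusion counts.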

## Future Improvements

- [ ] Add support for image/video metadata
- [ ] Implement real-time detection
- [ ] Add social graph analysis
- [ ] Improve model interpretability
- [ ] Add API endpoints for inference
- [ ] Support for multilingual fake news detection
- [ ] Integration with fact-checking databases

## Acknowledgments

I would like to express my sincere gratitude to **Dr. Kirti Kumari** for her invaluable guidance and support throughout the development of this project. Her expertise in data mining and machine learning has been instrumental in shaping this work.

Special thanks to:
- Open-source community for their excellent tools and libraries
- Dataset providers (LIAR, Kaggle)

## Contributing

1. Fork the repository
2. Create a feature branch
3. Commit your changes
4. Push to the branch
5. Create a Pull Request

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Contact

For any queries or suggestions, please feel free to reach out to me.