Spaces:
Running
Running
| title: TrueCheck - Fake News Detection | |
| emoji: π° | |
| colorFrom: red | |
| colorTo: blue | |
| sdk: streamlit | |
| sdk_version: 1.28.1 | |
| app_file: app.py | |
| pinned: false | |
| license: mit | |
| # TruthCheck: Fake News Detection with Fine-Tuned BERT | |
| TruthCheck is an advanced fake news detection system leveraging a hybrid deep learning architecture. It combines a pre-trained BERT-base-uncased model with a BiLSTM and attention mechanism, fully fine-tuned on a curated dataset of real and fake news. The project includes robust preprocessing, feature extraction, model training, evaluation, and a Streamlit web app for interactive predictions. | |
| --- | |
| ## π Features | |
| - **Hybrid Model:** BERT-base-uncased + BiLSTM + Attention | |
| - **Full Fine-Tuning:** All layers of BERT and additional layers are trainable and optimized on the fake news dataset | |
| - **Comprehensive Preprocessing:** Cleaning, tokenization, lemmatization, and more | |
| - **Training & Evaluation:** Scripts for training, validation, and test evaluation | |
| - **Interactive App:** Streamlit web app for real-time news classification | |
| - **Ready for Deployment:** Easily extendable for research or production | |
| --- | |
| ## π§ Model Details | |
| - **Base Model:** [BERT-base-uncased](https://huggingface.co/bert-base-uncased) | |
| - **Architecture:** | |
| - BERT encoder (pre-trained, all layers fine-tuned) | |
| - BiLSTM layer for sequential context | |
| - Attention mechanism for interpretability | |
| - Fully connected classification head | |
| - **Fine-Tuning Technique:** | |
| - All BERT layers are unfrozen and updated during training (full fine-tuning) | |
| - Additional layers (BiLSTM, attention, classifier) are trained from scratch | |
| --- | |
| ## π₯ Download Data and Model | |
| **Raw and Processed Datasets:** | |
| [Google Drive Link](https://drive.google.com/drive/folders/1tAhWhhhDes5uCdcnMLmJdFBSGWFFl55M?usp=sharing) | |
| **Trained Model(s):** | |
| [Google Drive Link](https://drive.google.com/drive/folders/1VEFa0y_vW6AzT5x0fRwmX8shoBhUGd7K?usp=sharing) | |
| ### **Instructions:** | |
| 1. Download the datasets and place them in the `data/` directory: | |
| - `data/raw/` for raw files | |
| - `data/processed/` for processed files | |
| 2. Download the trained model (e.g., `final_model.pt` or `best_model.pt`) and place it in `models/saved/`. | |
| --- | |
| ## βοΈ Setup | |
| 1. **Clone the repository:** | |
| ```bash | |
| git clone https://github.com/adnaan-tariq/fake-news-detection.git | |
| cd fake-news-detection | |
| ``` | |
| 2. **Create and activate a virtual environment:** | |
| ```bash | |
| python -m venv venv | |
| .\venv\Scripts\activate | |
| ``` | |
| 3. **Install dependencies:** | |
| ```bash | |
| pip install --upgrade pip | |
| pip install -r requirements.txt | |
| ``` | |
| --- | |
| ## πββοΈ Usage | |
| ### **Train the Model** | |
| If you want to train from scratch (after placing the data as described above): | |
| ```bash | |
| python -m src.train | |
| ``` | |
| ### **Run the Streamlit App** | |
| ```bash | |
| streamlit run app.py | |
| ``` | |
| - Open [http://localhost:8501](http://localhost:8501) in your browser. | |
| ### **Test the Model** | |
| - The app and scripts will use the model in `models/saved/final_model.pt` by default. | |
| - For custom inference, see the example in `src/app.py` or ask for a sample script. | |
| --- | |
| ## π Results | |
| - **Validation Accuracy:** ~93% | |
| - **Validation F1 Score:** ~0.93 | |
| - (See training logs and visualizations for more details.) | |
| --- | |
| ## π¦ Data & Model Policy | |
| - **Data and model files are NOT included in this repository.** | |
| - Please download them from the provided Google Drive links above. | |
| ## π€ Contributing | |
| Pull requests and suggestions are welcome! For major changes, please open an issue first to discuss what you would like to change. | |
| --- | |
| ## π License | |
| This project is licensed under the MIT License. | |
| --- | |