---
title: TruthLens
emoji: π
colorFrom: blue
colorTo: indigo
sdk: streamlit
sdk_version: 1.31.0
python_version: 3.10.13
app_file: app.py
pinned: false
---
# TruthLens: Advanced Fake News Detection Pipeline
TruthLens is an end-to-end fake news detection system that moves beyond simple machine learning probabilities. It employs a robust **5-signal weighted scoring framework** built on journalistic standards, combining deep learning models (DistilBERT, RoBERTa), sequence models (LSTM), statistical models (Logistic Regression), and heuristic analysis to deliver explainable verdicts.
## Key Features
* **5-Signal Scoring Framework:**
* **Source Credibility (30%):** Evaluates outlet reputation, author presence, and source corroboration, including typosquatting checks.
* **Claim Verification (30%):** Combines AI probability with spaCy-based Named Entity Recognition (NER) and quote attribution analysis.
* **Linguistic Quality (20%):** Detects sensationalism, superlatives, passive voice, and uses DistilBERT to check if the headline contradicts the body.
* **Freshness (10%):** Contextual and date-based temporal scoring to detect outdated information.
* **AI Model Consensus (10%):** Ensemble voting from Logistic Regression, LSTM, DistilBERT, and RoBERTa.
* **Adversarial Guardrails:** Hard caps and overrides for highly suspicious patterns (Triple Anonymity, Uncited Statistics, Headline Contradictions).
* **Live Web Corroboration:** RAG (Retrieval-Augmented Generation) pipeline using live search to verify unambiguous claims.
* **TruthLens UI:** A sleek, dark/light mode adaptable Streamlit dashboard providing detailed explainability down to the specific signals and deductions.
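The weighted combination behind the 5-signal framework can be sketched in a few lines. The weights below come straight from the list above; the signal values, function name, and scale (each signal normalized to [0, 1]) are illustrative assumptions, not the actual TruthLens implementation.

```python
# Sketch of the 5-signal weighted score. Weights match the README;
# everything else here is an illustrative assumption.
SIGNAL_WEIGHTS = {
    "source_credibility": 0.30,
    "claim_verification": 0.30,
    "linguistic_quality": 0.20,
    "freshness": 0.10,
    "model_consensus": 0.10,
}

def weighted_score(signals: dict) -> float:
    """Combine per-signal scores (each assumed in [0, 1]) into one score."""
    return sum(SIGNAL_WEIGHTS[name] * signals[name] for name in SIGNAL_WEIGHTS)

example = {
    "source_credibility": 0.8,
    "claim_verification": 0.6,
    "linguistic_quality": 0.9,
    "freshness": 1.0,
    "model_consensus": 0.7,
}
print(round(weighted_score(example), 2))  # 0.77
```

The adversarial guardrails described above would then act on top of this score, e.g. capping it when a hard-override pattern fires.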
---
## Project Structure
```text
fake_news_detection/
├── app.py                      # Streamlit frontend (TruthLens UI)
├── run_pipeline.py             # Main script to run pipeline stages
├── requirements.txt            # Python dependencies
├── src/
│   ├── stage1_ingestion.py     # Downloads and prepares datasets
│   ├── stage2_preprocessing.py # Cleans text, tokenizes, and saves artifacts
│   ├── stage3_training.py      # Trains models (LR, LSTM, DistilBERT, RoBERTa)
│   ├── stage4_inference.py     # The 5-signal scoring engine and prediction logic
│   └── utils/
│       └── rag_retrieval.py    # Live web search corroboration functions
├── data/                       # Raw and processed datasets (created during execution)
└── models/                     # Trained models and vectorizers (created during execution)
```
---
## Getting Started
### 1. Installation
Ensure you have Python 3.8 or newer installed (this Space is pinned to 3.10.13). Install the required dependencies:
```bash
pip install -r requirements.txt
python -m spacy download en_core_web_sm
```
### 2. Running the Pipeline
The project is divided into stages. You can run the entire pipeline end-to-end, or run specific stages individually using `run_pipeline.py`.
**To run the complete training pipeline (Stages 1 to 3):**
*Note: This will download datasets, preprocess them, and train all models. It may take a significant amount of time depending on your hardware.*
```bash
python run_pipeline.py --stage 1 2 3
```
**To run individual stages:**
* **Stage 1: Data Ingestion**
Downloads and formats the necessary datasets (e.g., LIAR, ISOT).
```bash
python run_pipeline.py --stage 1
```
* **Stage 2: Preprocessing**
Cleans the text, maps verdicts to binary labels, and prepares DataFrames for training.
```bash
python run_pipeline.py --stage 2
```
* **Stage 3: Training**
Trains the ensemble: Logistic Regression, LSTM, DistilBERT, and RoBERTa. Saves the models to the `/models` directory.
```bash
python run_pipeline.py --stage 3
```
* **Stage 4: Evaluation**
Evaluates the trained pipeline on the holdout test set using the 5-signal inference framework.
```bash
python run_pipeline.py --eval
```
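The `--stage` and `--eval` flags used above suggest a CLI along the following lines. This parser is a minimal sketch of how such a script might read its arguments, not the actual code in `run_pipeline.py`.

```python
# Illustrative sketch of a CLI matching the flags shown above;
# the real run_pipeline.py may differ.
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="TruthLens pipeline runner")
    parser.add_argument("--stage", type=int, nargs="+", choices=[1, 2, 3],
                        help="pipeline stage(s) to run, in order")
    parser.add_argument("--eval", action="store_true",
                        help="evaluate the trained pipeline on the holdout set")
    return parser

args = build_parser().parse_args(["--stage", "1", "2", "3"])
print(args.stage)  # [1, 2, 3]
print(args.eval)   # False
```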
---
## Running the Application
Once the models are trained (or if you already have the pre-trained weights in the `/models` directory), you can launch the TruthLens UI.
```bash
python -m streamlit run app.py
```
This will start a local web server (usually at `http://localhost:8501`).
### Using the App
1. **Paste text or provide a URL:** You can paste the raw text of an article (with or without a headline) or simply provide a URL for the app to parse automatically.
2. **Select depth:** Choose Quick, Standard, or Deep analysis.
3. **View Results:** Explore the four-tier verdict (True, Uncertain, Likely False, False), signal breakdown, adversarial flags, and live web corroboration results.
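The four-tier verdict can be pictured as a simple threshold mapping over the combined score. The tier names come from the list above; the numeric cut-offs below are illustrative assumptions, not TruthLens's actual thresholds.

```python
# Sketch of a four-tier verdict mapping. Tier names match the README;
# the thresholds are assumptions for illustration only.
def verdict(score: float) -> str:
    """Map a combined credibility score in [0, 1] to a four-tier verdict."""
    if score >= 0.75:
        return "True"
    if score >= 0.50:
        return "Uncertain"
    if score >= 0.25:
        return "Likely False"
    return "False"

print(verdict(0.82))  # True
print(verdict(0.30))  # Likely False
```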