
title: TruthLens
emoji: 🔍
colorFrom: blue
colorTo: indigo
sdk: streamlit
sdk_version: 1.31.0
python_version: 3.10.13
app_file: app.py
pinned: false

TruthLens: Advanced Fake News Detection Pipeline

TruthLens is an end-to-end fake news detection system that goes beyond reporting a single classifier probability. It uses a 5-signal weighted scoring framework built on journalistic standards, combining transformer models (DistilBERT, RoBERTa), a recurrent sequence model (LSTM), a linear statistical model (Logistic Regression), and heuristic analysis to deliver explainable verdicts.

🌟 Key Features

- 5-Signal Scoring Framework:
  - Source Credibility (30%): Evaluates outlet reputation, author presence, and source corroboration, including typosquatting checks.
  - Claim Verification (30%): Combines AI probability with spaCy-based Named Entity Recognition (NER) and quote attribution analysis.
  - Linguistic Quality (20%): Detects sensationalism, superlatives, passive voice, and uses DistilBERT to check if the headline contradicts the body.
  - Freshness (10%): Contextual and date-based temporal scoring to detect outdated information.
  - AI Model Consensus (10%): Ensemble voting from Logistic Regression, LSTM, DistilBERT, and RoBERTa.
- Adversarial Guardrails: Hard caps and overrides for highly suspicious patterns (Triple Anonymity, Uncited Statistics, Headline Contradictions).
- Live Web Corroboration: RAG (Retrieval-Augmented Generation) pipeline using live search to verify unambiguous claims.
- TruthLens UI: A sleek, dark/light mode adaptable Streamlit dashboard providing detailed explainability down to the specific signals and deductions.
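
The weighting and hard caps described in this list can be sketched as follows. This is a minimal illustration, not the code in stage4_inference.py. It assumes each signal has already been scored on a 0-100 scale, and the names `SIGNAL_WEIGHTS`, `GUARDRAIL_CAPS`, `weighted_score`, and `apply_guardrails`, as well as the cap values, are hypothetical. Only the percentages and the guardrail names come from the feature list.

```python
from typing import Dict, Iterable

# Weights taken from the feature list; they sum to 1.0.
SIGNAL_WEIGHTS: Dict[str, float] = {
    "source_credibility": 0.30,
    "claim_verification": 0.30,
    "linguistic_quality": 0.20,
    "freshness": 0.10,
    "model_consensus": 0.10,
}

# Hypothetical caps: the README names the guardrails but not their values.
GUARDRAIL_CAPS: Dict[str, float] = {
    "triple_anonymity": 40.0,
    "uncited_statistics": 50.0,
    "headline_contradiction": 30.0,
}


def weighted_score(signals: Dict[str, float]) -> float:
    """Combine per-signal scores (assumed 0-100) into one weighted score."""
    missing = set(SIGNAL_WEIGHTS) - set(signals)
    if missing:
        raise ValueError(f"missing signals: {sorted(missing)}")
    return sum(SIGNAL_WEIGHTS[name] * signals[name] for name in SIGNAL_WEIGHTS)


def apply_guardrails(score: float, flags: Iterable[str]) -> float:
    """Clamp the score to the lowest cap among the triggered flags."""
    caps = [GUARDRAIL_CAPS[flag] for flag in flags if flag in GUARDRAIL_CAPS]
    return min([score, *caps])


signals = {
    "source_credibility": 80.0,
    "claim_verification": 70.0,
    "linguistic_quality": 60.0,
    "freshness": 90.0,
    "model_consensus": 50.0,
}
raw = weighted_score(signals)
capped = apply_guardrails(raw, ["headline_contradiction"])
print(f"raw={raw:.1f} capped={capped:.1f}")
```

Treating a guardrail as a ceiling rather than a deduction means a strong score on the other signals cannot outvote a triggered flag, which is one plausible reading of "hard caps and overrides".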

📁 Project Structure

fake_news_detection/
├── app.py                     # Streamlit frontend (TruthLens UI)
├── run_pipeline.py            # Main script to run pipeline stages
├── requirements.txt           # Python dependencies
├── src/
│   ├── stage1_ingestion.py    # Downloads and prepares datasets
│   ├── stage2_preprocessing.py # Cleans text, tokenizes, and saves artifacts
│   ├── stage3_training.py     # Trains models (LR, LSTM, DistilBERT, RoBERTa)
│   ├── stage4_inference.py    # The 5-signal scoring engine and prediction logic
│   └── utils/
│       └── rag_retrieval.py   # Live web search corroboration functions
├── data/                      # Raw and processed datasets (created during execution)
└── models/                    # Trained models and vectorizers (created during execution)

🚀 Getting Started

1. Installation

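Ensure you have Python 3.8+ installed (the hosted Space pins Python 3.10.13 so that pre-compiled wheels exist for its older dependencies). Install the required dependencies and the spaCy English model: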

```
pip install -r requirements.txt
python -m spacy download en_core_web_sm
```

2. Running the Pipeline

The project is divided into stages. You can run the entire pipeline end-to-end, or run specific stages individually using run_pipeline.py.

To run the complete training pipeline (Stages 1 to 3), use the command below. Note: this downloads the datasets, preprocesses them, and trains all models, so it can take a long time depending on your hardware.

```
python run_pipeline.py --stage 1 2 3
```

To run individual stages:

Stage 1: Data Ingestion. Downloads and formats the necessary datasets (e.g., LIAR, ISOT).
```
python run_pipeline.py --stage 1
```
Stage 2: Preprocessing. Cleans the text, maps verdicts to binary labels, and prepares DataFrames for training.
```
python run_pipeline.py --stage 2
```
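
As an illustration of the "maps verdicts to binary labels" step, here is a minimal sketch for the six LIAR verdict classes. Where to split the scale is an assumption made for this example, and `LIAR_TO_BINARY` and `to_binary_label` are hypothetical names; the project's preprocessing stage may draw the line elsewhere.

```python
from typing import Optional

# LIAR's six verdict classes. The fake (1) / real (0) split is an assumption.
LIAR_TO_BINARY = {
    "pants-fire": 1,
    "false": 1,
    "barely-true": 1,
    "half-true": 0,
    "mostly-true": 0,
    "true": 0,
}


def to_binary_label(verdict: str) -> Optional[int]:
    """Return 1 for fake, 0 for real, or None for an unrecognized verdict."""
    return LIAR_TO_BINARY.get(verdict.strip().lower())


rows = [("Claim A", "Mostly-True"), ("Claim B", "pants-fire"), ("Claim C", "n/a")]
# Rows with an unrecognized verdict are dropped rather than guessed.
labeled = [
    (text, label)
    for text, verdict in rows
    if (label := to_binary_label(verdict)) is not None
]
print(labeled)
```
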
Stage 3: Training. Trains the ensemble: Logistic Regression, LSTM, DistilBERT, and RoBERTa. Saves the models to the models/ directory.
```
python run_pipeline.py --stage 3
```
Stage 4: Evaluation. Evaluates the trained pipeline on the holdout test set using the 5-signal inference framework.
```
python run_pipeline.py --eval
```
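
The commands above imply a `--stage` flag that accepts one or more stage numbers and a separate `--eval` switch. The contents of run_pipeline.py are not shown here, so the following is only a sketch of how such flags could be parsed with `argparse`; `build_parser` and `plan` are hypothetical names, and the real script may behave differently.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Run pipeline stages")
    # nargs="+" lets a single flag take several values: --stage 1 2 3
    parser.add_argument(
        "--stage",
        type=int,
        nargs="+",
        choices=[1, 2, 3],
        default=[],
        help="training stages to run, in ascending order",
    )
    parser.add_argument(
        "--eval",
        action="store_true",
        help="evaluate the trained pipeline on the holdout test set",
    )
    return parser


def plan(argv: Optional[List[str]] = None) -> List[str]:
    """Return the ordered list of steps implied by the command line."""
    args = build_parser().parse_args(argv)
    steps = [f"stage{n}" for n in sorted(set(args.stage))]
    if args.eval:
        steps.append("evaluation")
    return steps


print(plan(["--stage", "1", "2", "3"]))
print(plan(["--eval"]))
```

Sorting and de-duplicating the stage numbers keeps the run order fixed even if the flags are given out of order.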

🖥️ Running the Application

Once the models are trained (or if you already have pre-trained weights in the models/ directory), you can launch the TruthLens UI:

```
python -m streamlit run app.py
```

This will start a local web server (usually at http://localhost:8501).

Using the App:

1. Paste text or provide a URL: You can paste the raw text of an article (with or without a headline) or simply provide a URL for the app to parse automatically.
2. Select depth: Choose Quick, Standard, or Deep analysis.
3. View Results: Explore the four-tier verdict (True, Uncertain, Likely False, False), signal breakdown, adversarial flags, and live web corroboration results.
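
To make the four-tier verdict concrete, here is a minimal sketch of mapping a final score to one of the four labels. It assumes a 0-100 score where higher means more credible. The cut-offs and the `verdict_for` name are hypothetical and chosen only for illustration; this README does not state the thresholds the app uses.

```python
# Hypothetical cut-offs, checked from the highest tier down.
VERDICT_THRESHOLDS = [
    (75.0, "True"),
    (50.0, "Uncertain"),
    (25.0, "Likely False"),
]


def verdict_for(score: float) -> str:
    """Map a 0-100 credibility score to one of the four verdict tiers."""
    if not 0.0 <= score <= 100.0:
        raise ValueError("score must be between 0 and 100")
    for minimum, label in VERDICT_THRESHOLDS:
        if score >= minimum:
            return label
    return "False"


for example in (82.0, 55.0, 30.0, 10.0):
    print(example, verdict_for(example))
```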