---
title: TruthLens
emoji: πŸ”
colorFrom: blue
colorTo: indigo
sdk: streamlit
sdk_version: 1.31.0
python_version: 3.10.13
app_file: app.py
pinned: false
---
# TruthLens: Advanced Fake News Detection Pipeline
TruthLens is an end-to-end fake news detection system that moves beyond simple machine learning probabilities. It employs a robust **5-signal weighted scoring framework** built on journalistic standards, combining deep learning models (DistilBERT, RoBERTa), sequence models (LSTM), statistical models (Logistic Regression), and heuristic analysis to deliver explainable verdicts.
## 🌟 Key Features
* **5-Signal Scoring Framework:**
* **Source Credibility (30%):** Evaluates outlet reputation, author presence, and source corroboration, including typosquatting checks.
* **Claim Verification (30%):** Combines AI probability with spaCy-based Named Entity Recognition (NER) and quote attribution analysis.
* **Linguistic Quality (20%):** Detects sensationalism, superlatives, passive voice, and uses DistilBERT to check if the headline contradicts the body.
* **Freshness (10%):** Contextual and date-based temporal scoring to detect outdated information.
* **AI Model Consensus (10%):** Ensemble voting from Logistic Regression, LSTM, DistilBERT, and RoBERTa.
* **Adversarial Guardrails:** Hard caps and overrides for highly suspicious patterns (Triple Anonymity, Uncited Statistics, Headline Contradictions).
* **Live Web Corroboration:** RAG (Retrieval-Augmented Generation) pipeline using live search to verify unambiguous claims.
* **TruthLens UI:** A sleek Streamlit dashboard that adapts to dark and light modes and provides explainability down to individual signals and deductions.
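The weighted framework above can be sketched as a simple weighted sum. This is a minimal illustration using the weights stated in this README; the signal names, helper function, and example values are hypothetical, not the actual code in `stage4_inference.py`:

```python
# Weights from the 5-signal framework described above (sum to 1.0).
WEIGHTS = {
    "source_credibility": 0.30,
    "claim_verification": 0.30,
    "linguistic_quality": 0.20,
    "freshness": 0.10,
    "model_consensus": 0.10,
}

def combined_score(signals: dict) -> float:
    """Weighted sum of per-signal scores, each assumed to lie in [0, 1]."""
    return sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)

# Illustrative signal values for a single article.
signals = {
    "source_credibility": 0.8,
    "claim_verification": 0.6,
    "linguistic_quality": 0.9,
    "freshness": 1.0,
    "model_consensus": 0.7,
}
print(round(combined_score(signals), 2))  # -> 0.77
```

In practice the pipeline also applies the adversarial guardrails listed above, which can cap or override this raw score.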
---
## πŸ“ Project Structure
```text
fake_news_detection/
β”œβ”€β”€ app.py                # Streamlit frontend (TruthLens UI)
β”œβ”€β”€ run_pipeline.py       # Main script to run pipeline stages
β”œβ”€β”€ requirements.txt      # Python dependencies
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ stage1_ingestion.py      # Downloads and prepares datasets
β”‚   β”œβ”€β”€ stage2_preprocessing.py  # Cleans text, tokenizes, and saves artifacts
β”‚   β”œβ”€β”€ stage3_training.py       # Trains models (LR, LSTM, DistilBERT, RoBERTa)
β”‚   β”œβ”€β”€ stage4_inference.py      # The 5-signal scoring engine and prediction logic
β”‚   └── utils/
β”‚       └── rag_retrieval.py     # Live web search corroboration functions
β”œβ”€β”€ data/                 # Raw and processed datasets (created during execution)
└── models/               # Trained models and vectorizers (created during execution)
```
---
## πŸš€ Getting Started
### 1. Installation
Ensure you have Python 3.10 installed (the Space pins 3.10.13 so that pre-compiled wheels exist for the older dependencies). Install the required dependencies:
```bash
pip install -r requirements.txt
python -m spacy download en_core_web_sm
```
### 2. Running the Pipeline
The project is divided into stages. You can run the entire pipeline end-to-end, or run specific stages individually using `run_pipeline.py`.
**To run the complete training pipeline (Stages 1 to 3):**
*Note: This will download datasets, preprocess them, and train all models. It may take a significant amount of time depending on your hardware.*
```bash
python run_pipeline.py --stage 1 2 3
```
**To run individual stages:**
* **Stage 1: Data Ingestion**
Downloads and formats the necessary datasets (e.g., LIAR, ISOT).
```bash
python run_pipeline.py --stage 1
```
* **Stage 2: Preprocessing**
Cleans the text, maps verdicts to binary labels, and prepares DataFrames for training.
```bash
python run_pipeline.py --stage 2
```
* **Stage 3: Training**
Trains the ensemble: Logistic Regression, LSTM, DistilBERT, and RoBERTa. Saves the models to the `/models` directory.
```bash
python run_pipeline.py --stage 3
```
* **Stage 4: Evaluation**
Evaluates the trained pipeline on the holdout test set using the 5-signal inference framework.
```bash
python run_pipeline.py --eval
```
---
## πŸ–₯️ Running the Application
Once the models are trained (or if you already have the pre-trained weights in the `/models` directory), you can launch the TruthLens UI.
```bash
python -m streamlit run app.py
```
This will start a local web server (usually at `http://localhost:8501`).
### Using the App:
1. **Paste text or provide a URL:** You can paste the raw text of an article (with or without a headline) or simply provide a URL for the app to parse automatically.
2. **Select depth:** Choose Quick, Standard, or Deep analysis.
3. **View Results:** Explore the four-tier verdict (True, Uncertain, Likely False, False), signal breakdown, adversarial flags, and live web corroboration results.
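The four-tier verdict can be thought of as a mapping from the combined score to a label. The sketch below is a hypothetical illustration; the threshold values are assumptions for demonstration, not the app's actual cut-offs:

```python
# Map a combined score in [0, 1] to the four-tier verdict shown in the UI.
# Thresholds here are illustrative placeholders.
def verdict(score: float) -> str:
    if score >= 0.75:
        return "True"
    if score >= 0.50:
        return "Uncertain"
    if score >= 0.25:
        return "Likely False"
    return "False"

print(verdict(0.77))  # -> True
print(verdict(0.10))  # -> False
```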