---
title: TruthLens
emoji: π
colorFrom: blue
colorTo: indigo
sdk: streamlit
sdk_version: 1.31.0
python_version: 3.10.13
app_file: app.py
pinned: false
---
# TruthLens: Advanced Fake News Detection Pipeline

TruthLens is an end-to-end fake news detection system that moves beyond simple machine learning probabilities. It employs a robust **5-signal weighted scoring framework** built on journalistic standards, combining deep learning models (DistilBERT, RoBERTa), sequence models (LSTM), statistical models (Logistic Regression), and heuristic analysis to deliver explainable verdicts.
## Key Features

* **5-Signal Scoring Framework:**
  * **Source Credibility (30%):** Evaluates outlet reputation, author presence, and source corroboration, including typosquatting checks.
  * **Claim Verification (30%):** Combines AI probability with spaCy-based Named Entity Recognition (NER) and quote attribution analysis.
  * **Linguistic Quality (20%):** Detects sensationalism, superlatives, and passive voice, and uses DistilBERT to check whether the headline contradicts the body.
  * **Freshness (10%):** Contextual and date-based temporal scoring to detect outdated information.
  * **AI Model Consensus (10%):** Ensemble voting from Logistic Regression, LSTM, DistilBERT, and RoBERTa.
* **Adversarial Guardrails:** Hard caps and overrides for highly suspicious patterns (triple anonymity, uncited statistics, headline contradictions).
* **Live Web Corroboration:** Retrieval-Augmented Generation (RAG) pipeline that uses live search to verify unambiguous claims.
* **TruthLens UI:** A sleek Streamlit dashboard with dark/light mode support, providing explainability down to the specific signals and deductions.
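To make the weighting concrete, here is a minimal sketch of how the five signal percentages combine into a single score. The signal names and weights come from the list above; the 0–100 scale, the helper name `weighted_score`, and the example values are assumptions for illustration, not the actual `stage4_inference.py` API.

```python
# Weights from the README's 5-signal framework (fractions of the final score).
SIGNAL_WEIGHTS = {
    "source_credibility": 0.30,
    "claim_verification": 0.30,
    "linguistic_quality": 0.20,
    "freshness": 0.10,
    "model_consensus": 0.10,
}

def weighted_score(signals: dict) -> float:
    """Combine per-signal scores (assumed 0-100) into one weighted score."""
    return sum(SIGNAL_WEIGHTS[name] * value for name, value in signals.items())

score = weighted_score({
    "source_credibility": 80.0,
    "claim_verification": 60.0,
    "linguistic_quality": 70.0,
    "freshness": 90.0,
    "model_consensus": 55.0,
})
print(round(score, 1))  # 70.5
```

Because the weights sum to 1.0, the result stays on the same 0–100 scale as the individual signals.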
| --- | |
## Project Structure

```text
fake_news_detection/
├── app.py                  # Streamlit frontend (TruthLens UI)
├── run_pipeline.py         # Main script to run pipeline stages
├── requirements.txt        # Python dependencies
├── src/
│   ├── stage1_ingestion.py     # Downloads and prepares datasets
│   ├── stage2_preprocessing.py # Cleans text, tokenizes, and saves artifacts
│   ├── stage3_training.py      # Trains models (LR, LSTM, DistilBERT, RoBERTa)
│   ├── stage4_inference.py     # The 5-signal scoring engine and prediction logic
│   └── utils/
│       └── rag_retrieval.py    # Live web search corroboration functions
├── data/                   # Raw and processed datasets (created during execution)
└── models/                 # Trained models and vectorizers (created during execution)
```
| --- | |
| ## π Getting Started | |
| ### 1. Installation | |
| Ensure you have Python 3.8+ installed. Install the required dependencies: | |
| ```bash | |
| pip install -r requirements.txt | |
| python -m spacy download en_core_web_sm | |
| ``` | |
### 2. Running the Pipeline

The project is divided into stages. You can run the entire pipeline end-to-end, or run specific stages individually using `run_pipeline.py`.

**To run the complete training pipeline (Stages 1 to 3):**

*Note: This will download datasets, preprocess them, and train all models. It may take a significant amount of time depending on your hardware.*

```bash
python run_pipeline.py --stage 1 2 3
```

**To run individual stages:**

* **Stage 1: Data Ingestion**

  Downloads and formats the necessary datasets (e.g., LIAR, ISOT).

  ```bash
  python run_pipeline.py --stage 1
  ```

* **Stage 2: Preprocessing**

  Cleans the text, maps verdicts to binary labels, and prepares DataFrames for training.

  ```bash
  python run_pipeline.py --stage 2
  ```

* **Stage 3: Training**

  Trains the ensemble: Logistic Regression, LSTM, DistilBERT, and RoBERTa. Saves the models to the `/models` directory.

  ```bash
  python run_pipeline.py --stage 3
  ```

* **Stage 4: Evaluation**

  Evaluates the trained pipeline on the holdout test set using the 5-signal inference framework.

  ```bash
  python run_pipeline.py --eval
  ```
| --- | |
| ## π₯οΈ Running the Application | |
| Once the models are trained (or if you already have the pre-trained weights in the `/models` directory), you can launch the TruthLens UI. | |
| ```bash | |
| python -m streamlit run app.py | |
| ``` | |
| This will start a local web server (usually at `http://localhost:8501`). | |
| ### Using the App: | |
| 1. **Paste text or provide a URL:** You can paste the raw text of an article (with or without a headline) or simply provide a URL for the app to parse automatically. | |
| 2. **Select depth:** Choose Quick, Standard, or Deep analysis. | |
| 3. **View Results:** Explore the four-tier verdict (True, Uncertain, Likely False, False), signal breakdown, adversarial flags, and live web corroboration results. | |
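The four-tier verdict amounts to bucketing the final weighted score. The tier names come from the list above; the threshold values below are illustrative assumptions, not the cutoffs TruthLens actually uses.

```python
def verdict(score: float) -> str:
    """Map a 0-100 weighted score to one of the four verdict tiers.

    Threshold values are hypothetical; only the tier names are from the README.
    """
    if score >= 75:
        return "True"
    if score >= 50:
        return "Uncertain"
    if score >= 30:
        return "Likely False"
    return "False"

print(verdict(82))  # True
print(verdict(41))  # Likely False
```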