Space: malorieiovino/TruthLens

malorieiovino committed on Apr 3, 2025

Commit 8d630cf (verified) · 1 parent: 9415c9d

Update README.md

Files changed (1): README.md +57 -77

@@ -10,110 +10,90 @@ app_file: app/app.py
 pinned: false
 ---
-# TruthLens: AI-Powered Fact-Checking Chatbot
-## Overview
-TruthLens is a fact-checking chatbot that uses an ensemble of transformer models to evaluate the truthfulness of claims. The system combines multiple fact-checking perspectives by integrating models trained on different frameworks:
-- **DeBERTa**: Fine-tuned on the LIAR dataset (6-class truthfulness scale)
-- **DistilBERT**: Fine-tuned on the FEVER dataset (3-class verification system)
-- **RoBERTa**: Fine-tuned on the FEVER dataset (3-class verification system)
-By merging these complementary frameworks, TruthLens provides both nuanced truthfulness assessment and decisive verification.
-## Features
-- **Cross-Framework Integration**: Combines LIAR's 6-class truthfulness scale with FEVER's 3-class verification system
-- **Ensemble Prediction**: Weighted voting system across multiple models for improved accuracy
-- **Conversational Interface**: Natural chat interface for easy interaction
-- **Explainable Results**: Transparent analysis showing which models contributed to the verdict
-- **Bias Detection**: Sensitivity to potential biases in fact-checking
-- **Optimized Performance**: Models fine-tuned for efficiency and accuracy
-## Demo
-TruthLens is deployed and available at: [https://truthlens-gdyc5cjscpwyv74rgkjbou.streamlit.app](https://truthlens-gdyc5cjscpwyv74rgkjbou.streamlit.app)
-## Technical Details
-### Model Architecture
-TruthLens employs a weighted ensemble of three transformer models:
-1. **DeBERTa**: Provides nuanced truthfulness classification with 6 classes:
-   - pants-on-fire
-   - false
-   - barely-true
-   - half-true
-   - mostly-true
-   - true
-2. **DistilBERT & RoBERTa**: Provide decisive verification with 3 classes:
-   - SUPPORTS
-   - REFUTES
-   - NOT ENOUGH INFO
-### Cross-Framework Mapping
-The system maps between frameworks using carefully designed correspondence:
-| LIAR (6-class) | FEVER (3-class) |
-|----------------|-----------------|
-| pants-on-fire  | REFUTES         |
-| false          | REFUTES         |
-| barely-true    | REFUTES         |
-| half-true      | NOT ENOUGH INFO |
-| mostly-true    | SUPPORTS        |
-| true           | SUPPORTS        |
-### Optimization Techniques
-The models have been optimized using:
-- FP16 precision (half-precision)
-- Model pruning (20% sparsity)
-- Inference pipeline optimization
-## Development and Training
-### Datasets
-- **LIAR**: Contains 12.8K human-labeled short statements from PolitiFact, annotated with fine-grained truthfulness ratings
-- **FEVER**: Contains 185K claims generated from Wikipedia, manually verified by humans
-### Training Process
-1. Initial model development with baseline classifiers
-2. Advanced model development with transformers
-3. Fine-tuning pre-trained models for domain adaptation
-4. Evaluation across multiple metrics
-5. Implementation of explainability techniques
-6. Optimization for deployment
-## Using TruthLens
-1. Visit the [TruthLens App](https://truthlens-gdyc5cjscpwyv74rgkjbou.streamlit.app)
-2. Click "Load Models" in the sidebar
-3. Ask the chatbot to fact-check a claim using formats like:
-   - "Fact-check: [your claim]"
-   - "Verify: [your claim]"
-   - "Is it true that [your claim]?"
-## Future Improvements
-- Integration with live news sources
-- Support for multi-lingual fact-checking
-- Enhanced bias detection capabilities
-- Source citation for verification
+# TruthLens: A Fact-Checking Assistant with Linguistic Understanding
+TruthLens is an advanced NLP-powered fact-checking application that uses transformer-based models fine-tuned on fact-checking datasets to assess the truthfulness of textual claims, with special attention to linguistic nuances and semantic structures.
+## Overview
+This application leverages pre-trained language models to analyze claims and classify them based on their factuality. Three different models are implemented to explore how different architectures handle the linguistic complexities of factual statements:
+- **DistilBERT (FEVER dataset)**: A lightweight model trained on the Fact Extraction and VERification dataset
+- **RoBERTa (FEVER dataset)**: A more robust model also trained on the FEVER dataset
+- **DeBERTa (LIAR dataset)**: An advanced model trained on the LIAR political fact-checking dataset
+## Linguistic Analysis Capabilities
+TruthLens goes beyond simple fact verification by examining how different models process complex linguistic phenomena:
+- **Negation handling**: Assessing how models interpret "not," "never," and other negative constructions
+- **Modal verbs**: Analyzing treatment of uncertainty markers like "might," "could," and "should"
+- **Epistemic modality**: Examining expressions of certainty, possibility, and probabilistic statements
+- **Conditional constructions**: Evaluating how if-then relationships and hypotheticals are processed
+- **Intensifiers and hedges**: Testing the impact of modifiers like "literally," "very," and "somewhat"
+- **Nested propositional structures**: Measuring comprehension of claims embedded within other claims
+- **Comparative and superlative statements**: Analyzing how relative and absolute comparisons are interpreted
+## Technical Implementation
+### Models
+The models were fine-tuned on fact-checking datasets using transformers from Hugging Face:
+- **FEVER Dataset**: Contains 185,445 claims with SUPPORTS, REFUTES, and NOT ENOUGH INFO labels
+- **LIAR Dataset**: Contains 12,800 political statements with six fine-grained truthfulness labels
+The models are hosted on Hugging Face and loaded directly into the application at runtime.
+### Application
+The application is built with:
+- **Streamlit**: For the interactive web interface
+- **PyTorch**: For model inference
+- **Transformers**: For loading and utilizing the fine-tuned models
+- **Hugging Face Spaces**: For deployment and hosting
+## Linguistic Phenomena Evaluation
+TruthLens specifically examines model behavior across these linguistic constructions:
+- **Basic Facts vs. Complex Assertions**: Comparing performance on simple statements versus compound or complex sentences
+- **Negation Scope**: Assessing whether models understand the scope of negation within sentences
+- **Modal Semantics**: Evaluating if models distinguish between epistemic possibility, permission, and obligation
+- **Ambiguity Resolution**: Testing how models handle lexical and structural ambiguities
+- **Hedged Claims**: Analyzing recognition of uncertainty markers and their effect on truthfulness assessment
+- **Presuppositions**: Examining how models handle implicit assumptions within claims
+- **Figurative Language**: Testing literal versus non-literal interpretation of metaphorical statements
+- **Subjective vs. Objective Claims**: Measuring distinction between verifiable facts and expressions of opinion
+## NLP Coursework Project
+This project was developed as part of an NLP coursework assessment, focusing on the application of computational linguistics and transformer-based language models to fact verification. It demonstrates the intersection of natural language processing, computational semantics, and information verification systems.
+The research specifically explores how different transformer architectures handle linguistic nuances that humans naturally process but remain challenging for AI systems, providing insights into both the capabilities and limitations of current NLP approaches to automated fact-checking.
+## Usage
+1. Select a fact-checking model from the sidebar
+2. Enter a claim or select an example
+3. Click "Check Fact" to analyze the claim
+4. Review the prediction and confidence scores
+5. Examine the detailed linguistic analysis breakdown
+## Future Linguistic Research Directions
+- Development of models with enhanced pragmatic understanding
+- Integration of discourse analysis for contextual claim verification
+- Cross-linguistic adaptation for fact-checking in multiple languages
+- Improved recognition of rhetorical devices and their impact on factuality
+- Semantic frame analysis for better understanding of claim structures
+- Temporal reasoning for evolving truths and time-dependent facts
+## Author
+Malorie Iovino
 ## License
-For educational purposes only.
-## Acknowledgments
-- LIAR dataset from William Yang Wang
-- FEVER dataset from the FEVER workshop
-- Hugging Face for transformer implementations
-- Streamlit for deployment platform
+This project is available for educational and linguistic research purposes.
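The Cross-Framework Mapping table and the weighted ensemble voting described in the removed README text can be illustrated with a short Python sketch. It is not taken from the TruthLens code: the model keys (`deberta`, `distilbert`, `roberta`), the weights, and the tie-breaking order are hypothetical, since the old README says only that voting is weighted and does not give the weights.

```python
from collections import defaultdict

# LIAR (6-class) -> FEVER (3-class) correspondence, taken from the
# "Cross-Framework Mapping" table in the removed README text.
LIAR_TO_FEVER = {
    "pants-on-fire": "REFUTES",
    "false": "REFUTES",
    "barely-true": "REFUTES",
    "half-true": "NOT ENOUGH INFO",
    "mostly-true": "SUPPORTS",
    "true": "SUPPORTS",
}

# Order doubles as the (hypothetical) tie-breaking priority.
FEVER_LABELS = ("SUPPORTS", "REFUTES", "NOT ENOUGH INFO")


def to_fever(label: str) -> str:
    """Map a LIAR or FEVER label onto the shared 3-class FEVER scale."""
    if label in FEVER_LABELS:
        return label
    return LIAR_TO_FEVER[label]


def weighted_vote(predictions: dict[str, str], weights: dict[str, float]) -> str:
    """Sum each model's (hypothetical) weight into its FEVER class; return the top class.

    Models missing from `weights` count with weight 1.0.
    """
    scores: dict[str, float] = defaultdict(float)
    for model, label in predictions.items():
        scores[to_fever(label)] += weights.get(model, 1.0)
    # max() keeps the first label with the highest score, so ties follow FEVER_LABELS order.
    return max(FEVER_LABELS, key=lambda lbl: scores[lbl])


if __name__ == "__main__":
    # Hypothetical model keys and weights, for illustration only.
    preds = {"deberta": "barely-true", "distilbert": "REFUTES", "roberta": "SUPPORTS"}
    weights = {"deberta": 0.4, "distilbert": 0.25, "roberta": 0.35}
    print(weighted_vote(preds, weights))  # REFUTES
```

Routing every prediction through the 3-class FEVER scale lets the 6-class DeBERTa output vote alongside the two FEVER models. The cost is that finer LIAR distinctions (for example `false` vs. `barely-true`) do not survive into the final verdict.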