---
title: TruthLens
emoji: 🔎
colorFrom: blue
colorTo: green
sdk: streamlit
app_file: app/app.py
pinned: false
---

# TruthLens: A Fact-Checking Assistant with Linguistic Understanding

TruthLens is an NLP-powered fact-checking application that uses transformer models fine-tuned on fact-verification datasets to assess the truthfulness of textual claims, with particular attention to linguistic nuance and semantic structure.

## Overview

This application leverages pre-trained language models to analyze claims and classify them based on their factuality. Three different models are implemented to explore how different architectures handle the linguistic complexities of factual statements:

- DistilBERT (FEVER dataset): A lightweight model trained on the Fact Extraction and VERification dataset
- RoBERTa (FEVER dataset): A more robust model also trained on the FEVER dataset
- DeBERTa (LIAR dataset): An advanced model trained on the LIAR political fact-checking dataset
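All three checkpoints can be served through the Hugging Face `transformers` pipeline API. The sketch below shows one way to wire the sidebar names to checkpoints; the repository ids are placeholders, since the actual model ids are not listed in this README.

```python
# Display name -> Hugging Face repo id.
# The repo ids below are PLACEHOLDERS, not the Space's real checkpoints.
MODELS = {
    "DistilBERT (FEVER)": "your-username/distilbert-fever",
    "RoBERTa (FEVER)": "your-username/roberta-fever",
    "DeBERTa (LIAR)": "your-username/deberta-liar",
}


def load_checker(display_name: str):
    """Build a text-classification pipeline for the chosen model.

    `transformers` is imported lazily because it is a heavy dependency
    and downloading the checkpoint happens at runtime.
    """
    from transformers import pipeline

    repo_id = MODELS[display_name]
    # top_k=None returns scores for every label, not just the argmax.
    return pipeline("text-classification", model=repo_id, top_k=None)
```

In the app, the result of `load_checker` would typically be cached (e.g. with `st.cache_resource`) so each model is downloaded only once per session.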

## Linguistic Analysis Capabilities

TruthLens goes beyond simple fact verification by examining how different models process complex linguistic phenomena:

- Negation handling: Assessing how models interpret "not," "never," and other negative constructions
- Modal verbs: Analyzing treatment of uncertainty markers like "might," "could," and "should"
- Epistemic modality: Examining expressions of certainty, possibility, and probabilistic statements
- Conditional constructions: Evaluating how if-then relationships and hypotheticals are processed
- Intensifiers and hedges: Testing the impact of modifiers like "literally," "very," and "somewhat"
- Nested propositional structures: Measuring comprehension of claims embedded within other claims
- Comparative and superlative statements: Analyzing how relative and absolute comparisons are interpreted
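A simple way to probe several of these phenomena is to derive minimal pairs from a claim, varying one construction at a time. The string transformations below are illustrative only; they are not the app's actual probe set, and the naive negation rule only handles simple copular sentences.

```python
def make_variants(claim: str) -> dict:
    """Build minimal-pair variants of a claim, each altering one
    linguistic dimension. Naive string edits, for illustration only;
    assumes a non-empty declarative sentence.
    """
    words = claim.split()

    # Negation: insert "not" after the first copula, if one is present.
    negated = claim
    for aux in ("is", "are", "was", "were"):
        if aux in words:
            i = words.index(aux)
            negated = " ".join(words[: i + 1] + ["not"] + words[i + 1 :])
            break

    lowered = claim[0].lower() + claim[1:]
    return {
        "original": claim,
        "negated": negated,                                   # negation handling
        "modal": "It might be the case that " + lowered,      # modal verbs
        "hedged": "Some sources suggest that " + lowered,     # hedges
        "intensified": claim + " This is literally true.",    # intensifiers
    }
```

Feeding each variant to the same model and comparing predictions shows whether, say, inserting "not" actually flips the verdict.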

## Technical Implementation

### Models

The models were fine-tuned on fact-checking datasets using the Hugging Face `transformers` library:

- FEVER Dataset: Contains 185,445 claims labeled SUPPORTS, REFUTES, or NOT ENOUGH INFO
- LIAR Dataset: Contains 12,836 political statements with six fine-grained truthfulness labels

The models are hosted on Hugging Face and loaded directly into the application at runtime.
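Because the two datasets use different label inventories, the app has to map each model's output index back to its dataset's labels. The orderings below reflect the datasets' published label sets; the id-to-label ordering itself is an assumption and should match each checkpoint's `config.json` `id2label` mapping.

```python
# FEVER's three verification labels.
FEVER_LABELS = ["SUPPORTS", "REFUTES", "NOT ENOUGH INFO"]

# LIAR's six fine-grained truthfulness labels.
LIAR_LABELS = [
    "pants-fire", "false", "barely-true",
    "half-true", "mostly-true", "true",
]


def id_to_label(index: int, dataset: str) -> str:
    """Map a predicted class index to a human-readable label.

    `dataset` is "fever" or "liar"; the index ordering is assumed to
    match the checkpoint's id2label config.
    """
    labels = FEVER_LABELS if dataset == "fever" else LIAR_LABELS
    return labels[index]
```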

### Application

The application is built with:

- Streamlit: For the interactive web interface
- PyTorch: For model inference
- Transformers: For loading and utilizing the fine-tuned models
- Hugging Face Spaces: For deployment and hosting
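A stripped-down version of the interface might look like the sketch below. It is not the Space's actual `app/app.py`: `run_model` is a stub standing in for real pipeline inference, and the Streamlit calls are wrapped in `main()` only so the pure helpers can be exercised on their own.

```python
def render_verdict(label: str, score: float) -> str:
    """Format a prediction and its confidence for display."""
    return f"{label} ({score:.1%} confidence)"


def run_model(model_name: str, claim: str):
    """Stub for real pipeline inference; always abstains.

    A real implementation would call the loaded text-classification
    pipeline and return its top label and score.
    """
    return "NOT ENOUGH INFO", 0.50


def main() -> None:
    # Streamlit apps are normally top-level scripts; imported lazily
    # here so the helpers above stay importable without streamlit.
    import streamlit as st

    st.title("TruthLens 🔎")
    model_name = st.sidebar.selectbox(
        "Model",
        ["DistilBERT (FEVER)", "RoBERTa (FEVER)", "DeBERTa (LIAR)"],
    )
    claim = st.text_area("Enter a claim to fact-check")
    if st.button("Check Fact") and claim:
        label, score = run_model(model_name, claim)
        st.success(render_verdict(label, score))
```

Running `streamlit run app/app.py` executes the script top to bottom; in a real app the body of `main()` would sit at module level.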

## Linguistic Phenomena Evaluation

TruthLens specifically examines model behavior across these linguistic constructions:

- Basic Facts vs. Complex Assertions: Comparing performance on simple statements versus compound or complex sentences
- Negation Scope: Assessing whether models understand the scope of negation within sentences
- Modal Semantics: Evaluating if models distinguish between epistemic possibility, permission, and obligation
- Ambiguity Resolution: Testing how models handle lexical and structural ambiguities
- Hedged Claims: Analyzing recognition of uncertainty markers and their effect on truthfulness assessment
- Presuppositions: Examining how models handle implicit assumptions within claims
- Figurative Language: Testing literal versus non-literal interpretation of metaphorical statements
- Subjective vs. Objective Claims: Measuring distinction between verifiable facts and expressions of opinion
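Behavior on these constructions can be quantified by running a model on (original, transformed) claim pairs and measuring how often its label flips. The helper below takes any callable from claim text to label, so it works with a real pipeline or, as in the usage note, a toy stand-in.

```python
def flip_rate(classify, pairs) -> float:
    """Fraction of (original, transformed) claim pairs on which the
    classifier's label changes.

    `classify` is any callable mapping a claim string to a label;
    `pairs` is an iterable of (original, transformed) claim strings.
    """
    pairs = list(pairs)
    if not pairs:
        return 0.0
    flips = sum(1 for original, transformed in pairs
                if classify(original) != classify(transformed))
    return flips / len(pairs)
```

For example, a toy classifier that answers "REFUTES" whenever the word "not" appears (and "SUPPORTS" otherwise) has a flip rate of 0.5 on the pairs `("The sky is blue.", "The sky is not blue.")` and `("Cats are mammals.", "Cats are mammals, allegedly.")`, since only the first transformation changes its label.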

## NLP Coursework Project

This project was developed as part of an NLP coursework assessment, focusing on the application of computational linguistics and transformer-based language models to fact verification. It demonstrates the intersection of natural language processing, computational semantics, and information verification systems.

The research specifically explores how different transformer architectures handle linguistic nuances that humans process naturally but that remain challenging for AI systems, providing insight into both the capabilities and the limitations of current NLP approaches to automated fact-checking.

## Usage

  1. Select a fact-checking model from the sidebar
  2. Enter a claim or select an example
  3. Click "Check Fact" to analyze the claim
  4. Review the prediction and confidence scores
  5. Examine the detailed linguistic analysis breakdown
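The confidence scores shown in step 4 are softmax probabilities computed over the model's raw output logits. A pure-Python sketch of that conversion:

```python
import math


def softmax(logits):
    """Convert raw model logits to a probability distribution.

    Subtracting the max logit before exponentiating keeps exp()
    numerically stable for large logits.
    """
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

The resulting probabilities sum to 1, and the largest logit always yields the largest probability, which becomes the displayed confidence for the predicted label.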

## Future Linguistic Research Directions

- Development of models with enhanced pragmatic understanding
- Integration of discourse analysis for contextual claim verification
- Cross-linguistic adaptation for fact-checking in multiple languages
- Improved recognition of rhetorical devices and their impact on factuality
- Semantic frame analysis for better understanding of claim structures
- Temporal reasoning for evolving truths and time-dependent facts

## Author

Malorie Iovino

## License

This project is available for educational and linguistic research purposes.