malorieiovino committed
Commit 8d630cf · verified · 1 Parent(s): 9415c9d

Update README.md

  1. README.md +57 -77
README.md CHANGED
@@ -10,110 +10,90 @@ app_file: app/app.py
  pinned: false
  ---

- # TruthLens: AI-Powered Fact-Checking Chatbot

- ## Overview

- TruthLens is a fact-checking chatbot that uses an ensemble of transformer models to evaluate the truthfulness of claims. The system combines multiple fact-checking perspectives by integrating models trained on different frameworks:

- **DeBERTa**: Fine-tuned on the LIAR dataset (6-class truthfulness scale)
- **DistilBERT**: Fine-tuned on the FEVER dataset (3-class verification system)
- **RoBERTa**: Fine-tuned on the FEVER dataset (3-class verification system)

- By merging these complementary frameworks, TruthLens provides both nuanced truthfulness assessment and decisive verification.

- ## Features

- **Cross-Framework Integration**: Combines LIAR's 6-class truthfulness scale with FEVER's 3-class verification system
- **Ensemble Prediction**: Weighted voting system across multiple models for improved accuracy
- **Conversational Interface**: Natural chat interface for easy interaction
- **Explainable Results**: Transparent analysis showing which models contributed to the verdict
- **Bias Detection**: Sensitivity to potential biases in fact-checking
- **Optimized Performance**: Models fine-tuned for efficiency and accuracy

- ## Demo

- TruthLens is deployed and available at: [https://truthlens-gdyc5cjscpwyv74rgkjbou.streamlit.app](https://truthlens-gdyc5cjscpwyv74rgkjbou.streamlit.app)

- ## Technical Details

- ### Model Architecture

- TruthLens employs a weighted ensemble of three transformer models:

- 1. **DeBERTa**: Provides nuanced truthfulness classification with 6 classes:
- - pants-on-fire
- - false
- - barely-true
- - half-true
- - mostly-true
- - true

- 2. **DistilBERT & RoBERTa**: Provide decisive verification with 3 classes:
- - SUPPORTS
- - REFUTES
- - NOT ENOUGH INFO

- ### Cross-Framework Mapping

- The system maps between frameworks using carefully designed correspondence:

- | LIAR (6-class) | FEVER (3-class) |
- |----------------|-----------------|
- | pants-on-fire | REFUTES |
- | false | REFUTES |
- | barely-true | REFUTES |
- | half-true | NOT ENOUGH INFO |
- | mostly-true | SUPPORTS |
- | true | SUPPORTS |
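
For readers of the removed text, the correspondence table above can be read as a plain lookup, and the "weighted ensemble" as a weighted vote over the projected labels. The sketch below is only an illustration under assumptions: the dictionary is copied from the table, but the function names, the model keys, and the weights are hypothetical, and the README does not say how the original app combined the votes.

```python
from collections import defaultdict
from typing import Dict

# Correspondence copied from the LIAR -> FEVER table in the removed README text.
LIAR_TO_FEVER = {
    "pants-on-fire": "REFUTES",
    "false": "REFUTES",
    "barely-true": "REFUTES",
    "half-true": "NOT ENOUGH INFO",
    "mostly-true": "SUPPORTS",
    "true": "SUPPORTS",
}

FEVER_LABELS = ("SUPPORTS", "REFUTES", "NOT ENOUGH INFO")


def to_fever(label: str) -> str:
    """Project a LIAR or FEVER label onto the 3-class FEVER scale."""
    if label in FEVER_LABELS:
        return label
    return LIAR_TO_FEVER[label]  # raises KeyError for an unknown label


def weighted_vote(predictions: Dict[str, str], weights: Dict[str, float]) -> str:
    """Return the FEVER label with the largest total weight.

    `predictions` maps a model name to its predicted label; `weights` maps
    the same names to voting weights (assumed values, not from the README).
    """
    totals = defaultdict(float)
    for model, label in predictions.items():
        totals[to_fever(label)] += weights[model]
    return max(totals, key=totals.get)


# Hypothetical inputs, made up purely for illustration.
weights = {"deberta": 0.4, "distilbert": 0.25, "roberta": 0.35}
predictions = {
    "deberta": "barely-true",   # LIAR label, projected to REFUTES
    "distilbert": "SUPPORTS",
    "roberta": "REFUTES",
}
print(weighted_vote(predictions, weights))  # REFUTES (0.75 vs 0.25)
```

Projecting everything onto the coarser FEVER scale keeps the vote well defined, at the cost of discarding the finer LIAR distinctions.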

- ### Optimization Techniques

- The models have been optimized using:
- - FP16 precision (half-precision)
- - Model pruning (20% sparsity)
- - Inference pipeline optimization

- ## Development and Training

- ### Datasets

- - **LIAR**: Contains 12.8K human-labeled short statements from PolitiFact, annotated with fine-grained truthfulness ratings
- - **FEVER**: Contains 185K claims generated from Wikipedia, manually verified by humans

- ### Training Process

- 1. Initial model development with baseline classifiers
- 2. Advanced model development with transformers
- 3. Fine-tuning pre-trained models for domain adaptation
- 4. Evaluation across multiple metrics
- 5. Implementation of explainability techniques
- 6. Optimization for deployment

- ## Using TruthLens

- 1. Visit the [TruthLens App](https://truthlens-gdyc5cjscpwyv74rgkjbou.streamlit.app)
- 2. Click "Load Models" in the sidebar
- 3. Ask the chatbot to fact-check a claim using formats like:
- - "Fact-check: [your claim]"
- - "Verify: [your claim]"
- - "Is it true that [your claim]?"

- ## Future Improvements

- - Integration with live news sources
- - Support for multi-lingual fact-checking
- - Enhanced bias detection capabilities
- - Source citation for verification

  ## License

- For educational purposes only.
-
- ## Acknowledgments
-
- - LIAR dataset from William Yang Wang
- - FEVER dataset from the FEVER workshop
- - Hugging Face for transformer implementations
- - Streamlit for deployment platform
-
 
  pinned: false
  ---

+ # TruthLens: A Fact-Checking Assistant with Linguistic Understanding

+ TruthLens is an advanced NLP-powered fact-checking application that uses transformer-based models fine-tuned on fact-checking datasets to assess the truthfulness of textual claims, with special attention to linguistic nuances and semantic structures.

+ ## Overview

+ This application leverages pre-trained language models to analyze claims and classify them based on their factuality. Three different models are implemented to explore how different architectures handle the linguistic complexities of factual statements:

+ - **DistilBERT (FEVER dataset)**: A lightweight model trained on the Fact Extraction and VERification dataset
+ - **RoBERTa (FEVER dataset)**: A more robust model also trained on the FEVER dataset
+ - **DeBERTa (LIAR dataset)**: An advanced model trained on the LIAR political fact-checking dataset

+ ## Linguistic Analysis Capabilities

+ TruthLens goes beyond simple fact verification by examining how different models process complex linguistic phenomena:

+ - **Negation handling**: Assessing how models interpret "not," "never," and other negative constructions
+ - **Modal verbs**: Analyzing treatment of uncertainty markers like "might," "could," and "should"
+ - **Epistemic modality**: Examining expressions of certainty, possibility, and probabilistic statements
+ - **Conditional constructions**: Evaluating how if-then relationships and hypotheticals are processed
+ - **Intensifiers and hedges**: Testing the impact of modifiers like "literally," "very," and "somewhat"
+ - **Nested propositional structures**: Measuring comprehension of claims embedded within other claims
+ - **Comparative and superlative statements**: Analyzing how relative and absolute comparisons are interpreted
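
One way to probe the phenomena listed above is with minimal pairs: take a base claim, apply one linguistic change at a time, and check whether the model's label moves. The sketch below is a hedged illustration, not the project's evaluation code. `make_variants`, `label_flips`, and `stub_classifier` are hypothetical names, the templates only cover simple "X is Y" claims, and the stub is a keyword rule standing in for a real model so the example runs without downloads.

```python
from typing import Callable, Dict, List


def make_variants(subject: str, complement: str) -> Dict[str, str]:
    """Build minimal-pair variants of a copular claim "<subject> is <complement>"."""
    return {
        "base": f"{subject} is {complement}.",
        "negation": f"{subject} is not {complement}.",
        "modal": f"{subject} might be {complement}.",
        "intensifier": f"{subject} is literally {complement}.",
        "hedge": f"{subject} is somewhat {complement}.",
    }


def label_flips(classify: Callable[[str], str], variants: Dict[str, str]) -> List[str]:
    """List the phenomena whose variant gets a different label than the base claim."""
    base_label = classify(variants["base"])
    return [
        name
        for name, claim in variants.items()
        if name != "base" and classify(claim) != base_label
    ]


def stub_classifier(claim: str) -> str:
    """Stand-in for a real model: a keyword rule, used only so the sketch runs."""
    return "REFUTES" if " not " in claim else "SUPPORTS"


variants = make_variants("The Eiffel Tower", "taller than the Statue of Liberty")
print(label_flips(stub_classifier, variants))  # ['negation']
```

With a real classifier passed in place of `stub_classifier`, the returned list shows which constructions change the verdict for a given claim.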

+ ## Technical Implementation

+ ### Models

+ The models were fine-tuned on fact-checking datasets using transformers from Hugging Face:

+ - **FEVER Dataset**: Contains 185,445 claims with SUPPORTS, REFUTES, and NOT ENOUGH INFO labels
+ - **LIAR Dataset**: Contains 12,800 political statements with six fine-grained truthfulness labels

+ The models are hosted on Hugging Face and loaded directly into the application at runtime.

+ ### Application

+ The application is built with:

+ - **Streamlit**: For the interactive web interface
+ - **PyTorch**: For model inference
+ - **Transformers**: For loading and utilizing the fine-tuned models
+ - **Hugging Face Spaces**: For deployment and hosting

+ ## Linguistic Phenomena Evaluation

+ TruthLens specifically examines model behavior across these linguistic constructions:

+ - **Basic Facts vs. Complex Assertions**: Comparing performance on simple statements versus compound or complex sentences
+ - **Negation Scope**: Assessing whether models understand the scope of negation within sentences
+ - **Modal Semantics**: Evaluating if models distinguish between epistemic possibility, permission, and obligation
+ - **Ambiguity Resolution**: Testing how models handle lexical and structural ambiguities
+ - **Hedged Claims**: Analyzing recognition of uncertainty markers and their effect on truthfulness assessment
+ - **Presuppositions**: Examining how models handle implicit assumptions within claims
+ - **Figurative Language**: Testing literal versus non-literal interpretation of metaphorical statements
+ - **Subjective vs. Objective Claims**: Measuring distinction between verifiable facts and expressions of opinion

+ ## NLP Coursework Project

+ This project was developed as part of an NLP coursework assessment, focusing on the application of computational linguistics and transformer-based language models to fact verification. It demonstrates the intersection of natural language processing, computational semantics, and information verification systems.

+ The research specifically explores how different transformer architectures handle linguistic nuances that humans naturally process but remain challenging for AI systems, providing insights into both the capabilities and limitations of current NLP approaches to automated fact-checking.

+ ## Usage

+ 1. Select a fact-checking model from the sidebar
+ 2. Enter a claim or select an example
+ 3. Click "Check Fact" to analyze the claim
+ 4. Review the prediction and confidence scores
+ 5. Examine the detailed linguistic analysis breakdown
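
The "prediction and confidence scores" in step 4 are not specified further in the README. A common way to get per-label confidences from a classifier is a softmax over its output logits, and the sketch below assumes that approach. The label order, the function names, and the example logits are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Assumed label order; a real model defines its own id-to-label mapping.
FEVER_LABELS = ["SUPPORTS", "REFUTES", "NOT ENOUGH INFO"]


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence_scores(logits: Sequence[float], labels: Sequence[str]) -> Dict[str, float]:
    """Pair each label with its softmax probability."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    return dict(zip(labels, softmax(logits)))


def top_prediction(scores: Dict[str, float]) -> Tuple[str, float]:
    """Return the most probable label and its confidence."""
    label = max(scores, key=scores.get)
    return label, scores[label]


# Made-up logits, standing in for a model's output on one claim.
scores = confidence_scores([2.0, 0.5, -1.0], FEVER_LABELS)
label, confidence = top_prediction(scores)
print(f"{label} ({confidence:.1%})")
```

Showing the full score dictionary rather than only the top label lets a reader see how close the runner-up class was.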
 

+ ## Future Linguistic Research Directions

+ - Development of models with enhanced pragmatic understanding
+ - Integration of discourse analysis for contextual claim verification
+ - Cross-linguistic adaptation for fact-checking in multiple languages
+ - Improved recognition of rhetorical devices and their impact on factuality
+ - Semantic frame analysis for better understanding of claim structures
+ - Temporal reasoning for evolving truths and time-dependent facts

+ ## Author

+ Malorie Iovino

  ## License

+ This project is available for educational and linguistic research purposes.