---
language: en
license: apache-2.0
base_model: microsoft/deberta-v3-large
tags:
- logical-fallacy-detection
- deberta-v3-large
- text-classification
- argumentation
- contrastive-learning
- adversarial-training
- robust-classification
datasets:
- logic
- cocoLoFa
- Navy0067/contrastive-pairs-for-logical-fallacy
metrics:
- f1
- accuracy
model-index:
- name: fallacy-detector-binary
  results:
  - task:
      type: text-classification
      name: Logical Fallacy Detection
    metrics:
    - type: f1
      value: 0.908
      name: F1 Score
    - type: accuracy
      value: 0.911
      name: Accuracy
---
# Logical Fallacy Detector (Binary)

A binary classifier distinguishing **valid reasoning** from **fallacious arguments**, trained with contrastive adversarial examples to handle subtle boundary cases.

**Key Innovation:** Contrastive learning with 703 adversarial argument pairs in which similar wording masks critical reasoning differences.

**96% accuracy on diverse real-world test cases** | **Handles edge cases** | **91% F1**

---
|
|
## ✨ Capabilities

### Detects Common Fallacies
- ✅ **Ad Hominem** (attacking the person, not the argument)
- ✅ **Slippery Slope** (exaggerated chain reactions)
- ✅ **False Dilemma** (only two options presented)
- ✅ **Appeal to Authority** (irrelevant credentials)
- ✅ **Hasty Generalization** (insufficient evidence)
- ✅ **Post Hoc Ergo Propter Hoc** (correlation ≠ causation)
- ✅ **Circular Reasoning** (begging the question)
- ✅ **Straw Man** arguments

### Validates Logical Reasoning
- ✅ **Formal syllogisms** ("All A are B, X is A, therefore X is B")
- ✅ **Mathematical proofs** (deductive reasoning, arithmetic)
- ✅ **Scientific explanations** (gravity, photosynthesis, chemistry)
- ✅ **Legal arguments** (precedent, policy application)
- ✅ **Conditional statements** (if-then logic)

### Edge Case Handling
- ✅ **Distinguishes relevant vs. irrelevant credential attacks**
  - Valid: "Color-blind witness can't testify about color"
  - Fallacy: "Witness shoplifted as a kid, so can't testify about color"
- ✅ **True dichotomies vs. false dilemmas**
  - Valid: "The alarm is either armed or disarmed"
  - Fallacy: "Either ban all cars or accept pollution forever"
- ✅ **Valid authority citations vs. fallacious appeals**
  - Valid: "Structural engineers agree based on data"
  - Fallacy: "Pop star wore these shoes, so they're best"
- ✅ **Causal relationships vs. correlation**
  - Valid: "Recalibrating machines increased output"
  - Fallacy: "Playing Mozart increased output"

### Limitations
- ⚠️ **Very short statements** (<10 words) may be misclassified as fallacies
  - Example: "I like pizza" incorrectly flagged (not an argument)
- ⚠️ **Circular reasoning** occasionally missed (e.g., "healing essences promote healing")
- ⚠️ **Context-dependent arguments** may need human review
- ⚠️ **Domain-specific jargon** may affect accuracy
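Given the short-statement limitation, one option is to screen out non-arguments before they reach the classifier. The sketch below is not part of the released model; the 10-word threshold mirrors the limitation noted above and is purely illustrative.

```python
def should_classify(text: str, min_words: int = 10) -> bool:
    """Return True only if the text is long enough to be treated as an argument.

    Very short statements (e.g. "I like pizza") are not arguments and may be
    misclassified as fallacies, so the caller can skip them or route them to
    human review. The threshold is an illustrative assumption.
    """
    return len(text.split()) >= min_words

print(should_classify("I like pizza"))  # False: too short to be an argument
```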
|
|
---

## Model Description

Fine-tuned **DeBERTa-v3-large** for binary classification using contrastive learning.

### Training Data

**Total training examples**: 6,529
- 5,335 examples from the LOGIC and CoCoLoFa datasets
- 1,194 contrastive pairs (oversampled 3x = 3,582 effective examples)

**Contrastive learning approach**: High-quality argument pairs in which one member is valid and the other contains a fallacy. The pairs differ only in reasoning quality, teaching the model to distinguish subtle boundaries.
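A minimal sketch of this setup (the field names are illustrative assumptions, not the dataset's actual schema): each pair holds a valid and a fallacious text with near-identical wording, and contrastive items are repeated to reach the 1,194 × 3 = 3,582 effective count.

```python
# One contrastive pair: near-identical wording, opposite labels.
# Field names are illustrative, not the dataset's actual schema.
pair = {
    "valid":   {"text": "Engineers found cracks in two beams, so the bridge was closed.", "label": 0},
    "fallacy": {"text": "A black cat crossed the bridge before it closed, so the cat caused the closure.", "label": 1},
}

def oversample(examples: list, factor: int = 3) -> list:
    """Repeat contrastive examples so boundary cases are emphasized in training."""
    return examples * factor

# 1,194 contrastive items x 3 = 3,582 effective, matching the counts above.
print(len(oversample(list(range(1194)))))  # 3582
```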
|
|
**Test set**: 1,130 examples (918 original + 212 contrastive-pair examples oversampled 2x)

---
|
|
## Performance

### Validation Metrics (1,130 examples)

| Metric | Score |
|--------|-------|
| **F1 Score** | 90.8% |
| **Accuracy** | 91.1% |
| **Precision** | 92.1% |
| **Recall** | 89.6% |
| **Specificity** | 92.5% |

**Error Analysis:**
- False Positive Rate: 7.5% (valid arguments incorrectly flagged)
- False Negative Rate: 10.4% (fallacies missed)

**Confusion Matrix:**
- True Negatives: 529 (Valid → Valid)
- False Positives: 43 (Valid → Fallacy)
- False Negatives: 58 (Fallacy → Valid)
- True Positives: 500 (Fallacy → Fallacy)
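The reported scores follow directly from this confusion matrix; a quick consistency check:

```python
# Recompute the validation metrics from the confusion matrix above.
tn, fp, fn, tp = 529, 43, 58, 500

accuracy    = (tp + tn) / (tp + tn + fp + fn)
precision   = tp / (tp + fp)
recall      = tp / (tp + fn)
f1          = 2 * precision * recall / (precision + recall)
specificity = tn / (tn + fp)

# Matches the table: 0.911, 0.921, 0.896, 0.908, 0.925
print(round(accuracy, 3), round(precision, 3), round(recall, 3),
      round(f1, 3), round(specificity, 3))
```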
|
|
### Real-World Testing (55 diverse manual cases)

**Accuracy: ~96%** (53/55 correct)

**Perfect performance on:**
- Formal syllogisms and deductive logic
- Mathematical/arithmetic statements
- Scientific principles (conservation of mass, photosynthesis, aerodynamics)
- Legal reasoning (contract terms, building codes, citizenship)
- Policy arguments with evidence

**Correctly identifies edge cases:**
- ✅ Color-blind witness (relevant) vs. shoplifted-as-kid witness (irrelevant)
- ✅ Structural engineers on bridges (valid authority) vs. physicist on supplements (opinion)
- ✅ Supply-demand economics (valid principle) vs. Mozart improving machines (false cause)
- ✅ Large-sample generalization vs. anecdotal evidence

**Known errors (2/55):**
- ❌ "I like pizza" → Flagged as fallacy (not an argument)
- ❌ "Natural essences promote healing" → Classified as valid (missed circular reasoning)

---
|
|
## Usage

```python
from transformers import pipeline

# Load model
classifier = pipeline(
    "text-classification",
    model="Navy0067/Fallacy-detector-binary"
)

# Example 1: Valid reasoning (formal logic)
text1 = "All mammals have backbones. Whales are mammals. Therefore whales have backbones."
result = classifier(text1)
# Output: [{'label': 'LABEL_0', 'score': 1.00}]  # LABEL_0 = Valid

# Example 2: Fallacy (ad hominem)
text2 = "His economic proposal is wrong because he didn't graduate from college."
result = classifier(text2)
# Output: [{'label': 'LABEL_1', 'score': 1.00}]  # LABEL_1 = Fallacy

# Example 3: Fallacy (slippery slope)
text3 = "If we allow one streetlamp, they'll install them every five feet and destroy our view of the stars."
result = classifier(text3)
# Output: [{'label': 'LABEL_1', 'score': 1.00}]

# Example 4: Valid (evidence-based)
text4 = "The data shows 95% of patients following physical therapy regained mobility, thus the regimen increases recovery chances."
result = classifier(text4)
# Output: [{'label': 'LABEL_0', 'score': 1.00}]

# Example 5: Edge case - relevant credential attack (Valid)
text5 = "The witness's color testimony should be questioned because he was diagnosed with total color blindness."
result = classifier(text5)
# Output: [{'label': 'LABEL_0', 'score': 1.00}]

# Example 6: Edge case - irrelevant credential attack (Fallacy)
text6 = "The witness's testimony should be questioned because he shoplifted a candy bar at age twelve."
result = classifier(text6)
# Output: [{'label': 'LABEL_1', 'score': 1.00}]
```

---
|
|
## Label Mapping

- `LABEL_0` = Valid reasoning (no fallacy detected)
- `LABEL_1` = Contains fallacy
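If readable names are preferred over the raw ids, a small post-processing helper can rename them (assuming the model keeps the default `LABEL_0`/`LABEL_1` ids shown above):

```python
# Map raw pipeline labels to readable names.
LABELS = {"LABEL_0": "valid", "LABEL_1": "fallacy"}

def readable(prediction: dict) -> tuple:
    """Convert one pipeline prediction dict into a (name, score) tuple."""
    return LABELS[prediction["label"]], prediction["score"]

print(readable({"label": "LABEL_1", "score": 0.99}))  # ('fallacy', 0.99)
```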
|
|
### Training Details

**Base model:** microsoft/deberta-v3-large

**Training configuration:**
- Epochs: 6
- Batch size: 4 (effective 16 with gradient accumulation)
- Learning rate: 1e-5
- Optimizer: AdamW with weight decay 0.01
- Scheduler: cosine with 10% warmup
- Max sequence length: 256 tokens
- FP16 training enabled
- Hardware: Kaggle P100 GPU (~82 minutes training time)

**Data strategy:**
- Original LOGIC/CoCoLoFa data (81.7% of training set)
- Contrastive pairs oversampled 3x (emphasizes boundary learning)
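Collected as keyword arguments in the style of `transformers.TrainingArguments`, the configuration above looks roughly like this. It is a sketch, not the published training script, and the max sequence length is applied at tokenization time rather than as a trainer argument.

```python
# Sketch of the training configuration listed above (assumed argument names
# follow the transformers TrainingArguments convention).
training_config = dict(
    num_train_epochs=6,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,   # 4 x 4 = effective batch size 16
    learning_rate=1e-5,
    weight_decay=0.01,               # AdamW weight decay
    lr_scheduler_type="cosine",
    warmup_ratio=0.10,               # 10% warmup
    fp16=True,
)
max_seq_length = 256                 # applied when tokenizing, not a trainer arg

# Effective batch size = per-device batch * accumulation steps.
print(training_config["per_device_train_batch_size"]
      * training_config["gradient_accumulation_steps"])  # 16
```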
|
|
## Dataset

The contrastive training pairs used for fine-tuning this model are available at:
[Navy0067/contrastive-pairs-for-logical-fallacy](https://huggingface.co/datasets/Navy0067/contrastive-pairs-for-logical-fallacy)
|
|
## Contact

- Author: Navyansh Singh
- Hugging Face: @Navy0067
- Email: Navyansh24102@iiitnr.edu.in

## Citation

If you use this model in your research, please cite it as:

```bibtex
@misc{singh2026fallacy,
  author = {Navyansh Singh},
  title = {Logical Fallacy Detector: Binary Classification with Contrastive Learning},
  year = {2026},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
  url = {https://huggingface.co/Navy0067/Fallacy-detector-binary}
}
```