---
language: en
license: apache-2.0
base_model: microsoft/deberta-v3-large
tags:
  - logical-fallacy-detection
  - deberta-v3-large
  - text-classification
  - argumentation
  - contrastive-learning
  - adversarial-training
  - robust-classification
datasets:
  - logic
  - cocoLoFa
  - Navy0067/contrastive-pairs-for-logical-fallacy
metrics:
  - f1
  - accuracy
model-index:
  - name: fallacy-detector-binary
    results:
      - task:
          type: text-classification
          name: Logical Fallacy Detection
        metrics:
          - type: f1
            value: 0.908
            name: F1 Score
          - type: accuracy
            value: 0.911
            name: Accuracy
---

Logical Fallacy Detector (Binary)

A binary classifier distinguishing valid reasoning from fallacious arguments, trained with contrastive adversarial examples to handle subtle boundary cases.

Key Innovation: Contrastive learning with 703 adversarial argument pairs where similar wording masks critical reasoning differences.

96% accuracy on diverse real-world test cases | 91% F1 on validation | Handles edge cases


✨ Capabilities

Detects Common Fallacies

  • ✅ Ad Hominem (attacking the person, not the argument)
  • ✅ Slippery Slope (exaggerated chain reactions)
  • ✅ False Dilemma (only two options presented)
  • ✅ Appeal to Authority (irrelevant credentials)
  • ✅ Hasty Generalization (insufficient evidence)
  • ✅ Post Hoc Ergo Propter Hoc (correlation ≠ causation)
  • ✅ Circular Reasoning (begging the question)
  • ✅ Straw Man (misrepresenting a position)

Validates Logical Reasoning

  • ✅ Formal syllogisms ("All A are B, X is A, therefore X is B")
  • ✅ Mathematical proofs (deductive reasoning, arithmetic)
  • ✅ Scientific explanations (gravity, photosynthesis, chemistry)
  • ✅ Legal arguments (precedent, policy application)
  • ✅ Conditional statements (if-then logic)

Edge Case Handling

  • ✅ Distinguishes relevant vs. irrelevant credential attacks
    • Valid: "Color-blind witness can't testify about color"
    • Fallacy: "Witness shoplifted as a kid, so can't testify about color"
  • ✅ True dichotomies vs. false dilemmas
    • Valid: "The alarm is either armed or disarmed"
    • Fallacy: "Either ban all cars or accept pollution forever"
  • ✅ Valid authority citations vs. fallacious appeals
    • Valid: "Structural engineers agree based on data"
    • Fallacy: "Pop star wore these shoes, so they're best"
  • ✅ Causal relationships vs. correlation
    • Valid: "Recalibrating machines increased output"
    • Fallacy: "Playing Mozart increased output"

Limitations

  • ⚠️ Very short statements (<10 words) may be misclassified as fallacies
    • Example: "I like pizza" incorrectly flagged (not an argument)
  • ⚠️ Circular reasoning occasionally missed (e.g., "healing essences promote healing")
  • ⚠️ Context-dependent arguments may need human review
  • ⚠️ Domain-specific jargon may affect accuracy
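Given the short-statement limitation above, a lightweight pre-filter can route non-arguments to human review before trusting the classifier. A minimal sketch (the 10-word threshold and function name are illustrative, not part of the model's API):

```python
def needs_review(text: str, min_words: int = 10) -> bool:
    """Flag inputs too short to be reliable arguments.

    The model card notes that statements under ~10 words
    (e.g. "I like pizza") may be misclassified as fallacies,
    so route them to human review instead of trusting the label.
    """
    return len(text.split()) < min_words

print(needs_review("I like pizza"))  # True: too short, review manually
print(needs_review("All mammals have backbones. Whales are mammals. "
                   "Therefore whales have backbones."))  # False: long enough to classify
```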

Model Description

Fine-tuned DeBERTa-v3-large for binary classification using contrastive learning.

Training Data

Total training examples: 6,529

  • 5,335 examples from LOGIC and CoCoLoFa datasets
  • 1,194 contrastive-pair examples (oversampled 3x = 3,582 effective examples)

Contrastive learning approach: High-quality argument pairs where one is valid and one contains a fallacy. The pairs differ only in reasoning quality, teaching the model to distinguish subtle boundaries.

Test set: 1,130 examples (918 original + 212 contrastive-pair examples, oversampled 2x)
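The pair-expansion and oversampling step described above can be sketched as follows; the dict field names (`valid`, `fallacy`) are illustrative assumptions, since the card does not publish the exact schema:

```python
# One illustrative contrastive pair: same topic, different reasoning quality.
pairs = [
    {"valid": "Structural engineers agree the bridge is safe based on load-test data.",
     "fallacy": "A pop star endorsed the bridge, so it must be safe."},
]

def expand_pairs(pairs, oversample=3):
    """Flatten each pair into two labeled examples (0 = valid, 1 = fallacy),
    then repeat the whole set `oversample` times to emphasize boundary cases."""
    examples = []
    for p in pairs:
        examples.append({"text": p["valid"], "label": 0})
        examples.append({"text": p["fallacy"], "label": 1})
    return examples * oversample

train_extra = expand_pairs(pairs)
print(len(train_extra))  # 6 labeled examples from 1 pair at 3x oversampling
```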


Performance

Validation Metrics (1,130 examples)

| Metric      | Score |
|-------------|-------|
| F1 Score    | 90.8% |
| Accuracy    | 91.1% |
| Precision   | 92.1% |
| Recall      | 89.6% |
| Specificity | 92.5% |

Error Analysis:

  • False Positive Rate: 7.5% (valid arguments incorrectly flagged)
  • False Negative Rate: 10.4% (fallacies missed)

Confusion Matrix:

  • True Negatives: 529 ✓ (Valid → Valid)
  • False Positives: 43 ✗ (Valid → Fallacy)
  • False Negatives: 58 ✗ (Fallacy → Valid)
  • True Positives: 500 ✓ (Fallacy → Fallacy)
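The headline metrics follow directly from this confusion matrix; a quick sanity check:

```python
# Confusion-matrix counts reported above (positive class = fallacy).
tn, fp, fn, tp = 529, 43, 58, 500

accuracy    = (tp + tn) / (tp + tn + fp + fn)
precision   = tp / (tp + fp)
recall      = tp / (tp + fn)           # a.k.a. sensitivity
specificity = tn / (tn + fp)
f1          = 2 * precision * recall / (precision + recall)

print(f"{accuracy:.1%} {precision:.1%} {recall:.1%} {specificity:.1%} {f1:.1%}")
# 91.1% 92.1% 89.6% 92.5% 90.8% -- matches the reported metrics
```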

Real-World Testing (55 diverse manual cases)

Accuracy: ~96% (53/55 correct)

Perfect performance on:

  • Formal syllogisms and deductive logic
  • Mathematical/arithmetic statements
  • Scientific principles (conservation of mass, photosynthesis, aerodynamics)
  • Legal reasoning (contract terms, building codes, citizenship)
  • Policy arguments with evidence

Correctly identifies edge cases:

  • ✅ Color-blind witness (relevant) vs. shoplifted-as-kid witness (irrelevant)
  • ✅ Structural engineers on bridges (valid authority) vs. physicist on supplements (opinion)
  • ✅ Supply-demand economics (valid principle) vs. Mozart improving machines (false cause)
  • ✅ Large-sample generalization vs. anecdotal evidence

Known errors (2/55):

  • ❌ "I like pizza" → Flagged as fallacy (not an argument)
  • ❌ "Natural essences promote healing" → Classified as valid (missed circular reasoning)

Usage

```python
from transformers import pipeline

# Load model
classifier = pipeline(
    "text-classification",
    model="Navy0067/Fallacy-detector-binary"
)

# Example 1: Valid reasoning (formal logic)
text1 = "All mammals have backbones. Whales are mammals. Therefore whales have backbones."
result = classifier(text1)
# Output: [{'label': 'LABEL_0', 'score': 1.00}]  # LABEL_0 = Valid

# Example 2: Fallacy (ad hominem)
text2 = "His economic proposal is wrong because he didn't graduate from college."
result = classifier(text2)
# Output: [{'label': 'LABEL_1', 'score': 1.00}]  # LABEL_1 = Fallacy

# Example 3: Fallacy (slippery slope)
text3 = "If we allow one streetlamp, they'll install them every five feet and destroy our view of the stars."
result = classifier(text3)
# Output: [{'label': 'LABEL_1', 'score': 1.00}]

# Example 4: Valid (evidence-based)
text4 = "The data shows 95% of patients following physical therapy regained mobility, thus the regimen increases recovery chances."
result = classifier(text4)
# Output: [{'label': 'LABEL_0', 'score': 1.00}]

# Example 5: Edge case - relevant credential attack (Valid)
text5 = "The witness's color testimony should be questioned because he was diagnosed with total color blindness."
result = classifier(text5)
# Output: [{'label': 'LABEL_0', 'score': 1.00}]

# Example 6: Edge case - irrelevant credential attack (Fallacy)
text6 = "The witness's testimony should be questioned because he shoplifted a candy bar at age twelve."
result = classifier(text6)
# Output: [{'label': 'LABEL_1', 'score': 1.00}]
```

Label Mapping:

  • LABEL_0 = Valid reasoning (no fallacy detected)
  • LABEL_1 = Contains fallacy
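Since the pipeline returns raw LABEL_0/LABEL_1 strings, a small wrapper can translate predictions into readable verdicts (the `interpret` helper and its output format are illustrative, not part of the model's API):

```python
LABELS = {"LABEL_0": "valid", "LABEL_1": "fallacy"}

def interpret(prediction: dict) -> str:
    """Convert a prediction dict like {'label': 'LABEL_1', 'score': 0.99}
    into a human-readable verdict string."""
    verdict = LABELS[prediction["label"]]
    return f"{verdict} ({prediction['score']:.0%} confidence)"

print(interpret({"label": "LABEL_1", "score": 0.99}))  # fallacy (99% confidence)
```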

Training Details

Base Model: microsoft/deberta-v3-large

Training Configuration:

  • Epochs: 6
  • Batch size: 4 (effective: 16 with gradient accumulation)
  • Learning rate: 1e-5
  • Optimizer: AdamW with weight decay 0.01
  • Scheduler: Cosine with 10% warmup
  • Max sequence length: 256 tokens
  • FP16 training enabled
  • Hardware: Kaggle P100 GPU (~82 minutes training time)

Data Strategy:

  • Original LOGIC/CoCoLoFa data (81.7% of training set)
  • Contrastive pairs oversampled 3x (emphasizes boundary learning)
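These proportions, and the effective batch size from the training configuration, can be reproduced from the figures in this card:

```python
original = 5_335       # LOGIC + CoCoLoFa examples
contrastive = 1_194    # contrastive-pair examples before oversampling
total = original + contrastive

print(total)                      # 6529 total training examples
print(f"{original / total:.1%}")  # 81.7% of the training set is original data
print(contrastive * 3)            # 3582 effective contrastive examples after 3x oversampling

micro_batch, grad_accum = 4, 4
print(micro_batch * grad_accum)   # 16 = effective batch size
```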

Dataset

The contrastive training pairs used for fine-tuning this model are available at: Navy0067/contrastive-pairs-for-logical-fallacy

Contact

Author: Navyansh Singh

Hugging Face: @Navy0067

Email: Navyansh24102@iiitnr.edu.in

Citation

If you use this model in your research, please cite it as:

```bibtex
@misc{singh2026fallacy,
  author       = {Navyansh Singh},
  title        = {Logical Fallacy Detector: Binary Classification with Contrastive Learning},
  year         = {2026},
  publisher    = {Hugging Face},
  journal      = {Hugging Face Model Hub},
  url          = {https://huggingface.co/Navy0067/Fallacy-detector-binary}
}
```