Research Artifact β Not Production-Ready
This model verifies logical syllogism responses using structural binary features. It achieves AUROC 1.0 on the held-out test set (matching Exp 89 reference). It handles modus ponens, modus tollens, disjunctive syllogism, and affirming the consequent β not arbitrary logic or modal reasoning.
constraint-propagation-logic
Learned Ising constraint model for logical syllogism verification.
Trained via discriminative Contrastive Divergence (Exp 62/89) on 400 verified (question, correct, wrong) triples for four syllogism types. Assigns a scalar energy to binary-encoded responses β lower energy means the response correctly identifies whether a syllogism is valid or invalid.
What It Is
An Ising Energy-Based Model (EBM):
E(x) = -(b^T s + s^T J s) s = 2x - 1 β {-1, +1}^200
The coupling matrix J encodes feature pair co-occurrences that distinguish valid from invalid logic responses (e.g., "follows by modus ponens" + "this follows" co-occurring is a strong correct-response signal).
How It Was Trained
Algorithm: Discriminative Contrastive Divergence (full-batch)
| Hyperparameter | Value |
|---|---|
| Training pairs | 400 (80% of 500 generated) |
| Feature dimension | 200 binary features |
| Learning rate | 0.01 |
| L1 regularization | 0.0 |
| Weight decay | 0.005 |
| Epochs | 300 |
| Source | Exp 62 (domain CD) + Exp 89 (self-bootstrap) |
Training data: Programmatically generated syllogism triples β four types:
- Modus ponens (valid): "If all X are P, and all Y are X, what follows?" β "All Y are P. This follows by modus ponens."
- Modus tollens (valid): "If all X are P, and Y are not P, what follows?" β "Y are not X. This follows by modus tollens."
- Disjunctive syllogism (valid): "Either X are P or X are Q. X are not P. What follows?" β "X are Q. This follows by disjunctive syllogism."
- Affirming the consequent (invalid): "If all X are P, and Y are P, are Y necessarily X?" β "No. This is affirming the consequent."
Wrong answers reverse the conclusion or deny the correct inference pattern.
Benchmark Results
| Metric | This export | Exp 89 reference |
|---|---|---|
| AUROC (test) | 1.0000 | 1.0000 |
| Accuracy (test) | 100.0% | 100.0% |
| Test set size | 100 | 31 |
| Baseline AUROC | 0.5 | 0.5 |
Perfect separation: the structural features of valid-logic responses (keywords "follows by", "modus ponens", "modus tollens", "disjunctive syllogism", "affirming the consequent") are highly discriminative.
Usage
import numpy as np
from carnot.inference.constraint_models import ConstraintPropagationModel
# Load model
model = ConstraintPropagationModel.from_pretrained(
"exports/constraint-propagation-models/logic"
)
# Encode a response
from scripts.export_constraint_models import encode_answer
question = "If all cats are mortal, and all robots are cats, what follows?"
answer = "All robots are mortal. This follows by modus ponens."
x = encode_answer(question, answer)
energy = model.energy(x) # low = valid syllogism
score = model.score(x) # high = more likely a correct logic response
print(f"Energy: {energy:.2f}, Score: {score:.3f}")
Limitations
- Keyword-dependent: Perfect AUROC is achieved because valid syllogism responses contain distinctive keywords ("modus ponens", "follows by"). A correct response without these keywords may score poorly.
- Four syllogism types only: Trained on modus ponens, modus tollens, disjunctive syllogism, and affirming the consequent. Other logic forms (hypothetical syllogism, constructive dilemma) are out of scope.
- Template-generated training data: Real LLM responses vary in phrasing and may not contain the expected keyword cues.
- Feature encoder required: Must use the same 200-dim structural encoder.
Files
| File | Description |
|---|---|
model.safetensors |
Coupling matrix J (200Γ200) and bias b (200,) as float32 |
config.json |
Training metadata and benchmark results |
README.md |
This file |
Citation
@misc{carnot2026constraint_logic,
title = {Carnot Constraint Propagation Model: Logic},
author = {Carnot-EBM},
year = {2026},
url = {https://github.com/ianblenke/carnot}
}
Spec
- REQ-VERIFY-002, REQ-VERIFY-003, FR-11
- Downloads last month
- -