Research Artifact: Not Production-Ready
This model verifies code implementation responses using structural binary features. It achieves AUROC 0.867 on the held-out test set (Exp 89 reference: 0.9096). It detects common off-by-one, wrong-initialization, and wrong-logic bugs, not arbitrary code errors.
constraint-propagation-code
Learned Ising constraint model for code implementation verification.
Trained via discriminative Contrastive Divergence (Exp 62/89) on 400 verified (question, correct, wrong) triples for code implementation tasks. Assigns a scalar energy to binary-encoded code responses; lower energy means the code is more likely to be a correct implementation.
What It Is
An Ising Energy-Based Model (EBM):
E(x) = -(b^T s + s^T J s), where s = 2x - 1 ∈ {-1, +1}^200
The coupling matrix J encodes co-occurrence patterns between structural code
features that distinguish correct from buggy implementations (e.g., "has
range(1, n + 1)" is strongly coupled to correctness in sum-range tasks).
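As a minimal sketch of the energy formula above, the following NumPy function maps a binary vector to spins and evaluates E(x). The function name `ising_energy` and the 3-feature weights are illustrative assumptions, not the trained 200-dimensional parameters.

```python
import numpy as np

def ising_energy(x, b, J):
    """Evaluate E(x) = -(b^T s + s^T J s) with s = 2x - 1."""
    # Map binary features {0, 1} to spins {-1, +1}
    s = 2.0 * np.asarray(x, dtype=np.float64) - 1.0
    return float(-(b @ s + s @ J @ s))

# Toy 3-feature example (hypothetical values, not the exported weights)
b = np.array([0.5, -0.2, 0.0])
J = np.array([
    [0.0, 0.3, 0.0],
    [0.3, 0.0, -0.1],
    [0.0, -0.1, 0.0],
])
x = np.array([1, 1, 0])

toy_energy = ising_energy(x, b, J)
print(f"Toy energy: {toy_energy:.2f}")
```

A positive coupling J[i, j] lowers the energy when features i and j agree, which is how co-occurring "correct" patterns end up with lower energy.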
How It Was Trained
Algorithm: Discriminative Contrastive Divergence (full-batch)
| Hyperparameter | Value |
|---|---|
| Training pairs | 400 (80% of 500 generated) |
| Feature dimension | 200 binary features |
| Learning rate | 0.01 |
| L1 regularization | 0.0 |
| Weight decay | 0.005 |
| Epochs | 300 |
| Source | Exp 62 (domain CD) + Exp 89 (self-bootstrap) |
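
The exact update rule is not spelled out in this card. One plausible reading of "discriminative Contrastive Divergence (full-batch)" is that the negative phase uses the known-wrong encodings instead of sampled states, so each step lowers the energy of correct responses and raises it for wrong ones. The sketch below follows that assumption; the function names and toy data are hypothetical, and only the default hyperparameters come from the table above.

```python
import numpy as np

def to_spins(x):
    """Map binary features {0, 1} to spins {-1, +1}."""
    return 2.0 * np.asarray(x, dtype=np.float64) - 1.0

def train_discriminative_cd(x_correct, x_wrong, lr=0.01, weight_decay=0.005, epochs=300):
    """Full-batch contrastive updates (assumed form, see lead-in)."""
    s_pos = to_spins(x_correct)  # positive phase: correct responses
    s_neg = to_spins(x_wrong)    # negative phase: wrong responses
    d = s_pos.shape[1]
    b = np.zeros(d)
    J = np.zeros((d, d))
    # With full batches and no sampling, the phase statistics are constant
    stat_b = s_pos.mean(axis=0) - s_neg.mean(axis=0)
    stat_J = s_pos.T @ s_pos / len(s_pos) - s_neg.T @ s_neg / len(s_neg)
    np.fill_diagonal(stat_J, 0.0)  # s_i * s_i is always 1, so it carries no signal
    for _ in range(epochs):
        b += lr * (stat_b - weight_decay * b)
        J += lr * (stat_J - weight_decay * J)
    return b, J

def batch_energy(x, b, J):
    """E(x) = -(b^T s + s^T J s) for each row of x."""
    s = to_spins(x)
    return -(s @ b + np.einsum("ni,ij,nj->n", s, J, s))

# Toy data: feature 0 is set in every correct response and unset in every wrong one
rng = np.random.default_rng(0)
x_correct = rng.integers(0, 2, size=(40, 8))
x_wrong = x_correct.copy()
x_correct[:, 0] = 1
x_wrong[:, 0] = 0

b, J = train_discriminative_cd(x_correct, x_wrong)
mean_correct = batch_energy(x_correct, b, J).mean()
mean_wrong = batch_energy(x_wrong, b, J).mean()
print(f"mean energy correct={mean_correct:.3f}, wrong={mean_wrong:.3f}")
```

In this simplified form the weight decay term is the only thing that keeps the parameters bounded, which is one reason a nonzero value would be needed.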
Training data: Ten code implementation templates with exactly one bug per wrong implementation:
| Function | Correct pattern | Bug pattern |
|---|---|---|
| sum_range | range(1, n + 1) | range(1, n) (off-by-one) |
| find_max | result = lst[0] | result = 0 (wrong init) |
| is_even | n % 2 == 0 | n % 2 == 1 (inverted) |
| factorial | base case n == 0 | base case n == 1 (misses 0!) |
| reverse_string | s[::-1] | s[::1] (no-op) |
| count_vowels | s.lower() | missing lower() |
| fibonacci | base case n <= 0 | base case n == 1 |
| binary_search | lo = mid + 1 | lo = mid (infinite loop) |
| is_palindrome | s == s[::-1] | s == ''.join(sorted(s)) |
| flatten | result.extend(flatten(item)) | result.extend(item) (shallow) |
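
To make the "exactly one bug" idea concrete, here is a sketch of the first two template pairs written out as runnable functions. The function bodies are reconstructions from the patterns in the table, not the actual training templates.

```python
def sum_range_correct(n):
    total = 0
    for i in range(1, n + 1):  # includes n
        total += i
    return total

def sum_range_buggy(n):
    total = 0
    for i in range(1, n):  # off-by-one: stops at n - 1
        total += i
    return total

def find_max_correct(lst):
    result = lst[0]  # start from an actual element
    for value in lst:
        if value > result:
            result = value
    return result

def find_max_buggy(lst):
    result = 0  # wrong init: fails when every element is negative
    for value in lst:
        if value > result:
            result = value
    return result

print(sum_range_correct(5), sum_range_buggy(5))
print(find_max_correct([-3, -1, -2]), find_max_buggy([-3, -1, -2]))
```

Each buggy variant differs from its correct counterpart by a single token-level change, which is the kind of signal a structural encoding can pick up.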
Benchmark Results
| Metric | This export | Exp 89 reference |
|---|---|---|
| AUROC (test) | 0.8669 | 0.9096 |
| Accuracy (test) | 88.0% | 88.0% |
| Test set size | 100 | 25 |
| Baseline AUROC | 0.5 | 0.5 |
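The AUROC gap relative to Exp 89 reflects that experiment's richer constraint features (245-dim pipeline features vs. the 200-dim structural encoding used here). Test accuracy is identical at 88.0%, although the two test sets differ in size (100 vs. 25 examples).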
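Because lower energy indicates a more likely correct response, AUROC here can be read as the probability that a randomly chosen correct response receives lower energy than a randomly chosen wrong one. The sketch below computes that quantity directly from two lists of energies; how the reported figures were actually computed is not stated in this card, and the function name and sample energies are hypothetical.

```python
def auroc_from_energies(correct_energies, wrong_energies):
    """Fraction of (correct, wrong) pairs ranked properly; ties count as half."""
    wins = 0.0
    for e_correct in correct_energies:
        for e_wrong in wrong_energies:
            if e_correct < e_wrong:
                wins += 1.0
            elif e_correct == e_wrong:
                wins += 0.5
    return wins / (len(correct_energies) * len(wrong_energies))

# Made-up energies for illustration only
example_auroc = auroc_from_energies([-2.0, -1.0, 0.5], [-1.5, 1.0, 2.0])
print(f"AUROC: {example_auroc:.3f}")
```

A model that assigns the same energy to every response scores 0.5 under this definition, which matches the baseline row in the table.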
Usage
import numpy as np
from carnot.inference.constraint_models import ConstraintPropagationModel
# Load model
model = ConstraintPropagationModel.from_pretrained(
    "exports/constraint-propagation-models/code"
)
# Encode a code response
from scripts.export_constraint_models import encode_answer
question = "Write a function that returns the sum of integers from 1 to n."
correct_code = "def sum_range(n):\n    total = 0\n    for i in range(1, n + 1):\n        total += i\n    return total"
buggy_code = "def sum_range(n):\n    total = 0\n    for i in range(1, n):\n        total += i\n    return total"
x_correct = encode_answer(question, correct_code)
x_buggy = encode_answer(question, buggy_code)
print(f"Correct energy: {model.energy(x_correct):.2f}") # should be lower
print(f"Buggy energy: {model.energy(x_buggy):.2f}") # should be higher
print(f"Correct score: {model.score(x_correct):.3f}") # should be higher
print(f"Buggy score: {model.score(x_buggy):.3f}") # should be lower
Limitations
- Template bugs only: Trained on 10 specific bug patterns. Novel bug types (e.g., wrong algorithm choice, logic errors in business rules) are outside scope.
- Structural features only: Detects structural signals like "has range(1, n + 1)" or "has isinstance"; does not execute or parse the code.
- Lowest AUROC of the three domains: Code structure is less distinctive than arithmetic or logic patterns. AUROC 0.867 vs 1.0 for logic/arithmetic.
- No semantic understanding: Two implementations with the same keywords but different logic will have similar energies.
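The last limitation can be illustrated with a toy encoder. The sketch below assumes features are simple substring checks; the pattern list and `structural_features` are hypothetical stand-ins, not the real 200-feature `encode_answer`. Two implementations that share the same patterns map to the same vector, so any energy model on top of it cannot tell them apart.

```python
# Hypothetical pattern list; the real encoder's features are not shown in this card
HYPOTHETICAL_PATTERNS = ["range(1, n + 1)", "isinstance", "[::-1]", ".lower()"]

def structural_features(code):
    """Binary vector: 1 if the pattern appears as a substring, else 0."""
    return [1 if pattern in code else 0 for pattern in HYPOTHETICAL_PATTERNS]

good_code = (
    "def sum_range(n):\n"
    "    total = 0\n"
    "    for i in range(1, n + 1):\n"
    "        total += i\n"
    "    return total"
)
# Same keywords and loop bounds, but subtracts instead of adds
bad_code = good_code.replace("total += i", "total -= i")

features_good = structural_features(good_code)
features_bad = structural_features(bad_code)
print(features_good, features_bad)
```

Identical feature vectors imply identical energies, regardless of what the code does when run.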
Files
| File | Description |
|---|---|
| model.safetensors | Coupling matrix J (200×200) and bias b (200,) as float32 |
| config.json | Training metadata and benchmark results |
| README.md | This file |
Citation
@misc{carnot2026constraint_code,
title = {Carnot Constraint Propagation Model: Code},
author = {Carnot-EBM},
year = {2026},
url = {https://github.com/ianblenke/carnot}
}
Spec
- REQ-VERIFY-002, REQ-VERIFY-003, FR-11