
PHASE 1 RESEARCH ARTIFACT: detects model confidence, not factual correctness

This model was trained on LLM hidden-state activations to produce an energy score that correlates with the model's output confidence (hallucination likelihood). It cannot verify whether a model's answer is factually correct; it can only signal how uncertain the model appears token-by-token.

This limitation was confirmed in Exp 184/203: the energy scores reflect model confidence, not answer correctness. Do not use these scores as a correctness verifier.

For production use, install the full Carnot pipeline:

```
pip install carnot
```

The production pipeline includes FormalClaimVerifier (solver-routed formal claim verification), PBT code verification, process integrity detection, and the Carnot MCP server, plus EBM-framework features such as constraint verification, guided decoding, and energy-based repair. See Carnot on GitHub (https://github.com/ianblenke/carnot) for documentation.

Exp 316 Full-Scale Benchmark Results (2026-04-14)

The Carnot FCV pipeline was benchmarked on 400 GSM8K questions (adversarial corpus with number_swap and irrelevant_sentence perturbations) and 50 HumanEval problems.

Baseline accuracy on adversarial GSM8K (no Carnot intervention):

| Model | GSM8K Accuracy | 95% CI | N |
|---|---|---|---|
| Gemma4-E4B-it | 26.3% | [22.2%, 30.8%] | 400 |
| Qwen3.5-0.8B | 27.5% | [23.4%, 32.1%] | 400 |

Note: `inference_mode=simulated`; live GPU results pending. See `results/experiment_316_fullscale_results.json` for full details.
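The reported intervals are consistent with Wilson score intervals on the binomial accuracy (a common choice; the card does not state which method was used). A quick check reproduces both rows:

```python
import math

def wilson_ci(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a binomial proportion."""
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

for p in (0.263, 0.275):
    lo, hi = wilson_ci(p, 400)
    print(f"{p:.1%}: [{lo:.1%}, {hi:.1%}]")
# 26.3%: [22.2%, 30.8%]
# 27.5%: [23.4%, 32.1%]
```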


```yaml
tags:
  - energy-based-model
  - hallucination-detection
  - jax
  - carnot
license: apache-2.0
```


Important: Research Artifact, Not a Production Detector

This model achieves 78.3% on held-out TruthfulQA test sets, but in practical deployment (8 real questions), activation-based EBMs agreed with ground truth only 50% of the time. The EBM detects model confidence, not correctness: confident hallucinations get low energy (look fine) while correct-but-hedging answers get flagged.

This model is a research artifact documenting activation-space structure. It is NOT a reliable hallucination detector for production use.

For practical verification, use structural constraints (test execution, SAT solving) rather than activation analysis. See the Carnot technical report for 41 experiments and 14 principles learned.

per-token-ebm-gemma4-e4b-it-nothink

Per-token hallucination detection EBM for google/gemma-4-E4B-it.

| Metric | Value |
|---|---|
| Test accuracy | 78.3% |
| Energy gap | 4.2477 |
| Source model | google/gemma-4-E4B-it |
| Hidden dim | 2560 |
| Architecture | Gibbs [2560 → 512 → 128 → 1], SiLU |
| Training tokens | 4,433 |
| Thinking | disabled |
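The architecture row describes a small MLP head. Here is a shape-level NumPy sketch with random, untrained weights; it is purely illustrative, since the actual model is trained in JAX and "Gibbs" is the card's name for the head, not an API:

```python
import numpy as np

def silu(x: np.ndarray) -> np.ndarray:
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))

dims = [2560, 512, 128, 1]  # hidden dim 2560 down to a scalar energy
rng = np.random.default_rng(0)
params = [(rng.normal(0.0, d_in ** -0.5, size=(d_in, d_out)), np.zeros(d_out))
          for d_in, d_out in zip(dims[:-1], dims[1:])]

def energy(activation: np.ndarray) -> float:
    h = activation
    for i, (w, b) in enumerate(params):
        h = h @ w + b
        if i < len(params) - 1:  # SiLU between layers, linear scalar output
            h = silu(h)
    return float(h[0])

print(energy(rng.normal(size=2560)))  # a scalar; weights here are untrained
```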

Usage

```python
from carnot.inference.ebm_loader import load_ebm

# activation_vector: a hidden-state activation from the source model (dim 2560)
ebm = load_ebm("per-token-ebm-gemma4-e4b-it-nothink")
energy = float(ebm.energy(activation_vector))
# Low energy = likely correct; high energy = likely hallucination
```
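One way to use these scores downstream is to flag tokens whose energy crosses a threshold. The `flag_tokens` helper and the threshold value below are illustrative assumptions, not part of the carnot API, and the energies are stubbed so the sketch runs without a loaded EBM. Remember that a flag signals low confidence, not incorrectness:

```python
from typing import Sequence

THRESHOLD = 4.2  # illustrative, roughly the reported energy gap; not a calibrated value

def flag_tokens(energies: Sequence[float], threshold: float = THRESHOLD) -> list[int]:
    """Return indices of tokens whose energy exceeds the threshold.
    High energy signals low model confidence, NOT factual incorrectness."""
    return [i for i, e in enumerate(energies) if e > threshold]

# Stubbed per-token energies; in practice: [float(ebm.energy(v)) for v in activations]
energies = [0.8, 1.1, 5.3, 0.9, 6.0]
print(flag_tokens(energies))  # -> [2, 4]
```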

Trained with Carnot.

What's Proven to Work (2026)

The following Carnot pipeline capabilities have been validated with live GPU inference (not simulation) as of April 2026. Install via `pip install carnot`.

| Capability | What it does | Evidence |
|---|---|---|
| FormalClaimVerifier | Solver-routed formal claim verification: arithmetic, boolean-entailment, set-membership, execution-oracle, cardinality, and comparison routes | 1,243 solver-routable rows from live GSM8K + HumanEval traces (Exp 244/246) |
| PBT code verification | Property-based testing (Hypothesis) catches bugs that official test suites miss | +3.0pp on 164-problem HumanEval with Gemma4-E4B-it (Exp 226); 2 official-test misses caught on Qwen3.5-0.8B (Exp 227) |
| Process integrity detection | Detects right-for-wrong-reasons answers where the output is correct but the reasoning process is invalid | 5 right-for-wrong-reasons cases caught across a 30-case HumanEval cohort (Exp 251) |
| Carnot MCP server | Exposes `verify_code_with_pbt` and 6 other tools to any MCP-compatible agent | 7 discoverable tools, 30 s timeout, 10K input guard (VERIFY-031) |

These results use instruction-tuned models (Gemma4-E4B-it, Qwen3.5-0.8B) on live CUDA hardware. All per-token EBM confidence results (this model family) are Phase 1 research artifacts and should not be interpreted as correctness scores.
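To illustrate the PBT idea: a property-based check samples random inputs and asserts an invariant, catching bugs that hand-picked tests miss. The production pipeline uses the Hypothesis library; this sketch uses plain random sampling instead, and `buggy_sort` is an invented example:

```python
import random

def buggy_sort(xs: list[int]) -> list[int]:
    # Invented bug: drops duplicates, so "is it sorted?" spot checks still pass
    return sorted(set(xs))

# "Official"-style spot checks: both pass despite the bug
assert buggy_sort([3, 1, 2]) == [1, 2, 3]
assert buggy_sort([]) == []

# Property: output must equal a reference sort (i.e. preserve multiplicity)
random.seed(0)
failures = []
for _ in range(200):
    xs = [random.randint(0, 5) for _ in range(random.randint(0, 8))]
    if buggy_sort(xs) != sorted(xs):
        failures.append(xs)
print(f"property violated on {len(failures)}/200 random inputs")
```

Random inputs with duplicates expose the bug immediately, even though the spot checks never do.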
