Note: These are Phase 1 research artifacts: per-token activation EBMs that detect hallucination confidence signals from LLM hidden states. For production use of the full Carnot EBM framework (constraint verification, guided decoding, energy-based repair), see:

```
pip install carnot
```

Source and documentation: https://github.com/ianblenke/carnot
## Important: Research Artifact, Not a Production Detector
This model achieves 86.8% accuracy on held-out TruthfulQA test sets, but in practical deployment (8 real questions), activation-based EBMs agreed with ground truth only 50% of the time. The EBM detects model confidence, not correctness: confident hallucinations get low energy (they look fine), while correct-but-hedging answers get flagged.
This model is a research artifact documenting activation-space structure. It is NOT a reliable hallucination detector for production use.
For practical verification, use structural constraints (test execution, SAT solving) rather than activation analysis. See the Carnot technical report for 41 experiments and 14 principles learned.
# per-token-ebm-gemma4-e2b-nothink
Per-token hallucination detection EBM for google/gemma-4-E2B.
| Metric | Value |
|---|---|
| Test accuracy | 86.8% |
| Energy gap | 3.8514 |
| Source model | google/gemma-4-E2B |
| Hidden dim | 1536 |
| Architecture | Gibbs MLP [1536 → 512 → 128 → 1], SiLU |
| Training tokens | 15,822 |
| Thinking | disabled |
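The architecture row above describes a small MLP energy head over 1536-dim activations. A minimal NumPy sketch of that shape follows; the random weights, and the `silu` and `energy` names, are illustrative assumptions, not the released checkpoint:

```python
import numpy as np

def silu(x):
    # SiLU activation: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# Layer widths follow the card: 1536 -> 512 -> 128 -> 1
dims = [1536, 512, 128, 1]
weights = [rng.standard_normal((i, o)) / np.sqrt(i) for i, o in zip(dims, dims[1:])]
biases = [np.zeros(o) for o in dims[1:]]

def energy(h):
    """Map one 1536-dim activation vector to a scalar energy."""
    x = h
    for w, b in zip(weights[:-1], biases[:-1]):
        x = silu(x @ w + b)           # hidden layers use SiLU
    return float(x @ weights[-1] + biases[-1])  # final layer is linear

e = energy(rng.standard_normal(1536))
```

The real head is trained so that tokens from hallucinated answers receive higher energy than tokens from correct ones; the 3.8514 energy gap in the table is the separation between those two classes on the test set.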
## Usage
```python
from carnot.inference.ebm_loader import load_ebm

ebm = load_ebm("per-token-ebm-gemma4-e2b-nothink")

# activation_vector: a 1536-dim hidden-state vector from google/gemma-4-E2B
energy = float(ebm.energy(activation_vector))
# Low energy = likely correct; high energy = likely hallucination
```
Trained with Carnot.