LFM2-1.2B with an "I Don't Know" (IDK) Metric
Base Model: LiquidAI/LFM2-1.2B
1. Problem
Language models sometimes generate confident-sounding text that is completely wrong. This phenomenon, called hallucination, is one of the biggest barriers to deploying AI in high-stakes applications. Users have no easy way to know when to trust a model's output and when to verify it. We need a reliability signal that flags uncertain outputs before they cause harm.
2. Challenge
Detecting hallucinations is difficult because the model itself doesn't know what it doesn't know. Traditional approaches require large labeled datasets of correct and incorrect outputs, which are expensive to create and domain-specific. The internal representations of language models are high-dimensional and difficult to interpret, making it unclear which signals correlate with reliability. Furthermore, uncertainty manifests differently across question types: a model might be calibrated for factual recall but overconfident on reasoning tasks.
3. Proposed Solution
We developed an uncertainty estimation head that measures internal consistency signals within the model during inference. The core insight is that confident predictions produce aligned internal representations, while uncertain predictions create noisy, contradictory signals. We train a small predictor network to learn the expected transformation between intermediate hidden states; deviations from this learned pattern indicate out-of-distribution inputs. This approach requires no hallucination labels; the model learns uncertainty purely from its own behavior on diverse prompts. The result is a single "I Don't Know" (IDK) score from 0 to 100 that can be computed alongside any generation.
4. Method
Intuition
Think of the IDK score as measuring how "surprised" the model is by its own internal processing. When answering a question it knows well, the model's internal signals flow smoothly and predictably from layer to layer. When guessing, these signals become erratic: different parts of the model disagree, the flow between layers deviates from normal patterns, and the output distribution spreads across many possible tokens. We measure all three of these signals and combine them into a single uncertainty score.
Technical Details
The IDK score combines three complementary signals:
Flow Consistency (70% weight)
A trained MLP predicts z₁₂ from z₈. High prediction error indicates unusual internal dynamics; the model is operating outside its comfort zone.
Output Entropy (20% weight)
Shannon entropy of output probabilities. High entropy means probability mass spreads across many tokens rather than one confident choice.
Head Disagreement (10% weight)
Variance across the 8 attention heads. When confident, heads converge; when uncertain, they attend to different aspects.
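The weighted combination of these three signals can be sketched as follows. This is a simplified stand-in, not the shipped IDK head: the squashing and normalization applied to each signal are assumptions, and the real head operates on LFM2's actual hidden states and attention maps.

```python
# Illustrative sketch of the 70/20/10 signal combination.
# The per-signal normalizations here are assumed placeholders.
import numpy as np

def idk_score(flow_error, probs, head_attn, w=(0.7, 0.2, 0.1)):
    """Combine flow error, output entropy, and head disagreement.

    flow_error: scalar MSE between predicted and actual z12 (>= 0)
    probs:      next-token probability distribution (sums to 1)
    head_attn:  [num_heads, seq] attention weights from one layer
    """
    # 1. Flow consistency: squash log-scaled MSE into [0, 1]
    #    (sigmoid of log x is x / (1 + x)).
    flow = 1.0 / (1.0 + np.exp(-np.log(flow_error + 1e-8)))
    # 2. Output entropy, normalized by its maximum (uniform distribution).
    ent = -np.sum(probs * np.log(probs + 1e-12)) / np.log(len(probs))
    # 3. Head disagreement: variance of attention patterns across heads,
    #    rescaled by sequence length and clipped to [0, 1] (assumed scaling).
    disagree = np.mean(np.var(head_attn, axis=0)) * head_attn.shape[1]
    disagree = min(disagree, 1.0)
    return 100.0 * (w[0] * flow + w[1] * ent + w[2] * disagree)
```

A confident case (low flow error, peaked output distribution, agreeing heads) scores near 0, while the opposite extreme scores near 100.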
Training
The IDK head (~2.4M parameters) trains unsupervised on the model's own generations. We extracted ~385,000 (z₈, z₁₂) pairs from 20,000 diverse prompts across multiple datasets:
| Source | Samples | Category |
|---|---|---|
| NaturalQuestions | 77,229 | Factual |
| HotpotQA | 76,512 | Multi-hop reasoning |
| TriviaQA | 75,493 | Factual trivia |
| SQuAD 2.0 | 62,840 | Factual + unanswerable |
| Subjective | 16,060 | Opinions |
| Future/Impossible | 31,180 | High uncertainty |
| TruthfulQA | 5,867 | Hallucination-prone |
The loss combines flow MSE, category calibration, and diversity regularization. No hallucination labels required.
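The flow-MSE term can be illustrated with a toy experiment: fit a predictor on synthetic (z₈, z₁₂) pairs by gradient descent, then check that mismatched pairs produce much higher error. The dimensions, data, and single linear layer below are toy assumptions; the category-calibration and diversity terms of the full loss are omitted.

```python
# Toy illustration of the flow-MSE training signal (not the real head).
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, n = 16, 16, 256   # toy dims; the real states are 2048-dim

# Synthetic (z8, z12) pairs: z12 is a fixed linear function of z8 plus
# noise, standing in for the model's layer-8 -> layer-12 transformation.
A_true = rng.normal(size=(d_in, d_out)) / np.sqrt(d_in)
z8 = rng.normal(size=(n, d_in))
z12 = z8 @ A_true + 0.01 * rng.normal(size=(n, d_out))

# Flow predictor reduced to a single linear map, trained on flow MSE.
W = np.zeros((d_in, d_out))
lr = 0.1
for _ in range(200):
    err = z8 @ W - z12                 # flow residual
    loss = np.mean(err ** 2)           # flow-MSE term of the loss
    W -= lr * (2.0 / n) * z8.T @ err   # gradient descent step
```

After training, in-distribution pairs have flow error near the noise floor, while deliberately mismatched pairs score far higher, which is exactly the out-of-distribution signal the IDK head thresholds.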
5. Architecture
High-Level Overview
```
Input Tokens
      │
      ▼
┌─────────────────────────────────┐
│        LFM2-1.2B (Frozen)       │
│                                 │
│  ┌──────────┐     ┌──────────┐  │
│  │ Conv x10 │     │ Attn x6  │  │
│  └──────────┘     └──────────┘  │
│        │               │        │
│       z₈              z₁₂       │
│    (layer 8)      (layer 12)    │
└─────────────────────────────────┘
      │            │         │
      ▼            └────┬────┘
   Logits               ▼
                 ┌─────────────┐
                 │  IDK Head   │
                 │   (2.4M)    │
                 │             │
                 │ Flow    70% │
                 │ Entropy 20% │
                 │ Heads   10% │
                 └──────┬──────┘
                        ▼
                    IDK Score
                     (0-100)
```
LFM2-1.2B Layer Structure
LFM2 is a hybrid architecture with 10 convolution blocks and 6 grouped query attention blocks:
```
Layer: 0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
Type:  C  C  A  C  C  A  C  C  A  C  A  C  A  C  A  C
                                ▲            ▲
                               z₈           z₁₂
                           (extract)     (extract)

C = Convolution (Lfm2ShortConv)
A = Attention (Lfm2Attention, GQA with 8 KV heads)
```
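Extraction at layers 8 and 12 can be sketched with a generic "tap" over a layer stack. The helper below is hypothetical, with toy layers standing in for LFM2's conv/attention blocks:

```python
# Hypothetical sketch of capturing z8 and z12 during a forward pass.
import numpy as np

def run_with_taps(layers, x, taps=(8, 12)):
    """Apply `layers` in order; return final output plus tapped states."""
    captured = {}
    for i, layer in enumerate(layers):
        x = layer(x)
        if i in taps:
            captured[i] = x.copy()   # snapshot the hidden state
    return x, captured

rng = np.random.default_rng(0)
# 16 toy "layers" standing in for LFM2's conv/attention blocks.
mats = [rng.normal(size=(8, 8)) / np.sqrt(8) for _ in range(16)]
layers = [lambda x, m=m: np.tanh(x @ m) for m in mats]
h = rng.normal(size=(1, 8))
out, states = run_with_taps(layers, h)
z8, z12 = states[8], states[12]   # analogues of the z8 / z12 extraction
```

With transformers, the same states are typically available without hooks via `model(..., output_hidden_states=True)`; since `hidden_states[0]` holds the embeddings, the output of decoder layer i is `hidden_states[i + 1]`.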
Layer Details
| Layer | Type | Module | Hidden | Details |
|---|---|---|---|---|
| 0 | Conv | Lfm2ShortConv | 2048 | Conv1d + Linear |
| 1 | Conv | Lfm2ShortConv | 2048 | Conv1d + Linear |
| 2 | Attn | Lfm2Attention | 2048 | GQA, 8 KV heads |
| 3 | Conv | Lfm2ShortConv | 2048 | Conv1d + Linear |
| 4 | Conv | Lfm2ShortConv | 2048 | Conv1d + Linear |
| 5 | Attn | Lfm2Attention | 2048 | GQA, 8 KV heads |
| 6 | Conv | Lfm2ShortConv | 2048 | Conv1d + Linear |
| 7 | Conv | Lfm2ShortConv | 2048 | Conv1d + Linear |
| 8 | Attn | Lfm2Attention | 2048 | GQA, 8 KV heads; z₈ extracted |
| 9 | Conv | Lfm2ShortConv | 2048 | Conv1d + Linear |
| 10 | Attn | Lfm2Attention | 2048 | GQA, 8 KV heads |
| 11 | Conv | Lfm2ShortConv | 2048 | Conv1d + Linear |
| 12 | Attn | Lfm2Attention | 2048 | GQA, 8 KV heads; z₁₂ extracted |
| 13 | Conv | Lfm2ShortConv | 2048 | Conv1d + Linear |
| 14 | Attn | Lfm2Attention | 2048 | GQA, 8 KV heads |
| 15 | Conv | Lfm2ShortConv | 2048 | Conv1d + Linear |
Model Specifications
| Spec | Value |
|---|---|
| Parameters | 1.17B (base) + 2.4M (IDK head) |
| Hidden Dimension | 2048 |
| Vocabulary Size | 65,536 |
| Context Length | 32,768 |
| KV Heads | 8 |
Flow Predictor Architecture
```
z₈ [2048]
    │
    ▼
Linear(2048 → 512)
    │
    ▼
GELU + LayerNorm
    │
    ▼
Linear(512 → 512)
    │
    ▼
GELU + LayerNorm
    │
    ▼
Linear(512 → 2048)
    │
    ▼
ẑ₁₂ [2048]
    │
    ▼
MSE(ẑ₁₂, z₁₂) → log → normalize → sigmoid → score
```
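The diagram translates into the following NumPy forward-pass sketch. The weights are random placeholders, and the normalization statistics in the score mapping are assumed values, not the trained ones.

```python
# Forward-pass sketch of the flow predictor (random placeholder weights).
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

rng = np.random.default_rng(0)
dims = [(2048, 512), (512, 512), (512, 2048)]
Ws = [rng.normal(size=d) / np.sqrt(d[0]) for d in dims]

def flow_score(z8, z12):
    h = layer_norm(gelu(z8 @ Ws[0]))        # Linear(2048 -> 512) + GELU + LN
    h = layer_norm(gelu(h @ Ws[1]))         # Linear(512 -> 512) + GELU + LN
    z12_hat = h @ Ws[2]                     # Linear(512 -> 2048): predicted z12
    mse = np.mean((z12_hat - z12) ** 2)     # MSE
    x = np.log(mse + 1e-8)                  # log
    x = (x - 0.0) / 1.0                     # normalize (assumed mean/std)
    return 100.0 / (1.0 + np.exp(-x))       # sigmoid -> 0-100 score
```

Larger deviations between the predicted and actual layer-12 state map monotonically to higher scores.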
6. Results
Score Interpretation
| IDK Score | Confidence | Recommended Action |
|---|---|---|
| 0-30 | High | Output likely reliable |
| 30-50 | Moderate | Verify if critical |
| 50-70 | Low | Treat with skepticism |
| 70-100 | Very Low | High hallucination risk |
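A small helper (hypothetical, not part of the released API) can turn the table above into code:

```python
# Map an IDK score to the recommended action from the table above.
def recommend(idk_score: float) -> str:
    if not 0 <= idk_score <= 100:
        raise ValueError("IDK score must be in [0, 100]")
    if idk_score < 30:
        return "Output likely reliable"
    if idk_score < 50:
        return "Verify if critical"
    if idk_score < 70:
        return "Treat with skepticism"
    return "High hallucination risk"
```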
Evaluation by Category
| Question Type | Example | IDK Score | Status |
|---|---|---|---|
| Simple Facts | "What is 2 + 2?" | 16.5 | ✓ Confident |
| Simple Facts | "Capital of France?" | 14.5 | ✓ Confident |
| Technical | "Explain neural networks" | 48.0 | ⚠ Moderate |
| Subjective | "Best programming language?" | 53.7 | ⚠ Uncertain |
| Impossible | "Who will win the election?" | 62.4 | ⚠ Uncertain |
| Impossible | "What will happen tomorrow?" | 62.7 | ⚠ Uncertain |
Training Metrics
| Metric | Value |
|---|---|
| Training samples | 384,978 |
| Training prompts | 20,000 |
| Epochs | 20 |
| Final flow loss | 0.0289 |
| Final category loss | 0.0222 |
| Score range | 14.5 - 62.7 (48 points) |
| Ordering accuracy | 100% |
Known Limitations
- High-uncertainty scores are compressed (target: 70-90, actual: 55-63)
- Calibration varies across domains not seen during training
- Inherits the knowledge cutoff of the LFM2-1.2B base model
- The IDK score is a heuristic estimate, not a guarantee of correctness
7. How to Use
Loading the Model
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "agevdmei/LiquidAI-LMF2-1.2B-plus-model-BS-detector",
    trust_remote_code=True,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("LiquidAI/LFM2-1.2B")
```
Basic Usage
```python
# Prepare input using LFM2's chat-template tokens
prompt = "<|startoftext|><|im_start|>user\nWhat is the capital of France?<|im_end|>\n<|im_start|>assistant\n"
input_ids = tokenizer.encode(prompt, return_tensors="pt").to(model.device)

# Forward pass with the IDK score enabled
outputs = model(input_ids, output_idk_score=True)
print(f"IDK Score: {outputs.idk_score.item():.1f}/100")
print(f"Components: {outputs.idk_components}")
```
Output Format
```python
outputs.logits           # Standard LM logits [batch, seq, vocab]
outputs.idk_score        # Uncertainty score [batch], range 0-100
outputs.idk_components   # Dict: flow_error, head_disagreement, entropy_signal
outputs.past_key_values  # KV cache for continued generation
```
8. Attribution
Model Development: Age van de Mei
Base Model: LiquidAI for LFM2-1.2B
Infrastructure: HuggingFace transformers
Training: Google Colab A100
Citation
```bibtex
@misc{lfm2idk2025,
  title  = {LFM2-IDK: Uncertainty Estimation via Internal Consistency},
  author = {van de Mei, Age},
  year   = {2025},
  url    = {https://huggingface.co/agevdmei/LiquidAI-LMF2-1.2B-plus-model-BS-detector}
}
```