LFM2-1.2B plus an "I Don't Know" (IDK) Metric

Base Model: LiquidAI/LFM2-1.2B


1. Problem

Language models sometimes generate confident-sounding text that is completely wrong. This phenomenon, called hallucination, is one of the biggest barriers to deploying AI in high-stakes applications. Users have no easy way to know when to trust a model's output and when to verify it. We need a reliability signal that flags uncertain outputs before they cause harm.


2. Challenge

Detecting hallucinations is difficult because the model itself doesn't know what it doesn't know. Traditional approaches require large labeled datasets of correct and incorrect outputs, which are expensive to create and domain-specific. The internal representations of language models are high-dimensional and difficult to interpret, making it unclear which signals correlate with reliability. Furthermore, uncertainty manifests differently across question types: a model might be calibrated for factual recall but overconfident on reasoning tasks.


3. Proposed Solution

We developed an uncertainty estimation head that measures internal consistency signals within the model during inference. The core insight is that confident predictions produce aligned internal representations, while uncertain predictions create noisy, contradictory signals. We train a small predictor network to learn the expected transformation between intermediate hidden states; deviations from this learned pattern indicate out-of-distribution inputs. This approach requires no hallucination labels; the model learns uncertainty purely from its own behavior on diverse prompts. The result is a single "I Don't Know" (IDK) score from 0 to 100 that can be computed alongside any generation.


4. Method

Intuition

Think of the IDK score as measuring how "surprised" the model is by its own internal processing. When answering a question it knows well, the model's internal signals flow smoothly and predictably from layer to layer. When guessing, these signals become erratic: different parts of the model disagree, the flow between layers deviates from normal patterns, and the output distribution spreads across many possible tokens. We measure all three of these signals and combine them into a single uncertainty score.

Technical Details

The IDK score combines three complementary signals:

Flow Consistency (70% weight)
A trained MLP predicts z₁₂ from z₈. High prediction error indicates unusual internal dynamics: the model is operating outside its comfort zone.

Output Entropy (20% weight)
Shannon entropy of output probabilities. High entropy means probability mass spreads across many tokens rather than one confident choice.

Head Disagreement (10% weight)
Variance across the 8 attention heads. When confident, heads converge; when uncertain, they attend to different aspects.
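As a rough sketch, the weighted combination of the three signals might look like the following. The normalization constants and exact per-signal definitions here are assumptions for illustration; the released head's internals may differ.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy (in nats) of a token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def idk_score(flow_error, entropy, head_variance, max_entropy, max_variance):
    """Combine the three signals into a 0-100 IDK score.

    Each signal is clamped to [0, 1] before the 70/20/10 weighted sum.
    The normalizers (max_entropy, max_variance) are illustrative
    assumptions, not values taken from the released model.
    """
    flow = min(flow_error, 1.0)                  # flow error assumed pre-normalized
    ent = min(entropy / max_entropy, 1.0)        # entropy relative to uniform
    heads = min(head_variance / max_variance, 1.0)
    return 100.0 * (0.70 * flow + 0.20 * ent + 0.10 * heads)

# Uniform entropy over the 65,536-token vocabulary is the maximum possible.
MAX_ENT = math.log(65536)

# Confident case: low flow error, peaked output distribution, agreeing heads.
confident = idk_score(flow_error=0.10,
                      entropy=shannon_entropy([0.97, 0.01, 0.01, 0.01]),
                      head_variance=0.02,
                      max_entropy=MAX_ENT, max_variance=1.0)

# Uncertain case: high flow error, near-uniform output, disagreeing heads.
uncertain = idk_score(flow_error=0.80,
                      entropy=MAX_ENT,
                      head_variance=0.60,
                      max_entropy=MAX_ENT, max_variance=1.0)

assert confident < uncertain  # the score orders confidence correctly
```

Because the flow signal carries 70% of the weight, the combined score is dominated by how predictable the layer-8 to layer-12 transformation is.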

Training

The IDK head (~2.4M parameters) trains unsupervised on the model's own generations. We extracted ~385,000 (z₈, z₁₂) pairs from 20,000 diverse prompts across multiple datasets:

Source              Samples   Category
NaturalQuestions     77,229   Factual
HotpotQA             76,512   Multi-hop reasoning
TriviaQA             75,493   Factual trivia
SQuAD 2.0            62,840   Factual + unanswerable
Subjective           16,060   Opinions
Future/Impossible    31,180   High uncertainty
TruthfulQA            5,867   Hallucination-prone

The loss combines flow MSE, category calibration, and diversity regularization. No hallucination labels required.
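A minimal sketch of an objective with this three-term shape, assuming a plausible form for each term (the exact formulation and weights are not published here and are labeled as assumptions in the code):

```python
import numpy as np

def idk_training_loss(z12_pred, z12, scores, category_targets,
                      w_cal=1.0, w_div=0.1):
    """Illustrative three-term objective: flow MSE + category calibration
    + diversity regularization. The weights w_cal and w_div are hypothetical."""
    # 1. Flow MSE: the predictor should reconstruct z12 on in-distribution data.
    flow_loss = np.mean((z12_pred - z12) ** 2)
    # 2. Category calibration: nudge scores toward coarse per-category
    #    uncertainty targets (low for factual, high for impossible questions).
    cal_loss = np.mean((scores - category_targets) ** 2)
    # 3. Diversity regularization: penalize collapsed, low-variance scores.
    div_loss = -np.var(scores)
    return flow_loss + w_cal * cal_loss + w_div * div_loss

# Perfect flow prediction and calibration leave only the diversity bonus,
# so the loss is slightly negative here.
z = np.zeros((4, 2048))
scores = np.array([0.1, 0.3, 0.6, 0.9])
loss = idk_training_loss(z, z, scores, scores)
```

Note that no term references hallucination labels; all supervision comes from the model's own hidden states plus coarse prompt-category targets.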


5. Architecture

High-Level Overview

Input Tokens
      │
      ▼
┌────────────────────────────────┐
│      LFM2-1.2B (Frozen)        │
│                                │
│  ┌──────────┐  ┌──────────┐    │
│  │ Conv x10 │  │ Attn x6  │    │
│  └──────────┘  └──────────┘    │
│       │              │         │
│      z₈             z₁₂        │
│   (layer 8)     (layer 12)     │
└────────────────────────────────┘
      │       │        │
      │       └───┬────┘
      ▼           ▼
   Logits    ┌─────────────┐
             │  IDK Head   │
             │  (2.4M)     │
             │             │
             │ Flow    70% │
             │ Entropy 20% │
             │ Heads   10% │
             └──────┬──────┘
                    ▼
              IDK Score
                0-100

LFM2-1.2B Layer Structure

LFM2 is a hybrid architecture with 10 convolution blocks and 6 grouped query attention blocks:

Layer:  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
Type:   C   C   A   C   C   A   C   C   A   C   A   C   A   C   A   C
                                        ▲               ▲
                                       z₈              z₁₂
                                    (extract)       (extract)

C = Convolution (Lfm2ShortConv)
A = Attention (Lfm2Attention, GQA with 8 KV heads)

Layer Details

Layer  Type  Module          Hidden  Details
0      Conv  Lfm2ShortConv   2048    Conv1d + Linear
1      Conv  Lfm2ShortConv   2048    Conv1d + Linear
2      Attn  Lfm2Attention   2048    GQA, 8 KV heads
3      Conv  Lfm2ShortConv   2048    Conv1d + Linear
4      Conv  Lfm2ShortConv   2048    Conv1d + Linear
5      Attn  Lfm2Attention   2048    GQA, 8 KV heads
6      Conv  Lfm2ShortConv   2048    Conv1d + Linear
7      Conv  Lfm2ShortConv   2048    Conv1d + Linear
8      Attn  Lfm2Attention   2048    GQA, 8 KV heads; z₈ extracted
9      Conv  Lfm2ShortConv   2048    Conv1d + Linear
10     Attn  Lfm2Attention   2048    GQA, 8 KV heads
11     Conv  Lfm2ShortConv   2048    Conv1d + Linear
12     Attn  Lfm2Attention   2048    GQA, 8 KV heads; z₁₂ extracted
13     Conv  Lfm2ShortConv   2048    Conv1d + Linear
14     Attn  Lfm2Attention   2048    GQA, 8 KV heads
15     Conv  Lfm2ShortConv   2048    Conv1d + Linear
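The two tap points can be read off a standard `output_hidden_states=True` forward pass. The helper below is a hypothetical illustration (not the model's actual extraction code); note that `hidden_states[0]` is the embedding output, so layer k's output sits at index k + 1:

```python
import torch

def extract_flow_states(hidden_states, z8_layer=8, z12_layer=12):
    """Pull the hidden states the IDK head consumes from a forward pass
    run with output_hidden_states=True. We take the last token position
    here as an illustrative choice; hidden_states[0] is the embedding
    output, so layer k's output is at index k + 1.
    """
    z8 = hidden_states[z8_layer + 1][:, -1, :]    # [batch, 2048]
    z12 = hidden_states[z12_layer + 1][:, -1, :]  # [batch, 2048]
    return z8, z12

# Dummy stand-in for a real forward pass: 17 tensors = embeddings + 16 layers.
dummy = tuple(torch.zeros(1, 5, 2048) for _ in range(17))
z8, z12 = extract_flow_states(dummy)
```

With the real model, `dummy` would be `model(input_ids, output_hidden_states=True).hidden_states`.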

Model Specifications

Spec              Value
Parameters        1.17B (base) + 2.4M (IDK head)
Hidden Dimension  2048
Vocabulary Size   65,536
Context Length    32,768
KV Heads          8

Flow Predictor Architecture

z₈ [2048]
    │
    ▼
Linear(2048 → 512)
    │
    ▼
GELU + LayerNorm
    │
    ▼
Linear(512 → 512)
    │
    ▼
GELU + LayerNorm
    │
    ▼
Linear(512 → 2048)
    │
    ▼
ẑ₁₂ [2048]
    │
    ▼
MSE(ẑ₁₂, z₁₂) → log → normalize → sigmoid → score
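The diagram translates directly into a small PyTorch module. This is a sketch under the stated layer sizes; details such as initialization and the normalization constant in `flow_score` are assumptions.

```python
import torch
import torch.nn as nn

class FlowPredictor(nn.Module):
    """Sketch of the flow-predictor MLP from the diagram: it maps the
    layer-8 hidden state z8 to a prediction of the layer-12 state z12."""

    def __init__(self, hidden=2048, bottleneck=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden, bottleneck),
            nn.GELU(),
            nn.LayerNorm(bottleneck),
            nn.Linear(bottleneck, bottleneck),
            nn.GELU(),
            nn.LayerNorm(bottleneck),
            nn.Linear(bottleneck, hidden),
        )

    def forward(self, z8):
        return self.net(z8)  # predicted z12, shape [batch, hidden]

    def flow_score(self, z8, z12, scale=1.0):
        """MSE -> log -> sigmoid, per the diagram; `scale` stands in for
        the unpublished normalization constant."""
        err = torch.mean((self(z8) - z12) ** 2, dim=-1)
        return torch.sigmoid(torch.log(err + 1e-8) / scale)

predictor = FlowPredictor()
n_params = sum(p.numel() for p in predictor.parameters())
# 2,364,416 parameters, consistent with the ~2.4M quoted for the IDK head.
```

The three Linear layers plus two LayerNorms account for almost all of the IDK head's parameter budget, which is why it adds negligible inference cost next to the 1.17B-parameter base model.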

6. Results

Score Interpretation

IDK Score  Confidence  Recommended Action
0-30       High        Output likely reliable
30-50      Moderate    Verify if critical
50-70      Low         Treat with skepticism
70-100     Very Low    High hallucination risk
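The thresholds above map directly to a small helper for gating downstream use of a generation (the function name and wording are illustrative, not part of the released API):

```python
def interpret_idk(score):
    """Map a 0-100 IDK score to the recommended action from the table."""
    if score < 30:
        return "High confidence: output likely reliable"
    elif score < 50:
        return "Moderate confidence: verify if critical"
    elif score < 70:
        return "Low confidence: treat with skepticism"
    return "Very low confidence: high hallucination risk"

print(interpret_idk(14.5))  # "High confidence: output likely reliable"
```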

Evaluation by Category

Question Type  Example                        IDK Score  Status
Simple Facts   "What is 2 + 2?"               16.5       ✓ Confident
Simple Facts   "Capital of France?"           14.5       ✓ Confident
Technical      "Explain neural networks"      48.0       ✓ Moderate
Subjective     "Best programming language?"   53.7       ✓ Uncertain
Impossible     "Who will win the election?"   62.4       ✓ Uncertain
Impossible     "What will happen tomorrow?"   62.7       ✓ Uncertain

Training Metrics

Metric               Value
Training samples     384,978
Training prompts     20,000
Epochs               20
Final flow loss      0.0289
Final category loss  0.0222
Score range          14.5 - 62.7 (48 points)
Ordering accuracy    100%

Known Limitations

  • High-uncertainty scores are compressed (target: 70-90, actual: 55-63)
  • Calibration varies across domains not seen during training
  • Knowledge is bounded by LFM2-1.2B's training cutoff date
  • The IDK score is a heuristic estimate, not a guarantee of correctness

7. How to Use

Loading the Model

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "agevdmei/LiquidAI-LMF2-1.2B-plus-model-BS-detector",
    trust_remote_code=True,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("LiquidAI/LFM2-1.2B")

Basic Usage

# Prepare input
prompt = "<|startoftext|><|im_start|>user\nWhat is the capital of France?<|im_end|>\n<|im_start|>assistant\n"
input_ids = tokenizer.encode(prompt, return_tensors="pt").to(model.device)

# Get output with IDK score
outputs = model(input_ids, output_idk_score=True)

print(f"IDK Score: {outputs.idk_score.item():.1f}/100")
print(f"Components: {outputs.idk_components}")

Output Format

outputs.logits          # Standard LM logits [batch, seq, vocab]
outputs.idk_score       # Uncertainty score [batch] (0-100)
outputs.idk_components  # Dict: flow_error, head_disagreement, entropy_signal
outputs.past_key_values # KV cache for continued generation

8. Attribution

Model Development: Age van de Mei
Base Model: LFM2-1.2B by LiquidAI
Infrastructure: HuggingFace transformers
Training: Google Colab A100


Citation

@misc{lfm2idk2025,
  title={LFM2-IDK: Uncertainty Estimation via Internal Consistency},
  author={van de Mei, Age},
  year={2025},
  url={https://huggingface.co/agevdmei/LiquidAI-LMF2-1.2B-plus-model-BS-detector}
}