uncertain-calibrate

Fine-tuned from meta-llama/Llama-3.1-8B-Instruct via GRPO reinforcement learning to emit a special <uncertain> token when the model is uncertain during reasoning, enabling uncertainty-guided adaptive retrieval.

What it does

The model reasons step-by-step and inserts <uncertain> at any point where it lacks confidence in a fact. A lightweight ridge regression probe (trained on layer-13 hidden states at the <uncertain> span) then decides whether to trigger BM25 retrieval and a second-pass generation.
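At inference time, gating reduces to scoring the hidden state at the <uncertain> span with the linear probe and thresholding. A minimal sketch, where the mean-pooling choice, function names, and threshold are assumptions rather than the released implementation:

```python
import numpy as np

def probe_score(span_hidden_states, w, b):
    """Score an <uncertain> span: mean-pool the layer-13 hidden states
    over the span tokens, then apply the linear ridge probe (w, b)."""
    pooled = np.mean(span_hidden_states, axis=0)  # (hidden_dim,)
    return float(pooled @ w + b)

def should_retrieve(span_hidden_states, w, b, threshold=0.5):
    """Trigger BM25 retrieval and a second pass when the probe fires.
    The 0.5 default threshold is illustrative, not tuned."""
    return probe_score(span_hidden_states, w, b) > threshold
```

In practice `w` and `b` would come from the companion probe artifact; the sketch only shows the decision rule.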

Training

  • Base model: meta-llama/Llama-3.1-8B-Instruct
  • Training method: GRPO (Group Relative Policy Optimization) with EM-based reward; the model is rewarded for correct final answers, encouraging it to emit <uncertain> in contexts where retrieval would help
  • Target datasets: Multi-hop QA (HotpotQA, MuSiQue, 2WikiMultiHopQA) and open-domain QA (NQ, TriviaQA)
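The EM-based reward amounts to SQuAD-style answer normalization followed by exact match. A hypothetical sketch of such a reward function (the exact normalization and reward shaping used in training are not documented here):

```python
import re
import string

def normalize_answer(s):
    """SQuAD-style normalization: lowercase, drop punctuation,
    remove articles, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in string.punctuation)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def em_reward(prediction, gold_answers):
    """Reward 1.0 iff the normalized prediction exactly matches
    any normalized gold answer, else 0.0."""
    pred = normalize_answer(prediction)
    return 1.0 if any(pred == normalize_answer(g) for g in gold_answers) else 0.0
```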

Retrieval gating (probe)

To use this model for adaptive RAG, a separate ridge regression probe must be trained on layer-13 hidden states over <uncertain> spans. The probe's AUROC on held-out data is ~0.82. Use the companion probe artifact uncertain_probe_layer13_alpha3000.pkl from the AdaRAGUE repository.
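Probe training itself is straightforward: fit ridge regression on pooled layer-13 features against binary retrieve/no-retrieve labels, then check AUROC on held-out data. A self-contained numpy sketch with closed-form ridge and a rank-based AUROC, using alpha=3000 to match the artifact name; the synthetic data and feature dimension are illustrative only:

```python
import numpy as np

def fit_ridge(X, y, alpha=3000.0):
    """Closed-form ridge regression: w = (X^T X + alpha I)^-1 X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)

def auroc(scores, labels):
    """Rank-based AUROC (Mann-Whitney U statistic); assumes no score ties."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = (labels == 1).sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Synthetic stand-in for pooled layer-13 features with binary labels.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(1.0, 1.0, (200, 16)),    # "retrieval helped"
               rng.normal(-1.0, 1.0, (200, 16))])  # "no retrieval needed"
y = np.concatenate([np.ones(200), np.zeros(200)])
w = fit_ridge(X, y, alpha=3000.0)
```

On real data you would of course score a held-out split rather than the training set.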

Evaluation (dev_500_subsampled, 500 questions × 5 datasets, with probe gating)

Dataset          EM    F1    Trigger Rate
HotpotQA         32.6  42.7  67.4%
MuSiQue           7.6  14.1  94.2%
2WikiMultiHopQA  26.2  29.6  59.2%
NQ               31.4  41.0  52.0%
TriviaQA         56.6  63.2  34.0%
Overall          30.9  38.1  61.4%

Trigger rate = fraction of questions where the probe decided to retrieve.
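F1 in the table is standard token-level overlap between prediction and gold answer. A sketch of the metric, assuming SQuAD-style scoring (the normalization step is simplified to lowercasing here):

```python
from collections import Counter

def token_f1(prediction, gold):
    """Token-level F1: harmonic mean of precision and recall over
    the multiset intersection of whitespace tokens."""
    pred_toks = prediction.lower().split()
    gold_toks = gold.lower().split()
    common = Counter(pred_toks) & Counter(gold_toks)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)
```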

Intended use

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("jamesjunyuguo/uncertain-calibrate")
model = AutoModelForCausalLM.from_pretrained(
    "jamesjunyuguo/uncertain-calibrate",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

SYSTEM = (
    "You are a helpful reasoning assistant. Think step by step. "
    "If at any point you are uncertain about a fact, emit the special token "
    "<uncertain> to signal that you need more information. "
    "End your response with 'Answer: <your answer>' on the last line."
)

prompt = tokenizer.apply_chat_template([
    {"role": "system", "content": SYSTEM},
    {"role": "user",   "content": "Who directed the film Interstellar?"},
], tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512)
# Keep special tokens so any emitted <uncertain> markers stay visible.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=False))
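When the first pass emits <uncertain> and the probe fires, the adaptive-RAG loop retrieves with BM25 and regenerates with the evidence prepended. The model card does not ship retrieval code; below is a minimal self-contained sketch with a toy in-memory Okapi BM25 (in practice you would use a real index, e.g. rank_bm25 or Pyserini over a Wikipedia dump), where all names and the prompt format are illustrative assumptions:

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each doc against the query with Okapi BM25 (toy, in-memory)."""
    toks = [d.lower().split() for d in docs]
    avgdl = sum(len(t) for t in toks) / len(toks)
    df = Counter()
    for t in toks:
        df.update(set(t))  # document frequency per term
    N = len(docs)
    scores = []
    for t in toks:
        tf = Counter(t)
        s = 0.0
        for q in query.lower().split():
            if q not in tf:
                continue
            idf = math.log((N - df[q] + 0.5) / (df[q] + 0.5) + 1)
            s += idf * tf[q] * (k1 + 1) / (tf[q] + k1 * (1 - b + b * len(t) / avgdl))
        scores.append(s)
    return scores

def second_pass_prompt(question, first_pass, docs, top_k=2):
    """If the first pass contains <uncertain>, build a retrieval-augmented
    prompt from the top-k BM25 documents; otherwise return None."""
    if "<uncertain>" not in first_pass:
        return None  # model was confident; keep the first-pass answer
    ranked = sorted(zip(bm25_scores(question, docs), docs), reverse=True)
    evidence = "\n".join(d for _, d in ranked[:top_k])
    return f"Context:\n{evidence}\n\nQuestion: {question}"
```

The returned prompt would then go through the same chat template and generate call as the first pass.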

