# uncertain-calibrate

Fine-tuned from meta-llama/Llama-3.1-8B-Instruct via GRPO reinforcement learning to emit a special `<uncertain>` token when the model is uncertain during reasoning, enabling uncertainty-guided adaptive retrieval.
## What it does

The model reasons step by step and inserts `<uncertain>` at any point where it lacks confidence in a fact. A lightweight ridge regression probe (trained on layer-13 hidden states at the `<uncertain>` span) then decides whether to trigger BM25 retrieval and a second-pass generation.
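The two-pass flow is roughly the following. This is a minimal sketch only: the helpers `generate`, `retrieve_bm25`, and `hidden_at_uncertain`, and the 0.5 threshold, are placeholders for your own generation wrapper, BM25 index, feature extractor, and gating choice, not part of this repository.

```python
# Minimal sketch of uncertainty-gated adaptive retrieval. All helpers
# (generate, retrieve_bm25, hidden_at_uncertain) and the probe object are
# placeholders for your own generation wrapper, BM25 index, feature
# extractor, and the trained ridge probe described below.

def answer_adaptively(question, generate, retrieve_bm25, probe,
                      hidden_at_uncertain, threshold=0.5):
    # First pass: the model reasons and may emit <uncertain>.
    first_pass = generate(question)
    if "<uncertain>" not in first_pass:
        return first_pass  # confident reasoning; no retrieval needed

    # Score the layer-13 hidden states pooled over the <uncertain> span.
    features = hidden_at_uncertain(question, first_pass)  # shape (1, hidden_dim)
    if probe.predict(features)[0] < threshold:
        return first_pass  # probe predicts retrieval would not help

    # Second pass: regenerate with BM25-retrieved passages in context.
    passages = retrieve_bm25(question, k=5)
    return generate(question, context=passages)
```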
## Training

- Base model: meta-llama/Llama-3.1-8B-Instruct
- Training method: GRPO (Group Relative Policy Optimization) with an EM-based reward; the model is rewarded for correct final answers, which encourages it to emit `<uncertain>` in contexts where retrieval would help (see the reward sketch below)
- Target datasets: multi-hop QA (HotpotQA, MuSiQue, 2WikiMultiHopQA) and open-domain QA (NQ, TriviaQA)
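The exact reward implementation is not included here; a minimal sketch of a standard exact-match (EM) reward with SQuAD-style answer normalization, keyed to the `Answer:` line the prompt requires, might look like this:

```python
import re
import string

def normalize(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop articles,
    punctuation, and extra whitespace."""
    text = text.lower()
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())

def em_reward(completion: str, gold_answers: list[str]) -> float:
    """Reward 1.0 iff the final 'Answer: ...' line exactly matches a gold answer."""
    match = re.search(r"Answer:\s*(.+)", completion)
    if match is None:
        return 0.0
    pred = normalize(match.group(1))
    return float(any(pred == normalize(g) for g in gold_answers))
```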
## Retrieval gating (probe)

Using this model for adaptive RAG requires a separate ridge regression probe over layer-13 hidden states at `<uncertain>` spans. The probe's AUROC on held-out data is ~0.82. Use the companion probe artifact `uncertain_probe_layer13_alpha3000.pkl` from the AdaRAGUE repository.
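A minimal sketch of the gating step, assuming the artifact is a pickled sklearn ridge model and that features are mean-pooled layer-13 states over the `<uncertain>` span (the pooling and the 0.5 threshold are assumptions; follow the AdaRAGUE code for the authoritative recipe):

```python
import pickle

import numpy as np

# Load the companion probe artifact. The sklearn-Ridge interface and the
# 0.5 threshold below are assumptions; see the AdaRAGUE repository.
with open("uncertain_probe_layer13_alpha3000.pkl", "rb") as f:
    probe = pickle.load(f)

def should_retrieve(layer13_states: np.ndarray, uncertain_positions,
                    threshold: float = 0.5) -> bool:
    """Mean-pool layer-13 hidden states over the <uncertain> span and
    threshold the probe's regression score."""
    span = layer13_states[uncertain_positions]    # (n_tokens, hidden_dim)
    features = span.mean(axis=0, keepdims=True)   # (1, hidden_dim)
    return float(probe.predict(features)[0]) >= threshold
```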
## Evaluation (dev_500_subsampled, 500 questions × 5 datasets, with probe gating)
| Dataset | EM | F1 | Trigger Rate |
|---|---|---|---|
| HotpotQA | 32.6 | 42.7 | 67.4% |
| MuSiQue | 7.6 | 14.1 | 94.2% |
| 2WikiMultiHopQA | 26.2 | 29.6 | 59.2% |
| NQ | 31.4 | 41.0 | 52.0% |
| TriviaQA | 56.6 | 63.2 | 34.0% |
| Overall | 30.9 | 38.1 | 61.4% |
Trigger rate = fraction of questions where the probe decided to retrieve.
## Intended use
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("your-username/uncertain-calibrate")
model = AutoModelForCausalLM.from_pretrained(
    "your-username/uncertain-calibrate", torch_dtype=torch.bfloat16, device_map="auto"
)

SYSTEM = (
    "You are a helpful reasoning assistant. Think step by step. "
    "If at any point you are uncertain about a fact, emit the special token "
    "<uncertain> to signal that you need more information. "
    "End your response with 'Answer: <your answer>' on the last line."
)

prompt = tokenizer.apply_chat_template([
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": "Who directed the film Interstellar?"},
], tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=512, do_sample=False)
# Keep special tokens so any emitted <uncertain> markers stay visible.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False))
```
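To compute probe features from a first-pass generation, you need the layer-13 hidden states at the `<uncertain>` positions. A sketch continuing from the snippet above; it assumes `<uncertain>` is registered as a single special token in this tokenizer:

```python
# Continuing from the snippet above: pool layer-13 hidden states over the
# emitted <uncertain> positions. Assumes <uncertain> is a single token.
unc_id = tokenizer.convert_tokens_to_ids("<uncertain>")

with torch.no_grad():
    # hidden_states is a tuple of (num_layers + 1) tensors, each of shape
    # (batch, seq_len, hidden); index 0 is the embedding layer.
    hidden_states = model(input_ids=out, output_hidden_states=True).hidden_states

positions = (out[0] == unc_id).nonzero(as_tuple=True)[0]
if positions.numel() > 0:
    features = hidden_states[13][0, positions].float().mean(dim=0, keepdim=True)
    features = features.cpu().numpy()  # feed to the ridge probe (see above)
```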