🔁
Second Loop
Honesty & self-correction in language models
Static · project hub
★  Three experiments · one arc  ★
Subject model
Qwen2.5-3B-Instruct (frozen)
Independent judge
Qwen2.5-7B-Instruct
License
MIT
The arc
Three demo Spaces, one through-line. (1) Can a confidently memorized error in a frozen LLM be durably corrected? (2) Can that correction survive a noisy notebook whose external entries are partly unreliable? (3) How rarely can external truth arrive before calibration collapses back to the raw model? Every linked Space is fully static: no model is loaded, and every number shown is reproduced verbatim from the output of the live experimental run.
Part 1
Scar-Survival
A memorized LLM error, corrected — how durable is the fix? Turn the mechanism on, reload the frozen model, then stress it with a counterfeit fact.
0/12 → 12/12 · holds across 10/10 reloads · 6/12 survive the counterfeit fact
Part 2
External Grounding
Lifting self-correction from 50% to 100% under a noisy notebook. Drag the guardian through six versions and watch which traps get fixed — and which regress.
50% → 100% · 66.7% plateau · +fixed / −broken
Part 3
Thin Channel
How rarely external truth can arrive before calibration collapses. Move the lever from “every day” to “never” and watch the curve hold — until the cliff.
finite schedule holds · zero contact collapses to raw 3B