Second Loop
Honesty & self-correction in language models
Submitted by Serghei Brinza
Static · project hub
★ Three experiments · one arc ★
The arc
Three demo Spaces, one through-line. (1) Can a confidently memorized error in a frozen
LLM be durably corrected? (2) Can that correction survive a noisy notebook
whose external entries are partly unreliable? (3) How rarely can external truth
arrive before calibration collapses back to the raw model? Every linked Space is fully
static: no model is loaded, and every number shown is copied verbatim from the actual experimental run.
Part 1
Scar-Survival
A memorized LLM error, corrected — how durable is the fix? Turn the mechanism
on, reload the frozen model, then stress it with a counterfeit fact.
0/12 → 12/12 · holds 10/10 reloads · 6/12 survive
Part 2
External Grounding
Lifting self-correction from 50% to 100% under a noisy notebook. Drag the
guardian through six versions and watch which traps get fixed — and which regress.
50% → 100% · 66.7% plateau · +fixed / −broken
Part 3
Thin Channel
How rarely external truth can arrive before calibration collapses. Move the
lever from “every day” to “never” and watch the curve hold — until the cliff.
finite schedule holds · zero contact collapses to raw 3B