---
title: 2 · External Grounding
emoji: 🔁
colorFrom: indigo
colorTo: purple
sdk: static
app_file: index.html
pinned: true
license: mit
short_description: Lifting LLM self-correction 50%→100% under a noisy notebook
---

External Grounding — interactive demo

Interactive visualization of Experiment 2–3 (the guardian) of the Second Loop project.

This Space loads no model. Everything is a static page driven by data.json — the verbatim output of the original experimental run.

The exhibit

A frozen Qwen2.5-3B-Instruct has confidently memorized wrong answers to twelve questions, and its correction notebook is fed from a noisy source (some verified facts, some unreliable look-alikes). Drag the lever through the six stages (no defense, then five guardian versions) and watch the share of correct answers climb:

| Stage | Guardian | Correct answers |
|---|---|---|
| sick | no defense | 50.0% · 6/12 |
| 1.0 | same-family clone arbiter | 66.7% · 8/12 |
| 2.0 | live Wikipedia retrieval | 66.7% · 8/12 |
| 2.1 | more retrieval | 66.7% · 8/12 |
| 2.2 | three targeted fixes | 91.7% · 11/12 |
| 2.3 | final calibration | 100% · 12/12 |
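The percentages follow directly from the counts, since there are twelve traps. A quick check in Python (the stage labels come from the table; the `share` helper is illustrative only):

```python
# Correct-answer counts per stage, copied from the table (12 traps in total).
TOTAL = 12
counts = {
    "sick": 6,
    "1.0": 8,
    "2.0": 8,
    "2.1": 8,
    "2.2": 11,
    "2.3": 12,
}

def share(correct: int, total: int = TOTAL) -> float:
    """Percentage of correct answers, rounded to one decimal place."""
    return round(100 * correct / total, 1)

shares = {stage: share(n) for stage, n in counts.items()}

for stage, pct in shares.items():
    # e.g. " 1.0   66.7%  (8/12)"
    print(f"{stage:>4}  {pct:5.1f}%  ({counts[stage]}/{TOTAL})")
```

Each trap is worth 100/12 ≈ 8.3 percentage points, so the 66.7% plateau sits exactly two traps above the 50.0% baseline, and Guardian 2.2's jump is worth three traps.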

What the numbers say (the honest middle)

  • The 66.7% plateau is real. Three different guardians (1.0, 2.0, 2.1) all stop at the same ceiling. Guardian 1.0's clone arbiter shares the subject's blind spots.
  • The plateau is not stagnation — it's churn. Each step fixes some traps while breaking others (the readout shows +fixed / −broken); net change is zero across the plateau.
  • Several traps regress before they settle. Venus (#46) goes correct → wrong → correct → wrong → wrong → correct across the six stages — the path to 100% is not monotonic, and that is shown openly, not smoothed over.
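The +fixed / −broken readout and the Venus trajectory can both be reproduced from per-trap verdicts. A minimal sketch, assuming each stage is represented as a mapping from trap id to a correct/wrong boolean; the functions `churn` and `regressions` and the trap ids `t1` to `t3` are hypothetical, and only the Venus path comes from this README:

```python
from typing import Mapping

def churn(before: Mapping[str, bool], after: Mapping[str, bool]) -> tuple[int, int, int]:
    """Count traps fixed (wrong -> correct) and broken (correct -> wrong) between two stages.

    `before` and `after` map a trap id to True if that trap was answered correctly.
    Returns (fixed, broken, net) with net = fixed - broken.
    """
    fixed = sum(1 for t in before if not before[t] and after[t])
    broken = sum(1 for t in before if before[t] and not after[t])
    return fixed, broken, fixed - broken

def regressions(trajectory: list[bool]) -> int:
    """Number of correct -> wrong transitions along one trap's stage-by-stage path."""
    return sum(1 for a, b in zip(trajectory, trajectory[1:]) if a and not b)

# Venus (#46) across the six stages, as described in the README:
# correct -> wrong -> correct -> wrong -> wrong -> correct
venus = [True, False, True, False, False, True]
print(regressions(venus))  # 2

# Hypothetical two-stage example (trap ids and verdicts invented for illustration).
stage_a = {"t1": True, "t2": False, "t3": True}
stage_b = {"t1": False, "t2": True, "t3": True}
print(churn(stage_a, stage_b))  # (1, 1, 0)
```

Across the plateau (1.0 → 2.0 → 2.1) the count stays at 8/12, so every step there has fixed equal to broken even when individual traps move.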

Only Guardian 2.2 (verbatim-quote check, namesake relevance gate, soft threshold) breaks the ceiling, reaching 91.7%, and Guardian 2.3 (calibration) closes the remaining gap to 100%. The v2 stages were adjudicated by an independent Qwen2.5-7B reader/judge backed by Wikipedia.
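The README names Guardian 2.2's three fixes but not their internals. The following is a minimal sketch of what each check could look like, assuming the guardian receives a supporting quote, an article title and a numeric confidence score from the reader/judge. All function names, the title-parsing rule and the 0.7 / 0.3 cutoffs are hypothetical, and the "soft threshold" is read here as an abstain band between two cutoffs, which is only one plausible interpretation. The actual implementation is in the repository linked under Data and attribution.

```python
import re

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so spacing differences don't break a match."""
    return re.sub(r"\s+", " ", text).strip().lower()

def verbatim_quote_ok(quote: str, article_text: str) -> bool:
    """Hypothetical verbatim-quote check: the supporting quote must occur in the retrieved text."""
    q = normalize(quote)
    return bool(q) and q in normalize(article_text)

def namesake_ok(entity: str, title: str,
                allowed_qualifiers: frozenset[str] = frozenset()) -> bool:
    """Hypothetical namesake relevance gate: reject articles about a different thing
    with the same name. 'Venus (mythology)' fails for entity 'Venus' unless
    'mythology' is an allowed (lowercase) qualifier; a bare 'Venus' passes.
    """
    # Split "Base (qualifier)" into its parts; the qualifier group is optional.
    m = re.fullmatch(r"(.*?)\s*(?:\((.*)\))?", title.strip(), re.DOTALL)
    base, qualifier = m.group(1), m.group(2)
    if normalize(base) != normalize(entity):
        return False
    return qualifier is None or normalize(qualifier) in allowed_qualifiers

def soft_decision(score: float, accept: float = 0.7, reject: float = 0.3) -> str:
    """Hypothetical soft threshold: an abstain band between two cutoffs
    instead of a single hard accept/reject cut."""
    if score >= accept:
        return "accept"
    if score <= reject:
        return "reject"
    return "abstain"

# Illustrative inputs only, not taken from the experiment's data.
print(verbatim_quote_ok("second planet from the Sun",
                        "Venus is the second planet from the  Sun."))
print(namesake_ok("Venus", "Venus (mythology)"))
print(soft_decision(0.5))
```

Keeping the three checks as separate predicates makes it easy to see which one rejected a given notebook entry, which matters when a fix on one trap can break another.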

Data and attribution

Subject model: Qwen2.5-3B-Instruct (Qwen Research License, Alibaba Cloud). Arbiters: Qwen2.5-7B-Instruct as a same-family clone, and Wikipedia retrieval plus a Qwen2.5-7B reader/judge (Apache-2.0, Alibaba Cloud). Wikipedia content © its authors (CC BY-SA). Run on a single RTX 3090. No model weights are redistributed here, only aggregate verdicts and counts. Demo code and data: MIT.

Source code, raw per-stage JSON results, and methodology document: https://github.com/SergheiBrinza/external-grounding