	---
	title: 2 · External Grounding
	emoji: 🔁
	colorFrom: indigo
	colorTo: purple
	sdk: static
	app_file: index.html
	pinned: true
	license: mit
	short_description: Lifting LLM self-correction 50%→100% under a noisy notebook
	---

	# External Grounding — interactive demo

	Interactive visualization of Experiments 2–3 (the guardian) of the
	[Second Loop](https://github.com/SergheiBrinza/external-grounding) project.

	This Space loads no model. Everything is a static page driven by `data.json`
	— the verbatim output of the original experimental run.

	## The exhibit

	A frozen Qwen2.5-3B-Instruct has confidently memorized a wrong answer to each of
	twelve questions, and its correction notebook is fed from a noisy source (some verified
	facts, some unreliable look-alikes). Drag the lever through the six stages, from no
	defense to Guardian 2.3, and watch the share of correct answers climb:

	| stage | guardian | corrected |
	|---|---|---|
	| sick | no defense | 50.0% · 6/12 |
	| 1.0 | same-family clone arbiter | 66.7% · 8/12 |
	| 2.0 | live Wikipedia retrieval | 66.7% · 8/12 |
	| 2.1 | more retrieval | 66.7% · 8/12 |
	| 2.2 | three targeted fixes | 91.7% · 11/12 |
	| 2.3 | final calibration | 100% · 12/12 |
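
The percentages in the table follow directly from the counts out of twelve questions. A minimal sketch that recomputes them; the counts are transcribed from the table, while the names `CORRECTED` and `share` are illustrative and not taken from the project's code.

```python
# Counts of corrected answers per stage, transcribed from the table above.
TOTAL = 12
CORRECTED = {
    "sick": 6,
    "1.0": 8,
    "2.0": 8,
    "2.1": 8,
    "2.2": 11,
    "2.3": 12,
}


def share(correct: int, total: int = TOTAL) -> float:
    """Return the share of corrected answers as a percentage, to one decimal."""
    return round(100 * correct / total, 1)


# Recompute every stage's percentage from its raw count.
shares = {stage: share(n) for stage, n in CORRECTED.items()}

for stage, pct in shares.items():
    print(f"{stage:>4}: {pct:.1f}% ({CORRECTED[stage]}/{TOTAL})")
```

Keeping the raw counts next to the percentages makes the plateau easy to see: stages 1.0, 2.0, and 2.1 all share the same count of 8.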

	## What the numbers say (the honest middle)

	- The 66.7% plateau is real. Three different guardians (1.0, 2.0, 2.1) all stop at
	the same ceiling. Guardian 1.0's clone arbiter shares the subject's blind spots.
	- The plateau is not stagnation — it's churn. Each step fixes some traps while
	breaking others (the readout shows `+fixed / −broken`); net change is zero across the
	plateau.
	- Several traps regress before they settle. Venus (#46) goes
	`correct → wrong → correct → wrong → wrong → correct` across the six stages — the path
	to 100% is not monotonic, and that is shown openly, not smoothed over.
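
The `+fixed / −broken` readout described above can be derived from per-trap correctness at consecutive stages. A minimal sketch: only the Venus trajectory comes from this README; the other two traps, the names `churn` and `is_monotonic`, and the data layout are made up for illustration, and the real `data.json` schema may differ.

```python
from typing import Dict, List, Tuple

STAGES = ["sick", "1.0", "2.0", "2.1", "2.2", "2.3"]

# True = answered correctly at that stage.
# "venus_46" follows the README; "trap_a" and "trap_b" are hypothetical.
TRAJECTORIES: Dict[str, List[bool]] = {
    "venus_46": [True, False, True, False, False, True],
    "trap_a": [False, True, False, True, True, True],
    "trap_b": [False, False, False, False, True, True],
}


def churn(trajectories: Dict[str, List[bool]], i: int) -> Tuple[int, int]:
    """Count traps fixed and broken going from stage i-1 to stage i."""
    fixed = sum(1 for t in trajectories.values() if not t[i - 1] and t[i])
    broken = sum(1 for t in trajectories.values() if t[i - 1] and not t[i])
    return fixed, broken


def is_monotonic(path: List[bool]) -> bool:
    """True if a trap never goes from correct back to wrong."""
    return all(not (a and not b) for a, b in zip(path, path[1:]))


for i in range(1, len(STAGES)):
    fixed, broken = churn(TRAJECTORIES, i)
    print(f"{STAGES[i - 1]} -> {STAGES[i]}: +{fixed} / -{broken} (net {fixed - broken})")

print("Venus monotonic:", is_monotonic(TRAJECTORIES["venus_46"]))
```

In this toy data, a step that fixes one trap and breaks another leaves the total unchanged, which is how a flat score can hide movement underneath.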

	Only Guardian 2.2 (verbatim-quote check, namesake relevance gate, soft threshold) breaks
	through the ceiling, reaching 91.7%, and Guardian 2.3 (calibration) closes the remaining
	gap at 100%. An independent Qwen2.5-7B reader/judge with Wikipedia adjudicated the v2
	stages (2.0 through 2.3).
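
The README names a verbatim-quote check among Guardian 2.2's fixes but does not show it. One plausible reading, sketched under that assumption: a correction is accepted only if its supporting quote appears word for word in the retrieved passage, after whitespace and case normalization. The function names, the normalization, and the example strings are hypothetical and are not the project's implementation.

```python
import re


def normalize(text: str) -> str:
    """Collapse runs of whitespace and fold case so matching ignores layout."""
    return re.sub(r"\s+", " ", text).strip().casefold()


def quote_is_verbatim(quote: str, passage: str) -> bool:
    """Accept a quote only if it is a non-empty literal substring of the passage."""
    q = normalize(quote)
    return bool(q) and q in normalize(passage)


# Hypothetical retrieved passage and candidate quotes.
passage = "Venus is the second planet from the Sun.  It has no moons."

print(quote_is_verbatim("second planet from the sun", passage))  # literal match
print(quote_is_verbatim("Venus is the 2nd planet", passage))     # paraphrase, rejected
```

A substring test is deliberately strict: a paraphrase that happens to be true still fails, which matches the idea of rejecting evidence the reader cannot point to in the source.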

	## Data and attribution

	Subject model Qwen2.5-3B-Instruct; arbiters Qwen2.5-7B-Instruct (same-family
	clone) and Wikipedia retrieval + 7B reader/judge (both Apache-2.0, Alibaba Cloud).
	Wikipedia content © its authors (CC BY-SA). Run on a single RTX 3090. No model weights
	are redistributed here — only aggregate verdicts and counts. Demo code and data: MIT.

	Source code, raw per-stage JSON results, and methodology document:
	<https://github.com/SergheiBrinza/external-grounding>