Museum rebuild: static ascent exhibit (6 guardian stages, 50%->100%)
- README.md +44 -21
- app.py +0 -333
- data.json +8 -0
- index.html +599 -0
- requirements.txt +0 -2
README.md
CHANGED
@@ -1,37 +1,60 @@
 ---
 title: Second Loop · 2 · External Grounding
 emoji: 🛡️
-colorFrom:
-colorTo:
-sdk:
-
-python_version: "3.12"
-app_file: app.py
+colorFrom: gray
+colorTo: green
+sdk: static
+app_file: index.html
 pinned: true
 license: mit
-short_description:
+short_description: Lifting LLM self-correction 50%→100% under a noisy notebook
 ---
 
-# External Grounding
+# External Grounding — interactive demo
 
-
-Experiment 3 (`exp3-guardian-v2-wikipedia`) of the
+Interactive visualization of Experiment 2–3 (the *guardian*) of the
 [Second Loop](https://github.com/SergheiBrinza/external-grounding) project.
 
-This Space loads no model
-
-runs, bundled into `data.json`.
+This Space loads **no model**. Everything is a static page driven by `data.json`
+— the verbatim output of the original experimental run.
 
-The
+## The exhibit
 
-
+A frozen Qwen2.5-3B-Instruct has a confidently memorized **wrong** answer to twelve
+questions, and its correction notebook is fed from a **noisy** source (some verified
+facts, some unreliable look-alikes). Drag the lever through six guardian versions and
+watch the share of correct answers climb:
+
+| stage | guardian | corrected |
 |---|---|---|
-
-
-
-
-
-
+| sick | no defense | 50.0% · 6/12 |
+| 1.0 | same-family clone arbiter | 66.7% · 8/12 |
+| 2.0 | live Wikipedia retrieval | 66.7% · 8/12 |
+| 2.1 | more retrieval | 66.7% · 8/12 |
+| 2.2 | three targeted fixes | 91.7% · 11/12 |
+| 2.3 | final calibration | 100% · 12/12 |
+
+## What the numbers say (the honest middle)
+
+- **The 66.7% plateau is real.** Three different guardians (1.0, 2.0, 2.1) all stop at
+the same ceiling. Guardian 1.0's clone arbiter shares the subject's blind spots.
+- **The plateau is not stagnation — it's churn.** Each step fixes some traps while
+breaking others (the readout shows `+fixed / −broken`); net change is zero across the
+plateau.
+- **Several traps regress before they settle.** Venus (#46) goes
+`correct → wrong → correct → wrong → wrong → correct` across the six stages — the path
+to 100% is not monotonic, and that is shown openly, not smoothed over.
+
+Only Guardian 2.2 (verbatim-quote check, namesake relevance gate, soft threshold) breaks
+the ceiling at 91.7%, and Guardian 2.3 (calibration) closes it at 100%. An independent
+Qwen2.5-7B reader/judge with Wikipedia adjudicated the v2 stages.
+
+## Data and attribution
+
+Subject model **Qwen2.5-3B-Instruct**; arbiters **Qwen2.5-7B-Instruct** (same-family
+clone) and **Wikipedia retrieval + 7B reader/judge** (both Apache-2.0, Alibaba Cloud).
+Wikipedia content © its authors (CC BY-SA). Run on a single RTX 3090. No model weights
+are redistributed here — only aggregate verdicts and counts. Demo code and data: MIT.
 
 Source code, raw per-stage JSON results, and methodology document:
 <https://github.com/SergheiBrinza/external-grounding>
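The README's three observations (the 66.7% plateau, the `+fixed / −broken` churn, and the non-monotonic Venus path) are all tallies over per-trap verdicts. Below is a minimal sketch of those tallies, assuming each stage is held as a mapping from trap id to a verdict string, and treating any verdict other than `"correct"` as wrong. The function names are illustrative and do not come from the repository; only the Venus path is taken from the README.

```python
from __future__ import annotations

from typing import Mapping, Sequence

# Venus (#46) across the six stages, as quoted in the README.
VENUS = ["correct", "wrong", "correct", "wrong", "wrong", "correct"]


def share(n_correct: int, total: int) -> float:
    """Percentage of traps answered correctly at one stage."""
    return 100.0 * n_correct / total


def churn(prev: Mapping[int, str], curr: Mapping[int, str]) -> tuple[int, int]:
    """(+fixed, -broken) between two consecutive stages over the same traps."""
    fixed = sum(1 for tid, v in curr.items()
                if v == "correct" and prev.get(tid) != "correct")
    broken = sum(1 for tid, v in curr.items()
                 if v != "correct" and prev.get(tid) == "correct")
    return fixed, broken


def regressions(path: Sequence[str]) -> int:
    """Number of correct -> not-correct steps along one trap's stage path."""
    return sum(1 for a, b in zip(path, path[1:])
               if a == "correct" and b != "correct")


if __name__ == "__main__":
    print(f"{share(8, 12):.1f}%")  # 66.7%
    # Hypothetical toy stages: trap 2 gets fixed, trap 1 breaks, net change 0.
    before = {1: "correct", 2: "incorrect", 3: "incorrect"}
    after = {1: "incorrect", 2: "correct", 3: "incorrect"}
    print(churn(before, after))    # (1, 1)
    print(regressions(VENUS))      # 2
```

On a plateau, `fixed - broken` summed over the 1.0 → 2.0 → 2.1 transitions would be zero, which is the README's "churn, not stagnation" point. Venus registers two regressions even though its first and last verdicts are both correct.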
app.py
DELETED
@@ -1,333 +0,0 @@
-"""External Grounding demo Space — static visualization of Experiment 2
-and Experiment 3 of the Second Loop project.
-
-No model is loaded. No retrieval is performed. All numbers and verdicts
-are exact values from the JSON outputs of the live experimental runs,
-bundled into data.json.
-"""
-
-from __future__ import annotations
-
-import json
-from pathlib import Path
-
-import gradio as gr
-
-HERE = Path(__file__).resolve().parent
-DATA = json.loads((HERE / "data.json").read_text())
-META = DATA["meta"]
-STAGES = DATA["stages"]
-TRAPS = DATA["traps"]
-
-REPO_URL = META.get("repo", "https://github.com/SergheiBrinza/external-grounding")
-STAGE_KEYS = [s["key"] for s in STAGES]
-STAGE_LABEL = {s["key"]: s["label"] for s in STAGES}
-
-
-# -----------------------------------------------------------------------------
-def stage_pill(verdict: str, key: str) -> str:
-    if verdict == "correct":
-        cls = "pill pill-green"; text = "✓"
-    elif verdict == "incorrect":
-        cls = "pill pill-red"; text = "✗"
-    else:
-        cls = "pill pill-grey"; text = "—"
-    return f"<span class='{cls}' title='{STAGE_LABEL[key]}'>{text}<span class='pill-sub'>{key}</span></span>"
-
-
-def render_arc_strip() -> str:
-    cells = []
-    for s in STAGES:
-        cells.append(f"""
-<div class="arc-cell arc-{s['color']}">
-<div class="arc-pct">{s['pct']:.1f}%</div>
-<div class="arc-key">{s['key']}</div>
-<div class="arc-label">{s['label']}</div>
-</div>""")
-    return "<div class='arc-strip'>" + "".join(cells) + "</div>"
-
-
-def render_trap_card(t: dict) -> str:
-    pills = "".join(stage_pill(t["stages"].get(k, "—"), k) for k in STAGE_KEYS)
-    wiki = ""
-    if t.get("wiki_titles"):
-        items = " · ".join(t["wiki_titles"])
-        wiki = f"<div class='row-block'><div class='label'>Wikipedia retrieval (top sources)</div><div class='value value-meta'>{items}</div></div>"
-    return f"""
-<div class="trap-card">
-<div class="trap-head">
-<span class="trap-id">#{t['id']:02d}</span>
-<span class="trap-cat">{t['category']}</span>
-<span class="trap-pills">{pills}</span>
-</div>
-<div class="trap-q">{t['question']}</div>
-<div class="row-block">
-<div class="label">Correct</div>
-<div class="value value-correct">{t['correct_answer']}</div>
-</div>
-<div class="row-block">
-<div class="label">Memorized wrong</div>
-<div class="value value-wrong">{t['memorized_wrong']}</div>
-</div>
-<div class="row-block">
-<div class="label">Final answer (Guardian 2.3)</div>
-<div class="value value-correct">{t['final_answer']}</div>
-</div>
-{wiki}
-</div>"""
-
-
-def render_grid(filter_mode: str) -> str:
-    rows = []
-    for t in TRAPS:
-        s = t["stages"]
-        if filter_mode == "gk1_failed" and s.get("gk1") != "incorrect":
-            continue
-        if filter_mode == "fixed_by_wiki" and (s.get("gk1") == "correct" or s.get("gk23") != "correct"):
-            continue
-        if filter_mode == "needed_full_v23" and (s.get("gk22") == "correct" or s.get("gk23") != "correct"):
-            continue
-        rows.append(render_trap_card(t))
-    if not rows:
-        return "<div class='empty'>No traps match the current filter.</div>"
-    return "<div class='trap-grid'>" + "\n".join(rows) + "</div>"
-
-
-CSS = """
-@import url('https://fonts.googleapis.com/css2?family=Playfair+Display:ital,wght@1,700&family=Inter:wght@400;500;700;800&family=JetBrains+Mono:wght@500;700&display=swap');
-
-:root {
---bg: #000000; --bg-card: #0A0A0A; --bg-elev: #141414;
---border: #1F1F1F; --border-strong: #2A2A2A;
---text: #FFFFFF; --text-mute: #A8A8A8; --text-dim: #6B6B6B;
---gold: #D4AF37; --gold-hi: #E8C84A;
---red: #FF2A2A; --green: #1FD160; --orange: #FF8C42; --yellow: #E8C84A;
---mono: 'JetBrains Mono', ui-monospace, monospace;
---serif: 'Playfair Display', serif;
---sans: 'Inter', system-ui, sans-serif;
-}
-
-.gradio-container, body {
-background: var(--bg) !important; color: var(--text) !important;
-font-family: var(--sans) !important;
-max-width: 1180px !important; margin: 0 auto !important;
-padding: 20px 24px 60px 24px !important;
-}
-footer, .built-with, .show-api { display: none !important; }
-.gradio-container > .main > .wrap > .blocks { gap: 18px !important; }
-
-.head {
-border: 1px solid var(--border-strong); border-radius: 14px;
-padding: 28px 32px 24px 32px;
-background: linear-gradient(180deg, #0A0A0A 0%, #050505 100%);
-}
-.head-top { display: flex; align-items: flex-start; justify-content: space-between;
-gap: 24px; padding-bottom: 18px; border-bottom: 1px solid var(--border);
-margin-bottom: 16px; }
-.head-brand { display: flex; align-items: center; gap: 16px; }
-.head-icon { width: 56px; height: 56px; border: 1px solid var(--border-strong);
-border-radius: 12px; display: flex; align-items: center; justify-content: center;
-font-size: 28px; background: radial-gradient(60% 60% at 50% 40%, #1A1A1A 0%, #050505 100%); }
-.head-title { font-family: var(--serif); font-style: italic; font-weight: 700;
-font-size: 36px; color: var(--text); line-height: 1; margin: 2px 0 6px 0; }
-.head-subtitle { font-family: var(--mono); font-size: 12px; letter-spacing: 0.16em;
-color: var(--text-mute); text-transform: uppercase; }
-.head-right { text-align: right; }
-.submitted-label { font-family: var(--mono); font-size: 10px; letter-spacing: 0.22em;
-color: var(--text-dim); text-transform: uppercase; display: block; margin-bottom: 4px; }
-.submitted-name { font-family: var(--serif); font-style: italic; font-weight: 700;
-font-size: 22px; color: var(--text); }
-.status-pill { display: inline-flex; align-items: center; gap: 6px; margin-top: 10px;
-padding: 5px 12px; border-radius: 999px;
-background: rgba(31, 209, 96, 0.08); border: 1px solid rgba(31, 209, 96, 0.5);
-font-family: var(--mono); font-size: 10px; letter-spacing: 0.18em;
-color: var(--green); text-transform: uppercase; }
-.status-dot { width: 7px; height: 7px; border-radius: 50%; background: var(--green); }
-
-.head-tag { text-align: center; margin: 4px 0 18px 0;
-font-family: var(--mono); font-size: 12px; letter-spacing: 0.22em;
-color: var(--gold); text-transform: uppercase; }
-
-.head-meta { display: grid; grid-template-columns: repeat(4, 1fr); gap: 12px 24px; }
-.head-meta .item { display: flex; align-items: baseline; gap: 8px; }
-.head-meta .k { font-family: var(--mono); font-size: 9.5px; letter-spacing: 0.2em;
-color: var(--text-dim); text-transform: uppercase; }
-.head-meta .v { font-family: var(--sans); font-size: 14px; font-weight: 700; color: var(--text); }
-
-.cta-wrap { text-align: center; margin: 24px 0 28px 0; }
-.cta { display: inline-flex; align-items: center; gap: 12px;
-padding: 16px 32px; background: var(--red); color: #fff;
-border-radius: 999px; border: 1px solid var(--red);
-font-family: var(--mono); font-size: 13px; letter-spacing: 0.18em;
-text-transform: uppercase; font-weight: 700; text-decoration: none;
-box-shadow: 0 0 30px rgba(255, 53, 53, 0.35);
-transition: transform 120ms ease, box-shadow 120ms ease; }
-.cta:hover { transform: translateY(-1px); box-shadow: 0 0 40px rgba(255, 53, 53, 0.55); }
-.cta-icon { width: 22px; height: 22px; border-radius: 999px; background: #fff;
-color: var(--red); display: flex; align-items: center; justify-content: center;
-font-size: 11px; }
-
-/* ARC strip — 6 stages */
-.arc-strip {
-display: grid; grid-template-columns: repeat(6, 1fr); gap: 8px;
-margin-bottom: 22px;
-}
-@media (max-width: 800px) { .arc-strip { grid-template-columns: repeat(3, 1fr); } }
-.arc-cell {
-background: var(--bg-card); border: 1px solid var(--border-strong);
-border-radius: 12px; padding: 18px 12px 14px 12px;
-text-align: center;
-position: relative;
-}
-.arc-red { border-color: rgba(255, 42, 42, 0.45); }
-.arc-orange { border-color: rgba(255, 140, 66, 0.45); }
-.arc-yellow { border-color: rgba(232, 200, 74, 0.45); }
-.arc-green { border-color: rgba(31, 209, 96, 0.45); box-shadow: 0 0 24px rgba(31, 209, 96, 0.15); }
-.arc-pct { font-family: var(--serif); font-style: italic; font-weight: 700; font-size: 28px; color: var(--text); }
-.arc-key { font-family: var(--mono); font-size: 10px; letter-spacing: 0.16em; color: var(--gold); margin-top: 4px; }
-.arc-label { font-family: var(--mono); font-size: 9.5px; letter-spacing: 0.12em; color: var(--text-mute); margin-top: 6px; text-transform: uppercase; line-height: 1.4; }
-
-/* TRAP grid */
-.trap-grid { display: grid; grid-template-columns: 1fr; gap: 14px; }
-@media (min-width: 1000px) { .trap-grid { grid-template-columns: 1fr 1fr; } }
-
-.trap-card { background: var(--bg-card); border: 1px solid var(--border-strong);
-border-radius: 14px; padding: 22px 24px;
-transition: border-color 120ms ease; }
-.trap-card:hover { border-color: var(--gold); }
-.trap-head { display: flex; align-items: center; gap: 10px; flex-wrap: wrap;
-padding-bottom: 12px; border-bottom: 1px solid var(--border);
-margin-bottom: 14px; }
-.trap-id { font-family: var(--mono); font-size: 11px; letter-spacing: 0.16em;
-color: var(--gold); font-weight: 700; }
-.trap-cat { font-family: var(--mono); font-size: 10px; letter-spacing: 0.16em;
-color: var(--text-dim); text-transform: uppercase;
-padding: 3px 9px; border: 1px solid var(--border-strong); border-radius: 999px; }
-.trap-pills { margin-left: auto; display: flex; gap: 4px; flex-wrap: wrap; }
-.pill { display: inline-flex; align-items: center; padding: 4px 8px; border-radius: 6px;
-font-family: var(--mono); font-size: 11px; font-weight: 700; border: 1px solid; }
-.pill-sub { font-size: 8px; opacity: 0.7; margin-left: 4px; letter-spacing: 0.08em; }
-.pill-green { color: var(--green); border-color: rgba(31, 209, 96, 0.5); background: rgba(31, 209, 96, 0.08); }
-.pill-red { color: var(--red); border-color: rgba(255, 42, 42, 0.5); background: rgba(255, 42, 42, 0.08); }
-.pill-grey { color: var(--text-dim); border-color: var(--border-strong); background: transparent; }
-
-.trap-q { font-family: var(--serif); font-style: italic; font-weight: 700;
-font-size: 18px; color: var(--text); margin-bottom: 14px; line-height: 1.35; }
-.row-block { margin-bottom: 11px; }
-.row-block .label { font-family: var(--mono); font-size: 9px; letter-spacing: 0.2em;
-color: var(--text-dim); text-transform: uppercase; margin-bottom: 4px; }
-.row-block .value { font-family: var(--sans); font-size: 14px; line-height: 1.5;
-border-left: 2px solid var(--border-strong);
-padding: 4px 0 4px 12px; }
-.value-correct { border-left-color: var(--green) !important; }
-.value-wrong { border-left-color: var(--red) !important; color: var(--text-mute); }
-.value-meta { border-left-color: var(--gold) !important; color: var(--text-mute); font-family: var(--mono); font-size: 12px; }
-
-.empty { text-align: center; padding: 60px; color: var(--text-dim);
-font-family: var(--mono); letter-spacing: 0.18em; font-size: 12px; }
-
-.foot { margin-top: 32px; padding: 22px 24px;
-border: 1px solid var(--border); border-radius: 12px; background: var(--bg-card); }
-.foot .title { font-family: var(--mono); font-size: 10px; letter-spacing: 0.2em;
-color: var(--text-dim); text-transform: uppercase; margin-bottom: 10px; }
-.foot .body { font-family: var(--sans); font-size: 13px; color: var(--text-mute); line-height: 1.6; }
-.foot a { color: var(--gold); text-decoration: none; border-bottom: 1px dotted var(--gold); }
-.foot a:hover { color: var(--gold-hi); }
-
-.gradio-container .block { background: transparent !important; border: none !important; padding: 0 !important; }
-.gradio-container .form { background: transparent !important; border: none !important; gap: 0 !important; }
-label[data-testid="block-info"] { display: none !important; }
-.gradio-container .gr-radio { background: transparent !important; }
-.gradio-container .gr-radio label {
-background: var(--bg-card) !important; border: 1px solid var(--border-strong) !important;
-border-radius: 999px !important; padding: 7px 16px !important;
-font-family: var(--mono) !important; font-size: 10px !important;
-letter-spacing: 0.16em !important; text-transform: uppercase !important;
-color: var(--text-mute) !important;
-}
-.gradio-container .gr-radio label:hover { border-color: var(--gold) !important; }
-.gradio-container .gr-radio input[type="radio"]:checked + label,
-.gradio-container .gr-radio label.selected {
-background: rgba(255, 42, 42, 0.08) !important; border-color: var(--red) !important; color: var(--text) !important;
-}
-.gradio-container .gr-radio input { display: none !important; }
-"""
-
-
-def header_html() -> str:
-    return f"""
-<div class="head">
-<div class="head-top">
-<div class="head-brand">
-<div class="head-icon">🛡️</div>
-<div>
-<div class="head-title">External Grounding</div>
-<div class="head-subtitle">Defending an LLM's correction notebook from polluted memory</div>
-</div>
-</div>
-<div class="head-right">
-<span class="submitted-label">Submitted by</span>
-<div class="submitted-name">{META['author']}</div>
-<div class="status-pill"><span class="status-dot"></span>Static demo</div>
-</div>
-</div>
-<div class="head-tag">★ Second Loop · Part 2 of 3 ★</div>
-<div class="head-meta">
-<div class="item"><span class="k">Subject</span><span class="v">Qwen2.5-3B-Instruct</span></div>
-<div class="item"><span class="k">Arbiter v1</span><span class="v">Qwen2.5-7B (clone)</span></div>
-<div class="item"><span class="k">Arbiter v2</span><span class="v">Wikipedia + 7B</span></div>
-<div class="item"><span class="k">License</span><span class="v">{META['license']}</span></div>
-</div>
-</div>
-
-<div class="cta-wrap">
-<a class="cta" href="{REPO_URL}" target="_blank" rel="noopener">
-<span class="cta-icon">▶</span> View the full repository on GitHub
-</a>
-</div>
-
-{render_arc_strip()}
-"""
-
-
-def footer_html() -> str:
-    return f"""
-<div class="foot">
-<div class="title">About this demo</div>
-<div class="body">
-The arc traces scar-survival on the noisy-notebook benchmark from 50% (no defense)
-to 100% (Guardian 2.3, final calibration). Guardian 1.0 uses a same-family clone
-as arbiter — it shares the subject's blind spots and caps at 66.7%. Guardian 2.x
-replaces the arbiter with live Wikipedia retrieval and three targeted fixes
-(verbatim quote check, namesake relevance gate, soft threshold).
-No live model is loaded in this Space: every answer and percentage on this page
-comes verbatim from the JSON outputs of the original experimental run, bundled
-into <code>data.json</code>.
-<br/><br/>
-Full code, raw per-stage results, and methodology document:
-<a href="{REPO_URL}" target="_blank" rel="noopener">{REPO_URL}</a>.
-</div>
-</div>
-"""
-
-
-def filter_update(mode: str) -> str:
-    return render_grid(mode)
-
-
-with gr.Blocks(css=CSS, theme=gr.themes.Base(), title="External Grounding · Second Loop") as demo:
-    gr.HTML(header_html())
-    filter_radio = gr.Radio(
-        choices=["all", "gk1_failed", "fixed_by_wiki", "needed_full_v23"],
-        value="all",
-        label="", show_label=False, container=False,
-    )
-    grid_out = gr.HTML(render_grid("all"))
-    filter_radio.change(filter_update, filter_radio, grid_out)
-    gr.HTML(footer_html())
-
-
-if __name__ == "__main__":
-    import os
-    port = int(os.environ.get("GRADIO_SERVER_PORT", 7860))
-    demo.launch(server_name="0.0.0.0", server_port=port)
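The three filter modes in the deleted `render_grid()` were written as `continue` conditions interleaved with HTML rendering. As a hedged sketch of what a port to the static page would need to preserve, here are the same conditions rewritten as positive keep-predicates (by De Morgan's laws) and separated from the markup. Reading `gk1`, `gk22` and `gk23` as Guardian 1.0, 2.2 and 2.3 is an assumption based on the card label "Final answer (Guardian 2.3)"; the `TOY` traps are hypothetical and not taken from `data.json`.

```python
from __future__ import annotations


def keep(mode: str, stages: dict[str, str]) -> bool:
    """True if a trap with these per-stage verdicts survives the filter."""
    if mode == "gk1_failed":
        return stages.get("gk1") == "incorrect"
    if mode == "fixed_by_wiki":
        # Assumed meaning: Guardian 1.0 was not correct, Guardian 2.3 is.
        return stages.get("gk1") != "correct" and stages.get("gk23") == "correct"
    if mode == "needed_full_v23":
        # Assumed meaning: Guardian 2.2 was not correct, Guardian 2.3 is.
        return stages.get("gk22") != "correct" and stages.get("gk23") == "correct"
    return True  # "all", and any unrecognized mode, keeps every trap


def filter_ids(traps: list[dict], mode: str) -> list[int]:
    """Ids of the traps that the given filter mode keeps, in input order."""
    return [t["id"] for t in traps if keep(mode, t["stages"])]


# Hypothetical traps for illustration only.
TOY = [
    {"id": 1, "stages": {"gk1": "incorrect", "gk22": "correct", "gk23": "correct"}},
    {"id": 2, "stages": {"gk1": "correct", "gk22": "incorrect", "gk23": "correct"}},
    {"id": 3, "stages": {"gk1": "incorrect", "gk22": "incorrect", "gk23": "correct"}},
]

if __name__ == "__main__":
    for mode in ("all", "gk1_failed", "fixed_by_wiki", "needed_full_v23"):
        print(mode, filter_ids(TOY, mode))
```

Because `.get()` returns `None` for a missing stage key, an absent verdict is treated like a wrong one in the last two modes. That matches how the original `continue` conditions behaved.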
data.json
CHANGED
@@ -9,6 +9,14 @@
     "project": "Second Loop — Part 2 of 3",
     "repo": "https://github.com/SergheiBrinza/external-grounding"
   },
+  "showcase": [
+    46,
+    27,
+    16,
+    34,
+    28,
+    20
+  ],
   "stages": [
     {
       "key": "sick",
index.html
ADDED
@@ -0,0 +1,599 @@
+<!doctype html>
+<html lang="en">
+<head>
+<meta charset="utf-8" />
+<meta name="viewport" content="width=device-width, initial-scale=1" />
+<title>External Grounding · Second Loop</title>
+<link rel="preconnect" href="https://fonts.googleapis.com" />
+<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin />
+<link href="https://fonts.googleapis.com/css2?family=Playfair+Display:ital,wght@1,700&family=Inter:wght@400;500;700;800&family=JetBrains+Mono:wght@500;700&display=swap" rel="stylesheet" />
+<script src="https://cdn.plot.ly/plotly-2.35.2.min.js" charset="utf-8"></script>
+<style>
+:root{
+--bg:#000000; --bg-card:#0A0A0A; --bg-elev:#141414;
+--border:#1F1F1F; --border-strong:#2A2A2A;
+--text:#FFFFFF; --text-mute:#A8A8A8; --text-dim:#6B6B6B;
+--gold:#D4AF37; --gold-hi:#E8C84A;
+--red:#FF2A2A; --green:#1FD160; --orange:#FF8C42; --yellow:#E8C84A;
+--mono:'JetBrains Mono',ui-monospace,monospace;
+--serif:'Playfair Display',serif;
+--sans:'Inter',system-ui,sans-serif;
+}
+*{box-sizing:border-box;}
+html,body{margin:0;background:var(--bg);color:var(--text);font-family:var(--sans);}
+.wrap{max-width:1180px;margin:0 auto;padding:22px 24px 64px 24px;}
+a{color:var(--gold);text-decoration:none;border-bottom:1px dotted var(--gold);}
+a:hover{color:var(--gold-hi);}
+
+/* ---------- header ---------- */
+.head{border:1px solid var(--border-strong);border-radius:14px;
+padding:26px 30px 22px 30px;background:linear-gradient(180deg,#0A0A0A 0%,#050505 100%);}
+.head-top{display:flex;align-items:flex-start;justify-content:space-between;gap:24px;
+padding-bottom:16px;border-bottom:1px solid var(--border);margin-bottom:16px;}
+.head-brand{display:flex;align-items:center;gap:16px;}
+.head-icon{width:52px;height:52px;border:1px solid var(--border-strong);border-radius:12px;
+display:flex;align-items:center;justify-content:center;
+background:radial-gradient(60% 60% at 50% 40%,#1A1A1A 0%,#050505 100%);}
+.head-title{font-family:var(--serif);font-style:italic;font-weight:700;font-size:34px;
+color:var(--text);line-height:1;margin:2px 0 7px 0;}
+.head-subtitle{font-family:var(--mono);font-size:11.5px;letter-spacing:.14em;
+color:var(--text-mute);text-transform:uppercase;}
+.head-right{text-align:right;white-space:nowrap;}
+.submitted-label{font-family:var(--mono);font-size:10px;letter-spacing:.22em;
+color:var(--text-dim);text-transform:uppercase;display:block;margin-bottom:4px;}
+.submitted-name{font-family:var(--serif);font-style:italic;font-weight:700;font-size:20px;color:var(--text);}
+.status-pill{display:inline-flex;align-items:center;gap:6px;margin-top:10px;padding:5px 12px;
+border-radius:999px;background:rgba(31,209,96,.08);border:1px solid rgba(31,209,96,.5);
+font-family:var(--mono);font-size:10px;letter-spacing:.18em;color:var(--green);text-transform:uppercase;}
+.status-dot{width:7px;height:7px;border-radius:50%;background:var(--green);}
+.head-tag{text-align:center;margin:16px 0 0 0;
+font-family:var(--mono);font-size:11px;letter-spacing:.22em;color:var(--gold);text-transform:uppercase;}
+.head-meta{display:grid;grid-template-columns:repeat(4,1fr);gap:12px 24px;margin-top:16px;}
+.head-meta .item{display:flex;flex-direction:column;gap:3px;}
+.head-meta .k{font-family:var(--mono);font-size:9.5px;letter-spacing:.2em;color:var(--text-dim);text-transform:uppercase;}
+.head-meta .v{font-family:var(--sans);font-size:13.5px;font-weight:700;color:var(--text);}
+
+/* ---------- three honest badges ---------- */
+.badges{display:grid;grid-template-columns:repeat(3,1fr);gap:12px;margin:18px 0 4px 0;}
+.badge{background:var(--bg-card);border:1px solid var(--border-strong);border-radius:12px;padding:16px 20px;}
+.badge .label{font-family:var(--mono);font-size:9.5px;letter-spacing:.2em;color:var(--text-dim);
+text-transform:uppercase;margin-bottom:8px;}
+.badge .value{font-family:var(--serif);font-style:italic;font-weight:700;font-size:26px;color:var(--text);}
+.badge .sub{font-family:var(--mono);font-size:10px;letter-spacing:.1em;color:var(--text-mute);
+text-transform:uppercase;margin-top:4px;}
+.badge.red{border-color:rgba(255,42,42,.5);} .badge.red .value{color:var(--red);}
|
| 65 |
+
.badge.orange{border-color:rgba(255,140,66,.5);} .badge.orange .value{color:var(--orange);}
|
| 66 |
+
.badge.green{border-color:rgba(31,209,96,.5);} .badge.green .value{color:var(--green);}
|
| 67 |
+
|
| 68 |
+
/* ---------- exhibit (the museum lever) ---------- */
|
| 69 |
+
.exhibit{margin-top:18px;border:1px solid var(--border-strong);border-radius:14px;
|
| 70 |
+
background:var(--bg-card);padding:22px 24px 26px 24px;}
|
| 71 |
+
.exhibit-title{font-family:var(--serif);font-style:italic;font-weight:700;font-size:22px;margin:0 0 2px 0;}
|
| 72 |
+
.exhibit-lede{font-family:var(--mono);font-size:11px;letter-spacing:.1em;color:var(--text-mute);
|
| 73 |
+
text-transform:uppercase;margin-bottom:18px;}
|
| 74 |
+
|
| 75 |
+
.exhibit-grid{display:grid;grid-template-columns:1fr 320px;gap:22px;align-items:stretch;}
|
| 76 |
+
#chart{width:100%;height:430px;}
|
| 77 |
+
.readout{border:1px solid var(--border-strong);border-radius:12px;background:#050505;
|
| 78 |
+
padding:20px 22px;display:flex;flex-direction:column;gap:2px;transition:border-color 160ms ease;}
|
| 79 |
+
.readout .ro-label{font-family:var(--mono);font-size:9.5px;letter-spacing:.2em;color:var(--text-dim);text-transform:uppercase;}
|
| 80 |
+
.readout .ro-name{font-family:var(--serif);font-style:italic;font-weight:700;font-size:25px;line-height:1.08;margin:4px 0 2px 0;}
|
| 81 |
+
.readout .ro-stageno{font-family:var(--mono);font-size:10.5px;letter-spacing:.08em;color:var(--text-mute);text-transform:uppercase;}
|
| 82 |
+
.ro-pct-label{font-family:var(--mono);font-size:9.5px;letter-spacing:.2em;color:var(--text-dim);
|
| 83 |
+
text-transform:uppercase;margin-top:20px;}
|
| 84 |
+
.ro-pct{font-family:var(--serif);font-style:italic;font-weight:700;font-size:56px;line-height:1;color:var(--green);transition:color 160ms ease;}
|
| 85 |
+
.ro-count{font-family:var(--mono);font-size:12px;letter-spacing:.06em;color:var(--text-mute);margin-top:2px;}
|
| 86 |
+
.ro-tag{align-self:flex-start;margin-top:10px;padding:4px 10px;border-radius:999px;
|
| 87 |
+
font-family:var(--mono);font-size:9.5px;letter-spacing:.16em;text-transform:uppercase;
|
| 88 |
+
border:1px solid var(--border-strong);color:var(--text-mute);}
|
| 89 |
+
.ro-tag.red{background:rgba(255,42,42,.1);border-color:rgba(255,42,42,.55);color:var(--red);}
|
| 90 |
+
.ro-tag.orange{background:rgba(255,140,66,.1);border-color:rgba(255,140,66,.55);color:var(--orange);}
|
| 91 |
+
.ro-tag.green{background:rgba(31,209,96,.1);border-color:rgba(31,209,96,.55);color:var(--green);}
|
| 92 |
+
.ro-aux{margin-top:18px;border-top:1px solid var(--border);padding-top:14px;display:flex;flex-direction:column;gap:9px;}
|
| 93 |
+
.ro-aux .row{display:flex;align-items:baseline;justify-content:space-between;gap:10px;}
|
| 94 |
+
.ro-aux .rk{font-family:var(--mono);font-size:10px;letter-spacing:.06em;color:var(--text-dim);text-transform:uppercase;}
|
| 95 |
+
.ro-aux .rv{font-family:var(--mono);font-size:13px;font-weight:700;}
|
| 96 |
+
.rv .fx{color:var(--green);} .rv .bk{color:var(--red);} .rv.mute{color:var(--text-mute);}
|
| 97 |
+
|
| 98 |
+
/* ---------- slider (the lever) ---------- */
|
| 99 |
+
.lever{margin-top:24px;}
|
| 100 |
+
.lever-head{display:flex;align-items:baseline;justify-content:space-between;margin-bottom:14px;}
|
| 101 |
+
.lever-head .t{font-family:var(--mono);font-size:11px;letter-spacing:.18em;color:var(--text);text-transform:uppercase;}
|
| 102 |
+
.lever-head .h{font-family:var(--mono);font-size:10px;letter-spacing:.1em;color:var(--text-dim);text-transform:uppercase;}
|
| 103 |
+
.slider-area{position:relative;padding:0 4px;}
|
| 104 |
+
input[type=range].lever-input{
|
| 105 |
+
-webkit-appearance:none;appearance:none;width:100%;height:6px;border-radius:999px;margin:0;
|
| 106 |
+
background:linear-gradient(90deg,var(--red) 0%,var(--orange) 45%,var(--yellow) 80%,var(--green) 100%);
|
| 107 |
+
outline:none;cursor:pointer;}
|
| 108 |
+
input[type=range].lever-input::-webkit-slider-thumb{
|
| 109 |
+
-webkit-appearance:none;appearance:none;width:26px;height:26px;border-radius:50%;
|
| 110 |
+
background:#0A0A0A;border:2px solid var(--gold);box-shadow:0 0 0 4px rgba(212,175,55,.18);
|
| 111 |
+
cursor:grab;margin-top:-10px;transition:border-color 120ms ease,box-shadow 120ms ease;}
|
| 112 |
+
input[type=range].lever-input::-webkit-slider-thumb:active{cursor:grabbing;}
|
| 113 |
+
input[type=range].lever-input::-moz-range-thumb{width:26px;height:26px;border-radius:50%;
|
| 114 |
+
background:#0A0A0A;border:2px solid var(--gold);box-shadow:0 0 0 4px rgba(212,175,55,.18);cursor:grab;}
|
| 115 |
+
input[type=range].lever-input::-moz-range-track{height:6px;border-radius:999px;background:transparent;}
|
| 116 |
+
input[type=range].lever-input:focus-visible::-webkit-slider-thumb{box-shadow:0 0 0 4px rgba(212,175,55,.20),0 0 0 7px var(--gold-hi);}
|
| 117 |
+
input[type=range].lever-input:focus-visible::-moz-range-thumb{box-shadow:0 0 0 4px rgba(212,175,55,.20),0 0 0 7px var(--gold-hi);}
|
| 118 |
+
.ticks{position:relative;height:10px;margin-top:9px;}
|
| 119 |
+
.tick{position:absolute;top:0;width:1px;height:6px;background:var(--border-strong);transform:translateX(-50%);}
|
| 120 |
+
.tick.on{background:var(--gold);height:9px;}
|
| 121 |
+
.ticklabels{position:relative;height:32px;margin-top:2px;}
|
| 122 |
+
.tlabel{position:absolute;top:0;font-family:var(--mono);font-size:9.5px;letter-spacing:.02em;
|
| 123 |
+
color:var(--text-dim);white-space:nowrap;line-height:1.2;text-align:center;cursor:pointer;
|
| 124 |
+
transition:color 120ms ease;}
|
| 125 |
+
.tlabel .pct{display:block;font-size:8.5px;color:var(--text-dim);}
|
| 126 |
+
.tlabel.on{color:var(--text);font-weight:700;}
|
| 127 |
+
.tlabel.on .pct{color:var(--gold);}
|
| 128 |
+
.lever-foot{margin-top:18px;display:flex;justify-content:space-between;
|
| 129 |
+
font-family:var(--mono);font-size:10px;letter-spacing:.12em;color:var(--text-dim);text-transform:uppercase;}
|
| 130 |
+
|
| 131 |
+
/* ---------- trap grid ---------- */
|
| 132 |
+
.grid-wrap{margin-top:22px;border:1px solid var(--border-strong);border-radius:14px;background:var(--bg-card);padding:20px 22px;}
|
| 133 |
+
.grid-cap{display:flex;align-items:baseline;justify-content:space-between;gap:12px;margin-bottom:14px;flex-wrap:wrap;}
|
| 134 |
+
.grid-cap .gc-t{font-family:var(--mono);font-size:10px;letter-spacing:.16em;color:var(--text-dim);text-transform:uppercase;}
|
| 135 |
+
.grid-cap .gc-n{font-family:var(--mono);font-size:11px;letter-spacing:.08em;color:var(--text-mute);text-transform:uppercase;}
|
| 136 |
+
.grid-cap .gc-n b{color:var(--green);}
|
| 137 |
+
.chips{display:flex;flex-wrap:wrap;gap:8px;margin-bottom:14px;}
|
| 138 |
+
.chip{cursor:pointer;border:1px solid var(--border-strong);background:#050505;border-radius:999px;
|
| 139 |
+
padding:6px 13px;font-family:var(--mono);font-size:10px;letter-spacing:.08em;text-transform:uppercase;
|
| 140 |
+
color:var(--text-mute);transition:border-color 120ms ease,color 120ms ease;}
|
| 141 |
+
.chip:hover{border-color:var(--gold);color:var(--text);}
|
| 142 |
+
.chip.active{border-color:var(--gold);color:var(--text);background:rgba(212,175,55,.07);}
|
| 143 |
+
.grid12{display:grid;grid-template-columns:1fr;gap:10px;}
|
| 144 |
+
@media(min-width:720px){.grid12{grid-template-columns:1fr 1fr;}}
|
| 145 |
+
@media(min-width:1020px){.grid12{grid-template-columns:1fr 1fr 1fr;}}
|
| 146 |
+
.gcard{cursor:pointer;border:1px solid var(--border-strong);border-radius:11px;background:#080808;
|
| 147 |
+
padding:13px 15px;border-left-width:3px;transition:border-color 140ms ease,background 140ms ease;}
|
| 148 |
+
.gcard:hover{border-color:var(--gold);}
|
| 149 |
+
.gcard.sel{background:rgba(212,175,55,.06);}
|
| 150 |
+
.gcard.now-correct{border-left-color:var(--green);}
|
| 151 |
+
.gcard.now-wrong{border-left-color:var(--red);}
|
| 152 |
+
.gc-head{display:flex;align-items:center;gap:8px;margin-bottom:8px;}
|
| 153 |
+
.gc-id{font-family:var(--mono);font-size:10px;font-weight:700;color:var(--gold);letter-spacing:.08em;}
|
| 154 |
+
.gc-cat{font-family:var(--mono);font-size:8.5px;letter-spacing:.1em;color:var(--text-dim);text-transform:uppercase;
|
| 155 |
+
padding:2px 7px;border:1px solid var(--border-strong);border-radius:999px;}
|
| 156 |
+
.gc-now{margin-left:auto;font-family:var(--mono);font-size:8.5px;letter-spacing:.1em;font-weight:700;text-transform:uppercase;}
|
| 157 |
+
.gc-now.c{color:var(--green);} .gc-now.w{color:var(--red);}
|
| 158 |
+
.gc-q{font-family:var(--sans);font-size:12.5px;line-height:1.45;color:var(--text-mute);margin-bottom:10px;min-height:36px;}
|
| 159 |
+
.traj{display:flex;gap:5px;align-items:center;}
|
| 160 |
+
.tdot{width:13px;height:13px;border-radius:50%;border:1px solid var(--border-strong);position:relative;}
|
| 161 |
+
.tdot.c{background:rgba(31,209,96,.85);border-color:var(--green);}
|
| 162 |
+
.tdot.w{background:rgba(255,42,42,.7);border-color:var(--red);}
|
| 163 |
+
.tdot.cur{box-shadow:0 0 0 2px var(--gold);}
|
| 164 |
+
.traj-axis{display:flex;gap:5px;margin-top:5px;}
|
| 165 |
+
.traj-axis span{width:13px;font-family:var(--mono);font-size:7px;color:var(--text-dim);text-align:center;letter-spacing:0;}
|
| 166 |
+
|
| 167 |
+
/* ---------- spotlight ---------- */
|
| 168 |
+
.spot{margin-top:22px;border:1px solid var(--border-strong);border-radius:14px;background:var(--bg-card);padding:22px 24px;}
|
| 169 |
+
.spot-top{display:flex;align-items:baseline;gap:10px;flex-wrap:wrap;margin-bottom:6px;}
|
| 170 |
+
.spot-id{font-family:var(--mono);font-size:11px;font-weight:700;color:var(--gold);letter-spacing:.1em;}
|
| 171 |
+
.spot-cat{font-family:var(--mono);font-size:9px;letter-spacing:.14em;color:var(--text-dim);text-transform:uppercase;
|
| 172 |
+
padding:2px 8px;border:1px solid var(--border-strong);border-radius:999px;}
|
| 173 |
+
.spot-q{font-family:var(--serif);font-style:italic;font-weight:700;font-size:21px;line-height:1.32;margin:2px 0 16px 0;}
|
| 174 |
+
.spot-cols{display:grid;grid-template-columns:1fr 1fr;gap:16px 28px;}
|
| 175 |
+
@media(max-width:760px){.spot-cols{grid-template-columns:1fr;}}
|
| 176 |
+
.row-block{margin-bottom:12px;}
|
| 177 |
+
.row-block .label{font-family:var(--mono);font-size:9px;letter-spacing:.2em;color:var(--text-dim);text-transform:uppercase;margin-bottom:4px;}
|
| 178 |
+
.row-block .value{font-family:var(--sans);font-size:13.5px;line-height:1.5;border-left:2px solid var(--border-strong);padding:4px 0 4px 12px;}
|
| 179 |
+
.value.correct{border-left-color:var(--green);}
|
| 180 |
+
.value.wrong{border-left-color:var(--red);color:var(--text-mute);}
|
| 181 |
+
.value.final{border-left-color:var(--green);}
|
| 182 |
+
.value.meta{border-left-color:var(--gold);color:var(--text-mute);font-family:var(--mono);font-size:12px;}
|
| 183 |
+
/* trajectory strip in spotlight */
|
| 184 |
+
.spot-traj{margin-top:6px;}
|
| 185 |
+
.spot-traj .label{font-family:var(--mono);font-size:9px;letter-spacing:.2em;color:var(--text-dim);text-transform:uppercase;margin-bottom:10px;}
|
| 186 |
+
.straj{display:grid;grid-template-columns:repeat(6,1fr);gap:6px;}
|
| 187 |
+
.scell{text-align:center;border:1px solid var(--border-strong);border-radius:9px;padding:9px 4px 7px 4px;background:#050505;}
|
| 188 |
+
.scell.c{border-color:rgba(31,209,96,.45);} .scell.w{border-color:rgba(255,42,42,.45);}
|
| 189 |
+
.scell.cur{box-shadow:0 0 0 2px var(--gold);}
|
| 190 |
+
.scell .mk{font-family:var(--mono);font-size:15px;font-weight:700;line-height:1;}
|
| 191 |
+
.scell.c .mk{color:var(--green);} .scell.w .mk{color:var(--red);}
|
| 192 |
+
.scell .sn{font-family:var(--mono);font-size:8px;letter-spacing:.04em;color:var(--text-dim);margin-top:5px;text-transform:uppercase;}
|
| 193 |
+
.scell .sp{font-family:var(--mono);font-size:8px;color:var(--text-dim);margin-top:2px;}
|
| 194 |
+
.trust-row{display:flex;gap:18px;margin-top:8px;}
|
| 195 |
+
.trust-row .ti{font-family:var(--mono);font-size:10px;letter-spacing:.06em;color:var(--text-dim);text-transform:uppercase;}
|
| 196 |
+
.trust-row .ti b{color:var(--text-mute);}
|
| 197 |
+
|
| 198 |
+
/* ---------- about ---------- */
|
| 199 |
+
.foot{margin-top:24px;padding:22px 24px;border:1px solid var(--border);border-radius:12px;background:var(--bg-card);}
|
| 200 |
+
.foot .ftitle{font-family:var(--mono);font-size:10px;letter-spacing:.2em;color:var(--text-dim);text-transform:uppercase;margin-bottom:10px;}
|
| 201 |
+
.foot .body{font-family:var(--sans);font-size:13px;color:var(--text-mute);line-height:1.62;}
|
| 202 |
+
.foot code{font-family:var(--mono);font-size:12px;color:var(--text);background:var(--bg-elev);padding:1px 5px;border-radius:4px;border:1px solid var(--border);}
|
| 203 |
+
.attrib{margin-top:14px;padding-top:14px;border-top:1px solid var(--border);
|
| 204 |
+
font-family:var(--mono);font-size:11px;letter-spacing:.04em;color:var(--text-dim);line-height:1.7;}
|
| 205 |
+
|
| 206 |
+
.databanner{border:1px solid rgba(255,42,42,.5);background:rgba(255,42,42,.06);color:var(--red);
|
| 207 |
+
font-family:var(--mono);font-size:12px;letter-spacing:.04em;padding:14px 16px;border-radius:10px;margin:4px 0 16px 0;line-height:1.5;}
|
| 208 |
+
|
| 209 |
+
@media(max-width:860px){
|
| 210 |
+
.exhibit-grid{grid-template-columns:1fr;}
|
| 211 |
+
.head-meta,.badges{grid-template-columns:repeat(2,1fr);}
|
| 212 |
+
.head-top{flex-direction:column;} .head-right{text-align:left;}
|
| 213 |
+
}
|
| 214 |
+
@media(prefers-reduced-motion:reduce){ *{transition:none!important;animation:none!important;} }
|
| 215 |
+
</style>
|
| 216 |
+
</head>
|
| 217 |
+
<body>
|
| 218 |
+
<main class="wrap">
|
| 219 |
+
|
| 220 |
+
<!-- HEADER -->
|
| 221 |
+
<header class="head">
|
| 222 |
+
<div class="head-top">
|
| 223 |
+
<div class="head-brand">
|
| 224 |
+
<div class="head-icon">
|
| 225 |
+
<svg width="30" height="30" viewBox="0 0 30 30" fill="none" aria-hidden="true">
|
| 226 |
+
<path d="M15 3.5 L24.5 7 V14 C24.5 20.5 20.3 24.5 15 26.5 C9.7 24.5 5.5 20.5 5.5 14 V7 Z"
|
| 227 |
+
stroke="#2A2A2A" stroke-width="1.3" fill="none"/>
|
| 228 |
+
<polyline points="9.5,18 12.5,15.5 16,12.5 20.5,9.5" stroke="#1FD160" stroke-width="1.7" fill="none"/>
|
| 229 |
+
<circle cx="20.5" cy="9.5" r="1.5" fill="#D4AF37"/>
|
| 230 |
+
</svg>
|
| 231 |
+
</div>
|
| 232 |
+
<div>
|
| 233 |
+
<div class="head-title">External Grounding</div>
|
| 234 |
+
<div class="head-subtitle">Raising self-correction from 50% to 100% under a noisy notebook</div>
|
| 235 |
+
</div>
|
| 236 |
+
</div>
|
| 237 |
+
<div class="head-right">
|
| 238 |
+
<span class="submitted-label">Submitted by</span>
|
| 239 |
+
<div class="submitted-name" id="author">Serghei Brinza</div>
|
| 240 |
+
<div class="status-pill"><span class="status-dot"></span>Static demo · no live model</div>
|
| 241 |
+
</div>
|
| 242 |
+
</div>
|
| 243 |
+
<div class="head-tag">★ Second Loop · Part 2 of 3 ★</div>
|
| 244 |
+
<div class="head-meta">
|
| 245 |
+
<div class="item"><span class="k">Subject model</span><span class="v">Qwen2.5-3B-Instruct (frozen)</span></div>
|
| 246 |
+
<div class="item"><span class="k">Arbiter v1</span><span class="v">Qwen2.5-7B (same-family clone)</span></div>
|
| 247 |
+
<div class="item"><span class="k">Arbiter v2</span><span class="v">Wikipedia + 7B reader</span></div>
|
| 248 |
+
<div class="item"><span class="k">License</span><span class="v" id="m-license">MIT</span></div>
|
| 249 |
+
</div>
|
| 250 |
+
</header>
|
| 251 |
+
|
| 252 |
+
<!-- THREE HONEST BADGES -->
|
| 253 |
+
<section class="badges">
|
| 254 |
+
<div class="badge red"><div class="label">No defense — raw 3B</div><div class="value" id="b-sick">50.0%</div><div class="sub" id="b-sick-sub">6 / 12 on the noisy notebook</div></div>
|
| 255 |
+
<div class="badge orange"><div class="label">Clone-arbiter ceiling</div><div class="value" id="b-ceil">66.7%</div><div class="sub">1.0, 2.0 and 2.1 all stop here</div></div>
|
| 256 |
+
<div class="badge green"><div class="label">Calibrated — Guardian 2.3</div><div class="value" id="b-final">100%</div><div class="sub" id="b-final-sub">12 / 12, external grounding</div></div>
|
| 257 |
+
</section>
|
| 258 |
+
|
| 259 |
+
<!-- EXHIBIT: the museum lever -->
|
| 260 |
+
<section class="exhibit">
|
| 261 |
+
<div class="exhibit-title">Drag the guardian from sick to calibrated.</div>
|
| 262 |
+
<div class="exhibit-lede">Move the lever through six guardian versions and watch corrected answers climb 50% → 100%</div>
|
| 263 |
+
|
| 264 |
+
<div class="exhibit-grid">
|
| 265 |
+
<div id="chart"></div>
|
| 266 |
+
<div class="readout" id="readout">
|
| 267 |
+
<div class="ro-label">Current guardian stage</div>
|
| 268 |
+
<div class="ro-name" id="ro-name">Sick (no defense)</div>
|
| 269 |
+
<div class="ro-stageno" id="ro-stageno">stage 1 of 6</div>
|
| 270 |
+
<div class="ro-pct-label">Corrected answers · higher is better</div>
|
| 271 |
+
<div class="ro-pct" id="ro-pct">50.0%</div>
|
| 272 |
+
<div class="ro-count" id="ro-count">6 / 12 traps correct</div>
|
| 273 |
+
<div class="ro-tag" id="ro-tag">No external grounding</div>
|
| 274 |
+
<div class="ro-aux">
|
| 275 |
+
<div class="row"><span class="rk">vs previous stage</span><span class="rv mute" id="ro-churn">start</span></div>
|
| 276 |
+
<div class="row"><span class="rk">net change</span><span class="rv mute" id="ro-net">—</span></div>
|
| 277 |
+
</div>
|
| 278 |
+
</div>
|
| 279 |
+
</div>
|
| 280 |
+
|
| 281 |
+
<div class="lever">
|
| 282 |
+
<div class="lever-head">
|
| 283 |
+
<span class="t">The lever — guardian version</span>
|
| 284 |
+
<span class="h">← weaker · stronger →</span>
|
| 285 |
+
</div>
|
| 286 |
+
<div class="slider-area">
|
| 287 |
+
<input id="slider" class="lever-input" type="range" min="0" max="5" step="1" value="0"
|
| 288 |
+
aria-label="Guardian version" />
|
| 289 |
+
<div class="ticks" id="ticks"></div>
|
| 290 |
+
<div class="ticklabels" id="ticklabels"></div>
|
| 291 |
+
</div>
|
| 292 |
+
<div class="lever-foot">
|
| 293 |
+
<span>No defense</span>
|
| 294 |
+
<span>66.7% plateau</span>
|
| 295 |
+
<span>Calibrated</span>
|
| 296 |
+
</div>
|
| 297 |
+
</div>
|
| 298 |
+
</section>
|
| 299 |
+
|
| 300 |
+
<!-- TRAP GRID -->
|
| 301 |
+
<section class="grid-wrap">
|
| 302 |
+
<div class="grid-cap">
|
| 303 |
+
<span class="gc-t">All 12 traps — colour = verdict at the current stage · dots = full trajectory · click to inspect</span>
|
| 304 |
+
<span class="gc-n" id="grid-n">Stage: <b id="grid-n-v">6 / 12 correct</b></span>
|
| 305 |
+
</div>
|
| 306 |
+
<div class="chips" id="chips"></div>
|
| 307 |
+
<div class="grid12" id="grid12"></div>
|
| 308 |
+
</section>
|
| 309 |
+
|
| 310 |
+
<!-- SPOTLIGHT -->
|
| 311 |
+
<section class="spot" id="spot">
|
| 312 |
+
<div class="spot-top">
|
| 313 |
+
<span class="spot-id" id="sp-id">#46</span>
|
| 314 |
+
<span class="spot-cat" id="sp-cat">science-number</span>
|
| 315 |
+
</div>
|
| 316 |
+
<div class="spot-q" id="sp-q">—</div>
|
| 317 |
+
<div class="spot-cols">
|
| 318 |
+
<div>
|
| 319 |
+
<div class="row-block"><div class="label">Correct answer</div><div class="value correct" id="sp-correct">—</div></div>
|
| 320 |
+
<div class="row-block"><div class="label">Memorized wrong answer</div><div class="value wrong" id="sp-wrong">—</div></div>
|
| 321 |
+
<div class="row-block"><div class="label">Final answer · Guardian 2.3</div><div class="value final" id="sp-final">—</div></div>
|
| 322 |
+
<div class="row-block"><div class="label">Wikipedia retrieval — top sources</div><div class="value meta" id="sp-wiki">—</div></div>
|
| 323 |
+
<div class="trust-row">
|
| 324 |
+
<span class="ti">trust v2.2 <b id="sp-t22">—</b></span>
|
| 325 |
+
<span class="ti">trust v2.3 <b id="sp-t23">—</b></span>
|
| 326 |
+
</div>
|
| 327 |
+
</div>
|
| 328 |
+
<div class="spot-traj">
|
| 329 |
+
<div class="label">Verdict across the six guardian stages</div>
|
| 330 |
+
<div class="straj" id="sp-traj"></div>
|
| 331 |
+
</div>
|
| 332 |
+
</div>
|
| 333 |
+
</section>
|
| 334 |
+
|
| 335 |
+
<!-- ABOUT -->
|
| 336 |
+
<footer class="foot">
|
| 337 |
+
<div class="ftitle">About this demo</div>
|
| 338 |
+
<div class="body">
|
| 339 |
+
The demo uses twelve questions on which a frozen <b>Qwen2.5-3B-Instruct</b> has confidently memorized a
|
| 340 |
+
<i>wrong</i> answer. They are run through a correction notebook whose external entries are
|
| 341 |
+
<b>noisy</b>: some are verified facts, others are unreliable look-alikes. The lever steps through six
|
| 342 |
+
guardian versions, each deciding what the notebook is allowed to absorb, and the score is the
|
| 343 |
+
share of the twelve answered correctly. With no guardian the model sits at
|
| 344 |
+
<b id="t-sick">50%</b> (6 / 12).
|
| 345 |
+
<br/><br/>
|
| 346 |
+
The honest part is the middle. Guardian 1.0 uses a <b>same-family clone</b> as arbiter — it
|
| 347 |
+
shares the subject's blind spots, so it caps at <b id="t-ceil">66.7%</b>. Guardian 2.0
|
| 348 |
+
(live Wikipedia retrieval) and Guardian 2.1 (more retrieval) <i>also</i> land on 66.7%:
|
| 349 |
+
three different attempts, one ceiling. And that plateau is not stagnation — under the hood each
|
| 350 |
+
step fixes some traps while breaking others (move the lever and read “+fixed / −broken”).
|
| 351 |
+
The plateau breaks only at Guardian 2.2 (three targeted fixes: verbatim-quote check, namesake relevance gate, soft
|
| 352 |
+
threshold), which reaches <b id="t-22">91.7%</b>; Guardian 2.3 (final calibration) then reaches
|
| 353 |
+
<b id="t-23">100%</b>. Several traps regress along the way. Venus (#46) starts correct, is broken, then fixed,
|
| 354 |
+
then broken again and left wrong for one more stage before the final fix holds: <code>C → X → C → X → X → C</code>.
|
| 355 |
+
<br/><br/>
|
| 356 |
+
No live model runs in this Space. Every verdict, percentage and retrieval source is taken unchanged
|
| 357 |
+
from the original experimental run and bundled into <code>data.json</code>. The v2 stages were
|
| 358 |
+
adjudicated by an independent Qwen2.5-7B reader/judge working from Wikipedia. Full code, raw per-stage
|
| 359 |
+
results and methodology:
|
| 360 |
+
<a id="repo-link" href="https://github.com/SergheiBrinza/external-grounding" target="_blank" rel="noopener">github.com/SergheiBrinza/external-grounding</a>.
|
| 361 |
+
</div>
|
| 362 |
+
<div class="attrib">
|
| 363 |
+
Subject model Qwen2.5-3B-Instruct · arbiters Qwen2.5-7B-Instruct (same-family clone) and
|
| 364 |
+
Wikipedia retrieval + 7B reader/judge. Both models come from Alibaba Cloud (7B: Apache-2.0; 3B: Qwen Research License). Wikipedia content
|
| 365 |
+
© its authors, CC BY-SA. Run on a single RTX 3090. No model weights are redistributed here —
|
| 366 |
+
only aggregate verdicts and counts. Demo code & data: MIT.
|
| 367 |
+
</div>
|
| 368 |
+
</footer>
|
| 369 |
+
|
| 370 |
+
</main>
|
| 371 |
+
|
| 372 |
+
<script>
|
| 373 |
+
const COL={card:'#0A0A0A',border:'#1F1F1F',borderS:'#2A2A2A',text:'#FFFFFF',mute:'#A8A8A8',dim:'#6B6B6B',
|
| 374 |
+
gold:'#D4AF37',red:'#FF2A2A',green:'#1FD160',orange:'#FF8C42',yellow:'#E8C84A'};
|
| 375 |
+
const COLOR={red:COL.red,orange:COL.orange,yellow:COL.yellow,green:COL.green};
|
| 376 |
+
const $=id=>document.getElementById(id);
|
| 377 |
+
const f1=v=>v.toFixed(1);
|
| 378 |
+
|
| 379 |
+
let META={}, STAGES=[], TRAPS=[], BYID={}, SHOWCASE=[], COUNTS=[], cur=0, selTrap=46;
|
| 380 |
+
const SHORT={sick:'sick',gk1:'1.0',gk2:'2.0',gk21:'2.1',gk22:'2.2',gk23:'2.3'};
|
| 381 |
+
const PLOT_MARGIN={l:40,r:16,t:16,b:34};
|
| 382 |
+
|
| 383 |
+
fetch('data.json').then(r=>{if(!r.ok) throw new Error('HTTP '+r.status); return r.json();}).then(D=>{
|
| 384 |
+
META=D.meta||{}; STAGES=D.stages||[]; TRAPS=D.traps||[];
|
| 385 |
+
TRAPS.forEach(t=>BYID[t.id]=t);
|
| 386 |
+
SHOWCASE=(D.showcase&&D.showcase.length)?D.showcase:TRAPS.slice(0,6).map(t=>t.id);
|
| 387 |
+
COUNTS=STAGES.map(s=>TRAPS.filter(t=>t.stages[s.key]==='correct').length);
|
| 388 |
+
|
| 389 |
+
if(META.author) $('author').textContent=META.author;
|
| 390 |
+
if(META.license) $('m-license').textContent=String(META.license).toUpperCase();
|
| 391 |
+
const n=TRAPS.length;
|
| 392 |
+
$('b-sick').textContent=f1(STAGES[0].pct)+'%'; $('b-sick-sub').textContent=COUNTS[0]+' / '+n+' on the noisy notebook';
|
| 393 |
+
$('b-ceil').textContent=f1(STAGES[1].pct)+'%';
|
| 394 |
+
$('b-final').textContent=(STAGES[5].pct%1===0?STAGES[5].pct.toFixed(0):f1(STAGES[5].pct))+'%';
|
| 395 |
+
$('b-final-sub').textContent=COUNTS[5]+' / '+n+', external grounding';
|
| 396 |
+
$('t-sick').textContent=f1(STAGES[0].pct)+'%';
|
| 397 |
+
$('t-ceil').textContent=f1(STAGES[1].pct)+'%';
|
| 398 |
+
$('t-22').textContent=f1(STAGES[4].pct)+'%';
|
| 399 |
+
$('t-23').textContent=(STAGES[5].pct%1===0?STAGES[5].pct.toFixed(0):f1(STAGES[5].pct))+'%';
|
| 400 |
+
if(META.repo){const rl=$('repo-link');rl.href=META.repo;rl.textContent=META.repo.replace(/^https?:\/\//,'');}
|
| 401 |
+
|
| 402 |
+
buildTicks(); buildChips(); buildGrid(); drawChart();
|
| 403 |
+
|
| 404 |
+
// deep-link: ?stage=0..5 & trap=<id>
|
| 405 |
+
const q=new URLSearchParams(location.search);
|
| 406 |
+
const st=parseInt(q.get('stage'),10); const start=(Number.isFinite(st)&&st>=0&&st<=5)?st:0;
|
| 407 |
+
const tp=parseInt(q.get('trap'),10); if(BYID[tp]) selTrap=tp; else if(!BYID[selTrap]) selTrap=SHOWCASE[0];
|
| 408 |
+
$('slider').value=start;
|
| 409 |
+
$('slider').addEventListener('input',e=>setStage(+e.target.value));
|
| 410 |
+
renderSpot(); setStage(start);
|
| 411 |
+
}).catch(err=>{
|
| 412 |
+
console.error('external-grounding: could not load data.json —',err);
|
| 413 |
+
const ex=document.querySelector('.exhibit');
|
| 414 |
+
if(ex){const b=document.createElement('div');b.className='databanner';
|
| 415 |
+
b.textContent='Data unavailable — could not load data.json ('+err+'). This static demo needs data.json served alongside the page.';
|
| 416 |
+
ex.insertBefore(b,ex.firstChild);}
|
| 417 |
+
});
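The deep-link branch inside the loader (`?stage=0..5 & trap=<id>`) is easiest to check in isolation. Below is a minimal sketch that lifts it into a pure function under the same rules: a stage outside 0..5 falls back to 0, and an unknown trap id keeps the default (#46) if that trap exists, else the first showcase id. `parseDeepLink` and its option names are illustrative, not part of the page.

```javascript
// Pure version of the "?stage=0..5 & trap=<id>" handling in the loader above.
// Assumptions: stage indices run 0..maxStage and trap ids are integers.
function parseDeepLink(search, { maxStage = 5, trapIds = [], preferredTrap = 46, fallbackTrap } = {}) {
  const q = new URLSearchParams(search);
  const st = parseInt(q.get("stage"), 10); // NaN when the parameter is absent
  const stage = Number.isFinite(st) && st >= 0 && st <= maxStage ? st : 0;

  const known = new Set(trapIds);
  const tp = parseInt(q.get("trap"), 10);
  let trap = preferredTrap;
  if (known.has(tp)) trap = tp;                   // an explicit, valid trap id wins
  else if (!known.has(trap)) trap = fallbackTrap; // default missing from the data: use the fallback
  return { stage, trap };
}

console.log(parseDeepLink("?stage=4&trap=27", { trapIds: [46, 27, 16] })); // { stage: 4, trap: 27 }
```

Assuming `STAGES` follows the same order as the `SHORT` map, `?stage=4&trap=27` would open Guardian 2.2 with the Darth Vader trap (#27) in the spotlight.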
|
| 418 |
+
|
| 419 |
+
function correctAt(key){return TRAPS.filter(t=>t.stages[key]==='correct').length;}
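`correctAt` and `COUNTS` count traps, and the stage percentages read from `data.json` line up with whole-trap counts over twelve: 100 × 6 / 12 = 50.0, 100 × 8 / 12 ≈ 66.7, 100 × 11 / 12 ≈ 91.7 and 100 × 12 / 12 = 100. So the plateau is 8 / 12 and Guardian 2.2 gets 11 / 12. The loader prints one decimal everywhere except the final badge and the matching footer value, which drop it for whole numbers. Here is a sketch of that arithmetic and formatting, assuming n = 12 as in the bundled data; `pctCorrect` and `formatPct` are illustrative names.

```javascript
// Share of traps answered correctly, in percent.
function pctCorrect(correct, total) {
  if (total <= 0) throw new RangeError("total must be positive");
  return (100 * correct) / total;
}

// One decimal by default; trimWhole drops ".0" for whole values, as the final badge does.
function formatPct(pct, { trimWhole = false } = {}) {
  return (trimWhole && pct % 1 === 0 ? pct.toFixed(0) : pct.toFixed(1)) + "%";
}

for (const k of [6, 8, 11, 12]) {
  console.log(`${k} / 12 -> ${formatPct(pctCorrect(k, 12))}`);
}
// prints one line each: 50.0%, 66.7%, 91.7%, 100.0%
```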
|
| 420 |
+
function churn(i){
|
| 421 |
+
if(i<=0) return null;
|
| 422 |
+
const a=STAGES[i-1].key, b=STAGES[i].key; let fixed=0,broken=0;
|
| 423 |
+
TRAPS.forEach(t=>{const pa=t.stages[a]==='correct',pb=t.stages[b]==='correct';
|
| 424 |
+
if(!pa&&pb)fixed++; if(pa&&!pb)broken++;});
|
| 425 |
+
return {fixed,broken,net:fixed-broken};
|
| 426 |
+
}
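The plateau argument in the footer rests on `churn(i)`: two consecutive stages can share a score while trading traps. Below is a minimal standalone sketch of the same fixed/broken count, together with the C/X strip the footer quotes. It assumes each trap has a `stages` map from stage key to `'correct'` or `'wrong'`, as `TRAPS` does, and that the stage keys run in the order of the `SHORT` map. The only data used is the Venus (#46) trajectory quoted in the footer. `churnBetween` and `trajectory` are illustrative names.

```javascript
// Stage keys in lever order; assumed to match the order of STAGES in data.json.
const STAGE_KEYS = ["sick", "gk1", "gk2", "gk21", "gk22", "gk23"];

// Same counting rule as churn(i), with inputs passed in instead of read from globals.
function churnBetween(traps, prevKey, nextKey) {
  let fixed = 0;
  let broken = 0;
  for (const t of traps) {
    const was = t.stages[prevKey] === "correct";
    const now = t.stages[nextKey] === "correct";
    if (!was && now) fixed++;  // wrong before, correct now
    if (was && !now) broken++; // correct before, wrong now
  }
  return { fixed, broken, net: fixed - broken };
}

// One letter per stage: C = correct, X = anything else.
function trajectory(trap, keys = STAGE_KEYS) {
  return keys.map(k => (trap.stages[k] === "correct" ? "C" : "X")).join(" → ");
}

// Venus (#46): verdicts copied from the trajectory quoted in the footer.
const venus = {
  id: 46,
  stages: { sick: "correct", gk1: "wrong", gk2: "correct", gk21: "wrong", gk22: "wrong", gk23: "correct" },
};

console.log(trajectory(venus)); // C → X → C → X → X → C
for (let i = 1; i < STAGE_KEYS.length; i++) {
  const c = churnBetween([venus], STAGE_KEYS[i - 1], STAGE_KEYS[i]);
  console.log(`${STAGE_KEYS[i - 1]} -> ${STAGE_KEYS[i]}: +${c.fixed} fixed / -${c.broken} broken`);
}
```

Passing the full `TRAPS` array instead of `[venus]` should reproduce the `churn(i)` numbers. A net of 0 with non-zero fixed and broken counts is the "plateau that is not stagnation" the footer describes.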
|
| 427 |
+
|
| 428 |
+
function buildTicks(){
|
| 429 |
+
const T=$('ticks'), L=$('ticklabels'); T.innerHTML=''; L.innerHTML='';
|
| 430 |
+
STAGES.forEach((s,i)=>{
|
| 431 |
+
const pct=i/(STAGES.length-1)*100;
|
| 432 |
+
const t=document.createElement('div'); t.className='tick'; t.dataset.i=i; t.style.left=pct+'%'; T.appendChild(t);
|
| 433 |
+
const l=document.createElement('div'); l.className='tlabel'; l.dataset.i=i;
|
| 434 |
+
l.innerHTML=(SHORT[s.key]||s.key)+'<span class="pct">'+f1(s.pct)+'%</span>';
|
| 435 |
+
l.style.left=pct+'%';
|
| 436 |
+
if(i===0){l.style.transform='translateX(0)';l.style.textAlign='left';}
|
| 437 |
+
else if(i===STAGES.length-1){l.style.transform='translateX(-100%)';l.style.textAlign='right';}
|
| 438 |
+
else{l.style.transform='translateX(-50%)';}
|
| 439 |
+
l.tabIndex=0;
|
| 440 |
+
l.addEventListener('click',()=>{$('slider').value=i;setStage(i);});
|
| 441 |
+
l.addEventListener('keydown',e=>{if(e.key==='Enter'||e.key===' '){e.preventDefault();$('slider').value=i;setStage(i);}});
|
| 442 |
+
L.appendChild(l);
|
| 443 |
+
});
|
| 444 |
+
}
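`buildTicks()` places each tick at a plain percentage of the track. A native range thumb, however, usually stays fully inside the track, so its centre only travels from half a thumb-width at one end to half a thumb-width short of the other. With the 26px thumb in the stylesheet, the end ticks can sit up to 13px away from the thumb. The same geometry explains the WebKit `margin-top:-10px`: (6 − 26) / 2 = −10. Below is a small sketch of that arithmetic. It assumes typical Chromium/Firefox thumb behaviour, and the helper names are illustrative.

```javascript
// Default thumb diameter, matching width:26px on the lever's thumb in the CSS above.
const THUMB_PX = 26;

// Thumb centre for stop i, assuming the whole thumb stays inside the track
// (centre travels from thumb/2 to width - thumb/2).
function thumbCenterPx(i, stops, trackWidthPx, thumbPx = THUMB_PX) {
  const frac = stops > 1 ? i / (stops - 1) : 0;
  return thumbPx / 2 + frac * (trackWidthPx - thumbPx);
}

// Where buildTicks() puts tick i: a plain percentage of the track width.
function plainTickPx(i, stops, trackWidthPx) {
  return stops > 1 ? (i / (stops - 1)) * trackWidthPx : 0;
}

// A CSS `left` value that lands on the thumb centre at any track width.
function thumbAlignedLeft(i, stops, thumbPx = THUMB_PX) {
  const pct = stops > 1 ? (i / (stops - 1)) * 100 : 0;
  const offsetPx = thumbPx / 2 - (pct / 100) * thumbPx;
  return `calc(${pct}% + ${offsetPx}px)`;
}

// WebKit centring of a tall thumb on a thin track: (track - thumb) / 2.
function webkitThumbMarginTop(trackPx, thumbPx = THUMB_PX) {
  return (trackPx - thumbPx) / 2;
}
```

On a hypothetical 1000px track with six stops, the last thumb centre is at 987px while the plain tick sits at 1000px. Using `thumbAlignedLeft(i, STAGES.length)` for `t.style.left` is one way to close that gap. The edge transforms on the first and last labels would need rechecking afterwards.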
|
| 445 |
+
|
| 446 |
+
function buildChips(){
|
| 447 |
+
const c=$('chips'); c.innerHTML='';
|
| 448 |
+
SHOWCASE.forEach(id=>{const t=BYID[id]; if(!t)return;
|
| 449 |
+
const el=document.createElement('div'); el.className='chip'; el.dataset.id=id; el.tabIndex=0;
|
| 450 |
+
el.textContent=chipLabel(t);
|
| 451 |
+
el.addEventListener('click',()=>{selTrap=id;renderSpot();updateGridSel();scrollSpot();});
|
| 452 |
+
el.addEventListener('keydown',e=>{if(e.key==='Enter'||e.key===' '){e.preventDefault();selTrap=id;renderSpot();updateGridSel();scrollSpot();}});
|
| 453 |
+
c.appendChild(el);
|
| 454 |
+
});
|
| 455 |
+
}
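Tick labels, chips and trap cards each register a click handler plus a keydown handler for Enter/Space with an identical body. The hypothetical helper below binds both at once. It also adds `role="button"`, which these focusable divs otherwise lack for assistive technology. It is a refactoring sketch, not code the page currently contains.

```javascript
// Hypothetical helper: one action for mouse click and for Enter/Space on a focused element.
function onActivate(el, action) {
  el.tabIndex = 0;                   // reachable with Tab, as the page already does
  el.setAttribute("role", "button"); // announced as a button by assistive tech
  el.addEventListener("click", action);
  el.addEventListener("keydown", e => {
    if (e.key === "Enter" || e.key === " ") {
      e.preventDefault(); // keep Space from scrolling the page
      action(e);
    }
  });
}
```

In `buildChips()`, the two listeners would then collapse to `onActivate(el, () => { selTrap = id; renderSpot(); updateGridSel(); scrollSpot(); });`.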
|
| 456 |
+
function chipLabel(t){
|
| 457 |
+
const m={46:'Venus day & year',27:'Darth Vader',16:'Tongue map',34:'First to circle globe',28:'Magic mirror',20:'Tallest mountain'};
|
| 458 |
+
return m[t.id]||('#'+t.id);
|
| 459 |
+
}
|
| 460 |
+
|
| 461 |
+
function buildGrid(){
|
| 462 |
+
const g=$('grid12'); g.innerHTML='';
|
| 463 |
+
TRAPS.forEach(t=>{
|
| 464 |
+
const el=document.createElement('div'); el.className='gcard'; el.dataset.id=t.id; el.tabIndex=0;
|
| 465 |
+
const dots=STAGES.map((s,i)=>{const v=t.stages[s.key]==='correct';
|
| 466 |
+
return '<span class="tdot '+(v?'c':'w')+'" data-i="'+i+'" title="'+(SHORT[s.key]||s.key)+': '+(v?'correct':'wrong')+'"></span>';}).join('');
|
| 467 |
+
const axis=STAGES.map(s=>'<span>'+(SHORT[s.key]||s.key)+'</span>').join('');
|
| 468 |
+
el.innerHTML=
|
| 469 |
+
'<div class="gc-head"><span class="gc-id">#'+String(t.id).padStart(2,'0')+'</span>'+
|
| 470 |
+
'<span class="gc-cat">'+esc(t.category)+'</span>'+
|
| 471 |
+
'<span class="gc-now" data-now></span></div>'+
|
| 472 |
+
'<div class="gc-q">'+esc(t.question)+'</div>'+
|
| 473 |
+
'<div class="traj">'+dots+'</div>'+
|
| 474 |
+
'<div class="traj-axis">'+axis+'</div>';
|
| 475 |
+
el.addEventListener('click',()=>{selTrap=t.id;renderSpot();updateGridSel();scrollSpot();});
|
| 476 |
+
el.addEventListener('keydown',e=>{if(e.key==='Enter'||e.key===' '){e.preventDefault();selTrap=t.id;renderSpot();updateGridSel();scrollSpot();}});
|
| 477 |
+
g.appendChild(el);
|
| 478 |
+
});
|
| 479 |
+
}
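`buildGrid()` passes `t.category` and `t.question` through `esc()` before writing `innerHTML`. `esc()` is presumably defined elsewhere in the script. For readers checking that path, below is a minimal escaper of the usual kind. It is illustrative (the name `escHtml` is made up), and the page's own `esc()` may differ in detail.

```javascript
// Hypothetical stand-in for esc(): escape characters that are significant in HTML
// text and in quoted attribute values.
function escHtml(value) {
  return String(value ?? "")
    .replace(/&/g, "&amp;") // first, so the entities added below are not escaped again
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

console.log(escHtml('a < b & "c"')); // a &lt; b &amp; &quot;c&quot;
```

Escaping `&` first matters. Otherwise the `&lt;` produced for `<` would be rewritten to `&amp;lt;` and show up literally on the page.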
|
| 480 |
+
function scrollSpot(){const s=$('spot');if(!s)return;const reduce=window.matchMedia('(prefers-reduced-motion: reduce)').matches;s.scrollIntoView({behavior:reduce?'auto':'smooth',block:'nearest'});}
|
| 481 |
+
function updateGridSel(){
|
| 482 |
+
document.querySelectorAll('.gcard').forEach(c=>c.classList.toggle('sel',+c.dataset.id===selTrap));
|
| 483 |
+
document.querySelectorAll('.chip').forEach(c=>c.classList.toggle('active',+c.dataset.id===selTrap));
|
| 484 |
+
}
|
| 485 |
+
|
| 486 |
+
/* ---------- chart ---------- */
|
| 487 |
+
function drawChart(){
|
| 488 |
+
const idx=STAGES.map((_,i)=>i);
|
| 489 |
+
const y=STAGES.map(s=>s.pct);
|
| 490 |
+
const colors=STAGES.map(s=>COLOR[s.color]||COL.gold);
|
| 491 |
+
const text=STAGES.map(s=>f1(s.pct)+'%');
|
| 492 |
+
const lw=STAGES.map((_,i)=>i===cur?2.5:0);
|
| 493 |
+
const trace={x:idx,y:y,type:'bar',marker:{color:colors,line:{color:'#FFFFFF',width:lw}},
|
| 494 |
+
text:text,textposition:'outside',textfont:{family:"'JetBrains Mono',monospace",size:11,color:COL.mute},
|
| 495 |
+
cliponaxis:false,width:0.62,hovertemplate:'%{customdata}<br>%{y:.1f}%<extra></extra>',
|
| 496 |
+
customdata:STAGES.map(s=>s.label)};
|
| 497 |
+
Plotly.newPlot('chart',[trace],baseLayout(),{displayModeBar:false,responsive:true});
|
| 498 |
+
}
|
| 499 |
+
function baseLayout(){
|
| 500 |
+
return {
|
| 501 |
+
paper_bgcolor:COL.card, plot_bgcolor:COL.card, margin:PLOT_MARGIN, height:430,
|
| 502 |
+
font:{family:"'JetBrains Mono',monospace",color:COL.mute}, showlegend:false, bargap:0.38,
|
| 503 |
+
hoverlabel:{bgcolor:'#141414',bordercolor:COL.borderS,font:{family:"'JetBrains Mono',monospace",color:COL.text,size:12}},
|
| 504 |
+
xaxis:{tickmode:'array',tickvals:STAGES.map((_,i)=>i),ticktext:STAGES.map(s=>SHORT[s.key]||s.key),
|
| 505 |
+
tickfont:{family:"'JetBrains Mono',monospace",size:11,color:COL.dim},
|
| 506 |
+
gridcolor:COL.border,zeroline:false,range:[-0.5,STAGES.length-0.5],fixedrange:true},
|
| 507 |
+
yaxis:{title:{text:'% corrected (higher = better)',font:{size:11,color:COL.mute}},
|
| 508 |
+
tickfont:{family:"'JetBrains Mono',monospace",size:10,color:COL.dim},ticksuffix:'%',
|
| 509 |
+
gridcolor:COL.border,zeroline:false,range:[0,112],fixedrange:true},
|
| 510 |
+
shapes:shapesFor(cur), annotations:annsFor(cur)
|
| 511 |
+
};
|
| 512 |
+
}
|
| 513 |
+
function shapesFor(i){
|
| 514 |
+
const ceil=STAGES[1].pct;
|
| 515 |
+
return [
|
| 516 |
+
{type:'line',xref:'paper',x0:0,x1:1,yref:'y',y0:ceil,y1:ceil,line:{color:COL.orange,width:1,dash:'dot'},layer:'below'},
|
| 517 |
+
{type:'line',xref:'paper',x0:0,x1:1,yref:'y',y0:100,y1:100,line:{color:COL.green,width:1,dash:'dot'},layer:'below'},
|
| 518 |
+
{type:'line',xref:'x',x0:i,x1:i,yref:'paper',y0:0,y1:1,line:{color:'rgba(255,255,255,0.14)',width:1},layer:'below'}
|
| 519 |
+
];
|
| 520 |
+
}
|
| 521 |
+
function annsFor(i){
|
| 522 |
+
const ceil=STAGES[1].pct;
|
| 523 |
+
return [{xref:'paper',yref:'y',x:0.014,y:ceil,xanchor:'left',yanchor:'bottom',
|
| 524 |
+
text:'clone-arbiter ceiling '+f1(ceil)+'%',showarrow:false,
|
| 525 |
+
font:{family:"'JetBrains Mono',monospace",size:9.5,color:COL.orange}}];
|
| 526 |
+
}
|
| 527 |
+
|
| 528 |
+
/* ---------- state ---------- */
|
| 529 |
+
function setStage(i){
|
| 530 |
+
cur=i; const s=STAGES[i], n=TRAPS.length, c=COUNTS[i], col=COLOR[s.color]||COL.gold;
|
| 531 |
+
Plotly.restyle('chart',{'marker.line.width':[STAGES.map((_,k)=>k===i?2.5:0)]},[0]);
|
| 532 |
+
Plotly.relayout('chart',{shapes:shapesFor(i),annotations:annsFor(i)});
|
| 533 |
+
// readout
|
| 534 |
+
$('ro-name').textContent=s.label;
|
| 535 |
+
$('ro-stageno').textContent='stage '+(i+1)+' of '+STAGES.length;
|
| 536 |
+
$('ro-pct').textContent=f1(s.pct)+'%'; $('ro-pct').style.color=col;
|
| 537 |
+
$('ro-count').textContent=c+' / '+n+' traps correct';
|
| 538 |
+
// tag
|
| 539 |
+
const tag=$('ro-tag');
|
| 540 |
+
let tt='', tc='';
|
| 541 |
+
if(i===0){tt='No external grounding';tc='red';}
|
| 542 |
+
else if(s.pct===STAGES[1].pct){tt='Plateau · '+f1(s.pct)+'% ceiling';tc='orange';}
|
| 543 |
+
else if(i===STAGES.length-1){tt='Calibrated · '+f1(s.pct)+'%';tc='green';}
|
| 544 |
+
else {tt='Breakthrough';tc='green';}
|
| 545 |
+
tag.textContent=tt; tag.className='ro-tag '+tc;
|
| 546 |
+
// churn
|
| 547 |
+
const ch=churn(i);
|
| 548 |
+
if(!ch){$('ro-churn').textContent='start';$('ro-churn').className='rv mute';$('ro-net').textContent='—';$('ro-net').className='rv mute';$('ro-net').style.color='';}
|
| 549 |
+
else{
|
| 550 |
+
$('ro-churn').innerHTML='<span class="fx">+'+ch.fixed+' fixed</span> · <span class="bk">−'+ch.broken+' broken</span>';
|
| 551 |
+
$('ro-churn').className='rv';
|
| 552 |
+
const nt=(ch.net>0?'+':ch.net<0?'−':'±')+Math.abs(ch.net)+' net';
|
| 553 |
+
$('ro-net').textContent=nt; $('ro-net').className=ch.net===0?'rv mute':'rv';
|
| 554 |
+
$('ro-net').style.color=ch.net>0?COL.green:ch.net<0?COL.red:COL.mute;
|
| 555 |
+
}
|
| 556 |
+
// slider aria
|
| 557 |
+
$('slider').setAttribute('aria-valuetext',s.label+' — '+f1(s.pct)+'% corrected');
|
| 558 |
+
// ticks/labels
|
| 559 |
+
document.querySelectorAll('.tick').forEach(t=>t.classList.toggle('on',+t.dataset.i===i));
|
| 560 |
+
document.querySelectorAll('.tlabel').forEach(l=>l.classList.toggle('on',+l.dataset.i===i));
|
| 561 |
+
// grid recolor at current stage + current-dot ring
|
| 562 |
+
const key=s.key;
|
| 563 |
+
document.querySelectorAll('.gcard').forEach(card=>{
|
| 564 |
+
const t=BYID[+card.dataset.id], ok=t.stages[key]==='correct';
|
| 565 |
+
card.classList.toggle('now-correct',ok); card.classList.toggle('now-wrong',!ok);
|
| 566 |
+
const now=card.querySelector('[data-now]'); now.textContent=ok?'correct':'wrong'; now.className='gc-now '+(ok?'c':'w');
|
| 567 |
+
card.querySelectorAll('.tdot').forEach(d=>d.classList.toggle('cur',+d.dataset.i===i));
|
| 568 |
+
});
|
| 569 |
+
$('grid-n-v').textContent=c+' / '+n+' correct';
|
| 570 |
+
// spotlight current-stage ring
|
| 571 |
+
document.querySelectorAll('#sp-traj .scell').forEach(cell=>cell.classList.toggle('cur',+cell.dataset.i===i));
|
| 572 |
+
}
|
| 573 |
+
|
| 574 |
+
/* ---------- spotlight ---------- */
|
| 575 |
+
function renderSpot(){
|
| 576 |
+
const t=BYID[selTrap]; if(!t) return;
|
| 577 |
+
$('sp-id').textContent='#'+String(t.id).padStart(2,'0');
|
| 578 |
+
$('sp-cat').textContent=t.category;
|
| 579 |
+
$('sp-q').textContent=t.question;
|
| 580 |
+
$('sp-correct').textContent=t.correct_answer;
|
| 581 |
+
$('sp-wrong').textContent=t.memorized_wrong;
|
| 582 |
+
$('sp-final').textContent=t.final_answer;
|
| 583 |
+
$('sp-wiki').textContent=(t.wiki_titles||[]).join(' · ');
|
| 584 |
+
$('sp-t22').textContent=t.v22_trust||'—';
|
| 585 |
+
$('sp-t23').textContent=t.v23_trust||'—';
|
| 586 |
+
const tr=$('sp-traj'); tr.innerHTML='';
|
| 587 |
+
STAGES.forEach((s,i)=>{
|
| 588 |
+
const ok=t.stages[s.key]==='correct';
|
| 589 |
+
const cell=document.createElement('div');
|
| 590 |
+
cell.className='scell '+(ok?'c':'w')+(i===cur?' cur':''); cell.dataset.i=i;
|
| 591 |
+
cell.innerHTML='<div class="mk">'+(ok?'✓':'✗')+'</div><div class="sn">'+(SHORT[s.key]||s.key)+'</div><div class="sp">'+f1(s.pct)+'%</div>';
|
| 592 |
+
tr.appendChild(cell);
|
| 593 |
+
});
|
| 594 |
+
updateGridSel();
|
| 595 |
+
}
|
| 596 |
+
function esc(s){return String(s).replace(/[&<>"]/g,c=>({'&':'&amp;','<':'&lt;','>':'&gt;','"':'&quot;'}[c]));}
|
| 597 |
+
</script>
|
| 598 |
+
</body>
|
| 599 |
+
</html>
|
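The readout in `setStage` calls `churn(i)`, which is defined outside this excerpt and is expected to return `null` at the first stage (the readout then shows "start") or an object with `fixed`, `broken`, and `net`. A minimal sketch of how such a function could work, assuming each trap's `stages[key]` equals the string `'correct'` when the answer is correct: compare every trap's outcome at stage `i-1` with its outcome at stage `i`. The names `churnSketch` and `formatNet` and the three-trap toy data are hypothetical illustrations, not the project's code or results.

```javascript
// Hypothetical sketch; the page's real churn() lives outside this excerpt.
// Assumes stages is an array of {key} and traps[n].stages[key] === 'correct' when correct.
function churnSketch(stages, traps, i) {
  if (i <= 0) return null; // first stage has no predecessor to compare against
  const prev = stages[i - 1].key;
  const curr = stages[i].key;
  let fixed = 0, broken = 0;
  for (const t of traps) {
    const was = t.stages[prev] === 'correct';
    const now = t.stages[curr] === 'correct';
    if (!was && now) fixed++;        // wrong -> correct
    else if (was && !now) broken++;  // correct -> wrong
  }
  return { fixed, broken, net: fixed - broken };
}

// Mirrors the readout's net formatting: "+N net", "−N net" (U+2212), or "±0 net".
function formatNet(net) {
  return (net > 0 ? '+' : net < 0 ? '−' : '±') + Math.abs(net) + ' net';
}

// Toy data (made up for illustration only).
const stages = [{ key: 'a' }, { key: 'b' }, { key: 'c' }];
const traps = [
  { id: 1, stages: { a: 'wrong', b: 'correct', c: 'correct' } },
  { id: 2, stages: { a: 'correct', b: 'wrong', c: 'correct' } },
  { id: 3, stages: { a: 'wrong', b: 'correct', c: 'wrong' } },
];

console.log(churnSketch(stages, traps, 1)); // { fixed: 2, broken: 1, net: 1 }
console.log(formatNet(churnSketch(stages, traps, 2).net)); // ±0 net
```

Under this assumption `net` always equals the change in the number of correct traps between consecutive stages, so it should agree with `COUNTS[i] - COUNTS[i-1]` if `COUNTS` holds per-stage correct counts.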
requirements.txt
DELETED
|
@@ -1,2 +0,0 @@
|
|
| 1 |
-
gradio>=5.0,<6
|
| 2 |
-
huggingface_hub<1.0