Space: Chris4K/Compression_Navigator

Chris4K committed 14 days ago

Commit cbcdc9d (verified)

1 Parent(s): 836f3d5

Update app.py

Files changed (1)

app.py +271 -53

@@ -768,7 +768,9 @@ def _verdict(before, after, subject, new_answer, drift_thresh=0.05):
     return eff, collateral, max_drift, ent_blowup, surgical
-def edit_and_verify(subject, new_answer, method, strength, use_llm, llm_model, api_key):
     model, tok = get_handles("glassbox")
     STATE["name"] = "glassbox"
     model.reset()
@@ -800,10 +802,20 @@ def edit_and_verify(subject, new_answer, method, strength, use_llm, llm_model, a
           "", "VERDICT: %s" % ("SURGICAL EDIT" if surgical else "COLLATERAL DAMAGE")]
     L.append("(model is left in the edited state - inspect it in tabs 1-5, or hit Reset.)")
     if use_llm:
-        L += ["", "-" * 60, "INDEPENDENT LLM REVIEW:", _llm_judge(
-            before, after, subject, new_answer, llm_model, api_key)]
-    return "\n".join(L)
 def reset_glassbox():
@@ -812,13 +824,13 @@ def reset_glassbox():
     return "Glass-box weights restored to pristine. Re-run any tab to confirm."
-# --- optional: real LLM calls to verify the edit (independent of our metrics) -
-def _llm_judge(before, after, subject, new_answer, llm_model, api_key):
-    import os, json
-    key = (api_key or "").strip() or os.environ.get("ANTHROPIC_API_KEY", "")
-    if not key:
-        return ("(skipped - no API key. Paste an Anthropic key or set "
-                "ANTHROPIC_API_KEY to have Claude independently judge the edit.)")
     payload = {c: {"prompt": before[c]["prompt"],
                    "before_top1": before[c]["top1"], "before_p_orig": round(before[c]["p_orig"], 3),
                    "after_top1": after[c]["top1"],  "after_p_orig": round(after[c]["p_orig"], 3)}
@@ -826,39 +838,162 @@ def _llm_judge(before, after, subject, new_answer, llm_model, api_key):
     sys = ("You audit knowledge edits to a small language model. The intended edit "
            "is: make %s's capital '%s'. Given before/after predictions for every "
            "known fact, decide if the edit was SURGICAL (target changed, all other "
-           "facts unchanged) or caused COLLATERAL damage. Reply ONLY as JSON: "
-           '{"verdict":"surgical|collateral","target_changed":bool,'
-           '"damaged_facts":[...],"confidence":0-1,"reason":"one sentence"}.'
-           ) % (subject, new_answer)
-    body = {"model": (llm_model or "claude-sonnet-4-6").strip(), "max_tokens": 400,
-            "system": sys,
-            "messages": [{"role": "user", "content": json.dumps(payload)}]}
     try:
-        try:                                    # prefer the official SDK if present
             import anthropic
             client = anthropic.Anthropic(api_key=key)
             msg = client.messages.create(**body)
             text = "".join(b.text for b in msg.content if getattr(b, "type", "") == "text")
-        except ImportError:                     # fall back to a raw HTTPS call
             import urllib.request
             req = urllib.request.Request(
-                "https://api.anthropic.com/v1/messages",
-                data=json.dumps(body).encode(),
                 headers={"x-api-key": key, "anthropic-version": "2023-06-01",
                          "content-type": "application/json"})
             with urllib.request.urlopen(req, timeout=30) as r:
                 data = json.loads(r.read())
             text = "".join(b.get("text", "") for b in data.get("content", [])
                            if b.get("type") == "text")
-        clean = text.strip().strip("`")
-        if clean.startswith("json"):
-            clean = clean[4:].strip()
-        v = json.loads(clean)
-        return ("verdict=%s  target_changed=%s  confidence=%s\n  damaged: %s\n  reason: %s"
-                % (v.get("verdict"), v.get("target_changed"), v.get("confidence"),
-                   v.get("damaged_facts") or "none", v.get("reason")))
     except Exception as e:
-        return "(LLM review failed: %s)" % e
 # =============================================================================
@@ -994,6 +1129,46 @@ def upload_to_hf(repo_id, token, what, app_path=__file__):
         return "Upload failed: %s" % e
 # =============================================================================
 # UI
 # =============================================================================
@@ -1159,54 +1334,72 @@ the fact is read here. The peak line names the site.
         gr.Markdown("""
 ### Edit a fact, then prove nothing else broke
 **What it does:** rewrites the value one fact-MLP key maps to (the exact thing
-ROME/MEMIT do on real models), then runs a verification battery over **every**
 known fact to measure **efficacy** (target changed), **specificity** (others
 untouched), and **fluency** (no entropy collapse).
 **Two methods, on purpose:**
 - `rank1` — the minimal, surgical update. Only the target fact moves → **SURGICAL**.
-- `broadcast` — a deliberately sloppy edit that smears the change across all facts → the harness catches the **COLLATERAL DAMAGE**. This proves the verifier actually works.
-**Optional independent review:** tick the box and paste an Anthropic key (or set
-`ANTHROPIC_API_KEY`) to have **Claude** judge the before/after battery and return
-its own surgical/collateral verdict — a second, model-based check on top of the
-deterministic metrics.
 Subjects: `france`, `germany`, `japan`. Answers: `paris, berlin, tokyo, london, rome`.
 After editing, the model stays edited — go look at it in tabs 1–5 (the logit lens
 will show the new answer rising; the trace still localises to L0). Hit **Reset**
-to restore.
 """)
         with gr.Row():
             ed_subj = gr.Textbox(value="france", label="subject")
             ed_new = gr.Textbox(value="london", label="new answer")
             ed_method = gr.Radio(["rank1", "broadcast"], value="rank1", label="method")
             ed_strength = gr.Slider(0.2, 2.0, value=1.0, step=0.1, label="strength")
-        with gr.Row():
-            ed_llm = gr.Checkbox(value=False, label="also ask Claude to verify")
-            ed_model = gr.Textbox(value="claude-sonnet-4-6", label="Claude model")
-            ed_key = gr.Textbox(value="", label="Anthropic API key (optional)", type="password")
-        ed_out = gr.Textbox(label="edit + verification report", lines=22)
         with gr.Row():
             gr.Button("Edit & verify", variant="primary").click(
                 edit_and_verify,
-                [ed_subj, ed_new, ed_method, ed_strength, ed_llm, ed_model, ed_key], ed_out)
             gr.Button("Reset model").click(reset_glassbox, outputs=ed_out)
     # ---- TAB 7 -------------------------------------------------------------
     with gr.Tab("7 · Export / Upload to HF"):
         gr.Markdown("""
-### Ship it to the Hub
 **Export** writes a self-contained, reloadable repo: weights (`safetensors`),
 `config.json`, `vocab.json`, a standalone `modeling_glassbox.py` (reload with
 `from modeling_glassbox import load`), and a model card.
-**Upload** pushes it to the Hub. Choose:
-- `model` — the glass-box as a model repo.
-- `space` — *this whole app* as a runnable Gradio Space (adds `requirements.txt`).
-- `both`.
-Paste a **write** token (or set `HF_TOKEN`). Repo id like `Chris4K/glassbox-interp`.
 """)
         with gr.Row():
             hf_repo = gr.Textbox(value="Chris4K/glassbox-interp", label="repo id")
@@ -1219,13 +1412,38 @@ Paste a **write** token (or set `HF_TOKEN`). Repo id like `Chris4K/glassbox-inte
             gr.Button("Upload to HF", variant="primary").click(
                 upload_to_hf, [hf_repo, hf_token, hf_what], hf_out)
     gr.Markdown("""
 ---
 ### Where this goes next
-- **Real-model MEMIT:** the edit loop here is exact because the glass-box's fact layer is literally key→value. The same verify harness (efficacy / specificity / fluency + the Claude judge) ports straight onto a gpt2/Llama MEMIT edit — the toy is the regression test you run first.
-- **Multi-hop & paraphrase generalization:** add `"the currency of france is"` so two relations share a subject, and have the Claude judge auto-generate paraphrase probes to test that an edit generalizes, not just memorizes the one prompt.
 - **Attribution view:** Geva-style "what does this neuron write to the vocab", per-head attention attribution.
-- **It already ships:** tab 7 pushes the model and this whole app (as a Space) to your Hub.
 """)
     demo.load(lambda: load_model("glassbox"), outputs=load_status)

     return eff, collateral, max_drift, ent_blowup, surgical
+def edit_and_verify(subject, new_answer, method, strength, use_llm,
+                    anthropic_key, anthropic_model, hf_token, hf_model,
+                    local_url, local_model):
     model, tok = get_handles("glassbox")
     STATE["name"] = "glassbox"
     model.reset()
           "", "VERDICT: %s" % ("SURGICAL EDIT" if surgical else "COLLATERAL DAMAGE")]
     L.append("(model is left in the edited state - inspect it in tabs 1-5, or hit Reset.)")
+    llm_report = ""
     if use_llm:
+        providers = [
+            {"type": "anthropic", "key": anthropic_key, "model": anthropic_model},
+            {"type": "hf",        "key": hf_token,       "model": hf_model},
+            {"type": "local",     "url": local_url,      "model": local_model},
+        ]
+        llm_report = _llm_judge_chain(before, after, subject, new_answer, providers)
+        L += ["", "-" * 60, "INDEPENDENT LLM REVIEW:", llm_report]
+    report = "\n".join(L)
+    _log_session(subject, new_answer, method, strength, before, after,
+                eff, collateral, max_drift, surgical, llm_report)
+    return report
 def reset_glassbox():
     return "Glass-box weights restored to pristine. Re-run any tab to confirm."
+# --- optional: real LLM calls to verify the edit, with a 3-tier fallback chain
+# Anthropic (Claude) -> Hugging Face Inference -> local OpenAI-compatible server
+# (e.g. LM Studio). Tries each in order; the first provider that is configured,
+# reachable, AND returns parseable JSON wins. So one vendor being down, or not
+# having an Anthropic key at all, never blocks you: your own RTX 5090 can be the judge.
+def _build_judge_prompt(before, after, subject, new_answer):
+    import json
     payload = {c: {"prompt": before[c]["prompt"],
                    "before_top1": before[c]["top1"], "before_p_orig": round(before[c]["p_orig"], 3),
                    "after_top1": after[c]["top1"],  "after_p_orig": round(after[c]["p_orig"], 3)}
     sys = ("You audit knowledge edits to a small language model. The intended edit "
            "is: make %s's capital '%s'. Given before/after predictions for every "
            "known fact, decide if the edit was SURGICAL (target changed, all other "
+           "facts unchanged) or caused COLLATERAL damage. Reply ONLY as JSON, no "
+           'prose, no markdown fences: {"verdict":"surgical|collateral",'
+           '"target_changed":bool,"damaged_facts":[...],"confidence":0-1,'
+           '"reason":"one sentence"}.') % (subject, new_answer)
+    return sys, json.dumps(payload)
+def _parse_verdict_json(text, provider_label):
+    import json
+    clean = text.strip().strip("`")
+    if clean.lower().startswith("json"):
+        clean = clean[4:].strip()
+    start, end = clean.find("{"), clean.rfind("}")
+    if start != -1 and end != -1:
+        clean = clean[start:end + 1]
+    v = json.loads(clean)
+    return ("[%s] verdict=%s  target_changed=%s  confidence=%s\n  damaged: %s\n  reason: %s"
+            % (provider_label, v.get("verdict"), v.get("target_changed"), v.get("confidence"),
+               v.get("damaged_facts") or "none", v.get("reason")))
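Reviewer note: the cleanup in `_parse_verdict_json` can be exercised in isolation. The sketch below assumes a hypothetical helper, `extract_json_object` (not part of app.py), that applies the same three steps: strip backticks, drop a leading `json` tag, then slice from the first `{` to the last `}`. It also requires the closing brace to come after the opening one, which the diff does not check.

```python
import json

def extract_json_object(text):
    """Hypothetical helper mirroring the cleanup steps in _parse_verdict_json."""
    clean = text.strip().strip("`")            # drop surrounding markdown fences
    if clean.lower().startswith("json"):       # drop the fence's language tag
        clean = clean[4:].strip()
    start, end = clean.find("{"), clean.rfind("}")
    if start != -1 and end > start:            # keep only the outermost {...} span
        clean = clean[start:end + 1]
    return json.loads(clean)                   # raises ValueError on malformed JSON

# Two reply shapes a chat model might produce despite the "JSON only" instruction.
fenced = '```json\n{"verdict": "surgical", "confidence": 0.9}\n```'
chatty = 'Sure, here it is: {"verdict": "collateral", "damaged_facts": ["germany"]} Hope that helps.'

print(extract_json_object(fenced)["verdict"])
print(extract_json_object(chatty)["damaged_facts"])
```

The slice is a heuristic: trailing prose that itself contains a `}` would still break parsing, in which case the caller's `except` moves on to the next provider.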
+def _try_anthropic(sys, user, cfg):
+    import os, json
+    key = (cfg.get("key") or "").strip() or os.environ.get("ANTHROPIC_API_KEY", "")
+    if not key:
+        return None, "anthropic: no key configured"
+    body = {"model": (cfg.get("model") or "claude-sonnet-4-6").strip(),
+            "max_tokens": 400, "system": sys, "messages": [{"role": "user", "content": user}]}
     try:
+        try:
             import anthropic
             client = anthropic.Anthropic(api_key=key)
             msg = client.messages.create(**body)
             text = "".join(b.text for b in msg.content if getattr(b, "type", "") == "text")
+        except ImportError:
             import urllib.request
             req = urllib.request.Request(
+                "https://api.anthropic.com/v1/messages", data=json.dumps(body).encode(),
                 headers={"x-api-key": key, "anthropic-version": "2023-06-01",
                          "content-type": "application/json"})
             with urllib.request.urlopen(req, timeout=30) as r:
                 data = json.loads(r.read())
             text = "".join(b.get("text", "") for b in data.get("content", [])
                            if b.get("type") == "text")
+        return _parse_verdict_json(text, "anthropic:" + body["model"]), None
     except Exception as e:
+        return None, "anthropic failed: %s" % e
+def _try_hf(sys, user, cfg):
+    token = (cfg.get("key") or "").strip()
+    model = (cfg.get("model") or "Qwen/Qwen2.5-7B-Instruct").strip()
+    if not token:
+        import os
+        token = os.environ.get("HF_TOKEN", "")
+    if not token:
+        return None, "hf: no token configured"
+    try:
+        from huggingface_hub import InferenceClient
+        client = InferenceClient(model=model, token=token)
+        resp = client.chat_completion(
+            messages=[{"role": "system", "content": sys}, {"role": "user", "content": user}],
+            max_tokens=400)
+        text = resp.choices[0].message.content
+        return _parse_verdict_json(text, "hf:" + model), None
+    except Exception as e:
+        return None, "hf failed: %s" % e
+def _try_local(sys, user, cfg):
+    """Any OpenAI-compatible /v1/chat/completions server - LM Studio, vLLM,
+    Ollama (with its OpenAI shim), text-generation-webui, etc."""
+    import json, urllib.request
+    url = (cfg.get("url") or "").strip().rstrip("/")
+    if not url:
+        return None, "local: no URL configured"
+    model = (cfg.get("model") or "local-model").strip()
+    body = json.dumps({"model": model, "max_tokens": 400, "temperature": 0,
+                       "messages": [{"role": "system", "content": sys},
+                                   {"role": "user", "content": user}]}).encode()
+    try:
+        req = urllib.request.Request(
+            url + "/v1/chat/completions", data=body,
+            headers={"content-type": "application/json"})
+        with urllib.request.urlopen(req, timeout=20) as r:
+            data = json.loads(r.read())
+        text = data["choices"][0]["message"]["content"]
+        return _parse_verdict_json(text, "local:" + model + "@" + url), None
+    except Exception as e:
+        return None, "local failed: %s" % e
+def _llm_judge_chain(before, after, subject, new_answer, providers):
+    sys, user = _build_judge_prompt(before, after, subject, new_answer)
+    dispatch = {"anthropic": _try_anthropic, "hf": _try_hf, "local": _try_local}
+    skipped = []
+    for cfg in providers:
+        fn = dispatch.get(cfg["type"])
+        if fn is None:
+            continue
+        result, err = fn(sys, user, cfg)
+        if result is not None:
+            note = ("" if not skipped else
+                    "(skipped: %s)\n" % "; ".join(skipped))
+            return note + result
+        skipped.append(err)
+    return ("all providers unavailable:\n  " + "\n  ".join(skipped) +
+            "\n(configure at least one: Anthropic key, HF token, or a local "
+            "OpenAI-compatible server URL like http://192.168.188.25:1234)")
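Reviewer note: the fallback logic in `_llm_judge_chain` can be checked without any network access. This is a minimal sketch, assuming stub providers that follow the same `(result, error)` return convention as the `_try_*` functions; `run_chain` and the three stubs are hypothetical names used only for illustration.

```python
def run_chain(providers, prompt):
    """Return (answer, skipped): the first provider that yields a result wins."""
    skipped = []
    for name, fn in providers:
        result, err = fn(prompt)
        if result is not None:
            return result, skipped             # later providers are never called
        skipped.append("%s: %s" % (name, err))
    return None, skipped                       # every provider failed

# Stub providers (hypothetical) covering the three outcomes seen in the diff.
def no_key(prompt):
    return None, "no key configured"

def unreachable(prompt):
    try:
        raise ConnectionError("connection refused")
    except Exception as e:                     # same catch-all style as _try_hf
        return None, "failed: %s" % e

def local_ok(prompt):
    return '{"verdict": "surgical"}', None

answer, skipped = run_chain(
    [("anthropic", no_key), ("hf", unreachable), ("local", local_ok)],
    "judge this edit")
print(answer)
print(skipped)
```

Returning the error string instead of raising keeps the loop simple, at the cost of flattening every failure mode (missing key, timeout, bad JSON) into one text note.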
+# --- session log: every edit+verify run is appended here as JSON, so you can
+# download it, or paste the markdown block straight into a future chat with
+# Claude for review ("did all work, here's the log").
+SESSION_LOG = []
+def _log_session(subject, new_answer, method, strength, before, after,
+                 eff, collateral, max_drift, surgical, llm_report):
+    import datetime
+    SESSION_LOG.append({
+        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat().replace("+00:00", "Z"),
+        "subject": subject, "new_answer": new_answer, "method": method,
+        "strength": strength, "efficacy_pass": bool(eff),
+        "collateral": collateral, "max_drift": round(max_drift, 4),
+        "verdict": "SURGICAL" if surgical else "COLLATERAL",
+        "before": {c: {"top1": before[c]["top1"], "p_orig": round(before[c]["p_orig"], 4)}
+                  for c in before},
+        "after": {c: {"top1": after[c]["top1"], "p_orig": round(after[c]["p_orig"], 4)}
+                 for c in after},
+        "llm_review": llm_report or None,
+    })
+def export_session_log():
+    import json, os
+    if not SESSION_LOG:
+        return None, "No edits run yet this session - nothing to export."
+    os.makedirs("/mnt/user-data/outputs", exist_ok=True)
+    path = "/mnt/user-data/outputs/edit_session_log.json"
+    with open(path, "w") as f:  # context manager closes the handle before the file is served
+        json.dump(SESSION_LOG, f, indent=2)
+    # also a markdown rendition meant to be pasted straight into a chat
+    md = ["# Edit session log\n"]
+    for i, e in enumerate(SESSION_LOG, 1):
+        md.append("## Edit %d - %s (%s, %s, strength=%s)\n" %
+                  (i, e["verdict"], e["subject"] + "->" + e["new_answer"],
+                   e["method"], e["strength"]))
+        md.append("- efficacy: %s, max collateral drift: %.4f, damaged: %s" %
+                  ("pass" if e["efficacy_pass"] else "fail", e["max_drift"],
+                   e["collateral"] or "none"))
+        if e["llm_review"]:
+            md.append("- LLM review: " + e["llm_review"].replace("\n", " "))
+        md.append("")
+    md_path = "/mnt/user-data/outputs/edit_session_log.md"
+    with open(md_path, "w") as f:
+        f.write("\n".join(md))
+    return path, "Wrote %d edit(s) to %s and %s" % (len(SESSION_LOG), path, md_path)
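Reviewer note: `export_session_log` writes to a hard-coded `/mnt/user-data/outputs` directory, which may not exist or be writable on every host. The sketch below shows one possible variant that takes the output directory as a parameter. `export_log` is a hypothetical name, and the sample entry uses made-up values purely to exercise the function.

```python
import json
import os
import tempfile

def export_log(entries, out_dir):
    """Write entries as JSON plus a short markdown summary; return both paths."""
    os.makedirs(out_dir, exist_ok=True)
    json_path = os.path.join(out_dir, "edit_session_log.json")
    md_path = os.path.join(out_dir, "edit_session_log.md")
    with open(json_path, "w") as f:            # handle is closed on exit
        json.dump(entries, f, indent=2)
    lines = ["# Edit session log", ""]
    for i, e in enumerate(entries, 1):
        lines.append("## Edit %d - %s (%s->%s, %s)" % (
            i, e["verdict"], e["subject"], e["new_answer"], e["method"]))
        lines.append("- max collateral drift: %.4f" % e["max_drift"])
        lines.append("")
    with open(md_path, "w") as f:
        f.write("\n".join(lines))
    return json_path, md_path

# Illustrative entry only; these numbers are not real results.
entries = [{"verdict": "SURGICAL", "subject": "france", "new_answer": "london",
            "method": "rank1", "max_drift": 0.0012}]
out_dir = tempfile.mkdtemp()
json_path, md_path = export_log(entries, out_dir)

with open(json_path) as f:
    roundtrip = json.load(f)
with open(md_path) as f:
    md_text = f.read()
print(roundtrip == entries)
```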
 # =============================================================================
         return "Upload failed: %s" % e
+# --- upload a REAL model (e.g. a VINDEX-edited Llama checkpoint), not the toy.
+# This does NOT load the model into memory (multi-GB Llama weights don't need
+# to round-trip through Python) - it just pushes whatever's already on disk.
+# Point it at the local folder produced by your save_pretrained()/VINDEX run:
+# expects the usual HF layout (config.json + .safetensors shards + tokenizer
+# files). Note: the upload only needs write access to the destination repo
+# (your own namespace or an org you can write to). A gate on the base model
+# (e.g. meta-llama/*) controls access to that repo and is independent of this step.
+def upload_local_checkpoint(local_dir, repo_id, token, private, commit_message):
+    import os
+    try:
+        from huggingface_hub import HfApi
+    except ImportError:
+        return "huggingface_hub not installed. `pip install huggingface_hub`."
+    local_dir = (local_dir or "").strip()
+    repo_id = (repo_id or "").strip()
+    if not local_dir or not os.path.isdir(local_dir):
+        return "local_dir %r does not exist or is not a directory." % local_dir
+    if not repo_id:
+        return "Enter a repo id like 'Chris4K/vindex-llama3-edited'."
+    token = (token or "").strip() or os.environ.get("HF_TOKEN", "")
+    if not token:
+        return "No HF token. Paste a write token or set HF_TOKEN."
+    has_cfg = os.path.exists(os.path.join(local_dir, "config.json"))
+    has_weights = any(f.endswith((".safetensors", ".bin"))
+                      for f in os.listdir(local_dir))
+    warn = "" if (has_cfg and has_weights) else (
+        "WARNING: folder is missing config.json or weight files - this may "
+        "not be a loadable HF checkpoint. Uploading anyway.\n")
+    api = HfApi(token=token)
+    try:
+        api.create_repo(repo_id, repo_type="model", private=bool(private), exist_ok=True)
+        api.upload_folder(folder_path=local_dir, repo_id=repo_id, repo_type="model",
+                          commit_message=(commit_message or "upload checkpoint").strip())
+        return (warn + "Uploaded %s -> https://huggingface.co/%s\n"
+                "Files: %s" % (local_dir, repo_id, ", ".join(sorted(os.listdir(local_dir))[:12])))
+    except Exception as e:
+        return warn + "Upload failed: %s" % e
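Reviewer note: the layout check that produces the warning in `upload_local_checkpoint` can be tested locally before any upload. This sketch pulls it into a hypothetical helper, `looks_like_hf_checkpoint`. Like the diff, it only inspects the top level of the folder, so weights stored in subfolders would not be detected.

```python
import os
import tempfile

def looks_like_hf_checkpoint(local_dir):
    """Hypothetical helper: True if the folder has config.json and a weight file."""
    names = os.listdir(local_dir)
    has_cfg = "config.json" in names
    has_weights = any(n.endswith((".safetensors", ".bin")) for n in names)
    return has_cfg and has_weights

# Two throwaway folders to exercise the check (file contents are irrelevant here).
good = tempfile.mkdtemp()
for name in ("config.json", "model-00001-of-00002.safetensors", "tokenizer.json"):
    open(os.path.join(good, name), "w").close()

bad = tempfile.mkdtemp()
open(os.path.join(bad, "README.md"), "w").close()

print(looks_like_hf_checkpoint(good))
print(looks_like_hf_checkpoint(bad))
```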
 # =============================================================================
 # UI
 # =============================================================================
         gr.Markdown("""
 ### Edit a fact, then prove nothing else broke
 **What it does:** rewrites the value one fact-MLP key maps to (the exact thing
+ROME/MEMIT do on real models — this is a literal `nn.Module` weight tensor,
+not a token or vocab change), then runs a verification battery over **every**
 known fact to measure **efficacy** (target changed), **specificity** (others
 untouched), and **fluency** (no entropy collapse).
 **Two methods, on purpose:**
 - `rank1` — the minimal, surgical update. Only the target fact moves → **SURGICAL**.
+- `broadcast` — a deliberately sloppy edit that smears the change across all facts → the harness catches the **COLLATERAL DAMAGE**. This proves the verifier actually works, not just reports "ok" by default.
+**Independent LLM review, with a fallback chain — not locked to one vendor:**
+tick the box and it tries, in order: **Anthropic** (Claude, if you give a key)
+→ **Hugging Face Inference** (any hosted chat model, if you give an HF token)
+→ **your own local server** (LM Studio / vLLM / Ollama's OpenAI shim — anything
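+exposing `/v1/chat/completions`). The first one that is configured, reachable,
+*and* returns parseable JSON answers; providers that failed before it are noted
+in the report, and later ones are not tried. So your own RTX 5090 can be the
+judge with zero cloud calls if you leave the Anthropic and HF fields empty (and
+`ANTHROPIC_API_KEY` / `HF_TOKEN` unset) and fill in only the local URL.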
 Subjects: `france`, `germany`, `japan`. Answers: `paris, berlin, tokyo, london, rome`.
 After editing, the model stays edited — go look at it in tabs 1–5 (the logit lens
 will show the new answer rising; the trace still localises to L0). Hit **Reset**
+to restore. Every run is appended to a session log you can download below and
+paste into a future chat for review.
 """)
         with gr.Row():
             ed_subj = gr.Textbox(value="france", label="subject")
             ed_new = gr.Textbox(value="london", label="new answer")
             ed_method = gr.Radio(["rank1", "broadcast"], value="rank1", label="method")
             ed_strength = gr.Slider(0.2, 2.0, value=1.0, step=0.1, label="strength")
+        ed_llm = gr.Checkbox(value=False, label="also run an independent LLM review")
+        with gr.Accordion("LLM review providers (tried in this order)", open=False):
+            with gr.Row():
+                ed_a_model = gr.Textbox(value="claude-sonnet-4-6", label="1. Anthropic model")
+                ed_a_key = gr.Textbox(value="", label="Anthropic API key", type="password")
+            with gr.Row():
+                ed_h_model = gr.Textbox(value="Qwen/Qwen2.5-7B-Instruct",
+                                        label="2. HF Inference model")
+                ed_h_key = gr.Textbox(value="", label="HF token", type="password")
+            with gr.Row():
+                ed_l_url = gr.Textbox(value="http://192.168.188.25:1234",
+                                      label="3. Local server URL (LM Studio etc.)")
+                ed_l_model = gr.Textbox(value="local-model", label="local model name")
+        ed_out = gr.Textbox(label="edit + verification report", lines=24)
         with gr.Row():
             gr.Button("Edit & verify", variant="primary").click(
                 edit_and_verify,
+                [ed_subj, ed_new, ed_method, ed_strength, ed_llm,
+                 ed_a_key, ed_a_model, ed_h_key, ed_h_model, ed_l_url, ed_l_model],
+                ed_out)
             gr.Button("Reset model").click(reset_glassbox, outputs=ed_out)
+        gr.Markdown("**Session log** (every edit run above, appended):")
+        with gr.Row():
+            log_btn = gr.Button("Write session log to disk")
+            log_file = gr.File(label="download")
+        log_status = gr.Markdown()
+        log_btn.click(export_session_log, outputs=[log_file, log_status])
     # ---- TAB 7 -------------------------------------------------------------
     with gr.Tab("7 · Export / Upload to HF"):
         gr.Markdown("""
+### Ship the toy glass-box
 **Export** writes a self-contained, reloadable repo: weights (`safetensors`),
 `config.json`, `vocab.json`, a standalone `modeling_glassbox.py` (reload with
 `from modeling_glassbox import load`), and a model card.
+**Upload** pushes it to the Hub. Choose `model`, `space` (this whole app,
+runnable), or `both`. Paste a **write** token (or set `HF_TOKEN`).
 """)
         with gr.Row():
             hf_repo = gr.Textbox(value="Chris4K/glassbox-interp", label="repo id")
             gr.Button("Upload to HF", variant="primary").click(
                 upload_to_hf, [hf_repo, hf_token, hf_what], hf_out)
+        gr.Markdown("""
+---
+### Upload a REAL model — e.g. your VINDEX-edited Llama checkpoint
+This does **not** load the model into memory and does **not** assume any
+particular architecture — it just pushes whatever's already on disk at
+`local_dir` (the usual `save_pretrained()` layout: `config.json` +
+`*.safetensors` shards + tokenizer files) straight to a new repo. Large
+weights upload fine through `upload_folder`; for very large repos consider
+installing `hf_transfer` for faster throughput. If the base model is gated
+(e.g. `meta-llama/*`), that gate controls access to the base repo; this upload
+step only needs write access to the destination repo.
+""")
+        with gr.Row():
+            rc_dir = gr.Textbox(value="", label="local checkpoint folder (on this machine)")
+            rc_repo = gr.Textbox(value="", label="destination repo id, e.g. Chris4K/vindex-llama3-edited")
+        with gr.Row():
+            rc_token = gr.Textbox(value="", label="HF write token (optional)", type="password")
+            rc_private = gr.Checkbox(value=True, label="private repo")
+            rc_msg = gr.Textbox(value="upload edited checkpoint", label="commit message")
+        rc_out = gr.Textbox(label="result", lines=6)
+        gr.Button("Upload real checkpoint", variant="primary").click(
+            upload_local_checkpoint, [rc_dir, rc_repo, rc_token, rc_private, rc_msg], rc_out)
     gr.Markdown("""
 ---
 ### Where this goes next
+- **Closing the loop (what "self-improving" would actually require):** right now a human picks every edit; the verifier just grades it. A real closed loop needs a policy that *proposes* edits on its own (e.g. scanning eval failures for wrong facts), auto-applies, and auto-commits only on a SURGICAL verdict, rolling back otherwise. The hard part — the verifier — already exists here; the proposal step doesn't yet.
+- **A training-method angle worth taking seriously:** instead of accept/reject after the fact, feed the specificity battery's drift score back as a regularizer *during* the edit computation (closer to elastic weight consolidation, or the null-space projection AlphaEdit-style methods use) so collateral is penalized while solving, not caught after.
+- **Real-model MEMIT:** the edit loop here is exact because the glass-box's fact layer is literally key→value. The same verify harness (efficacy / specificity / fluency + the multi-provider LLM judge) ports straight onto a gpt2/Llama MEMIT edit — the toy is the regression test you run first.
+- **Multi-hop & paraphrase generalization:** add `"the currency of france is"` so two relations share a subject, and have the LLM judge auto-generate paraphrase probes to test that an edit generalizes, not just memorizes the one prompt.
 - **Attribution view:** Geva-style "what does this neuron write to the vocab", per-head attention attribution.
+- **It already ships:** tab 7 pushes the toy model and this whole app (as a Space) to your Hub, or a real local checkpoint folder to its own repo.
 """)
     demo.load(lambda: load_model("glassbox"), outputs=load_status)
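Reviewer note: the "closing the loop" bullet describes committing an edit only on a SURGICAL verdict and rolling back otherwise. A toy sketch of that gate follows, assuming a plain dict stands in for the fact store. The names `rank1` and `broadcast` are borrowed from the app's method labels, but these functions are dict-level stand-ins, not the app's weight updates. `is_surgical` and `commit_or_rollback` are hypothetical.

```python
def is_surgical(before, after, target, new_value):
    """Target changed to new_value and every other fact is untouched."""
    target_ok = after.get(target) == new_value
    others_ok = all(after.get(k) == v for k, v in before.items() if k != target)
    return target_ok and others_ok

def commit_or_rollback(facts, edit_fn, target, new_value):
    """Apply edit_fn to a copy; keep it only if the verifier calls it surgical."""
    candidate = edit_fn(dict(facts), target, new_value)
    if is_surgical(facts, candidate, target, new_value):
        return candidate, "SURGICAL"
    return dict(facts), "COLLATERAL"           # rollback: return the pristine copy

def rank1(facts, target, new_value):           # stand-in for the minimal edit
    facts[target] = new_value
    return facts

def broadcast(facts, target, new_value):       # stand-in for the sloppy edit
    return {k: new_value for k in facts}

facts = {"france": "paris", "germany": "berlin", "japan": "tokyo"}
kept, verdict_a = commit_or_rollback(facts, rank1, "france", "london")
rolled, verdict_b = commit_or_rollback(facts, broadcast, "france", "london")
print(verdict_a, kept)
print(verdict_b, rolled)
```

The proposal policy the bullet calls for is deliberately absent here; the sketch covers only the accept/reject gate that the existing verifier would drive.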