Spaces:

Executor-Tyrant-Framework
/

NuWave

Running

App Files Files Community

Executor-Tyrant-Framework commited on 8 days ago

Commit

26fae4e

verified ·

1 Parent(s): 804ae3a

Sync from GitHub: 5cd7057701f24a01666175e320f8d43de9c7e3ee

Browse files

Files changed (1) hide show

app.py +19 -0

app.py CHANGED Viewed

@@ -5,6 +5,17 @@ The organism. NeuroGraph substrate + KISS bucket + Pith bucket +
 Splat-Lenia + BitNet model. On CPU. Gets smarter over time.
 # ---- Changelog ----
 # [2026-05-22] Claude Opus 4.7 (1M ctx) — D2: drop conversation history from prompt (3 sites)
 #   The 2026-05-16 pith→user-turn fix deployed cleanly (Run 49 confirmed:
 #   "My actual question:" label from the new template appearing IN BitNet
@@ -1681,6 +1692,14 @@ def on_interleaved_benchmark(
             # False). Watch cross-run count: should drop over runs as
             # substrate LTDs degenerate-producing pathways.
             "response_quality": "degenerate" if _response_degenerate else "clean",
             # Run 41+ predictive-coding telemetry — surface what step_result
             # already carries about predictions plus a snapshot of active
             # predictions on the graph. If all 0/0/0 across all turns, the

 Splat-Lenia + BitNet model. On CPU. Gets smarter over time.
 # ---- Changelog ----
+# [2026-05-23] Claude Opus 4.7 (1M ctx) — Add response_text to per-turn JSON
+#   Run 50 sidecar showed 24/24 response_quality flagged degenerate even
+#   though token-savings, recall axis, and visible-in-pith resp_* snippets
+#   suggested most responses were functional. Root of the ambiguity: the
+#   `surfaced_context` field shows pith items trimmed to max_chars_per_context
+#   (default ~300), so trailing degeneracy past that boundary isn't visible
+#   in the JSON output. The detector evaluates the FULL response text,
+#   so it sees pathology we don't. Fix: add `response_text` field to the
+#   per-turn JSON (capped at 1500 chars) so the detector's signal can be
+#   verified against the actual generated text. Single field addition in
+#   on_interleaved_benchmark; no logic changes.
 # [2026-05-22] Claude Opus 4.7 (1M ctx) — D2: drop conversation history from prompt (3 sites)
 #   The 2026-05-16 pith→user-turn fix deployed cleanly (Run 49 confirmed:
 #   "My actual question:" label from the new template appearing IN BitNet
             # False). Watch cross-run count: should drop over runs as
             # substrate LTDs degenerate-producing pathways.
             "response_quality": "degenerate" if _response_degenerate else "clean",
+            # Run 50+ ground-truth instrumentation. _surfaced_context shows
+            # pith items trimmed to max_chars_per_context (default ~300),
+            # so when response_quality flags degenerate but the trimmed
+            # surfaced text looks clean, we can't tell if it's a false
+            # positive or trailing-degeneracy hidden by truncation. Capping
+            # at 1500 chars keeps JSON payload reasonable while showing
+            # enough of each response to verify what the detector saw.
+            "response_text": (resp_nw[:1500] if resp_nw else ""),
             # Run 41+ predictive-coding telemetry — surface what step_result
             # already carries about predictions plus a snapshot of active
             # predictions on the graph. If all 0/0/0 across all turns, the