feat: V3 — SBERT-native cognitive stack (NGC, Hopfield, falsification all in embedding space)
V3: SBERT-Native Cognitive Stack
This is the fundamental architecture fix. Every cognitive component now operates directly in SBERT embedding space instead of through random projections that destroy semantic structure.
The Root Cause (from the honest audit)
The NGC weights were random Xavier initializations operating on random projections of FHRR phasors. With [256, 128, 32] layers (roughly 40,000 parameters) learning from only 5 observations per benchmark item, the generative model was pure noise. "Falsification" measured random-projection similarity, not whether an answer explains a question. The causal arena fed 3 observations of 4-value categoricals into a 256-cell table. The Hopfield memory stored 8-dim NGC top states that couldn't distinguish semantically different inputs.
The entire cognitive stack was theater on top of SBERT.
The Fix
- NGC layer 0 = SBERT embedding dimension (384 for MiniLM-L6-v2)
- UnifiedField.text_to_obs() goes directly to sbert.encode(), bypassing the FHRR→block-average path (see the sketch after this list)
- UnifiedField.get_sbert_embedding() provides raw SBERT embeddings to any caller
- NGC settles on real SBERT embeddings, so semantically similar inputs produce similar layer-0 activations
- Hopfield stores SBERT embeddings (384-dim), not NGC top states (8-dim)
- Falsification compares NGC predictions against SBERT prompt embeddings through W matrices that learn from every item
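A minimal sketch of the new observation path, assuming the standard sentence-transformers API; the UnifiedField method bodies below are illustrative, not the actual repo code.

```python
import numpy as np
from sentence_transformers import SentenceTransformer


class UnifiedField:
    def __init__(self, model_name: str = "sentence-transformers/all-MiniLM-L6-v2"):
        self.sbert = SentenceTransformer(model_name)
        # 384 for MiniLM-L6-v2; this is now the NGC layer-0 dimension.
        self.obs_dim = self.sbert.get_sentence_embedding_dimension()

    def get_sbert_embedding(self, text: str) -> np.ndarray:
        # Raw SBERT embedding, available to any caller (Hopfield, falsification, ...).
        return self.sbert.encode(text, normalize_embeddings=True)

    def text_to_obs(self, text: str) -> np.ndarray:
        # The observation fed to NGC layer 0 is the SBERT embedding itself:
        # no FHRR phasor encoding, no random projection, no block averaging.
        return self.get_sbert_embedding(text)
```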
What Changes For Each Component
NGC Falsification (canonical.py _ngc_falsification_scores):
- Before: encode choice via FHRR→random-project→settle→measure PE against random-projected prompt
- After: get SBERT embedding of prompt + choice→settle→NGC predicts what the prompt embedding should look like→PE = ||prediction - sbert(prompt)||² (see the sketch after this list)
- This is real hypothesis testing: "given that the answer is X, does my generative model reconstruct the question?"
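A hedged sketch of that falsification loop. `ngc.settle` and `ngc.predict_observation` are placeholder names for the NGC interface in canonical.py, not its actual method signatures.

```python
import numpy as np


def ngc_falsification_scores(field, ngc, prompt, choices):
    """Score each choice by how well the settled NGC state reconstructs the question."""
    target = field.get_sbert_embedding(prompt)       # what a good generative model should reconstruct
    errors = []
    for choice in choices:
        obs = field.get_sbert_embedding(f"{prompt} {choice}")
        state = ngc.settle(obs)                      # settle latent layers on the hypothesis "the answer is X"
        prediction = ngc.predict_observation(state)  # top-down reconstruction through the learned W matrices
        errors.append(float(np.sum((prediction - target) ** 2)))  # PE = ||prediction - sbert(prompt)||^2
    return -np.asarray(errors)                       # lower prediction error -> higher score
```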
Hopfield Memory (unified_field.py):
- Before: stores 8-dim NGC top-layer states (low information)
- After: stores 384-dim SBERT embeddings (full semantic content)
- Memory retrieval can now genuinely recognize "I've seen a question like this before" (see the sketch after this list)
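A sketch of what storing 384-dim SBERT embeddings with a one-step modern-Hopfield readout could look like; the class name, beta value, and array layout are assumptions, not the unified_field.py implementation.

```python
import numpy as np


class SbertHopfieldMemory:
    """Stores full 384-dim SBERT embeddings; retrieval is a softmax-weighted readout."""

    def __init__(self, beta: float = 8.0):
        self.beta = beta                      # inverse temperature: higher = sharper retrieval
        self.patterns: list[np.ndarray] = []  # stored SBERT embeddings, unit-normalized

    def store(self, embedding: np.ndarray) -> None:
        self.patterns.append(embedding / (np.linalg.norm(embedding) + 1e-8))

    def retrieve(self, query: np.ndarray) -> np.ndarray:
        # One-step modern-Hopfield update: softmax over similarities, then blend patterns.
        X = np.stack(self.patterns)           # (N, 384)
        q = query / (np.linalg.norm(query) + 1e-8)
        sims = X @ q
        w = np.exp(self.beta * (sims - sims.max()))
        w /= w.sum()
        return w @ X                          # dominated by the nearest stored memories
```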
Memory Scoring (canonical.py _memory_choice_scores):
- Before: episodic memory with FHRR-based context vectors
- After: Hopfield retrieval in SBERT space: query with the prompt embedding, score choices by similarity to the retrieved memory (sketched below)
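A sketch of that scoring path under the same assumptions as the Hopfield sketch above; the function name and helper signatures are illustrative, not the canonical.py code.

```python
import numpy as np


def memory_choice_scores(field, memory, prompt, choices):
    """Query memory with the prompt embedding, score choices against the retrieved trace."""
    retrieved = memory.retrieve(field.get_sbert_embedding(prompt))
    scores = []
    for choice in choices:
        emb = field.get_sbert_embedding(f"{prompt} {choice}")
        denom = np.linalg.norm(emb) * np.linalg.norm(retrieved) + 1e-8
        scores.append(float(emb @ retrieved / denom))  # cosine similarity to the retrieved memory
    return np.asarray(scores)
```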
Feedback Learning (canonical.py learn_from_feedback):
- Before: store FHRR-projected abstract states
- After: store the SBERT embedding of the correct prompt + answer; future items in the same domain benefit from this real semantic memory (see the sketch below)
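A sketch of the feedback path: store the correct prompt + answer embedding and give the NGC one more observation to learn from. `ngc.update_weights` stands in for whatever learning step canonical.py actually calls.

```python
def learn_from_feedback(field, memory, ngc, prompt, correct_answer):
    """Store the correct (prompt + answer) embedding and let the NGC learn from it."""
    emb = field.get_sbert_embedding(f"{prompt} {correct_answer}")
    memory.store(emb)               # future items in the same domain can retrieve this
    state = ngc.settle(emb)
    ngc.update_weights(state, emb)  # W matrices keep accumulating cross-item structure
```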
What's Preserved
- FHRR encoder (for compositional binding — parallel path, not in NGC observation chain)
- NGC settling dynamics (Ororbia equations unchanged)
- Friston log-precision updates
- Adaptive settling
- Dirichlet channel reliability
- All benchmark harness code
- Causal arena (still operates on bucketed NGC-derived observations — next fix)
Why This Should Work
The NGC now learns from every benchmark item's SBERT embedding. After 50+ items:
- W matrices encode real structure: "science questions have answer embeddings that relate to question embeddings in pattern X"
- Falsification becomes genuine: "does this choice's settled state reconstruct the question through learned weights, or does it predict something different?"
- Memory becomes useful: "the last 10 science questions had correct answers with these embedding patterns"
This is the capability the LLM doesn't have — cross-item structural knowledge accumulated by a persistent generative model that explicitly tests hypotheses.