fix: Critique response - logit fluency, causal pruning, FHRR phase cancellation
Critique Response: Logit graft fluency, causal arena pruning, FHRR phase cancellation
Three fixes from the audio review critique.
1. Logit graft: bounded bias replaces hard -inf (logit_bias.py)
Before: Eliminated hypothesis tokens were suppressed to -inf, which "violently collides with the model's syntactic expectations": the LLM cannot use structurally necessary tokens (pronouns, conjunctions) even when grammar demands them.
After: Suppressed tokens get `-max_bias` (default -8.0) instead of -inf, making them very unlikely but not impossible. If the LLM's autoregressive prior strongly demands a suppressed token for grammatical coherence, it can still use it; the cognitive constraint shapes the distribution without breaking fluency.
The same fix is applied to `StaticLogitBiasBuilder` for the remote API path.
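A minimal sketch of the finite-bias idea (the function name and signature here are illustrative, not the actual logit_bias.py API):

```python
import numpy as np

def apply_soft_suppression(logits: np.ndarray,
                           suppressed_ids: list[int],
                           max_bias: float = 8.0) -> np.ndarray:
    """Penalize eliminated-hypothesis tokens without forbidding them."""
    biased = logits.copy()
    # A finite penalty of 8 leaves roughly exp(-8) ~ 3e-4 of the token's
    # relative probability mass, so the token survives only when the
    # autoregressive prior strongly demands it.
    biased[suppressed_ids] -= max_bias
    return biased

logits = np.array([2.0, 1.5, 0.1])
print(apply_soft_suppression(logits, suppressed_ids=[1]))
# [ 2.  -6.5  0.1]  -- token 1 is heavily penalized, not impossible
```

Unlike a hard -inf mask, the suppressed token keeps nonzero probability after softmax, which is what preserves grammatical fluency.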
2. Causal arena: structural pruning + DAG merging (arena.py)
Before: Every proposed SCM was registered as a separate competing model, even if structurally identical to an existing one. This risked a combinatorial explosion against the 250,000 counterfactual-world limit.
After:
- `register_model()` now checks for structural equivalence (same variables, same edges) before registration. Duplicate DAGs are merged by averaging their Dirichlet pseudocounts instead of being added as a separate model.
- `compete()` adds an energy-threshold filter: models whose single-step log-likelihood is more than 20 nats worse than the best are flagged for fast elimination, avoiding wasted counterfactual computation.
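A minimal sketch of both behaviors under assumed internals (`SCM`, `Arena`, and the field names are hypothetical stand-ins for the arena.py implementation):

```python
from dataclasses import dataclass, field

@dataclass
class SCM:
    variables: frozenset                              # node names
    edges: frozenset                                  # (parent, child) pairs
    pseudocounts: dict = field(default_factory=dict)  # Dirichlet params

class Arena:
    def __init__(self):
        self._models = {}  # structure key -> canonical SCM

    def register_model(self, new: SCM) -> None:
        key = (new.variables, new.edges)  # structural equivalence
        if key in self._models:
            # Duplicate DAG: merge by averaging Dirichlet pseudocounts
            # rather than registering a separate competing model.
            old = self._models[key]
            for name, count in new.pseudocounts.items():
                old.pseudocounts[name] = (
                    old.pseudocounts.get(name, count) + count) / 2.0
        else:
            self._models[key] = new

    def flag_for_elimination(self, log_liks: dict, threshold: float = 20.0):
        """Energy filter: flag models >threshold nats behind the best."""
        best = max(log_liks.values())
        return {m for m, ll in log_liks.items() if best - ll > threshold}
```

Deduplicating at registration keeps the model count, and therefore the number of counterfactual worlds spawned per competition round, bounded.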
3. FHRR: sparse block coding + hierarchical temporal bundling (fhrr.py)
Before: `bundle()` added all complex phasors and normalized. With dense SBERT-grounded embeddings, bundling >20 vectors caused phase wrapping, the "superposition catastrophe" where constituent meanings wash out into noise.
After:
- `bundle()` accepts an optional `top_k` parameter: when set, only the `top_k` largest-magnitude dimensions are preserved per vector before addition. This induces sparsity that prevents phase wrapping.
- `encode_sequence()` uses hierarchical temporal bundling: tokens are bundled within local windows (default 16), then the window summaries are bundled with sparse `top_k`. Short sequences still use direct bundling. This preserves high-resolution semantic detail within recent context while summarizing distant tokens.
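A minimal sketch of both mechanisms (function names mirror the description; positional binding and other fhrr.py details are omitted, and `top_k` is assumed to be smaller than the vector dimension):

```python
import numpy as np

def bundle(vectors, top_k=None):
    """Superpose complex phasors, optionally sparsifying each first."""
    acc = np.zeros_like(vectors[0], dtype=np.complex128)
    for v in vectors:
        if top_k is not None:
            # Keep only the top_k largest-magnitude components; the
            # induced sparsity limits destructive phase interference.
            keep = np.argpartition(np.abs(v), -top_k)[-top_k:]
            sparse = np.zeros_like(acc)
            sparse[keep] = v[keep]
            v = sparse
        acc = acc + v
    # Renormalize each component to the unit circle (FHRR convention),
    # guarding against zero-magnitude components.
    return acc / np.maximum(np.abs(acc), 1e-12)

def encode_sequence(token_vecs, window=16, top_k=64):
    """Bundle within local windows, then bundle the window summaries."""
    if len(token_vecs) <= window:
        return bundle(token_vecs)        # short sequences: direct bundling
    summaries = [bundle(token_vecs[i:i + window])
                 for i in range(0, len(token_vecs), window)]
    return bundle(summaries, top_k=top_k)  # sparse top_k at the top level
```

The two-level scheme bounds how many phasors ever meet in a single addition: at most `window` tokens at the bottom level, and only sparsified window summaries at the top.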