fractal-agi/fdra-half-life-regularization


juddddd committed on Jan 22

Commit 88dae09 (verified) · 1 parent: 985a262

Upload training_validation/FINAL_VERDICT.md with huggingface_hub


training_validation/FINAL_VERDICT.md +121 -0

	@@ -0,0 +1,121 @@

+# Combined Routing + Regularizer: Final Verdict
+**Date:** 2026-01-22
+## Executive Summary
+The combined approach (τ-weighted routing + hard-constraint regularizer) **WORKS** at short and medium context lengths, but accuracy falls to 0% at full context length (K=4096).
+| Condition | K=256 Accuracy | K=1024 Accuracy | K=4096 Accuracy |
+|-----------|----------------|-----------------|-----------------|
+| A) Baseline | 0% | 0% | 0% |
+| B) Routing only | 0% | 0% | 0% |
+| C) Regularizer only | 20% | 0% | 0% |
+| **D) Combined** | **60%** | **20%** | 0% |
+---
+## Key Findings
+### 1. Routing Alone is Insufficient
+Without the regularizer, τ collapses to ~6 and QA fails completely (0% at every K tested).
+### 2. Regularizer Alone is Insufficient
+The regularizer preserves the τ distribution, but uniform routing writes much of the identity signal into fast modes, where it is wasted (20% at K=256, 0% beyond).
+### 3. Combined Approach Works for Medium Context
+- 60% accuracy at K=256 (vs 0-20% for others)
+- 20% accuracy at K=1024 (vs 0% for others)
+- Retention curve is significantly better than in the other conditions
+### 4. Full Context (K=4096) Remains Challenging
+Even the combined approach fails at K=4096 because:
+- The anchored tail only spans τ ∈ [3072, 5120]
+- At K=4096, even τ=5120 gives only about 57% retention (2^(-4096/5120) ≈ 0.574, reading τ as a half-life)
+- Noise accumulation pushes retention below the 50% threshold
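
The retention figures in this report are roughly consistent with reading τ as a half-life, so that a mode keeps a fraction 2^(-K/τ) of its state after K steps. That formula is an assumption inferred from the quoted numbers, not something the report states, and the function name `retention` is illustrative. A minimal sketch of the arithmetic:

```python
def retention(k: float, tau: float) -> float:
    """Fraction of a mode's state left after k steps, assuming tau is a half-life."""
    return 2.0 ** (-k / tau)

L = 4096  # full context length used in this report

# Retention at K = L across the anchored-tail range [3072, 5120]
for tau in (3072, 4096, 5120):
    print(f"tau={tau}: retention at K={L} is {retention(L, tau):.3f}")
```

Under this assumption only modes with τ ≥ L clear the 50% threshold at K = L, and even the slowest mode does so with little margin, which leaves room for noise to tip the result.
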
+---
+## Critical Bug Discovery
+During development, we discovered that **routing must be SELECTIVE**:
+```
+WRONG: Route ALL inputs with τ-weighting (including interference)
+RIGHT: Route IDENTITY with τ-weighting, INTERFERENCE uniform
+```
+When both encoding and interference are τ-weighted, noise preferentially accumulates in slow modes, destroying the benefit.
+The correct architecture:
+- **Identity/invariants**: τ-weighted write to slow modes
+- **Regular token stream**: Uniform write (or even inverse-τ-weighted)
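
The distinction can be sketched as two sets of write weights over the oscillator bank. This is a minimal illustration assuming per-mode half-lives `tau` and weights normalized to sum to 1; the function names and the example `tau` values are hypothetical and not taken from the repository.

```python
import numpy as np

def identity_write_weights(tau: np.ndarray) -> np.ndarray:
    """tau-weighted write: slow modes (large tau) receive most of the identity signal."""
    return tau / tau.sum()

def stream_write_weights(tau: np.ndarray, inverse: bool = False) -> np.ndarray:
    """Uniform write for the regular token stream, or inverse-tau-weighted if requested."""
    w = 1.0 / tau if inverse else np.ones_like(tau, dtype=float)
    return w / w.sum()

tau = np.array([4.0, 64.0, 512.0, 4096.0])  # hypothetical half-lives
w_id = identity_write_weights(tau)
w_stream = stream_write_weights(tau)

# Share of each signal that lands in the slowest mode
print("identity -> slowest mode:", w_id[-1])
print("stream   -> slowest mode:", w_stream[-1])
```

With these weights the identity signal concentrates in the slowest mode while the token stream spreads evenly, so interference does not pile up preferentially where identity is stored.
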
+---
+## Implications for Real Model
+### What This Proves
+1. **τ-routing is a valid mechanism** for improving retention
+2. **Regularizer is necessary** to prevent collapse during training
+3. **Selective routing** is critical (identity vs content)
+4. **τ > L is needed** (on the order of 2*L) for full-context preservation
+### Recommended Changes for Sefer
+1. **Add hard-constraint regularizer** during training
+   - Force at least 25% of oscillators to have τ ≥ 0.75*L
+2. **Implement selective routing**
+   - Identify identity-bearing signals (via content type or position)
+   - Route identity to slow oscillators
+   - Route content uniformly
+3. **Consider τ_max > L**
+   - Full-context preservation needs τ ≈ 2*L
+   - This gives 70%+ retention at K=L (2^(-1/2) ≈ 0.71, reading τ as a half-life)
+4. **Add auxiliary loss**
+   - Encourage identity information in slow state
+   - Discourage task-irrelevant content in slow state
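
For recommendation 3, the τ needed to hit a target retention at a given distance can be worked out by inverting the decay law. The sketch below assumes retention follows 2^(-K/τ) with τ as a half-life, which is inferred from the figures in this report rather than stated in it; `required_half_life` is a hypothetical helper.

```python
import math

def required_half_life(k: float, target_retention: float) -> float:
    """Smallest half-life tau such that 2 ** (-k / tau) >= target_retention."""
    if not 0.0 < target_retention < 1.0:
        raise ValueError("target_retention must be strictly between 0 and 1")
    return k / math.log2(1.0 / target_retention)

L = 4096
for r in (0.5, 0.7, 0.9):
    tau = required_half_life(L, r)
    print(f"retention {r:.0%} at K=L needs tau = {tau:.0f} ({tau / L:.2f} * L)")
```

For a 70% target this comes out slightly below 2*L, in line with the τ ≈ 2*L recommendation; higher targets require τ to grow much faster than L.
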
+---
+## Technical Details
+### Training Simulation
+- 500 steps with collapse pressure (rate=0.01, target=5)
+- Hard constraint: 25% of oscillators held in [0.75*L, 1.25*L] (= [3072, 5120] for L=4096)
+- Checkpoint statistics at [0, 50, 100, 200, 300, 400, 500]
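
A toy version of this setup, to make the interaction between collapse pressure and the hard constraint concrete. The update rule (linear relaxation of τ toward the target), the projection (clamping the slowest quarter into the band), the 64-oscillator bank, and the uniform initialization are all assumptions chosen for illustration; only the rate, target, step count, and band come from the report.

```python
import numpy as np

L = 4096
RATE, TARGET = 0.01, 5.0          # collapse pressure parameters from the report
LO, HI = 0.75 * L, 1.25 * L       # hard-constraint band
FRACTION = 0.25

def collapse_step(tau: np.ndarray) -> np.ndarray:
    """Assumed form of collapse pressure: relax every tau toward TARGET."""
    return tau + RATE * (TARGET - tau)

def apply_hard_constraint(tau: np.ndarray) -> np.ndarray:
    """Clamp the slowest FRACTION of oscillators into [LO, HI]."""
    tau = tau.copy()
    n_anchor = int(np.ceil(FRACTION * tau.size))
    slowest = np.argsort(tau)[-n_anchor:]
    tau[slowest] = np.clip(tau[slowest], LO, HI)
    return tau

def simulate(tau0: np.ndarray, steps: int, constrained: bool) -> np.ndarray:
    """Run `steps` rounds of collapse pressure, optionally re-projecting each step."""
    tau = tau0.copy()
    for _ in range(steps):
        tau = collapse_step(tau)
        if constrained:
            tau = apply_hard_constraint(tau)
    return tau

rng = np.random.default_rng(0)
tau0 = np.concatenate([rng.uniform(1, 512, 48), rng.uniform(LO, HI, 16)])

free = simulate(tau0, 500, constrained=False)
held = simulate(tau0, 500, constrained=True)
print("unconstrained median tau:", np.median(free))
print("constrained fraction in band:", np.mean((held >= LO) & (held <= HI)))
```

Without the projection every τ drifts toward the target regardless of where it started; with it, the anchored quarter stays in the band while the remaining oscillators are free to collapse.
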
+### QA Evaluation
+- K values: [0, 256, 512, 1024, 2048, 4096]
+- Low-rank interference (rank 4, AR coefficient 0.9)
+- τ-weighted readout for slow state
+- Scoring: an item counts as correct when retention ≥ 50%
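
One way to read "low-rank interference (rank 4, AR coefficient 0.9)" is a 4-dimensional latent AR(1) process projected into the oscillator space. The sketch below follows that reading; the Gaussian noise, the random projection basis, and both function names are assumptions, and the readout itself is not modeled here.

```python
import numpy as np

def low_rank_ar1_interference(n_modes, n_steps, rank=4, ar=0.9, seed=0):
    """Interference u_t = B @ z_t with z_t = ar * z_{t-1} + noise, so rank(U) <= rank."""
    rng = np.random.default_rng(seed)
    basis = rng.standard_normal((n_modes, rank))   # assumed random projection B
    z = np.zeros(rank)
    out = np.empty((n_steps, n_modes))
    for t in range(n_steps):
        z = ar * z + rng.standard_normal(rank)     # AR(1) update of the latent state
        out[t] = basis @ z
    return out

def is_correct(retention: float, threshold: float = 0.5) -> bool:
    """Scoring rule from the report: correct when retention is at least 50%."""
    return retention >= threshold

U = low_rank_ar1_interference(n_modes=32, n_steps=256)
print("interference rank:", np.linalg.matrix_rank(U))
```

Because every interference vector lies in the span of the same four basis columns, the noise is temporally correlated and confined to a small subspace rather than spread as white noise over all modes.
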
+### Distribution Parameters
+- Anchored-tail: 25% with τ ∈ [3072, 5120]
+- Short-tail: 75% with τ ∈ [1, 512]
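
A small sampler for this two-part distribution. The report gives only the ranges and proportions, so the shapes within each range (uniform for the anchored tail, log-uniform for the short tail) are assumptions, as is the helper name `sample_tau`.

```python
import numpy as np

def sample_tau(n: int, seed: int = 0) -> np.ndarray:
    """Draw n half-lives: 25% anchored-tail in [3072, 5120], 75% short-tail in [1, 512]."""
    rng = np.random.default_rng(seed)
    n_anchor = round(0.25 * n)
    anchored = rng.uniform(3072, 5120, n_anchor)
    # Log-uniform is an assumption; the report gives only the range.
    short = np.exp(rng.uniform(np.log(1.0), np.log(512.0), n - n_anchor))
    return np.sort(np.concatenate([short, anchored]))

tau = sample_tau(64)
print("fraction with tau >= 3072:", np.mean(tau >= 3072))
```
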
+---
+## Conclusion
+> **Does τ-routing mitigate half-life collapse and improve long-context binding?**
+**PARTIAL YES:**
+- Prevents collapse when combined with regularizer ✓
+- Improves medium-context binding (K ≤ 1024) ✓
+- Does NOT solve full-context binding (K = L): accuracy is still 0% at K=4096 ✗
+**The path forward:**
+1. Combined routing + regularization (implemented)
+2. Selective routing (identity vs content)
+3. Increased τ_max (τ ≈ 2*L for full coverage)
+4. Auxiliary loss for slow-mode identity binding
+---
+*Final verdict generated 2026-01-22*