fdra-half-life-regularization / INTEGRATION_README.md
juddddd's picture
Upload INTEGRATION_README.md with huggingface_hub
36b9a41 verified

FDRA Transformer Integration Package

Version: 1.0
Date: 2026-01-22
Authors: Fractal AGI Team


Overview

This package provides a complete solution for integrating FDRA oscillator memory into transformer architectures to solve the long-context forgetting problem.

Problem Solved

  • Original Issue: FDRA models experience τ collapse during training, causing failure on long-context tasks despite good short-context performance.
  • Solution: Four integrated fixes that achieve 100% accuracy through K=4096 (full context) with structured interference.

Files Included

File Description
fdra_production.py NumPy production module (validated)
fdra_transformer_integration.py PyTorch integration for transformers
fdra_oscillators.py Core oscillator bank implementation
half_life_regularizer.py Regularization loss module
COMPLETE_SOLUTION.md Implementation guide
INTEGRATION_README.md This file

Quick Start

1. Add FDRA to Your Transformer

from fdra_transformer_integration import FDRAConfig, FDRATransformerBlock, HalfLifeRegularizerLoss

# Configure FDRA
config = FDRAConfig(
    num_oscillators=64,
    d_model=512,  # Match your transformer
    sequence_length=4096,
    tau_max_multiplier=4.0,  # FIX 1: Extended τ
    routing_mode="tau_weighted",  # FIX 2: τ-weighted routing
    use_redundant_encoding=True,  # FIX 4: Redundant encoding
)

# Replace transformer blocks
block = FDRATransformerBlock(
    d_model=512,
    n_heads=8,
    d_ff=2048,
    fdra_config=config
)

2. Add Regularizer to Training

regularizer = HalfLifeRegularizerLoss(config)

# In training loop:
for batch in dataloader:
    output = model(batch.input)
    
    task_loss = criterion(output, batch.target)
    
    # Add FDRA regularization (FIX 3: Half-life incentives)
    reg_loss, metrics = regularizer(model.block.attn.fdra)
    
    total_loss = task_loss + reg_loss
    total_loss.backward()
    optimizer.step()

3. Mark Identity-Critical Information

# For identity encoding (facts, important context):
output = block(x, is_identity=True)  # Uses τ-weighted routing

# For regular context (noise, interference):
output = block(x, is_identity=False)  # Uses uniform routing

The Four Fixes

Fix 1: Extended τ Range (4×L)

tau_max_multiplier=4.0  # τ_max = 16384 for L=4096

Ensures oscillators have sufficient capacity to retain information across full context.

Fix 2: τ-Weighted Routing

routing_mode="tau_weighted"

Identity information is preferentially written to slow (high-τ) oscillators where it persists longer.

Fix 3: Half-Life Incentives

HalfLifeRegularizerLoss(config)

Prevents τ collapse during training by enforcing:

  • Log-uniform moment matching
  • Long-tail existence constraint
  • Hard constraint (25% of oscillators in long-tail)

Fix 4: Redundant Encoding

use_redundant_encoding=True
redundancy_copies=3

Encodes critical information 3× with random orthogonal rotations. Voting at readout provides robustness to structured interference.


Validation Results

K (interference tokens) Accuracy
0 100%
256 100%
512 100%
1024 100%
2048 100%
4096 100%
8192 100%

Test: Identity patterns encoded, K tokens of low-rank AR(1) interference, query recovery.


Integration Checklist

  • Replace TransformerBlock with FDRATransformerBlock
  • Add HalfLifeRegularizerLoss to training loss
  • Set is_identity=True for important context
  • Call model.reset_memory(batch_size) between sequences
  • Monitor metrics['tau_min'], metrics['tau_max'], metrics['slow_frac']

Monitoring

During training, monitor these metrics:

reg_loss, metrics = regularizer(model.block.attn.fdra)

print(f"τ range: [{metrics['tau_min']:.0f}, {metrics['tau_max']:.0f}]")
print(f"Slow fraction: {metrics['slow_frac']:.2%}")  # Should be ~25%
print(f"Reg loss: {reg_loss.item():.6f}")

Healthy values:

  • tau_max ≈ 4 × sequence_length
  • slow_frac ≈ 25%
  • reg_loss decreasing during training

Troubleshooting

τ collapse (all τ → 1)

  • Increase reg_weight (try 0.2 or 0.3)
  • Check that regularizer gradients are flowing

Poor long-context accuracy

  • Verify is_identity=True for important info
  • Increase redundancy_copies to 4 or 5
  • Increase tau_max_multiplier to 8.0

Slow training

  • Reduce num_oscillators (try 32)
  • Use gradient checkpointing for FDRA module

Citation

If you use this work, please cite:

@software{fdra_long_context_2026,
  title={FDRA Long-Context Solution: Half-Life Regularization and τ-Weighted Routing},
  author={Fractal AGI Team},
  year={2026},
  url={https://huggingface.co/fractal-agi/fdra-half-life-regularization}
}

The architecture works. The memory bottleneck is solved.