
SAL Architecture

Technical Deep-Dive


Overview

SAL consists of four interconnected components:

┌─────────────────────────────────────────────────────────────┐
│                      Training Loop                          │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   Input → Model → Loss → Gradients                          │
│                              ↓                              │
│                 ┌────────────────────────┐                  │
│                 │  Communication Layer   │                  │
│                 │  ┌──────────────────┐  │                  │
│                 │  │ Stability        │  │                  │
│                 │  │ Analyzer         │  │                  │
│                 │  └────────┬─────────┘  │                  │
│                 │           ↓            │                  │
│                 │  ┌──────────────────┐  │                  │
│                 │  │ Emergence        │  │                  │
│                 │  │ Field            │  │                  │
│                 │  └────────┬─────────┘  │                  │
│                 │           ↓            │                  │
│                 │  ┌──────────────────┐  │                  │
│                 │  │ Protection       │  │                  │
│                 │  │ Masks            │  │                  │
│                 │  └──────────────────┘  │                  │
│                 └────────────┬───────────┘                  │
│                              ↓                              │
│                    Protected Gradients                      │
│                              ↓                              │
│                       Optimizer.step()                      │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Component 1: Communication Layer

The Communication Layer is the core of SAL. It sits between gradient computation and optimizer application.

Class: CommunicationLayer

from sal import CommunicationLayer

comm = CommunicationLayer(
    model=model,
    threshold=0.5,           # Base stability threshold
    threshold_adaptation=0.1, # How much threshold adapts
    soft_protection=True,     # Soft vs hard protection
    history_length=100,       # Steps to track
)

Methods

analyze() -> Dict[str, float]

Analyzes all parameters and computes stability scores.

stability_scores = comm.analyze()
# {'layer1.weight': 0.73, 'layer1.bias': 0.45, ...}

Stability Score Formula:

s(p) = 1 / (1 + Δw × g_norm)

Where:

  • Δw = weight change since last step
  • g_norm = gradient magnitude

High stability = low change × low gradient = parameter has settled.
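
As an illustration of the formula above, here is a minimal standalone sketch in plain Python. It does not call the `sal` package. It assumes that both Δw and g_norm are L2 norms taken over a flattened parameter tensor, which the formula itself does not specify, and the function names are hypothetical.

```python
import math


def l2_norm(values):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(v * v for v in values))


def stability_score(prev_weights, curr_weights, grads):
    """s(p) = 1 / (1 + delta_w * g_norm) for one flattened parameter tensor."""
    # Assumption: delta_w is the L2 norm of the weight change since the last step.
    delta_w = l2_norm([c - p for p, c in zip(prev_weights, curr_weights)])
    # Assumption: g_norm is the L2 norm of the current gradient.
    g_norm = l2_norm(grads)
    return 1.0 / (1.0 + delta_w * g_norm)


# A parameter that did not move scores 1.0 regardless of its gradient.
settled = stability_score([0.5, -0.2], [0.5, -0.2], [0.01, 0.02])
# A parameter that moved by 0.5 under a gradient of norm 5 scores about 0.29.
moving = stability_score([0.5, -0.2], [0.8, 0.2], [3.0, 4.0])
print(settled, round(moving, 2))
```

The score lies in (0, 1] and falls as either factor grows, which matches the reading that a settled parameter has both small updates and small gradients.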

protect() -> Dict[str, float]

Applies protection to gradients based on stability analysis.

protection_rates = comm.protect()
# {'layer1.weight': 0.42, 'layer1.bias': 0.0, ...}

Protection Formula (Soft):

protected_gradient = gradient × (1 - stability_score)

Stable parameters get reduced gradients. Volatile parameters get full gradients.
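
The soft protection rule can be sketched in a few lines of plain Python. This is an illustration of the formula only, not the library's implementation: `soft_protect` is a hypothetical name, gradients are represented as flat lists, and the range check is an added assumption.

```python
def soft_protect(gradient, stability_score):
    """Scale a flattened gradient by (1 - stability_score)."""
    # Assumption: stability scores are confined to [0, 1].
    if not 0.0 <= stability_score <= 1.0:
        raise ValueError("stability_score must lie in [0, 1]")
    scale = 1.0 - stability_score
    return [g * scale for g in gradient]


grad = [0.4, -0.2]
stable = soft_protect(grad, 0.75)   # settled parameter: a quarter of the gradient remains
volatile = soft_protect(grad, 0.0)  # volatile parameter: gradient passes through unchanged
print(stable, volatile)
```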

Adaptive Threshold

The threshold adapts to training dynamics:

τ = τ₀ + α × (σ_grad / μ_grad)

When gradients are noisy (high variance), protection increases. When gradients are stable, protection decreases.
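
A rough sketch of this threshold update follows, under several assumptions that the formula does not state: τ₀ corresponds to `threshold`, α to `threshold_adaptation`, and σ_grad and μ_grad are the population standard deviation and mean of a recent window of gradient norms. The function name is hypothetical.

```python
import statistics


def adaptive_threshold(grad_norms, base_threshold=0.5, adaptation=0.1):
    """tau = tau_0 + alpha * (sigma_grad / mu_grad) over a window of gradient norms."""
    # With fewer than two samples there is no variance to measure.
    if len(grad_norms) < 2:
        return base_threshold
    mu = statistics.fmean(grad_norms)
    if mu == 0.0:
        return base_threshold  # avoid dividing by zero when all gradients vanish
    sigma = statistics.pstdev(grad_norms)
    return base_threshold + adaptation * (sigma / mu)


steady = adaptive_threshold([1.0, 1.0, 1.0, 1.0])  # no variance: threshold stays at 0.5
noisy = adaptive_threshold([0.5, 1.5, 0.5, 1.5])   # sigma/mu = 0.5: threshold rises to 0.55
print(steady, noisy)
```

Because the ratio σ_grad / μ_grad is scale-free, the adjustment depends on how noisy the gradients are relative to their size, not on their absolute magnitude.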


Component 2: Stability Analyzer

Classifies parameters into the Stability Spectrum.

Class: StabilityAnalyzer

from sal import StabilityAnalyzer

analyzer = StabilityAnalyzer(
    model=model,
    protected_threshold=0.7,  # Score above this → protected
    volatile_threshold=0.3,   # Score below this → volatile
    history_length=50,        # Steps to track
)

Methods

analyze() -> Dict[str, float]

Computes stability scores using multiple signals:

  1. Weight variance: low variance over time = stable
  2. Gradient consistency: consistent direction = stable
  3. Change magnitude: small changes = stable

scores = analyzer.analyze()

classify() -> StabilitySpectrum

Returns the distribution across stability states:

spectrum = analyzer.classify()
# StabilitySpectrum(protected=12.3, neutral=70.5, volatile=17.2)

Stability States

| State | Score Range | Behavior |
|-------|-------------|----------|
| Protected | > 0.7 | Minimal updates |
| Neutral | 0.3 - 0.7 | Careful updates |
| Volatile | < 0.3 | Full updates |
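
The score ranges above translate into a simple classifier. The sketch below is illustrative and does not use the `sal` package. It assumes the spectrum values are percentages counted per named parameter tensor, unweighted by tensor size, and that scores exactly on a boundary count as neutral. Both function names are hypothetical.

```python
from collections import Counter

STATES = ("protected", "neutral", "volatile")


def classify_score(score, protected_threshold=0.7, volatile_threshold=0.3):
    """Map one stability score to a state name."""
    if score > protected_threshold:
        return "protected"
    if score < volatile_threshold:
        return "volatile"
    return "neutral"  # boundary values 0.3 and 0.7 land here


def spectrum(scores):
    """Percentage of entries in each state, given {name: score}."""
    if not scores:
        return {state: 0.0 for state in STATES}
    counts = Counter(classify_score(s) for s in scores.values())
    return {state: 100.0 * counts[state] / len(scores) for state in STATES}


scores = {
    "layer1.weight": 0.73,
    "layer1.bias": 0.45,
    "layer2.weight": 0.12,
    "layer2.bias": 0.30,
}
print(spectrum(scores))
```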

Component 3: Emergence Field

Measures coherence, novelty, and resonance in semantic space.

Class: EmergenceField

from sal import EmergenceField

field = EmergenceField(
    dimensions=768,           # Semantic space dimensions
    history_length=100,       # Patterns to remember
    coherence_threshold=0.6,  # Minimum for emergence
    novelty_threshold=0.4,    # Minimum for emergence
)

Methods

observe(pattern) -> EmergenceState

Observes a pattern and measures its emergence characteristics:

state = field.observe(embedding)
# EmergenceState(coherence=0.72, novelty=0.45, resonance=0.63, intensity=0.41)

detect_emergence(coherence, novelty) -> bool

Simple check for emergence:

is_emergent = field.detect_emergence(0.72, 0.45)
# True
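
One plausible reading of this check is that both metrics must clear their thresholds. The sketch below is a standalone function, not the `EmergenceField` method. It assumes inclusive comparisons against the `coherence_threshold` and `novelty_threshold` values shown in the constructor, and its name is hypothetical.

```python
def detect_emergence_sketch(coherence, novelty,
                            coherence_threshold=0.6, novelty_threshold=0.4):
    """True when a pattern is both coherent enough and novel enough."""
    # Assumption: both conditions are required, with inclusive bounds.
    return coherence >= coherence_threshold and novelty >= novelty_threshold


print(detect_emergence_sketch(0.72, 0.45))  # both thresholds met
print(detect_emergence_sketch(0.72, 0.20))  # coherent but not novel
```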

Emergence Metrics

Coherence: How internally consistent is the pattern?

  • Measures variance between chunks
  • Measures local smoothness
  • High coherence = structured, meaningful

Novelty: How different from known patterns?

  • Compares to historical patterns via cosine similarity
  • High novelty = genuinely new

Resonance: How well does it fit the field?

  • Distance from field centroid
  • High resonance = harmonious with existing patterns

Emergence = Coherent Novelty that Resonates
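
To make the novelty metric concrete, here is a small sketch that compares a pattern with stored history using cosine similarity. The mapping from similarity to novelty (one minus the highest similarity, clamped to [0, 1]) is an assumption chosen for illustration. The text above states only that cosine similarity is used.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def novelty(pattern, history):
    """Assumed mapping: 1 - (similarity to the closest past pattern), clamped to [0, 1]."""
    if not history:
        return 1.0  # nothing seen yet, so everything is new
    closest = max(cosine_similarity(pattern, past) for past in history)
    return min(1.0, max(0.0, 1.0 - closest))


history = [[1.0, 0.0], [0.0, 1.0]]
seen = novelty([2.0, 0.0], history)      # same direction as a stored pattern
between = novelty([1.0, 1.0], history)   # 45 degrees from both stored patterns
fresh = novelty([-1.0, -1.0], history)   # points away from everything stored
print(seen, round(between, 3), fresh)
```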


Component 4: Pulse-Split-Cascade (PSC)

Semantic Game of Life for pattern evolution.

Class: PulseCascade

from sal import PulseCascade

cascade = PulseCascade(
    max_pulses=32,          # Maximum concurrent pulses
    max_generations=10,     # Maximum depth
    split_threshold=0.6,    # Coherence needed to split
    merge_threshold=0.8,    # Similarity needed to merge
    expire_threshold=0.3,   # Minimum coherence to survive
)

Flow

1. INITIATE
   Prompt embedding creates root pulse
   
2. EVOLVE
   Each pulse evolves via evolve_fn
   Coherence, novelty, resonance are measured
   
3. SPLIT
   High-coherence pulses split into children
   Children have slight variations
   
4. MERGE
   Similar pulses merge (high cosine similarity)
   Merging combines embeddings and preserves best traits
   
5. EXPIRE
   Low-coherence pulses expire
   Their patterns are lost
   
6. EMERGE
   Best viable pulse is the emergent result
   No scoring, just natural selection
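
The split, merge, and expire conditions in the flow above can be read as threshold tests on each pulse. The following sketch reports which transitions a single pulse qualifies for. It is not the `PulseCascade` implementation: the function name is hypothetical, the comparisons are assumed to be inclusive, and using `max_generations` to block further splits is also an assumption.

```python
def pulse_transitions(coherence, max_similarity, generation, *,
                      split_threshold=0.6, merge_threshold=0.8,
                      expire_threshold=0.3, max_generations=10):
    """Return which lifecycle transitions one pulse qualifies for.

    max_similarity is the highest cosine similarity to any other active pulse.
    """
    return {
        # SPLIT: coherent enough, and not yet at the depth limit (assumed rule).
        "split": coherence >= split_threshold and generation < max_generations,
        # MERGE: close enough to another active pulse.
        "merge": max_similarity >= merge_threshold,
        # EXPIRE: coherence fell below the survival floor.
        "expire": coherence < expire_threshold,
    }


print(pulse_transitions(coherence=0.72, max_similarity=0.40, generation=2))
print(pulse_transitions(coherence=0.20, max_similarity=0.10, generation=5))
```

The sketch returns flags without choosing among them, because the flow above gives the order of the phases but not how a pulse that qualifies for more than one is resolved.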

Methods

initiate(embedding) -> Pulse

Start cascade from prompt:

root = cascade.initiate(prompt_embedding)

step(evolve_fn, measure_fn) -> List[Pulse]

Advance cascade by one step:

active = cascade.step(
    evolve_fn=lambda x: model(x),
    measure_fn=lambda x: (coherence(x), novelty(x), resonance(x)),
)

emerge() -> Pulse

Get the emergent result:

result = cascade.emerge()

Integration

Minimal Integration (2 lines)

# Standard training loop
output = model(input)
loss = criterion(output, target)
loss.backward()

# SAL integration
comm.analyze()   # ← Line 1
comm.protect()   # ← Line 2

optimizer.step()
optimizer.zero_grad()

Full Integration

import torch
from sal import CommunicationLayer, StabilityAnalyzer, EmergenceField

# Initialize
comm = CommunicationLayer(model)
stability = StabilityAnalyzer(model)
field = EmergenceField()

# Training loop (assumes the dataloader yields (inputs, target) pairs)
for epoch in range(epochs):
    for inputs, target in dataloader:
        # Forward
        output = model(inputs)
        loss = criterion(output, target)
        
        # Backward
        loss.backward()
        
        # SAL: Analyze
        comm.analyze()
        stability.update()
        
        # SAL: Observe emergence
        # (get_embedding() stands in for whatever embedding hook your model exposes)
        with torch.no_grad():
            state = field.observe(model.get_embedding())
        
        # SAL: Protect
        comm.protect()
        
        # Update
        optimizer.step()
        optimizer.zero_grad()
        
    # Log spectrum
    spectrum = stability.classify()
    print(f"Epoch {epoch}: {spectrum}")

Configuration

Recommended Defaults

| Parameter | Default | Description |
|-----------|---------|-------------|
| threshold | 0.5 | Base stability threshold |
| threshold_adaptation | 0.1 | Adaptation rate |
| soft_protection | True | Soft vs hard protection |
| protected_threshold | 0.7 | Score for protected state |
| volatile_threshold | 0.3 | Score for volatile state |
| history_length | 100 | Steps to track |

Tuning Guidelines

  • More Protection: Increase threshold, decrease threshold_adaptation
  • Less Protection: Decrease threshold, increase threshold_adaptation
  • Faster Adaptation: Increase history_length
  • More Stability: Increase protected_threshold


Performance

SAL adds approximately 10% computational overhead:

  • Stability analysis: O(n) where n = number of parameters
  • Protection application: O(n)
  • Memory: O(n × history_length) for tracking

This overhead is small relative to the benefits of reduced catastrophic forgetting and improved continual learning.
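
For a feel of the memory term, the back-of-the-envelope sketch below multiplies it out. It assumes one float32 value (4 bytes) is stored per parameter per tracked step. The actual storage layout is not described here and may be more compact, and the function name is hypothetical.

```python
def tracking_memory_bytes(num_parameters, history_length=100, bytes_per_value=4):
    """Upper-bound estimate for O(n * history_length) tracking storage."""
    # Assumption: one value of bytes_per_value bytes per parameter per step.
    return num_parameters * history_length * bytes_per_value


estimate = tracking_memory_bytes(1_000_000)  # 1M parameters, 100 steps, float32
print(f"{estimate / 2**20:.1f} MiB")
```

Under these assumptions, halving `history_length` halves the tracking memory, so it is the main knob to reach for when memory is tight.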


For the philosophy behind these technical choices, see Principles.