# SAL Architecture
## Technical Deep-Dive
---
## Overview
SAL consists of four interconnected components:
```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                 Training Loop                  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                β”‚
β”‚   Input β†’ Model β†’ Loss β†’ Gradients             β”‚
β”‚                             ↓                  β”‚
β”‚                 β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”‚
β”‚                 β”‚  Communication Layer   β”‚     β”‚
β”‚                 β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚     β”‚
β”‚                 β”‚  β”‚    Stability     β”‚  β”‚     β”‚
β”‚                 β”‚  β”‚     Analyzer     β”‚  β”‚     β”‚
β”‚                 β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚     β”‚
β”‚                 β”‚           ↓            β”‚     β”‚
β”‚                 β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚     β”‚
β”‚                 β”‚  β”‚    Emergence     β”‚  β”‚     β”‚
β”‚                 β”‚  β”‚      Field       β”‚  β”‚     β”‚
β”‚                 β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚     β”‚
β”‚                 β”‚           ↓            β”‚     β”‚
β”‚                 β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚     β”‚
β”‚                 β”‚  β”‚    Protection    β”‚  β”‚     β”‚
β”‚                 β”‚  β”‚      Masks       β”‚  β”‚     β”‚
β”‚                 β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚     β”‚
β”‚                 β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β”‚
β”‚                             ↓                  β”‚
β”‚                    Protected Gradients         β”‚
β”‚                             ↓                  β”‚
β”‚                     Optimizer.step()           β”‚
β”‚                                                β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
---
## Component 1: Communication Layer
The Communication Layer is the core of SAL. It sits between gradient computation and optimizer application.
### Class: `CommunicationLayer`
```python
from sal import CommunicationLayer

comm = CommunicationLayer(
    model=model,
    threshold=0.5,             # Base stability threshold
    threshold_adaptation=0.1,  # How much the threshold adapts
    soft_protection=True,      # Soft vs. hard protection
    history_length=100,        # Steps to track
)
```
### Methods
#### `analyze() -> Dict[str, float]`
Analyzes all parameters and computes stability scores.
```python
stability_scores = comm.analyze()
# {'layer1.weight': 0.73, 'layer1.bias': 0.45, ...}
```
**Stability Score Formula:**
```
s(p) = 1 / (1 + Ξ”w Γ— g_norm)
```
Where:
- `Ξ”w` = weight change since last step
- `g_norm` = gradient magnitude
A high score means both the recent weight change and the gradient are small: the parameter has settled.
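As a sketch (not the library's internals), the score can be computed per parameter tensor from a snapshot of the previous weights; `prev_param` here is a hypothetical buffer kept by the caller:

```python
import torch

def stability_score(param: torch.Tensor, prev_param: torch.Tensor) -> float:
    """s(p) = 1 / (1 + Ξ”w Γ— g_norm), computed per parameter tensor (sketch)."""
    delta_w = (param.detach() - prev_param).norm().item()  # weight change since last step
    g_norm = param.grad.norm().item() if param.grad is not None else 0.0
    return 1.0 / (1.0 + delta_w * g_norm)
```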
#### `protect() -> Dict[str, float]`
Applies protection to gradients based on stability analysis.
```python
protection_rates = comm.protect()
# {'layer1.weight': 0.42, 'layer1.bias': 0.0, ...}
```
**Protection Formula (Soft):**
```
protected_gradient = gradient Γ— (1 - stability_score)
```
Stable parameters get reduced gradients. Volatile parameters get full gradients.
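A minimal sketch of the soft rule above, assuming `scores` is the dict returned by `analyze()`:

```python
import torch

def apply_soft_protection(model: torch.nn.Module, scores: dict) -> None:
    """Scale each gradient by (1 - stability_score); stable parameters barely move (sketch)."""
    for name, param in model.named_parameters():
        if param.grad is not None and name in scores:
            param.grad.mul_(1.0 - scores[name])
```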
### Adaptive Threshold
The threshold adapts to training dynamics:
```
Ο„ = Ο„β‚€ + Ξ± Γ— (Οƒ_grad / ΞΌ_grad)
```
Here `Ο„β‚€` is the base `threshold`, `Ξ±` is `threshold_adaptation`, and `Οƒ_grad / ΞΌ_grad` is the coefficient of variation of recent gradient magnitudes. When gradients are noisy (high variance relative to their mean), the threshold rises and protection increases; when gradients are steady, protection decreases.
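A sketch of the adaptation rule over a window of recent gradient norms, where `tau0` and `alpha` stand in for `threshold` and `threshold_adaptation`:

```python
import statistics

def adaptive_threshold(grad_norms: list, tau0: float = 0.5, alpha: float = 0.1) -> float:
    """Ο„ = Ο„β‚€ + Ξ± Γ— (Οƒ_grad / ΞΌ_grad) over recent gradient magnitudes (sketch)."""
    mu = statistics.fmean(grad_norms)
    if mu == 0:
        return tau0
    return tau0 + alpha * (statistics.pstdev(grad_norms) / mu)
```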
---
## Component 2: Stability Analyzer
Classifies parameters into the Stability Spectrum.
### Class: `StabilityAnalyzer`
```python
from sal import StabilityAnalyzer

analyzer = StabilityAnalyzer(
    model=model,
    protected_threshold=0.7,  # Score above this β†’ protected
    volatile_threshold=0.3,   # Score below this β†’ volatile
    history_length=50,        # Steps to track
)
```
### Methods
#### `analyze() -> Dict[str, float]`
Computes stability scores using multiple signals:
1. **Weight variance** β€” Low variance over time = stable
2. **Gradient consistency** β€” Consistent direction = stable
3. **Change magnitude** β€” Small changes = stable
```python
scores = analyzer.analyze()
```
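How the three signals combine is internal to `StabilityAnalyzer`; the following is only an illustrative weighting, with `weight_history` and `grad_history` as hypothetical `(steps, n)` tensors for one flattened parameter:

```python
import torch
import torch.nn.functional as F

def combined_stability(weight_history: torch.Tensor, grad_history: torch.Tensor) -> float:
    """Blend the three signals into one score (illustrative weights, not the library's)."""
    # 1. Weight variance: low variance over time β†’ stable
    w_var = weight_history.var(dim=0).mean()
    # 2. Gradient consistency: consecutive gradients pointing the same way β†’ stable
    consistency = F.cosine_similarity(grad_history[:-1], grad_history[1:], dim=1).mean().clamp(min=0.0)
    # 3. Change magnitude: small last step β†’ stable
    delta = (weight_history[-1] - weight_history[-2]).norm()
    score = 0.4 / (1 + w_var) + 0.3 * consistency + 0.3 / (1 + delta)
    return float(score)
```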
#### `classify() -> StabilitySpectrum`
Returns the distribution across stability states:
```python
spectrum = analyzer.classify()
# StabilitySpectrum(protected=12.3, neutral=70.5, volatile=17.2)
```
### Stability States
| State | Score Range | Behavior |
|-------|-------------|----------|
| Protected | > 0.7 | Minimal updates |
| Neutral | 0.3 - 0.7 | Careful updates |
| Volatile | < 0.3 | Full updates |
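Binning scores into the three states is straightforward; a sketch, assuming the `StabilitySpectrum` fields are percentages that sum to 100:

```python
def classify_scores(scores: dict, protected_threshold: float = 0.7,
                    volatile_threshold: float = 0.3) -> tuple:
    """Map per-parameter scores to (protected %, neutral %, volatile %) (sketch)."""
    n = len(scores)
    protected = 100.0 * sum(s > protected_threshold for s in scores.values()) / n
    volatile = 100.0 * sum(s < volatile_threshold for s in scores.values()) / n
    return protected, 100.0 - protected - volatile, volatile
```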
---
## Component 3: Emergence Field
Measures coherence, novelty, and resonance in semantic space.
### Class: `EmergenceField`
```python
from sal import EmergenceField

field = EmergenceField(
    dimensions=768,           # Semantic space dimensions
    history_length=100,       # Patterns to remember
    coherence_threshold=0.6,  # Minimum for emergence
    novelty_threshold=0.4,    # Minimum for emergence
)
```
### Methods
#### `observe(pattern) -> EmergenceState`
Observes a pattern and measures its emergence characteristics:
```python
state = field.observe(embedding)
# EmergenceState(coherence=0.72, novelty=0.45, resonance=0.63, intensity=0.41)
```
#### `detect_emergence(coherence, novelty) -> bool`
Simple check for emergence:
```python
is_emergent = field.detect_emergence(0.72, 0.45)
# True
```
### Emergence Metrics
**Coherence:** How internally consistent is the pattern?
- Measures variance between chunks
- Measures local smoothness
- High coherence = structured, meaningful
**Novelty:** How different from known patterns?
- Compares to historical patterns via cosine similarity
- High novelty = genuinely new
**Resonance:** How well does it fit the field?
- Measured as closeness to the field centroid
- High resonance = harmonious with existing patterns
**Emergence = Coherent Novelty that Resonates**
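The exact formulas live inside `EmergenceField`; the sketches below show one plausible reading of each metric, with `history` as a hypothetical `(patterns, dims)` tensor of remembered patterns:

```python
import torch
import torch.nn.functional as F

def coherence(pattern: torch.Tensor, n_chunks: int = 8) -> float:
    """Low variance between chunk means β†’ high coherence (sketch)."""
    means = torch.stack([c.mean() for c in pattern.chunk(n_chunks)])
    return float(1.0 / (1.0 + means.var()))

def novelty(pattern: torch.Tensor, history: torch.Tensor) -> float:
    """One minus the best cosine match against remembered patterns (sketch)."""
    sims = F.cosine_similarity(history, pattern.unsqueeze(0), dim=1)
    return float(1.0 - sims.max().clamp(min=0.0))

def resonance(pattern: torch.Tensor, history: torch.Tensor) -> float:
    """Cosine closeness to the field centroid (sketch)."""
    return float(F.cosine_similarity(pattern, history.mean(dim=0), dim=0))
```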
---
## Component 4: Pulse-Split-Cascade (PSC)
A semantic Game of Life for pattern evolution.
### Class: `PulseCascade`
```python
from sal import PulseCascade

cascade = PulseCascade(
    max_pulses=32,        # Maximum concurrent pulses
    max_generations=10,   # Maximum depth
    split_threshold=0.6,  # Coherence needed to split
    merge_threshold=0.8,  # Similarity needed to merge
    expire_threshold=0.3, # Minimum coherence to survive
)
```
### Flow
```
1. INITIATE
   Prompt embedding creates root pulse

2. EVOLVE
   Each pulse evolves via evolve_fn
   Coherence, novelty, resonance are measured

3. SPLIT
   High-coherence pulses split into children
   Children have slight variations

4. MERGE
   Similar pulses merge (high cosine similarity)
   Merging combines embeddings and preserves best traits

5. EXPIRE
   Low-coherence pulses expire
   Their patterns are lost

6. EMERGE
   Best viable pulse is the emergent result
   No scoring, just natural selection
```
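A compressed sketch of one pass through this flow; `spawn_child` and `merge_similar` are hypothetical stand-ins for the library's internals:

```python
def cascade_step(pulses, evolve_fn, measure_fn,
                 split_threshold=0.6, expire_threshold=0.3, merge_threshold=0.8):
    """One EVOLVE β†’ EXPIRE β†’ SPLIT β†’ MERGE pass over all active pulses (sketch)."""
    survivors = []
    for pulse in pulses:
        pulse.embedding = evolve_fn(pulse.embedding)                  # EVOLVE
        pulse.coherence, pulse.novelty, pulse.resonance = measure_fn(pulse.embedding)
        if pulse.coherence < expire_threshold:                        # EXPIRE
            continue
        survivors.append(pulse)
        if pulse.coherence > split_threshold:                         # SPLIT
            survivors.append(pulse.spawn_child())                     # hypothetical helper
    return merge_similar(survivors, merge_threshold)                  # hypothetical helper
```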
### Methods
#### `initiate(embedding) -> Pulse`
Start cascade from prompt:
```python
root = cascade.initiate(prompt_embedding)
```
#### `step(evolve_fn, measure_fn) -> List[Pulse]`
Advance cascade by one step:
```python
active = cascade.step(
    evolve_fn=lambda x: model(x),
    measure_fn=lambda x: (coherence(x), novelty(x), resonance(x)),
)
```
#### `emerge() -> Pulse`
Get the emergent result:
```python
result = cascade.emerge()
```
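Putting the three calls together in a driver loop (illustrative; `prompt_embedding`, `model`, and the measure functions are assumed to be defined, and the generation cap is hard-coded here rather than read from the cascade):

```python
root = cascade.initiate(prompt_embedding)
for _ in range(10):  # up to max_generations steps
    active = cascade.step(
        evolve_fn=lambda x: model(x),
        measure_fn=lambda x: (coherence(x), novelty(x), resonance(x)),
    )
    if not active:  # every pulse expired
        break
result = cascade.emerge()
```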
---
## Integration
### Minimal Integration (2 lines)
```python
# Standard training loop
output = model(input)
loss = criterion(output, target)
loss.backward()

# SAL integration
comm.analyze()  # ← Line 1
comm.protect()  # ← Line 2

optimizer.step()
optimizer.zero_grad()
```
### Full Integration
```python
import torch
from sal import CommunicationLayer, StabilityAnalyzer, EmergenceField

# Initialize
comm = CommunicationLayer(model)
stability = StabilityAnalyzer(model)
field = EmergenceField()

# Training loop
for epoch in range(epochs):
    for inputs, targets in dataloader:
        # Forward
        output = model(inputs)
        loss = criterion(output, targets)

        # Backward
        loss.backward()

        # SAL: Analyze
        comm.analyze()
        stability.update()

        # SAL: Observe emergence
        with torch.no_grad():
            state = field.observe(model.get_embedding())

        # SAL: Protect
        comm.protect()

        # Update
        optimizer.step()
        optimizer.zero_grad()

    # Log spectrum
    spectrum = stability.classify()
    print(f"Epoch {epoch}: {spectrum}")
```
---
## Configuration
### Recommended Defaults
| Parameter | Default | Description |
|-----------|---------|-------------|
| `threshold` | 0.5 | Base stability threshold |
| `threshold_adaptation` | 0.1 | Adaptation rate |
| `soft_protection` | True | Soft vs hard protection |
| `protected_threshold` | 0.7 | Score for protected state |
| `volatile_threshold` | 0.3 | Score for volatile state |
| `history_length` | 100 | Steps to track |
### Tuning Guidelines
**More Protection:** Increase `threshold`, decrease `threshold_adaptation`

**Less Protection:** Decrease `threshold`, increase `threshold_adaptation`

**Faster Adaptation:** Decrease `history_length` (a shorter history reacts to recent dynamics sooner)

**More Stability:** Increase `protected_threshold` (only the most settled parameters qualify as protected)
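For example, a configuration biased toward stronger protection might look like this (values illustrative, not tested defaults):

```python
comm = CommunicationLayer(
    model=model,
    threshold=0.7,              # Higher base threshold β†’ more parameters protected
    threshold_adaptation=0.05,  # Slower adaptation β†’ steadier protection
    soft_protection=True,
    history_length=100,
)
```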
---
## Performance
SAL adds approximately 10% computational overhead:
- Stability analysis: O(n) where n = number of parameters
- Protection application: O(n)
- Memory: O(n Γ— history_length) for tracking
In practice this overhead is small relative to the cost of the forward and backward passes, and is typically outweighed by reduced catastrophic forgetting and improved continual learning.
---
*For the philosophy behind these technical choices, see [Principles](principles.md).*