# SAL Architecture
## Technical Deep-Dive
---
## Overview
SAL consists of four interconnected components:
```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                 Training Loop                  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                β”‚
β”‚   Input β†’ Model β†’ Loss β†’ Gradients             β”‚
β”‚                             ↓                  β”‚
β”‚                 β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”‚
β”‚                 β”‚  Communication Layer   β”‚     β”‚
β”‚                 β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚     β”‚
β”‚                 β”‚  β”‚    Stability     β”‚  β”‚     β”‚
β”‚                 β”‚  β”‚     Analyzer     β”‚  β”‚     β”‚
β”‚                 β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚     β”‚
β”‚                 β”‚           ↓            β”‚     β”‚
β”‚                 β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚     β”‚
β”‚                 β”‚  β”‚    Emergence     β”‚  β”‚     β”‚
β”‚                 β”‚  β”‚      Field       β”‚  β”‚     β”‚
β”‚                 β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚     β”‚
β”‚                 β”‚           ↓            β”‚     β”‚
β”‚                 β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚     β”‚
β”‚                 β”‚  β”‚    Protection    β”‚  β”‚     β”‚
β”‚                 β”‚  β”‚      Masks       β”‚  β”‚     β”‚
β”‚                 β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚     β”‚
β”‚                 β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β”‚
β”‚                             ↓                  β”‚
β”‚                    Protected Gradients         β”‚
β”‚                             ↓                  β”‚
β”‚                     Optimizer.step()           β”‚
β”‚                                                β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
---
## Component 1: Communication Layer
The Communication Layer is the core of SAL. It sits between gradient computation and optimizer application.
### Class: `CommunicationLayer`
```python
from sal import CommunicationLayer

comm = CommunicationLayer(
    model=model,
    threshold=0.5,             # Base stability threshold
    threshold_adaptation=0.1,  # How much the threshold adapts
    soft_protection=True,      # Soft vs. hard protection
    history_length=100,        # Steps to track
)
```
### Methods
#### `analyze() -> Dict[str, float]`
Analyzes all parameters and computes stability scores.
```python
stability_scores = comm.analyze()
# {'layer1.weight': 0.73, 'layer1.bias': 0.45, ...}
```
**Stability Score Formula:**
```
s(p) = 1 / (1 + Ξ”w Γ— g_norm)
```
Where:
- `Ξ”w` = weight change since last step
- `g_norm` = gradient magnitude
A high score means both the recent weight change and the gradient are small: the parameter has settled.
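As a sketch (not the library's internals), the score can be computed per parameter tensor from a snapshot of the previous weights; `prev_param` here is a hypothetical buffer kept by the caller:

```python
import torch

def stability_score(param: torch.Tensor, prev_param: torch.Tensor) -> float:
    """s(p) = 1 / (1 + Ξ”w Γ— g_norm), computed per parameter tensor (sketch)."""
    delta_w = (param.detach() - prev_param).norm().item()  # weight change since last step
    g_norm = param.grad.norm().item() if param.grad is not None else 0.0
    return 1.0 / (1.0 + delta_w * g_norm)
```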
#### `protect() -> Dict[str, float]`
Applies protection to gradients based on stability analysis.
```python
protection_rates = comm.protect()
# {'layer1.weight': 0.42, 'layer1.bias': 0.0, ...}
```
**Protection Formula (Soft):**
```
protected_gradient = gradient Γ— (1 - stability_score)
```
Stable parameters get reduced gradients. Volatile parameters get full gradients.
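A minimal sketch of the soft rule above, assuming `scores` is the dict returned by `analyze()`:

```python
import torch

def apply_soft_protection(model: torch.nn.Module, scores: dict) -> None:
    """Scale each gradient by (1 - stability_score); stable parameters barely move (sketch)."""
    for name, param in model.named_parameters():
        if param.grad is not None and name in scores:
            param.grad.mul_(1.0 - scores[name])
```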
### Adaptive Threshold
The threshold adapts to training dynamics:
```
Ο„ = Ο„β‚€ + Ξ± Γ— (Οƒ_grad / ΞΌ_grad)
```
Here `Ο„β‚€` is the base `threshold`, `Ξ±` is `threshold_adaptation`, and `Οƒ_grad / ΞΌ_grad` is the coefficient of variation of recent gradient magnitudes. When gradients are noisy (high variance relative to their mean), the threshold rises and protection increases; when gradients are steady, protection decreases.
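A sketch of the adaptation rule over a window of recent gradient norms, where `tau0` and `alpha` stand in for `threshold` and `threshold_adaptation`:

```python
import statistics

def adaptive_threshold(grad_norms: list, tau0: float = 0.5, alpha: float = 0.1) -> float:
    """Ο„ = Ο„β‚€ + Ξ± Γ— (Οƒ_grad / ΞΌ_grad) over recent gradient magnitudes (sketch)."""
    mu = statistics.fmean(grad_norms)
    if mu == 0:
        return tau0
    return tau0 + alpha * (statistics.pstdev(grad_norms) / mu)
```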
---
## Component 2: Stability Analyzer
Classifies parameters into the Stability Spectrum.
### Class: `StabilityAnalyzer`
```python
from sal import StabilityAnalyzer

analyzer = StabilityAnalyzer(
    model=model,
    protected_threshold=0.7,  # Score above this β†’ protected
    volatile_threshold=0.3,   # Score below this β†’ volatile
    history_length=50,        # Steps to track
)
```
### Methods
#### `analyze() -> Dict[str, float]`
Computes stability scores using multiple signals:
1. **Weight variance** β€” Low variance over time = stable
2. **Gradient consistency** β€” Consistent direction = stable
3. **Change magnitude** β€” Small changes = stable
```python
scores = analyzer.analyze()
```
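How the three signals combine is internal to `StabilityAnalyzer`; the following is only an illustrative weighting, with `weight_history` and `grad_history` as hypothetical `(steps, n)` tensors for one flattened parameter:

```python
import torch
import torch.nn.functional as F

def combined_stability(weight_history: torch.Tensor, grad_history: torch.Tensor) -> float:
    """Blend the three signals into one score (illustrative weights, not the library's)."""
    # 1. Weight variance: low variance over time β†’ stable
    w_var = weight_history.var(dim=0).mean()
    # 2. Gradient consistency: consecutive gradients pointing the same way β†’ stable
    consistency = F.cosine_similarity(grad_history[:-1], grad_history[1:], dim=1).mean().clamp(min=0.0)
    # 3. Change magnitude: small last step β†’ stable
    delta = (weight_history[-1] - weight_history[-2]).norm()
    score = 0.4 / (1 + w_var) + 0.3 * consistency + 0.3 / (1 + delta)
    return float(score)
```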
#### `classify() -> StabilitySpectrum`
Returns the distribution across stability states:
```python
spectrum = analyzer.classify()
# StabilitySpectrum(protected=12.3, neutral=70.5, volatile=17.2)
```
### Stability States
| State | Score Range | Behavior |
|-------|-------------|----------|
| Protected | > 0.7 | Minimal updates |
| Neutral | 0.3 - 0.7 | Careful updates |
| Volatile | < 0.3 | Full updates |
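Binning scores into the three states is straightforward; a sketch, assuming the `StabilitySpectrum` fields are percentages that sum to 100:

```python
def classify_scores(scores: dict, protected_threshold: float = 0.7,
                    volatile_threshold: float = 0.3) -> tuple:
    """Map per-parameter scores to (protected %, neutral %, volatile %) (sketch)."""
    n = len(scores)
    protected = 100.0 * sum(s > protected_threshold for s in scores.values()) / n
    volatile = 100.0 * sum(s < volatile_threshold for s in scores.values()) / n
    return protected, 100.0 - protected - volatile, volatile
```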
---
## Component 3: Emergence Field
Measures coherence, novelty, and resonance in semantic space.
### Class: `EmergenceField`
```python
from sal import EmergenceField

field = EmergenceField(
    dimensions=768,           # Semantic space dimensions
    history_length=100,       # Patterns to remember
    coherence_threshold=0.6,  # Minimum for emergence
    novelty_threshold=0.4,    # Minimum for emergence
)
```
### Methods
#### `observe(pattern) -> EmergenceState`
Observes a pattern and measures its emergence characteristics:
```python
state = field.observe(embedding)
# EmergenceState(coherence=0.72, novelty=0.45, resonance=0.63, intensity=0.41)
```
#### `detect_emergence(coherence, novelty) -> bool`
Simple check for emergence:
```python
is_emergent = field.detect_emergence(0.72, 0.45)
# True
```
### Emergence Metrics
**Coherence:** How internally consistent is the pattern?
- Measures variance between chunks
- Measures local smoothness
- High coherence = structured, meaningful
**Novelty:** How different from known patterns?
- Compares to historical patterns via cosine similarity
- High novelty = genuinely new
**Resonance:** How well does it fit the field?
- Measured as closeness to the field centroid
- High resonance = harmonious with existing patterns
**Emergence = Coherent Novelty that Resonates**
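The exact formulas live inside `EmergenceField`; the sketches below show one plausible reading of each metric, with `history` as a hypothetical `(patterns, dims)` tensor of remembered patterns:

```python
import torch
import torch.nn.functional as F

def coherence(pattern: torch.Tensor, n_chunks: int = 8) -> float:
    """Low variance between chunk means β†’ high coherence (sketch)."""
    means = torch.stack([c.mean() for c in pattern.chunk(n_chunks)])
    return float(1.0 / (1.0 + means.var()))

def novelty(pattern: torch.Tensor, history: torch.Tensor) -> float:
    """One minus the best cosine match against remembered patterns (sketch)."""
    sims = F.cosine_similarity(history, pattern.unsqueeze(0), dim=1)
    return float(1.0 - sims.max().clamp(min=0.0))

def resonance(pattern: torch.Tensor, history: torch.Tensor) -> float:
    """Cosine closeness to the field centroid (sketch)."""
    return float(F.cosine_similarity(pattern, history.mean(dim=0), dim=0))
```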
---
## Component 4: Pulse-Split-Cascade (PSC)
A semantic Game of Life for pattern evolution.
### Class: `PulseCascade`
```python
from sal import PulseCascade

cascade = PulseCascade(
    max_pulses=32,        # Maximum concurrent pulses
    max_generations=10,   # Maximum depth
    split_threshold=0.6,  # Coherence needed to split
    merge_threshold=0.8,  # Similarity needed to merge
    expire_threshold=0.3, # Minimum coherence to survive
)
```
### Flow
```
1. INITIATE
   Prompt embedding creates root pulse

2. EVOLVE
   Each pulse evolves via evolve_fn
   Coherence, novelty, resonance are measured

3. SPLIT
   High-coherence pulses split into children
   Children have slight variations

4. MERGE
   Similar pulses merge (high cosine similarity)
   Merging combines embeddings and preserves best traits

5. EXPIRE
   Low-coherence pulses expire
   Their patterns are lost

6. EMERGE
   Best viable pulse is the emergent result
   No scoring, just natural selection
```
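A compressed sketch of one pass through this flow; `spawn_child` and `merge_similar` are hypothetical stand-ins for the library's internals:

```python
def cascade_step(pulses, evolve_fn, measure_fn,
                 split_threshold=0.6, expire_threshold=0.3, merge_threshold=0.8):
    """One EVOLVE β†’ EXPIRE β†’ SPLIT β†’ MERGE pass over all active pulses (sketch)."""
    survivors = []
    for pulse in pulses:
        pulse.embedding = evolve_fn(pulse.embedding)                  # EVOLVE
        pulse.coherence, pulse.novelty, pulse.resonance = measure_fn(pulse.embedding)
        if pulse.coherence < expire_threshold:                        # EXPIRE
            continue
        survivors.append(pulse)
        if pulse.coherence > split_threshold:                         # SPLIT
            survivors.append(pulse.spawn_child())                     # hypothetical helper
    return merge_similar(survivors, merge_threshold)                  # hypothetical helper
```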
### Methods
#### `initiate(embedding) -> Pulse`
Start cascade from prompt:
```python
root = cascade.initiate(prompt_embedding)
```
#### `step(evolve_fn, measure_fn) -> List[Pulse]`
Advance cascade by one step:
```python
active = cascade.step(
    evolve_fn=lambda x: model(x),
    measure_fn=lambda x: (coherence(x), novelty(x), resonance(x)),
)
```
#### `emerge() -> Pulse`
Get the emergent result:
```python
result = cascade.emerge()
```
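Putting the three calls together in a driver loop (illustrative; `prompt_embedding`, `model`, and the measure functions are assumed to be defined, and the generation cap is hard-coded here rather than read from the cascade):

```python
root = cascade.initiate(prompt_embedding)
for _ in range(10):  # up to max_generations steps
    active = cascade.step(
        evolve_fn=lambda x: model(x),
        measure_fn=lambda x: (coherence(x), novelty(x), resonance(x)),
    )
    if not active:  # every pulse expired
        break
result = cascade.emerge()
```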
---
## Integration
### Minimal Integration (2 lines)
```python
# Standard training loop
output = model(input)
loss = criterion(output, target)
loss.backward()

# SAL integration
comm.analyze()  # ← Line 1
comm.protect()  # ← Line 2

optimizer.step()
optimizer.zero_grad()
```
### Full Integration
```python
import torch
from sal import CommunicationLayer, StabilityAnalyzer, EmergenceField

# Initialize
comm = CommunicationLayer(model)
stability = StabilityAnalyzer(model)
field = EmergenceField()

# Training loop
for epoch in range(epochs):
    for inputs, targets in dataloader:
        # Forward
        output = model(inputs)
        loss = criterion(output, targets)

        # Backward
        loss.backward()

        # SAL: Analyze
        comm.analyze()
        stability.update()

        # SAL: Observe emergence
        with torch.no_grad():
            state = field.observe(model.get_embedding())

        # SAL: Protect
        comm.protect()

        # Update
        optimizer.step()
        optimizer.zero_grad()

    # Log spectrum
    spectrum = stability.classify()
    print(f"Epoch {epoch}: {spectrum}")
```
---
## Configuration
### Recommended Defaults
| Parameter | Default | Description |
|-----------|---------|-------------|
| `threshold` | 0.5 | Base stability threshold |
| `threshold_adaptation` | 0.1 | Adaptation rate |
| `soft_protection` | True | Soft vs hard protection |
| `protected_threshold` | 0.7 | Score for protected state |
| `volatile_threshold` | 0.3 | Score for volatile state |
| `history_length` | 100 | Steps to track |
### Tuning Guidelines
**More Protection:** Increase `threshold`, decrease `threshold_adaptation`

**Less Protection:** Decrease `threshold`, increase `threshold_adaptation`

**Faster Adaptation:** Decrease `history_length` (a shorter history reacts to recent dynamics sooner)

**More Stability:** Increase `protected_threshold` (only the most settled parameters qualify as protected)
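For example, a configuration biased toward stronger protection might look like this (values illustrative, not tested defaults):

```python
comm = CommunicationLayer(
    model=model,
    threshold=0.7,              # Higher base threshold β†’ more parameters protected
    threshold_adaptation=0.05,  # Slower adaptation β†’ steadier protection
    soft_protection=True,
    history_length=100,
)
```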
---
## Performance
SAL adds approximately 10% computational overhead:
- Stability analysis: O(n) where n = number of parameters
- Protection application: O(n)
- Memory: O(n Γ— history_length) for tracking
In practice this overhead is small relative to the cost of the forward and backward passes, and is typically outweighed by reduced catastrophic forgetting and improved continual learning.
---
*For the philosophy behind these technical choices, see [Principles](principles.md).*