# SAL Architecture

## Technical Deep-Dive

---

## Overview

SAL consists of four interconnected components:

```
┌───────────────────────────────────────────────┐
│                 Training Loop                 │
├───────────────────────────────────────────────┤
│                                               │
│  Input → Model → Loss → Gradients             │
│                             │                 │
│               ┌─────────────▼────────────┐    │
│               │   Communication Layer    │    │
│               │  ┌────────────────────┐  │    │
│               │  │ Stability Analyzer │  │    │
│               │  └──────────┬─────────┘  │    │
│               │             │            │    │
│               │  ┌──────────▼─────────┐  │    │
│               │  │  Emergence Field   │  │    │
│               │  └──────────┬─────────┘  │    │
│               │             │            │    │
│               │  ┌──────────▼─────────┐  │    │
│               │  │  Protection Masks  │  │    │
│               │  └────────────────────┘  │    │
│               └─────────────┬────────────┘    │
│                             │                 │
│                    Protected Gradients        │
│                             │                 │
│                     Optimizer.step()          │
│                                               │
└───────────────────────────────────────────────┘
```
---

## Component 1: Communication Layer

The Communication Layer is the core of SAL. It sits between gradient computation and optimizer application.

### Class: `CommunicationLayer`

```python
from sal import CommunicationLayer

comm = CommunicationLayer(
    model=model,
    threshold=0.5,             # Base stability threshold
    threshold_adaptation=0.1,  # How much the threshold adapts
    soft_protection=True,      # Soft vs. hard protection
    history_length=100,        # Steps to track
)
```
### Methods

#### `analyze() -> Dict[str, float]`

Analyzes all parameters and computes stability scores.

```python
stability_scores = comm.analyze()
# {'layer1.weight': 0.73, 'layer1.bias': 0.45, ...}
```

**Stability Score Formula:**

```
s(p) = 1 / (1 + Δw × g_norm)
```

Where:

- `Δw` = weight change since the last step
- `g_norm` = gradient magnitude

High stability = low change × low gradient = the parameter has settled.
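
As a minimal sketch of this formula (a standalone helper, not part of the SAL API — the scalar inputs would be computed per parameter by the caller):

```python
def stability_score(delta_w: float, g_norm: float) -> float:
    """s(p) = 1 / (1 + delta_w * g_norm), always in (0, 1].

    delta_w: magnitude of the weight change since the last step
    g_norm:  magnitude of the current gradient
    """
    return 1.0 / (1.0 + delta_w * g_norm)
```

A parameter with no recent change and no gradient scores exactly 1.0; large values of either drive the score toward 0.
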
#### `protect() -> Dict[str, float]`

Applies protection to gradients based on the stability analysis.

```python
protection_rates = comm.protect()
# {'layer1.weight': 0.42, 'layer1.bias': 0.0, ...}
```

**Protection Formula (Soft):**

```
protected_gradient = gradient × (1 - stability_score)
```

Stable parameters get reduced gradients. Volatile parameters get full gradients.
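
The two protection modes can be sketched side by side (illustrative helpers over plain lists of gradient values; the real layer operates on parameter tensors):

```python
def protect_soft(grads, score):
    # Soft: scale every gradient down in proportion to stability.
    return [g * (1.0 - score) for g in grads]

def protect_hard(grads, score, threshold=0.5):
    # Hard: zero the gradients entirely once stability crosses the threshold.
    return grads if score <= threshold else [0.0 for _ in grads]
```

Soft protection degrades gracefully with the score; hard protection is all-or-nothing at the threshold.
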
### Adaptive Threshold

The threshold adapts to training dynamics:

```
τ = τ₀ + α × (σ_grad / μ_grad)
```

When gradients are noisy (high variance), protection increases.
When gradients are stable, protection decreases.
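
A sketch of the adaptation rule, driving τ with the coefficient of variation of recent gradient norms (helper and argument names are illustrative, not SAL internals):

```python
from statistics import mean, stdev

def adaptive_threshold(grad_norms, base=0.5, alpha=0.1):
    """tau = tau_0 + alpha * (sigma_grad / mu_grad) over a gradient-norm history."""
    if len(grad_norms) < 2:
        return base  # not enough history to estimate variance
    mu = mean(grad_norms)
    if mu == 0:
        return base  # avoid division by zero for vanished gradients
    return base + alpha * (stdev(grad_norms) / mu)
```

A noisy history raises the threshold above its base; a near-constant history leaves it essentially at `base`.
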
---

## Component 2: Stability Analyzer

Classifies parameters into the Stability Spectrum.

### Class: `StabilityAnalyzer`

```python
from sal import StabilityAnalyzer

analyzer = StabilityAnalyzer(
    model=model,
    protected_threshold=0.7,  # Score above this → protected
    volatile_threshold=0.3,   # Score below this → volatile
    history_length=50,        # Steps to track
)
```
### Methods

#### `analyze() -> Dict[str, float]`

Computes stability scores using multiple signals:

1. **Weight variance** → low variance over time = stable
2. **Gradient consistency** → consistent direction = stable
3. **Change magnitude** → small changes = stable

```python
scores = analyzer.analyze()
```
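
One way to blend the three signals, sketched for a single scalar parameter (the equal weighting and the 1/(1+x) squashing are assumptions for illustration, not the analyzer's actual mix):

```python
from statistics import mean, pvariance

def multi_signal_stability(weight_history, grad_history):
    """Combine the three stability signals into one score in [0, 1]."""
    # 1. Weight variance: low variance over time -> stable.
    var_signal = 1.0 / (1.0 + pvariance(weight_history))
    # 2. Gradient consistency: agreeing signs -> stable.
    signs = [1.0 if g >= 0 else -1.0 for g in grad_history]
    consistency = abs(mean(signs))  # 1.0 when every gradient points the same way
    # 3. Change magnitude: small net change -> stable.
    change = abs(weight_history[-1] - weight_history[0])
    change_signal = 1.0 / (1.0 + change)
    return (var_signal + consistency + change_signal) / 3.0
```
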
#### `classify() -> StabilitySpectrum`

Returns the distribution across stability states:

```python
spectrum = analyzer.classify()
# StabilitySpectrum(protected=12.3, neutral=70.5, volatile=17.2)
```
### Stability States

| State     | Score Range | Behavior        |
|-----------|-------------|-----------------|
| Protected | > 0.7       | Minimal updates |
| Neutral   | 0.3 - 0.7   | Careful updates |
| Volatile  | < 0.3       | Full updates    |
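
The table translates directly into a bucketing step. A sketch over a dict of per-parameter scores, returning percentages in the same shape `StabilitySpectrum` reports (the function name is illustrative):

```python
def classify(scores, protected_threshold=0.7, volatile_threshold=0.3):
    """Bucket per-parameter stability scores into spectrum percentages."""
    n = len(scores)
    protected = sum(1 for s in scores.values() if s > protected_threshold)
    volatile = sum(1 for s in scores.values() if s < volatile_threshold)
    return {
        "protected": 100.0 * protected / n,
        "neutral": 100.0 * (n - protected - volatile) / n,
        "volatile": 100.0 * volatile / n,
    }
```
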
---

## Component 3: Emergence Field

Measures coherence, novelty, and resonance in semantic space.

### Class: `EmergenceField`

```python
from sal import EmergenceField

field = EmergenceField(
    dimensions=768,           # Semantic space dimensions
    history_length=100,       # Patterns to remember
    coherence_threshold=0.6,  # Minimum for emergence
    novelty_threshold=0.4,    # Minimum for emergence
)
```
### Methods

#### `observe(pattern) -> EmergenceState`

Observes a pattern and measures its emergence characteristics:

```python
state = field.observe(embedding)
# EmergenceState(coherence=0.72, novelty=0.45, resonance=0.63, intensity=0.41)
```

#### `detect_emergence(coherence, novelty) -> bool`

Simple check for emergence:

```python
is_emergent = field.detect_emergence(0.72, 0.45)
# True
```

### Emergence Metrics

**Coherence:** How internally consistent is the pattern?

- Measures variance between chunks
- Measures local smoothness
- High coherence = structured, meaningful

**Novelty:** How different from known patterns?

- Compares to historical patterns via cosine similarity
- High novelty = genuinely new

**Resonance:** How well does it fit the field?

- Distance from the field centroid
- High resonance = harmonious with existing patterns

**Emergence = Coherent Novelty that Resonates**
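
The novelty metric and the threshold check can be sketched in a few lines (pure-Python cosine similarity over plain lists; the real field operates on embedding tensors, and the helper names are illustrative):

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def novelty(pattern, history):
    # 1 minus the closest cosine match against remembered patterns.
    if not history:
        return 1.0  # nothing remembered yet: everything is new
    return 1.0 - max(cosine(pattern, h) for h in history)

def detect_emergence(coherence, nov, coherence_threshold=0.6, novelty_threshold=0.4):
    # Emergence requires clearing both thresholds at once.
    return coherence >= coherence_threshold and nov >= novelty_threshold
```
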
---

## Component 4: Pulse-Split-Cascade (PSC)

A semantic Game of Life for pattern evolution.

### Class: `PulseCascade`

```python
from sal import PulseCascade

cascade = PulseCascade(
    max_pulses=32,         # Maximum concurrent pulses
    max_generations=10,    # Maximum depth
    split_threshold=0.6,   # Coherence needed to split
    merge_threshold=0.8,   # Similarity needed to merge
    expire_threshold=0.3,  # Minimum coherence to survive
)
```
### Flow

```
1. INITIATE
   Prompt embedding creates the root pulse.

2. EVOLVE
   Each pulse evolves via evolve_fn.
   Coherence, novelty, and resonance are measured.

3. SPLIT
   High-coherence pulses split into children.
   Children have slight variations.

4. MERGE
   Similar pulses merge (high cosine similarity).
   Merging combines embeddings and preserves the best traits.

5. EXPIRE
   Low-coherence pulses expire.
   Their patterns are lost.

6. EMERGE
   The best viable pulse is the emergent result.
   No scoring, just natural selection.
```
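
The SPLIT and EXPIRE rules above can be sketched over minimal `(pulse_id, coherence)` pairs; MERGE needs real embeddings, so it is omitted here, and the names are illustrative rather than SAL's internals:

```python
def lifecycle_pass(pulses, split_threshold=0.6, expire_threshold=0.3, max_pulses=32):
    """One pass of the lifecycle rules over (pulse_id, coherence) pairs."""
    survivors = []
    for pulse_id, coherence in pulses:
        if coherence < expire_threshold:
            continue  # EXPIRE: too incoherent to survive
        survivors.append(pulse_id)
        if coherence >= split_threshold and len(survivors) < max_pulses:
            survivors.append(pulse_id + "-child")  # SPLIT: spawn a variation
    return survivors
```

A high-coherence pulse both survives and spawns a child; a mid-coherence pulse merely survives; a low-coherence pulse disappears.
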
### Methods

#### `initiate(embedding) -> Pulse`

Start the cascade from a prompt:

```python
root = cascade.initiate(prompt_embedding)
```

#### `step(evolve_fn, measure_fn) -> List[Pulse]`

Advance the cascade by one step:

```python
active = cascade.step(
    evolve_fn=lambda x: model(x),
    measure_fn=lambda x: (coherence(x), novelty(x), resonance(x)),
)
```

#### `emerge() -> Pulse`

Get the emergent result:

```python
result = cascade.emerge()
```
---

## Integration

### Minimal Integration (2 lines)

```python
# Standard training loop
output = model(input)
loss = criterion(output, target)
loss.backward()

# SAL integration
comm.analyze()   # ← Line 1
comm.protect()   # ← Line 2

optimizer.step()
optimizer.zero_grad()
```
### Full Integration

```python
import torch

from sal import CommunicationLayer, StabilityAnalyzer, EmergenceField

# Initialize
comm = CommunicationLayer(model)
stability = StabilityAnalyzer(model)
field = EmergenceField()

# Training loop
for epoch in range(epochs):
    for inputs, target in dataloader:
        # Forward
        output = model(inputs)
        loss = criterion(output, target)

        # Backward
        loss.backward()

        # SAL: analyze stability
        comm.analyze()
        stability.update()

        # SAL: observe emergence
        with torch.no_grad():
            state = field.observe(model.get_embedding())

        # SAL: protect stable parameters
        comm.protect()

        # Update
        optimizer.step()
        optimizer.zero_grad()

    # Log the spectrum once per epoch
    spectrum = stability.classify()
    print(f"Epoch {epoch}: {spectrum}")
```
---

## Configuration

### Recommended Defaults

| Parameter              | Default | Description                |
|------------------------|---------|----------------------------|
| `threshold`            | 0.5     | Base stability threshold   |
| `threshold_adaptation` | 0.1     | Adaptation rate            |
| `soft_protection`      | True    | Soft vs. hard protection   |
| `protected_threshold`  | 0.7     | Score for protected state  |
| `volatile_threshold`   | 0.3     | Score for volatile state   |
| `history_length`       | 100     | Steps to track             |

### Tuning Guidelines
**More Protection:** Increase `threshold`, decrease `threshold_adaptation`

**Less Protection:** Decrease `threshold`, increase `threshold_adaptation`

**Faster Adaptation:** Decrease `history_length` (a shorter history reacts to recent dynamics sooner)

**Stricter Protection:** Increase `protected_threshold` (only the most settled parameters reach the protected state)
---

## Performance

SAL adds approximately 10% computational overhead:

- Stability analysis: O(n), where n = number of parameters
- Protection application: O(n)
- Memory: O(n × history_length) for tracking

This overhead is negligible compared to the benefits of reduced catastrophic forgetting and improved continual learning.

---

*For the philosophy behind these technical choices, see [Principles](principles.md).*