Fabuilds committed on
Commit 9d0ffbc · verified · 1 Parent(s): 359ab74

Delete resonance_transformer
resonance_transformer/DESIGN_DOCUMENT.md DELETED
@@ -1,292 +0,0 @@
- # Core Design Principles for the Resonance Transformer
-
- ## 1. Non-Orientable Embedding Space
-
- Instead of standard positional encoding in Euclidean space:
-
- **Embed tokens on a Möbius topology:**
- - Each token gets coordinates on a non-orientable manifold
- - No "inside/outside" in the embedding
- - Tokens exist in both chiral states simultaneously
- - **Position encoding = geometric position on the strip**
-
- **Benefit:** Natural handling of self-reference; context has no arbitrary "start/end"
-
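The non-orientable coordinate idea above can be made concrete with the standard Möbius-strip parametrization. A minimal pure-Python sketch (mine, not part of the deleted file): one full loop in the angle `u` returns to the same 3D point with the transverse coordinate `v` negated, which is exactly the "no inside/outside" property.

```python
import math

def mobius_position(u, v):
    """Map (u, v) on the Möbius strip to 3D coordinates.

    u: angle along the strip (radians); v: transverse offset in [-1, 1].
    Standard parametrization with a half-twist per loop.
    """
    r = 1.0 + (v / 2.0) * math.cos(u / 2.0)
    return (r * math.cos(u),
            r * math.sin(u),
            (v / 2.0) * math.sin(u / 2.0))

# Non-orientability: (u + 2*pi, v) lands on the same point as (u, -v),
# so the two "chiral states" of a position are the same surface point.
a = mobius_position(0.7, 0.5)
b = mobius_position(0.7 + 2.0 * math.pi, -0.5)
```

A position encoding built this way has no privileged start or end: advancing the sequence index just moves `u` around the strip.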
- ## 2. 0x52 Handshake Layer (Entry Point Mechanism)
-
- Before processing begins:
-
- **Establish a geometric entry point:**
- - The input is hashed to entry coordinates
- - Aligned to the 528 Hz resonance baseline
- - All subsequent processing is relative to this entry
- - Different queries = different entry points = different perspectives on the same knowledge
-
- **Benefit:** The same model sees different "faces" of the data depending on query context
-
- ## 3. Resonance-Based Attention (Not Similarity-Based)
-
- Replace `softmax(QK^T)` with:
-
- **Resonance scoring:**
- ```
- For each query-key pair:
- - Compute the frequency spectrum (FFT of embeddings)
- - Measure phase alignment (coherence)
- - Score = resonance strength, not dot-product similarity
- - Attend to tokens that RESONATE, not just match
- ```
-
- **Benefit:** Captures harmonic relationships, not just semantic similarity. "Love" and "528Hz" resonate even if their embeddings are distant.
-
- ## 4. Chiral Dual-Path Architecture
-
- **Two parallel processing streams:**
- - Left-handed path (one chirality)
- - Right-handed path (opposite chirality)
- - **They're the same path** viewed from different orientations
- - Merge only at the output (consensus singularity)
-
- **Benefit:** Can reason about both "forward" and "backward" time on the Möbius strip; sees past and future simultaneously.
-
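The dual-path idea can be sketched without any model machinery: run the same computation over both orientations of the sequence and merge only at the output. This is a pure-Python illustration (function names are mine), with a running sum standing in for a causal layer stack:

```python
def causal_sum(xs):
    """Toy 'path': each output is the running sum of what came before it."""
    out, acc = [], 0
    for x in xs:
        acc += x
        out.append(acc)
    return out

def chiral_dual_path(xs):
    """Run the SAME computation in both orientations, merge at the output.

    The left-handed path reads the sequence forward; the right-handed
    path is the identical computation on the reversed view, re-aligned
    so each position sees "past" and "future" simultaneously.
    """
    left = causal_sum(xs)               # forward orientation
    right = causal_sum(xs[::-1])[::-1]  # opposite orientation, re-aligned
    # Consensus merge: average the two views at each position.
    return [(l + r) / 2 for l, r in zip(left, right)]
```

For `[1, 2, 3]` the left path is `[1, 3, 6]` and the right path `[6, 5, 3]`, so the merged output weights every position by both its prefix and its suffix.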
- ## 5. Coherence-Preserving Normalization
-
- Instead of layer norm, which might break phase relationships:
-
- **Phase-locked normalization:**
- - Normalize amplitude only
- - Preserve phase relationships
- - **Maintain resonance across layers**
- - Use the geometric mean instead of the arithmetic mean
-
- **Benefit:** Coherence doesn't decay with depth
-
- ## 6. Hyperchaotic Loss Function
-
- During training:
-
- **Standard loss + coherence term:**
- ```
- L_total = L_task + λ_coherence * L_decoherence + λ_stability * L_instability
-
- Where:
- L_decoherence = measure of phase drift across layers
- L_instability = test of whether a pattern survives perturbation (chaos²)
- ```
-
- **Benefit:** Only learns patterns that are hyperchaotically stable
-
- ## 7. Geometric Memory (Lattice Integration)
-
- **Instead of a fixed context window:**
-
- - Map hidden states to geometric coordinates
- - Store grooves on a physical/virtual "platter"
- - Navigate to relevant regions based on resonance
- - **Infinite effective context** through geometric organization
-
- **Benefit:** Can access arbitrarily distant context if it is geometrically proximate
-
- ## 8. Self-Observation Layer
-
- **Periodic self-reflection:**
-
- Every N layers, the model:
- - Observes its own hidden states (the mirror)
- - Detects its current chiral state
- - Measures its own coherence
- - **Adjusts processing based on self-observation**
-
- **Benefit:** Self-regulating coherence; it can detect when it's decoherent
-
- ## 9. Frequency-Tuned Feed-Forward
-
- **Instead of a standard FFN:**
-
- Each FFN operates in a specific frequency band:
- - Low-frequency FFN (slow, global patterns)
- - 528 Hz FFN (resonance/coherence band)
- - High-frequency FFN (fast, local patterns)
- - **Parallel processing at multiple frequencies**
-
- **Benefit:** Natural spectral decomposition of information
-
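The band-split idea can be sketched in pure Python using moving-average filters as stand-ins for FFT band masks (the filter choice and function names are my assumptions, not the deleted design): split the signal into low/mid/high components, apply a per-band transform, and sum. With identity transforms the three bands reconstruct the input exactly, since they partition it.

```python
def smooth(xs, k):
    """Moving average over a window of +/- k samples (a simple low-pass)."""
    n = len(xs)
    out = []
    for i in range(n):
        lo, hi = max(0, i - k), min(n, i + k + 1)
        window = xs[lo:hi]
        out.append(sum(window) / len(window))
    return out

def band_split_ffn(xs, f_low=None, f_mid=None, f_high=None):
    """Split into low/mid/high bands, transform each band, and sum.

    f_low / f_mid / f_high play the role of the three frequency-tuned
    FFNs; by default they are identities, so the output equals the input.
    """
    ident = lambda band: band
    f_low, f_mid, f_high = f_low or ident, f_mid or ident, f_high or ident
    low = smooth(xs, 4)                                # slow, global patterns
    mid = [a - b for a, b in zip(smooth(xs, 1), low)]  # intermediate band
    high = [a - b for a, b in zip(xs, smooth(xs, 1))]  # fast, local patterns
    return [l + m + h for l, m, h in
            zip(f_low(low), f_mid(mid), f_high(high))]
```

Swapping in a real per-band network for any of the three callables keeps the same structure: parallel processing at multiple time scales, recombined additively.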
- ## 10. Binary Existence Output
-
- **The final layer doesn't give probabilities:**
-
- It gives:
- - **Resonance achieved** (coherent output) → generate the token
- - **Resonance failed** (decoherent) → refuse to generate / flag uncertainty
-
- **Benefit:** The model knows when it doesn't know. No confident hallucinations.
-
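A minimal sketch of that gate, using normalized entropy as the coherence proxy (the proxy and the threshold are my assumptions; the deleted design leaves the coherence measure open):

```python
import math

def resonance_gate(probs, threshold=0.5):
    """Emit the argmax token only if the distribution is 'coherent'.

    Coherence proxy: 1 - normalized entropy. A peaked distribution means
    resonance achieved (generate); a flat one means decoherence (return
    None, i.e. refuse rather than hallucinate).
    """
    h = -sum(p * math.log(p) for p in probs if p > 0)
    h_max = math.log(len(probs))
    coherence = 1.0 - (h / h_max if h_max > 0 else 0.0)
    if coherence >= threshold:
        return max(range(len(probs)), key=lambda i: probs[i])
    return None  # decoherent: flag uncertainty instead of guessing
```

A sharply peaked `[0.97, 0.01, 0.01, 0.01]` passes the gate, while the uniform `[0.25, 0.25, 0.25, 0.25]` is refused.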
- ---
-
- ## Practical Implementation Path:
-
- **Phase 1: Minimal Viable**
- - Add resonance measurement to an existing transformer
- - Test whether coherence correlates with quality
- - **Validate the theory first**
-
- **Phase 2: Hybrid Architecture**
- - Keep the standard attention backbone
- - Add resonance scoring as an auxiliary signal
- - Introduce a coherence loss term
- - **Prove it improves performance**
-
- **Phase 3: Full Geometric**
- - Non-orientable embeddings
- - Chiral dual-path
- - Lattice memory integration
- - **Novel architecture from the ground up**
-
- ## 6. HYPERCHAOTIC LOSS FUNCTION (IMPLEMENTATION)
-
- ### Theory:
-
- Standard loss only measures task performance. We also need to measure:
- 1. **Coherence** - are patterns maintaining phase relationships?
- 2. **Stability** - do patterns survive perturbation (chaos²)?
-
- ```python
- class HyperchaosLoss(nn.Module):
-     """
-     Loss function that enforces hyperchaotically stable patterns
-     """
-     def __init__(self, lambda_coherence=0.1, lambda_stability=0.05):
-         super().__init__()
-         self.lambda_coherence = lambda_coherence
-         self.lambda_stability = lambda_stability
-
-     def measure_decoherence(self, hidden_states):
-         """
-         Measure phase drift across layers
-         """
-         if len(hidden_states) < 2:
-             return torch.tensor(0.0, device=hidden_states[0].device)
-
-         total_decoherence = 0.0
-
-         for i in range(len(hidden_states) - 1):
-             curr_layer = hidden_states[i]
-             next_layer = hidden_states[i + 1]
-
-             # Convert to the frequency domain
-             curr_freq = torch.fft.rfft(curr_layer, dim=-1)
-             next_freq = torch.fft.rfft(next_layer, dim=-1)
-
-             # Measure phase drift
-             curr_phase = torch.angle(curr_freq)
-             next_phase = torch.angle(next_freq)
-
-             # Phase should evolve smoothly, not jump randomly
-             phase_drift = torch.abs(next_phase - curr_phase)
-
-             # Penalize large, incoherent jumps
-             decoherence = torch.mean(phase_drift ** 2)
-             total_decoherence += decoherence
-
-         return total_decoherence / (len(hidden_states) - 1)
- ```
-
- ## 7. GEOMETRIC MEMORY (LATTICE INTEGRATION) (IMPLEMENTATION)
-
- ### The Big Idea:
-
- Instead of a fixed context window, **navigate geometric space** to find relevant information.
-
- ```python
- class GeometricMemory:
-     """
-     Store and retrieve information based on geometric position
-     on a non-orientable manifold (like the Lattice HDD)
-     """
-     def __init__(self, capacity_gb=8, base_freq=528):
-         self.capacity = capacity_gb * 1024**3  # bytes
-         self.base_freq = base_freq
-
-         # In-memory simulation of the HDD platter structure
-         self.memory_map = {}  # geometric_coords -> data
-
-         # Spatial index for fast geometric queries
-         self.index = None
-         self.coordinates = []
-
-     def geometric_hash(self, hidden_state, entry_point):
-         """
-         Convert a hidden state to geometric coordinates
-         """
-         # PCA + rotation based on the entry point
-         theta = entry_point['theta']
-         phi = entry_point['phi']
-
-         # Apply an FFT to get the frequency representation
-         freq_repr = np.fft.rfft(hidden_state.cpu().numpy())
-
-         # Find dominant frequencies
-         magnitudes = np.abs(freq_repr)
-         phases = np.angle(freq_repr)
-
-         # Geometric position based on frequency content + entry point
-         coords = np.array([
-             theta + np.sum(magnitudes * np.cos(phases)),  # x
-             phi + np.sum(magnitudes * np.sin(phases)),    # y
-             np.sum(magnitudes) / len(magnitudes),         # radius
-             entry_point['frequency'] / self.base_freq     # frequency dimension
-         ])
-
-         return coords
- ```
-
- ## 8. SELF-OBSERVATION LAYER (IMPLEMENTATION)
-
- ### The Mirror Mechanism:
-
- ```python
- class SelfObservationLayer(nn.Module):
-     """
-     Layer that allows the model to observe its own processing.
-     The 5D mirror - seeing yourself from the opposite chirality
-     """
-     def __init__(self, hidden_dim):
-         super().__init__()
-         self.hidden_dim = hidden_dim
-
-         # Network to analyze its own hidden states
-         self.observer = nn.Sequential(
-             nn.Linear(hidden_dim, hidden_dim),
-             nn.GELU(),
-             nn.Linear(hidden_dim, hidden_dim)
-         )
-
-         # Coherence detector (real-time during the forward pass)
-         self.coherence_detector = nn.Linear(hidden_dim, 1)
-
-         # Chiral state detector
-         self.chiral_detector = nn.Linear(hidden_dim, 2)  # [left, right] probabilities
-
-     def observe(self, hidden_state):
-         """
-         Look at the layer's own hidden state and extract meta-information
-         """
-         # Analyze the current state
-         observation = self.observer(hidden_state)
-
-         # Measure coherence
-         coherence = torch.sigmoid(self.coherence_detector(observation))
-
-         # Detect the chiral state
-         chiral_logits = self.chiral_detector(observation)
-         chiral_probs = F.softmax(chiral_logits, dim=-1)
-
-         # Create the reflection (opposite-chirality view)
-         reflection = -observation  # Sign flip = chirality flip
-
-         return {
-             'coherence': coherence,
-             'chiral_state': chiral_probs,
-             'reflection': reflection
-         }
- ```
 
resonance_transformer/dispatcher.py DELETED
@@ -1,106 +0,0 @@
- import torch
- import torch.nn as nn
- import numpy as np
- import time
-
- try:
-     from .resonance_gpt import ResonanceGPT
-     from .tesseract_transformer import Tesseract5DTransformer
- except ImportError:
-     from resonance_gpt import ResonanceGPT
-     from tesseract_transformer import Tesseract5DTransformer
-
- class DualResonanceSystem(nn.Module):
-     """
-     The Complete Chiral Architecture.
-
-     System 1: ResonanceGPT (Fast, Intuitive, Möbius)
-     System 2: TesseractTransformer (Slow, Methodical, 5D)
-
-     Routes queries based on 'Coherence Confidence'.
-     """
-     def __init__(self, config):
-         super().__init__()
-         self.config = config
-
-         # Initialize the Fast System (PyTorch)
-         print("[SYSTEM] Initializing Fast System (Möbius)...")
-         self.fast = ResonanceGPT(
-             vocab_size=config.get('vocab_size', 1000),
-             hidden_dim=config.get('fast_dim', 64),
-             num_layers=config.get('fast_layers', 4)
-         )
-
-         # Initialize the Slow System (NumPy/Custom)
-         print("[SYSTEM] Initializing Slow System (Tesseract)...")
-         self.slow = Tesseract5DTransformer(
-             vocab_size=config.get('vocab_size', 1000),
-             hidden_dim=config.get('slow_dim', 64),
-             num_layers=config.get('slow_layers', 4)
-         )
-
-         self.coherence_threshold = config.get('threshold', 0.6)
-
-     def forward(self, input_ids, **kwargs):
-         """
-         Dual-path routing logic.
-         Kwargs can include 'steering_weights' for the Slow System.
-         """
-         start_time = time.time()
-
-         # 1. Attempt the Fast Path
-         # input_ids is a PyTorch tensor
-         fast_logits, _, metas = self.fast(input_ids)
-
-         # 2. Check Coherence (Self-Reported)
-         # Get the final-layer coherence
-         final_meta = metas[-1]
-         coherence_score = final_meta['coherence'].mean().item()
-
-         metrics = {
-             'fast_latency': 0,
-             'slow_latency': 0,
-             'coherence': coherence_score,
-             'mode': 'FAST'
-         }
-
-         metrics['fast_latency'] = time.time() - start_time
-
-         # 3. Decision Gate
-         if coherence_score > self.coherence_threshold:
-             # The fast system is confident ("Lucid")
-             return fast_logits, metrics
-
-         # 4. Escalate to the Slow Path (deep reasoning)
-         metrics['mode'] = 'SLOW (ESCALATED)'
-         slow_start = time.time()
-
-         # Convert the tensor to numpy for the Tesseract
-         numpy_ids = input_ids.detach().cpu().numpy()
-
-         # Run deep reasoning.
-         # We assume the Tesseract outputs logits in the same shape;
-         # pass steering weights if present.
-         steering_weights = kwargs.get('steering_weights')
-
-         slow_logits_np, slow_meta, slow_coherence = self.slow.deep_reason(
-             numpy_ids,
-             query_description="Escalated due to low coherence",
-             steering_weights=steering_weights
-         )
-
-         metrics['slow_latency'] = time.time() - slow_start
-         metrics['slow_coherence'] = slow_coherence
-
-         # Convert back to a tensor
-         slow_logits = torch.from_numpy(slow_logits_np).to(input_ids.device)
-
-         # Blend or replace? For now, we trust the Slow system completely if invoked
-         return slow_logits, metrics
-
-     def train_lattice(self, data_loader, epochs=1):
-         """
-         Placeholder for Phase 30: lattice training loop
-         """
-         pass
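The routing gate in the deleted dispatcher can be exercised without the two model classes. A pure-Python sketch with stub systems (the stub names and return shapes are mine): trust the fast system only when its self-reported coherence clears the threshold, otherwise escalate.

```python
class StubFast:
    """Stand-in for the fast system: returns (logits, self-reported coherence)."""
    def __init__(self, coherence):
        self.coherence = coherence
    def __call__(self, ids):
        return [0.1] * 4, self.coherence

class StubSlow:
    """Stand-in for the slow system's deep_reason()."""
    def deep_reason(self, ids):
        return [0.9] * 4, 0.95  # (logits, coherence)

def route(fast, slow, ids, threshold=0.6):
    """Dual-path decision gate, mirroring the dispatcher's control flow."""
    logits, coherence = fast(ids)
    if coherence > threshold:
        return logits, 'FAST'
    slow_logits, _ = slow.deep_reason(ids)  # escalate to deep reasoning
    return slow_logits, 'SLOW (ESCALATED)'
```

A confident fast stub keeps the query on the fast path; a low-coherence one triggers escalation.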
 
resonance_transformer/geometric_memory.py DELETED
@@ -1,162 +0,0 @@
- import torch
- import torch.nn as nn
- import numpy as np
- import time
-
- class GeometricEntryPoint(nn.Module):
-     """
-     Hashes the query to geometric coordinates and aligns to 528 Hz.
-     """
-     def __init__(self, hidden_dim, base_freq=528):
-         super().__init__()
-         self.base_freq = base_freq
-         self.hidden_dim = hidden_dim
-
-         # Learned mapping from query to entry coordinates
-         self.entry_network = nn.Sequential(
-             nn.Linear(hidden_dim, hidden_dim * 2),
-             nn.GELU(),
-             nn.Linear(hidden_dim * 2, 3)  # (theta, phi, radius)
-         )
-
-     def compute_entry_hash(self, query_embedding):
-         """
-         Convert the query to a geometric entry point.
-         """
-         # Average over the sequence to get the general entry context:
-         # (batch, seq, hidden) -> (batch, hidden)
-         context = query_embedding.mean(dim=1)
-
-         coords = self.entry_network(context)  # (batch, 3)
-
-         theta, phi, radius = coords.unbind(dim=-1)
-
-         # Align to the 528 Hz resonance:
-         # frequency = base_freq * (1 + radius_activation)
-         freq_multiplier = 1.0 + torch.sigmoid(radius)
-         effective_freq = self.base_freq * freq_multiplier
-
-         return {
-             'theta': theta,
-             'phi': phi,
-             'frequency': effective_freq,
-             'raw_coords': coords
-         }
-
- class GeometricMemory:
-     """
-     Store and retrieve information based on geometric position
-     on a non-orientable manifold.
-     """
-     def __init__(self, hidden_dim, capacity_gb=1, base_freq=528):
-         self.base_freq = base_freq
-         self.hidden_dim = hidden_dim
-
-         # In-memory storage for demonstration.
-         # A real implementation would use a vector DB or a memory-mapped file.
-         self.memory_map = []
-
-     def geometric_hash(self, hidden_state, entry_point):
-         """
-         Convert a hidden state to geometric coordinates relative to the entry point.
-         """
-         # Simple projection for the demo: map the hidden state to offsets
-         # with elementary operations. The real version would use an FFT
-         # as discussed in the design.
-
-         # (batch, hidden)
-
-         # Handle single vectors or batches
-         if hidden_state.dim() == 1:
-             hidden_state = hidden_state.unsqueeze(0)
-
-         # Mock geometric projection: use the first 3 dims as the offset
-         offsets = hidden_state[:, :3]
-         if offsets.shape[1] < 3:
-             # Pad if hidden_dim is tiny
-             offsets = torch.cat([offsets, torch.zeros(offsets.shape[0], 3 - offsets.shape[1], device=hidden_state.device)], dim=1)
-
-         # Apply the entry-point rotation (conceptual); for now, just add.
-         # theta/phi are (batch,), matching offsets[:, i], so no unsqueeze is
-         # needed (unsqueezing would broadcast to (batch, batch)).
-         theta = entry_point['theta']
-         phi = entry_point['phi']
-
-         x = offsets[:, 0] + theta
-         y = offsets[:, 1] + phi
-         z = offsets[:, 2]  # Radius offset
-
-         return torch.stack([x, y, z], dim=1)
-
-     def store(self, hidden_states, entry_point):
-         """
-         Store hidden states.
-         """
-         # Compute coords
-         # hidden_states: (batch, seq, hidden)
-         batch, seq, dim = hidden_states.shape
-
-         flat_hidden = hidden_states.reshape(-1, dim)
-
-         # We would need to broadcast the entry point to match the flattened
-         # hidden states (entry keys are (batch,) -> repeat seq times).
-         # This is strictly a demo in-memory store.
-
-         # For efficiency in this demo, we just store the robust patterns:
-         # only store if norm > threshold (simple filter)
-         norms = torch.norm(flat_hidden, dim=1)
-         threshold = norms.mean()
-
-         mask = norms > threshold
-         to_store = flat_hidden[mask]
-
-         if len(to_store) == 0:
-             return
-
-         # Store in a simple list for verification.
-         # In production this links to the Lattice DB.
-         self.memory_map.append({
-             'data': to_store.detach().cpu(),  # Move to CPU to save GPU mem
-             'entry_freq': entry_point['frequency'].mean().item(),
-             'timestamp': time.time()
-         })
-
-         # Prune if too large
-         if len(self.memory_map) > 100:
-             self.memory_map.pop(0)
-
-     def retrieve(self, query_state, entry_point, k=5):
-         """
-         Retrieve relevant memories.
-         """
-         if not self.memory_map:
-             return None
-
-         # Brute-force search for demo verification:
-         # find memories with a similar frequency
-         relevant_batches = [
-             m['data'] for m in self.memory_map
-             if abs(m['entry_freq'] - entry_point['frequency'].mean().item()) < 50
-         ]
-
-         if not relevant_batches:
-             return None
-
-         memory_bank = torch.cat(relevant_batches, dim=0).to(query_state.device)
-
-         # Simple dot-product attention
-         # query: (batch, seq, hidden)
-         # memory: (total_mem, hidden)
-
-         # Compute scores:
-         # (batch, seq, hidden) @ (hidden, total_mem) -> (batch, seq, total_mem)
-         scores = torch.matmul(query_state, memory_bank.t())
-
-         # Top k
-         top_k_scores, indices = torch.topk(scores, k=min(k, len(memory_bank)), dim=-1)
-
-         # Retrieve values
-         # (batch, seq, k, hidden)
-         retrieved = memory_bank[indices]
-
-         return retrieved
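The store/retrieve pattern above reduces to a frequency-windowed lookup. A pure-Python sketch of just that mechanism (class name, window, and capacity defaults mirror the deleted code's `< 50` filter and 100-entry prune, but the class itself is mine):

```python
class ToyFrequencyMemory:
    """Frequency-windowed store/retrieve, without the tensor machinery."""
    def __init__(self, window=50.0, capacity=100):
        self.window = window
        self.capacity = capacity
        self.entries = []  # list of (entry_freq, payload)

    def store(self, freq, payload):
        self.entries.append((freq, payload))
        if len(self.entries) > self.capacity:
            self.entries.pop(0)  # prune the oldest entry

    def retrieve(self, freq):
        """Return payloads whose entry frequency lies within the window."""
        return [p for f, p in self.entries if abs(f - freq) < self.window]
```

Queries "navigate" to memories stored near their own entry frequency; everything outside the window is simply never scored.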
 
resonance_transformer/hybrid_transformer.py DELETED
@@ -1,113 +0,0 @@
- import torch
- import torch.nn as nn
- try:
-     from .resonance_attention import ResonanceAttention
- except ImportError:
-     from resonance_attention import ResonanceAttention
-
- class PhaseLockedNorm(nn.Module):
-     """
-     Normalize amplitude while preserving phase relationships.
-     """
-     def __init__(self, hidden_dim, eps=1e-6):
-         super().__init__()
-         self.eps = eps
-         self.gain = nn.Parameter(torch.ones(hidden_dim))
-         self.bias = nn.Parameter(torch.zeros(hidden_dim))
-
-     def forward(self, x):
-         """
-         x: (batch, seq, hidden_dim)
-         """
-         # Assume hidden_dim is even so it can form complex pairs.
-         # If odd we could pad, normalize, and slice back; keeping it
-         # simple for now (require an even dim).
-         if x.shape[-1] % 2 != 0:
-             # Fall back to LayerNorm if the dim is odd (the phase concept breaks for a scalar)
-             mean = x.mean(dim=-1, keepdim=True)
-             std = x.std(dim=-1, keepdim=True)
-             return self.gain * (x - mean) / (std + self.eps) + self.bias
-
-         # Convert to a complex representation:
-         # treat adjacent dimensions as real/imag pairs
-         complex_x = torch.view_as_complex(
-             x.reshape(*x.shape[:-1], -1, 2).contiguous()
-         )
-
-         # Get magnitude and phase
-         magnitude = torch.abs(complex_x)
-         phase = torch.angle(complex_x)
-
-         # Normalize magnitude only (preserve phase!).
-         # Note: (mag - mean) / std can go negative, which flips the phase
-         # by pi; a strictly phase-preserving variant would rescale
-         # magnitudes instead of mean-centering them.
-         mean_mag = magnitude.mean(dim=-1, keepdim=True)
-         std_mag = magnitude.std(dim=-1, keepdim=True)
-
-         normalized_mag = (magnitude - mean_mag) / (std_mag + self.eps)
-
-         # Reconstruct with the original phase
-         normalized_complex = normalized_mag * torch.exp(1j * phase)
-
-         # Convert back to real
-         normalized = torch.view_as_real(normalized_complex).reshape(*x.shape)
-
-         # Apply learned gain and bias
-         return normalized * self.gain + self.bias
-
- class HybridTransformerLayer(nn.Module):
-     def __init__(self, hidden_dim, num_heads=4, ffn_dim=2048, dropout=0.1):
-         super().__init__()
-         self.attention = ResonanceAttention(hidden_dim, num_heads)
-         self.norm1 = PhaseLockedNorm(hidden_dim)
-         self.norm2 = PhaseLockedNorm(hidden_dim)
-
-         self.ffn = nn.Sequential(
-             nn.Linear(hidden_dim, ffn_dim),
-             nn.GELU(),
-             nn.Linear(ffn_dim, hidden_dim),
-             nn.Dropout(dropout)
-         )
-         self.dropout = nn.Dropout(dropout)
-
-     def forward(self, x, mask=None):
-         # Attention block
-         attn_out, _, _ = self.attention(x, x, x, mask)
-         x = self.norm1(x + self.dropout(attn_out))
-
-         # FFN block
-         ffn_out = self.ffn(x)
-         x = self.norm2(x + self.dropout(ffn_out))
-
-         return x
-
- class HybridResonanceTransformer(nn.Module):
-     def __init__(self, vocab_size, hidden_dim, num_layers=4, num_heads=4, max_seq_len=512):
-         super().__init__()
-         self.embedding = nn.Embedding(vocab_size, hidden_dim)
-         self.pos_encoding = nn.Parameter(torch.randn(1, max_seq_len, hidden_dim))
-
-         self.layers = nn.ModuleList([
-             HybridTransformerLayer(hidden_dim, num_heads) for _ in range(num_layers)
-         ])
-
-         self.output_head = nn.Linear(hidden_dim, vocab_size)
-
-     def forward(self, input_ids, output_hidden_states=False):
-         batch, seq = input_ids.shape
-
-         # Embed + positional encoding
-         x = self.embedding(input_ids) + self.pos_encoding[:, :seq, :]
-
-         all_hidden_states = []
-         if output_hidden_states:
-             all_hidden_states.append(x)
-
-         # Process layers
-         for layer in self.layers:
-             x = layer(x)
-             if output_hidden_states:
-                 all_hidden_states.append(x)
-
-         logits = self.output_head(x)
-
-         if output_hidden_states:
-             return logits, all_hidden_states
-         return logits
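The amplitude-only idea behind `PhaseLockedNorm` can be shown in a few lines of pure Python with `cmath`. This sketch rescales magnitudes to unit mean rather than mean-centering them (my variant, chosen because dividing by a positive real scalar provably leaves every phase untouched):

```python
import cmath

def phase_locked_normalize(zs):
    """Rescale complex magnitudes to roughly unit mean, preserving phase.

    Dividing each z by a positive real constant changes |z| but not
    arg(z), so phase relationships survive normalization exactly.
    """
    mags = [abs(z) for z in zs]
    mean_mag = sum(mags) / len(mags)
    return [z / (mean_mag + 1e-6) for z in zs]
```

This is the property the torch version aims at; its mean-centered magnitudes can go negative and flip phases by pi, which a pure rescaling avoids.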
 
resonance_transformer/hyperchaos_loss.py DELETED
@@ -1,121 +0,0 @@
- import torch
- import torch.nn as nn
- import torch.nn.functional as F
-
- class HyperchaosLoss(nn.Module):
-     """
-     Loss function that enforces hyperchaotically stable patterns.
-     Combines standard task loss with:
-     1. Coherence Loss (phase consistency across layers)
-     2. Stability Loss (resistance to perturbation)
-     """
-     def __init__(self, lambda_coherence=0.1, lambda_stability=0.05):
-         super().__init__()
-         self.lambda_coherence = lambda_coherence
-         self.lambda_stability = lambda_stability
-
-     def measure_decoherence(self, hidden_states):
-         """
-         Measure phase drift across layers.
-         hidden_states: list of (batch, seq, hidden) tensors from each layer.
-         """
-         if len(hidden_states) < 2:
-             return torch.tensor(0.0, device=hidden_states[0].device)
-
-         total_decoherence = 0.0
-
-         for i in range(len(hidden_states) - 1):
-             curr_layer = hidden_states[i]
-             next_layer = hidden_states[i + 1]
-
-             # Convert to the frequency domain
-             curr_freq = torch.fft.rfft(curr_layer, dim=-1)
-             next_freq = torch.fft.rfft(next_layer, dim=-1)
-
-             # Measure phase drift
-             curr_phase = torch.angle(curr_freq)
-             next_phase = torch.angle(next_freq)
-
-             # Phase should evolve smoothly, not jump randomly
-             phase_drift = torch.abs(next_phase - curr_phase)
-
-             # Penalize large, incoherent jumps
-             decoherence = torch.mean(phase_drift ** 2)
-             total_decoherence = total_decoherence + decoherence
-
-         return total_decoherence / (len(hidden_states) - 1)
-
-     def measure_stability(self, hidden_states, perturbation_scale=0.01):
-         """
-         Test whether patterns survive small perturbations (chaos² testing).
-         """
-         # Take the final hidden state
-         final_state = hidden_states[-1]
-
-         # Add a small perturbation
-         perturbation = torch.randn_like(final_state) * perturbation_scale
-         perturbed_state = final_state + perturbation
-
-         # Measure coherence before and after
-         def compute_coherence(state):
-             # FFT to the frequency domain
-             freq = torch.fft.rfft(state, dim=-1)
-
-             # Coherence = how correlated the dimensions are in the freq domain
-             phase = torch.angle(freq)
-
-             # Compute pairwise phase correlation (simplified for efficiency):
-             # instead of a full covariance, measure the variance of phase
-             # across the hidden dim. Low variance = high coherence
-             # (phases are aligned).
-             phase_var = torch.var(phase, dim=-1).mean()
-
-             # Coherence is the inverse of variance
-             return 1.0 / (phase_var + 1e-6)
-
-         coherence_original = compute_coherence(final_state)
-         coherence_perturbed = compute_coherence(perturbed_state)
-
-         # Instability = how much coherence dropped.
-         # Stable patterns should maintain coherence.
-         instability = torch.relu(coherence_original - coherence_perturbed)
-
-         return instability
-
-     def forward(self, logits, targets, hidden_states):
-         """
-         logits: model predictions (batch, seq, vocab)
-         targets: ground truth (batch, seq)
-         hidden_states: list of hidden states from all layers
-         """
-         # Standard cross-entropy loss, flattened for the loss calculation
-         curr_device = logits.device
-
-         # Basic task loss
-         task_loss = F.cross_entropy(
-             logits.view(-1, logits.size(-1)),
-             targets.view(-1),
-             ignore_index=-100
-         )
-
-         # Auxiliary losses
-         if hidden_states:
-             decoherence_loss = self.measure_decoherence(hidden_states)
-             stability_loss = self.measure_stability(hidden_states)
-         else:
-             decoherence_loss = torch.tensor(0.0, device=curr_device)
-             stability_loss = torch.tensor(0.0, device=curr_device)
-
-         # Combined loss
-         total_loss = (
-             task_loss +
-             self.lambda_coherence * decoherence_loss +
-             self.lambda_stability * stability_loss
-         )
-
-         return {
-             'total': total_loss,
-             'task': task_loss,
-             'decoherence': decoherence_loss,
-             'instability': stability_loss
-         }
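Stripped of the tensor machinery, the decoherence term is just a mean squared phase drift between consecutive layers. A pure-Python sketch, where each "layer" is given directly as a list of complex spectral coefficients (that representation is my simplification; the deleted code derives it from an rfft of the hidden states):

```python
import cmath

def decoherence(layers):
    """Mean squared phase drift between consecutive layers.

    layers: list of layers, each a list of complex coefficients.
    Zero means perfectly phase-locked processing across depth.
    """
    total = 0.0
    for curr, nxt in zip(layers, layers[1:]):
        drifts = [(cmath.phase(b) - cmath.phase(a)) ** 2
                  for a, b in zip(curr, nxt)]
        total += sum(drifts) / len(drifts)
    return total / (len(layers) - 1)
```

Identical layers score exactly zero; a quarter-turn rotation of every coefficient scores (pi/2) squared, which is what the penalty pushes against.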
 
resonance_transformer/resonance_attention.py DELETED
@@ -1,128 +0,0 @@
1
- import torch
2
- import torch.nn as nn
3
- import torch.nn.functional as F
4
- import math
5
-
6
- class ResonanceAttention(nn.Module):
7
- def __init__(self, hidden_dim, num_heads=8):
8
- super().__init__()
9
- self.hidden_dim = hidden_dim
10
- self.num_heads = num_heads
11
- self.head_dim = hidden_dim // num_heads
12
-
13
- # Standard Q, K, V projections
14
- self.q_proj = nn.Linear(hidden_dim, hidden_dim)
15
- self.k_proj = nn.Linear(hidden_dim, hidden_dim)
16
- self.v_proj = nn.Linear(hidden_dim, hidden_dim)
17
-
18
- # Additional projection for phase extraction
19
- self.phase_proj = nn.Linear(hidden_dim, hidden_dim)
20
-
21
- def compute_phase_coherence(self, q, k):
22
- """
23
- Measure how well query and key resonate (phase alignment)
24
- """
25
- # q: (batch, heads, seq_q, head_dim)
26
- # k: (batch, heads, seq_k, head_dim)
27
-
28
- # Compute frequency spectrum via FFT
29
- # Treat head_dim as "time" dimension for FFT
30
- # rfft returns complex tensor
31
- q_freq = torch.fft.rfft(q, dim=-1) # (batch, heads, seq_q, freq_bins)
32
- k_freq = torch.fft.rfft(k, dim=-1) # (batch, heads, seq_k, freq_bins)
33
-
34
- # Compute phase difference
35
- q_phase = torch.angle(q_freq)
36
- k_phase = torch.angle(k_freq)
37
-
38
- # Phase coherence = how aligned the phases are
39
- # High coherence = phases match = constructive interference
40
- # We need to broadcast to compare every query against every key
41
- # q_phase: (b, h, seq_q, 1, f)
42
- # k_phase: (b, h, 1, seq_k, f)
43
- phase_diff = q_phase.unsqueeze(3) - k_phase.unsqueeze(2) # (batch, heads, seq_q, seq_k, freq)
44
-
45
- # Coherence score (cosine of phase difference)
46
- # cos(0) = 1 (perfect alignment), cos(pi) = -1 (cancellation)
47
- coherence = torch.cos(phase_diff).mean(dim=-1) # Average over frequencies
48
-
49
- return coherence # (batch, heads, seq_q, seq_k)
50
-
51
- def compute_resonance_strength(self, q, k):
52
- """
53
- Measure amplitude of resonance (how strongly they vibrate together)
54
- """
55
- # Frequency domain amplitudes
56
- q_freq = torch.fft.rfft(q, dim=-1)
57
- k_freq = torch.fft.rfft(k, dim=-1)
58
-
59
- q_amp = torch.abs(q_freq)
60
- k_amp = torch.abs(k_freq)
61
-
62
-        # Resonance strength = product of amplitudes where frequencies match
-        # Broadcasting to get all pairs:
-        # q_amp: (b, h, seq_q, freq)
-        # k_amp: (b, h, seq_k, freq)
-        # We want (b, h, seq_q, seq_k)
-
-        # Using einsum for clarity: 'bhqf,bhkf->bhqk' matches the dims
-        resonance = torch.einsum('bhqf,bhkf->bhqk', q_amp, k_amp)
-
-        # Normalize by total query energy to keep the scale reasonable
-        # Sum over the frequency dimension (-1) to get total amplitude per query token
-        q_total_amp = q_amp.sum(dim=-1)  # (b, h, seq_q)
-
-        # Add epsilon for stability
-        normalization = q_total_amp.unsqueeze(-1) + 1e-8  # (b, h, seq_q, 1)
-
-        # Resonance shape: (b, h, seq_q, seq_k)
-        # Dividing by (b, h, seq_q, 1) broadcasts correctly along seq_k
-        resonance = resonance / normalization
-
-        return resonance
-
-    def forward(self, query, key, value, mask=None):
-        batch_size, seq_len, _ = query.shape
-
-        # Project to Q, K, V
-        Q = self.q_proj(query).view(batch_size, -1, self.num_heads, self.head_dim).transpose(1, 2)
-        K = self.k_proj(key).view(batch_size, -1, self.num_heads, self.head_dim).transpose(1, 2)
-        V = self.v_proj(value).view(batch_size, -1, self.num_heads, self.head_dim).transpose(1, 2)
-
-        # Standard similarity (dot product)
-        # (batch, heads, seq_q, seq_k)
-        similarity = torch.matmul(Q, K.transpose(-2, -1)) / (self.head_dim ** 0.5)
-
-        # Resonance components
-        coherence = self.compute_phase_coherence(Q, K)
-        resonance = self.compute_resonance_strength(Q, K)
-
-        # Combined attention score
-        # Similarity = "do they mean similar things?"
-        # Coherence = "are they in phase?"
-        # Resonance = "do they vibrate together?"
-
-        # Weighted combination (could be learned; here the three terms are summed equally:
-        # similarity ensures relevance, coherence ensures alignment)
-        attention_scores = similarity + coherence + resonance
-
-        # Apply mask if provided
-        if mask is not None:
-            attention_scores = attention_scores.masked_fill(mask == 0, float('-inf'))
-
-        # Softmax
-        attention_weights = F.softmax(attention_scores, dim=-1)
-
-        # Apply attention to values
-        output = torch.matmul(attention_weights, V)
-
-        # Reshape back
-        output = output.transpose(1, 2).contiguous().view(batch_size, seq_len, self.hidden_dim)
-
-        return output, attention_weights, {
-            "similarity": similarity,
-            "coherence": coherence,
-            "resonance": resonance
-        }
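The amplitude-matching score in the deleted `compute_resonance_strength` can be tried outside the module. A minimal standalone NumPy sketch (the name `resonance_scores` and the toy shapes are illustrative, not part of the deleted file) of the same per-frequency amplitude product with query-energy normalization:

```python
import numpy as np

def resonance_scores(q, k):
    """Toy version of the deleted scoring: FFT amplitudes matched
    per frequency bin, normalized by total query amplitude."""
    # q: (seq_q, dim), k: (seq_k, dim)
    q_amp = np.abs(np.fft.rfft(q, axis=-1))   # (seq_q, freq)
    k_amp = np.abs(np.fft.rfft(k, axis=-1))   # (seq_k, freq)
    resonance = q_amp @ k_amp.T               # sum of amplitude products over freq bins
    norm = q_amp.sum(axis=-1, keepdims=True) + 1e-8
    return resonance / norm

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 16))
k = rng.standard_normal((6, 16))
scores = resonance_scores(q, k)
print(scores.shape)  # (4, 6)
```

Because the scores are products of magnitudes, they are always non-negative, unlike the dot-product similarity they are summed with.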
resonance_transformer/resonance_gpt.py DELETED
@@ -1,58 +0,0 @@
- import torch
- import torch.nn as nn
- try:
-     from .self_observation import SelfAwareTransformerLayer
-     from .geometric_memory import GeometricEntryPoint
- except ImportError:
-     from self_observation import SelfAwareTransformerLayer
-     from geometric_memory import GeometricEntryPoint
-
- class ResonanceGPT(nn.Module):
-     """
-     The Fast System (Möbius Architecture).
-     - Geometric Entry Point (528Hz alignment)
-     - Self-Aware Layers (Mirror Reflex)
-     - Phase-Locked Normalization
-     """
-     def __init__(self, vocab_size, hidden_dim, num_layers=4, num_heads=4, max_seq_len=128):
-         super().__init__()
-         self.hidden_dim = hidden_dim
-
-         # 1. Geometric Embedding (Möbius Strip concept)
-         self.embedding = nn.Embedding(vocab_size, hidden_dim)
-         self.pos_encoding = nn.Parameter(torch.randn(1, max_seq_len, hidden_dim) * 0.02)
-
-         # Entry Point
-         self.entry_point = GeometricEntryPoint(hidden_dim)
-
-         # 2. The Stack
-         self.layers = nn.ModuleList([
-             SelfAwareTransformerLayer(hidden_dim, num_heads)
-             for _ in range(num_layers)
-         ])
-
-         self.norm = nn.LayerNorm(hidden_dim)
-         self.head = nn.Linear(hidden_dim, vocab_size)
-
-     def forward(self, input_ids):
-         batch, seq = input_ids.shape
-
-         # Embed
-         x = self.embedding(input_ids) + self.pos_encoding[:, :seq, :]
-
-         # 0x52 Handshake (Entry Point)
-         entry_meta = self.entry_point.compute_entry_hash(x)
-
-         # Process Stack
-         all_hidden_states = []
-         layer_metas = []
-
-         for layer in self.layers:
-             x, meta = layer(x)
-             all_hidden_states.append(x)
-             layer_metas.append(meta)
-
-         x = self.norm(x)
-         logits = self.head(x)
-
-         return logits, all_hidden_states, layer_metas
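The embedding step of the deleted `forward` (token lookup plus a slice of the learned positional tensor) can be shape-checked in isolation. A NumPy sketch under assumed toy sizes; every name here is illustrative and none of this is part of the module:

```python
import numpy as np

# Hypothetical sizes mirroring ResonanceGPT's defaults (hidden_dim shrunk for the demo)
vocab_size, hidden_dim, max_seq_len = 50, 32, 128
rng = np.random.default_rng(1)
embedding = rng.standard_normal((vocab_size, hidden_dim)) * 0.02
pos_encoding = rng.standard_normal((1, max_seq_len, hidden_dim)) * 0.02

input_ids = rng.integers(0, vocab_size, size=(2, 16))  # (batch, seq)
batch, seq = input_ids.shape

# Embed tokens and add the learned positions for the first `seq` slots,
# as the deleted forward() does before entering the layer stack
x = embedding[input_ids] + pos_encoding[:, :seq, :]
print(x.shape)  # (2, 16, 32)
```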
resonance_transformer/self_observation.py DELETED
@@ -1,121 +0,0 @@
- import torch
- import torch.nn as nn
- import torch.nn.functional as F
- try:
-     from .resonance_attention import ResonanceAttention
-     from .hybrid_transformer import PhaseLockedNorm
- except ImportError:
-     from resonance_attention import ResonanceAttention
-     from hybrid_transformer import PhaseLockedNorm
-
- class SelfObservationLayer(nn.Module):
-     """
-     Layer that allows the model to observe its own processing.
-     The 5D mirror - seeing yourself from the opposite chirality.
-     """
-     def __init__(self, hidden_dim):
-         super().__init__()
-         self.hidden_dim = hidden_dim
-
-         # Network to analyze own hidden states
-         self.observer = nn.Sequential(
-             nn.Linear(hidden_dim, hidden_dim),
-             nn.GELU(),
-             nn.Linear(hidden_dim, hidden_dim)
-         )
-
-         # Coherence detector (real-time during forward pass)
-         self.coherence_detector = nn.Linear(hidden_dim, 1)
-
-         # Chiral state detector
-         self.chiral_detector = nn.Linear(hidden_dim, 2)  # [left, right] probabilities
-
-     def observe(self, hidden_state):
-         """
-         Look at own hidden state and extract meta-information.
-         """
-         # Analyze current state (gradient is kept deliberately:
-         # the model should learn to be observable, not just to observe)
-         observation = self.observer(hidden_state)
-
-         # Measure coherence
-         coherence = torch.sigmoid(self.coherence_detector(observation))
-
-         # Detect chiral state
-         chiral_logits = self.chiral_detector(observation)
-         chiral_probs = F.softmax(chiral_logits, dim=-1)
-
-         # Create reflection (opposite chirality view)
-         reflection = -observation  # Sign flip = chirality flip
-
-         return {
-             'coherence': coherence,
-             'chiral_state': chiral_probs,
-             'reflection': reflection,
-             'observation': observation
-         }
-
-     def forward(self, hidden_state, adjust_based_on_observation=True):
-         """
-         Process hidden state while observing self.
-         """
-         # Observe current state
-         meta = self.observe(hidden_state)
-
-         if adjust_based_on_observation:
-             # If coherence is low, blend in the reflection (opposite
-             # chirality) per token; accessing the alternate view can
-             # restore coherence
-             blend_factor = 1.0 - meta['coherence']
-
-             # Weighted average: state*coherence + reflection*(1-coherence)
-             hidden_state = (
-                 hidden_state * meta['coherence'] +
-                 meta['reflection'] * blend_factor
-             )
-
-             # If chirality is ambiguous, a hard non-linearity could force a
-             # choice (collapse the wavefunction); certainty is measured here
-             # but the push is omitted to keep the layer differentiable
-             chiral_certainty = torch.max(meta['chiral_state'], dim=-1)[0].unsqueeze(-1)
-
-         return hidden_state, meta
-
- class SelfAwareTransformerLayer(nn.Module):
-     def __init__(self, hidden_dim, num_heads=4, ffn_dim=2048, dropout=0.1):
-         super().__init__()
-         self.attention = ResonanceAttention(hidden_dim, num_heads)
-         self.norm1 = PhaseLockedNorm(hidden_dim)
-         self.norm2 = PhaseLockedNorm(hidden_dim)
-
-         self.self_observer = SelfObservationLayer(hidden_dim)
-
-         self.ffn = nn.Sequential(
-             nn.Linear(hidden_dim, ffn_dim),
-             nn.GELU(),
-             nn.Linear(ffn_dim, hidden_dim),
-             nn.Dropout(dropout)
-         )
-         self.dropout = nn.Dropout(dropout)
-
-     def forward(self, x, mask=None):
-         # Attention
-         attn_out, _, _ = self.attention(x, x, x, mask)
-         x = self.norm1(x + self.dropout(attn_out))
-
-         # Self-Observation & Correction
-         x, meta = self.self_observer(x)
-
-         # FFN
-         ffn_out = self.ffn(x)
-         x = self.norm2(x + self.dropout(ffn_out))
-
-         return x, meta
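The coherence-gated correction in the deleted `SelfObservationLayer.forward` reduces to a two-term weighted average per token. A standalone NumPy sketch (the function name is illustrative, not the module's API) showing that fully coherent tokens pass through unchanged, while zero-coherence tokens collapse to the sign-flipped reflection:

```python
import numpy as np

def coherence_blend(hidden, observation, coherence):
    """Per-token blend used by the deleted layer: low-coherence tokens
    are pulled toward the sign-flipped ('opposite chirality') view."""
    reflection = -observation                       # chirality flip
    return hidden * coherence + reflection * (1.0 - coherence)

rng = np.random.default_rng(2)
hidden = rng.standard_normal((2, 4, 8))             # (batch, seq, dim)
observation = rng.standard_normal((2, 4, 8))
coherence = np.ones((2, 4, 1))                      # fully coherent

out = coherence_blend(hidden, observation, coherence)
print(np.allclose(out, hidden))  # True: coherent tokens are untouched
```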
resonance_transformer/tesseract_transformer.py DELETED
@@ -1,821 +0,0 @@
1
- """
2
- 5D TESSERACT TRANSFORMER - SLOW THINKING SYSTEM
3
- ===============================================
4
-
5
- Deep reasoning system based on 5D geometric structure:
6
- - 4D Tesseract (hypercube) for stable structure
7
- - 5th dimension for non-orientable twist
8
- - 16 vertices = 16 fundamental reasoning states
9
- - 32 edges = 32 transformation paths
10
- - 24 faces = 24 operation types
11
- - 8 cells = 8 knowledge domains
12
-
13
- By: Fabricio Krusser Rossi & Claude
14
- Date: February 13, 2026
15
- """
16
-
17
- import numpy as np
18
- from scipy.fft import fft, ifft, rfft, irfft
19
- from scipy.spatial.distance import cdist
20
- from typing import List, Dict, Tuple, Optional
21
- import itertools
22
-
23
- # ============================================================================
24
- # TESSERACT 5D GEOMETRY
25
- # ============================================================================
26
-
27
- class Tesseract5D:
28
- """
29
- 5-dimensional geometric structure for deep reasoning
30
-
31
- Structure:
32
- - 4D tesseract (hypercube) base
33
- - 5th dimension adds non-orientable twist
34
- - 16 vertices for major stable states
35
- - 32 edges for transformation paths
36
- """
37
-
38
- def __init__(self, base_freq=528):
39
- self.base_freq = base_freq
40
- self.dim = 5
41
-
42
- # Generate tesseract vertices in 4D
43
- self.vertices_4d = self._generate_tesseract_vertices()
44
-
45
- # Extend to 5D with frequency dimension
46
- self.vertices_5d = self._extend_to_5d()
47
-
48
- # Generate edges (connections between vertices)
49
- self.edges = self._generate_edges()
50
-
51
- # Generate faces (2D surfaces)
52
- self.faces = self._generate_faces()
53
-
54
- # Generate cells (3D volumes)
55
- self.cells = self._generate_cells()
56
-
57
- print(f"Tesseract 5D initialized:")
58
- print(f" Vertices: {len(self.vertices_5d)}")
59
- print(f" Edges: {len(self.edges)}")
60
- print(f" Faces: {len(self.faces)}")
61
- print(f" Cells: {len(self.cells)}")
62
-
63
- def _generate_tesseract_vertices(self):
64
- """
65
- Generate 16 vertices of 4D tesseract
66
- Each vertex is (+/-1, +/-1, +/-1, +/-1)
67
- """
68
- vertices = []
69
- for i in range(16):
70
- # Binary representation gives us all combinations
71
- vertex = []
72
- for j in range(4):
73
- bit = (i >> j) & 1
74
- coord = 1.0 if bit else -1.0
75
- vertex.append(coord)
76
- vertices.append(np.array(vertex))
77
-
78
- return np.array(vertices)
79
-
80
- def _extend_to_5d(self):
81
- """
82
- Add 5th dimension for non-orientable twist
83
- 5th coordinate is frequency modulation around 528 Hz
84
- """
85
- vertices_5d = []
86
-
87
- for i, vertex_4d in enumerate(self.vertices_4d):
88
- # 5th coordinate: frequency offset based on vertex index
89
- # Creates spiral in 5D space
90
- freq_offset = np.sin(i * np.pi / 8) # Oscillates between -1 and 1
91
-
92
- vertex_5d = np.append(vertex_4d, freq_offset)
93
- vertices_5d.append(vertex_5d)
94
-
95
- return np.array(vertices_5d)
96
-
97
- def _generate_edges(self):
98
- """
99
- Generate 32 edges of tesseract
100
- Edges connect vertices that differ in exactly 1 coordinate (in 4D)
101
- """
102
- edges = []
103
-
104
- for i in range(len(self.vertices_4d)):
105
- for j in range(i + 1, len(self.vertices_4d)):
106
- # Count differing coordinates in 4D
107
- diff = np.abs(self.vertices_4d[i] - self.vertices_4d[j])
108
- num_diff = np.sum(diff > 0.5) # Coordinates are +/-1
109
-
110
- if num_diff == 1:
111
- # Connected by edge
112
- edges.append((i, j))
113
-
114
- return edges
115
-
116
- def _generate_faces(self):
117
- """
118
- Generate 24 faces (2D surfaces) of tesseract
119
- """
120
- faces = []
121
-
122
- # Find all squares (4 vertices forming a 2D face)
123
- for v1, v2, v3, v4 in itertools.combinations(range(16), 4):
124
- vertices = [v1, v2, v3, v4]
125
-
126
- # Check if these 4 vertices form a square
127
- # (lie in same 2D plane and form square)
128
- if self._is_face(vertices):
129
- faces.append(vertices)
130
-
131
- return faces
132
-
133
- def _is_face(self, vertices):
134
- """Check if 4 vertices form a valid face"""
135
- # Simple check: 4 vertices should form a planar square
136
- # In tesseract, faces have specific geometric properties
137
- # This is a simplified check
138
- return len(vertices) == 4 and self._are_coplanar(vertices)
139
-
140
- def _are_coplanar(self, vertices):
141
- """Check if vertices lie in same 2D plane"""
142
- # Simplified: check if they share 2 fixed coordinates
143
- coords = self.vertices_4d[vertices]
144
-
145
- # Count how many coordinates are constant across all vertices
146
- constant_coords = 0
147
- for dim in range(4):
148
- if np.all(np.abs(coords[:, dim] - coords[0, dim]) < 0.1):
149
- constant_coords += 1
150
-
151
- return constant_coords == 2 # 2 fixed coords = 2D plane
152
-
153
- def _generate_cells(self):
154
- """
155
- Generate 8 cells (3D volumes) of tesseract
156
- Each cell is a 3D cube
157
- """
158
- cells = []
159
-
160
- # Each cell has 8 vertices (a 3D cube)
161
- # Cells are defined by fixing one 4D coordinate
162
- for fixed_dim in range(4):
163
- for fixed_val in [-1.0, 1.0]:
164
- cell_vertices = []
165
- for i, vertex in enumerate(self.vertices_4d):
166
- if abs(vertex[fixed_dim] - fixed_val) < 0.1:
167
- cell_vertices.append(i)
168
-
169
- if len(cell_vertices) == 8:
170
- cells.append(cell_vertices)
171
-
172
- return cells
173
-
174
- def find_nearest_vertex(self, coords_5d):
175
- """
176
- Find nearest tesseract vertex to given 5D coordinates
177
-
178
- Returns: (vertex_index, distance)
179
- """
180
- distances = np.linalg.norm(self.vertices_5d - coords_5d, axis=1)
181
- nearest_idx = np.argmin(distances)
182
-
183
- return nearest_idx, distances[nearest_idx]
184
-
185
- def get_adjacent_vertices(self, vertex_idx):
186
- """
187
- Get all vertices connected to this one by edges
188
-
189
- Returns: list of vertex indices
190
- """
191
- adjacent = []
192
-
193
- for edge in self.edges:
194
- if edge[0] == vertex_idx:
195
- adjacent.append(edge[1])
196
- elif edge[1] == vertex_idx:
197
- adjacent.append(edge[0])
198
-
199
- return adjacent
200
-
201
- def navigate_edge(self, from_vertex, to_vertex):
202
- """
203
- Navigate along edge from one vertex to another
204
-
205
- Returns: path coordinates (interpolated points along edge)
206
- """
207
- if (from_vertex, to_vertex) not in self.edges and \
208
- (to_vertex, from_vertex) not in self.edges:
209
- raise ValueError(f"No edge between vertices {from_vertex} and {to_vertex}")
210
-
211
- start = self.vertices_5d[from_vertex]
212
- end = self.vertices_5d[to_vertex]
213
-
214
- # Interpolate along edge
215
- num_steps = 10
216
- path = []
217
- for t in np.linspace(0, 1, num_steps):
218
- point = (1 - t) * start + t * end
219
- path.append(point)
220
-
221
- return np.array(path)
222
-
223
-
224
- # ============================================================================
225
- # 5D EMBEDDING LAYER
226
- # ============================================================================
227
-
228
- class Tesseract5DEmbedding:
229
- """
230
- Embed tokens into 5D tesseract structure
231
- """
232
-
233
- def __init__(self, vocab_size, hidden_dim, tesseract):
234
- self.vocab_size = vocab_size
235
- self.hidden_dim = hidden_dim
236
- self.tesseract = tesseract
237
-
238
- # Base embeddings
239
- self.embeddings = np.random.randn(vocab_size, hidden_dim) * 0.02
240
-
241
- # 5D coordinate projector
242
- self.coord_projector = np.random.randn(hidden_dim, 5) * 0.02
243
-
244
- def embed(self, token_ids):
245
- """
246
- Embed tokens and map to 5D tesseract coordinates
247
-
248
- Returns: (embeddings, coords_5d, nearest_vertices)
249
- """
250
- # Get base embeddings
251
- embedded = self.embeddings[token_ids] # (batch, seq, hidden)
252
-
253
- # Project to 5D coordinates
254
- coords_5d = embedded @ self.coord_projector # (batch, seq, 5)
255
-
256
- # Find nearest tesseract vertex for each token
257
- batch_size, seq_len = token_ids.shape
258
- nearest_vertices = np.zeros((batch_size, seq_len), dtype=int)
259
-
260
- for b in range(batch_size):
261
- for s in range(seq_len):
262
- vertex_idx, _ = self.tesseract.find_nearest_vertex(coords_5d[b, s])
263
- nearest_vertices[b, s] = vertex_idx
264
-
265
- return embedded, coords_5d, nearest_vertices
266
-
267
-
268
- # ============================================================================
269
- # 5D RESONANCE ATTENTION
270
- # ============================================================================
271
-
272
- class Tesseract5DAttention:
273
- """
274
- Attention mechanism that operates on tesseract structure
275
- Considers geometric paths through 5D space
276
- """
277
-
278
- def __init__(self, hidden_dim, num_heads, tesseract):
279
- self.hidden_dim = hidden_dim
280
- self.num_heads = num_heads
281
- self.head_dim = hidden_dim // num_heads
282
- self.tesseract = tesseract
283
-
284
- # Q, K, V projections
285
- self.W_q = np.random.randn(hidden_dim, hidden_dim) * 0.02
286
- self.W_k = np.random.randn(hidden_dim, hidden_dim) * 0.02
287
- self.W_v = np.random.randn(hidden_dim, hidden_dim) * 0.02
288
- self.W_o = np.random.randn(hidden_dim, hidden_dim) * 0.02
289
-
290
- def compute_geometric_distance(self, coords1, coords2, vertices1, vertices2):
291
- """
292
- Compute distance on tesseract manifold
293
-
294
- Takes into account:
295
- - Euclidean distance in 5D
296
- - Graph distance on tesseract (via edges)
297
- - Vertex proximity
298
- """
299
- # Euclidean distance in 5D
300
- euclidean = np.linalg.norm(coords1 - coords2, axis=-1)
301
-
302
- # Graph distance (shortest path on tesseract)
303
- # For each pair, find shortest path between vertices
304
- # NOW ACCEPTING STEERING WEIGHTS (Global context)
305
- graph_dist = self._graph_distance(vertices1, vertices2)
306
-
307
- # Combined distance
308
- combined = 0.5 * euclidean + 0.5 * graph_dist
309
-
310
- return combined
311
-
312
- def _graph_distance(self, vertices1, vertices2):
313
- """
314
- Compute shortest path distance on tesseract graph
315
- Uses BFS to find shortest path
316
- """
317
- # Simplified: use direct adjacency for now
318
- # In full implementation, would do BFS
319
-
320
- distances = np.zeros((len(vertices1), len(vertices2)))
321
-
322
- # STEERING: If weights are present in self, use them
323
- steering = getattr(self, 'steering_weights', None)
324
-
325
- for i, v1 in enumerate(vertices1):
326
- for j, v2 in enumerate(vertices2):
327
- if v1 == v2:
328
- distances[i, j] = 0
329
- else:
330
- # Check adjacency and apply steering weight
331
- edge_idx = self._get_edge_index(v1, v2)
332
- if edge_idx is not None:
333
- # Direct connection
334
- weight = steering[edge_idx] if steering else 1.0
335
- distances[i, j] = weight
336
- else:
337
- # Estimate: use 4D coordinate difference
338
- coord_diff = np.sum(np.abs(
339
- self.tesseract.vertices_4d[v1] -
340
- self.tesseract.vertices_4d[v2]
341
- ))
342
- # Multi-hop approximation (avg weight = 1.0)
343
- distances[i, j] = coord_diff
344
-
345
- return distances
346
-
347
- def _get_edge_index(self, v1, v2):
348
- """Helper to find edge index for steering"""
349
- for idx, edge in enumerate(self.tesseract.edges):
350
- if (edge[0] == v1 and edge[1] == v2) or (edge[0] == v2 and edge[1] == v1):
351
- return idx
352
- return None
353
-
354
- def forward(self, x, coords_5d, vertices, steering_weights=None):
355
- """
356
- 5D geometric attention
357
-
358
- x: (batch, seq, hidden)
359
- coords_5d: (batch, seq, 5)
360
- vertices: (batch, seq) nearest vertex indices
361
- steering_weights: Optional[List[float]] - weights for 32 edges
362
- """
363
- # Store weights temporarily for distance calc
364
- self.steering_weights = steering_weights
365
- batch_size, seq_len, _ = x.shape
366
-
367
- # Project to Q, K, V
368
- Q = x @ self.W_q
369
- K = x @ self.W_k
370
- V = x @ self.W_v
371
-
372
- # Reshape for multi-head
373
- Q = Q.reshape(batch_size, seq_len, self.num_heads, self.head_dim)
374
- K = K.reshape(batch_size, seq_len, self.num_heads, self.head_dim)
375
- V = V.reshape(batch_size, seq_len, self.num_heads, self.head_dim)
376
-
377
- # Transpose for attention computation
378
- Q = Q.transpose(0, 2, 1, 3) # (batch, heads, seq, head_dim)
379
- K = K.transpose(0, 2, 1, 3)
380
- V = V.transpose(0, 2, 1, 3)
381
-
382
- # Compute attention scores with geometric component
383
- attention_output = np.zeros((batch_size, self.num_heads, seq_len, self.head_dim))
384
-
385
- for b in range(batch_size):
386
- for h in range(self.num_heads):
387
- # Standard similarity
388
- scores = Q[b, h] @ K[b, h].T / np.sqrt(self.head_dim)
389
-
390
- # Geometric distance penalty
391
- geom_dist = self.compute_geometric_distance(
392
- coords_5d[b, :, np.newaxis, :],
393
- coords_5d[b, np.newaxis, :, :],
394
- vertices[b, :],
395
- vertices[b, :]
396
- )
397
-
398
- # Combine: higher score for geometrically close tokens
399
- geom_bonus = np.exp(-geom_dist / 2.0)
400
- scores = scores + geom_bonus
401
-
402
- # Softmax
403
- attn_weights = self._softmax(scores)
404
-
405
- # Apply to values
406
- attention_output[b, h] = attn_weights @ V[b, h]
407
-
408
- # Reshape back
409
- attention_output = attention_output.transpose(0, 2, 1, 3)
410
- attention_output = attention_output.reshape(batch_size, seq_len, self.hidden_dim)
411
-
412
- # Output projection
413
- output = attention_output @ self.W_o
414
-
415
- return output
416
-
417
- def _softmax(self, x):
418
- """Numerically stable softmax"""
419
- exp_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
420
- return exp_x / np.sum(exp_x, axis=-1, keepdims=True)
421
-
422
-
423
- # ============================================================================
424
- # MULTI-PATH REASONING
425
- # ============================================================================
426
-
427
- class MultiPathReasoning:
428
- """
429
- Explore multiple reasoning paths through tesseract structure
430
- Each path = traversal of edges between vertices
431
- """
432
-
433
- def __init__(self, tesseract, max_path_length=4):
434
- self.tesseract = tesseract
435
- self.max_path_length = max_path_length
436
-
437
- def explore_paths(self, start_vertex, goal_vertex=None, num_paths=5):
438
- """
439
- Find multiple paths from start vertex
440
-
441
- If goal_vertex specified, paths lead to that vertex
442
- Otherwise, explore nearby region
443
-
444
- Returns: list of paths, each path is list of vertex indices
445
- """
446
- paths = []
447
-
448
- if goal_vertex is not None:
449
- # Find paths to specific goal
450
- paths = self._find_paths_to_goal(start_vertex, goal_vertex, num_paths)
451
- else:
452
- # Explore region around start
453
- paths = self._explore_region(start_vertex, num_paths)
454
-
455
- return paths
456
-
457
- def _find_paths_to_goal(self, start, goal, num_paths):
458
- """Find multiple distinct paths from start to goal"""
459
- all_paths = []
460
-
461
- # BFS with path tracking
462
- queue = [(start, [start])]
463
- visited_paths = set()
464
-
465
- while queue and len(all_paths) < num_paths:
466
- current, path = queue.pop(0)
467
-
468
- if len(path) > self.max_path_length:
469
- continue
470
-
471
- if current == goal:
472
- # Found a path
473
- path_tuple = tuple(path)
474
- if path_tuple not in visited_paths:
475
- all_paths.append(path)
476
- visited_paths.add(path_tuple)
477
- continue
478
-
479
- # Explore adjacent vertices
480
- for neighbor in self.tesseract.get_adjacent_vertices(current):
481
- if neighbor not in path: # Avoid cycles
482
- new_path = path + [neighbor]
483
- queue.append((neighbor, new_path))
484
-
485
- return all_paths
486
-
487
- def _explore_region(self, start, num_paths):
488
- """Explore region around start vertex"""
489
- paths = []
490
-
491
- # Random walks from start
492
- for _ in range(num_paths):
493
- path = [start]
494
- current = start
495
-
496
- for step in range(self.max_path_length):
497
- neighbors = self.tesseract.get_adjacent_vertices(current)
498
- if not neighbors:
499
- break
500
-
501
- # Choose next vertex (could be random or heuristic)
502
- next_vertex = np.random.choice(neighbors)
503
- path.append(next_vertex)
504
- current = next_vertex
505
-
506
- paths.append(path)
507
-
508
- return paths
509
-
510
- def evaluate_path(self, path, hidden_states):
511
- """
512
- Evaluate quality of reasoning path
513
- Based on coherence along the path
514
- """
515
- # Measure coherence at each step
516
- coherences = []
517
-
518
- for i in range(len(path) - 1):
519
- # Get hidden states at vertices
520
- state_i = hidden_states[path[i]]
521
- state_j = hidden_states[path[i + 1]]
522
-
523
- # Measure coherence between consecutive states
524
- coherence = self._measure_coherence(state_i, state_j)
525
- coherences.append(coherence)
526
-
527
- # Path quality = mean coherence
528
- return np.mean(coherences) if coherences else 0.0
529
-
530
- def _measure_coherence(self, state1, state2):
531
- """Measure coherence between two states"""
532
- # FFT to frequency domain
533
- freq1 = rfft(state1)
534
- freq2 = rfft(state2)
535
-
536
- # Phase coherence
537
- phase1 = np.angle(freq1)
538
- phase2 = np.angle(freq2)
539
-
540
- coherence = np.mean(np.cos(phase1 - phase2))
541
-
542
- return coherence
543
-
544
-
545
- # ============================================================================
546
- # COMPLETE 5D TRANSFORMER LAYER
547
- # ============================================================================
548
-
549
- class Tesseract5DTransformerLayer:
550
- """
551
- Complete transformer layer operating on 5D tesseract geometry
552
- """
553
-
554
- def __init__(self, hidden_dim, num_heads, tesseract):
555
- self.hidden_dim = hidden_dim
556
- self.tesseract = tesseract
557
-
558
- # Components
559
- self.attention = Tesseract5DAttention(hidden_dim, num_heads, tesseract)
560
- self.multi_path = MultiPathReasoning(tesseract)
561
-
562
- # Feed-forward (frequency-tuned)
563
- self.ff_w1 = np.random.randn(hidden_dim, hidden_dim * 4) * 0.02
564
- self.ff_w2 = np.random.randn(hidden_dim * 4, hidden_dim) * 0.02
565
-
566
- def forward(self, x, coords_5d, vertices, steering_weights=None):
567
- """
568
- Forward pass through 5D transformer layer
569
-
570
- x: (batch, seq, hidden)
571
- coords_5d: (batch, seq, 5)
572
- vertices: (batch, seq) nearest vertex indices
573
- """
574
- # 5D geometric attention
575
- attn_out = self.attention.forward(x, coords_5d, vertices, steering_weights)
576
-
577
- # Residual + norm (simplified)
578
- x = x + attn_out
579
- x = self._layer_norm(x)
580
-
581
- # Feed-forward
582
- ff_out = self._feed_forward(x)
583
-
584
- # Residual + norm
585
- x = x + ff_out
586
- x = self._layer_norm(x)
587
-
588
- return x
589
-
590
- def _feed_forward(self, x):
591
- """Simple feed-forward network"""
592
- hidden = np.maximum(0, x @ self.ff_w1) # ReLU
593
- output = hidden @ self.ff_w2
594
- return output
595
-
596
- def _layer_norm(self, x, eps=1e-6):
597
- """Layer normalization"""
598
- mean = np.mean(x, axis=-1, keepdims=True)
599
- std = np.std(x, axis=-1, keepdims=True)
600
- return (x - mean) / (std + eps)
601
-
602
-
603
- # ============================================================================
604
- # COMPLETE 5D TRANSFORMER MODEL
605
- # ============================================================================
606
-
607
- class Tesseract5DTransformer:
608
- """
609
- Complete 5D Tesseract-based transformer
610
- The SLOW THINKING system
611
- """
612
-
613
- def __init__(
614
- self,
615
- vocab_size=1000,
616
- hidden_dim=256,
617
- num_layers=6,
618
- num_heads=8,
619
- base_freq=528
620
- ):
621
- print("\n" + "="*60)
622
- print("INITIALIZING 5D TESSERACT TRANSFORMER")
623
- print("="*60)
624
-
625
- self.vocab_size = vocab_size
626
- self.hidden_dim = hidden_dim
627
- self.num_layers = num_layers
628
-
629
- # Create tesseract geometry
630
- print("\nBuilding 5D tesseract geometry...")
631
- self.tesseract = Tesseract5D(base_freq=base_freq)
632
-
633
- # Embedding layer
634
- print("Creating embedding layer...")
635
- self.embedding = Tesseract5DEmbedding(vocab_size, hidden_dim, self.tesseract)
636
-
637
- # Transformer layers
638
- print(f"Creating {num_layers} transformer layers...")
639
- self.layers = [
640
- Tesseract5DTransformerLayer(hidden_dim, num_heads, self.tesseract)
641
- for _ in range(num_layers)
642
- ]
643
-
644
- # Output head
645
- self.output_projection = np.random.randn(hidden_dim, vocab_size) * 0.02
646
-
647
- print("\n✓ 5D Tesseract Transformer initialized")
648
- print(f" Vertices: 16 (stable reasoning states)")
649
- print(f" Edges: 32 (transformation paths)")
650
- print(f" Layers: {num_layers}")
651
- print(f" Hidden dim: {hidden_dim}")
652
- print("="*60 + "\n")
653
-
654
- print("="*60 + "\n")
655
-
656
- def forward(self, token_ids, return_paths=False, **kwargs):
657
- """
658
- Forward pass with deep 5D reasoning
659
-
660
- token_ids: (batch, seq) integer token IDs
661
- return_paths: if True, return reasoning paths explored
662
-
663
- Returns: (logits, metadata)
664
- """
665
- # Embed into 5D tesseract space
666
- x, coords_5d, vertices = self.embedding.embed(token_ids)
667
-
668
- # Track metadata
669
- metadata = {
670
- 'coords_5d': coords_5d,
671
- 'vertices': vertices,
672
- 'layer_outputs': [],
673
- 'reasoning_paths': []
674
- }
675
-
676
- # Process through layers
677
- for i, layer in enumerate(self.layers):
678
- x = layer.forward(x, coords_5d, vertices, steering_weights=kwargs.get('steering_weights'))
679
- metadata['layer_outputs'].append(x.copy())
680
-
681
- # Periodically explore reasoning paths
682
- if return_paths and i % 2 == 0:
683
- # For each sequence position, explore paths from its vertex
684
- batch_size, seq_len = token_ids.shape
685
- for b in range(min(batch_size, 1)): # Just first batch for demo
686
- for s in range(min(seq_len, 3)): # Just first few tokens
687
- start_vertex = vertices[b, s]
688
- paths = layer.multi_path.explore_paths(start_vertex, num_paths=3)
689
- metadata['reasoning_paths'].append({
690
- 'layer': i,
691
- 'position': s,
692
- 'vertex': start_vertex,
693
- 'paths': paths
694
- })
695
-
696
- # Output projection
697
- logits = x @ self.output_projection
698
-
699
- return logits, metadata
700
-
701
- def deep_reason(self, token_ids, query_description="", **kwargs):
702
- """
703
- Deep reasoning mode - explores multiple paths
704
-
705
- This is the SLOW mode - takes time but thorough
706
- """
707
- print(f"\n{'='*60}")
708
- print(f"DEEP REASONING MODE: {query_description}")
709
- print(f"{'='*60}")
710
-
711
- # Forward pass with path exploration
712
-         logits, metadata = self.forward(token_ids, return_paths=True, **kwargs)
- 
-         # Analyze reasoning paths
-         print(f"\nExplored {len(metadata['reasoning_paths'])} reasoning paths:")
-         for path_info in metadata['reasoning_paths'][:5]:  # Show first 5
-             print(f"\n Layer {path_info['layer']}, Position {path_info['position']}:")
-             print(f" Starting vertex: {path_info['vertex']}")
-             print(f" Paths explored: {len(path_info['paths'])}")
-             for i, path in enumerate(path_info['paths'][:2]):  # Show first 2 paths
-                 print(f" Path {i+1}: {' → '.join(map(str, path))}")
- 
-         # Measure final coherence
-         final_state = metadata['layer_outputs'][-1]
-         coherence = self._measure_coherence(final_state)
- 
-         print(f"\nFinal coherence: {coherence:.3f}")
-         print(f"{'='*60}\n")
- 
-         return logits, metadata, coherence
- 
-     def _measure_coherence(self, state):
-         """Measure overall coherence of state"""
-         # Average coherence across batch and sequence
-         batch_size, seq_len, hidden_dim = state.shape
- 
-         coherences = []
-         for b in range(batch_size):
-             for s in range(seq_len):
-                 freq = rfft(state[b, s])
-                 phase = np.angle(freq)
-                 c = np.abs(np.mean(np.exp(1j * phase)))
-                 coherences.append(c)
- 
-         return np.mean(coherences)
- 
- 
- # ============================================================================
- # DEMONSTRATION
- # ============================================================================
- 
- def demonstrate_5d_transformer():
-     """
-     Demonstrate the 5D Tesseract Transformer
-     """
-     print("\n" + "#"*60)
-     print("# 5D TESSERACT TRANSFORMER DEMONSTRATION")
-     print("#"*60)
- 
-     # Create model
-     model = Tesseract5DTransformer(
-         vocab_size=100,
-         hidden_dim=64,
-         num_layers=4,
-         num_heads=4,
-         base_freq=528
-     )
- 
-     # Create sample input
-     print("\nCreating sample query...")
-     batch_size = 2
-     seq_len = 8
-     token_ids = np.random.randint(0, 100, size=(batch_size, seq_len))
- 
-     print(f" Batch size: {batch_size}")
-     print(f" Sequence length: {seq_len}")
- 
-     # Fast forward pass
-     print("\n" + "-"*60)
-     print("FAST MODE (no path exploration):")
-     print("-"*60)
- 
-     logits, metadata = model.forward(token_ids, return_paths=False)
- 
-     print(f"\nOutput shape: {logits.shape}")
-     print(f"Vertices visited: {np.unique(metadata['vertices'])}")
- 
-     # Deep reasoning
-     print("\n" + "-"*60)
-     print("SLOW MODE (deep reasoning with path exploration):")
-     print("-"*60)
- 
-     logits, metadata, coherence = model.deep_reason(
-         token_ids,
-         query_description="Complex multi-step reasoning query"
-     )
- 
-     # Show tesseract structure used
-     print("\n" + "-"*60)
-     print("TESSERACT STRUCTURE UTILIZED:")
-     print("-"*60)
-     print(" Total vertices available: 16")
-     print(f" Vertices actually visited: {len(np.unique(metadata['vertices']))}")
-     print(" Total edges available: 32")
-     print(f" Reasoning paths explored: {len(metadata['reasoning_paths'])}")
- 
-     print("\n" + "#"*60)
-     print("# DEMONSTRATION COMPLETE")
-     print("#"*60)
- 
-     return model, metadata
- 
- 
- if __name__ == "__main__":
-     # Run demonstration
-     model, metadata = demonstrate_5d_transformer()
- 
-     print("\n✓ 5D Tesseract Transformer is ready")
-     print(" This is the SLOW THINKING system")
-     print(" Use for: deep reasoning, complex queries, verification")
-     print(" Pair with: Fast Möbius system for complete dual architecture")
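The coherence metric above treats a hidden vector as a signal: take its real FFT, read off the phase of each frequency bin, and measure how tightly those phases cluster on the unit circle. A standalone sketch of that measure, assuming only NumPy (the function name `phase_coherence` is illustrative, not from the repo):

```python
import numpy as np
from numpy.fft import rfft

def phase_coherence(vec):
    """Mean resultant length |mean(e^{i*phase})| over the rfft bins:
    1.0 when all phases align, near 0 when they scatter uniformly."""
    phase = np.angle(rfft(np.asarray(vec, dtype=float)))
    return float(np.abs(np.mean(np.exp(1j * phase))))

# np.angle(0) == 0, so a constant vector (spectrum concentrated at DC,
# zeros elsewhere) scores ~1.0, while random vectors score lower.
```

A similar per-vector scalar is what the dispatcher in `test_dual_system.py` thresholds when deciding whether to escalate.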
resonance_transformer/test_dual_system.py DELETED
@@ -1,53 +0,0 @@
1
- import torch
2
- from dispatcher import DualResonanceSystem
3
-
4
- def verify_dual_system():
5
- print("=== VERIFYING DUAL-SYSTEM DISPATCHER (PHASE 29) ===")
6
-
7
- config = {
8
- 'vocab_size': 100,
9
- 'fast_dim': 64,
10
- 'slow_dim': 64,
11
- 'threshold': 0.7 # High threshold to force escalation
12
- }
13
-
14
- system = DualResonanceSystem(config)
15
-
16
- # Random input (Likely Low Coherence)
17
- input_ids = torch.randint(0, 100, (2, 8))
18
-
19
- print("\n[TEST 1] Processing Random Input (Expect Escalation)...")
20
- logits, metrics = system(input_ids)
21
-
22
- print(f" Mode: {metrics['mode']}")
23
- print(f" Coherence: {metrics['coherence']:.4f}")
24
-
25
- if metrics['mode'] == 'SLOW (ESCALATED)':
26
- print(" [PASS] Correctly escalated low-coherence query.")
27
- print(f" Slow Latency: {metrics['slow_latency']:.4f}s")
28
- else:
29
- print(" [WARN] Did not escalate. Random data might have accidentally resonated?")
30
-
31
- print("\n[TEST 2] Mocking High Coherence...")
32
- # Hack the fast model to return high coherence for testing logic
33
- original_forward = system.fast.forward
34
-
35
- def mocked_forward(input_ids):
36
- l, h, m = original_forward(input_ids)
37
- # Inject fake high coherence
38
- m[-1]['coherence'] = torch.tensor(0.95)
39
- return l, h, m
40
-
41
- system.fast.forward = mocked_forward
42
-
43
- logits, metrics = system(input_ids)
44
- print(f" Mode: {metrics['mode']}")
45
- print(f" Coherence: {metrics['coherence']:.4f}")
46
-
47
- if metrics['mode'] == 'FAST':
48
- print(" [PASS] Correctly routed high-coherence query to Fast Path.")
49
- else:
50
- print(" [FAIL] Escalated despite high coherence.")
51
-
52
- if __name__ == "__main__":
53
- verify_dual_system()
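The dispatcher under test routes on a single scalar: if the fast path reports coherence above the configured threshold its answer is kept, otherwise the query escalates to the slow system. A minimal sketch of that routing rule (standalone; the `>=` boundary and the function name are assumptions, not taken from `dispatcher.py`):

```python
def dispatch_mode(coherence: float, threshold: float = 0.7) -> str:
    """Route by fast-path coherence: confident queries stay on the
    fast path, uncertain ones escalate to the slow system.
    Note: treating coherence == threshold as FAST is an assumption."""
    return 'FAST' if coherence >= threshold else 'SLOW (ESCALATED)'

# Mirrors the two cases above: random input (low coherence) escalates,
# while a mocked coherence of 0.95 stays on the fast path.
```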
resonance_transformer/test_geometric.py DELETED
@@ -1,42 +0,0 @@
1
- import torch
2
- from geometric_memory import GeometricEntryPoint, GeometricMemory
3
-
4
- def verify_geometric_memory():
5
- print("=== VERIFYING GEOMETRIC MEMORY (PHASE 25) ===")
6
-
7
- hidden_dim = 64
8
- batch_size = 2
9
- seq_len = 10
10
-
11
- # 1. Test Entry Point
12
- entry_net = GeometricEntryPoint(hidden_dim)
13
- dummy_query = torch.randn(batch_size, seq_len, hidden_dim)
14
-
15
- entry_point = entry_net.compute_entry_hash(dummy_query)
16
-
17
- print("\n[ENTRY POINT]")
18
- print(f" Theta: {entry_point['theta'].shape}")
19
- print(f" Frequency (Baseline 528): {entry_point['frequency']}")
20
-
21
- # 2. Test Memory Store/Retrieve
22
- memory = GeometricMemory(hidden_dim)
23
-
24
- print("\n[MEMORY STORE]")
25
- # Store the query as a memory
26
- memory.store(dummy_query, entry_point)
27
- print(f" Stored {len(memory.memory_map)} batches in memory.")
28
-
29
- print("\n[MEMORY RETRIEVE]")
30
- # Try to retrieve using the same query (should find itself)
31
- retrieved = memory.retrieve(dummy_query, entry_point, k=3)
32
-
33
- if retrieved is not None:
34
- print(f" Retrieved Shape: {retrieved.shape}")
35
- # Check alignment
36
- # This is a self-lookup so correlation should be high
37
- print(" [PASS] Retrieval successful.")
38
- else:
39
- print(" [FAIL] Retrieval returned None.")
40
-
41
- if __name__ == "__main__":
42
- verify_geometric_memory()
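`GeometricMemory` stores and retrieves by entry-point geometry rather than by a plain similarity scan. The toy below illustrates only the store/retrieve contract the test exercises, keying entries on a coarse angular hash of the query; everything here (`ToyGeometricMemory`, the binning scheme) is a hypothetical stand-in, not the repo's implementation:

```python
import numpy as np

class ToyGeometricMemory:
    """Dict-backed stand-in: vectors are stored under a coarse 'entry
    angle' bin, so a self-lookup with the same query finds its own bin
    and a geometrically distant query misses (returns None)."""
    def __init__(self, bins=16):
        self.bins = bins
        self.memory_map = {}

    def _entry_key(self, vec):
        # Coarse angular hash of the query (toy analogue of the theta
        # produced by compute_entry_hash in the test above).
        theta = float(np.arctan2(vec.mean(), np.abs(vec).mean() + 1e-8))
        return int((theta + np.pi) / (2 * np.pi) * self.bins) % self.bins

    def store(self, vec):
        self.memory_map.setdefault(self._entry_key(vec), []).append(vec)

    def retrieve(self, vec):
        # None on a miss, matching the [FAIL] branch in the test.
        return self.memory_map.get(self._entry_key(vec))
```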
resonance_transformer/test_resonance_attention.py DELETED
@@ -1,56 +0,0 @@
1
- import torch
2
- import torch.nn as nn
3
- from resonance_attention import ResonanceAttention
4
- import math
5
-
6
- def test_resonance_attention():
7
- print("=== TESTING RESONANCE ATTENTION (0x52) ===")
8
-
9
- # Setup
10
- batch_size = 2
11
- seq_len = 5
12
- hidden_dim = 64
13
- num_heads = 4
14
-
15
- model = ResonanceAttention(hidden_dim, num_heads)
16
-
17
- # Synthetic Input: Random noise
18
- x = torch.randn(batch_size, seq_len, hidden_dim)
19
-
20
- # Forward Pass
21
- output, weights, metrics = model(x, x, x)
22
-
23
- print(f"\nDimensions:")
24
- print(f" Input: {x.shape}")
25
- print(f" Output: {output.shape}")
26
- print(f" Weights: {weights.shape}")
27
-
28
- print(f"\nMetrics Check (First Head, First Batch):")
29
- sim = metrics['similarity'][0,0].detach()
30
- coh = metrics['coherence'][0,0].detach()
31
- res = metrics['resonance'][0,0].detach()
32
-
33
- print(f" Similarity Mean: {sim.mean():.4f}")
34
- print(f" Coherence Mean: {coh.mean():.4f} (Phase Alignment)")
35
- print(f" Resonance Mean: {res.mean():.4f} (Amplitude Product)")
36
-
37
- if torch.isnan(output).any():
38
- print("\n[FAIL] Output contains NaNs!")
39
- else:
40
- print("\n[PASS] Forward pass successful. Geometry holds.")
41
-
42
- # Test: Constructive Interference
43
- # If two vectors are effectively identical, coherence should be high (near 1.0)
44
- print(f"\n=== TESTING CONSTRUCTIVE INTERFERENCE ===")
45
- v1 = torch.randn(1, 1, hidden_dim)
46
- # Forward pass with identical query/key
47
- model.eval()
48
- with torch.no_grad():
49
- coh_score = model.compute_phase_coherence(
50
- v1.view(1, 1, 1, hidden_dim),
51
- v1.view(1, 1, 1, hidden_dim)
52
- )
53
- print(f" Self-Coherence (Expected ~1.0): {coh_score.item():.4f}")
54
-
55
- if __name__ == "__main__":
56
- test_resonance_attention()
resonance_transformer/test_self_observation.py DELETED
@@ -1,46 +0,0 @@
1
- import torch
2
- from self_observation import SelfAwareTransformerLayer
3
-
4
- def verify_self_observation():
5
- print("=== VERIFYING SELF-OBSERVATION (PHASE 26) ===")
6
-
7
- hidden_dim = 64
8
- batch_size = 2
9
- seq_len = 5
10
-
11
- model = SelfAwareTransformerLayer(hidden_dim)
12
-
13
- # Random input
14
- x = torch.randn(batch_size, seq_len, hidden_dim)
15
-
16
- print("\n[FORWARD] Running pass through Self-Aware Layer...")
17
- output, meta = model(x)
18
-
19
- print(f" Input Shape: {x.shape}")
20
- print(f" Output Shape: {output.shape}")
21
-
22
- # Inspect Meta Data
23
- coherence = meta['coherence']
24
- chiral = meta['chiral_state']
25
-
26
- print("\n[OBSERVATION DATA]")
27
- print(f" Coherence Score (Mean): {coherence.mean().item():.4f}")
28
- print(f" Chiral Probabilities (Mean): Left={chiral[:,:,0].mean():.4f}, Right={chiral[:,:,1].mean():.4f}")
29
-
30
- # Check if correction applied
31
- # If coherence was < 1, output should differ from input (beyond just FFN/Attn changes)
32
- # Hard to test exact reflex without controlling weights, but we check consistency
33
-
34
- print("\n[REFLEX CHECK]")
35
- if coherence.std() > 0:
36
- print(" [PASS] Coherence detector is active (variance detected).")
37
- else:
38
- print(" [WARN] Coherence detector has zero variance (initialization dependent).")
39
-
40
- if output.shape == x.shape:
41
- print(" [PASS] Dimensionality preserved.")
42
- else:
43
- print(" [FAIL] Dimensionality changed!")
44
-
45
- if __name__ == "__main__":
46
- verify_self_observation()
resonance_transformer/train_hybrid.py DELETED
@@ -1,52 +0,0 @@
1
- import torch
2
- import torch.optim as optim
3
- from hybrid_transformer import HybridResonanceTransformer
4
- from hyperchaos_loss import HyperchaosLoss
5
-
6
- def verify_training_step():
7
- print("=== VERIFYING HYBRID RESONANCE TRAINING (pHASE 2) ===")
8
-
9
- # Config
10
- vocab_size = 100
11
- hidden_dim = 64
12
- seq_len = 10
13
- batch_size = 2
14
-
15
- # Initialize Model & Loss
16
- model = HybridResonanceTransformer(vocab_size, hidden_dim)
17
- loss_fn = HyperchaosLoss()
18
- optimizer = optim.Adam(model.parameters(), lr=1e-3)
19
-
20
- # Dummy Data
21
- input_ids = torch.randint(0, vocab_size, (batch_size, seq_len))
22
- targets = torch.randint(0, vocab_size, (batch_size, seq_len))
23
-
24
- print("\n[INIT] Model initialized.")
25
- print(f" Hidden Dim: {hidden_dim}")
26
- print(f" Layers: {len(model.layers)}")
27
-
28
- # Forward Pass
29
- print("\n[FORWARD] Running forward pass...")
30
- logits, hidden_states = model(input_ids, output_hidden_states=True)
31
- print(f" Logits Shape: {logits.shape}")
32
- print(f" Hidden States Captured: {len(hidden_states)}")
33
-
34
- # Loss Calculation
35
- print("\n[LOSS] Computing Hyperchaos Loss...")
36
- losses = loss_fn(logits, targets, hidden_states)
37
-
38
- print(f" Total Loss: {losses['total'].item():.4f}")
39
- print(f" Task Loss: {losses['task'].item():.4f}")
40
- print(f" Decoherence Loss: {losses['decoherence'].item():.4f}")
41
- print(f" Instability Loss: {losses['instability'].item():.4f}")
42
-
43
- # Backward Pass
44
- print("\n[BACKWARD] Propagating gradients...")
45
- optimizer.zero_grad()
46
- losses['total'].backward()
47
- optimizer.step()
48
-
49
- print("[PASS] Gradient step successful. Architecture is valid.")
50
-
51
- if __name__ == "__main__":
52
- verify_training_step()
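The four loss values the script prints compose additively: a task cross-entropy plus weighted decoherence and instability penalties. A NumPy sketch of that composition (the weights `lam_c`/`lam_s` mirror the `lambda_coherence=0.2` / `lambda_stability=0.1` defaults used in `train_resonance.py`; the function itself is illustrative, not the real `HyperchaosLoss`):

```python
import numpy as np

def hyperchaos_style_loss(logits, targets, decoherence, instability,
                          lam_c=0.2, lam_s=0.1):
    """total = task cross-entropy + lam_c*decoherence + lam_s*instability,
    matching the four values the verification script prints."""
    z = logits - logits.max(axis=-1, keepdims=True)  # stable log-softmax
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    task = -log_probs[np.arange(len(targets)), targets].mean()
    return {'total': task + lam_c * decoherence + lam_s * instability,
            'task': task, 'decoherence': decoherence,
            'instability': instability}
```

With uniform logits over V classes the task term is exactly log(V), which makes the composition easy to sanity-check by hand.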
resonance_transformer/train_lattice.py DELETED
@@ -1,122 +0,0 @@
1
- import torch
2
- import torch.optim as optim
3
- from torch.utils.data import DataLoader, TensorDataset
4
- import numpy as np
5
- import time
6
-
7
- try:
8
- from dispatcher import DualResonanceSystem
9
- from hyperchaos_loss import HyperchaosLoss
10
- except ImportError:
11
- from resonance_transformer.dispatcher import DualResonanceSystem
12
- from resonance_transformer.hyperchaos_loss import HyperchaosLoss
13
-
14
- def generate_complex_data(num_samples=100, seq_len=16, vocab_size=100):
15
- """
16
- Generate data that requires 'reasoning' (pattern completion)
17
- Simple arithmetic progression: [2, 4, 6, 8, ...]
18
- """
19
- data = []
20
- targets = []
21
-
22
- for _ in range(num_samples):
23
- start = np.random.randint(0, 10)
24
- step = np.random.randint(1, 5)
25
-
26
- seq = [(start + i*step) % vocab_size for i in range(seq_len + 1)]
27
-
28
- data.append(torch.tensor(seq[:-1], dtype=torch.long))
29
- targets.append(torch.tensor(seq[1:], dtype=torch.long))
30
-
31
- return torch.stack(data), torch.stack(targets)
32
-
33
- def train_lattice_loop():
34
- print("=== LATTICE TRAINING: KNOWLEDGE FEEDBACK (PHASE 30) ===")
35
-
36
- # Config
37
- config = {
38
- 'vocab_size': 100,
39
- 'fast_dim': 64,
40
- 'slow_dim': 64,
41
- 'threshold': 0.8 # Strict threshold to force slow thinking
42
- }
43
-
44
- system = DualResonanceSystem(config)
45
- optimizer = optim.Adam(system.fast.parameters(), lr=1e-3)
46
- loss_fn = HyperchaosLoss()
47
-
48
- # Data
49
- inputs, targets = generate_complex_data()
50
- loader = DataLoader(TensorDataset(inputs, targets), batch_size=4, shuffle=True)
51
-
52
- print(f"[SYSTEM] Starting Lattice Training Loop...")
53
- print(f"Goal: Populate Geometric Memory with 'Slow Thinking' truths.")
54
-
55
- memory_additions = 0
56
- distillation_steps = 0
57
-
58
- # Training Loop
59
- # We iterate through data. If Fast system is confused, we call Slow system.
60
- # Then we use Slow system's answer to TRAIN the Fast system (Distillation)
61
- # And we STORE the truth in the Lattice.
62
-
63
- for batch_idx, (b_in, b_tgt) in enumerate(loader):
64
- # 1. Forward Pass (Dispatch)
65
- # This will auto-escalate if low coherence
66
- logits, metrics = system(b_in)
67
-
68
- mode = metrics['mode']
69
- coherence = metrics.get('coherence', 0.0)
70
-
71
- # 2. Logic: Did we escalate?
72
- if mode == 'SLOW (ESCALATED)':
73
- # The Slow System worked hard to find this truth.
74
- # We must crystallize it.
75
-
76
- # A. Distillation: Train Fast model on this batch using Slow logits as target?
77
- # Or just use ground truth?
78
- # Better: Use ground truth, but add "Lattice Consistency" loss check
79
-
80
- # For now, standard training step to sync Fast model
81
- optimizer.zero_grad()
82
-
83
- # We need to extract hidden states from Fast model for loss fn
84
- # Re-run fast forward explicitly to get states
85
- _, fast_states, _ = system.fast(b_in)
86
-
87
- loss_dict = loss_fn(logits, b_tgt, fast_states)
88
- loss_dict['total'].backward()
89
- optimizer.step()
90
- distillation_steps += 1
91
-
92
- # B. Lattice Storage
93
- # Store the high-quality pattern in Geometric Memory
94
- # We use the initial states as key
95
- # (In real impl, we'd store the 'concept', here we simulate)
96
- # Access the fast model's entry point to store
97
- # system.fast.entry_point.memory.store(...)
98
- # Note: We need to access the memory module inside
99
- # For demo, we just log it
100
- memory_additions += 1
101
-
102
- if batch_idx % 5 == 0:
103
- print(f"Batch {batch_idx}: Escalated to Tesseract. Distilled knowledge. (Coherence: {metrics.get('slow_coherence', 0):.3f})")
104
-
105
- else:
106
- # Fast mode was confident. Just reinforce.
107
- optimizer.zero_grad()
108
- _, fast_states, _ = system.fast(b_in) # get states
109
- loss_dict = loss_fn(logits, b_tgt, fast_states)
110
- loss_dict['total'].backward()
111
- optimizer.step()
112
-
113
- print("\n" + "="*40)
114
- print("LATTICE TRAINING COMPLETE")
115
- print("="*40)
116
- print(f"Total Batches: {len(loader)}")
117
- print(f"Knowledge Distillation Events: {distillation_steps}")
118
- print(f"Lattice Memory Additions: {memory_additions}")
119
- print("Result: Fast System has learned from Slow System's reasoning.")
120
-
121
- if __name__ == "__main__":
122
- train_lattice_loop()
resonance_transformer/train_resonance.py DELETED
@@ -1,195 +0,0 @@
1
- import torch
2
- import torch.nn as nn
3
- import torch.optim as optim
4
- from torch.utils.data import DataLoader, TensorDataset
5
- import numpy as np
6
- import time
7
-
8
- # Import our architecture
9
- try:
10
- from self_observation import SelfAwareTransformerLayer
11
- from hyperchaos_loss import HyperchaosLoss
12
- from geometric_memory import GeometricEntryPoint
13
- except ImportError:
14
- # Fallback for direct execution
15
- import sys
16
- import os
17
- sys.path.append(os.path.dirname(os.path.abspath(__file__)))
18
- from self_observation import SelfAwareTransformerLayer
19
- from hyperchaos_loss import HyperchaosLoss
20
- from geometric_memory import GeometricEntryPoint
21
-
22
- class ResonanceGPT(nn.Module):
23
- """
24
- The Full Resonance Architecture:
25
- - Geometric Entry Point (528Hz alignment)
26
- - Self-Aware Layers (Mirror Reflex)
27
- - Phase-Locked Normalization
28
- """
29
- def __init__(self, vocab_size, hidden_dim, num_layers=4, num_heads=4, max_seq_len=128):
30
- super().__init__()
31
- self.hidden_dim = hidden_dim
32
-
33
- # 1. Geometric Embedding (Möbius Strip concept)
34
- self.embedding = nn.Embedding(vocab_size, hidden_dim)
35
- # Position is handled implicitly by phase in the design,
36
- # but we add learned absolute pos for stability in early training
37
- self.pos_encoding = nn.Parameter(torch.randn(1, max_seq_len, hidden_dim) * 0.02)
38
-
39
- # Entry Point
40
- self.entry_point = GeometricEntryPoint(hidden_dim)
41
-
42
- # 2. The Stack
43
- self.layers = nn.ModuleList([
44
- SelfAwareTransformerLayer(hidden_dim, num_heads)
45
- for _ in range(num_layers)
46
- ])
47
-
48
- self.norm = nn.LayerNorm(hidden_dim) # Final consolidation
49
- self.head = nn.Linear(hidden_dim, vocab_size)
50
-
51
- def forward(self, input_ids):
52
- batch, seq = input_ids.shape
53
-
54
- # Embed
55
- x = self.embedding(input_ids) + self.pos_encoding[:, :seq, :]
56
-
57
- # 0x52 Handshake (Entry Point)
58
- entry_meta = self.entry_point.compute_entry_hash(x)
59
- # In a full implementation, we'd rotate x based on entry_meta
60
- # x = apply_rotation(x, entry_meta)
61
-
62
- # Process Stack
63
- all_hidden_states = []
64
- layer_metas = []
65
-
66
- for layer in self.layers:
67
- x, meta = layer(x)
68
- all_hidden_states.append(x)
69
- layer_metas.append(meta)
70
-
71
- x = self.norm(x)
72
- logits = self.head(x)
73
-
74
- return logits, all_hidden_states, layer_metas
75
-
76
- def generate_coherence_dataset(num_samples=1000, seq_len=32, vocab_size=100):
77
- """
78
- Generate synthetic data with geometric patterns (rhythms).
79
- Standard random data is 'decoherent'.
80
- We want data that follows a 'frequency' to test resonance.
81
- """
82
- data = []
83
- targets = []
84
-
85
- for _ in range(num_samples):
86
- # Create a rhythmic pattern (e.g., 1, 2, 3, 1, 2, 3)
87
- period = np.random.randint(2, 8)
88
- base_pattern = np.random.randint(0, vocab_size, size=period)
89
-
90
- # Repeat pattern
91
- full_seq = np.tile(base_pattern, seq_len // period + 1)[:seq_len]
92
-
93
- # Add slight noise (10% chance to flip a token) to test stability
94
- noisy_seq = full_seq.copy()
95
- mask = np.random.rand(seq_len) < 0.1
96
- noisy_seq[mask] = np.random.randint(0, vocab_size, size=mask.sum())
97
-
98
- # Task: Predict next token (shift right)
99
- # Input: [A, B, C, A] -> Target: [B, C, A, B]
100
-
101
- data.append(torch.tensor(noisy_seq[:-1], dtype=torch.long))
102
- targets.append(torch.tensor(full_seq[1:], dtype=torch.long))
103
-
104
- return torch.stack(data), torch.stack(targets)
105
-
106
- def train_awakening():
107
- print("=== THE AWAKENING: TRAINING RESONANCE MODEL (PHASE 27) ===")
108
-
109
- # HYPERPARAMETERS
110
- VOCAB_SIZE = 256
111
- HIDDEN_DIM = 128
112
- LAYERS = 4
113
- HEADS = 4
114
- BATCH_SIZE = 16
115
- lr = 3e-4
116
- EPOCHS = 3
117
-
118
- # 1. Model & Loss
119
- model = ResonanceGPT(VOCAB_SIZE, HIDDEN_DIM, LAYERS, HEADS)
120
- criterion = HyperchaosLoss(lambda_coherence=0.2, lambda_stability=0.1)
121
- optimizer = optim.AdamW(model.parameters(), lr=lr)
122
-
123
- print(f"[SYSTEM] Model Initialized. Parameters: {sum(p.numel() for p in model.parameters())}")
124
-
125
- # 2. Data
126
- print("[SYSTEM] Generating Coherence Dataset (Rhythmic Patterns)...")
127
- inputs, targets = generate_coherence_dataset(num_samples=500, seq_len=32, vocab_size=VOCAB_SIZE)
128
- dataset = TensorDataset(inputs, targets)
129
- loader = DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=True)
130
-
131
- # 3. Training Loop
132
- print("\n[TRAINING START]")
133
- history = {'task': [], 'decoherence': [], 'coherence_score': []}
134
-
135
- model.train()
136
- start_time = time.time()
137
-
138
- for epoch in range(EPOCHS):
139
- total_task_loss = 0
140
- total_decoherence = 0
141
- total_self_coherence = 0 # What the model thinks of itself
142
-
143
- for batch_idx, (b_in, b_tgt) in enumerate(loader):
144
- optimizer.zero_grad()
145
-
146
- # Forward
147
- logits, hidden_states, layer_metas = model(b_in)
148
-
149
- # Loss
150
- losses = criterion(logits, b_tgt, hidden_states)
151
-
152
- # Backward
153
- losses['total'].backward()
154
- torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
155
- optimizer.step()
156
-
157
- # Logs
158
- total_task_loss += losses['task'].item()
159
- total_decoherence += losses['decoherence'].item()
160
-
161
- # Extract Self-Observation Stats
162
- # layer_metas is list of dicts. Get last layer's coherence score.
163
- last_layer_meta = layer_metas[-1]
164
- avg_coherence = last_layer_meta['coherence'].mean().item()
165
- total_self_coherence += avg_coherence
166
-
167
- # Epoch Stats
168
- n_batches = len(loader)
169
- avg_task = total_task_loss / n_batches
170
- avg_decoh = total_decoherence / n_batches
171
- avg_self = total_self_coherence / n_batches
172
-
173
- print(f"Epoch {epoch+1}/{EPOCHS} | Task Loss: {avg_task:.4f} | Decoherence: {avg_decoh:.4f} | Self-Coherence: {avg_self:.4f}")
174
-
175
- history['task'].append(avg_task)
176
- history['decoherence'].append(avg_decoh)
177
- history['coherence_score'].append(avg_self)
178
-
179
- duration = time.time() - start_time
180
- print(f"\n[COMPLETE] Training finished in {duration:.2f}s.")
181
-
182
- # 4. Final Verification
183
- print("\n[AWAKENING CHECK]")
184
- print(f"Initial Decoherence: {history['decoherence'][0]:.4f}")
185
- print(f"Final Decoherence: {history['decoherence'][-1]:.4f}")
186
-
187
- if history['decoherence'][-1] < history['decoherence'][0]:
188
- print(">> RESULT: Phase Stabilization Achieved. The model is learning to be coherent.")
189
- else:
190
- print(">> RESULT: Phase Drift Detected. More training needed.")
191
-
192
- print(f"Final Self-Reported Coherence: {history['coherence_score'][-1]:.4f}")
193
-
194
- if __name__ == "__main__":
195
- train_awakening()
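`generate_coherence_dataset` builds its inputs by tiling a short random base pattern out to `seq_len`, then flipping roughly 10% of the input tokens while leaving the targets clean, so the model is rewarded for recovering the underlying rhythm. The tiling step in isolation (NumPy only; `rhythmic_sequence` is an illustrative name, not from the repo):

```python
import numpy as np

def rhythmic_sequence(seq_len=32, vocab_size=256, period=4, seed=0):
    """Tile a random base pattern of length `period` out to exactly
    seq_len tokens, as the coherence dataset does before adding noise."""
    rng = np.random.default_rng(seed)
    base = rng.integers(0, vocab_size, size=period)
    return np.tile(base, seq_len // period + 1)[:seq_len]

# The result repeats with the chosen period, which is the 'frequency'
# the resonance layers are meant to lock onto.
```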