# 🧪 RippleGPT Validation Suite

This module validates the hypothesis that the **RippleGPT** architecture (Decay-Biased Attention + Multiplicative Gating) understands **hierarchical code structures** better than standard Transformer architectures.

## 🎯 Objective

Tests whether the "Ripple Field" mechanism can:

1. **Close parentheses/braces correctly** - Requires attention to open scopes
2. **Indent Python correctly** - Requires understanding block hierarchy
3. **Complete code consistently** - Requires long-range context

## 📦 Dataset

We use [bigcode/the-stack-smol](https://huggingface.co/datasets/bigcode/the-stack-smol), a clean subset of Python code from The Stack.

## 🚀 Quick Start

### 1. Install Dependencies

```bash
cd /path/to/RippleGPT
pip install -r requirements.txt
```

### 2. Prepare Data

```bash
python validation/code/prepare_code_data.py
```

This script:
- Downloads Python code from the-stack-smol (streaming, ~5MB)
- Tokenizes at character level
- Saves to `validation/code/data/`

### 3. Train Model

```bash
python validation/code/train_code.py
```

Trains RippleGPT for 3000 iterations (~15 min on M1/M2).

### 4. Run Validation

```bash
python validation/code/validate_code.py
```

Executes all validation tests and generates a report.

## 📊 Validation Metrics

### Test 1: Parentheses/Brace Closing

```python
# Input:  "def foo(a, b"
# Expect: "def foo(a, b):"
```

### Test 2: Python Indentation

```python
# Input:  "if x > 0:\n"
# Expect: "if x > 0:\n    return" (4 spaces)
```

### Test 3: Function Structure

```python
# Input:  "def calculate_sum(numbers):\n    total = 0\n    for n in numbers:\n        total +="
# Expect: Complete with " n" and close the loop correctly
```

### Test 4: Long Context (Extrapolation)

Tests whether the model maintains coherence in functions 50+ lines long.
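Tests 1 and 3 ultimately reduce to checking that the model's completion leaves every bracket balanced. A minimal sketch of such a scorer (the actual `metrics.py` may implement this differently; `bracket_balance_score` is an illustrative name, not necessarily the project's API):

```python
def bracket_balance_score(completion: str) -> bool:
    """Return True if (), [], {} are all correctly balanced and nested.

    Uses a stack: openers are pushed, and each closer must match the
    most recent unmatched opener. Any leftover opener means failure.
    """
    pairs = {')': '(', ']': '[', '}': '{'}
    stack = []
    for ch in completion:
        if ch in '([{':
            stack.append(ch)
        elif ch in pairs:
            if not stack or stack.pop() != pairs[ch]:
                return False  # closer with no matching opener
    return not stack  # any unclosed opener fails the test


# Example: the prompt/expectation pair from Test 1
print(bracket_balance_score("def foo(a, b"))   # open paren never closed
print(bracket_balance_score("def foo(a, b):"))
```

Aggregating this boolean over a batch of completions yields a bracket-accuracy percentage comparable across architectures. Note that a plain character scan ignores brackets inside string literals; a stricter scorer could tokenize first.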
## ๐Ÿ“ Structure ``` validation/code/ โ”œโ”€โ”€ README.md # This file โ”œโ”€โ”€ prepare_code_data.py # Prepares dataset โ”œโ”€โ”€ train_code.py # Trains model on code โ”œโ”€โ”€ validate_code.py # Runs validations โ”œโ”€โ”€ test_cases.py # Defined test cases โ”œโ”€โ”€ metrics.py # Evaluation functions โ””โ”€โ”€ data/ # Processed data (generated) โ”œโ”€โ”€ train.bin โ”œโ”€โ”€ val.bin โ””โ”€โ”€ meta.pkl ``` ## ๐Ÿ”ฌ Scientific Hypothesis The "Folded Cloth" (Ripple Field) architecture should outperform linear models in tasks requiring: - **Scope Attention** - Natural decay helps "remember" open brackets - **Hierarchical Structure** - Multiplicative gating modulates importance of structural tokens ## ๐Ÿ“ˆ Expected Results | Metric | Standard GPT | RippleGPT | |--------|--------------|-----------| | Bracket Accuracy | ~70% | **~85%+** | | Indent Accuracy | ~60% | **~80%+** | | Function Coherence | Lower | **Higher** | --- **Author:** Victor Carvalho Tavernari **Project:** RippleGPT Validation Suite