# 🧪 RippleGPT Validation Suite
This module validates the hypothesis that the RippleGPT architecture (Decay-Biased Attention + Multiplicative Gating) can understand hierarchical code structures better than standard Transformer architectures.
## 🎯 Objective

Tests whether the "Ripple Field" mechanism can:

- **Close parentheses/braces correctly**: requires attending to open scopes
- **Indent Python correctly**: requires understanding the block hierarchy
- **Complete code consistently**: requires long-range context
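RippleGPT's actual implementation is not reproduced in this README. Purely as an illustration of the mechanism named above, a decay-biased attention score with multiplicative output gating could be sketched as follows; the function name, the ALiBi-style linear distance penalty, and the `gate` parameter are all assumptions for this sketch, not RippleGPT's real API:

```python
import numpy as np

def decay_biased_attention(q, k, v, decay=0.1, gate=None):
    """Single-head causal attention with a linear distance-decay bias
    and optional multiplicative output gating (illustrative only)."""
    T, d = q.shape
    scores = (q @ k.T) / np.sqrt(d)                 # (T, T) similarities
    dist = np.abs(np.arange(T)[:, None] - np.arange(T)[None, :])
    scores = scores - decay * dist                  # decay bias: distant tokens fade
    causal = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores = np.where(causal, -np.inf, scores)      # mask future positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    out = weights @ v
    return out * gate if gate is not None else out  # multiplicative gate
```

The decay term penalizes attention to distant tokens, which is the intuition behind "remembering" nearby open scopes; the gate rescales the output per dimension.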
## 📦 Dataset

We use `bigcode/the-stack-smol`, a clean subset of Python code from The Stack.
## 🚀 Quick Start
### 1. Install Dependencies

```bash
cd /path/to/RippleGPT
pip install -r requirements.txt
```
### 2. Prepare Data

```bash
python validation/code/prepare_code_data.py
```
This script:

- Downloads Python code from `the-stack-smol` (streaming, ~5 MB)
- Tokenizes it at the character level
- Saves the result to `validation/code/data/`
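The exact contents of `prepare_code_data.py` are not reproduced here, but character-level tokenization itself is straightforward. A minimal sketch of the idea, where the function name and the `uint16` storage type are illustrative assumptions rather than the script's actual code:

```python
import numpy as np

def char_tokenize(text):
    """Build a character-level vocabulary from `text` and encode it
    to integer ids (illustrative; the real script may differ)."""
    chars = sorted(set(text))                      # observed characters
    stoi = {ch: i for i, ch in enumerate(chars)}   # char -> id
    itos = {i: ch for ch, i in stoi.items()}       # id -> char
    ids = np.array([stoi[ch] for ch in text], dtype=np.uint16)
    return ids, stoi, itos

ids, stoi, itos = char_tokenize("def foo(a, b):\n    return a + b\n")
decoded = "".join(itos[i] for i in ids)            # round-trips to the input
```

Storing ids as `uint16` keeps the `.bin` files compact, since a character vocabulary is far smaller than 65,536 symbols.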
### 3. Train Model

```bash
python validation/code/train_code.py
```

Trains RippleGPT for 3000 iterations (~15 min on M1/M2).
### 4. Run Validation

```bash
python validation/code/validate_code.py
```
Executes all validation tests and generates a report.
## 📊 Validation Metrics
### Test 1: Parentheses/Brace Closing

```python
# Input:  "def foo(a, b"
# Expect: "def foo(a, b):"
```
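One way to score this test automatically is a stack-based balance check on the generated completion. The helper below is a hypothetical sketch, not necessarily what the suite's `metrics.py` implements:

```python
def brackets_balanced(code):
    """Return True if (), [], {} in `code` are correctly nested and
    all closed (illustrative scoring helper, assumed for this sketch)."""
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in code:
        if ch in "([{":
            stack.append(ch)                 # remember the open scope
        elif ch in pairs:
            if not stack or stack.pop() != pairs[ch]:
                return False                 # mismatched or stray closer
    return not stack                         # everything opened was closed
```

Accuracy for Test 1 would then be the fraction of completions for which this check passes.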
### Test 2: Python Indentation

```python
# Input:  "if x > 0:\n"
# Expect: "if x > 0:\n    return" (indented 4 spaces)
```
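A simple automated check for this test is to verify that each line ending in a colon is followed by a line indented one level deeper. Again a hypothetical sketch, not the suite's actual metric:

```python
def indent_after_colon(lines, width=4):
    """Return True if every line ending in ':' is followed by a line
    indented exactly `width` spaces deeper (illustrative helper)."""
    def indent(s):
        return len(s) - len(s.lstrip(" "))   # leading-space count
    for prev, cur in zip(lines, lines[1:]):
        if prev.rstrip().endswith(":") and indent(cur) != indent(prev) + width:
            return False
    return True
```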
### Test 3: Function Structure

```python
# Input:  "def calculate_sum(numbers):\n    total = 0\n    for n in numbers:\n        total +="
# Expect: complete with " n" and close the loop body correctly
```
### Test 4: Long Context (Extrapolation)

Tests whether the model maintains coherence in functions of 50+ lines.
## 📁 Structure

```
validation/code/
├── README.md              # This file
├── prepare_code_data.py   # Prepares the dataset
├── train_code.py          # Trains the model on code
├── validate_code.py       # Runs the validations
├── test_cases.py          # Defined test cases
├── metrics.py             # Evaluation functions
└── data/                  # Processed data (generated)
    ├── train.bin
    ├── val.bin
    └── meta.pkl
```
## 🔬 Scientific Hypothesis

The "Folded Cloth" (Ripple Field) architecture should outperform standard Transformer baselines on tasks requiring:

- **Scope attention**: the natural decay helps the model "remember" open brackets
- **Hierarchical structure**: multiplicative gating modulates the importance of structural tokens
## 📈 Expected Results
| Metric | Standard GPT | RippleGPT |
|---|---|---|
| Bracket Accuracy | ~70% | ~85%+ |
| Indent Accuracy | ~60% | ~80%+ |
| Function Coherence | Lower | Higher |
**Author:** Victor Carvalho Tavernari
**Project:** RippleGPT Validation Suite