
🧪 RippleGPT Validation Suite

This module validates the hypothesis that the RippleGPT architecture (Decay-Biased Attention + Multiplicative Gating) can understand hierarchical code structures better than standard Transformer architectures.

🎯 Objective

Tests whether the "Ripple Field" mechanism can:

  1. Close parentheses/braces correctly - Requires attention to open scopes
  2. Indent Python correctly - Requires understanding block hierarchy
  3. Complete code consistently - Requires long-range context

📦 Dataset

We use bigcode/the-stack-smol, a clean subset of Python code from The Stack.

🚀 Quick Start

1. Install Dependencies

cd /path/to/RippleGPT
pip install -r requirements.txt

2. Prepare Data

python validation/code/prepare_code_data.py

This script:

  • Downloads Python code from the-stack-smol (streaming, ~5MB)
  • Tokenizes at character level
  • Saves to validation/code/data/
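The preparation script itself isn't shown here; a minimal sketch of character-level tokenization with a uint16 binary export (function names are hypothetical, not the actual contents of `prepare_code_data.py`) might look like:

```python
import numpy as np

def char_tokenize(text):
    """Build a character-level vocabulary and encode text to integer IDs."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    ids = np.array([stoi[ch] for ch in text], dtype=np.uint16)
    return ids, stoi, itos

def split_and_save(ids, train_path, val_path, split=0.9):
    """Write a 90/10 train/val split as raw uint16 binaries."""
    n = int(split * len(ids))
    ids[:n].tofile(train_path)
    ids[n:].tofile(val_path)
```

The uint16 dtype keeps the files compact while comfortably covering a character-level vocabulary.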

3. Train Model

python validation/code/train_code.py

Trains RippleGPT for 3000 iterations (roughly 15 minutes on an Apple M1/M2).

4. Run Validation

python validation/code/validate_code.py

Executes all validation tests and generates a report.

📊 Validation Metrics

Test 1: Parentheses/Brace Closing

# Input:  "def foo(a, b"
# Expect: "def foo(a, b):"
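A completion like this can be scored with a small stack-based balance check. This is a sketch of one plausible metric, not necessarily what `metrics.py` implements:

```python
def brackets_balanced(code):
    """Return True if (), [], {} in `code` are properly nested and closed."""
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in code:
        if ch in "([{":
            stack.append(ch)
        elif ch in pairs:
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack
```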

Test 2: Python Indentation

# Input:  "if x > 0:\n"
# Expect: "if x > 0:\n    return"  (4 spaces)
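The indentation test could be scored by comparing the indent depth of the first generated line against the prompt's last line. A hedged sketch (the actual `validate_code.py` may differ):

```python
def indent_of(line):
    """Count leading spaces on a line."""
    return len(line) - len(line.lstrip(" "))

def correctly_indented(prompt, completion, width=4):
    """After a prompt line ending a block header (e.g. 'if x > 0:'),
    check the first completed line is indented one level deeper."""
    base = indent_of(prompt.rstrip("\n").splitlines()[-1])
    first = completion.splitlines()[0]
    return indent_of(first) == base + width
```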

Test 3: Function Structure

# Input:  "def calculate_sum(numbers):\n    total = 0\n    for n in numbers:\n        total +="
# Expect: Complete with " n" and close the loop correctly

Test 4: Long Context (Extrapolation)

Tests if the model maintains coherence in functions with 50+ lines.

πŸ“ Structure

validation/code/
├── README.md              # This file
├── prepare_code_data.py   # Prepares dataset
├── train_code.py          # Trains model on code
├── validate_code.py       # Runs validations
├── test_cases.py          # Defined test cases
├── metrics.py             # Evaluation functions
└── data/                  # Processed data (generated)
    ├── train.bin
    ├── val.bin
    └── meta.pkl

🔬 Scientific Hypothesis

The "Folded Cloth" (Ripple Field) architecture should outperform a standard Transformer baseline on tasks requiring:

  • Scope Attention - Natural decay helps "remember" open brackets
  • Hierarchical Structure - Multiplicative gating modulates importance of structural tokens
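RippleGPT's internals aren't spelled out in this README. As a rough illustration of the two ingredients named above, here is a minimal NumPy sketch of single-head causal attention with an ALiBi-style distance decay and a multiplicative output gate; the function names, the linear decay form, and the gate placement are all assumptions, not the project's actual code:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def decay_biased_attention(q, k, v, decay=0.1, gate=None):
    """Single-head causal attention with a linear distance penalty
    (assumed decay form) and an optional multiplicative gate on the output.
    Shapes: q, k, v are (T, d); gate is (T,) or None."""
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    i = np.arange(T)[:, None]
    j = np.arange(T)[None, :]
    scores = scores - decay * np.abs(i - j)     # distant tokens decay
    scores = np.where(j <= i, scores, -np.inf)  # causal mask
    weights = softmax(scores, axis=-1)
    out = weights @ v
    if gate is not None:
        out = gate[:, None] * out               # multiplicative gating
    return out
```

With a small `decay`, far-away tokens (such as an open bracket many characters back) still receive non-negligible weight, which is the intuition behind the "scope attention" claim above.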

📈 Expected Results

| Metric             | Standard GPT | RippleGPT |
|--------------------|--------------|-----------|
| Bracket Accuracy   | ~70%         | ~85%+     |
| Indent Accuracy    | ~60%         | ~80%+     |
| Function Coherence | Lower        | Higher    |

Author: Victor Carvalho Tavernari
Project: RippleGPT Validation Suite