# RAE Training Methodology

## Recursive Abstraction Engine as Training-Time Cognitive Installation

> **The handwriting principle**: Slow, multi-representational, generative reconstruction
> during training installs richer internal representations — producing fast, effortless
> retrieval at inference. The hand was slow so the mind could be fast later.

---

## Core Thesis

Standard fine-tuning trains models on flat `input → output` pairs. This is **typing** — discriminative lookup from heavy context. RAE Training forces models through **multi-phase generative reconstruction**, creating the neural equivalent of handwriting:

| Property | Handwriting | RAE Training |
|----------|-------------|--------------|
| Forced sequential reconstruction | Must regenerate each letter from memory | Must generate each cognitive phase from internal state |
| Multi-pathway co-firing | Motor + visual + spatial + linguistic | Saturation + abstraction + descent + integration |
| Temporal bottleneck | Slowness forces deeper encoding | Multi-phase chain forces richer weight geometry |
| Variability | No two handwritten letters identical | Stochastic phase generation prevents rote memorization |
| Closed-loop embodiment | Proprioceptive error correction | Phase-to-phase coherence loss creates self-correction |

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                      RAE TRAINING PIPELINE                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌──────────┐   ┌───────────┐   ┌──────────┐   ┌──────────┐     │
│  │SATURATION│──►│ABSTRACTION│──►│ DESCENT  │──►│INTEGRATE │     │
│  │  tokens  │   │  tokens   │   │  tokens  │   │  tokens  │     │
│  └──────────┘   └───────────┘   └──────────┘   └──────────┘     │
│       ▲                                              │          │
│       └──────────────────────────────────────────────┘          │
│                                                                 │
│  Loss = λ₁·L_sat + λ₂·L_abs + λ₃·L_desc + λ₄·L_int              │
│       + λ_coh·L_coherence + λ_comp·L_compression                │
│                                                                 │
│  Key: ALL phases contribute to loss, not just final answer      │
└─────────────────────────────────────────────────────────────────┘
```

## Training Objectives (Multi-Objective Co-Training)

1. **Phase Generation Loss** — Each RAE phase must be generated correctly
2. **Cross-Phase Coherence Loss** — Abstractions must logically follow from saturation
3. **Compression Loss** — Abstraction phase penalized for being longer than saturation
4. **Prediction Accuracy Loss** — Descent-phase predictions evaluated against ground truth
5. **Integration Quality Loss** — Final synthesis must incorporate phase outputs

## Quick Start

### Option A: AutoTrain (No-Code)

```bash
pip install autotrain-advanced
autotrain --config configs/autotrain_rae_sft.yaml
```

### Option B: Custom Trainer (Full Control)

```bash
pip install -r requirements.txt
python src/train_rae.py --config configs/rae_training_config.json
```

### Option C: HuggingFace Spaces

Upload to a Space with GPU — see `scripts/deploy_to_hf_space.sh`

## Dataset Format

RAE training data uses JSONL with structured multi-phase reasoning:

```json
{
  "messages": [
    {"role": "system", "content": "You are an RAE-trained reasoner..."},
    {"role": "user", "content": ""},
    {"role": "assistant", "content": "............"}
  ]
}
```

## Files

```
rae-training/
├── README.md                     # This file
├── requirements.txt              # Python dependencies
├── configs/
│   ├── autotrain_rae_sft.yaml    # AutoTrain config (no-code path)
│   ├── rae_training_config.json  # Custom trainer config
│   └── base_models.json          # Tested base model registry
├── src/
│   ├── dataset_generator.py      # Generates RAE-structured training data
│   ├── rae_data_formatter.py     # Formats raw data into RAE phases
│   ├── train_rae.py              # Custom RAE trainer with multi-phase loss
│   ├── rae_loss.py               # Multi-objective loss functions
│   └── rae_tokenizer_utils.py    # Phase-aware tokenization
├── evaluation/
│   ├── eval_rae_model.py         # Evaluation harness
│   └── benchmarks.json           # Test problems for before/after comparison
├── data/
│   └── seed_problems.jsonl       # Seed problems for dataset generation
└── scripts/
    ├── generate_dataset.sh       # End-to-end dataset generation
    ├── run_training.sh           # Training launcher
    └── deploy_to_hf_space.sh     # HF Spaces deployment
```

## Theory: Why This Works

See the companion document `THEORY.md` for the full neuroscience-to-ML mapping.

**TL;DR**: Handwriting activates widespread brain connectivity because it forces *generative reconstruction through multiple representational modalities simultaneously under a temporal bottleneck*. RAE training replicates this by forcing the model to traverse Saturation → Abstraction → Descent → Integration phases, with loss computed on ALL phases — meaning the model cannot shortcut to the answer. The multi-phase structure installs richer weight geometry that persists as faster, more capable inference after training.
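
The combined objective from the Architecture diagram (λ-weighted per-phase losses plus coherence and compression terms) can be sketched as plain Python. This is an illustrative sketch only: the function name `rae_loss`, the default weight values, and the argument layout are assumptions, not the actual API of `src/rae_loss.py`, and the per-phase losses would in practice be per-token cross-entropy tensors rather than scalars.

```python
# Sketch of the multi-objective RAE loss, assuming the per-phase losses
# (saturation, abstraction, descent, integration) have been computed
# upstream. Weight names mirror λ₁..λ₄, λ_coh, λ_comp in the diagram;
# the default values here are placeholders, not tuned settings.

def rae_loss(phase_losses, coherence_loss, compression_loss,
             phase_weights=(1.0, 1.0, 1.0, 1.0),
             w_coherence=0.5, w_compression=0.1):
    """Weighted sum over all four phases plus the two auxiliary terms.

    Every phase contributes to the total, so the model cannot shortcut
    to the final answer — the property the README calls out as key.
    """
    if len(phase_losses) != len(phase_weights):
        raise ValueError("need exactly one weight per phase")
    total = sum(w * l for w, l in zip(phase_weights, phase_losses))
    return total + w_coherence * coherence_loss + w_compression * compression_loss


# Example with scalar stand-ins for the per-phase losses:
loss = rae_loss([0.8, 0.6, 0.7, 0.5],
                coherence_loss=0.4, compression_loss=0.2)
# 2.6 (phases) + 0.2 (coherence) + 0.02 (compression) = 2.82
```

In a real trainer these terms would be differentiable tensors and the weighted sum would be the value passed to `backward()`; the scalar version above only shows the bookkeeping.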
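
The JSONL chat format under "Dataset Format" can likewise be sketched as a small record builder. The phase tags, the helper name `to_rae_record`, and the way phase texts are joined into the assistant turn are all hypothetical — the repo's actual formatting lives in `src/rae_data_formatter.py` and may differ.

```python
import json

# Hypothetical sketch: wrap a seed problem and four phase texts into one
# JSONL training record matching the chat shape shown above. Phase tag
# names and the joining convention are assumptions for illustration.

PHASES = ["SATURATION", "ABSTRACTION", "DESCENT", "INTEGRATION"]

def to_rae_record(problem: str, phase_texts: list[str]) -> dict:
    """Build one chat-format record with all four RAE phases in the
    assistant turn, so loss can be computed on every phase."""
    if len(phase_texts) != len(PHASES):
        raise ValueError("need exactly one text per RAE phase")
    assistant = "\n\n".join(
        f"[{name}]\n{text}" for name, text in zip(PHASES, phase_texts)
    )
    return {"messages": [
        {"role": "system", "content": "You are an RAE-trained reasoner..."},
        {"role": "user", "content": problem},
        {"role": "assistant", "content": assistant},
    ]}

record = to_rae_record("What drives tides?", ["...", "...", "...", "..."])
line = json.dumps(record)  # one line per example in the output JSONL file
```

Writing one `json.dumps` result per line yields a file directly consumable by the AutoTrain and custom-trainer paths described in Quick Start.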