RAE Training Methodology

Recursive Abstraction Engine as Training-Time Cognitive Installation

The handwriting principle: Slow, multi-representational, generative reconstruction during training installs richer internal representations, producing fast, effortless retrieval at inference. The hand was slow so the mind could be fast later.


Core Thesis

Standard fine-tuning trains models on flat input → output pairs. This is typing: discriminative lookup from heavy context. RAE Training forces models through multi-phase generative reconstruction, creating the neural equivalent of handwriting:

| Property | Handwriting | RAE Training |
|---|---|---|
| Forced sequential reconstruction | Must regenerate each letter from memory | Must generate each cognitive phase from internal state |
| Multi-pathway co-firing | Motor + visual + spatial + linguistic | Saturation + abstraction + descent + integration |
| Temporal bottleneck | Slowness forces deeper encoding | Multi-phase chain forces richer weight geometry |
| Variability | No two handwritten letters identical | Stochastic phase generation prevents rote memorization |
| Closed-loop embodiment | Proprioceptive error correction | Phase-to-phase coherence loss creates self-correction |

Architecture

┌────────────────────────────────────────────────────────────────────┐
│                       RAE TRAINING PIPELINE                        │
├────────────────────────────────────────────────────────────────────┤
│                                                                    │
│  ┌───────────┐    ┌───────────┐    ┌───────────┐    ┌───────────┐  │
│  │SATURATION │───►│ABSTRACTION│───►│  DESCENT  │───►│ INTEGRATE │  │
│  │  tokens   │    │  tokens   │    │  tokens   │    │  tokens   │  │
│  └───────────┘    └───────────┘    └───────────┘    └───────────┘  │
│        ▲                                                  │        │
│        └──────────────────────────────────────────────────┘        │
│                                                                    │
│  Loss = λ₁·L_sat + λ₂·L_abs + λ₃·L_desc + λ₄·L_int                 │
│       + λ_coh·L_coherence + λ_comp·L_compression                   │
│                                                                    │
│  Key: ALL phases contribute to loss, not just final answer         │
└────────────────────────────────────────────────────────────────────┘
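
The weighted sum in the diagram can be sketched in a few lines of plain Python. This is only an illustration of the formula: the `RAELossWeights` class, the `total_rae_loss` function, and every weight value below are hypothetical, since this README does not state the values used in `src/rae_loss.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RAELossWeights:
    # Hypothetical weights (the lambdas in the diagram); the README gives no values.
    sat: float = 1.0
    abs_: float = 1.0
    desc: float = 1.0
    int_: float = 1.0
    coh: float = 0.5
    comp: float = 0.25


def total_rae_loss(phase_losses, coherence, compression, w=RAELossWeights()):
    """Weighted sum of the four phase losses plus coherence and compression terms."""
    return (
        w.sat * phase_losses["saturation"]
        + w.abs_ * phase_losses["abstraction"]
        + w.desc * phase_losses["descent"]
        + w.int_ * phase_losses["integration"]
        + w.coh * coherence
        + w.comp * compression
    )


# Illustrative per-phase loss values, not measurements.
example = total_rae_loss(
    {"saturation": 2.0, "abstraction": 1.5, "descent": 1.0, "integration": 0.5},
    coherence=0.5,
    compression=1.0,
)
```

Because every phase has its own term, setting any single weight to zero removes that phase from supervision, which makes the weights a convenient place to run ablations.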

Training Objectives (Multi-Objective Co-Training)

  1. Phase Generation Loss: Each RAE phase must be generated correctly
  2. Cross-Phase Coherence Loss: Abstractions must logically follow from saturation
  3. Compression Loss: Abstraction phase penalized for being longer than saturation
  4. Prediction Accuracy Loss: Descent-phase predictions evaluated against ground truth
  5. Integration Quality Loss: Final synthesis must incorporate phase outputs
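
As one possible reading of objective 3, the compression term could be a hinge on relative length: zero when the abstraction is no longer than the saturation, and growing linearly with the excess. The function name and the normalization by saturation length are assumptions made for illustration, not the definition used in `src/rae_loss.py`.

```python
def compression_penalty(saturation_tokens: int, abstraction_tokens: int) -> float:
    """Hinge penalty on the abstraction phase being longer than the saturation phase.

    Assumed form: max(0, (n_abs - n_sat) / n_sat). Returns 0.0 whenever the
    abstraction is at most as long as the saturation.
    """
    if saturation_tokens <= 0:
        raise ValueError("saturation phase must contain at least one token")
    excess = abstraction_tokens - saturation_tokens
    return max(0.0, excess / saturation_tokens)


shorter = compression_penalty(saturation_tokens=100, abstraction_tokens=40)
longer = compression_penalty(saturation_tokens=100, abstraction_tokens=150)
```

A token count is not differentiable with respect to model weights, so a penalty of this shape would more plausibly act as a data filter or a reward signal than as a term backpropagated directly.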

Quick Start

Option A: AutoTrain (No-Code)

pip install autotrain-advanced
autotrain --config configs/autotrain_rae_sft.yaml

Option B: Custom Trainer (Full Control)

pip install -r requirements.txt
python src/train_rae.py --config configs/rae_training_config.json

Option C: Hugging Face Spaces

Upload the project to a Space with GPU hardware; see scripts/deploy_to_hf_space.sh

Dataset Format

RAE training data uses JSONL: one JSON object per line, each holding a structured multi-phase reasoning trace. The record below is pretty-printed for readability:

{
  "messages": [
    {"role": "system", "content": "You are an RAE-trained reasoner..."},
    {"role": "user", "content": "<problem>"},
    {"role": "assistant", "content": "<SATURATION>...</SATURATION><ABSTRACTION>...</ABSTRACTION><DESCENT>...</DESCENT><INTEGRATION>...</INTEGRATION>"}
  ]
}
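
A record in this format can be checked before training with a small validator. The sketch below assumes the four tags appear exactly once each, in pipeline order, and that every record holds one system, one user, and one assistant message; `parse_phases` and `validate_record` are hypothetical helpers, not functions from `src/rae_data_formatter.py`.

```python
import json
import re

PHASES = ("SATURATION", "ABSTRACTION", "DESCENT", "INTEGRATION")


def parse_phases(assistant_content: str) -> dict:
    """Return {phase: text}; raise ValueError if a phase is missing or out of order."""
    phases = {}
    cursor = 0
    for name in PHASES:
        pattern = re.compile(rf"<{name}>(.*?)</{name}>", re.DOTALL)
        match = pattern.search(assistant_content, cursor)  # search from the previous phase's end
        if match is None:
            raise ValueError(f"missing or out-of-order phase: {name}")
        phases[name] = match.group(1).strip()
        cursor = match.end()
    return phases


def validate_record(line: str) -> dict:
    """Parse one JSONL line and return the phases of its assistant message."""
    record = json.loads(line)
    roles = [message["role"] for message in record["messages"]]
    if roles != ["system", "user", "assistant"]:
        raise ValueError(f"unexpected roles: {roles}")
    return parse_phases(record["messages"][-1]["content"])


# Illustrative record; the problem and phase texts are placeholders.
sample = json.dumps({"messages": [
    {"role": "system", "content": "You are an RAE-trained reasoner..."},
    {"role": "user", "content": "Why does ice float?"},
    {"role": "assistant", "content": (
        "<SATURATION>facts</SATURATION><ABSTRACTION>rule</ABSTRACTION>"
        "<DESCENT>prediction</DESCENT><INTEGRATION>answer</INTEGRATION>"
    )},
]})
phases = validate_record(sample)
```

Searching each tag from the end of the previous match enforces phase order, so a record with DESCENT before ABSTRACTION is rejected instead of silently accepted.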

Files

rae-training/
├── README.md                          # This file
├── requirements.txt                   # Python dependencies
├── configs/
│   ├── autotrain_rae_sft.yaml         # AutoTrain config (no-code path)
│   ├── rae_training_config.json       # Custom trainer config
│   └── base_models.json               # Tested base model registry
├── src/
│   ├── dataset_generator.py           # Generates RAE-structured training data
│   ├── rae_data_formatter.py          # Formats raw data into RAE phases
│   ├── train_rae.py                   # Custom RAE trainer with multi-phase loss
│   ├── rae_loss.py                    # Multi-objective loss functions
│   └── rae_tokenizer_utils.py         # Phase-aware tokenization
├── evaluation/
│   ├── eval_rae_model.py              # Evaluation harness
│   └── benchmarks.json                # Test problems for before/after comparison
├── data/
│   └── seed_problems.jsonl            # Seed problems for dataset generation
└── scripts/
    ├── generate_dataset.sh            # End-to-end dataset generation
    ├── run_training.sh                # Training launcher
    └── deploy_to_hf_space.sh          # HF Spaces deployment

Theory: Why This Works

See the companion document THEORY.md for the full neuroscience-to-ML mapping.

TL;DR: Handwriting activates widespread brain connectivity because it forces generative reconstruction through multiple representational modalities simultaneously under a temporal bottleneck. RAE training replicates this by forcing the model to traverse Saturation → Abstraction → Descent → Integration phases, with loss computed on ALL phases, meaning the model cannot shortcut to the answer. The multi-phase structure installs richer weight geometry that persists as faster, more capable inference after training.
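
The difference between supervising all phases and supervising only the final answer can be made concrete with a loss mask. In the sketch below, whitespace-separated words stand in for real tokenizer tokens, and `phase_labels` and `loss_mask` are hypothetical helpers; the phase-aware tokenization in `src/rae_tokenizer_utils.py` may work differently.

```python
import re

TAG = re.compile(r"</?(SATURATION|ABSTRACTION|DESCENT|INTEGRATION)>")


def phase_labels(text: str) -> list:
    """Pair each whitespace-delimited token with the phase tag that encloses it."""
    labels = []
    current = None  # phase currently open, or None outside any tag
    pos = 0
    for match in TAG.finditer(text):
        labels += [(token, current) for token in text[pos:match.start()].split()]
        current = None if match.group(0).startswith("</") else match.group(1)
        pos = match.end()
    labels += [(token, current) for token in text[pos:].split()]
    return labels


def loss_mask(labels: list, supervised_phases: set) -> list:
    """1 where a token contributes to the loss, 0 where it is ignored."""
    return [1 if phase in supervised_phases else 0 for _, phase in labels]


trace = (
    "<SATURATION>a b c</SATURATION><ABSTRACTION>d</ABSTRACTION>"
    "<DESCENT>e f</DESCENT><INTEGRATION>g</INTEGRATION>"
)
labels = phase_labels(trace)
all_phases = loss_mask(labels, {"SATURATION", "ABSTRACTION", "DESCENT", "INTEGRATION"})
answer_only = loss_mask(labels, {"INTEGRATION"})
```

With the all-phase mask every token of the trace is supervised, whereas the answer-only mask leaves the first three phases without any gradient signal.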
