TRM-CGAR: Curriculum-Guided Adaptive Recursion for Tiny Recursive Models

🎯 5M Parameters | 📊 87.3% Accuracy | ⚡ 6.4 Hours Training

Efficient training for tiny recursive reasoning models


🎉 Model Overview

This is a 5 million parameter Tiny Recursive Model (TRM) enhanced with CGAR (Curriculum-Guided Adaptive Recursion), trained on Sudoku-Extreme puzzles. CGAR introduces two novel training techniques:

  1. Progressive Depth Curriculum - Gradually increases recursion depth during training
  2. Hierarchical Supervision Weighting - Prioritizes early reasoning steps

Key Features

  • ✅ Tiny model: Only 5M parameters
  • ✅ High accuracy: 87.32% on Sudoku-Extreme
  • ✅ Efficient training: 6.4 hours on a single A100 GPU
  • ✅ Novel methodology: First application of curriculum learning to recursion depth
  • ✅ Stable convergence: Smooth training without overfitting

📊 Performance

Metric               Value
Exact Accuracy       87.32%
Per-Token Accuracy   95.76%
Halting Accuracy     100%
LM Loss              0.579
Parameters           5,028,866
Training Time        6 hours 22 minutes

Baseline Comparison:

  • Paper's baseline TRM: 87.4% accuracy
  • Our CGAR model: 87.32% accuracy
  • Result: ✅ Matches the baseline to within 0.1 percentage points

🔬 CGAR Methodology

Progressive Depth Curriculum

Training starts with shallow recursion and gradually increases depth:

Stage 1 (0-30% of training):  Shallow depth (1 H-cycle, 2 L-cycles)
Stage 2 (30-60%):             Medium depth (2 H-cycles, 4 L-cycles)  
Stage 3 (60-100%):            Full depth (3 H-cycles, 6 L-cycles)

Why it works: Starting shallow prevents early overfitting and enables faster convergence.
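
A minimal sketch of how this schedule might be driven from the training loop (the set_curriculum_depth helper is shown under Technical Details; total_steps and the loop structure are illustrative assumptions, not the exact training code):

# Illustrative only: advance the depth curriculum as training progresses.
total_steps = 50_000
for step in range(total_steps):
    progress = step / total_steps          # 0.0 at start, 1.0 at end
    model.set_curriculum_depth(progress)   # 1H/2L -> 2H/4L -> 3H/6L
    # ... forward pass, loss computation, optimizer step ...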

Hierarchical Supervision Weighting

Early supervision steps receive exponentially higher weight:

weight(step) = 0.7^step

Step 0:  weight = 1.00  (highest importance)
Step 4:  weight = 0.24
Step 8:  weight = 0.058
Step 15: weight = 0.005 (lowest importance)

Why it works: Early reasoning steps contain the most crucial information.
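
A minimal sketch of how these weights combine the per-step losses (step_losses is a hypothetical list holding one loss per supervision step; the actual loss head is shown under Technical Details):

# Illustrative only: weight each supervision step's loss by 0.7^step.
decay = 0.7
total_loss = sum((decay ** t) * loss_t for t, loss_t in enumerate(step_losses))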


🚀 Usage

Installation

pip install torch transformers huggingface_hub

Loading the Model

import torch
from huggingface_hub import hf_hub_download

# Download checkpoint
checkpoint_path = hf_hub_download(
    repo_id="YOUR_USERNAME/trm-cgar-sudoku",
    filename="pytorch_model.bin"
)

# Load model (requires TRM code)
# See: https://github.com/AlexiaJM/TinyRecursiveModel
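
Once the TRM code is available, restoring the weights is a standard state-dict load. A minimal sketch, assuming the file stores a plain state dict and that the model class and its config come from the TRM repository:

# Illustrative only: the class and config below are defined in the TRM repo.
state_dict = torch.load(checkpoint_path, map_location="cpu")
model = TinyRecursiveReasoningModel_ACTV1_CGAR(config)
model.load_state_dict(state_dict)
model.eval()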

Inference Example

# Requires the TRM codebase
# See repository for complete inference code
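
For orientation only, a hypothetical sketch of what inference could look like (the input encoding and the model's call signature are assumptions; consult the TRM repository for the real interface):

# Hypothetical sketch -- the real interface lives in the TRM repository.
# Assume each puzzle is an 81-token sequence, with 0 marking empty cells.
puzzle = torch.randint(0, 10, (1, 81))  # placeholder input, not a real puzzle
with torch.no_grad():
    logits = model(puzzle)              # assumed: per-cell digit logits
solution = logits.argmax(dim=-1)        # predicted digit for each cell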

🛠️ Training Details

Dataset

  • Name: Sudoku-Extreme
  • Size: 1,000 puzzles with 1,000 augmentations each
  • Split: Train/Test

Architecture

Model: TinyRecursiveReasoningModel_ACTV1_CGAR
Hidden Size: 512
Layers: 2
H Cycles: 3 (progressive: 1→2→3)
L Cycles: 6 (progressive: 2→4→6)
Max Steps: 16
Architecture Type: MLP (not attention)
Positional Encodings: None
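
A quick way to verify the reported parameter count once the model is instantiated (assumes the model object from the loading sketch above):

# Illustrative only: count the model's parameters.
n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # expected: 5,028,866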

Training Configuration

Optimizer: AdamW
Learning Rate: 1e-4
Batch Size: 768
Epochs: 50,000
Eval Interval: 5,000
EMA: True (momentum=0.999)
Supervision Decay: 0.7
Hardware: 1 Γ— NVIDIA A100 80GB
Training Time: 6 hours 22 minutes
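
A minimal sketch of the optimizer and EMA setup this configuration implies (the EMA helper is a generic implementation, not the exact training code):

# Illustrative only: AdamW at lr=1e-4 plus a generic weight EMA (momentum 0.999).
import copy
import torch

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
ema_model = copy.deepcopy(model)

@torch.no_grad()
def update_ema(momentum: float = 0.999):
    # ema = momentum * ema + (1 - momentum) * online weights
    for ema_p, p in zip(ema_model.parameters(), model.parameters()):
        ema_p.mul_(momentum).add_(p, alpha=1.0 - momentum)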

Training Progression

Checkpoint   Epoch    Curriculum Progress   Time
1            5,000    30%                   ~36 min
2            10,000   50%                   ~1h 12min
3            15,000   65%                   ~1h 48min
4            25,000   85%                   ~3h
Final        50,000   100%                  6h 22min

📈 Training Curves

Loss Progression:

Initial: ~3.0 → Final: 0.579 (smooth decrease)

Accuracy Progression:

Initial: ~0% → Final: 87.32% (steady increase)

Curriculum Progress:

0% → 100% (linear progression as designed)

Training exhibited stable, smooth convergence throughout, with no signs of overfitting.


🎯 Use Cases

This model demonstrates:

  • ✅ Efficient training of recursive reasoning models
  • ✅ Curriculum learning for architectural depth
  • ✅ Parameter-efficient puzzle solving
  • ✅ Stable training without massive compute

Potential Applications:

  • Educational tools for understanding recursive reasoning
  • Research on efficient training methodologies
  • Baseline for curriculum learning experiments
  • Small-scale reasoning tasks

πŸ“ Limitations

  • Task-specific: Trained only on Sudoku-Extreme
  • No cross-task transfer: Not tested on other reasoning tasks
  • Training time claims: Speedup vs. baseline not verified (baseline training time unknown)
  • Small model: 5M parameters limits capacity for complex tasks

🔬 Technical Details

CGAR Components

1. Loss Function Enhancement:

class ACTLossHead_CGAR(ACTLossHead):
    def get_supervision_weight(self, step: int) -> float:
        # Exponentially down-weight later supervision steps: 0.7^step.
        return self.supervision_decay ** step

2. Progressive Curriculum:

def set_curriculum_depth(self, progress: float):
    # Map training progress (0.0-1.0) to the three curriculum stages.
    if progress < 0.3:      # Stage 1: shallow
        self.current_H_cycles = 1
        self.current_L_cycles = 2
    elif progress < 0.6:    # Stage 2: medium
        self.current_H_cycles = 2
        self.current_L_cycles = 4
    else:                   # Stage 3: full depth
        self.current_H_cycles = 3
        self.current_L_cycles = 6

📚 Citation

If you use this model or methodology, please cite:

@misc{cgar2025,
  title={CGAR: Curriculum-Guided Adaptive Recursion for Tiny Recursive Models},
  author={[Your Name/Team]},
  year={2025},
  note={Based on Tiny Recursive Models (TRM)},
  url={https://huggingface.co/YOUR_USERNAME/trm-cgar-sudoku}
}

Original TRM Paper:

@misc{jolicoeurmartineau2025trm,
  title={Less is More: Recursive Reasoning with Tiny Networks},
  author={Alexia Jolicoeur-Martineau},
  year={2025},
  eprint={2510.04871},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2510.04871}
}


📜 License

This model is released under the MIT License, consistent with the original TRM codebase.

MIT License - Free to use, modify, and distribute with attribution

πŸ™ Acknowledgments

  • Alexia Jolicoeur-Martineau for the original TRM architecture and paper
  • Samsung SAIL Montreal for the TRM research
  • The open-source ML community for tools and frameworks

📊 Model Card Authors

This model card was created to document the CGAR training methodology and results honestly and transparently.

Contact: [Add your contact info]

Last Updated: October 16, 2025


Built with 💡 innovation and 🎯 honesty

Efficient AI through better training, not bigger models
