Drop stale README backup
`README.md.bak` — deleted (165 lines). The removed file contents follow.
---
library_name: pytorch
license: mit
pipeline_tag: other
tags:
- arc-prize-2025
- program-synthesis
- tiny-recursive-models
- recursive-reasoning
- kaggle
- act
- reproducibility
datasets:
- arc-prize-2025
model-index:
- name: Tiny Recursive Models — ARC-AGI-2
  results:
  - task:
      type: program-synthesis
      name: ARC Prize 2025
    dataset:
      name: ARC Prize 2025 Public Evaluation
      type: arc-prize-2025
      split: evaluation
    metrics:
    - type: accuracy
      name: Accuracy
      value: 0.6283
    - type: loss
      name: LM Loss
      value: 2.0186
    - type: accuracy
      name: Halt Accuracy
      value: 0.9070
---

# Tiny Recursive Models — ARC-AGI-2 (8×GPU)

**Abstract.** This release packages the paper-faithful Tiny Recursive Models (TRM) checkpoint trained on the ARC-AGI-2 augmentation suite. We resume the official 8-GPU run from step 62,976 and continue to step 72,385, preserving upstream hyperparameters, dataset construction, and optimizer settings. The repository bundles the model weights, Hydra configs, training commands, and Weights & Biases metrics so researchers can reproduce ARC Prize 2025 evaluations or fine-tune TRM for downstream ARC-style reasoning tasks.

**Special thanks** to Shawn Lewis (CTO of Weights & Biases) and the CoreWeave team (coreweave.com) for their generous contribution of 2 nodes × 8 × H200 GPUs worth of compute time via the CoreWeave Cloud platform. This work would not have been possible without their assistance and trust in the authors.

**Note on authorship.** All engineering, documentation, and packaging work in this reproduction project was completed with the assistance of coding-oriented large language models operating under human supervision. The models handled end-to-end implementation — from training orchestration and dataset packaging to documentation and publishing — while humans provided oversight, safety validation, and access control.

## Model Summary

- **Architecture**: Tiny Recursive Model (TRM) with ACT V1 controller: `L_layers=2`, `H_cycles=3`, `L_cycles=4`, hidden size 512, 8 heads, RoPE positional encodings, bfloat16 activations.
- **Checkpoint**: `model.ckpt`, captured after **72,385** optimizer steps of training on the ARC-AGI-2 augmentation suite (`arc2concept-aug-1000`).
- **Upstream commit**: `e7b68717f0a6c4cbb4ce6fbef787b14f42083bd9` (SamsungSAILMontreal/TinyRecursiveModels).
- **Optimizer**: Adam-atan2 variant (`beta1=0.9`, `beta2=0.95`, `weight_decay=0.1`, global batch size 768).
- **License**: MIT (inherits the upstream TRM license).

This release reproduces the ARC-AGI-2 configuration described in the TRM paper using the officially provided dataset builder and training recipe. It is the same checkpoint published for Kaggle inference, packaged here for broader research use.
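
The recursion schedule implied by the configuration above (a tiny shared core reused for `H_cycles=3` outer answer updates, each preceded by `L_cycles=4` latent updates) can be sketched as follows. The state names `x`, `y`, `z` and the function `trm_schedule` are illustrative assumptions, not the TinyRecursiveModels API:

```python
# Illustrative sketch of the TRM/ACT recursion schedule (hypothetical names,
# not the repository's API): a small 2-layer core is reused for H_cycles
# outer refinements of the answer state y, with L_cycles latent updates of
# the state z before each one.

H_CYCLES = 3  # outer (answer) refinement steps, per the config above
L_CYCLES = 4  # inner (latent) refinement steps per outer step

def trm_schedule(h_cycles: int = H_CYCLES, l_cycles: int = L_CYCLES):
    """Return the sequence of update ops one recursion pass performs."""
    ops = []
    for h in range(h_cycles):
        for l in range(l_cycles):
            ops.append(f"z <- core(x, y, z)  # outer {h}, inner {l}")
        ops.append(f"y <- core(y, z)  # outer {h}: refine answer")
    return ops

ops = trm_schedule()
# 3 outer steps, each with 4 latent updates plus 1 answer update.
assert len(ops) == H_CYCLES * (L_CYCLES + 1)
```

The ACT controller then decides after each pass whether to halt or run another recursion, which is why inference cost varies per puzzle.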

## Files Included

| Path | Description |
| --- | --- |
| `model.ckpt` | PyTorch checkpoint (fp32/bf16 mix) containing model + optimizer state. |
| `ENVIRONMENT.txt` | Hydra-resolved configuration used for the run (mirrors `all_config.yaml`). |
| `COMMANDS.txt` | Launch command showing the exact training flags. |
| `COMMANDS_resumed.txt` | Resume command showing the restart from step 62,976. |
| `TRM_COMMIT.txt` | Git SHA of the TinyRecursiveModels source at training time. |
| `all_config.yaml` | Full structured config exported from the training job. |
| `step_72385.zip` | Raw checkpoint directory as produced by the trainer (weights, EMA, optimizer state). |
| `wandb_ljxzfy3z_history.csv` / `wandb_ljxzfy3z_summary.json` | Captured metrics from Weights & Biases run `Arc2concept-aug-1000-ACT-torch/ljxzfy3z`. |

## Intended Use & Limitations

- **Primary use**: Research on ARC-AGI-style program synthesis and evaluation, benchmarking Tiny Recursive Models, and reproducing Kaggle ARC Prize 2025 submissions.
- **Downstream evaluation**: Pair with the official ARC Prize 2025 evaluation set or ARC-AGI-2 validation splits.
- **Misuse**: The checkpoint is not designed for domains outside program synthesis. No safety mitigations are baked in; users are responsible for verifying results before deployment.
- **Limitations**: Performance is capped by the paper-faithful hyperparameters; there is no fine-tuning on ARC-AGI-1. As an ACT model, inference cost varies per puzzle and can be high on longer tasks.

## Training Procedure

- **Data**: `data/arc2concept-aug-1000`, built via `python -m dataset.build_arc_dataset --subsets training2 evaluation2 concept --test-set-name evaluation2`.
- **Hardware**: 8× NVIDIA H100 (80 GB) GPUs, torch distributed launch with gradient accumulation to reach global batch size 768.
- **Precision**: Mixed bfloat16 compute with fp32 master weights; EMA enabled (`ema_rate=0.999`).
- **Duration**: Resumed from checkpoint `step_62976` and trained to step 72,385 (9,409 additional optimizer steps, ~85,900 s runtime).
- **Scheduler**: Constant LR 1e-4 (warmup completed before the resume); cosine decay disabled (`lr_min_ratio=1.0`).
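
The EMA weights mentioned above follow the usual exponential update with `ema_rate=0.999`. A minimal scalar sketch (the trainer applies this per parameter tensor; the helper name is ours):

```python
def ema_update(ema: float, param: float, rate: float = 0.999) -> float:
    """One EMA step, matching ema_rate=0.999 from the config."""
    return rate * ema + (1.0 - rate) * param

# The EMA lags the raw parameters with heavy smoothing: if a weight jumps
# from 0.0 to 1.0 and stays there, after 100 steps the EMA has only moved
# to 1 - 0.999**100 (about 0.095).
ema = 0.0
for _ in range(100):
    ema = ema_update(ema, 1.0)
assert 0.0 < ema < 1.0
```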

### Key Training Metrics (Weights & Biases)

- `all/accuracy`: **0.704**
- `all/lm_loss`: **1.70**
- `all/q_halt_accuracy`: **0.799**
- `ARC/pass@1`: **1.67 %**
- `ARC/pass@10`: **5.83 %**
- `ARC/pass@100`: **8.19 %**
- `ARC/pass@1000`: **13.75 %**
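
The `pass@k` figures above aggregate many augmented attempts per task. For reference, the standard unbiased estimator for pass@k from n samples with c correct is sketched below; the repository's exact aggregation may differ:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # every size-k draw contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 1000 samples and 20 correct, pass@1 reduces to c/n = 0.02.
assert abs(pass_at_k(1000, 20, 1) - 0.02) < 1e-12
```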

## Evaluation

- **ARC Prize 2025 public evaluation (Kaggle GPU)**
  - Accuracy: **0.6283**
  - LM Loss: **2.0186**
  - Halt accuracy: **0.907**
  - Evaluator script: `TinyRecursiveModels/evaluators/arc.py` with the default two-attempt submission writer.
  - Submission artifact: `/kaggle/working/trm_eval_outputs/evaluator_ARC_step_72385/submission.json`.
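
For orientation, the two-attempt `submission.json` written for Kaggle maps each task id to a list with one entry per test input. The sketch below shows the commonly used public-competition shape; the keys and the task id are assumptions for illustration, not taken from the evaluator source:

```python
import json

# Hypothetical grids for one task with a single test input (illustrative only).
submission = {
    "00576224": [
        {"attempt_1": [[0, 1], [1, 0]],
         "attempt_2": [[1, 0], [0, 1]]},
    ]
}

payload = json.dumps(submission)
# Round-trips cleanly; each task id maps to one dict per test input.
assert json.loads(payload)["00576224"][0]["attempt_1"] == [[0, 1], [1, 0]]
```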

## How to Use

Install TinyRecursiveModels (commit above) and load the checkpoint via PyTorch:

```python
from pathlib import Path

import torch

from recursive_reasoning.trm import TinyRecursiveReasoningModel_ACTV1
from recursive_reasoning.utils.checkpoint import load_trm_checkpoint


def load_trm(weights_path: str):
    ckpt = torch.load(weights_path, map_location="cpu")
    model_cfg = ckpt["hyperparameters"]["arch"]
    model = TinyRecursiveReasoningModel_ACTV1(**model_cfg)
    load_trm_checkpoint(model, ckpt, strict=True)
    model.eval()
    return model


weights = Path("model.ckpt")  # replace with hf_hub_download path if needed
model = load_trm(weights)
```

To fetch the checkpoint programmatically:

```python
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="seconds0/trm-arc2-8gpu",
    filename="model.ckpt",
    repo_type="model",
)
```

For Kaggle inference, reuse `kaggle/trm_arc2_inference_notebook.py` (packaged separately) and replace the dataset mount with `hf_hub_download`.

## Reproducibility Checklist

- ✅ ARC-AGI-2 data builder command versioned in the repository.
- ✅ Training invocation and config saved (`COMMANDS.txt`, `COMMANDS_resumed.txt`, `ENVIRONMENT.txt`, `all_config.yaml`).
- ✅ Upstream commit recorded (`TRM_COMMIT.txt`).
- ✅ W&B metrics exported for independent verification.
- ✅ Checkpoint archive (`step_72385.zip`) matches the `model.ckpt` contents (torch + EMA).
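
One lightweight way to check that an extracted copy of `step_72385.zip` matches `model.ckpt` is to compare content hashes. A sketch, where the helper function and the extracted path in the comment are ours:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk: int = 1 << 20) -> str:
    """Stream a file through SHA-256 without loading it all into memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

# Example usage (paths are illustrative; adjust to your extraction layout):
# assert sha256_of(Path("model.ckpt")) == sha256_of(Path("extracted/model.ckpt"))
```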

## Citation & Acknowledgements

If you use this model, please cite the Tiny Recursive Models paper and the ARC Prize competition:

```
@misc{shridhar2025trm,
  title         = {Tiny Recursive Models},
  author        = {Shridhar, Mohit and others},
  year          = {2025},
  eprint        = {2502.12345},
  archivePrefix = {arXiv}
}

@misc{arcprize2025,
  title        = {ARC Prize 2025},
  howpublished = {https://www.kaggle.com/competitions/arc-prize-2025}
}
```

- Upstream TRM repository: https://github.com/SamsungSAILMontreal/TinyRecursiveModels
- Tiny Recursive Models paper: https://arxiv.org/abs/2502.12345

## Responsible AI Considerations

- **Bias**: The ARC-AGI corpus reflects synthetic puzzle distributions; performance may degrade on human-generated tasks.
- **Safety**: The model does not generate harmful content, but downstream automation (e.g., code execution) should be sandboxed.
- **Data privacy**: Training and evaluation use public ARC datasets; no personal data is involved.

---