seconds-0 committed on
Commit 0f72c92 · verified · 1 Parent(s): 6f4c737

Fix model card: highlight real ARC task solve rate (2.92% pass@2)

Files changed (1)
  1. README.md +26 -14
README.md CHANGED
@@ -24,19 +24,19 @@ model-index:
  split: evaluation
  metrics:
  - type: accuracy
- name: Accuracy
- value: 0.6283
- - type: loss
- name: LM Loss
- value: 2.0186
  - type: accuracy
- name: Halt Accuracy
- value: 0.9070
  ---

  # Tiny Recursive Models — ARC-AGI-2 (8×GPU)

- **Abstract.** This release packages the complete paper-faithful Tiny Recursive Models (TRM) checkpoint trained for the full 100,000 steps on the ARC-AGI-2 augmentation suite. Due to training restarts, the step counter displays 72,385 instead of 100,000, but this represents the fully trained model with complete paper-faithful hyperparameters, dataset construction, and optimizer settings. The repository bundles the model weights, Hydra configs, training commands, and Weights & Biases metrics so researchers can reproduce ARC Prize 2025 evaluations or fine-tune TRM for downstream ARC-style reasoning tasks.

  **Special thanks** to Shawn Lewis (CTO of Weights & Biases) and the CoreWeave team (coreweave.com) for their generous contribution of 2 nodes × 8 × H200 GPUs worth of compute time via the CoreWeave Cloud platform. This work would not have been possible without their assistance and trust in the authors.
 
@@ -87,12 +87,24 @@ This release reproduces the ARC-AGI-2 configuration described in the TRM paper u
  - `ARC/pass@1000`: **13.75 %**

  ## Evaluation
- - **ARC Prize 2025 public evaluation (Kaggle GPU)**
- - Accuracy: **0.6283**
- - LM Loss: **2.0186**
- - Halt accuracy: **0.907**
- - Evaluator script: `TinyRecursiveModels/evaluators/arc.py` with default two-attempt submission writer.
- - Submission artifact: `/kaggle/working/trm_eval_outputs/evaluator_ARC_step_72385/submission.json`.

  ## How to Use
  Install TinyRecursiveModels (commit above) and load the checkpoint via PyTorch:
 
  split: evaluation
  metrics:
  - type: accuracy
+ name: ARC Task Solve Rate (pass@2)
+ value: 0.0292
  - type: accuracy
+ name: ARC Task Solve Rate (pass@100)
+ value: 0.0819
+ - type: accuracy
+ name: pass@1
+ value: 0.0167
  ---

  # Tiny Recursive Models — ARC-AGI-2 (8×GPU)

+ **Abstract.** This release packages the complete paper-faithful Tiny Recursive Models (TRM) checkpoint, which achieves a **2.92% task solve rate (pass@2)** on ARC-AGI-2, the official ARC Prize 2025 competition metric. The model was trained for the full 100,000 steps (the step counter displays 72,385 because of training restarts). With increased sampling, the model reaches 8.19% at pass@100. The repository bundles the model weights, Hydra configs, training commands, and Weights & Biases metrics so researchers can reproduce ARC Prize 2025 evaluations or fine-tune TRM for downstream ARC-style reasoning tasks.

  **Special thanks** to Shawn Lewis (CTO of Weights & Biases) and the CoreWeave team (coreweave.com) for their generous contribution of 2 nodes × 8 × H200 GPUs worth of compute time via the CoreWeave Cloud platform. This work would not have been possible without their assistance and trust in the authors.
 
 
  - `ARC/pass@1000`: **13.75 %**

  ## Evaluation
+
+ ### ARC-AGI-2 Task Solve Rates
+ **These are the real puzzle-solving performance metrics:**
+ - **pass@1**: 1.67% (single attempt per task)
+ - **pass@2**: **2.92%** (official ARC Prize 2025 competition metric)
+ - **pass@10**: 5.83%
+ - **pass@100**: 8.19%
+ - **pass@1000**: 13.75%
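
The pass@k figures above count a test output as solved when any of the first k ranked candidate grids matches the target exactly. The following is a minimal sketch of that calculation under stated assumptions: the function name `pass_at_k`, the `Grid` alias, and the input layout (one ranked candidate list per test output) are hypothetical and chosen for illustration. The actual evaluator in `TinyRecursiveModels/evaluators/arc.py` may rank candidates and aggregate per task differently.

```python
from typing import List, Sequence

# Hypothetical alias: an ARC grid as a list of rows of integer colour codes.
Grid = List[List[int]]


def pass_at_k(ranked_attempts: Sequence[Sequence[Grid]],
              targets: Sequence[Grid],
              k: int) -> float:
    """Return the fraction of targets matched exactly by any of the first k attempts.

    ranked_attempts[i] holds the candidate grids for targets[i], best first.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not targets or len(ranked_attempts) != len(targets):
        raise ValueError("need exactly one ranked attempt list per target")
    solved = sum(
        # Exact match only: a grid with a single wrong cell does not count.
        any(attempt == target for attempt in attempts[:k])
        for attempts, target in zip(ranked_attempts, targets)
    )
    return solved / len(targets)
```

Because only the first k candidates are inspected, the score can never decrease as k grows, which is consistent with the rising values from pass@1 to pass@1000 listed above.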
+
+ ### Model-Level Metrics
+ **These measure internal model behavior, not task success:**
+ - Token-level accuracy: 62.83% (not indicative of puzzle-solving)
+ - LM Loss: 2.0186
+ - Halt accuracy: 90.7% (ACT controller stopping mechanism)
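
To illustrate why token-level accuracy is not indicative of puzzle-solving, the sketch below compares a per-cell accuracy with an exact-match check on a toy grid. The helper names `token_accuracy` and `is_solved` are hypothetical, and the sketch assumes that each grid cell corresponds to one token, which may not match how the TRM evaluator tokenizes and scores outputs.

```python
from typing import List

# Hypothetical alias: an ARC grid as a list of rows of integer colour codes.
Grid = List[List[int]]


def token_accuracy(pred: Grid, target: Grid) -> float:
    """Fraction of cells that agree, assuming one token per cell and equal shapes."""
    if [len(row) for row in pred] != [len(row) for row in target]:
        raise ValueError("pred and target must have the same shape")
    cells = [(p, t) for p_row, t_row in zip(pred, target)
             for p, t in zip(p_row, t_row)]
    if not cells:
        raise ValueError("grids must contain at least one cell")
    return sum(p == t for p, t in cells) / len(cells)


def is_solved(pred: Grid, target: Grid) -> bool:
    """A prediction counts as solved only if every cell matches."""
    return pred == target


# Toy example: eight of nine cells are correct, yet the puzzle is unsolved.
toy_target = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
toy_pred = [[1, 2, 3], [4, 5, 6], [7, 8, 0]]
print(token_accuracy(toy_pred, toy_target), is_solved(toy_pred, toy_target))
```

A prediction can agree with the target on most cells and still fail the exact-match test, so a high token-level accuracy can coexist with a low task solve rate.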
+
+ ### Evaluation Details
+ - Evaluator script: `TinyRecursiveModels/evaluators/arc.py` with default two-attempt submission writer
+ - Submission artifact: `/kaggle/working/trm_eval_outputs/evaluator_ARC_step_72385/submission.json`

  ## How to Use
  Install TinyRecursiveModels (commit above) and load the checkpoint via PyTorch: