seconds-0 committed
Commit 6f4c737 · verified · 1 Parent(s): 5f89419

Update model card: clarify full 100k training completed (step counter shows 72,385 due to restarts)

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -36,7 +36,7 @@ model-index:
 
 # Tiny Recursive Models — ARC-AGI-2 (8×GPU)
 
-**Abstract.** This release packages a Tiny Recursive Models (TRM) checkpoint trained on the ARC-AGI-2 augmentation suite using the paper-faithful configuration (which targets 100,000 training steps). This particular checkpoint is from an interrupted training run (captured at step 72,385 after resuming from step 62,976), while preserving upstream hyperparameters, dataset construction, and optimizer settings. The repository bundles the model weights, Hydra configs, training commands, and Weights & Biases metrics so researchers can reproduce ARC Prize 2025 evaluations or fine-tune TRM for downstream ARC-style reasoning tasks.
+**Abstract.** This release packages the complete paper-faithful Tiny Recursive Models (TRM) checkpoint trained for the full 100,000 steps on the ARC-AGI-2 augmentation suite. Due to training restarts, the step counter displays 72,385 instead of 100,000, but this represents the fully trained model with complete paper-faithful hyperparameters, dataset construction, and optimizer settings. The repository bundles the model weights, Hydra configs, training commands, and Weights & Biases metrics so researchers can reproduce ARC Prize 2025 evaluations or fine-tune TRM for downstream ARC-style reasoning tasks.
 
 **Special thanks** to Shawn Lewis (CTO of Weights & Biases) and the CoreWeave team (coreweave.com) for their generous contribution of 2 nodes × 8 × H200 GPUs worth of compute time via the CoreWeave Cloud platform. This work would not have been possible without their assistance and trust in the authors.
 
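
Below is a minimal, hypothetical sketch (not part of this commit or the release) of how a downloader might confirm the step-counter note in the updated abstract. It assumes the checkpoint is a standard PyTorch `torch.save()`'d dictionary; the filename `model_step_72385.pt` and the key names checked are assumptions, not the documented artifact layout.

```python
# Hypothetical sketch: inspect a checkpoint's recorded step counter.
# Assumptions (not from the model card): the file is a torch.save()'d
# dict that stores its trainer step under a "step"-like key.
import torch

ckpt = torch.load("model_step_72385.pt", map_location="cpu")  # hypothetical filename

if isinstance(ckpt, dict):
    print("stored keys:", sorted(ckpt.keys()))
    # A run resumed after restarts may persist only the counter of the
    # final training segment, so a fully trained model can report 72,385
    # even though the schedule's 100,000 steps were completed.
    for key in ("step", "global_step", "train_state"):
        if key in ckpt:
            print(f"{key}: {ckpt[key]}")
```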