Hierarchical Reasoning Model
This repository hosts checkpoints for the Hierarchical Reasoning Model with Adaptive Computation Time (HRMACT), trained on Sudoku puzzles.
Checkpoints are provided in .safetensors format.
Note: The original HRM paper recommends training for 10,000+ steps for best performance. These checkpoints are intended as a lightweight, educational reference.
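To see what a checkpoint contains before loading it into a model, you can read its header directly. The sketch below relies only on the published safetensors file layout (an 8-byte little-endian header length followed by a JSON header) and the Python standard library. The function names and the example file name are illustrative and are not part of this repository.

```python
import json
import struct


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file as a dict."""
    with open(path, "rb") as f:
        # The first 8 bytes hold the header size as an unsigned
        # little-endian 64-bit integer.
        (header_len,) = struct.unpack("<Q", f.read(8))
        # The next header_len bytes are a UTF-8 JSON document that
        # describes every tensor stored in the file.
        return json.loads(f.read(header_len).decode("utf-8"))


def list_tensors(path):
    """Map each tensor name to its (dtype, shape) pair."""
    header = read_safetensors_header(path)
    return {
        name: (info["dtype"], info["shape"])
        for name, info in header.items()
        if name != "__metadata__"  # optional free-form metadata entry
    }


# Example (hypothetical file name, adjust to the checkpoint you downloaded):
# for name, (dtype, shape) in list_tensors("model.safetensors").items():
#     print(name, dtype, shape)
```

This only inspects tensor names, dtypes, and shapes. Loading the weights themselves is better left to the code in the main repo.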
For the full implementation, training pipeline, and evaluation tools, see the main repo: ZoneTwelve/HRM-Sudoku
To run evaluation on a checkpoint:
python evaluate.py <checkpoint>
The script evaluates each checkpoint 100 times per difficulty level to estimate average performance.
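With 100 runs per difficulty, each reported percentage carries noticeable sampling noise. The sketch below estimates that noise with the binomial standard error. It assumes each run is an independent pass/fail trial, which is an assumption about how evaluate.py scores puzzles rather than something stated here, and the function name is illustrative.

```python
import math


def accuracy_with_standard_error(successes, trials):
    """Return (accuracy, standard error) for a pass/fail evaluation.

    Assumes independent trials with a binary outcome (binomial model).
    """
    p = successes / trials
    se = math.sqrt(p * (1.0 - p) / trials)
    return p, se


# 84 solved puzzles out of 100 runs
p, se = accuracy_with_standard_error(84, 100)
print(f"{p:.2%} ± {se:.2%}")
```

Under this assumption, a gap of a few percentage points between two checkpoints at the same difficulty may fall within the noise of a 100-run estimate.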
Example format for results:
| Model | very-easy | easy | medium | hard | extreme |
|---|---|---|---|---|---|
| checkpoint-1 | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
| checkpoint-250 | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
| checkpoint-500 | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
| checkpoint-750 | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
| checkpoint-1000 | 2.00% | 0.00% | 0.00% | 0.00% | 0.00% |
| checkpoint-1250 | 15.00% | 4.00% | 0.00% | 0.00% | 0.00% |
| checkpoint-1500 | 42.00% | 18.00% | 1.00% | 0.00% | 0.00% |
| checkpoint-1750 | 61.00% | 40.00% | 1.00% | 1.00% | 0.00% |
| checkpoint-2000 | 64.00% | 28.00% | 1.00% | 1.00% | 0.00% |
| checkpoint-2250 | 84.00% | 67.00% | 5.00% | 2.00% | 0.00% |
| checkpoint-2500 | 80.00% | 66.00% | 22.00% | 8.00% | 0.00% |
| checkpoint-2750 | 91.00% | 81.00% | 42.00% | 13.00% | 3.00% |
| checkpoint-3000 | 98.00% | 95.00% | 38.00% | 13.00% | 1.00% |
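If you want to compare checkpoints programmatically, a table in this format can be parsed with a few lines of Python. The following is a minimal sketch that uses only the last three rows of the table above as input. The helper names are illustrative and do not come from the main repo.

```python
RESULTS_MD = """\
| Model | very-easy | easy | medium | hard | extreme |
|---|---|---|---|---|---|
| checkpoint-2500 | 80.00% | 66.00% | 22.00% | 8.00% | 0.00% |
| checkpoint-2750 | 91.00% | 81.00% | 42.00% | 13.00% | 3.00% |
| checkpoint-3000 | 98.00% | 95.00% | 38.00% | 13.00% | 1.00% |
"""


def parse_results(md):
    """Parse a markdown results table into {model: {difficulty: percent}}."""
    lines = [ln.strip() for ln in md.strip().splitlines()]
    header = [c.strip() for c in lines[0].strip("|").split("|")]
    rows = {}
    for ln in lines[2:]:  # skip the header row and the |---| separator
        cells = [c.strip() for c in ln.strip("|").split("|")]
        rows[cells[0]] = {
            name: float(value.rstrip("%"))
            for name, value in zip(header[1:], cells[1:])
        }
    return rows


def best_per_difficulty(rows):
    """Return the highest-scoring model for each difficulty.

    On ties, the model listed first in the table wins.
    """
    difficulties = next(iter(rows.values())).keys()
    return {d: max(rows, key=lambda m: rows[m][d]) for d in difficulties}


results = parse_results(RESULTS_MD)
for difficulty, model in best_per_difficulty(results).items():
    print(f"{difficulty}: {model} ({results[model][difficulty]:.2f}%)")
```

The latest checkpoint is not the best one at every difficulty, so it can be worth checking per-difficulty results before picking a checkpoint.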
If you use this model in academic work, please cite:
@misc{wang2025hierarchicalreasoningmodel,
title={Hierarchical Reasoning Model},
author={Guan Wang and Jin Li and Yuhao Sun and Xing Chen and Changling Liu and Yue Wu and Meng Lu and Sen Song and Yasin Abbasi Yadkori},
year={2025},
eprint={2506.21734},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2506.21734},
}
Apache License 2.0