HRM Checkpoint – Sudoku Full
Checkpoint from training the Hierarchical Reasoning Model (HRM) on the full Sudoku Extreme dataset, following a setup similar to sapientinc/HRM-checkpoint-sudoku-extreme.
python3 ./pretrain.py \
data_path=data/sudoku-extreme-full \
epochs=100 \
eval_interval=100 \
lr_min_ratio=0.1 \
global_batch_size=1152 \
lr=3e-4 \
puzzle_emb_lr=3e-4 \
weight_decay=0.1 \
puzzle_emb_weight_decay=0.1 \
arch.loss.loss_type=softmax_cross_entropy \
arch.L_cycles=8 \
arch.halt_max_steps=8 \
arch.pos_encodings=learned
I tried to mimic the file structure in sapientinc/HRM-checkpoint-sudoku-extreme, but I figured I'd add some extra stats:
This has the output that evaluate.py typically produces, across several max_steps settings, in an easier-to-read JSON format: evaluate-Sudoku-extreme-full.json.
I also ran it in a loop where I whittled down the set to only the puzzles that were still unsolved. You can see my method in run_subset.py. It produces stats.json, which is what I'm graphing below.
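The whittling-down idea can be pictured roughly like this. It is only a sketch of the approach, not the contents of run_subset.py: `whittle`, `solve_fn`, and `toy_solver` are hypothetical names, and the doubling schedule is an assumption read off the step counts in the results table.

```python
from typing import Callable, Dict, List, Sequence


def whittle(
    puzzles: Sequence[str],
    solve_fn: Callable[[str, int], bool],
    max_steps_schedule: Sequence[int],
) -> Dict[int, int]:
    """Return {max_steps: cumulative solved count}, re-running only unsolved puzzles."""
    total = len(puzzles)
    unsolved: List[str] = list(puzzles)
    solved_by_steps: Dict[int, int] = {}
    for max_steps in max_steps_schedule:
        # Keep only the puzzles that still fail at this step budget.
        unsolved = [p for p in unsolved if not solve_fn(p, max_steps)]
        solved_by_steps[max_steps] = total - len(unsolved)
    return solved_by_steps


def toy_solver(puzzle: str, max_steps: int) -> bool:
    """Hypothetical stand-in for the model: succeeds once max_steps reaches the puzzle's 'difficulty'."""
    return max_steps >= int(puzzle)


if __name__ == "__main__":
    stats = whittle(["1", "3", "5", "100"], toy_solver, [1, 2, 4, 8])
    print(stats)  # {1: 1, 2: 1, 4: 2, 8: 3}
```

Note that this treats a puzzle solved at a smaller budget as solved at every larger budget, which is what makes it safe to drop it from later passes.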
And here's a graph of that data, somewhat like Figure 5c in the Hierarchical Reasoning Model paper:

(I should say that even though the graph shows exact accuracy at Mmax=1024 as 100%, it's not really 100%. It's 99.9605%, which corresponds to 422,619 correct out of 422,786 total sudokus, leaving 167 unsolved.)
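That figure is quick to check. The counts are the ones quoted above; the variable names are just for illustration.

```python
# Counts quoted above for Mmax = 1024.
total = 422_786
solved = 422_619

unsolved = total - solved
exact_accuracy_pct = 100 * solved / total

print(unsolved)                      # 167
print(f"{exact_accuracy_pct:.4f}%")  # 99.9605%
```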
Perhaps it would be useful to see the results as a table.
| Steps | Total | Solved | Solved % | Unsolved | Unsolved % |
|---|---|---|---|---|---|
| 0 | 422,786 | 0 | 0.000% | 422,786 | 100.000% |
| 1 | 422,786 | 262,006 | 61.971% | 160,780 | 38.029% |
| 2 | 422,786 | 373,996 | 88.460% | 48,790 | 11.540% |
| 4 | 422,786 | 399,675 | 94.534% | 23,111 | 5.466% |
| 8 | 422,786 | 411,387 | 97.304% | 11,399 | 2.696% |
| 16 | 422,786 | 417,326 | 98.709% | 5,460 | 1.291% |
| 32 | 422,786 | 420,155 | 99.378% | 2,631 | 0.622% |
| 64 | 422,786 | 421,523 | 99.701% | 1,263 | 0.299% |
| 128 | 422,786 | 422,111 | 99.840% | 675 | 0.160% |
| 256 | 422,786 | 422,412 | 99.912% | 374 | 0.088% |
| 512 | 422,786 | 422,555 | 99.945% | 231 | 0.055% |
| 1024 | 422,786 | 422,619 | 99.961% | 167 | 0.039% |
| 2048 | 422,786 | 422,654 | 99.969% | 132 | 0.031% |
| 4096 | 422,786 | 422,679 | 99.975% | 107 | 0.025% |
| 8192 | 422,786 | 422,690 | 99.977% | 96 | 0.023% |
| 16384 | 422,786 | 422,702 | 99.980% | 84 | 0.020% |
| 32768 | 422,786 | 422,715 | 99.983% | 71 | 0.017% |
| 65536 | 422,786 | 422,718 | 99.984% | 68 | 0.016% |
| 131072 | 422,786 | 422,724 | 99.985% | 62 | 0.015% |
| 262144 | 422,786 | 422,728 | 99.986% | 58 | 0.014% |
| 524288 | 422,786 | 422,732 | 99.987% | 54 | 0.013% |
| 1048576 | 422,786 | 422,734 | 99.988% | 52 | 0.012% |
| 2097152 | 422,786 | 422,739 | 99.989% | 47 | 0.011% |
| 4194304 | 422,786 | 422,741 | 99.989% | 45 | 0.011% |
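For anyone who wants to rebuild the derived columns, here is a small sketch using a handful of rows copied from the table. It recomputes the solved percentage and unsolved count, and also shows how many puzzles were newly solved at each budget. `rows` and `summarize` are illustrative names, not part of the repository.

```python
TOTAL = 422_786

# (max steps, cumulative solved) for a few rows copied from the table above.
rows = [(1, 262_006), (2, 373_996), (4, 399_675), (8, 411_387), (16, 417_326)]


def summarize(rows, total=TOTAL):
    """Yield (steps, solved %, unsolved count, newly solved since the previous row)."""
    previous_solved = 0
    for steps, solved in rows:
        solved_pct = round(100 * solved / total, 3)
        yield steps, solved_pct, total - solved, solved - previous_solved
        previous_solved = solved


for steps, solved_pct, unsolved, newly_solved in summarize(rows):
    print(f"{steps:>2} steps: {solved_pct:.3f}% solved, "
          f"{unsolved:,} unsolved, {newly_solved:,} newly solved")
```

The "newly solved" column makes the diminishing returns easy to see: each doubling of the step budget picks up fewer additional puzzles than the one before.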
Browse the Sudoku puzzles solved by HRM (grouped by required inference steps) here: Sudoku Puzzle Collection
Usage
You should be able to run it as follows:
HRM_LOCATION="/tmp/hrm" # Or wherever
CHECKPOINT_LOCATION="/tmp/HRM-checkpoint-sudoku-full" # Or wherever, of course.
git clone https://github.com/sapientinc/HRM "${HRM_LOCATION}"
# Running this requires a fair amount of configuration. Sapient has their
# own README.md, of course, but I've made a Docker image that you might be
# able to use as a guide as well. I'll link it below.
git clone https://huggingface.co/bnsh/HRM-checkpoint-sudoku-full/ "${CHECKPOINT_LOCATION}"
cd "${HRM_LOCATION}"
python3 ./evaluate.py checkpoint="${CHECKPOINT_LOCATION}/checkpoint" data_path=data/sudoku-extreme-full/
And here's that Docker image I mentioned: bnsh/hrm-docker (setup and usage guide).
Training Details
- Hardware: NVIDIA A10G
- Runtime: ≈ 9 days, 3 hours, 34 minutes, 25 seconds (13174m 24.845s)
- Parameters: ~27.3M
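The runtime breakdown above can be reproduced from the raw `13174m 24.845s` reading. The helper name `split_duration` is made up for this example, and treating the reading as minutes plus seconds (in the style of the shell's `time` output) is an assumption.

```python
def split_duration(minutes: int, seconds: float):
    """Convert a 'minutes + seconds' wall-clock reading into (days, hours, minutes, seconds)."""
    hours, mins = divmod(minutes, 60)
    days, hours = divmod(hours, 24)
    return days, hours, mins, seconds


print(split_duration(13174, 24.845))  # (9, 3, 34, 24.845)
```

Rounding the seconds up gives the 9 days, 3 hours, 34 minutes, 25 seconds quoted in the list.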
Final Metrics
| Metric | Value |
|---|---|
| Train Accuracy | 0.98701 |
| Train Exact Accuracy | 0.96367 |
| Train LM Loss | 0.27213 |
| Train Q Continue Loss | 0.13321 |
| Train Q Halt Accuracy | 1.0 |
| Train Q Halt Loss | 0.00632 |
| Train Steps | 1.90995 |
Run History (ASCII plots)
[Sparkline plots not reproduced here. Metrics tracked: num_params, train/accuracy, train/count, train/exact_accuracy, train/lm_loss, train/lr, train/q_continue_loss, train/q_halt_accuracy, train/q_halt_loss, train/steps.]
Reference
Hierarchical Reasoning Model (HRM), arXiv:2506.21734